Data mining is one of the fastest-growing fields, driven by the demand to extract added value from the large-scale databases that keep accumulating as information technology grows. In general terms, data mining (DM) is a series of processes for discovering knowledge in a data set that could not previously be found by manual inspection. It combines knowledge of the data with statistical analysis applied in a business context. Put another way, it is a process that uses statistical, mathematical, artificial-intelligence, and machine-learning techniques to extract and identify useful information and related knowledge from large databases.
Data mining includes tasks known as knowledge extraction, data archaeology, data exploration, data pattern processing, and information harvesting. All of these activities are carried out automatically and allow quick discovery even by non-programmers. Intelligent data mining finds information in a data warehouse that reports and queries cannot effectively reveal. Data mining tools find patterns in the data and can even infer rules from them. More generally, data mining is the process of searching for interesting hidden patterns, in the form of previously unknown knowledge, in a collection of data held in a database, a data warehouse, or another information storage medium. Its emphasis is on finding hidden information in the large amounts of data that a company stores in the course of running its business.
Relational database
Today, almost all business data is stored in relational databases. A relational database is built from a series of tables, and in many systems each table is stored as a file. A relational table consists of rows and columns. Most relational databases in use today serve an OLTP environment. OLTP (Online Transaction Processing) is a type of workload used by businesses that must handle large numbers of concurrent transactions. Data stored in this relational form can be processed by a data mining system.
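As a minimal sketch of what "rows and columns" means in practice, the snippet below builds one relational table with Python's built-in sqlite3 module. The table name `orders`, its columns, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: one "orders" table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
    [(1, 101, 25.0), (2, 102, 40.5), (3, 101, 12.25)],
)
conn.commit()

# Each row is one transaction; each column is one attribute of that transaction.
rows = conn.execute(
    "SELECT order_id, customer_id, amount FROM orders ORDER BY order_id"
).fetchall()
for row in rows:
    print(row)
```

A production OLTP system would run on a server database rather than an in-memory one, but the table structure that a data mining system reads is the same.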
Data extraction
The data collected during transaction processing is often held in different locations, so the system must be able to gather it quickly. If the data is stored at a regional office, it is often uploaded to a more centralized server. This can be done daily, weekly, or monthly, depending on the amount of data, security requirements, and cost. The data can be summarized before being sent to central storage.
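The extraction step can be sketched as copying records from several regional sources into one central collection. In the sketch below the function `extract_to_central`, the region names, and the record fields are all hypothetical. A real system would read from regional databases on a schedule rather than from in-memory lists.

```python
from typing import Dict, List


def extract_to_central(regional_data: Dict[str, List[dict]]) -> List[dict]:
    """Copy every regional record into one central list, tagging its origin."""
    central = []
    for region, records in regional_data.items():
        for record in records:
            # Keep the original fields and remember where the record came from.
            central.append({**record, "region": region})
    return central


# Hypothetical regional offices and their locally stored transactions.
regional_data = {
    "north": [{"order_id": 1, "amount": 25.0}],
    "south": [{"order_id": 2, "amount": 40.5}, {"order_id": 3, "amount": 12.25}],
}
central = extract_to_central(regional_data)
print(len(central), "records in central storage")
```

Tagging each record with its region is a design choice made here so that the origin is not lost once everything sits in one place.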
Data transformation
Data transformation summarizes the data on the assumption that it is already held in a single store: in the previous step, the data was extracted from many databases into a single database. The type of summarization performed here is similar to the summarization performed during the extraction phase. Some companies choose to summarize the data only once it is in the single storage area. Commonly used aggregate functions include sum, average, minimum, maximum, and count.
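The aggregate functions listed above can be illustrated with a small grouping routine. This is only a sketch: the helper `summarize`, the field names `customer_id` and `amount`, and the sample orders are assumptions made for the example.

```python
from collections import defaultdict


def summarize(records, key, value):
    """Group records by `key` and compute common aggregates of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: {
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for group, values in groups.items()
    }


# Hypothetical transactions already collected in a single store.
orders = [
    {"customer_id": 101, "amount": 25.0},
    {"customer_id": 102, "amount": 40.5},
    {"customer_id": 101, "amount": 12.25},
]
summary = summarize(orders, "customer_id", "amount")
print(summary[101])
```

In a relational database the same result would normally come from a GROUP BY query. The plain Python version is used here only to show what each aggregate computes.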
Data cleaning
The collected data then undergoes a cleaning process. Data cleaning removes erroneous records, standardizes attributes, rationalizes data structures, and handles missing data. Inconsistent, error-ridden data makes data mining results inaccurate, so it is very important to make the data consistent and uniform. Data cleaning can also help a company consolidate records. This is very useful when a company holds many records for one customer: each record or customer file has the same customer ID, but the information in each file is different.
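A rough sketch of consolidating several records that share a customer ID follows. The rules it applies are assumptions made for illustration, not a prescribed method: a record without an ID is treated as erroneous, strings are trimmed, e-mail addresses are lower-cased, and the first non-empty value seen for a field wins. The function name and field names are hypothetical.

```python
def clean_and_consolidate(records):
    """Merge records that share a customer ID into one cleaned record each."""
    consolidated = {}
    for record in records:
        customer_id = record.get("customer_id")
        if customer_id is None:
            continue  # assumption: a record without an ID is erroneous
        merged = consolidated.setdefault(customer_id, {"customer_id": customer_id})
        for field, value in record.items():
            if isinstance(value, str):
                value = value.strip()  # standardize stray whitespace
                if field == "email":
                    value = value.lower()  # standardize e-mail casing
            if value is None or value == "":
                continue  # skip missing values so a later record can fill them
            merged.setdefault(field, value)  # first non-empty value wins
    return list(consolidated.values())


# Hypothetical customer files: two partial records for customer 101.
raw = [
    {"customer_id": 101, "name": " Ana Diaz ", "email": None},
    {"customer_id": 101, "name": "", "email": "ANA@EXAMPLE.COM "},
    {"customer_id": None, "name": "Unknown", "email": "x@example.com"},
    {"customer_id": 102, "name": "Budi", "email": "budi@example.com"},
]
cleaned = clean_and_consolidate(raw)
print(cleaned)
```

Real consolidation usually also has to resolve conflicting non-empty values, for example by preferring the most recent record, which this sketch does not attempt.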
Standard form
After the data has been cleaned, it is transferred into a standard form. The standard form is the form in which the data mining algorithm will access the data, and it is usually spreadsheet-like. The spreadsheet form works well because each row represents a case and each column represents a feature.
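The spreadsheet-like layout can be sketched as a list of rows in which every row holds the feature values of one case, in a fixed column order. The function `to_standard_form`, the feature names, and the sample cases are hypothetical.

```python
def to_standard_form(records, feature_names):
    """Return one row per case, with values ordered by `feature_names`."""
    return [[record[name] for name in feature_names] for record in records]


# Hypothetical cleaned cases, one dictionary per customer.
cases = [
    {"age": 35, "income": 52000.0, "late_payments": 0},
    {"age": 42, "income": 61000.0, "late_payments": 2},
]
feature_names = ["age", "income", "late_payments"]
table = to_standard_form(cases, feature_names)

print(feature_names)
for row in table:
    print(row)
```

Fixing the column order once in `feature_names` means every row can be read the same way, which is what lets an algorithm treat the table as a uniform grid.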
Feature reduction
Once the data is in the standard spreadsheet form, it is worth considering whether to reduce the number of features. There are several reasons to do so. A bank, for example, may have hundreds of features available for predicting credit risk, which means a very large amount of data, and working with that much data degrades the performance of predictive algorithms.
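The article does not name a specific reduction method. As one simple illustration, the sketch below drops features whose values barely vary across cases, since such a column carries little information for prediction. The variance threshold, the function `drop_low_variance`, and the sample table are assumptions. Other approaches exist and may suit a given data set better.

```python
from statistics import pvariance


def drop_low_variance(table, feature_names, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`."""
    columns = list(zip(*table))  # transpose the rows into columns
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced = [[row[i] for i in keep] for row in table]
    return reduced, [feature_names[i] for i in keep]


# Hypothetical credit-risk table: "is_customer" is identical for every case.
feature_names = ["age", "is_customer", "income"]
table = [
    [35, 1, 52000.0],
    [42, 1, 61000.0],
    [29, 1, 38000.0],
]
reduced, kept = drop_low_variance(table, feature_names)
print(kept)
```

With the default threshold of 0.0 only constant columns are removed. A higher threshold would remove more features, and it would also need the columns to be on comparable scales.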
Running the algorithm
Once all of the steps above are complete, the data mining algorithm is ready to run.