Methodology in Data Mining for Big Data Optimization

Data mining is a series of processes for extracting added value from a database in the form of information that could not be discovered manually. This information is obtained by extracting and recognizing important or interesting patterns in the data held in the database. Because data mining is mainly used to search for knowledge hidden in large databases, it is often referred to as Knowledge Discovery in Databases (KDD). In general, data mining methods fall into two categories: descriptive and predictive. Descriptive methods aim to find human-interpretable patterns that explain the characteristics of the data. Predictive methods use certain characteristics of the data to predict unknown values. Some of the methods used in data mining are described below.

Predictive modeling
The purpose of this method is to build a model that predicts a value from certain characteristics of the data. Predictive models fall into two subcategories: classification and regression. Classification predicts the value of a discrete variable (for example, predicting which online visitors will make a purchase on a website), while regression predicts the value of a continuous variable (for example, forecasting future stock prices).
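To make the difference between classification and regression concrete, here is a minimal Python sketch (standard library only, Python 3.8+ for `math.dist`). It is an illustration under stated assumptions, not a method from the original text: `knn_classify` is a hypothetical helper that uses k-nearest neighbours as the classifier, `fit_line` is a hypothetical helper that fits an ordinary least-squares line as the regression model, and the visitor and price data are made up.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(train, query, k=3):
    """Classification: predict a discrete label by majority vote
    of the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Regression: fit y = a + b*x by ordinary least squares, return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up visitors: (pages viewed, minutes on site) -> did they buy?
visitors = [((2, 1.0), "no_buy"), ((3, 2.5), "no_buy"),
            ((12, 9.0), "buy"), ((15, 11.0), "buy"), ((10, 7.5), "buy")]
print(knn_classify(visitors, (11, 8.0)))  # buy

# Made-up daily prices; the fitted line extrapolates to day 5.
a, b = fit_line([1, 2, 3, 4], [10, 12, 14, 16])
print(a + b * 5)  # 18.0
```

A real stock-price forecast would need far more than a straight line; the point is only that the regression returns a continuous number, whereas the classifier returns one label from a fixed set.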

Association analysis
The purpose of this method is to generate rules that describe items or attributes that are strongly associated with each other in the data. For example, association analysis can be used to find products that many customers frequently buy together, a task often referred to as market basket analysis.
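Market basket analysis usually rests on two measures: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule A -> B (how often B appears in the transactions that contain A). The sketch below is a simplified illustration that only considers pairs of items; algorithms such as Apriori extend the same idea to larger itemsets. `pair_rules`, its thresholds and the baskets are hypothetical choices for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs between item pairs.

    support(A, B)      = fraction of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # sort so each pair is counted under one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        # A frequent pair yields one candidate rule in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
           {"bread", "butter"}, {"milk", "eggs"}]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy baskets, butter -> bread reaches confidence 1.00 (every basket with butter also contains bread), while bread -> butter only reaches 0.67, which shows why the direction of a rule matters.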

Clustering
The purpose of this method is to partition the data into groups of homogeneous (similar) items, so that data in the same cluster are more similar to one another than to data in other clusters. An example of clustering is grouping documents by topic.
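k-means is one widely used clustering algorithm, and the sketch below is a plain Python version of it (Lloyd's algorithm) on 2-D points. Assumptions: the first k points serve as the initial centroids (practical implementations usually use random or k-means++ initialisation), and the points are made up. Grouping documents by topic would first require turning each document into a numeric vector, for example word counts, which is not shown here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); the first k points are the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster's points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[i])
                continue
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

k-means requires choosing k in advance and tends to find roughly spherical clusters; other clustering algorithms relax these constraints.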

Anomaly detection
The purpose of this method is to find anomalies, or outliers: data that differ markedly from the rest of the data. An example is detecting an attack on a computer network.
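A very simple statistical form of anomaly detection flags values that lie far from the bulk of the data. The sketch below uses the modified z-score, based on the median and the median absolute deviation (MAD), with the commonly cited cut-off of 3.5. This is a swapped-in illustrative technique, not one the text prescribes: `mad_outliers` is a hypothetical helper, the per-minute request counts are made up, and real network intrusion detection relies on much richer features.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median of the
    absolute deviations from the median. Unlike the mean and standard
    deviation, the median and MAD are barely moved by the outliers themselves.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case (most values equal the median):
        # the score is undefined, so this sketch reports nothing.
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Made-up requests per minute from one client; the spike stands out.
requests_per_minute = [52, 48, 50, 55, 47, 51, 49, 53, 950, 50]
print(mad_outliers(requests_per_minute))  # [(8, 950)]
```

The median-based score is used here instead of the classic mean-and-standard-deviation z-score because, in a small sample, a single huge spike inflates the standard deviation enough to hide itself.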