-->
Data Collection in Data Mining

Sampling is a common approach for selecting a subset of objects or data to analyze in place of the whole. It is well established in statistics and is also used in data mining, but the reasons for using it usually differ. Statisticians sample because collecting or working with the entire data set is too costly and takes too long, whereas data mining practitioners sample because running data mining algorithms on all of the data takes too long.

The key principle of sampling is that a sample works almost as well as the full data set if it is representative of that data set. Representativeness is usually checked by comparing a property such as the mean between the sample and the original data: if the two values are the same or very close, the sample can be considered good. Even so, a good sample does not guarantee that data mining results obtained from the sample will be as good as those obtained from all of the original data.
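The mean comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic, the helper name `relative_mean_gap` is hypothetical, and what counts as "very close" depends on the application.

```python
import random
import statistics


def relative_mean_gap(population, sample):
    """Return |mean(sample) - mean(population)| relative to the population mean.

    Hypothetical helper for illustration; assumes the population mean is not zero.
    """
    pop_mean = statistics.mean(population)
    sample_mean = statistics.mean(sample)
    return abs(sample_mean - pop_mean) / abs(pop_mean)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "original data": 10,000 values centered near 50 (an assumption).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Draw 500 items at random and compare the two means.
sample = rng.sample(population, 500)
gap = relative_mean_gap(population, sample)
print(f"relative gap between means: {gap:.4f}")
```

A small gap suggests the sample represents the original data well with respect to the mean, although, as noted above, that alone does not guarantee equally good data mining results.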

The simplest sampling approach is simple random sampling. It has two variants: sampling without replacement and sampling with replacement. In the first, each item selected for the sample is removed from the original data and cannot be selected again. In the second, each selected item is returned to the original data, so the same item may appear more than once in the sample.
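As a minimal sketch, the two variants can be tried with Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The data set and the sample size here are made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Illustrative "original data": the integers 1 to 20 (an assumption).
data = list(range(1, 21))

# Sampling without replacement: every selected item is distinct.
without_replacement = rng.sample(data, k=10)

# Sampling with replacement: an item may be selected more than once.
with_replacement = rng.choices(data, k=10)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Because all items in `data` are distinct, the first sample never contains a repeated value, while the second may contain repeats.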