Dataset Characteristics
The first characteristic of a data set is its dimensionality, defined as the number of features (attributes) in each row of the data set. Data with a small number of dimensions (low-dimensional) tends to be qualitatively different from data from the same context with many dimensions (high-dimensional). Although high-dimensional data can typically yield better results in data mining, the computational cost is also higher. In addition, some features often have little impact on the mining task, so a preprocessing step, i.e. dimensionality reduction, is usually needed.
The second characteristic is sparsity. In data sets with asymmetric features (where the set of features that actually carry a value differs from one record to the next), most attribute values are zero; in many cases fewer than 1% of the entries are nonzero. In practice this is an advantage, because only the nonzero values need to be stored and processed, which reduces both computation and storage. Shopping cart (market basket) data is a typical example of a very sparse data set.
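To make dimensionality and sparsity concrete, here is a minimal sketch in Python. The shopping carts and item names are invented for illustration, and the toy data is far denser than the "fewer than 1% nonzero" case described above; real market basket data would have thousands of items per store.

```python
# Hypothetical shopping-cart transactions: each cart is the set of items bought.
carts = [
    {"bread", "milk"},
    {"beer", "diapers"},
    {"milk"},
    {"bread", "eggs", "milk"},
]

# Every distinct item becomes one feature, i.e. one dimension.
items = sorted(set().union(*carts))
n_dimensions = len(items)

# Dense representation: one 0/1 value per item for every cart.
dense = [[1 if item in cart else 0 for item in items] for cart in carts]

# Sparse representation: keep only the column positions of nonzero entries.
sparse = [[j for j, value in enumerate(row) if value] for row in dense]

n_cells = len(dense) * n_dimensions
n_nonzero = sum(len(row) for row in sparse)
density = n_nonzero / n_cells  # fraction of entries that are nonzero

print(f"dimensions: {n_dimensions}")
print(f"stored values (dense): {n_cells}")
print(f"stored values (sparse): {n_nonzero}")
print(f"density: {density:.2f}")
```

The sparse form stores only the nonzero positions, so its size grows with the number of items actually bought rather than with the number of items on offer. That is where the savings in storage and computation come from.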
The third characteristic is resolution. For data depicted in graphical form that has a spatial component, the resolution used also has an effect. For example, for earth surface data that visualizes the movement of clouds and other weather systems every hour, a resolution that is too fine can hide the pattern or bury it in noise. Conversely, with a resolution that is too coarse, the pattern can also become invisible.
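The effect of a resolution that is too coarse can be sketched with a small time series. The readings below are made up, and `coarsen` is a hypothetical helper that simply averages non-overlapping windows. The sketch only illustrates the coarse end of the problem, not the noise seen at very fine resolutions.

```python
def coarsen(values, window):
    """Average consecutive, non-overlapping windows of the given size."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical hourly readings over two days: a flat baseline of 10
# with a short three-hour dip on the first day (a passing weather system).
hourly = [10.0] * 48
hourly[10:13] = [4.0, 2.0, 4.0]

six_hourly = coarsen(hourly, 6)   # 8 values
daily = coarsen(hourly, 24)       # 2 values

for name, series in [("hourly", hourly), ("6-hourly", six_hourly), ("daily", daily)]:
    print(f"{name:9s} points={len(series):2d} spread={spread(series):.2f}")
```

At hourly resolution the dip stands out clearly; after averaging to daily values it shrinks to a difference of less than one unit between the two days, so the pattern is effectively gone.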