Data Sets in Data Mining

There are several ways to represent data. The attributes used to describe objects can be of different types, and data sets can have different characteristics: some contain time series, some contain numeric values, and some contain objects with explicit relationships among them. Because representations differ, the tools and techniques used to analyze them differ as well. For this reason, data mining tries to accommodate these different representations so that they can be generalized and processed in a uniform way.

Besides representation, the quality of the data set must be considered before mining begins. Common problems in raw data include duplicate records, inconsistent or redundant values, anomalies or outliers, and incorrect values. Preprocessing the data before the main data mining step is therefore important: better data quality leads to higher-quality data mining results.
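
As a minimal sketch of this kind of initial processing, the example below removes exact duplicate records and flags an outlier. The records, the function names, and the choice of the modified z-score (based on the median absolute deviation, with the commonly used cutoff of 3.5) are assumptions made for illustration; real data sets usually need checks chosen for the problem at hand.

```python
from statistics import median

# Hypothetical raw records: (person_id, height_cm). Values are made up for illustration.
raw = [
    (1, 170.0),
    (2, 165.5),
    (2, 165.5),   # exact duplicate of the previous record
    (3, 172.0),
    (4, 1720.0),  # likely a data-entry error (outlier)
    (5, 168.0),
]

def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations are (mostly) zero; nothing can be flagged this way.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

unique = drop_duplicates(raw)
heights = [h for _, h in unique]
flags = flag_outliers(heights)
clean = [rec for rec, is_outlier in zip(unique, flags) if not is_outlier]
```

A median-based rule is used here because a single extreme value distorts the mean and standard deviation, which can hide the very outlier being looked for.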

Data types in a dataset

A dataset can be viewed as a collection of data objects. Other common names for a data object are record, point, vector, pattern, event, observation, case, or even data. Data objects are described by a number of attributes that capture their basic characteristics, for example a height attribute that gives a person's height as a quantitative value, or a time attribute that records when an event occurred. Attributes are sometimes called variables, characteristics, fields, features, or dimensions.
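
One simple way to picture this, sketched here with hypothetical names and values, is a list in which each element is one data object and each key is one attribute:

```python
# Each dict is one data object (record); each key is an attribute (field).
# The attribute names and values are hypothetical.
dataset = [
    {"person": "A", "height_cm": 170.0, "event_time": "2020-01-05T10:30"},
    {"person": "B", "height_cm": 158.5, "event_time": "2020-01-05T11:15"},
    {"person": "C", "height_cm": 181.2, "event_time": "2020-01-06T09:00"},
]

# The attributes shared by the objects in this dataset.
attributes = list(dataset[0].keys())

# Reading one attribute across all objects gives a column of values.
heights = [obj["height_cm"] for obj in dataset]
```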

An attribute is a property or characteristic of a data object whose value can vary from one object to another or from one time to another. For example, skin color differs from person to person, and a person's weight changes over time. Skin color takes symbolic values such as black, white, yellow, fair, or brown, while weight takes numeric values such as 35, 50, 70, or 85.

There are four important properties of attribute values in general, each tied to the operations that are meaningful on the values:
  • Distinctness (=, ≠)
  • Order (<, >)
  • Addition (+, −)
  • Multiplication (×, ÷)

These properties define four attribute types, grouped into two categories:

  • Categorical (qualitative)
    • Nominal: the values are just names that distinguish one object from another (distinctness only). For example postal code, ID number, student ID number, gender.
    • Ordinal: the values are names that also carry enough information to be ordered. For example graduation honors (cum laude, very satisfactory, satisfactory).
  • Numerical (quantitative)
    • Interval: the difference between two values is meaningful. For example calendar dates, temperature in Celsius or Fahrenheit.
    • Ratio: both the difference and the ratio between two values are meaningful. For example temperature in kelvin, age, length, height.

Nominal and ordinal attributes are categorical, with qualitative values such as a postal code or an ID card number. Even when written with digits, these values are symbols, so arithmetic operations on them are not meaningful as they are for numeric types. Interval and ratio attributes are numerical, with quantitative values that support arithmetic and can be represented as integers or continuous values.
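
The relationship between the four properties and the four attribute types can be sketched as a small lookup. The class name `AttrType`, the helper `properties`, and the ranking used for graduation honors are hypothetical names chosen for this illustration. The last lines show why a ratio is meaningful in kelvin but not in Celsius.

```python
from enum import IntEnum

class AttrType(IntEnum):
    """Attribute types ordered by how many of the four properties they support."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Property gained at each level; a type also has every property of the levels below it.
PROPERTY_ADDED = {
    AttrType.NOMINAL: "distinctness",
    AttrType.ORDINAL: "order",
    AttrType.INTERVAL: "addition",
    AttrType.RATIO: "multiplication",
}

def properties(attr_type):
    """Return all properties that are meaningful for the given attribute type."""
    return [PROPERTY_ADDED[t] for t in AttrType if t <= attr_type]

# Ordinal values are symbols, so their order must be stated explicitly
# (hypothetical ranking, lowest to highest).
HONORS_RANK = {"satisfactory": 0, "very satisfactory": 1, "cum laude": 2}
best = max(["very satisfactory", "cum laude", "satisfactory"], key=HONORS_RANK.get)

# Interval vs ratio: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
celsius_ratio = 20.0 / 10.0                        # depends on an arbitrary zero point
kelvin_ratio = (20.0 + 273.15) / (10.0 + 273.15)   # true zero point, ratio is meaningful
```

With these definitions, `properties(AttrType.INTERVAL)` lists distinctness, order, and addition but not multiplication, which matches the description of interval attributes above.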

Based on the number of values they can take, attributes can also be divided into discrete and continuous. An attribute is discrete if it takes values from a finite or countably infinite set. This is typical of categorical attributes with only a few distinct values, such as a temperature recorded as cold, normal, or hot. A continuous attribute takes real-number values, such as length or height, and is usually stored in a floating-point representation. Even so, values are measured and stored with limited precision, that is, with a fixed number of digits after the decimal point.
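
The difference can be illustrated with a short sketch. The function name and the cutoff values of 18 and 28 degrees Celsius are assumptions chosen only for this example: the same continuous readings are turned into a discrete attribute with three values, and separately stored with two digits after the decimal point.

```python
def discretize_temperature(celsius, cold_below=18.0, hot_from=28.0):
    """Map a continuous temperature to one of three discrete labels.

    The cutoffs are hypothetical and would depend on the application.
    """
    if celsius < cold_below:
        return "cold"
    if celsius >= hot_from:
        return "hot"
    return "normal"

# Continuous readings in degrees Celsius (made-up values).
readings = [12.345678, 22.0, 31.9]

# Discrete view: only three possible values.
labels = [discretize_temperature(t) for t in readings]

# Continuous view, but stored with limited precision (two decimal places).
stored = [round(t, 2) for t in readings]
```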