C4.5 is an algorithm for building decision trees based on the criteria of decision makers. The decision tree is a powerful and well-known method of classification and prediction. It turns a very large collection of facts into a tree that represents rules, and these rules can be easily understood in natural language. They can also be expressed in database form, such as Structured Query Language (SQL), to search for records in certain categories. Decision trees are also useful for exploring data and finding hidden relationships between a number of potential input variables and a target variable. Because decision trees combine data exploration and modeling, they are an excellent first step in the modeling process, even when the final model is built with some other technique.
A decision tree is a structure that can be used to divide a large dataset into smaller sets of records by applying a series of decision rules. With each successive split, the members of the resulting sets become more similar to one another. A decision tree contains several elements:
- Root
- Node
- Relationship
When solving a problem with the C4.5 algorithm, there are two quantities that must be understood:
- Entropy
- Gain
In general, the C4.5 algorithm builds a decision tree as follows.
- Select an attribute as the root
- Create a branch for each value of that attribute
- Divide the cases among the branches
- Repeat the process for each branch until all the cases in the branch have the same class
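The steps above can be sketched as a short recursive procedure. This is a minimal illustration for nominal attributes only; the function and variable names and the toy dataset are my own, and real C4.5 additionally handles numeric attributes, the gain ratio criterion, and pruning:

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the class distribution among a set of cases."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(rows, attr, target):
    """Information gain of splitting the cases on one attribute."""
    total = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:              # all cases in the branch share a class: leaf
        return classes.pop()
    if not attrs:                      # no attribute left to split on: majority class
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    root = max(attrs, key=lambda a: gain(rows, a, target))   # highest-gain attribute
    rest = [a for a in attrs if a != root]
    return {root: {v: build_tree([r for r in rows if r[root] == v], rest, target)
                   for v in {r[root] for r in rows}}}

# Hypothetical toy data: whether to play, by outlook and wind.
cases = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]
tree = build_tree(cases, ["outlook", "windy"], "play")
```

Here "outlook" wins the first split because two of its three values already produce pure branches; only the rainy branch needs a further split on "windy".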
To select the attribute to use as the root, the gain values of the existing attributes are compared and the highest is chosen. Gain is calculated using the following equation.

Gain(S, A) = Entropy(S) - Σ (|Si| / |S|) × Entropy(Si), summed over i = 1 to n

Information:
S: The set of cases
A: An attribute
n: Number of partitions of attribute A
|Si|: Number of cases in partition i
|S|: Number of cases in S
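As a numeric illustration of the gain equation (the counts are a hypothetical 14-case set S with 9 positive and 5 negative cases, split by a two-valued attribute A), the terms can be evaluated directly:

```python
import math

def entropy(counts):
    """Entropy from class counts: -sum of (c/total) * log2(c/total)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# Hypothetical set S: 14 cases, 9 "yes" and 5 "no".
# Attribute A has n = 2 partitions: S1 with 8 cases (6 yes, 2 no)
# and S2 with 6 cases (3 yes, 3 no).
gain_A = entropy([9, 5]) - (8/14 * entropy([6, 2]) + 6/14 * entropy([3, 3]))
```

Each partition's entropy is weighted by |Si| / |S| before being subtracted from the entropy of the whole set, so an attribute that produces purer partitions yields a larger gain.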
Meanwhile, the entropy value is calculated with the following equation.

Entropy(S) = Σ -pi × log2(pi), summed over i = 1 to n

Information:
S: The set of cases
n: Number of partitions (classes) of S
pi: The proportion of Si to S
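A quick check of the entropy equation at its extremes (the proportions below are my own illustrative numbers): a pure set has entropy 0, an evenly split set has entropy 1, and mixed sets fall in between.

```python
import math

def entropy(proportions):
    """Entropy(S) = sum of -p_i * log2(p_i) over the class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p)

pure  = entropy([1.0])         # one class only -> 0 bits
even  = entropy([0.5, 0.5])    # 50/50 split    -> 1 bit
mixed = entropy([9/14, 5/14])  # 9-vs-5 split   -> about 0.940 bits
```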