Decision Trees Algorithm
A decision tree is a supervised machine learning algorithm used for classification and regression tasks.
How Decision Trees Work
Suppose we have a classification dataset whose target takes the values Yes and No.
The dataset contains 14 observations from a weather report, and the output is whether or not to play tennis.
Now we build a tree from these observations.
From the tree above, we will calculate the entropy and Gini impurity of the dataset.
Entropy
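Entropy measures how impure (mixed) the class labels in a node are. With entropy we can decide whether a node should be split further or not.
Entropy = 0 means a pure node (all observations belong to one class). For a two-class problem, entropy reaches its maximum of 1 when the classes are evenly split (the most impure node).
H(s) = -(p+) log2(p+) - (p-) log2(p-)
Here H(s) = entropy of a node, p+ = probability of Yes, p- = probability of No.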
Calculate entropy for the Outlook node
Probability of Yes = number of Yes / total number of observations = 9/14.
Probability of No = number of No / total number of observations = 5/14.
H(s) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94
Here the entropy is 0.94, so the node is impure. When a node is impure, we can split it further.
Entropy for Overcast Node
H(s) = -4/4 log2(4/4) - 0/4 log2(0/4) = 0 (taking 0 log2(0) as 0)
Here the entropy is zero, so this node is called a pure node. A pure node needs no further splits.
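The two entropy calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name entropy and the choice to pass class counts as a list are assumptions made for illustration, and the logarithm is taken in base 2, which is what gives 0.94 for the 9 Yes / 5 No node.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node, given the number of observations in each class."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # convention: 0 * log2(0) is treated as 0
        p = count / total
        result -= p * math.log2(p)
    return result

root_entropy = entropy([9, 5])      # 9 Yes, 5 No
overcast_entropy = entropy([4, 0])  # 4 Yes, 0 No

print(round(root_entropy, 2))      # 0.94
print(round(overcast_entropy, 2))  # 0.0
```

Skipping empty classes is what lets the pure Overcast node evaluate to 0 instead of raising a math domain error on log2(0).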
Gini Impurity and Information Gain
Gini impurity and information gain are used to choose which split to make at a node.
Gini Impurity Formula
Gini Impurity (GI) = 1 - [(p+)^2 + (p-)^2]
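As a rough illustration, Gini impurity can be computed from class counts in the same way as entropy. The sketch below assumes the usual definition (one minus the sum of squared class probabilities); the function name gini_impurity and the example counts are chosen here for illustration, with 9 Yes / 5 No and 4 Yes / 0 No taken from the nodes discussed in the entropy section.

```python
def gini_impurity(counts):
    """Gini impurity of a node, given the number of observations in each class."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)

root_gini = gini_impurity([9, 5])  # mixed node: 9 Yes, 5 No
pure_gini = gini_impurity([4, 0])  # pure node: 4 Yes, 0 No
even_gini = gini_impurity([3, 3])  # evenly split node

print(round(root_gini, 3))  # 0.459
print(pure_gini)            # 0.0
print(even_gini)            # 0.5
```

For two classes, Gini impurity runs from 0 (pure) to 0.5 (evenly split), whereas entropy runs from 0 to 1.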
Information Gain Formula
Information Gain (IG) = Entropy (Parent) - Weighted Average Entropy (Children)
Let's calculate the information gain for the tree given below.
H(s) for C1 (parent, 14 observations) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94
H(s) for c1 (child, 8 observations) = -6/8 log2(6/8) - 2/8 log2(2/8) = 0.81
H(s) for c2 (child, 6 observations) = -3/6 log2(3/6) - 3/6 log2(3/6) = 1
Gain = 0.94 - [8/14 * 0.81 + 6/14 * 1]
Gain ≈ 0.049 (using the rounded entropies above)
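The hand calculation above can be checked with a short Python sketch. The helper names entropy and information_gain are assumptions for illustration, and the class counts (9 and 5 in the parent, 6 and 2 in one child, 3 and 3 in the other) are read off the fractions in the formulas. Because the code does not round the intermediate entropies, it gives about 0.048 rather than 0.049.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node, given the number of observations in each class."""
    total = sum(counts)
    # Empty classes are skipped, following the convention 0 * log2(0) = 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted average entropy of the children."""
    total = sum(parent)
    weighted = sum((sum(child) / total) * entropy(child) for child in children)
    return entropy(parent) - weighted

# Parent node and its two children, as class counts.
gain = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(gain, 3))  # 0.048
```

When comparing candidate splits, the one with the largest information gain is the one that reduces impurity the most.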
Of information gain (IG) and Gini impurity (GI), Gini impurity is faster to compute, since it does not require a logarithm.
Note
The explanation in this blog is inspired by the YouTube video linked below.
The video also covers decision trees for regression tasks.
Written by
Meemansha Priyadarshini
I am a certified TensorFlow Developer and enjoy writing blogs to share my knowledge and assist others.