Decision Trees Algorithm

A decision tree is a supervised machine learning algorithm used for classification and regression tasks.

How Decision Trees Work

Suppose we have a classification dataset with Yes and No labels.

This dataset contains 14 observations from a weather report, and the output says whether to play tennis or not.

Now we build a tree from the given observations.

From the above diagram we will calculate the Entropy and Gini Impurity of the dataset.

Entropy

Entropy measures how impure (mixed) a node is. With entropy we can find out whether to split the node further or not.

Entropy = 0 (pure split), Entropy = 1 (completely impure split, for two classes). Values in between mean the node is partially impure.

H(s) = -p+ log2(p+) - p- log2(p-)

Here H(s) = Entropy of a node, p+ = Probability of Yes, and p- = Probability of No.

Calculate entropy for Outlook node

Probability of Yes = number of Yes / total number of observations = 9/14.

Probability of No = number of No / total number of observations = 5/14.

H(s) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

Here the entropy is 0.94, so the node is impure. When a node is impure we can split it further.
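The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, assuming base-2 logarithms; the function name `entropy` and the count-list input are illustrative choices, not something defined in the blog.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node, given the number of samples in each class."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue  # convention: 0 * log2(0) is treated as 0
        p = c / total
        h -= p * math.log2(p)
    return h

# Root node from the text: 9 Yes and 5 No out of 14 observations.
root_entropy = entropy([9, 5])
print(round(root_entropy, 2))  # 0.94
```

Passing class counts instead of probabilities keeps the call close to how the values are read off the tree (9 Yes, 5 No).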

Entropy for Overcast Node

H(s) = -4/4 log2(4/4) - 0/4 log2(0/4) = 0 (the term 0 log2(0) is taken as 0)

Here the entropy is zero, so this node is called a pure node. When a node is pure there is nothing left to separate, so we do not split it further.
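The pure-node rule can be written as a small check. The sketch below is only an illustration: `node_entropy` and `should_split` are hypothetical helper names, and the zero-count class is skipped so that 0 log2(0) counts as 0.

```python
import math

def node_entropy(yes, no):
    """Base-2 entropy of a two-class node from its Yes and No counts."""
    total = yes + no
    h = 0.0
    for count in (yes, no):
        if count > 0:  # skip empty classes: 0 * log2(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

def should_split(yes, no):
    """Hypothetical stopping rule: keep splitting only while the node is impure."""
    return node_entropy(yes, no) > 0.0

print(node_entropy(4, 0))  # 0.0  (Overcast node: 4 Yes, 0 No)
print(should_split(4, 0))  # False
print(should_split(9, 5))  # True
```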

Gini Impurity and Information Gain

Gini Impurity and Information Gain are used to decide which feature to split a node on.

Gini Impurity Formula

GI = 1 - (p+^2 + p-^2)

Here p+ = Probability of Yes and p- = Probability of No.
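As a quick illustration, Gini impurity can be computed from class counts. This sketch assumes the standard definition (one minus the sum of squared class probabilities); the function name `gini_impurity` is an illustrative choice, and the counts reuse the 9 Yes / 5 No and 4 Yes / 0 No nodes from the entropy section.

```python
def gini_impurity(counts):
    """Gini impurity of a node: 1 minus the sum of squared class probabilities."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(round(gini_impurity([9, 5]), 3))  # 0.459  (impure node)
print(gini_impurity([4, 0]))            # 0.0    (pure node)
```

For two classes the value ranges from 0 (pure) to 0.5 (an even split).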

Information Gain Formula

Information Gain (IG) = Entropy (Parent) - Weighted Average Entropy (Children)

Let's calculate Information Gain from the tree given below.

H(s) for the parent node = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

H(s) for c1 = -6/8 log2(6/8) - 2/8 log2(2/8) = 0.81

H(s) for c2 = -3/6 log2(3/6) - 3/6 log2(3/6) = 1

Gain = 0.94 - [8/14 * 0.81 + 6/14 * 1]

Gain = 0.049
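The same gain calculation can be sketched in Python. The helper names `entropy` and `information_gain` are assumptions made for this example; the counts (parent 9/5, children 6/2 and 3/3) are the ones used above.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node, given the number of samples in each class."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Parent: 9 Yes / 5 No. Children: c1 = 6 Yes / 2 No, c2 = 3 Yes / 3 No.
gain = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(gain, 3))  # 0.048
```

Without rounding the intermediate entropies the gain comes out as about 0.048; the 0.049 above results from rounding each entropy to two decimals first.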

Between IG and GI, GI is faster to calculate because it does not need any logarithms.

Note

The explanation in this blog is inspired by the YouTube video given in the link below.

You can learn about decision trees for regression tasks in the video.

Written by

Meemansha Priyadarshini

I am a certified TensorFlow Developer and enjoy writing blogs to share my knowledge and assist others.