A decision tree is a supervised machine learning algorithm used for classification and regression tasks.

Suppose we have a classification dataset with Yes and No labels.

This dataset contains 14 observations of weather reports, each labeled with whether or not to play tennis.

Now we build a tree from the given observations.

From the above diagram we will calculate the entropy and Gini impurity of the dataset.

Entropy measures how impure a node is. With entropy we can decide whether to split a node further or not.

Entropy = 0 (pure split), Entropy = 1 (maximally impure split, for two classes). Values in between indicate a partly impure node.

H(s) = -p+ log2(p+) - p- log2(p-)

Here H(s) = entropy of a node, p+ = probability of Yes, p- = probability of No.

Calculate the entropy for the Outlook node

Probability of Yes = number of Yes / total number of observations = 9/14.

Probability of No = number of No / total number of observations = 5/14.

H(s) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

Here the entropy is 0.94, so the node is impure. When a node is impure we can perform more splits.
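The calculation above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the original post: the function name `entropy` and the list-of-counts input are assumptions made for the example, and it assumes every class count is greater than zero.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node, given its class counts (all counts > 0)."""
    total = sum(counts)
    # Each class contributes -p * log2(p), where p is its share of the node.
    return -sum((c / total) * math.log2(c / total) for c in counts)

# Outlook (root) node of the tennis example: 9 Yes and 5 No.
root_entropy = entropy([9, 5])
print(round(root_entropy, 2))  # 0.94
```

A node with an even 3 Yes / 3 No split gives `entropy([3, 3]) == 1.0`, the maximum for two classes.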

Entropy for the Overcast node

H(s) = -4/4 log2(4/4) - 0/4 log2(0/4) = 0 (the term 0 log2(0) is taken as 0 by convention)

Here the entropy is zero, so this node is called a pure node. When a node is pure there is nothing left to split.
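One practical detail when coding the pure-node case: `math.log2(0)` raises a `ValueError` in Python, so the zero term has to be skipped explicitly. The sketch below shows one way to do that; the name `entropy_safe` is hypothetical and chosen only for this example.

```python
import math

def entropy_safe(counts):
    """Base-2 entropy that treats 0 * log2(0) as 0."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c == 0:
            continue  # skip empty classes; math.log2(0) would raise ValueError
        p = c / total
        result -= p * math.log2(p)
    return result

# Overcast node: 4 Yes and 0 No, so the node is pure.
overcast_entropy = entropy_safe([4, 0])
print(overcast_entropy)
```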

Gini impurity and information gain are used to decide which split to make at a node.

Information Gain (IG) = Entropy (Parent) - Weighted Average Entropy (Children)

Let's calculate the information gain for the tree given below.

H(s) for the parent node C1 = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

H(s) for child c1 = -6/8 log2(6/8) - 2/8 log2(2/8) = 0.81

H(s) for child c2 = -3/6 log2(3/6) - 3/6 log2(3/6) = 1

Gain = 0.94 - [8/14 * 0.81 + 6/14 * 1]

Gain = 0.049
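The same information gain calculation can be reproduced with a short Python sketch. The function names `entropy` and `information_gain` are assumptions made for this example, not taken from any library. Because the code does not round the intermediate entropies, it gives about 0.048, slightly below the 0.049 obtained from the rounded values 0.94 and 0.81.

```python
import math

def entropy(counts):
    """Base-2 entropy of a node from its class counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Parent: 9 Yes / 5 No. Children: 6 Yes / 2 No and 3 Yes / 3 No.
gain = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(gain, 3))  # 0.048
```

To choose a split, the same function would be evaluated for each candidate feature, and the feature with the highest gain would be picked.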

Between information gain (IG) and Gini impurity (GI), GI is faster to compute because it does not require a logarithm.
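For comparison, here is a minimal sketch of Gini impurity using its standard definition, 1 minus the sum of squared class probabilities. The post does not spell out this formula in the text, so the definition and the function name `gini` are supplied here as assumptions for illustration.

```python
def gini(counts):
    """Gini impurity of a node: 1 - sum(p_i ** 2) over the classes."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Root node (9 Yes / 5 No) and the pure Overcast node (4 Yes / 0 No).
root_gini = gini([9, 5])
pure_gini = gini([4, 0])
print(round(root_gini, 3), pure_gini)
```

Only multiplications and additions are involved, which is why it is cheaper to evaluate than entropy. For two classes, Gini impurity ranges from 0 (pure) to 0.5 (even split).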

The explanation in this blog is inspired by the YouTube video linked below.

You can learn about decision trees for regression tasks in the video.

Decision Trees Algorithm