Decision Trees

garv aggarwal

A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It works like a flowchart to make decisions based on input features by splitting data into subsets based on feature values.
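
As a quick, concrete illustration, here is a minimal sketch of fitting a decision tree with scikit-learn (the library and dataset are chosen only for illustration, not taken from this article); the fitted tree encodes exactly the flowchart of feature-based splits described above.

```python
# Minimal sketch: train a decision tree classifier with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each internal node splits on one feature value; each leaf holds a predicted class.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```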

Terminologies :

  • Root Node:
    The topmost node that represents the entire dataset. It is split into subsets based on a feature.

  • Decision Nodes:
    Internal nodes where decisions are made to split further.

  • Leaf Nodes (Terminal Nodes):
    Nodes that represent the final output or prediction.

  • Branches:
    The outcome of a decision, leading to another node.

Advantages :

  • Minimal data preparation is required.

  • Intuitive and easy to understand.

  • The cost of using the tree for inference is logarithmic in the number of data points used to train the tree.

Disadvantages :

  • Overfitting is a major problem when training decision trees.

  • Prone to errors for imbalanced datasets.

Entropy :

Entropy is a measure of disorder; you can also think of it as a measure of purity/impurity. The purer (more homogeneous) the data, the lower the entropy, so a good split is one that reduces the entropy, i.e. the randomness, in the data (a small numerical sketch follows the list below). For data with c classes it is defined as:

$$Entropy = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Where $p_i$ is simply the frequentist probability of an element/class $i$ in our data.

  • The greater the uncertainty, the higher the entropy.

  • For a 2-class problem, the minimum entropy is 0 and the maximum is 1. Entropy is 0 when all observations belong to one class, and 1 when the observations are split exactly 50-50 between the two classes.

  • For a problem with more than 2 classes, the minimum entropy is still 0, but the maximum can exceed 1.

  • Both log base 2 and log base e can be used to calculate entropy.
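
The bullet points above can be checked numerically. Below is a small sketch of a base-2 entropy function in plain NumPy (the function name and examples are illustrative, not from any particular library):

```python
import numpy as np

def entropy(labels):
    """Entropy of a label array, using log base 2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(["yes", "yes", "no", "no"]))    # 1.0  (50-50 split, max for 2 classes)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0  (all observations in one class)
print(entropy(["a", "b", "c", "d"]))          # 2.0  (with 4 classes, entropy exceeds 1)
```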

Entropy vs Probability :
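
For a two-class problem, entropy is 0 when the probability of one class is 0 or 1, and it peaks at 1 when the two classes are equally likely (p = 0.5). The sketch below (using matplotlib purely for illustration) traces that curve:

```python
import numpy as np
import matplotlib.pyplot as plt

# Entropy of a two-class distribution as a function of the probability of class 1.
p = np.linspace(0.001, 0.999, 200)
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

plt.plot(p, H)
plt.xlabel("Probability of class 1 (p)")
plt.ylabel("Entropy")
plt.title("Entropy vs probability for a two-class problem")
plt.show()
```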

Information Gain :

Information Gain (IG) is a key concept in Decision Tree algorithms: it is used to decide which attribute to split the data on at each step while building the tree. It measures how much "information" a feature gives us about the class, and it is based on entropy, which measures the impurity or uncertainty in a dataset. For a split of a dataset D into subsets Di by an attribute A:

$$IG(D, A) = Entropy(D) - \sum_{i} \frac{|D_i|}{|D|}\, Entropy(D_i)$$

Where:

  • D = parent dataset

  • Di = subset after split on an attribute

  • ∣Di∣/∣D∣ = weight of subset
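
A small sketch of computing information gain from the parent labels and the label groups produced by a split (the helper names are illustrative, and the base-2 entropy defined earlier is assumed):

```python
import numpy as np

def entropy(labels):
    """Base-2 entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    """IG = entropy(parent) - weighted sum of child entropies."""
    n = len(parent_labels)
    weighted_children = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted_children

# Splitting a mixed node into two pure children gives the maximum gain (1.0 here).
parent = np.array([0, 0, 1, 1])
print(information_gain(parent, [np.array([0, 0]), np.array([1, 1])]))  # 1.0
```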

Gini impurity :

Gini Impurity is another metric (besides entropy) used to decide the best attribute to split the data at each node in a Decision Tree, especially in algorithms like CART (Classification and Regression Trees). Gini Impurity measures the probability that a randomly chosen sample would be incorrectly classified if it was labeled according to the class distribution in the dataset.

$$Gini(D) = 1 - \sum_{i=1}^{c} p_i^{2}$$

Where:

  • D = dataset at a node

  • c = number of classes

  • Pi = proportion of samples belonging to class i
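
A minimal sketch of the same formula in code (an illustrative NumPy helper, not a library function):

```python
import numpy as np

def gini_impurity(labels):
    """Gini(D) = 1 - sum_i p_i^2 over the classes present at the node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 1, 1]))   # 0.5  (maximum impurity for 2 classes)
print(gini_impurity([0, 0, 0, 0]))   # 0.0  (pure node)
```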
