Bagging Classifier

Aditya Jaiswal

What is a Bagging Classifier?

Ensemble learning is a supervised machine learning technique that combines multiple models to build a more reliable and accurate model. Its main focus is to combine the strengths of several models into a single model that is robust and prevents overfitting.

Bagging Classifier

Bagging Classifier is an ensemble technique in which multiple base models are trained in parallel on random samples of the dataset. Bootstrap sampling is used to generate these subsets: data points are picked at random with replacement. For classification problems, bagging relies on a majority vote; for example, if 2 of 3 models make the same prediction, it is taken as the final prediction. For regression problems, the predictions of all base models are averaged, which is known as bagging regression.
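For the regression side, scikit-learn provides BaggingRegressor, which averages the base models' outputs. Here is a minimal sketch on synthetic data (the dataset and parameters are made up for illustration, and it assumes scikit-learn 1.2+, where the argument is named estimator):

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

# Each tree trains on its own bootstrap sample; predictions are averaged
reg = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)
reg.fit(X, y)
print(reg.predict(X[:3]))  # each value is the average of 50 tree predictions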

Figure 1.1: Bagging Classifier

The training sample (original dataset) contains multiple data points. The original dataset is randomly sampled with replacement multiple times, which means that in each bootstrap sample a data point can be selected several times or not at all. These bootstrap samples form multiple subsets of the original data.
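A bootstrap sample is easy to generate by hand. The NumPy sketch below (not from the original article) draws indices with replacement, so some points repeat and others are never picked:

import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # stand-in for 10 data points

# Bootstrap sample: same size as the data, drawn with replacement
idx = rng.choice(len(data), size=len(data), replace=True)
print(data[idx])                      # some points appear twice, some not at all
print(np.setdiff1d(data, data[idx]))  # the "out-of-bag" points never selected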

  1. Every bootstrap sample is fed to its own classifier. As the diagram shows, classifier 1 uses Random Forest, classifier 2 uses Decision Trees, classifier n uses Extra Trees, and so on. Each one makes different predictions according to the sample it was trained on.

  2. The predictions from all individual classifiers are combined into a final prediction: a majority vote in the case of classification and an average in the case of regression, as the sketch below shows.
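To make the voting step concrete, here is a hand-rolled sketch of the whole loop (a simplification, not how scikit-learn implements it internally): three decision trees, each trained on its own bootstrap sample, combined by majority vote:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary dataset, made up for illustration
X, y = make_classification(n_samples=300, random_state=42)
rng = np.random.default_rng(42)

preds = []
for _ in range(3):
    idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample
    tree = DecisionTreeClassifier().fit(X[idx], y[idx])
    preds.append(tree.predict(X))

# Majority vote across the 3 trees (labels are 0/1, so a mean threshold works)
final = (np.mean(preds, axis=0) >= 0.5).astype(int)
print(final[:10])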

Use Case

Problem

Credit card fraud detection is a challenging task because:

  1. Fraud cases are rare compared to normal transactions (highly imbalanced dataset).

  2. Fraudulent activity patterns change frequently.

  3. A wrong prediction (false negative) can result in huge financial loss.

How Bagging Classifier Helps

Bagging (Bootstrap Aggregating) builds multiple decision trees on different random subsets of the data and combines their predictions.
This reduces variance and prevents overfitting, which is especially helpful for noisy, imbalanced, and high-dimensional data.

Comparison between a single DecisionTree and a BaggingClassifier
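The snippets below assume X_train, X_test, y_train and y_test already exist. One way to produce them (the file name creditcard.csv and the label column "Class" are placeholders for whatever fraud dataset you use) is a stratified split, which keeps the rare fraud class represented in both halves:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("creditcard.csv")  # placeholder path
X = df.drop(columns=["Class"])      # "Class" assumed to be the 0/1 fraud label
y = df["Class"]

# stratify=y preserves the fraud/non-fraud ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)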

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report

# Baseline: a single decision tree
model1 = DecisionTreeClassifier(random_state=42)
model1.fit(X_train, y_train)
y_pred = model1.predict(X_test)
print(classification_report(y_test, y_pred))

# Bagging: 50 trees, each trained on its own bootstrap sample
model2 = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    random_state=42,
)
model2.fit(X_train, y_train)
y_pred = model2.predict(X_test)
print(classification_report(y_test, y_pred))
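One bagging-specific extra worth knowing, though not part of the comparison above: because every bootstrap sample leaves some rows out, BaggingClassifier can score each tree on the rows it never saw. Setting oob_score=True gives a validation estimate without a separate hold-out set:

model3 = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    oob_score=True,  # evaluate each tree on its out-of-bag rows
    random_state=42,
)
model3.fit(X_train, y_train)
print(model3.oob_score_)  # out-of-bag accuracy estimate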

Observation

  1. Recall and precision improved significantly with bagging. This matters because, in problems like fraud detection, catching as many fraudulent cases as possible is more important than reducing false positives.

  2. It avoids overfitting and shows balanced precision and recall.

  3. Accuracy alone can be misleading on imbalanced datasets like this one. Hence, improvements in recall and F1-score are more meaningful.

Conclusion

A single decision tree is easy to interpret but can overfit the data. A Bagging Classifier, which combines the predictions of multiple trees trained on different random subsets of the data, reduces variance, improves stability, and is better suited to detecting rare events like fraud.
