Boosting in Machine Learning: An AdaBoost Guide


Boosting is one of the most widely used classes of algorithms in machine learning, applied globally to tackle a variety of complex problems. It involves combining multiple weak learners—typically simple models that perform just slightly better than random guessing—to collectively form a strong predictive model. Each learner is trained sequentially, with a focus on correcting the mistakes made by its predecessors. Misclassified or hard-to-learn data points are given more importance in subsequent rounds. In this blog, we’ll explore one of the simplest and most well-known boosting algorithms, AdaBoost (Adaptive Boosting). We'll also implement it from scratch and demonstrate how it can significantly improve model accuracy.
Before diving into AdaBoost, let’s first understand the core principle of boosting through the following illustration.
In the illustration above, Boxes 1, 2, and 3 represent classifications made by individual models—D1, D2, and D3—each of which is a weak learner and performs poorly when used alone. However, when these models are combined, as shown in Box 4, they work together to make highly accurate predictions.
This is the core idea behind AdaBoost as well. In AdaBoost, each model’s prediction is weighted, and these weights are updated during training based on whether the model classifies each instance correctly or not. Incorrectly classified instances are given higher weights, making them more influential in the next iteration.
We’ll walk through each step of the AdaBoost algorithm and its implementation. For practical understanding, we’ll use the Breast Cancer dataset from the sklearn library. Let's start by loading and visualizing the dataset.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['label'] = data.target
copy = df.copy()  # keep an untouched copy for the final evaluation
df.head()
Since the dataset contains a large number of features, we’ll primarily focus on the label column along with the new columns that will be appended next to it during the boosting process.
To begin, let’s use a Decision Tree as our initial weak learner to make a first-round prediction and observe the accuracy achieved.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt = DecisionTreeClassifier(random_state=50, min_samples_split=100)
dt.fit(df.iloc[:, :-1], df.iloc[:, -1])
df['prediction'] = dt.predict(df.iloc[:, :-1])
df.iloc[:5, -2:]
print('Accuracy =', accuracy_score(df.label, df.prediction))
Accuracy = 0.945518453427065
We get an accuracy of around 95% (measured on the same data the tree was trained on).
Let’s now apply AdaBoost and see whether it improves the model's accuracy.
Step 1
Assign an initial weight of 1/m to each data point, where m is the total number of data points in the dataset.
df['weight'] = 1/len(df)
df.iloc[:5,-3:]
Step 2
Calculate the number of incorrectly classified data points.
no_of_errors = len(df[df.label != df.prediction])
no_of_errors
There are 31 incorrectly classified data points.
Next, let's calculate the total error, where the total error is given by:
total error = no_of_errors/total number of data points
total_errors = no_of_errors/len(df)
total_errors
The total error is calculated to be 0.0545.
Step 3
Calculating the amount of say (α), which is given by:
amount of say (α) = 0.5 × ln((1 − total error) / total error)
alpha = 0.5 * np.log((1-total_errors)/total_errors)
alpha
We get 1.423 as the value for the amount of say.
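As a quick sanity check of this formula: a learner with a total error of 0.5 (no better than random guessing) would get an amount of say of 0.5 × ln(1) = 0, whereas our total error of roughly 0.054 gives 0.5 × ln(0.946 / 0.054) ≈ 1.4, in line with the value above.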
Step 4
The weight update is performed using the following rules:
For a correct prediction:
updated weight = old weight × e^(−α)
For an incorrect prediction:
updated weight = old weight × e^(α)
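To make these rules concrete with rounded numbers: every point starts with a weight of 1/569 ≈ 0.00176, so with α ≈ 1.42 a misclassified point's weight rises to about 0.00176 × e^1.42 ≈ 0.0073, while a correctly classified point's weight drops to about 0.00176 × e^−1.42 ≈ 0.0004.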
# Misclassified points: scale the weight up by e^(alpha)
df['weight_updated'] = df.loc[df.label != df.prediction].weight * np.exp(alpha)
# Correctly classified points: scale the weight down by e^(-alpha)
df.weight_updated = df['weight_updated'].fillna(df[df.label == df.prediction].weight * np.exp(-alpha))
df.iloc[:5,-4:]
The updated weights are then normalized, ensuring that the sum of all the weights in the column equals 1.
df.weight_updated = df.weight_updated/df.weight_updated.sum()
df.iloc[:5,-4:]
Now, we can observe that in the updated weights column, the values for the incorrectly classified data points are significantly higher compared to those for the correctly classified ones.
Step 5
Next, a range is created for each updated weight, representing the cumulative sum of the values in the column. For example, the range for index 0 is '0 to 0.016129', the range for index 1 is '0.016129 to (0.016129 + 0.000929)', the range for index 2 is '(0.016129 + 0.000929) to (0.016129 + 0.000929 + 0.000929)', and so on.
In this manner, the range values for the last index will sum up to 1, as the weights have been normalized, ensuring their total is equal to 1.
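One simple way to build this 'ranges' column is with a cumulative sum of the normalized weights (a small sketch; the adaboost function further below constructs the same column with an explicit loop):
# the running total of the normalized weights gives the upper bound of each range
df['ranges'] = df.weight_updated.cumsum()
df.iloc[:5,-5:]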
Step 6
Resampling of data is performed by selecting a random number between 0 and 1. The range in which this number falls determines which index from the 'df' dataframe is included in the new resampled dataframe. Since the weights of the incorrectly predicted data points are higher, the corresponding ranges will also be larger. As a result, many of the randomly chosen numbers will fall within the ranges of the incorrectly predicted data points. Consequently, these data points will be repeated more frequently in the resampled dataframe, giving them more priority in subsequent iterations.
resampled = pd.DataFrame(columns=df.columns[:31])
for i in range(len(df)):
    # pick the row whose range is the smallest one still above the random number
    index = df[df.ranges == df[np.random.rand() < df.ranges].ranges.min()].index
    resampled.loc[i] = list(df.iloc[index, :31].values[0])
resampled.head()
The resampled data is then processed in the same way as in step 1. These six steps are repeated iteratively until the total error becomes zero or a chosen maximum number of iterations is reached. To automate this process, let's build a function that bundles all of the above steps so the iterations can be run in a loop.
def adaboost(df):
    # Train a weak learner on the 30 feature columns and record its predictions
    dt = DecisionTreeClassifier(random_state=50, min_samples_split=100)
    dt.fit(df.iloc[:, :30], df.iloc[:, 30])
    df['prediction'] = dt.predict(df.iloc[:, :30])
    # Step 1: equal initial weights
    df['weight'] = 1/len(df)
    # Step 2: total error
    no_of_errors = len(df[df.label != df.prediction])
    total_errors = no_of_errors/len(df)
    # Step 3: amount of say
    alpha = 0.5 * np.log((1-total_errors)/total_errors)
    # Step 4: update the weights and normalize them
    df['weight_updated'] = df.loc[df.label != df.prediction].weight * np.exp(alpha)
    df.weight_updated = df['weight_updated'].fillna(df[df.label == df.prediction].weight * np.exp(-alpha))
    df.weight_updated = df.weight_updated/df.weight_updated.sum()
    # Step 5: build the cumulative ranges
    p = 0
    for i in range(len(df)):
        df.loc[i, 'ranges'] = df.loc[i, 'weight_updated'] + p
        p = df.loc[i, 'ranges']
    # Step 6: resample according to the ranges
    resampled = pd.DataFrame(columns=df.columns[:31])
    for i in range(len(df)):
        index = df[df.ranges == df[np.random.rand() < df.ranges].ranges.min()].index
        resampled.loc[i] = list(df.iloc[index, :31].values[0])
    df = resampled
    return [df, dt]
The above function returns the resampled DataFrame and the trained model from each iteration. The loop below feeds the resampled DataFrame back into the function and collects the trained models across all iterations.
df = copy.copy()
models = []
try:
    for iter in range(20):
        ada = adaboost(df)
        df = ada[0]
        models.append(ada[1])
        print('Decision stump {0}'.format(iter+1))
except Exception:
    # stop early once an iteration fails, e.g. after the total error reaches zero
    pass
Although the loop was set up for 20 iterations, it stopped early via the except clause, leaving us with 10 decision stumps (weak learners), which will now be used collectively to make predictions. These models will be applied to the same dataset on which we initially observed an accuracy of 95%. Let's aggregate the outputs from all these models to evaluate the performance of the boosted ensemble.
pred = np.zeros(len(df))
for i in range(len(models)):
    pred += models[i].predict(copy.iloc[:, :-1])
pred
These values represent the aggregated predictions from all the models. Since each model outputs either a 0 or 1, a value like 2 in the array indicates that 2 models classified the instance as class 1, while a value of 6 means 6 out of the 10 models predicted class 1 for that particular instance.
Based on the number of models used, a threshold is set at half that number. If the aggregate output for any data point exceeds this threshold, it is classified as 1; otherwise, it is classified as 0. In our case, since we used 10 models, any output value greater than 5 is considered class 1, and the rest are classified as class 0.
threshold = len(models)/2
vec = np.vectorize(lambda x: 1 if x>threshold else 0)
final_prediction = vec(pred)
final_prediction
Now, using the output above, we calculate the accuracy.
copy['final_prediction'] = final_prediction
print('Accuracy =',accuracy_score(copy.label, copy.final_prediction))
Accuracy = 0.9753954305799648
This time we obtained an accuracy of approximately 97.5%.
Thus, we can see that we have successfully improved the accuracy using Boosting.
Point to note: This entire process is purely for demonstration and conceptual understanding. It is not recommended to use this manual implementation for solving real-world problems. For practical purposes, you should use the automated and optimized AdaBoostClassifier provided by the sklearn library.
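For reference, here is a minimal sketch of what that could look like on the same dataset (the hyperparameters, such as n_estimators=10 chosen to mirror the ten weak learners above, are illustrative rather than tuned):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
# 10 boosted weak learners, mirroring the manual run above
ada_clf = AdaBoostClassifier(n_estimators=10, random_state=50)
ada_clf.fit(data.data, data.target)
print('Accuracy =', accuracy_score(data.target, ada_clf.predict(data.data)))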
I hope I was able to explain the algorithm clearly enough for you to understand and experiment with.
Your valuable feedback is always appreciated!