Bank Marketing Classifier Comparison – A Machine Learning Project

KUSHAL CHHABRA

Introduction

This project explores the performance of various classifiers—K-Nearest Neighbors (KNN), Logistic Regression, Decision Trees, and Support Vector Machines (SVM)—to predict whether a client will subscribe to a term deposit using the Bank Marketing Dataset.

You can check out the project at: https://github.com/kushchhabra0/BankMarketingClassifer


Dataset

We use the Bank Marketing Dataset from the UCI Machine Learning Repository.

This dataset contains data related to direct marketing campaigns of a Portuguese banking institution. The goal is to predict whether a client will subscribe to a term deposit.


Usage

  1. Load the dataset.

  2. Open and run Bank_Marketing_Classifier_Comparison.ipynb.

  3. Work through the notebook step by step:

    • Preprocessing

    • Model building

    • Evaluation

    • Visualization


Data Understanding and Preprocessing

Feature Engineering:

  • Handle missing values

  • Encode categorical variables

  • Scale numerical features

Train/Test Split:

Split dataset into training and testing subsets.
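The split can be done with `train_test_split`; this sketch uses synthetic stand-in features (the real split works identically on the preprocessed bank features), and stratifies because subscriptions are the minority class:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features and binary target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hold out 20% for testing; stratify=y preserves the class ratio
# in both subsets, which matters on an imbalanced target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)
```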

Baseline Model:

Use the most frequent class to set a baseline accuracy.
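A majority-class baseline is exactly what scikit-learn's `DummyClassifier` provides. This sketch uses a synthetic target with roughly the dataset's ~89% "no" majority to show the idea:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Imbalanced synthetic target mimicking the majority "no" class.
X, y = make_classification(n_samples=1000, weights=[0.89], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Always predicting the majority class sets the accuracy floor
# that every real model has to beat.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print(round(baseline.score(X_test, y_test), 4))
```

On the real dataset this floor comes out near 0.8876, which is why a raw accuracy score only becomes meaningful once compared against it.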

Logistic Regression:

Implement logistic regression and evaluate its performance.
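A minimal version of that step, again on synthetic stand-in data, reports both train and test accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# max_iter raised so the solver reliably converges.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"train={clf.score(X_train, y_train):.4f} "
      f"test={clf.score(X_test, y_test):.4f}")
```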

Model Comparisons:

Compare Logistic Regression, KNN, Decision Tree, and SVM using:

  • Accuracy

  • Training time
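A comparison loop along these lines records both metrics for every model. This is a sketch on synthetic data; the notebook runs the same kind of loop on the bank features, which is where the 5–7 minute SVM fit time comes from:

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()              # wall-clock training time
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    results[name] = (elapsed,
                     model.score(X_train, y_train),   # train accuracy
                     model.score(X_test, y_test))     # test accuracy
    print(f"{name:20s} {elapsed:8.4f}s "
          f"train={results[name][1]:.4f} test={results[name][2]:.4f}")
```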


Visualizations

To understand relationships in the dataset, we used:

  • Age distribution plot

  • Box plot of Age by Subscription Status

  • Categorical plots:

    • Job distribution

    • Marital status

    • Education levels

    • Subscription rates by category


Model Comparison

We compare the performance of:

  • K Nearest Neighbors (KNN)

  • Logistic Regression

  • Decision Tree

  • Support Vector Machine (SVM)

    ⚠️ Note: It takes approximately 5–7 minutes to run the model comparison part.

Model                 Train Time (s)   Train Accuracy   Test Accuracy
KNN                          0.0716           0.9143          0.8881
Logistic Regression          0.1950           0.9013          0.8971
Decision Tree                0.2355           0.9954          0.8361
SVM                         34.3415           0.9048          0.8969

Outcomes

Baseline Accuracy: 0.8876

Logistic Regression

  • Train Accuracy: 0.9013

  • Test Accuracy: 0.8971

Best Results

  • Best model overall: Logistic Regression

  • Best tuned Decision Tree achieved:

    • Max Depth: 5

    • Min Samples Split: 2

    • CV Score: 0.9012

    • Test Accuracy: 0.8967

Improving the Model

We used hyperparameter tuning to improve model performance:

  • Best parameters found:
    {'classifier__max_depth': 5, 'classifier__min_samples_split': 2}

  • Best cross-validation score:
    0.9012

  • Best Decision Tree Test Accuracy after tuning:
    0.8967
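The `classifier__` prefix in those parameter names comes from tuning a Pipeline step named "classifier" with GridSearchCV. A minimal sketch of that setup on synthetic stand-in data (the notebook's exact parameter grid may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The step name "classifier" is why the tuned-parameter keys look
# like 'classifier__max_depth' and 'classifier__min_samples_split'.
pipe = Pipeline([("classifier", DecisionTreeClassifier(random_state=42))])

grid = GridSearchCV(pipe, {
    "classifier__max_depth": [3, 5, 10, None],
    "classifier__min_samples_split": [2, 5, 10],
}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)                       # best parameter combination
print(round(grid.best_score_, 4))              # best cross-validation score
print(round(grid.score(X_test, y_test), 4))    # test accuracy of best model
```

Limiting `max_depth` is what reins in the unconstrained tree's overfitting (0.9954 train vs. 0.8361 test in the comparison table).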

Conclusion

In this project, we evaluated several machine learning models to solve a binary classification problem using the Bank Marketing Dataset. After preprocessing, visual exploration, and modeling:

  • Logistic Regression delivered the best balance between accuracy and training time.

  • SVM performed similarly in terms of test accuracy but took significantly longer to train.

  • Decision Trees, while achieving high training accuracy, suffered from overfitting.

Through hyperparameter tuning, the Decision Tree's test accuracy improved from 0.8361 to 0.8967, approaching Logistic Regression's performance.

→ This shows the importance of model tuning and choosing the right algorithm based on dataset characteristics and computational resources.

Project Members

Kavish Jain (23UCS614)

Kshitij Sharma (23UCS628)

Kushal Chhabra (23UCS630)
