🌫️Beyond Delhi: Unmasking India’s Hidden Pollution Capitals

Arpita Garg

Introduction

India’s urban air quality has become a pressing issue over the past decade. Delhi, as the national capital, often dominates headlines as the country’s pollution capital, but the broader environmental crisis is far from limited to one city. Data-driven analysis uncovers findings that the headlines miss: for instance, Patna occasionally surpasses Delhi in PM2.5 levels, challenging the usual narrative.

Using Python and a public dataset on urban air pollution, I conducted a comprehensive analysis of pollutant levels across major Indian cities. This case study walks you through my process, visualisations, and key insights.

Let’s explore what the numbers reveal about how and where we breathe.

Why this Analysis?

Air pollution - a major component of environmental degradation - is a widespread issue. According to the WHO, poor air quality contributes to over 1.67 million deaths per year in India.

Most conversations are centred around Delhi, but many other cities suffer similarly—or worse. Through this project, I wanted to:

  • Compare pollutant levels city-wise and year-wise

  • Spot seasonal or long-term patterns

  • Highlight any overlooked pollution hotspots

Dataset Overview

Source: 🔗Air Quality Data in India (2015-20) | Kaggle

  • File: city_day.csv

  • Columns used: Date, City, PM2.5, PM10, NO, NO2

  • Cities Selected: Delhi, Mumbai, Kolkata, Chennai

Data ranges from 2015 to 2020.

This allowed me to compute yearly and monthly pollutant averages for meaningful analysis.

Project Objective

Goal: Analyse air quality data from major Indian cities to uncover trends, seasonal patterns, and unexpected insights.

Step 1: Data Cleaning and Preprocessing

Like any real-world dataset, this one needed cleaning. Here’s what I did:

  • Date Conversion: Converted the Date column to a proper datetime format.

  • Missing Values: Some pollutant readings were missing. I dropped rows with excessive NaNs to maintain data quality.

  • Grouping Data: For trend analysis, I grouped data to calculate monthly averages, yearly averages and city-wise pollutant means.
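
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's exact code: the column names come from the dataset overview, while the helper `clean_city_day`, the `max_missing=2` cutoff for "excessive NaNs", and the tiny in-memory sample with placeholder cities are all assumptions made for the example.

```python
import io
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2"]

def clean_city_day(df, max_missing=2):
    """Parse dates and drop rows missing more than `max_missing` pollutant readings.

    `max_missing` is an assumed cutoff; the original threshold is not stated.
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], errors="coerce")
    out = out.dropna(subset=["Date"])
    # thresh = minimum number of non-NaN pollutant values a row must keep
    out = out.dropna(subset=POLLUTANTS, thresh=len(POLLUTANTS) - max_missing)
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    return out.reset_index(drop=True)

# Tiny synthetic sample (not real measurements, placeholder city names)
sample_csv = io.StringIO(
    "City,Date,PM2.5,PM10,NO,NO2\n"
    "City A,2019-01-01,300,400,50,60\n"
    "City A,2019-01-02,,,,\n"
    "City B,2019-01-01,80,120,,30\n"
)
cleaned = clean_city_day(pd.read_csv(sample_csv))

# Yearly averages per city; swap "Year" for "Month" to get monthly averages
yearly = cleaned.groupby(["City", "Year"])[POLLUTANTS].mean()
print(yearly)
```

With the real file, `pd.read_csv("city_day.csv")` would replace the in-memory sample.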

Step 2: Exploratory Data Analysis

📊City-Wise Averages

I compared the overall average PM2.5 levels by city across the dataset. This helps identify consistently high-pollution areas.

✅This chart highlights that:

  • The worst air quality isn't exclusive to Delhi.
    The higher averages for cities like Patna and Lucknow suggest they are often more polluted than the national capital, yet they receive less attention in public discourse and policy action.

  • Geography plays a role.
    The cities with the highest PM2.5 are landlocked and densely populated, while cities near the coast benefit from better natural ventilation.

  • Policy blind spots may exist.
    A disproportionate focus on Delhi might result in underreported pollution crises in other urban areas.

Patna often records high PM2.5 levels due to its location in the Indo-Gangetic plain, where stagnant winter air traps pollutants. Combined with vehicular emissions, construction dust, and limited pollution controls, this can push Patna’s PM2.5 levels above Delhi’s in certain years.
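
A city-wise comparison like the one in this section boils down to a group-by and a sort. The sketch below assumes a DataFrame with `City` and `PM2.5` columns; the function name `city_pm25_ranking` and the placeholder cities and values are hypothetical and chosen only to show the mechanics.

```python
import pandas as pd

def city_pm25_ranking(df):
    """Mean PM2.5 per city, highest first (mean() skips NaN readings)."""
    return df.groupby("City")["PM2.5"].mean().sort_values(ascending=False)

# Synthetic values for illustration only
demo = pd.DataFrame({
    "City": ["City A", "City A", "City B", "City B", "City C", "City C"],
    "PM2.5": [110.0, 130.0, 120.0, 130.0, 45.0, 55.0],
})
ranking = city_pm25_ranking(demo)
print(ranking)
```

Calling `ranking.plot(kind="bar")` on the result would give a bar chart of the averages, assuming matplotlib is installed.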

📈Seasonal Patterns

Air pollution in India exhibits dramatic seasonal shifts. Analysing monthly averages revealed how:

  • Pollution peaks between October and January due to:

    • Stubble burning in northern India

    • Temperature inversions trapping pollutants

    • Festive activities like firecrackers and increased traffic

  • Air quality improves during monsoon months (June to September) thanks to rainfall.
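
One way to surface this seasonal pattern is to pool all years and average by calendar month. The sketch below assumes `Date` and `PM2.5` columns; `monthly_profile` is a hypothetical helper and the readings are synthetic.

```python
import pandas as pd

def monthly_profile(df, pollutant="PM2.5"):
    """Average of `pollutant` per calendar month (1-12), pooled over all years."""
    months = pd.to_datetime(df["Date"]).dt.month
    return df.groupby(months)[pollutant].mean().rename_axis("Month")

# Synthetic readings for illustration only
demo = pd.DataFrame({
    "Date": ["2018-01-10", "2019-01-12", "2018-07-10", "2019-07-15", "2018-11-05"],
    "PM2.5": [200.0, 220.0, 40.0, 50.0, 260.0],
})
profile = monthly_profile(demo)
peak_month = int(profile.idxmax())  # month number with the highest average
print(profile)
print("Peak month:", peak_month)
```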

🦠Pre vs Post Covid Comparison in Delhi

To analyse the impact of the COVID-19 lockdown, I split the data into two periods:

  • Pre-COVID: 1st Oct 2019 - 24th Mar 2020

  • COVID/Post-COVID: 25th Mar 2020 - 30th Jun 2020

I calculated the average PM2.5 level for each period and used a t-test to check whether the two means differ significantly.

This shows:

  • Delhi saw a drop in average PM2.5 during the lockdown period compared to the preceding months, likely driven by:

    • Lower vehicle emissions

    • Industrial slowdowns

    • Fewer construction activities
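
The period comparison can be sketched with SciPy as below. The date windows match the ones listed above, but the choice of Welch's t-test (`equal_var=False`) is an assumption, since the post only says "a t-test"; `compare_periods` and the synthetic readings are likewise hypothetical.

```python
import pandas as pd
from scipy import stats

def compare_periods(df, city, pollutant="PM2.5"):
    """Welch's t-test on daily readings before vs. during the lockdown window."""
    d = df[df["City"] == city].copy()
    d["Date"] = pd.to_datetime(d["Date"])
    pre_mask = d["Date"].between(pd.Timestamp("2019-10-01"), pd.Timestamp("2020-03-24"))
    post_mask = d["Date"].between(pd.Timestamp("2020-03-25"), pd.Timestamp("2020-06-30"))
    pre = d.loc[pre_mask, pollutant].dropna()
    post = d.loc[post_mask, pollutant].dropna()
    # equal_var=False selects Welch's test (an assumption, see lead-in)
    t_stat, p_value = stats.ttest_ind(pre, post, equal_var=False)
    return pre.mean(), post.mean(), t_stat, p_value

# Synthetic readings for illustration only
demo = pd.DataFrame({
    "City": ["City A"] * 8,
    "Date": ["2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01",
             "2020-04-01", "2020-04-15", "2020-05-01", "2020-06-01"],
    "PM2.5": [180.0, 200.0, 190.0, 170.0, 60.0, 55.0, 65.0, 70.0],
})
pre_mean, post_mean, t_stat, p_value = compare_periods(demo, "City A")
print(pre_mean, post_mean, t_stat, p_value)
```

A caveat on interpretation: the first window covers winter and the second covers spring and early summer, so part of any difference may be seasonal rather than lockdown-related.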

🌤️Forecasting Future PM2.5 Levels Using ARIMA

To predict how PM2.5 might trend if no additional interventions are made, I implemented an ARIMA time-series model using Delhi’s PM2.5 data as an example.

  • Forecast drops from ~54 to ~50.5 µg/m³

  • Predicts stable pollution for the next month

  • Reflects possible lockdown effect or seasonal cleansing

  • PM2.5 still above safe limits, even after decline

🖇️Correlation Between Pollutants

I also explored how different pollutants relate to one another using a correlation matrix.

Insight:

  • PM2.5 and PM10 show strong correlation, as they often originate from similar sources like vehicles, construction, and industrial emissions.

  • NO₂ and PM2.5 have moderate correlation, pointing to significant vehicular pollution contributions.
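
A correlation matrix like the one described here can be computed directly with pandas. The sketch assumes the four pollutant columns from the dataset overview; `pollutant_correlations` is a hypothetical helper and the numbers are synthetic.

```python
import pandas as pd

POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2"]

def pollutant_correlations(df):
    """Pearson correlation matrix of the pollutant columns.

    pandas computes each pair on the rows where both values are present.
    """
    return df[POLLUTANTS].corr(method="pearson")

# Synthetic values for illustration only
demo = pd.DataFrame({
    "PM2.5": [50.0, 80.0, 120.0, 200.0, 90.0],
    "PM10": [100.0, 160.0, 240.0, 400.0, 180.0],  # exactly 2 x PM2.5 in this toy data
    "NO": [10.0, 30.0, 20.0, 25.0, 15.0],
    "NO2": [20.0, 35.0, 30.0, 60.0, 25.0],
})
corr = pollutant_correlations(demo)
print(corr.round(2))
```

Passing `corr` to a heatmap function (for example seaborn's `heatmap`, if installed) gives the usual visual form of the matrix.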

Conclusion

India’s air pollution crisis goes far beyond Delhi, which shows that the problem is widespread and complex. While temporary improvements appeared during the COVID-19 lockdowns, pollution levels quickly rebounded, highlighting the need for long-term, data-driven solutions to protect public health across all urban areas.

Full notebook and code

🔗Air Quality Data Analysis on GitHub
