Imagine if every traffic camera in your city could double as an air quality sensor. That's what we set out to build using computer vision and machine learning, reaching an R² of 0.972 (97.2% of variance explained) when predicting PM10 pollution levels from traffic patterns.

Air pollution is the second-leading risk factor for mortality globally, causing over 8 million deaths annually [1]. The World Health Organization reports that 99% of the global population breathes air exceeding recommended pollution limits [2], making effective monitoring crucial for public health.

However, traditional air quality monitoring is both costly and limited in coverage. While cities struggle with expensive sensor deployments, they already operate extensive traffic camera networks for security and traffic management. Since transportation is a major pollution source, why not utilize this existing infrastructure for environmental monitoring?

This is the story of how we transformed ordinary traffic cameras into a comprehensive city-wide pollution prediction network.

The Problem: Cities Are Blind to Air Pollution

Air pollution remains a critical urban challenge, yet most cities have massive blind spots in their monitoring. Here's why traditional approaches fall short:

High costs and limited coverage: Traditional monitoring stations are expensive to deploy and maintain
Reactive approach: You only know pollution levels after they've already spiked
Maintenance complexity: Specialized equipment requires frequent servicing

Meanwhile, transportation is a major contributor to urban air pollution, and cities already have extensive camera networks watching every major intersection. The data is right there—we just needed to teach machines how to read it.

Our Approach: Traffic Patterns Predict Pollution

Our hypothesis was simple: if we can accurately count and classify vehicles in real-time, we can predict pollution levels with remarkable precision. Different vehicle types contribute differently to emissions:

Heavy trucks: High per-vehicle emissions, but operate off-peak
Taxis: Moderate emissions but constant urban operation
Private cars: Lower individual impact but highest volume
Buses: High emissions during rush hours

The key insight was that both the timing and the mix of vehicle types on the road are crucial for accurate pollution prediction.

Technical Architecture Overview

Here's the high-level pipeline we built:

# Simplified pollution prediction pipeline (stage functions are placeholders)
def pollution_prediction_pipeline(model):
    video_stream = capture_traffic_video()
    vehicles = detect_and_classify(video_stream)  # YOLO11s
    traffic_counts = track_vehicles(vehicles)     # SORT tracking
    features = engineer_features(traffic_counts)  # Cyclical encoding + lags
    prediction = model.predict(features)          # ExtraTrees ensemble
    return prediction

Building the Detection System

Step 1: Smart Vehicle Recognition

We used YOLO11s, a recent small model in the "You Only Look Once" family of object detectors [3], to identify and classify vehicles into six categories in real-time:

Auto: Private passenger vehicles
Taxi: Licensed taxis and ride-sharing
Commercial: Delivery vans and light commercial vehicles
Medium Truck: Urban freight vehicles
Heavy Truck: Large freight vehicles (≥15 tons)
Bus: Public transit and school buses

The model accurately identifies and classifies vehicles in real-time across different weather and lighting conditions.

Figure 1: Real-time vehicle classification in dense urban traffic

Step 2: Solving the Double-Counting Problem

Counting vehicles accurately isn't trivial: the same truck appears in many consecutive frames as it moves through the scene, and it must be counted only once. We implemented SORT (Simple Online and Realtime Tracking), which assigns a unique ID to each vehicle and tracks it across video frames.

This was crucial for getting accurate hourly traffic counts by vehicle type, which became the foundation of our pollution predictions.
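
As a rough illustration of how tracker IDs prevent double counting, the sketch below turns per-frame tracker output into hourly counts by vehicle type. The `hourly_counts` function, the tuple layout, and the sample observations are hypothetical and not taken from the project's code; it only assumes the tracker reports a stable ID for each vehicle.

```python
from collections import defaultdict
from datetime import datetime


def hourly_counts(observations):
    """Count each tracked vehicle once per hour, grouped by vehicle class.

    observations: iterable of (timestamp, track_id, vehicle_class) tuples,
    one per frame in which the tracker reported the vehicle.
    """
    seen = set()  # (hour, track_id) pairs that were already counted
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, track_id, vehicle_class in observations:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        key = (hour, track_id)
        if key in seen:
            continue  # same vehicle seen again in a later frame
        seen.add(key)
        counts[hour][vehicle_class] += 1
    return counts


# Hypothetical tracker output: track 7 (a heavy truck) appears in three frames.
observations = [
    (datetime(2024, 1, 1, 8, 0, 1), 7, "Heavy Truck"),
    (datetime(2024, 1, 1, 8, 0, 2), 7, "Heavy Truck"),
    (datetime(2024, 1, 1, 8, 0, 3), 7, "Heavy Truck"),
    (datetime(2024, 1, 1, 8, 0, 3), 8, "Taxi"),
]
counts = hourly_counts(observations)
print(dict(counts[datetime(2024, 1, 1, 8)]))  # {'Heavy Truck': 1, 'Taxi': 1}
```

In this simplified version, a vehicle that stays in view across an hour boundary is counted in both hours, and a track keeps the class from the first frame in which it was seen.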

Step 3: The Magic of Feature Engineering

Now that we have the detection system in place, let's look at how we process this data for pollution prediction.

The key improvements came from how we handled time and traffic data:

🔄 Cyclical Time Encoding: We treated time as circular rather than linear. This means 11 PM and 1 AM are recognized as close times (only 2 hours apart), not 22 hours apart as traditional linear encoding would suggest.

# Cyclical encoding for temporal features
import numpy as np

def cyclical_encode(value, max_value):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * value / max_value
    return np.sin(angle), np.cos(angle)

# Example: encoding hour of day (11 PM)
hour = 23
hour_sin, hour_cos = cyclical_encode(hour, 24)
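
To check the claim that the encoding keeps 11 PM and 1 AM close together, the following sketch measures the straight-line distance between encoded hours. The helper names `encode_hour` and `encoded_distance` are illustrative only and use the standard library instead of NumPy.

```python
import math


def encode_hour(hour):
    """Place an hour of the day on the unit circle as (sin, cos)."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def encoded_distance(hour_a, hour_b):
    """Euclidean distance between two hours in the encoded space."""
    return math.dist(encode_hour(hour_a), encode_hour(hour_b))


late_vs_early = encoded_distance(23, 1)    # 11 PM vs 1 AM, 2 hours apart
late_vs_midday = encoded_distance(23, 11)  # 11 PM vs 11 AM, 12 hours apart
print(f"{late_vs_early:.3f}")   # 0.518
print(f"{late_vs_midday:.3f}")  # 2.000
```

Hours on opposite sides of the clock land at the maximum distance of 2, while hours that straddle midnight stay close, which a plain 0 to 23 integer feature cannot express.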

⏱️ Lag Variables: We included the previous hour's data because pollution doesn't disappear instantly; it takes time to disperse in the atmosphere.
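
A minimal sketch of how one-hour lag features could be built from an hourly table is shown here. The `add_lag_features` helper, the column names, and all numbers are made up for illustration; the project's actual feature code is not shown in this article.

```python
def add_lag_features(rows, columns, lag=1):
    """Return copies of rows with `<column>_lag<lag>` taken from `lag` rows earlier.

    rows must be sorted by time with no missing hours. The first `lag` rows
    are dropped because they have no history to copy from.
    """
    lagged = []
    for i in range(lag, len(rows)):
        row = dict(rows[i])
        for column in columns:
            row[f"{column}_lag{lag}"] = rows[i - lag][column]
        lagged.append(row)
    return lagged


# Illustrative values only, not measurements from the study.
hourly = [
    {"hour": 7, "taxi": 120, "pm10": 31.0},
    {"hour": 8, "taxi": 180, "pm10": 38.5},
    {"hour": 9, "taxi": 150, "pm10": 42.0},
]
features = add_lag_features(hourly, ["taxi", "pm10"])
print(features[0])
# {'hour': 8, 'taxi': 180, 'pm10': 38.5, 'taxi_lag1': 120, 'pm10_lag1': 31.0}
```

Note that a lagged pollution column means the model needs a recent measurement (or its own previous prediction) at inference time.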

🚗 Traffic Sequence Patterns: We found that the order of traffic matters. A rush of heavy trucks followed by congestion creates different pollution than steady mixed traffic.

The Machine Learning Pipeline

For the prediction engine, we chose ExtraTrees (Extremely Randomized Trees) [4] with multi-output regression to simultaneously predict both PM2.5 and PM10 concentrations.

Why ExtraTrees over other algorithms?

✅ Handles non-linear relationships between traffic and pollution
✅ Robust to outliers (important for real-world noisy data)
✅ Fast predictions (essential for real-time systems)
✅ Provides feature importance (helps understand what drives pollution)

The model considers:

Current traffic counts for all 6 vehicle types
Time patterns (hour, day, month) encoded cyclically
Previous hour's traffic and pollution data
Weather conditions and seasonal patterns
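
The sketch below shows how a multi-output ExtraTrees model of this kind might be set up with scikit-learn, whose `ExtraTreesRegressor` accepts a two-column target directly. Everything about the data is an assumption: the counts and pollution values are synthetic stand-ins generated on the spot, the hyperparameters are arbitrary, and lag and weather features are left out to keep it short.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_hours = 500

# Synthetic stand-in data, NOT the NYC dataset: six vehicle counts per hour.
counts = rng.poisson(lam=[300, 80, 60, 20, 10, 15], size=(n_hours, 6))
hour = np.arange(n_hours) % 24
X = np.column_stack([
    counts,
    np.sin(2 * np.pi * hour / 24),  # cyclical hour encoding
    np.cos(2 * np.pi * hour / 24),
])

# Made-up targets so the example runs; columns are [PM2.5, PM10].
pm25 = 0.01 * counts[:, 0] + 0.02 * counts[:, 1] + rng.normal(0, 0.5, n_hours)
pm10 = 0.02 * counts[:, 0] + 0.05 * counts[:, 1] + rng.normal(0, 0.5, n_hours)
y = np.column_stack([pm25, pm10])

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])           # chronological split: train on the past
predictions = model.predict(X[400:])  # one row per hour, columns [PM2.5, PM10]
importances = model.feature_importances_  # one value per input feature

print(predictions.shape)  # (100, 2)
```

Splitting by time rather than at random matters once lag features are included, since a random split would let the model see hours adjacent to the ones it is tested on.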

Data and Validation

We tested our approach using traffic and air quality data from New York City spanning 2016-2024. The dataset included:

Traffic data: Hourly vehicle counts from NYC Open Data platform [5]
Air quality data: PM2.5 and PM10 measurements from OpenAQ monitoring stations [6]
Total observations: 52,500 paired data points

Testing on this dataset, we achieved the following results:

Figure 3: PM10 predictions: R² = 0.972 (97.2% of variance explained)

Figure 4: PM2.5 predictions: R² = 0.776 (77.6% of variance explained)

Key Performance Metrics:

PM10: 97.2% variance explained
PM2.5: 77.6% variance explained

The higher R² for PM10 is plausible: larger particles have a more direct relationship with traffic emissions, while PM2.5 also depends on complex atmospheric chemistry.
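
For readers unfamiliar with the metric, R² compares the model's squared errors with those of always predicting the mean, so it measures explained variance and not a classification-style accuracy. The small function below is a generic illustration with invented numbers, not the evaluation code used for the figures.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(round(score, 2))  # 0.98
```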

Unexpected Discoveries

The feature analysis revealed some surprising insights:

🚕 Taxis are pollution kings: Despite being smaller than trucks, taxis and commercial vehicles were the traffic features with the strongest influence on predicted pollution levels. They're constantly moving during business hours and idling in traffic.

🚛 Heavy trucks aren't the main culprit: While trucks have high per-vehicle emissions, they operate mostly during off-peak hours when there's less congestion and better air circulation.

⏰ Time patterns are crucial: Hour of day and day of week were among the most important features in the model.

Impact Beyond Pollution Monitoring

This technology enables several important applications:

🚦 Smart Traffic Management: Real-time pollution predictions can trigger dynamic traffic light optimization during high-pollution periods.

📱 Public Health Alerts: Automatic warnings when pollution exceeds safe levels, especially for vulnerable populations.

🏙️ Urban Planning: Data-driven insights for city planners making transportation and zoning decisions.

⚖️ Environmental Justice: Comprehensive monitoring ensures all neighborhoods get equal environmental protection.

Conclusion

This research demonstrates how AI can transform existing urban infrastructure into powerful environmental monitoring networks. By combining computer vision with machine learning, cities can achieve comprehensive pollution monitoring while leveraging what they already have instead of spending millions on new sensors.

In principle, the methodology can be applied anywhere with traffic cameras, from small towns to megacities, although so far we have validated it only on New York City data. This democratizes access to advanced environmental monitoring, especially benefiting communities that couldn't afford comprehensive air quality systems.

As urban populations continue growing, innovative approaches like this become essential for maintaining livable cities. The future of urban sustainability lies in intelligently leveraging existing assets with advanced analytics.

References

[1] Health Effects Institute. (2024) State of Global Air 2024. Available at: https://www.stateofglobalair.org/sites/default/files/documents/2024-06/soga-2024-report_0.pdf

[2] WHO. (2022) Billions of people still breathe unhealthy air: new WHO data. Available at: https://www.who.int/news/item/04-04-2022-billions-of-people-still-breathe-unhealthy-air-new-who-data

[3] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Available at: https://doi.org/10.1109/CVPR.2016.91

[4] Geurts, P., Ernst, D., & Wehenkel, L. (2006) Extremely randomized trees. Machine Learning, 63(1), 3-42. Available at: https://doi.org/10.1007/s10994-006-6226-1

[5] NYC Open Data. (2024) Available at: https://opendata.cityofnewyork.us/data/

[6] OpenAQ. (2024) Available at: https://openaq.org/

How AI Transforms Traffic Cameras into Air Quality Sensors
