One Cloud, Two Clouds: How I Built a Pollution Dashboard (and Almost Broke Everything)

Soham Wagh

aka how I tried 14 different things, broke everything, fixed everything, and somehow made it look clean by the time of the expo

Let’s start with a very real moment...

It was past midnight, I had QuickSight graphs that looked like straight lines, Athena was yelling at me about “HIVE_BAD_DATA,” and somewhere deep inside, I was still thinking:
“Wait—this is kinda fun”

This blog isn’t just about what I built. It’s about how I built it and, more importantly, how many times I failed while trying to make it work.

The Idea

We wanted to build something real — something we could imagine running in an actual smart city. So the project idea was:

Can we simulate pollution sensor data, process it entirely through the cloud, and visualize meaningful trends that could help monitor or even reduce pollution?

Simple idea. Complicated journey

Phase 1: Simulate It Till You Make It (Azure IoT + Blob Storage)

We started on Azure. Why? Because real sensors are expensive and I like simulating HUGE amounts of data.

  • Created a container in Azure Blob Storage

  • Then, built a simulated IoT setup using Azure IoT Hub

  • Pushed random but somewhat realistic values: AQI, PM2.5, Temp, Humidity
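For the curious, here is roughly what generating those readings could look like in Python. This is a minimal sketch, not the exact script I ran: the field names, the value ranges, the start date, and the `simulate_reading` / `simulate_stream` helpers are all assumptions for illustration, and the actual send to IoT Hub is left out.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical value ranges, picked only to look plausible.
RANGES = {
    "aqi": (40, 300),
    "pm25": (10.0, 250.0),
    "temp": (18.0, 38.0),
    "humidity": (20.0, 90.0),
}

def simulate_reading(ts, rng):
    """Return one fake sensor reading as a dict."""
    return {
        "timestamp": ts.isoformat(),
        "aqi": rng.randint(*RANGES["aqi"]),
        "pm25": round(rng.uniform(*RANGES["pm25"]), 1),
        "temp": round(rng.uniform(*RANGES["temp"]), 1),
        "humidity": round(rng.uniform(*RANGES["humidity"]), 1),
    }

def simulate_stream(start, count, step_minutes=5, seed=42):
    """Return `count` readings spaced `step_minutes` apart, reproducible via `seed`."""
    rng = random.Random(seed)
    return [
        simulate_reading(start + timedelta(minutes=step_minutes * i), rng)
        for i in range(count)
    ]

if __name__ == "__main__":
    # Arbitrary start date, used only for the demo.
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for reading in simulate_stream(start, 3):
        print(reading)
```

Seeding the generator means a bad run can be reproduced later, which helps a lot once things start breaking downstream.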

This part was... surprisingly smooth? But obviously, I knew something would break soon

Phase 2: Multi-Cloud Drama Begins (Azure → AWS S3)

Okay, now we’re transferring the simulated data to AWS S3.

  • Set up the bucket and moved the data over in .csv format

  • Thought, “Cool. Time to process it.”

But the first problem hit hard:
Glue didn’t want to recognize my file.
Glue: “I can’t crawl this.”
Me: “But you said you were serverless and smart??”

Turns out, I had to carefully manage the CSV structure, add headers, and keep formats consistent. Also — permissions, oh god the permissions. Glue wouldn’t write, Athena wouldn’t read. S3 was like “403 LOL.”
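A quick sanity check before uploading would have saved me a few of those crawler attempts. Here is a minimal sketch of one, assuming a five-column layout: the column names in `EXPECTED_HEADER` and the `validate_csv` helper are hypothetical, and this runs locally on the file contents, not inside Glue.

```python
import csv
import io

# Assumed column layout; adjust to whatever the simulator actually emits.
EXPECTED_HEADER = ["timestamp", "aqi", "pm25", "temp", "humidity"]

def validate_csv(text):
    """Return a list of problems found; an empty list means the structure looks consistent."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [h.strip().lower() for h in rows[0]]
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {rows[0]}")

    # Every data row should have exactly as many fields as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
    return problems
```

It only checks structure (header and field counts), which is the part that tripped up the crawler for me. It does nothing for the 403s; those are an IAM problem.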

Phase 3: Glue, Glue, Everywhere (ETL stage)

Once the crawler worked (after trial #37), I wrote a Glue job to:

  • Convert CSV to Parquet

  • Clean the field names

  • Standardize timestamps
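The last two steps are easy to show outside of Glue. Below is a plain-Python sketch of the idea, not the Glue job itself (a real job would run on Spark and write the Parquet). The helper names and the list of input timestamp formats are assumptions; the output uses the `YYYY-MM-DD HH:MM:SS` shape, which, as far as I know, is what Athena expects for timestamp values stored as text.

```python
import re
from datetime import datetime

def clean_field_name(name):
    """Lowercase the name and collapse anything that is not a-z or 0-9 into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

# Hypothetical formats the raw data might arrive in.
TIMESTAMP_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%d/%m/%Y %H:%M",
)

def standardize_timestamp(raw):
    """Try each assumed format and return the timestamp as 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized timestamp: {raw!r}")

def clean_record(record):
    """Apply both clean-ups to one row represented as a dict."""
    out = {clean_field_name(key): value for key, value in record.items()}
    if "timestamp" in out:
        out["timestamp"] = standardize_timestamp(out["timestamp"])
    return out
```

For example, `clean_field_name("PM2.5")` gives `pm2_5`, which is a much friendlier column name to query than the original.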

Fun fact: Athena is VERY picky about column types.
Typed aqi as a string by mistake? Congrats, your query keeps failing until the table schema and the data agree.
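One way to catch type problems before Athena does is to cast every value up front and set aside the rows that refuse. A minimal sketch, assuming the schema below (the `SCHEMA` mapping and both helper names are hypothetical, and Python types stand in for the table's column types):

```python
# Assumed target schema: column name -> Python type standing in for the table type.
SCHEMA = {"aqi": int, "pm25": float, "temp": float, "humidity": float}

def coerce_row(row):
    """Return a copy of `row` with schema columns cast; raise ValueError naming the bad column."""
    out = dict(row)
    for col, typ in SCHEMA.items():
        try:
            out[col] = typ(row[col])
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(
                f"column {col!r}: cannot cast {row.get(col)!r} to {typ.__name__}"
            ) from exc
    return out

def split_rows(rows):
    """Separate rows that cast cleanly from rows that do not, keeping the reason for each reject."""
    good, bad = [], []
    for row in rows:
        try:
            good.append(coerce_row(row))
        except ValueError as exc:
            bad.append((row, str(exc)))
    return good, bad
```

Keeping the rejected rows with their error message makes it much easier to see which column is the culprit than reading a query failure after the fact.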

Phase 4: Athena, You're Supposed to Be Cool

Now came the queries:

SELECT date, aqi FROM iot_analysis WHERE aqi > 100;

Athena: “HIVE_BAD_DATA”
Me: “No YOU’RE bad data”

Fixed the schema. Re-partitioned. Cleaned column types in Glue
Eventually... it ran. I cheered. Alone. At 2 AM TT

Phase 5: Let’s Make It Pretty (Frontend + QuickSight)

We built a website — clean, lightweight, showing:

  • Live AQI from an API

  • Historical data (Plan A: fetched from Athena, Plan B: simulated)

Then came QuickSight. At first, it showed a line chart where all points looked... flat

I realized our simulated data was too random and covered too short a window (3 days). So I regenerated one month of pollution data, with a massive spike around Diwali (Oct 24–26), and the charts came alive.
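Here is a sketch of what "less random" can mean in practice: a daily cycle, a bit of noise, and a deliberate bump on the spike days. The numbers (baseline of 100, swing of 25, spike of 180, noise level) and the `simulate_october` helper are assumptions for illustration, not the values I actually used, and the year is whatever you pass in.

```python
import math
import random
from datetime import datetime, timedelta

SPIKE_DAYS = range(24, 27)  # Oct 24-26, the Diwali dates mentioned above

def simulate_october(year, seed=7):
    """Return hourly readings for October as dicts with 'timestamp' and 'aqi'."""
    rng = random.Random(seed)
    ts = datetime(year, 10, 1)
    rows = []
    while ts.month == 10:
        # Assumed daily cycle: AQI swings +/-25 around 100, peaking at 20:00.
        daily = 100 + 25 * math.cos(2 * math.pi * (ts.hour - 20) / 24)
        # Assumed spike size, added on the spike days only.
        spike = 180 if ts.day in SPIKE_DAYS else 0
        noise = rng.gauss(0, 10)
        rows.append({"timestamp": ts, "aqi": max(0, round(daily + spike + noise))})
        ts += timedelta(hours=1)
    return rows
```

With structure like this, a line chart has a shape to show: a daily rhythm plus an obvious three-day jump, instead of noise that averages out to a flat line.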

Final Dashboard Highlights

  • Heatmaps showing hourly pollution trends

  • Multi-line charts for temp, AQI, humidity

  • Scatter plots showing PM2.5’s relationship with AQI

  • And bar charts that made me go, “Okay, this looks like real analysis now”
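QuickSight does the aggregation for you, but it helps to know what a heatmap cell actually holds. A rough sketch of the idea, assuming each cell is the average AQI for one date and one hour (the `hourly_heatmap` helper and its input shape are hypothetical):

```python
from collections import defaultdict

def hourly_heatmap(readings):
    """Average AQI per (date, hour) cell.

    `readings` is an iterable of (datetime, aqi) pairs.
    Returns a dict keyed by (ISO date string, hour of day).
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for ts, aqi in readings:
        cell = sums[(ts.date().isoformat(), ts.hour)]
        cell[0] += aqi
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each key becomes one square on the heatmap: date on one axis, hour on the other, colour driven by the average.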

Expo Day

All of this came together just in time

I was running on 3 hours of sleep, nerves, and excitement.
And when the panel asked, “So how is this data from the cloud?”
I confidently walked them through every twist and turn

And yeah - it worked.

The Takeaways

  • Multi-cloud = cool but messy. Learn your IAM roles

  • Simulate like a mad scientist, but structure like a backend dev

  • Nothing will work on first try. That’s normal

  • Keep going. Break it. Fix it. Ship it.

What’s Next?

  • Adding real-time streaming

  • Building an ML model to predict AQI trends

  • Maybe connecting it to alert systems or a mobile dashboard

P.S.

If you're reading this and building something like it — I’ve been there
DM me. Happy to help (or debug Athena errors with you at 2 AM).

Written by me — with a little help from AI to shape the chaos into a proper blog :)
