One Cloud, Two Clouds: How I Built a Pollution Dashboard (and Almost Broke Everything)

Soham Wagh

aka how I tried 14 different things, broke everything, fixed everything, and somehow made it look clean by the time of the expo

Let’s start with a very real moment...

It was past midnight, I had QuickSight graphs that looked like straight lines, Athena was yelling at me about “HIVE_BAD_DATA,” and somewhere deep inside, I was still thinking:
“Wait—this is kinda fun”

This blog isn’t just about what I built; it’s about how. And more importantly, how many times I failed while trying to make it work.

The Idea

We wanted to build something real — something we could imagine running in an actual smart city. So the project idea was:

Can we simulate pollution sensor data, process it entirely through the cloud, and visualize meaningful trends that could help monitor or even reduce pollution?

Simple idea. Complicated journey.

Phase 1: Simulate It Till You Make It (Azure IoT + Blob Storage)

We started on Azure. Why? Because real sensors are expensive and I like simulating HUGE data.

  • Created a container in Azure Blob Storage

  • Then, built a simulated IoT setup using Azure IoT Hub

  • Pushed random but somewhat realistic values: AQI, PM2.5, Temp, Humidity
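For anyone curious what "random but somewhat realistic" can look like, here is a minimal sketch using only the Python standard library. The field names, value ranges, start date, and the rough link between PM2.5 and AQI are all illustrative assumptions, not the exact generator from the project, and the step of sending each reading to IoT Hub is left out.

```python
import random
from datetime import datetime, timedelta, timezone


def simulate_reading(timestamp, rng):
    """Return one fake sensor reading as a dict (all ranges are illustrative guesses)."""
    pm25 = round(rng.uniform(10.0, 150.0), 1)
    # Made-up relationship so AQI loosely follows PM2.5; not the official AQI formula.
    aqi = max(0, int(pm25 * 1.5 + rng.uniform(-10, 10)))
    return {
        "timestamp": timestamp.isoformat(),
        "aqi": aqi,
        "pm25": pm25,
        "temp_c": round(rng.uniform(18.0, 38.0), 1),
        "humidity": round(rng.uniform(20.0, 90.0), 1),
    }


def simulate_readings(start, count, step_minutes=60, seed=42):
    """Generate `count` readings spaced `step_minutes` apart, reproducible via `seed`."""
    rng = random.Random(seed)
    return [
        simulate_reading(start + timedelta(minutes=i * step_minutes), rng)
        for i in range(count)
    ]


# One day of hourly readings from an arbitrary start date.
readings = simulate_readings(datetime(2024, 1, 1, tzinfo=timezone.utc), 24)
```

Seeding the generator keeps runs reproducible, which helps when you are debugging the pipeline downstream and want the same input every time.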

This part was... surprisingly smooth? But obviously, I knew something would break soon.

Phase 2: Multi-Cloud Drama Begins (Azure → AWS S3)

Okay, now we’re transferring the simulated data to AWS S3.

  • Set up the bucket, moved data in .csv format

  • Thought, “Cool. Time to process it.”

But the first problem hit hard:
Glue didn’t want to recognize my file.
Glue: “I can’t crawl this.”
Me: “But you said you were serverless and smart??”

Turns out, I had to carefully manage the CSV structure, add headers, and keep formats consistent. Also — permissions, oh god the permissions. Glue wouldn’t write, Athena wouldn’t read. S3 was like “403 LOL.”
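A cheap way to catch these CSV problems before uploading is a local sanity check. The sketch below assumes a hypothetical five-column layout (`timestamp`, `aqi`, `pm25`, `temp_c`, `humidity`); swap in whatever header your files actually use. It only checks the file itself and does nothing about the permissions side.

```python
import csv
import io
from datetime import datetime

# Hypothetical column layout; adjust to match the real files.
EXPECTED_HEADER = ["timestamp", "aqi", "pm25", "temp_c", "humidity"]


def find_csv_problems(text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["missing or unexpected header row"]

    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
            continue
        try:
            datetime.fromisoformat(row[0])  # timestamp must be ISO formatted
            int(row[1])                     # aqi must be a whole number
            for value in row[2:]:           # remaining columns must be numeric
                float(value)
        except ValueError:
            problems.append(f"line {line_no}: value with unexpected format")
    return problems
```

An empty list means the file at least has the header and consistent formats, which rules out one class of crawler trouble before you start chasing IAM.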

Phase 3: Glue, Glue, Everywhere (ETL stage)

Once the crawler worked (after trial #37), I wrote a Glue job to:

  • Convert CSV to Parquet

  • Clean the field names

  • Standardize timestamps
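The two cleaning steps above can be sketched in plain Python. This is not the actual Glue job: the input timestamp formats in `KNOWN_FORMATS` are assumptions about what messy data might contain, and the Parquet conversion is omitted because it needs a Parquet library rather than the standard library.

```python
import re
from datetime import datetime

# Assumed input layouts; extend with whatever the raw data really contains.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M")


def clean_field_name(name):
    """Lowercase a column name and collapse runs of non-alphanumerics into one underscore."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def standardize_timestamp(raw):
    """Parse a timestamp in any known format and return it as 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next known layout
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized timestamp: {raw!r}")
```

For example, `clean_field_name("PM2.5 (ug/m3)")` gives `pm2_5_ug_m3`, and both `24/10/2022 18:30` and `2022-10-24T18:30:00` normalize to `2022-10-24 18:30:00`.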

Fun fact: Athena is VERY picky about column types.
Typed aqi as a string by mistake? Congrats, query fails forever.
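One way to avoid this is to cast every row to the intended types before the data is written, so a bad value fails loudly in your own code instead of inside a query. A minimal sketch, assuming a hypothetical schema where `aqi` is an integer and the other measurements are floats:

```python
# Hypothetical schema: column name -> Python type it must convert to.
SCHEMA = {"aqi": int, "pm25": float, "temp_c": float, "humidity": float}


def coerce_row(row):
    """Return a copy of `row` with schema columns cast to their intended types.

    Raises ValueError if a value cannot be converted, and KeyError if a
    schema column is missing from the row.
    """
    typed = dict(row)
    for column, caster in SCHEMA.items():
        typed[column] = caster(row[column])
    return typed
```

Rows read from CSV arrive as strings, so `coerce_row({"timestamp": "2022-10-24 18:30:00", "aqi": "120", ...})` turns `"120"` into the integer `120`, while a value like `"n/a"` raises immediately.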

Phase 4: Athena, You're Supposed to Be Cool

Now came the queries:

SELECT date, aqi FROM iot_analysis WHERE aqi > 100;

Athena: “HIVE_BAD_DATA”
Me: “No YOU’RE bad data”

Fixed the schema. Re-partitioned. Cleaned column types in Glue.
Eventually... it ran. I cheered. Alone. At 2 AM TT
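On the re-partitioning: a common layout is Hive-style `key=value` folders, which Glue crawlers and Athena can pick up as partition columns. The post doesn't record the exact partition scheme I ended up with, so treat the year/month/day split and the `clean` prefix below as assumptions for illustration.

```python
from datetime import datetime


def partition_key(prefix, timestamp):
    """Build a Hive-style S3 key prefix (year=/month=/day=) for a reading's timestamp."""
    return (
        f"{prefix}/year={timestamp.year}"
        f"/month={timestamp.month:02d}"
        f"/day={timestamp.day:02d}/"
    )


# Hypothetical prefix; a file for this reading would be stored under the returned path.
example = partition_key("clean", datetime(2022, 10, 24, 18, 30))
```

Partitioning by date means a query filtered to a few days only has to scan those folders.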

Phase 5: Let’s Make It Pretty (Frontend + QuickSight)

We built a website — clean, lightweight, showing:

  • Live AQI from an API

  • Historical data (Plan A: fetched from Athena, Plan B: simulated)

Then came QuickSight. At first, it showed a line chart where all points looked... flat.

Then I realized our simulated data was too random and covered too short a window (3 days). So I regenerated 1 month of pollution data, with a massive spike on Diwali (Oct 24–26), and the charts came alive.
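Here is a rough sketch of how a month of hourly data with a festival spike could be generated. The year (2022), the baseline of 90, the evening bump, and the size of the spike are all assumptions picked for illustration, not the numbers from the real dataset.

```python
import random
from datetime import datetime, timedelta

# Assumed year; the spike window covers Oct 24 through Oct 26 inclusive.
SPIKE_START = datetime(2022, 10, 24)
SPIKE_END = datetime(2022, 10, 27)  # exclusive upper bound


def simulate_month(start=datetime(2022, 10, 1), days=31, seed=7):
    """Return hourly AQI rows with a daily evening bump and a multi-day spike."""
    rng = random.Random(seed)
    rows = []
    for offset in range(days * 24):
        ts = start + timedelta(hours=offset)
        aqi = 90 + rng.gauss(0, 8)        # baseline plus small noise
        if 18 <= ts.hour <= 22:
            aqi += 30                     # evening traffic bump
        if SPIKE_START <= ts < SPIKE_END:
            aqi += 200                    # festival spike
        rows.append({"timestamp": ts.isoformat(), "aqi": max(0, round(aqi))})
    return rows


month = simulate_month()
```

The point is structure: a stable baseline, a repeating daily pattern, and one obvious event give a chart something to show, where pure noise over three days just averages out.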

Final Dashboard Highlights

  • Heatmaps showing hourly pollution trends

  • Multi-line charts for temp, AQI, humidity

  • Scatter plots showing PM2.5’s relationship with AQI

  • And bar charts that made me go, “Okay, this looks like real analysis now”
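QuickSight did the aggregation for these visuals, but the idea behind the hourly view is simple enough to sketch by hand. This assumes rows shaped like `{"timestamp": <ISO string>, "aqi": <number>}`, which is a hypothetical layout, and averages AQI per hour of day.

```python
from collections import defaultdict
from datetime import datetime


def average_aqi_by_hour(rows):
    """Return {hour_of_day: mean AQI} for rows with ISO 'timestamp' and numeric 'aqi'."""
    buckets = defaultdict(list)
    for row in rows:
        hour = datetime.fromisoformat(row["timestamp"]).hour
        buckets[hour].append(row["aqi"])
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


sample = [
    {"timestamp": "2022-10-24T18:00:00", "aqi": 300},
    {"timestamp": "2022-10-25T18:00:00", "aqi": 320},
    {"timestamp": "2022-10-24T06:00:00", "aqi": 100},
]
hourly = average_aqi_by_hour(sample)
```

Each hour becomes one cell of the heatmap, so a recurring evening peak shows up as a consistently darker band.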

Expo Day

All of this came together just in time.

I was running on 3 hours of sleep, nerves, and excitement.
And when the panel asked, “So how is this data from the cloud?”
I confidently walked them through every twist and turn.

And yeah - it worked.

The Takeaways

  • Multi-cloud = cool but messy. Learn your IAM roles.

  • Simulate like a mad scientist, but structure like a backend dev.

  • Nothing will work on the first try. That’s normal.

  • Keep going. Break it. Fix it. Ship it.

What’s Next?

  • Adding real-time streaming

  • Building an ML model to predict AQI trends

  • Maybe connecting it to alert systems or a mobile dashboard

P.S.

If you're reading this and building something like it, I’ve been there.
DM me. Happy to help (or debug Athena errors with you at 2 AM).

Written by me — with a little help from AI to shape the chaos into a proper blog :)


Written by

Soham Wagh

Passionate Computer Science student (SRM Institute of Science and Technology, 2026) exploring the intersection of Generative AI, Cloud Computing, and IoT. I enjoy building impactful projects, writing about my learning journey, and sharing experiments that bridge academic knowledge with real-world applications.