Jigyās Gets the Task of a Lifetime

achyuth reddy

🌇 At a Startup Somewhere Between Chaos and Coffee Machines

It was just another Tuesday morning.

The fan whirred. The Slack pings piled up. And Jigyās — backend engineer, tmux warrior, known procrastinator — was sipping his second cup of North Indian filter coffee.

That’s when it hit him.

💬 “Hey Jigyās, can you take ownership of our new data lake infra?”

His tech lead had dropped the message casually, as if building a data lake was like spinning up a Redis pod.

“Of course!” he typed back, fingers moving faster than his neurons.

🤔 Who is Jigyās?

Jigyās (yes, like curiosity in Sanskrit) is that developer we all know — ambitious, restless, and forever asking “but why though?”

  • He’s the guy who reads RFCs on a weekend, then forgets what he read by Monday.

  • He’s the guy who once Dockerized his rice cooker just to automate dinner.

  • He’s not foolish — but he asks foolish questions. The good kind.

Because Jigyās believes that asking dumb questions is how you get to smart answers.


🎙️ The Confused Monologue of Jigyās

Scene: Jigyās’ room, 11:42 PM. Buried under 156 browser tabs, neatly stacked into 8 groups, each overflowing with data lake jargon and Spark diagrams that looked like ancient rituals. Mind melting.

🧑‍💻 “Okay, so let me get this straight...”
“A data lake is not a lake. A data lakehouse is not a house. And Apache Hudi has nothing to do with hoodies?”

Jigyās paces around the room.

“Apparently, I need a data ingestion layer… maybe Spark Streaming? Or should I use Flink? Wait, what even is DeltaStreamer? Sounds like a failed gaming channel.”

“And then there’s COW and MOR… Copy-On-Write? Merge-On-Read? Why does my data sound like it’s mooing?”

“Someone said I need a table format — like Iceberg or Delta or Hudi. Then someone else told me Hudi has something called a commit timeline. What is this, Git for data?”

“And I read something about metadata tables, compaction strategies, partition pruning, incremental queries, Upserts vs Inserts, schema evolution, Z-ordering, column pruning, data skipping—”

He collapses into his chair.


🧘‍♂️ Enter Jñānesh, the Data Whisperer

Just as Jigyās was about to question his entire existence (and whether his rice cooker was a more manageable system than Apache Spark)...

A voice emerged — clear, calm, almost too serene for a room filled with terminal tabs and mental breakdowns.

🧙‍♂️ “Confused, are we?”

Jigyās looked up.
He didn’t remember joining a Google Meet.
Or opening Zoom.
But there he was — Jñānesh. Draped in a data-neutral linen kurta. Calm as a cache hit.


🧙 Who is Jñānesh?

Some say he used to be a staff data engineer.
Some say he once optimized a Presto query so hard, it started giving life advice. No one really knows.

What’s certain is — when Jñānesh speaks, bytes listen.

  • He doesn’t chase tech trends; he questions them.

  • He prefers clarity over cleverness.

  • And when it comes to data systems, he sees through the noise.


🎯 Jargon Dump, Round 2

🧑‍💻 “Are you real?”
🧙‍♂️ “As real as your production bugs.”

🧑‍💻 “Okay listen... I’ve been trying to wrap my head around this lakehouse thing for 7 hours now.”

🧑‍💻 “Is it a warehouse sitting on a lake? Or a lake pretending to be a warehouse? Or a bunch of files pretending to be tables?”

🧑‍💻 “And why does Hudi need commit timelines? What is Trino doing in all of this? Is Iceberg better than Delta? What about catalog sync? Streaming ingestion? Instant rollbacks? Is MOR even production-ready?”

🧑‍💻 “And why... why does everyone say schema evolution like it’s some Darwinian prophecy?”

Jigyās, exhausted, stares at his terminal — which, at this point, is just blinking judgmentally.


🧙 Jñānesh Smiles

🧙‍♂️ “You’ve walked into a temple mid-ritual and are confused why people are chanting.”

🧙‍♂️ “Before you understand the Lakehouse, you must understand why it exists.”

🧙‍♂️ “Let’s go back to the beginning. Not Hadoop. Not S3. Not even Spark.”

🧙‍♂️ “Let’s begin with what you already know — the humble, dependable relational database.”


🧑‍💻 “Databases? Really?”

🧙‍♂️ “Yes, my young engineer. The answers you seek are downstream from the questions you’ve skipped.”

🧱 Back to Basics — The Relational Awakening

Jñānesh walks slowly toward Jigyās’s desk (which is mostly imaginary). He gestures at the terminal like a monk gesturing to a scroll.

🧙‍♂️ “Let’s begin where all data began...
With rows, columns, and a very familiar friend: the relational database.”


🧑‍💻 “Postgres? MySQL? Yeah, I’ve used them a hundred times.”

🧙‍♂️ “Indeed. Tools that store data in tables, with schemas, and follow the sacred rules of ACID.”
Atomicity. Consistency. Isolation. Durability.
“Not just buzzwords — foundations.”

Jñānesh draws a table on the whiteboard of Jigyās’s mind:

| id | name  | age | created_at          |
|----|-------|-----|---------------------|
| 1  | Alice | 28  | 2023-09-12 10:00:00 |
| 2  | Bob   | 35  | 2023-09-13 15:22:47 |

🧙‍♂️ “You query, you filter, you join.
You insert, update, delete.
And in most cases, it works beautifully.”


🧑‍💻 “Yeah, I mean... I’ve built APIs, dashboards — even ran analytics queries on small tables. What’s the problem?”

⚠️ The First Crack: Volume

Jñānesh slowly circles the table.

🧙‍♂️ “What if instead of 1,000 users… you had 10 million?”
“What if each user generates events, logs, clicks, sessions — every second of the day?”

Jigyās nods, starting to feel the weight.

🧙‍♂️ “Relational databases were built for transactions, not terabytes. They’re optimized for small, consistent reads and writes — not massive scans and aggregations across millions of rows.”

🧑‍💻 “So that’s when people started using… warehouses?”

🧙‍♂️ “Patience. First — tell me — what do you do when a system starts to slow down?”

🧑‍💻 “I… optimize queries, add indexes, maybe denormalize…”

🧙‍♂️ “Exactly. You start bending the system to behave like something it was never meant to be.”
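The “add indexes” reflex Jigyās mentions can be watched in action. A small sketch, again using SQLite (the `events` table and `idx_events_user` index are hypothetical names for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 1000, "click", "2023-09-12") for i in range(10_000)],
)

def plan(sql):
    # Each returned row's last column describes a step of the query plan.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()

q = "SELECT COUNT(*) FROM events WHERE user_id = 42"
print(plan(q))  # before the index: a full table scan

conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
print(plan(q))  # after: a search using idx_events_user
```

Each index makes one query faster — and every write slower. That trade-off is fine for transactional workloads, but it’s the “bending” Jñānesh warns about when the real need is analytical.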


🧠 The Evolution of Need

Jñānesh continues:

🧙‍♂️ “When systems shifted from storing records to analyzing patterns, the relational model started to break down.”

🧙‍♂️ “That’s when engineers asked — what if we build something optimized for analytical queries instead of transactional ones?”

🧙‍♂️ “Something that doesn’t get cranky when scanning billions of rows.”


🧑‍💻 “So that’s the warehouse?”

🧙‍♂️ “Yes. The data warehouse was born to solve the scale and speed problem of relational databases — especially for OLAP (Online Analytical Processing).”


🧑‍💻 “Wait, but I read something about columnar databases too... Like ClickHouse, Apache Doris, stuff like that. Don’t they solve this?”

Jñānesh smiles.

🧙‍♂️ “Good catch. Yes, columnar databases store data by column instead of by row — making them ideal for OLAP-style queries like summing all sales amounts, or filtering by one column across millions of records.”

🧙‍♂️ “But even they were part of the same shift — from row-oriented, transactional systems to analytical, column-based systems.”

🧙‍♂️ “And while some evolved into fast columnar stores, others became part of the architecture that gave birth to the data warehouse — the first real attempt to organize big data for business intelligence.”
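The row-versus-column distinction is easy to see in a toy model. This is purely illustrative — real engines like ClickHouse add compression, vectorized execution, and much more — but the layout difference is the core idea:

```python
# Toy model of row-oriented vs column-oriented storage (illustrative only).

# Row store: each record is kept together -- great for "fetch user 2's whole row".
row_store = [
    {"id": 1, "name": "Alice", "age": 28, "sales": 120.0},
    {"id": 2, "name": "Bob",   "age": 35, "sales": 80.0},
    {"id": 3, "name": "Cara",  "age": 41, "sales": 200.0},
]

# Column store: each column is kept together -- great for "sum all sales".
col_store = {
    "id":    [1, 2, 3],
    "name":  ["Alice", "Bob", "Cara"],
    "age":   [28, 35, 41],
    "sales": [120.0, 80.0, 200.0],
}

# OLTP-style access: read one complete record.
print(row_store[1])

# OLAP-style access: aggregate one column -- the column store reads only
# "sales" and never touches names or ages.
print(sum(col_store["sales"]))  # -> 400.0
```

On disk the same principle holds: a columnar engine scanning one column of a billion-row table reads a fraction of the bytes a row store would, which is why OLAP systems went column-first.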


🧑‍💻 “So… we went from transactional databases → columnar systems → full-blown warehouses?”

🧙‍♂️ “In essence, yes. Each step was a response to scale, performance, and changing expectations.”

🧘‍♂️ A Pause Before the Warehouse

Jñānesh stood still, as if time had slowed.

🧙‍♂️ “You’ve taken the first step, Jigyās. You’ve understood why relational databases, though powerful, weren’t built for what modern data needs.”

🧙‍♂️ “Before we dive into warehouses and lakes, take a moment. Sit with these questions.”


He raised a hand and spoke, softly but with intent:

🧙‍♂️ “What kind of queries did your app run last month?
Were they transactional… or analytical?”

🧙‍♂️ “Have you seen the performance cost of joining large tables?
Or the limits of indexing when data hits hundreds of millions of rows?”

🧙‍♂️ “What does your business need: fast inserts… or fast insights?”


Jigyās blinked. For once, he wasn’t overwhelmed — just curious.

🧑‍💻 “I think I get it. I need to look at what the system is meant to do, not just how cool the tools are.”

Jñānesh nodded.

🧙‍♂️ “Exactly. Tech is not a stack — it’s a story. We’ll continue ours tomorrow.”

He turned to leave — probably to meditate on Spark commit logs.


🪑 And as Jigyās leaned back in his chair...

He wasn’t any less confused.

But for the first time in hours… he wasn’t panicking.


🔜 Coming Up Next:

Episode 2: Of Warehouses and Cubes — When Data Outgrew Databases
Where Jñānesh returns to talk about:

  • Batch processing

  • Star schemas

  • Why warehouses were better — but still not enough

