What the Hell Is Observability and Why Do I Pretend to Understand It?

Walter Soto

At some point in the last five years, you’ve nodded along in a meeting while some chump said, “We need to improve observability,” and you had no idea if they meant better logs, a new dashboard, or installing a third Prometheus instance just to feel something.

Well, me too.

Observability is one of those tech buzzwords we all use with the confidence of a toddler in a Batman cape: fully convinced we are invincible, until reality hits. We’re sure it’s important. We’re pretty sure it involves metrics. And we’re sure nobody will call us out because they don’t get it either.

So let’s rip the bandage off and figure out what the hell observability actually is, before we add “observability engineer” to our LinkedIn profile and get exposed by someone who actually knows.

TL;DR: Observability ≠ Monitoring

Monitoring is asking “Is this thing on?”

Observability is asking “Why is this thing beeping at 3 AM?”

The classic definition (borrowed from control theory, because we’re all pretending to understand math) is:

Observability is the ability to infer the internal state or condition of a system based on its outputs.

In English: If your app’s on fire and your logs, metrics, and traces can pinpoint the arsonist, the crime scene, and how long it’s been burning, congratulations. You’re observable.

The Unholy Trinity: Logs, Metrics, and Traces

Ah yes, the holy trinity of observability (or three pillars, whatever). Or as I call them:

  • Logs: The app’s internal monologue. Sometimes helpful. Sometimes just “ERROR: Something went wrong. ¯\_(ツ)_/¯”

  • Metrics: Numbers that tell you how fast, slow, bloated, or dead something is. Think CPU usage, request latency, or how many Docker containers are gasping for air.

  • Traces: The family tree of a request. It tells you where the request went, how long each hop took, and which microservice messed up.
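To make the "family tree" idea concrete, here is a minimal sketch of a trace as a tree of spans, plus a walk that finds the hop eating the most time. Everything in it is made up for illustration: the `Span` class, the service names, and the durations are assumptions, and it pretends child calls run one after another (real tracing systems also handle parallel calls, IDs, and timestamps).

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One hop of a request: a name, its total duration, and its child hops."""
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def self_time(span: Span) -> float:
    """Time spent in the span itself, excluding time spent in its children.

    Assumes children run sequentially, which is a simplification.
    """
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def slowest(span: Span) -> Span:
    """Walk the tree and return the span with the largest self time."""
    best = span
    for child in span.children:
        candidate = slowest(child)
        if self_time(candidate) > self_time(best):
            best = candidate
    return best


# Hypothetical request: gateway -> auth, then gateway -> orders -> database.
trace = Span("api-gateway", 480.0, [
    Span("auth-service", 30.0),
    Span("order-service", 420.0, [
        Span("inventory-db", 390.0),
    ]),
])

culprit = slowest(trace)
print(culprit.name, self_time(culprit))
```

In this toy trace the gateway looks slow from the outside, but nearly all of the time is spent in the database call at the bottom of the tree. That is the question a trace answers and a single latency metric cannot.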

So Why Do We Suck at It?

Because most of us duct-taped Grafana to Prometheus (or VictoriaMetrics :-p), slapped in a log aggregator, and called it a day. Then we pretend it’s “observability” because we’re tired, and we’ve already burned out two SREs this quarter.

True observability is proactive. It’s the difference between knowing your system is collapsing and knowing exactly why it’s collapsing, before it sends an SRE into early retirement.

The Lie We All Tell

Every dashboard is a lie. Especially the ones that look good.

We build them to look impressive during demos, but half the time they’re just static graphs that say “everything’s fine” while the database isn’t logging errors because it’s too busy bluffing its way through a poker game with your customer data.

If you’ve ever heard, “It didn’t alert, so it must be fine,” just know you’ve witnessed observability malpractice.
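One cheap guard against that particular malpractice is to treat silence as its own state instead of quietly filing it under "fine". The sketch below is an assumption-heavy illustration, not any alerting tool's actual behavior: the `evaluate` function, the state names, and the thresholds are all invented here.

```python
from typing import Optional


def evaluate(
    last_sample_ts: Optional[float],
    error_rate: float,
    now: float,
    max_age_s: float = 120.0,      # assumed staleness threshold
    max_error_rate: float = 0.05,  # assumed alert threshold
) -> str:
    """Return "no_data", "firing", or "ok" for a single check.

    A missing or stale sample is reported as "no_data" rather than "ok",
    so a dead exporter cannot masquerade as a healthy service.
    """
    if last_sample_ts is None or now - last_sample_ts > max_age_s:
        return "no_data"
    if error_rate > max_error_rate:
        return "firing"
    return "ok"


print(evaluate(None, 0.0, now=1000.0))    # never reported anything
print(evaluate(700.0, 0.0, now=1000.0))   # last sample is 300 s old
print(evaluate(990.0, 0.20, now=1000.0))  # fresh sample, too many errors
print(evaluate(990.0, 0.01, now=1000.0))  # fresh sample, healthy
```

The point of the design is that "no alert fired" and "we have no idea" end up as different answers, and someone gets to be bothered about the second one too.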

How to Fake It Better

Until you truly invest in observability (and not just more dashboards), here’s how to fake it with style:

  1. Make logs structured: JSON, please. We’re not cavemen.

  2. Use labels in your metrics: Otherwise, it’s just a number doing cosplay as insight.

  3. Adopt tracing before your microservices unionize: when they multiply without proper tracing, chaos is inevitable.

  4. Actually look at your dashboards before they’re needed in an incident post-mortem.
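For the first two items, here is a minimal sketch using only the Python standard library. The `JsonFormatter` class, the `context` convention for extra fields, the `record_request` helper, and the metric and label names are all assumptions made up for this example. A real setup would more likely use an existing logging or metrics library, but the shape of the idea is the same.

```python
import json
import logging
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields arrive via extra={"context": {...}}; this convention
        # is an assumption of the sketch, not something logging mandates.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# A labeled counter: one count per (metric name, label set), not one lonely number.
requests_total: Counter = Counter()


def record_request(service: str, status: int) -> None:
    """Increment the request counter for this service/status combination."""
    labels = (("service", service), ("status", str(status)))
    requests_total[("http_requests_total", labels)] += 1


record_request("checkout", 200)
record_request("checkout", 500)
record_request("checkout", 500)
logger.error("payment failed", extra={"context": {"order_id": "A-123", "status": 500}})
```

With the labels in place you can ask "how many 500s did checkout serve?" instead of staring at a total, and with JSON logs you can filter on `order_id` instead of grepping and praying.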


Final Thoughts

Observability is like flossing. Everyone says they do it. Almost nobody does it right. And the only time we care is when something hurts.

So yeah, maybe I am still pretending. But at least now I know what I’m pretending about.
