Introduction

Understanding what happens inside your applications and infrastructure is essential to delivering reliable, performant software. But what does observability look like in practice? In this article, I’ll walk you through a hands-on example of implementing observability with two popular open-source tools: Prometheus and Grafana.

Whether you’re a backend developer, DevOps engineer, or just getting started with site reliability engineering (SRE), this article will help you understand how to instrument your code, collect metrics, and visualize them in beautiful dashboards that deliver actionable insights.

What is Observability?

Observability refers to how well you can understand the internal state of a system based on the data it produces—typically logs, metrics, and traces. A highly observable system lets you quickly detect, diagnose, and resolve issues, even those you didn’t anticipate.

Key components:

Metrics: Numeric values that represent the health or performance of your system (e.g., request count, latency, error rates).
Logs: Text records of discrete events.
Traces: Information about the flow of requests through various components.

In this article, we’ll focus on metrics using Prometheus and Grafana.

Real-World Scenario: Monitoring a Python Application

Imagine you have a Python web application serving users. You want to know:

How many requests are being processed?
How long do they take?
What’s the current temperature in your server room (or any custom metric)?

Let’s make your app observable!

Step 1: Instrumenting the Application with Prometheus Client

We'll use the prometheus_client library to expose metrics. Here’s a minimal example:

from prometheus_client import start_http_server, Summary, Counter, Gauge
import random
import time

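# Metrics: a counter only goes up, a summary tracks count and total time,
# and a gauge can go up or down.
REQUEST_COUNT = Counter('request_count_total', 'Number of requests processed')
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing a request')
ROOM_TEMP = Gauge('room_temperature_celsius', 'Room temperature in Celsius')

def process_request():
    """Simulate request processing."""
    REQUEST_COUNT.inc()
    # Time the block and record its duration in the summary
    with REQUEST_TIME.time():
        # Simulated temperature reading between 20 and 25 degrees Celsius
        ROOM_TEMP.set(20 + random.random() * 5)
        # Simulated work lasting up to one second
        time.sleep(random.random())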

if __name__ == "__main__":
    start_http_server(8000)
    print("Prometheus metrics available on http://localhost:8000/metrics")
    while True:
        process_request()
        time.sleep(1)

Save this as app.py and run it. The script serves its metrics at /metrics on port 8000, where Prometheus can scrape them.
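
To get a feel for what Prometheus sees when it scrapes the endpoint, here is a minimal sketch that parses the plain-text exposition format using only the standard library. The parse_samples helper and the sample payload are illustrative assumptions, not part of the original project, and the parser deliberately ignores timestamps and other details of the full format.

```python
def parse_samples(text):
    """Parse 'name value' sample lines from Prometheus text exposition output.

    Simplified on purpose: comment lines (# HELP / # TYPE) are skipped and
    the sample name, including any {labels}, is used verbatim as the key.
    Lines that carry a trailing timestamp are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical excerpt of what http://localhost:8000/metrics might return.
SAMPLE = """\
# HELP request_count_total Number of requests processed
# TYPE request_count_total counter
request_count_total 42.0
# HELP room_temperature_celsius Room temperature in Celsius
# TYPE room_temperature_celsius gauge
room_temperature_celsius 22.5
request_processing_seconds_count 42.0
request_processing_seconds_sum 21.0
"""

if __name__ == "__main__":
    for name, value in parse_samples(SAMPLE).items():
        print(f"{name} = {value}")
```

With the app running, you could swap SAMPLE for the live output, for example urllib.request.urlopen("http://localhost:8000/metrics").read().decode().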

Step 2: Setting Up Prometheus

You can run Prometheus locally by downloading it from prometheus.io.
Your prometheus.yml should include:

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'python_app'
    static_configs:
      - targets: ['localhost:8000']

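Start Prometheus with this configuration (for example, ./prometheus --config.file=prometheus.yml) and visit http://localhost:9090. The targets page under Status should list the python_app job as up.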
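
If you prefer to confirm the scrape from a script rather than the web UI, the sketch below queries the Prometheus HTTP API for the built-in up metric. It assumes Prometheus is listening on localhost:9090 as configured above; the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def healthy_jobs(payload):
    """Return the job labels whose `up` sample equals 1 in an instant-query response."""
    if payload.get("status") != "success":
        return []
    return [
        item["metric"].get("job", "")
        for item in payload["data"]["result"]
        # Each sample value is a [timestamp, "string value"] pair
        if float(item["value"][1]) == 1.0
    ]


def check_targets(base_url="http://localhost:9090"):
    """Ask a running Prometheus which scrape targets are up (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return healthy_jobs(json.load(resp))
```

Calling check_targets() while both the app and Prometheus are running should return a list that includes python_app once the first scrape has succeeded.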

Step 3: Visualizing Metrics in Grafana Cloud

Instead of running Grafana locally, let’s use Grafana Cloud for simplicity and scalability.

Sign up for a free Grafana Cloud account.
In the Grafana Cloud portal, open the Prometheus section of your stack and note the remote write URL and username, then generate an API key.
In your local prometheus.yml, add:

remote_write:
  - url: "<your-grafana-cloud-prometheus-remote-write-url>"
    basic_auth:
      username: "<your-username>"
      password: "<your-api-key>"

Now, metrics from your local Prometheus will flow to Grafana Cloud!
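
Since the remote_write block contains a secret, you may not want to commit it with real values. As one possible approach, the sketch below renders the block from environment variables. The variable names (GRAFANA_RW_URL, GRAFANA_RW_USER, GRAFANA_RW_KEY) and the helper are hypothetical, so adapt them to your own setup.

```python
import json
import os


def render_remote_write(url, username, password):
    """Render a remote_write block as YAML text.

    json.dumps handles the quoting, because a JSON string is also a valid
    double-quoted YAML scalar.
    """
    return "\n".join([
        "remote_write:",
        f"  - url: {json.dumps(url)}",
        "    basic_auth:",
        f"      username: {json.dumps(username)}",
        f"      password: {json.dumps(password)}",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders are used if unset.
    print(render_remote_write(
        os.environ.get("GRAFANA_RW_URL", "<remote-write-url>"),
        os.environ.get("GRAFANA_RW_USER", "<username>"),
        os.environ.get("GRAFANA_RW_KEY", "<api-key>"),
    ))
```

You could append the printed block to a local copy of prometheus.yml that stays out of version control.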

Step 4: Creating Beautiful Dashboards

In Grafana Cloud:

Go to Dashboards > New Dashboard.
Add a panel for your metric, e.g., request_count_total.
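Try visualizing the rate of requests rather than the raw counter, which only ever increases.
Add more panels for the other metrics the app exposes, such as request_processing_seconds and room_temperature_celsius.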
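
A counter such as request_count_total only ever goes up, so graphing it directly gives an ever-rising line. In Grafana you would typically use a PromQL expression along the lines of rate(request_count_total[1m]) instead; the exact query from the original walkthrough is not shown here, so treat that expression as an assumption. To illustrate the idea behind it, here is a simplified Python sketch. It is not how Prometheus implements rate(), which also extrapolates to the window boundaries.

```python
def per_second_rate(samples):
    """Approximate the per-second increase of a counter.

    `samples` is a list of (timestamp_seconds, value) pairs in time order.
    A drop in value is treated as a counter reset (for example, an app
    restart), so the post-reset value counts as new increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def mean_latency(sum_seconds, count):
    """Average request duration from a summary's _sum and _count series."""
    return sum_seconds / count if count else 0.0
```

The second helper mirrors another common panel: dividing request_processing_seconds_sum by request_processing_seconds_count gives the average processing time per request.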

You can now monitor your app's health, performance, and even custom business metrics in real time!

Step 5: Automating Observability Checks with GitHub Actions

Monitoring is only valuable if you can trust that your observability pipeline is always working. In a manual workflow, you would have to:

Install dependencies
Run code linting to check for style errors
Launch your application
Test that the /metrics endpoint is live and returns the expected metrics

Doing this manually every time you make a change is tedious and error-prone. This is where automation saves the day.

What gets automated?

With GitHub Actions, you can automate the entire validation process. Every time you push to main or open a pull request against it, GitHub Actions will:

Check out the latest version of your code
Set up the Python environment
Install all dependencies from requirements.txt
Run linting checks with flake8
Launch your application and make a request to /metrics to ensure metrics are exposed correctly

If any step fails, you get immediate feedback, ensuring that your codebase always remains observable.

Example: GitHub Actions Workflow

Here’s a sample workflow you can add to .github/workflows/ci.yml in your repository:

name: CI - Python Metrics App

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Lint code
        run: |
          pip install flake8
          flake8 app.py

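      - name: Run app and test /metrics endpoint
        run: |
          nohup python app.py &
          sleep 5
          # Fail on HTTP errors and confirm an expected metric is present
          curl --fail --silent http://localhost:8000/metrics | grep request_count_total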
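
If you want the last step to check every expected metric rather than just reach the endpoint, a small Python script can do the job. The sketch below is one possible version; the file name check_metrics.py and the helper names are hypothetical, and the list of expected metrics is taken from the app in Step 1.

```python
import urllib.request

# Metric names the app in Step 1 is expected to expose.
EXPECTED = (
    "request_count_total",
    "request_processing_seconds_count",
    "room_temperature_celsius",
)


def missing_metrics(body, expected=EXPECTED):
    """Return the expected metric names that do not appear in the scrape output."""
    return [name for name in expected if name not in body]


def main(url="http://localhost:8000/metrics"):
    """Fetch the endpoint and return a process exit code (0 means all found)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    missing = missing_metrics(body)
    if missing:
        print("Missing metrics: " + ", ".join(missing))
        return 1
    print("All expected metrics are exposed")
    return 0
```

To use it in the workflow, you could save it as check_metrics.py, add raise SystemExit(main()) at the bottom, and run python check_metrics.py after starting the app.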

Why automate this?

No manual validation needed: Every code change is automatically validated.
Prevents regressions: If someone breaks the /metrics endpoint, the workflow fails and you know right away.
Ensures code quality: Linting is enforced as part of the pipeline.
Builds confidence: Your observability solution remains reliable as your code evolves.

Why Is This Powerful?

Proactive Monitoring: Spot slowdowns or errors before users complain.
Custom Metrics: Track what actually matters to your business.
Open-Source and Cloud-Ready: Start locally, scale globally.

Source Code

You can find the complete source code for this project on GitHub:
observability-python-prometheus-grafana (GitHub Repo)

Real Demo

Here’s a YouTube video where I walk through this entire process, from code to dashboard!

Conclusion

Observability isn’t just for big tech companies—anyone can start today with open-source tools like Prometheus and Grafana. By instrumenting your code and visualizing metrics, you gain deep insights into your system, improve reliability, and deliver better user experiences.

Ready to level up your monitoring game?
Try out this example, and let me know your thoughts or questions in the comments!

Observability Practices in Action: Real-Time Monitoring with Prometheus and Grafana
