Getting Started with Prometheus: A Guide

Table of contents
- Prometheus Architecture:
- Who Does What?
- 1. Prometheus Server (Core Component)
- 2. Targets & Exporters (Where Data Comes From)
- 3. Time-Series Database (TSDB)
- 4. PromQL (Query Engine)
- 5. Service Discovery Mechanism (Finder of Targets)
- 6. Alertmanager (Sends Notifications)
- 7. Grafana (Visualization)
- How Everything Works Together
- Prometheus Fundamentals:
- Metrics in Prometheus:
- What is PromQL?
- PromQL:

I hope you read my previous blog on Prometheus, which covers the basics. Let me know what you found most helpful in it. In this blog, we will cover the fundamentals of Prometheus, PromQL, dashboarding and visualization, service discovery, the Pushgateway, and monitoring Kubernetes. So, let's get started without wasting any time!
Prometheus Architecture:
Who Does What?
Prometheus works like a data collection and alerting system that continuously pulls metrics from various sources and stores them for monitoring and querying.
Key Components:
1. Prometheus Server - The core brain
2. Targets/Exporters - The data providers
3. Time-Series Database (TSDB) - The storage
4. PromQL (Prometheus Query Language) - The data analyzer
5. Service Discovery - The finder of targets
6. Alertmanager - The notifier
7. Grafana - The dashboard viewer
1. Prometheus Server (Core Component)
Function: Pulls data from targets, stores it, and processes queries.
Contains:
Scraper: Fetches data from predefined targets (like your apps, servers, or containers).
Storage (TSDB): Stores data as time-series (timestamped records).
Query Engine (PromQL): Allows data analysis using queries.
Example:
Every 15 seconds, Prometheus asks an app: "Hey, how's your CPU usage?"
The app replies: "Right now, it's 30%."
Prometheus saves this info as:
cpu_usage{instance="server1"} 30 # Metric value 30 at timestamp
2. Targets & Exporters (Where Data Comes From)
Function: Expose metrics in a format Prometheus understands.
Types:
1. Direct Targets - Apps that expose /metrics (e.g., a Go app with the Prometheus client library).
2. Exporters - Convert third-party data into Prometheus format:
Node Exporter - Monitors Linux system metrics.
cAdvisor - Monitors Docker container metrics.
kube-state-metrics - Monitors Kubernetes workloads.
Example:
A web app exposes its metrics at http://app:8000/metrics, and Prometheus scrapes it every 10s.
If the app doesn't expose /metrics, we use an Exporter to bridge the gap.
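To make the idea of "exposing /metrics" concrete, here is a minimal sketch that uses only the Python standard library. The counter values and the names render_metrics, MetricsHandler, and serve are made up for illustration; a real app would normally use an official Prometheus client library instead of formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these as it serves traffic.
REQUEST_COUNTS = {("GET", "200"): 1250, ("POST", "500"): 3}


def render_metrics(counts):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; port 8000 mirrors the example URL above."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing Prometheus at port 8000 would let it scrape these two series on every scrape interval.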
3. Time-Series Database (TSDB)
Function: Stores collected metrics efficiently.
Structure:
Timestamp (When the data was recorded).
Metric Name (What is being measured).
Labels (Extra details like instance="server1").
Value (The actual measurement).
Example (TSDB Entry):
http_requests_total{method="GET", status="200"} 1250 # 1250 GET requests recorded
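As a rough mental model (not how the real TSDB is implemented on disk), you can think of each unique combination of metric name and labels as a key that points to a growing list of (timestamp, value) pairs. The class name TinySeriesStore below is hypothetical.

```python
from collections import defaultdict


class TinySeriesStore:
    """Toy in-memory model: one list of (timestamp, value) per series."""

    def __init__(self):
        self._series = defaultdict(list)

    @staticmethod
    def _key(name, labels):
        # Sorting the labels makes the key independent of label order.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp, value):
        self._series[self._key(name, labels)].append((timestamp, float(value)))

    def samples(self, name, labels):
        # Return a copy so callers cannot mutate the stored series.
        return list(self._series.get(self._key(name, labels), []))
```

Two samples with the same name and labels land in the same series, while changing any label value creates a new one. That is why high-cardinality labels (like user IDs) make storage grow quickly.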
4. PromQL (Query Engine)
Function: Allows analysis of collected metrics.
Queries:
up → Shows which targets are reachable (1) or failing their scrape (0).
rate(http_requests_total[5m]) → Per-second request rate, averaged over the last 5 minutes.
Example Query & Output:
Query:
sum(rate(cpu_usage_seconds_total[1m])) by (instance)
Output:
instance="server1" → 0.5 (0.5 CPU-seconds used per second over the last minute, i.e. 50% of one core)
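The arithmetic behind a rate can be sketched in a few lines. This is a simplified version under stated assumptions: it handles counter resets, but it does not extrapolate to the edges of the window the way the real rate() function does, so the numbers can differ slightly from what Prometheus returns. The function name simple_rate is made up.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Returns None when there are fewer than two samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. the process restarted),
        # so the new value counts as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, a CPU-seconds counter that goes from 10 to 40 over 60 seconds gives 30 / 60 = 0.5, which matches the output above.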
5. Service Discovery Mechanism (Finder of Targets)
Function: Keeps track of which instances (pods, containers, or VMs) are running.
Where does it look?
Kubernetes API
AWS EC2 API
Docker Swarm API
Consul, ZooKeeper (there is no built-in etcd discovery; an etcd-backed registry can be bridged through file-based or HTTP SD)
Custom HTTP SD endpoint
Example:
If a new Pod starts in Kubernetes, Prometheus automatically detects and starts scraping it.
If a server shuts down on AWS, it stops scraping it.
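Conceptually, each discovery refresh is a comparison between the targets Prometheus is already scraping and the targets the API currently reports. Here is a minimal sketch of that comparison; the function name reconcile_targets and the addresses in the usage note are hypothetical.

```python
def reconcile_targets(current, discovered):
    """Compare the scraped targets with the freshly discovered ones."""
    current, discovered = set(current), set(discovered)
    return {
        "start": sorted(discovered - current),  # new targets to begin scraping
        "stop": sorted(current - discovered),   # targets that disappeared
        "keep": sorted(current & discovered),   # targets that are unchanged
    }
```

With current = ["10.0.0.1:9100", "10.0.0.2:9100"] and discovered = ["10.0.0.2:9100", "10.0.0.3:9100"], the result says to start scraping 10.0.0.3, stop scraping 10.0.0.1, and keep 10.0.0.2.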
6. Alertmanager (Sends Notifications)
Function: Receives the alerts that Prometheus fires from its alerting rules, then groups, deduplicates, and routes them as notifications.
Example Alert Rule (CPU Usage High), defined in a Prometheus rules file:
groups:
  - name: high_cpu_alerts
    rules:
      - alert: HighCPU
        expr: sum(rate(cpu_usage_seconds_total[5m])) > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "High CPU usage detected!"
Alerts can be sent to: Slack, Email, PagerDuty, etc.
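The "for: 2m" part of a rule is easy to misread, so here is a small sketch of the idea: an alert is pending while its expression has been true for less than the "for" duration, and firing once it has stayed true for at least that long. The function name alert_state and the evaluation timestamps are assumptions for illustration.

```python
def alert_state(condition_history, for_seconds):
    """State of an alert after the last rule evaluation.

    condition_history: list of (timestamp_seconds, is_true), oldest first,
    one entry per evaluation of the alert expression.
    """
    active_since = None
    state = "inactive"
    for ts, is_true in condition_history:
        if not is_true:
            # The condition cleared, so the timer starts over.
            active_since = None
            state = "inactive"
            continue
        if active_since is None:
            active_since = ts
        state = "firing" if ts - active_since >= for_seconds else "pending"
    return state
```

With evaluations every 60 seconds and for_seconds=120, a condition that is true at 0s and 60s is still pending, and it fires at 120s. A single false evaluation in between resets the timer, which is how "for" filters out short spikes.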
7. Grafana (Visualization)
Function: Displays Prometheus data on dashboards.
Example Dashboard:
CPU, Memory, Network usage graphs.
Alerts when server load is too high.
How Everything Works Together
Targets/Exporters expose data →
Prometheus Server scrapes data every X seconds →
TSDB stores the metrics →
PromQL is used to query & analyze data →
Prometheus evaluates alerting rules, and Alertmanager sends notifications if needed →
Grafana visualizes the data
Prometheus pulls data from targets (it doesn't wait for data to be pushed).
It stores metrics as time-series and allows powerful queries.
It can trigger alerts when something is wrong.
Grafana can be used to display dashboards beautifully.
Prometheus Fundamentals:
Node Exporter:
Node Exporter is a Prometheus exporter that collects system-level metrics (CPU, memory, disk, network, etc.) from a machine (server, VM, or local system) and exposes them to Prometheus for monitoring.
By default, Prometheus cannot directly monitor system metrics like CPU usage or memory consumption. Node Exporter solves this by exposing those metrics in a Prometheus-compatible format.
- System Monitoring - Tracks CPU, memory, disk, network, and more
- Lightweight & Efficient - Runs as a small background process
- Prometheus-Compatible - Exposes metrics via HTTP (localhost:9100/metrics)
Authentication and Encryption:
By default, when Prometheus is set up to scrape data from a node, it does not enforce authentication. This means that anyone with access to the target can retrieve the exposed metrics, which could lead to unauthorized data access. To prevent this, it's crucial to implement authentication and encryption.
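To show what an authenticated scrape looks like on the wire, here is a minimal sketch that builds (but does not send) an HTTPS request with an HTTP basic auth header. It assumes the exporter has already been configured to require basic auth over TLS; the function name build_scrape_request and the credentials in the usage note are hypothetical.

```python
import base64
import urllib.request


def build_scrape_request(url, username, password):
    """Build a request carrying an HTTP basic auth header (nothing is sent)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

For example, build_scrape_request("https://localhost:9100/metrics", "prom", "s3cret") produces a request that a client could pass to urllib.request.urlopen. Basic auth only base64-encodes the credentials, so it should always be combined with TLS.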
Metrics in Prometheus:
Metric Name
A descriptive name for the thing being measured (e.g., http_requests_total, cpu_usage_seconds_total).
Labels (Optional)
Key-value pairs used to differentiate different dimensions of the same metric (e.g., method="GET", status="200").
Timestamp
When the data point was recorded (often automatically managed by Prometheus).
Value
The actual numeric value of the metric at that time.
Example:
http_requests_total{method="GET", handler="/api", status="200"} 1287
Metric name: http_requests_total
Labels: method="GET", handler="/api", status="200"
Value: 1287 (number of successful GET requests to /api)
Timestamp: (implicitly recorded by Prometheus when scraped)
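To see how the pieces of a sample line map to name, labels, and value, here is a small parsing sketch. It is deliberately simplified: it does not handle the optional trailing timestamp or unescape special characters inside label values, and the name parse_sample is made up.

```python
import re

# name, optional {labels}, then the value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# one label pair such as method="GET"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

Feeding it the example line above returns the name http_requests_total, a dictionary with the three labels, and the value 1287.0.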
What is PromQL?
PromQL stands for Prometheus Query Language. It's the powerful and flexible language you use to query, filter, and analyze metrics stored in Prometheus.
What You Can Do with PromQL:
Select metrics (e.g., http_requests_total)
Filter by labels (e.g., method="GET")
Perform calculations (e.g., rate of change, averages, percentages)
Aggregate data (e.g., by instance, job, or other labels)
Generate graphs and alerts
Basic Syntax Examples
1. Select a Metric
http_requests_total
Shows all time series with that metric name.
2. Filter by Label
http_requests_total{method="GET", status="200"}
Filters only the GET requests with status 200.
3. Calculate Rate of Increase
rate(http_requests_total[1m])
Shows the per-second rate of requests over the last 1 minute.
4. Aggregate by Label
sum(rate(http_requests_total[5m])) by (job)
Shows total request rate per job (like frontend, backend, etc.)
5. Calculate CPU Usage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Shows CPU usage (as a %) per instance by subtracting idle time.
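The arithmetic in that query is worth spelling out. The rate of the idle counter is the fraction of each second a core spent idle, so averaging it across cores and subtracting from 100 gives the busy percentage. A sketch with made-up numbers (the function name cpu_usage_percent is hypothetical):

```python
def cpu_usage_percent(idle_rates_per_core):
    """idle_rates_per_core: per-core rate of node_cpu_seconds_total{mode="idle"},
    i.e. the fraction of each second that core spent idle (0.0 to 1.0)."""
    avg_idle = sum(idle_rates_per_core) / len(idle_rates_per_core)
    return 100 - avg_idle * 100
```

For a 4-core machine with idle fractions 0.9, 0.7, 0.8, and 0.6, the average idle fraction is 0.75, so CPU usage comes out at about 25%.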
PromQL:
1. Selectors
Selectors are used to specify which time series (metrics) you want to query. There are two main types:
Instant Vector Selector
Selects the latest sample for each matching time series.
http_requests_total{method="GET", status="200"}
Metric: http_requests_total
Labels: method="GET", status="200"
This returns the current value of all time series that match the labels.
Range Vector Selector
Selects time series data over a time range.
rate(http_requests_total[5m])
http_requests_total[5m] = all data points from the past 5 minutes.
Used inside functions like rate() or avg_over_time().
2. Matchers
Matchers are used inside label selectors (the {} block) to filter which time series you want.
Common Matchers:
| Matcher | Meaning | Example |
|---------|---------|---------|
| = | Equals | job="api-server" |
| != | Not equal | method!="POST" |
| =~ | Regex match | instance=~"server.*" |
| !~ | Regex does not match | status!~"4.." (not 4xx errors) |
Example:
http_requests_total{job="frontend", status=~"5.."}
Selects all 5xx status codes from the frontend job.
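One detail that trips people up is that regex matchers are fully anchored: "5.." matches 503 but not 1500. Here is a sketch of the four matchers in Python. It uses Python's re module, whose syntax differs in places from the RE2 syntax Prometheus uses, and the function name matches is made up.

```python
import re


def matches(labels, matchers):
    """Check a label set against a list of (label, operator, value) matchers."""
    for name, op, value in matchers:
        # A missing label behaves like an empty string.
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            # fullmatch mirrors the anchoring of PromQL regex matchers
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True
```

A series must satisfy every matcher in the list to be selected, just like the comma-separated matchers inside {}.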
3. Modifiers
Modifiers change how a function behaves or how results are grouped.
a. by / without (Aggregation Modifiers)
Used with aggregations like sum, avg, etc.
by(...): keep these labels
without(...): drop these labels
sum(rate(http_requests_total[5m])) by (job)
Sum the rate, but group by job.
sum(rate(http_requests_total[5m])) without (instance)
Sum the rate and ignore the instance label.
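Grouping is just "reduce each label set to the labels you keep, then add up everything that ends up with the same reduced set". A minimal sketch of that idea, with a hypothetical function name sum_by and made-up input series:

```python
from collections import defaultdict


def sum_by(series, by=None, without=None):
    """Sum values after reducing each label set.

    series: list of (labels_dict, value).
    by: labels to keep; without: labels to drop.
    With neither given, all labels are dropped and one total remains.
    """
    totals = defaultdict(float)
    for labels, value in series:
        if without is not None:
            kept = {k: v for k, v in labels.items() if k not in without}
        else:
            kept = {k: v for k, v in labels.items() if k in (by or ())}
        totals[tuple(sorted(kept.items()))] += value
    return dict(totals)
```

When the series only carry job and instance labels, by=["job"] and without=["instance"] give the same groups. They start to differ as soon as a third label is present, because without keeps it and by drops it.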
b. on / ignoring (Binary Operator Modifiers)
Used when combining two metrics.
http_requests_total / on(instance) up
Joins metrics only where instance matches.
http_requests_total / ignoring(job) up
Joins metrics but ignores the job label during the match.
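Here is a rough sketch of how one-to-one matching pairs up series from both sides of a binary operator. It is a simplification: it rejects inputs where several series share the same match key (PromQL needs group_left or group_right for that), and it skips zero denominators, whereas PromQL would return Inf or NaN. The function name divide is hypothetical.

```python
def divide(left, right, on=None, ignoring=None):
    """One-to-one division of two lists of (labels_dict, value)."""

    def key(labels):
        # on: match using only these labels; ignoring: match on all but these.
        if on is not None:
            items = {k: v for k, v in labels.items() if k in on}
        else:
            items = {k: v for k, v in labels.items() if k not in (ignoring or ())}
        return tuple(sorted(items.items()))

    right_index = {}
    for labels, value in right:
        k = key(labels)
        if k in right_index:
            raise ValueError("several right-hand series share one match key")
        right_index[k] = value

    result = {}
    for labels, value in left:
        k = key(labels)
        if k in result:
            raise ValueError("several left-hand series share one match key")
        # Series without a partner on the other side are dropped.
        if k in right_index and right_index[k] != 0:
            result[k] = value / right_index[k]
    return result
```

If the two sides differ in a label such as job and neither on nor ignoring is given, nothing matches and the result is empty. That is usually the reason a binary expression unexpectedly returns no data.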
c. offset
offset shifts the evaluation time backward by a given duration: "Show me the value of this metric X time ago."
<metric_name>[<range>] offset <duration>
OR (for instant vectors):
<metric_name> offset <duration>
Prometheus Time Units
| Unit | Suffix | Meaning | Example Usage |
|------|--------|---------|---------------|
| Seconds | s | 1 second | offset 30s |
| Minutes | m | 60 seconds | offset 5m |
| Hours | h | 60 minutes | offset 1h |
| Days | d | 24 hours | offset 2d |
| Weeks | w | 7 days | offset 1w |
| Years | y | 365 days (not leap) | offset 1y (rare) |
Examples in Context
rate(http_requests_total[5m] offset 1h)
→ Rate of requests over a 5-minute window ending 1 hour ago
up offset 2d
→ Status of targets exactly 2 days ago
avg_over_time(cpu_usage[10m] offset 7d)
→ CPU usage averaged over the 10 minutes ending exactly 7 days ago
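To check your intuition about which time window a range selector with an offset actually covers, here is a small sketch that converts the duration suffixes from the table into seconds and computes the window. The function names parse_duration and offset_window are made up, and only the units listed in the table are supported.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)([smhdwy])")


def parse_duration(text):
    """Convert a duration such as '5m' or '1h30m' into seconds."""
    pos, total = 0, 0
    while pos < len(text):
        part = _PART.match(text, pos)
        if part is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(part.group(1)) * UNIT_SECONDS[part.group(2)]
        pos = part.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


def offset_window(now, range_text, offset_text):
    """Return (start, end) in seconds for metric[range] offset <offset>."""
    end = now - parse_duration(offset_text)
    return end - parse_duration(range_text), end
```

For http_requests_total[5m] offset 1h evaluated at now=10000, the window is (6100, 6400): it ends one hour before now and reaches back five more minutes.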
Written by

Sahil Naik