Welcome to Day 3 of the Observability Series! In this installment, we’ll focus on PromQL (Prometheus Query Language), the tool that makes Prometheus a powerful monitoring solution. If you're diving into Prometheus, PromQL is your gateway to querying, analyzing, and gaining insights into your system's metrics.

🛠 What is PromQL?

PromQL is a flexible and powerful query language designed to work with time-series data stored in Prometheus. It allows you to:

Retrieve data from specific metrics.
Perform mathematical operations for analysis.
Aggregate and manipulate data based on labels or dimensions.
Build complex queries to monitor system behavior effectively.

🏗 Structure of a PromQL Query

A PromQL query typically includes:

Metric Name: The specific measurement (e.g., http_requests_total).
Label Matchers: Filters on label values that narrow down the results (e.g., {method="POST", status="500"}).
Range Selectors: A time window that selects a range of past samples instead of only the latest one (e.g., [10m]).
Functions and Operators: Built-in operations that process the data, such as the rate() function or the sum aggregation operator.

🔑 Basic PromQL Commands

Single Metric Query

http_requests_total

Returns the most recent sample of every time series named http_requests_total, one value per unique label combination (an instant vector).

Label Filtering

http_requests_total{method="GET", status="200"}

Returns only the series whose method label is exactly GET and whose status label is exactly 200.
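
To make the matching rule concrete, here is a minimal Python sketch of an instant-vector selector with equality matchers. It is a simplified model, not Prometheus's implementation. The select_series() helper and the sample data are hypothetical, and the metric name is treated as the special __name__ label.

```python
# Simplified model of http_requests_total{method="GET", status="200"}.
# Each series is (labels, latest_value); all data here is hypothetical.

def select_series(vector, metric, **matchers):
    """Keep series whose __name__ equals `metric` and whose labels equal every matcher.

    A label missing from a series is treated as the empty string, as PromQL does.
    """
    wanted = {"__name__": metric, **matchers}
    return [
        (labels, value)
        for labels, value in vector
        if all(labels.get(name, "") == expected for name, expected in wanted.items())
    ]


vector = [
    ({"__name__": "http_requests_total", "method": "GET", "status": "200"}, 1027.0),
    ({"__name__": "http_requests_total", "method": "GET", "status": "404"}, 3.0),
    ({"__name__": "http_requests_total", "method": "POST", "status": "200"}, 88.0),
]

result = select_series(vector, "http_requests_total", method="GET", status="200")
print(result)
```

An equality matcher such as status="200" matches only that exact value. Other 2xx codes need a regex matcher such as status=~"2..".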

Time Range Query

http_requests_total{status="404"}[5m]

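Returns a range vector: the raw samples recorded in the last 5 minutes for every series with status="404". Because http_requests_total is a counter, these samples are cumulative totals, not a count of 404 responses. Wrap the selector in rate() or increase() to get per-second or per-window figures.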
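
The sketch below models a range selector for a single series in plain Python. It makes two simplifying assumptions: the data is hypothetical, and the select_range() helper treats the window as left-open. Recent Prometheus versions also use a left-open window; older ones additionally included a sample exactly on the left boundary.

```python
# Simplified model of the range selector http_requests_total{status="404"}[5m]
# for a single series. Timestamps are seconds; the data is hypothetical.

def select_range(samples, now, window):
    """Return the (timestamp, value) samples with now - window < t <= now."""
    return [(t, v) for t, v in samples if now - window < t <= now]


# A counter scraped every 60 s: each value is a running total, not a per-scrape count.
samples = [(0, 10.0), (60, 12.0), (120, 12.0), (180, 15.0),
           (240, 15.0), (300, 17.0), (360, 20.0)]

last_5m = select_range(samples, now=360, window=5 * 60)
print(last_5m)

# The number of 404s in the window is the growth of the counter (ignoring resets),
# not the number of samples returned.
growth = last_5m[-1][1] - last_5m[0][1]
print(growth)
```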

⚙️ Aggregation in PromQL

Aggregation combines multiple time series into meaningful summaries.

Summing Time Series

sum(rate(container_cpu_usage_seconds_total[5m]))

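Computes each container's per-second CPU usage rate over a 5-minute window, then sums the results into a single series. The result is the total CPU time consumed per second (roughly, the number of cores in use) across all containers.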

Grouping by Labels

avg(node_memory_Active_bytes) by (instance)

Averages node_memory_Active_bytes across all series that share the same instance label, returning one series per instance.
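
Both aggregation examples follow the same pattern: group series by the labels in the by clause (or into a single group when there is none), then reduce each group. Here is a minimal Python sketch of that idea. The aggregate() helper and the CPU figures are hypothetical, and it omits PromQL details such as without().

```python
from collections import defaultdict
from statistics import mean


def aggregate(vector, reducer, by=()):
    """Group series by the labels listed in `by`, then reduce each group's values.

    With by=() every series falls into one group, like sum(...) with no by clause.
    All labels not listed in `by` are dropped from the result.
    """
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical per-container CPU rates in cores, as rate(...[5m]) might return them.
cpu_rates = [
    ({"instance": "node-a", "container": "api"}, 0.25),
    ({"instance": "node-a", "container": "worker"}, 0.75),
    ({"instance": "node-b", "container": "api"}, 0.5),
]

total = aggregate(cpu_rates, sum)                             # like sum(...)
per_instance = aggregate(cpu_rates, mean, by=("instance",))   # like avg(...) by (instance)
print(total)
print(per_instance)
```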

Maximum and Minimum

max_over_time(node_memory_MemAvailable_bytes[1h])
min_over_time(node_memory_MemAvailable_bytes[1h])

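Returns, for each series, the highest and lowest available memory recorded over the last hour. The max and min aggregation operators combine values across series. The _over_time functions instead aggregate each series across time.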
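
The difference between aggregating over time and aggregating across series is easiest to see side by side. This Python sketch uses a hypothetical one-hour range vector for two instances. It is a model of the semantics, not Prometheus code.

```python
# Hypothetical one-hour range vector: instance -> [(timestamp_s, available_bytes), ...]
mem_available = {
    "node-a": [(0, 4.0e9), (1200, 3.1e9), (2400, 3.6e9)],
    "node-b": [(0, 7.5e9), (1200, 7.9e9), (2400, 6.8e9)],
}

# max_over_time / min_over_time: reduce each series across time, one result per series.
max_per_series = {inst: max(v for _, v in s) for inst, s in mem_available.items()}
min_per_series = {inst: min(v for _, v in s) for inst, s in mem_available.items()}

# max(...) without _over_time: reduce across series at one instant (here the latest sample).
max_across_series = max(s[-1][1] for s in mem_available.values())

print(max_per_series)
print(min_per_series)
print(max_across_series)
```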

🔄 Advanced PromQL Functions

These functions operate on range vectors, turning raw counters, histograms, and gauges into rates, totals, percentiles, and forecasts.

Rate

rate(http_requests_total[1m])

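Computes the per-second average rate of increase of http_requests_total over the last minute, automatically compensating for counter resets. The range must contain at least two samples per series, so choose a window several times longer than your scrape interval.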
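
The core of rate() is "growth divided by elapsed time, with resets handled". The Python sketch below implements only that core. It is a simplified model: unlike Prometheus's rate(), it does not extrapolate to the window edges. The simple_rate() helper and the samples are hypothetical.

```python
def simple_rate(samples):
    """Per-second rate of a counter over a window of (timestamp_s, value) samples.

    Compensates for counter resets but, unlike Prometheus's rate(), does not
    extrapolate to the edges of the window.
    """
    if len(samples) < 2:
        return None  # PromQL's rate() also returns nothing for fewer than two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter restarted from zero, so count `cur` as new growth.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical 15 s scrapes inside a 1-minute window; the process restarts between t=25 and t=40.
samples = [(10, 100.0), (25, 130.0), (40, 30.0), (55, 60.0)]
print(simple_rate(samples))
```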

Increase

increase(kube_pod_container_status_restarts_total[1h])

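Estimates how much each container's restart counter grew over the past hour. Because increase() extrapolates to the edges of the window, the result can be a non-integer even though restarts are whole numbers.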
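
In Prometheus, increase() is rate() multiplied by the window length in seconds, so it inherits rate()'s extrapolation. The Python sketch below shows why that produces fractional results. It is a deliberately simplified model that scales the observed growth from the sampled span up to the full window. The helper names and the restart data are hypothetical.

```python
def observed_increase(values):
    """Total growth of a counter across consecutive values, compensating for resets."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def extrapolated_increase(samples, window_s):
    """Scale the observed growth from the sampled span up to the whole window.

    Simplified: Prometheus caps how far it extrapolates at each edge and never
    extrapolates a counter below zero; this sketch ignores both rules.
    """
    span = samples[-1][0] - samples[0][0]
    return observed_increase([v for _, v in samples]) * window_s / span


# Hypothetical restart counter scraped every 60 s in a 1 h window (t = 30 .. 3570 s).
samples = [(t, 2.0 if t < 1200 else 4.0 if t < 2400 else 5.0) for t in range(30, 3600, 60)]

print(observed_increase([v for _, v in samples]))   # whole restarts
print(extrapolated_increase(samples, 3600))         # slightly above, and fractional
```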

Histogram Quantile

histogram_quantile(0.90, sum(rate(request_duration_seconds_bucket[5m])) by (le))

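Estimates the 90th percentile of request durations over the last 5 minutes by interpolating within the histogram buckets. Keep le in the by clause: histogram_quantile() needs the bucket boundaries to compute the estimate.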
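
The interpolation step is what makes the result an estimate rather than an exact percentile. Here is a minimal Python sketch of linear interpolation over cumulative classic-histogram buckets. It is a simplified model under stated assumptions: the lowest bucket starts at 0, and a rank in the +Inf bucket returns the highest finite bound. The function name and the bucket values are hypothetical.

```python
import math


def histogram_quantile_sketch(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, whose count is the total.
    The lowest bucket is assumed to start at 0, and values are linearly interpolated
    inside the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank lies in the +Inf bucket: fall back to the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical per-second bucket rates after sum(rate(..._bucket[5m])) by (le).
buckets = [(0.1, 50.0), (0.5, 80.0), (1.0, 95.0), (math.inf, 100.0)]

print(histogram_quantile_sketch(0.90, buckets))  # falls inside the (0.5, 1.0] bucket
print(histogram_quantile_sketch(0.99, buckets))  # falls in +Inf, so 1.0 is returned
```

The accuracy of the estimate depends entirely on the bucket layout. With wide buckets, the interpolated value can be far from the true percentile.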

Predict Linear

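predict_linear(node_filesystem_avail_bytes[30m], 3600)

Forecasts the available filesystem space one hour (3600 seconds) from now, using linear regression over the last 30 minutes of samples. predict_linear() is intended for gauges. Applied to a counter such as node_network_receive_bytes_total, it would forecast the cumulative total, not the bytes received during the next hour.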
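
predict_linear() fits a simple least-squares line through the samples in the range and evaluates it at a future time. The Python sketch below reproduces that idea. It assumes a hypothetical gauge and evenly spaced samples; the function name only mirrors the PromQL one.

```python
def predict_linear_sketch(samples, eval_time, horizon_s):
    """Fit v = intercept + slope * t by least squares and evaluate at eval_time + horizon_s."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + horizon_s)


# Hypothetical gauge: available bytes shrinking by 1 MB every second over 30 minutes.
samples = [(t, 50e9 - 1e6 * t) for t in range(0, 1801, 60)]

forecast = predict_linear_sketch(samples, eval_time=1800, horizon_s=3600)
print(forecast)  # roughly 50e9 - 1e6 * 5400
```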

🧪 Additional Commands for Real-World Use Cases

Error Analysis

rate(http_requests_total{status=~"5.."}[10m])

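Tracks the per-second rate of server errors (any 5xx status) over the last 10 minutes. The =~ operator applies a regular expression, and PromQL anchors it to the whole label value, so "5.." matches only three-character codes that start with 5.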
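
Anchoring is the detail that most often surprises people coming from grep-style matching. The Python sketch below models a =~ matcher with re.fullmatch. It is an assumption-laden stand-in: Prometheus uses RE2 syntax, which differs from Python's re in some features, though the two behave the same for a simple pattern like this one. The status list is hypothetical.

```python
import re


def regex_matches(value, pattern):
    """Model a PromQL =~ matcher: the pattern must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


statuses = ["200", "404", "500", "503", "1500", "5000"]

server_errors = [s for s in statuses if regex_matches(s, "5..")]
unanchored = [s for s in statuses if re.search("5..", s)]

print(server_errors)  # only the three-character 5xx codes
print(unanchored)     # an unanchored search would also accept 1500 and 5000
```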

Top Resource Consumers

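topk(3, avg_over_time(container_memory_usage_bytes[5m]))

Finds the 3 series with the highest average memory usage over the last 5 minutes. container_memory_usage_bytes is a gauge, so it is averaged with avg_over_time() instead of being passed to rate(), which is meant for counters.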
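
topk() keeps the k series with the largest values at a given evaluation step. The Python sketch below models one such step. It uses a hypothetical dict of average memory usage per container and a hypothetical topk() helper built on heapq.nlargest.

```python
import heapq

# Hypothetical average memory usage per container, in bytes.
avg_memory = {"api": 512e6, "worker": 1.8e9, "cache": 2.4e9, "sidecar": 64e6}


def topk(k, vector):
    """Return the k (name, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, vector.items(), key=lambda item: item[1])


top3 = topk(3, avg_memory)
print(top3)
```

In a PromQL range query, topk() is evaluated independently at every step. The set of top series can therefore change over time, and a graph may show more than k lines.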

Disk Usage Trends

delta(node_filesystem_free_bytes[1h])

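Calculates the change in free disk space over the last hour for each filesystem; a negative result means space was consumed. delta() is meant for gauges like this one, while counters should use increase().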
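
The gauge-versus-counter distinction comes down to resets. The Python sketch below contrasts a plain last-minus-first delta with reset-aware growth. Both helpers are simplified and hypothetical: neither extrapolates to the window edges as Prometheus does, and the data is invented for illustration.

```python
def simple_delta(samples):
    """Last value minus first value in the window (no extrapolation, no reset handling)."""
    return samples[-1][1] - samples[0][1]


def reset_aware_increase(samples):
    """Counter growth that treats any drop as a reset to zero, as increase() does."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical gauge: free bytes on one filesystem over an hour.
free_bytes = [(0, 120e9), (1800, 118.5e9), (3540, 117e9)]
# Hypothetical counter that restarted mid-window.
counter = [(0, 100.0), (1800, 5.0), (3540, 40.0)]

print(simple_delta(free_bytes))       # negative: space was consumed
print(simple_delta(counter))          # misleading for a counter
print(reset_aware_increase(counter))  # what increase() is designed to capture
```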

📈 PromQL in Action: Monitoring and Alerting

Kubernetes Pod Metrics

sum(rate(container_cpu_usage_seconds_total{namespace="prod"}[1m])) by (pod)

Returns the per-second CPU usage of each pod in the prod namespace, summed across that pod's containers.

Service Latency Analysis

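sum(rate(http_request_duration_seconds_sum{job="web"}[10m])) / sum(rate(http_request_duration_seconds_count{job="web"}[10m]))

Calculates the average response time of the web job over the last 10 minutes. Request durations are typically exported as a histogram or summary. The average therefore comes from dividing the growth of the _sum series by the growth of the _count series; avg_over_time() on the base metric name does not yield the mean latency.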
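
When request durations are exported as a histogram or summary, the mean latency comes from the _sum and _count counters. The Python sketch below shows why the ratio of their rates is an average. It is a simplified model that ignores counter resets and extrapolation. The average_latency() helper and the numbers are hypothetical.

```python
def average_latency(sum_samples, count_samples):
    """Mean duration over a window from a histogram's or summary's _sum and _count counters.

    Mirrors rate(x_sum[w]) / rate(x_count[w]): both rates divide by the same window,
    so the ratio reduces to (growth of _sum) / (growth of _count). Resets are ignored here.
    """
    sum_growth = sum_samples[-1][1] - sum_samples[0][1]
    count_growth = count_samples[-1][1] - count_samples[0][1]
    return sum_growth / count_growth if count_growth else None


# Hypothetical 10-minute window: 600 requests took 90 seconds in total.
duration_sum = [(0, 1200.0), (600, 1290.0)]
duration_count = [(0, 10000.0), (600, 10600.0)]

print(average_latency(duration_sum, duration_count))  # seconds per request
```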

Alert for High Memory Usage

container_memory_usage_bytes > 1e+09

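Returns every container series whose memory usage exceeds 1e9 bytes (1 GB; 1 GiB would be 1073741824 bytes). When this expression is used as the expr of a Prometheus alerting rule, each returned series becomes an alert.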
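
Without the bool modifier, a PromQL comparison acts as a filter: matching series keep their original values and the rest disappear. The Python sketch below models that behavior and shows how the choice between 1 GB and 1 GiB changes the result. The greater_than() helper and the usage figures are hypothetical.

```python
ONE_GB = 1e9        # decimal gigabyte, the 1e+09 used in the query
ONE_GIB = 2 ** 30   # binary gibibyte, 1073741824 bytes


def greater_than(vector, threshold):
    """Model `vector > threshold`: keep only matching series, with their original values."""
    return {name: value for name, value in vector.items() if value > threshold}


# Hypothetical current memory usage per container, in bytes.
usage = {"api": 0.8e9, "worker": 1.05e9, "cache": 2.4e9}

print(greater_than(usage, ONE_GB))   # worker and cache
print(greater_than(usage, ONE_GIB))  # only cache; worker sits between 1 GB and 1 GiB
```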

💡 Tips for Writing Effective PromQL Queries

Start Simple: Begin with basic queries to understand the metrics.
Layer Functions: Combine functions like rate() and sum() for deeper insights.
Test and Iterate: Use the Prometheus UI or Grafana to validate your queries.
Optimize Filters: Leverage labels to fine-tune queries and reduce unnecessary data retrieval.

🌟 Conclusion

PromQL is a game-changer for monitoring and observability, transforming raw metrics into actionable insights. By mastering its commands and functions, you can monitor complex systems effectively, analyze trends, and set up meaningful alerts.

As part of this Observability Series, we’ve explored PromQL fundamentals and advanced queries. Stay tuned for Day 4, where we’ll dive into setting up Grafana dashboards for Prometheus metrics!

What’s your favorite PromQL query? Share it in the comments below!

Day 3: Observability Series – Mastering PromQL in Prometheus