Node Level, Cluster Level, and Application-Level Metrics in Prometheus

Saurabh Adhau

Introduction

Prometheus is widely used for monitoring and alerting in cloud-native environments, particularly in Kubernetes. When collecting metrics, they can generally be categorized into three levels:

  • Node-Level Metrics (System-level metrics)

  • Cluster-Level Metrics (Kubernetes-level metrics)

  • Application-Level Metrics (Application-specific metrics)

Understanding these levels helps in designing an efficient monitoring setup that provides insights into infrastructure health, cluster performance, and application behavior.

1. Node-Level Metrics

What Are Node-Level Metrics?

Node-level metrics provide insights into the health and performance of the underlying infrastructure (VMs, physical machines, or Kubernetes nodes). These metrics are essential for understanding system resource utilization and detecting performance bottlenecks.

Examples of Node-Level Metrics:

| Metric | Description | PromQL Query Example |
| --- | --- | --- |
| `node_cpu_seconds_total` | CPU time spent per mode (user/system/idle) | `rate(node_cpu_seconds_total[5m])` |
| `node_memory_MemAvailable_bytes` | Available memory in bytes | `node_memory_MemAvailable_bytes` |
| `node_disk_io_time_seconds_total` | Time spent on disk I/O | `rate(node_disk_io_time_seconds_total[5m])` |
| `node_network_receive_bytes_total` | Network bytes received | `rate(node_network_receive_bytes_total[5m])` |
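These raw counters are usually turned into rates and utilization figures in PromQL. A common, illustrative query for per-node CPU utilization derived from `node_cpu_seconds_total` (not taken from this article's tables):

```promql
# Percentage of CPU time spent in non-idle modes, averaged per node.
# Subtracting the idle rate from 1 turns the counter into a "busy" ratio.
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```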

How to Collect Node-Level Metrics?

The most commonly used exporter for collecting node-level metrics is Node Exporter.

Install Node Exporter on a Linux Node:

```shell
# Release tarballs are versioned; replace 1.8.2 with the latest release
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar -xvzf node_exporter-${VERSION}.linux-amd64.tar.gz
cd node_exporter-${VERSION}.linux-amd64
./node_exporter &
```

By default, Node Exporter exposes metrics at http://localhost:9100/metrics, which Prometheus can scrape.

Configure Prometheus to Scrape Node Exporter:

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
```

2. Cluster-Level Metrics

What Are Cluster-Level Metrics?

Cluster-level metrics provide insights into Kubernetes resource utilization. These metrics help in monitoring the health and performance of the Kubernetes cluster, including nodes, pods, deployments, and API server interactions.

Examples of Cluster-Level Metrics:

| Metric | Description | PromQL Query Example |
| --- | --- | --- |
| `kube_node_status_condition` | Node condition status (e.g., Ready) | `kube_node_status_condition{condition="Ready",status="true"}` |
| `kube_pod_status_phase` | Current phase of each pod (one series per phase) | `count(kube_pod_status_phase{phase="Running"})` |
| `kube_deployment_status_replicas_available` | Available replicas per deployment | `kube_deployment_status_replicas_available` |
| `kubelet_volume_stats_used_bytes` | Storage volume usage per pod | `kubelet_volume_stats_used_bytes` |
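Cluster-level metrics are a natural basis for alerting. A sketch of a Prometheus alerting rule that fires when a node has not reported Ready for five minutes (the rule name, threshold, and labels are illustrative, not from this article):

```yaml
# Illustrative alerting rule; adjust names and thresholds for your cluster
groups:
  - name: cluster-health
    rules:
      - alert: NodeNotReady        # fires when the Ready condition is false
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not Ready"
```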

How to Collect Cluster-Level Metrics?

For Kubernetes environments, the Kube State Metrics exporter is commonly used to gather cluster-level metrics.

Deploy Kube State Metrics in Kubernetes:

```shell
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/examples/standard/
```

This deploys kube-state-metrics (by default into the kube-system namespace) and exposes Kubernetes resource metrics at /metrics on port 8080, which Prometheus can scrape.

Configure Prometheus to Scrape Kube State Metrics:

```yaml
scrape_configs:
  - job_name: "kube-state-metrics"
    static_configs:
      - targets: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]
```

3. Application-Level Metrics

What Are Application-Level Metrics?

Application-level metrics provide insights into the performance and behavior of individual applications running inside the cluster. These metrics can include HTTP request counts, response times, error rates, database queries, and business-specific KPIs.

Examples of Application-Level Metrics:

| Metric | Description | PromQL Query Example |
| --- | --- | --- |
| `http_requests_total` | Total number of HTTP requests received | `rate(http_requests_total[5m])` |
| `http_request_duration_seconds` | Duration of HTTP requests (histogram) | `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` |
| `db_query_duration_seconds` | Average database query duration (assuming a histogram or summary) | `rate(db_query_duration_seconds_sum[5m]) / rate(db_query_duration_seconds_count[5m])` |
| `redis_commands_total` | Number of Redis commands executed | `rate(redis_commands_total[5m])` |
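The latency query above relies on histogram buckets. When requests carry labels, the bucket series are usually aggregated before taking the quantile; an illustrative variant (the `handler` label is an assumption about your instrumentation, not something this article defines):

```promql
# 95th-percentile request latency per handler over 5 minutes.
# sum by (..., le) preserves the bucket boundaries histogram_quantile needs.
histogram_quantile(
  0.95,
  sum by (handler, le) (rate(http_request_duration_seconds_bucket[5m]))
)
```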

How to Collect Application-Level Metrics?

Applications must expose their own metrics using instrumentation libraries or exporters.

Example: Exposing Metrics in a Python Flask App

```python
from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
# Registers default request metrics and serves them at /metrics
metrics = PrometheusMetrics(app)

@app.route('/')
def home():
    return "Hello, World!"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

This exposes metrics at http://localhost:8080/metrics.
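What the /metrics endpoint returns is plain text in the Prometheus exposition format. As a rough, dependency-free illustration of what an exporter generates for you, here is a sketch that renders a single counter (the function, metric name, labels, and value are all examples, not part of any library API):

```python
# Minimal sketch of the Prometheus text exposition format.
# In practice an instrumentation library (e.g. prometheus_flask_exporter)
# produces this output for you; this only shows the wire format.

def render_counter(name, help_text, value, labels=None):
    """Render one counter metric in Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Labels are rendered as key="value" pairs inside braces
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{label_str} {value}\n"
    )

print(render_counter("http_requests_total",
                     "Total number of HTTP requests received.",
                     42,
                     {"method": "GET", "status": "200"}))
```

Prometheus scrapes exactly this kind of text, one `# HELP`/`# TYPE` header per metric followed by one line per labeled series.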

Configure Prometheus to Scrape the Application Metrics:

```yaml
scrape_configs:
  - job_name: "flask-app"
    static_configs:
      - targets: ["localhost:8080"]
```

Summary

| Level | Description | Example Metrics | Exporters/Tools |
| --- | --- | --- | --- |
| Node Level | System-level metrics (CPU, memory, disk, network) | `node_cpu_seconds_total`, `node_memory_MemAvailable_bytes` | Node Exporter |
| Cluster Level | Kubernetes resource-level metrics | `kube_node_status_condition`, `kube_pod_status_phase` | Kube State Metrics |
| Application Level | Application-specific performance metrics | `http_requests_total`, `http_request_duration_seconds` | Client libraries (e.g., prometheus_flask_exporter), JMX Exporter, custom instrumentation |

Conclusion

Understanding Node-Level, Cluster-Level, and Application-Level Metrics is crucial for effectively monitoring a Kubernetes-based system.

  • Node-level metrics help in tracking infrastructure health.

  • Cluster-level metrics give insights into Kubernetes resources and workloads.

  • Application-level metrics monitor app performance and business KPIs.

Each level requires the right exporters and Prometheus configurations to ensure a comprehensive monitoring setup.
