📌 Part 4: Optimize Kubernetes Monitoring: A Complete Guide to Prometheus and Grafana Integration

Vikas Surve

📌 Advanced Kubernetes Monitoring with Prometheus, Grafana, Node Exporter & cAdvisor

1️⃣ Overview

Effective Kubernetes monitoring requires metrics from the cluster, from each node, and from each container. This guide extends our monitoring setup with four components:

✅ Prometheus – Collects and stores Kubernetes cluster metrics
✅ Grafana – Visualizes metrics in dashboards
✅ Node Exporter – Captures node-level CPU, memory, and disk metrics
✅ cAdvisor – Monitors container-level resource usage

By the end of this guide, you'll have node-level and container-level metrics flowing into Prometheus and visible in Grafana dashboards. 🚀


2️⃣ Deploying Prometheus

📌 Create the monitoring namespace if it does not already exist:

kubectl create namespace monitoring

✅ Sample Output:

namespace/monitoring created

🔹 Apply Prometheus Configurations

curl -o prometheus-config.yaml https://raw.githubusercontent.com/Vikas-DevOpsPractice/EasyShop/feature/kindcluster/K8s/14-prometheus-config.yaml
curl -o prometheus-deployment.yaml https://raw.githubusercontent.com/Vikas-DevOpsPractice/EasyShop/feature/kindcluster/K8s/15-prometheus-deployment.yaml

kubectl apply -f prometheus-config.yaml -n monitoring
kubectl apply -f prometheus-deployment.yaml -n monitoring

✅ Check Prometheus Deployment:

kubectl get pods -n monitoring

✅ Sample Output:

NAME                         READY   STATUS    RESTARTS   AGE
prometheus-5f9d77c86f-xyz12  1/1     Running   0          1m

3️⃣ Deploying Node Exporter

🔹 Why Node Exporter?

🔹 Collects CPU, memory, disk, and network usage for each Kubernetes node
🔹 Provides hardware and OS metrics

🔹 Step 1: Create a DaemonSet for Node Exporter

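# node-exporter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true   # expose :9100 directly on each node's IP
      hostPID: true       # see host processes, not just the container's
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.5.0
        args:
        - --path.rootfs=/host   # read filesystem metrics from the host root mounted below
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: rootfs
          mountPath: /host
          readOnly: true
          mountPropagation: HostToContainer
      volumes:
      - name: rootfs
        hostPath:
          path: /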

✅ Apply the DaemonSet:

kubectl apply -f node-exporter-daemonset.yaml -n monitoring

✅ Verify Deployment:

kubectl get pods -n monitoring -l app=node-exporter

✅ Sample Output (one pod per node):

NAME                         READY   STATUS    RESTARTS   AGE
node-exporter-xyz12          1/1     Running   0          1m

4️⃣ Deploying cAdvisor

🔹 Why cAdvisor?

🔹 Provides per-container resource usage (CPU, memory, disk, network)
🔹 Helps in troubleshooting slow or resource-hungry containers

🔹 Step 1: Create a DaemonSet for cAdvisor

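# cadvisor-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      hostNetwork: true   # expose :8080 directly on each node's IP
      containers:
      - name: cadvisor
        image: gcr.io/cadvisor/cadvisor:v0.47.0
        ports:
        - containerPort: 8080
          hostPort: 8080
        # Read-only host mounts, following cAdvisor's documented container setup
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: true
        - name: sys
          mountPath: /sys
          readOnly: true
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys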

✅ Apply the DaemonSet:

kubectl apply -f cadvisor-daemonset.yaml -n monitoring

✅ Verify Deployment:

kubectl get pods -n monitoring -l app=cadvisor

✅ Sample Output (one pod per node):

NAME                         READY   STATUS    RESTARTS   AGE
cadvisor-xyz12               1/1     Running   0          1m
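
The scrape jobs in section 5 address both exporters through Service DNS names (node-exporter.monitoring.svc.cluster.local and cadvisor.monitoring.svc.cluster.local), but a DaemonSet does not create a Service on its own. Below is a minimal sketch of two headless Services that would make those names resolvable, assuming the app labels and ports used in the DaemonSet manifests. The file name exporter-services.yaml and the port name metrics are hypothetical choices.

```yaml
# exporter-services.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  name: node-exporter        # yields node-exporter.monitoring.svc.cluster.local
  namespace: monitoring
spec:
  clusterIP: None            # headless: DNS returns one A record per pod
  selector:
    app: node-exporter       # must match the DaemonSet's pod label
  ports:
  - name: metrics            # hypothetical port name
    port: 9100
    targetPort: 9100
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor             # yields cadvisor.monitoring.svc.cluster.local
  namespace: monitoring
spec:
  clusterIP: None
  selector:
    app: cadvisor
  ports:
  - name: metrics
    port: 8080
    targetPort: 8080
```

It can be applied like the other manifests in this guide, with kubectl apply -f exporter-services.yaml -n monitoring.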

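5️⃣ Integrating Node Exporter & cAdvisor with Prometheus

🔹 Update Prometheus Configuration

Add the following scrape jobs to prometheus-config.yaml. If the file already defines scrape_configs, append the two jobs to that list instead of adding a second scrape_configs key. The targets are Service DNS names, so a Service named node-exporter and one named cadvisor must exist in the monitoring namespace: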

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter.monitoring.svc.cluster.local:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor.monitoring.svc.cluster.local:8080']
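
A static target that points at a Service name typically reaches only one exporter pod per scrape, so with one pod per node it may not cover every node. One possible alternative is Prometheus's DNS-based service discovery. The sketch below assumes that node-exporter and cadvisor are headless Services (clusterIP: None) in the monitoring namespace, so that each DNS lookup returns one A record per pod:

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['node-exporter.monitoring.svc.cluster.local']
        type: A        # A records carry no port, so it is set explicitly
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['cadvisor.monitoring.svc.cluster.local']
        type: A
        port: 8080
```

With this variant, each pod behind the Service becomes its own target with its own instance label, which is what per-node dashboards usually expect.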

✅ Reapply Prometheus Configuration:

kubectl apply -f prometheus-config.yaml -n monitoring
kubectl rollout restart deployment prometheus -n monitoring

6️⃣ Deploying Grafana

curl -o grafana-deployment.yaml https://raw.githubusercontent.com/Vikas-DevOpsPractice/EasyShop/feature/kindcluster/K8s/16-grafana-deployment.yaml
kubectl apply -f grafana-deployment.yaml -n monitoring

✅ Verify Grafana:

kubectl get pods -n monitoring

✅ Sample Output:

NAME                         READY   STATUS    RESTARTS   AGE
grafana-78b6c9c76f-xyz12     1/1     Running   0          1m

7️⃣ Setting Up Dashboards in Grafana

🔹 Add Prometheus as a Data Source

📌 Go to Grafana → Configuration → Data Sources → Add data source (in newer Grafana versions: Connections → Data sources)
🔹 Select Prometheus
🔹 Set URL to:

http://prometheus.monitoring.svc.cluster.local:9090

🔹 Click Save & Test

✅ If Grafana reports that the data source is working, the integration is successful!
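
The same data source can also be declared in a file instead of through the UI. Here is a minimal sketch in Grafana's data source provisioning format, assuming the file is mounted under /etc/grafana/provisioning/datasources/ in the Grafana container. The file name prometheus-datasource.yaml is hypothetical, and this guide does not confirm whether the Grafana deployment used here mounts that directory.

```yaml
# prometheus-datasource.yaml (hypothetical file name)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy            # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus.monitoring.svc.cluster.local:9090
    isDefault: true
```

Declaring the data source this way keeps it in version control alongside the other manifests, so a recreated Grafana pod does not need manual setup.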


🔹 Import Prebuilt Kubernetes Dashboards

📌 Go to Grafana → Dashboards → Import
🔹 Enter Dashboard ID 11074 (Node Exporter)
🔹 Select Prometheus as the data source → Click Import
🔹 Repeat the import with Dashboard ID 13689 (cAdvisor)

✅ Sample Node Metrics Dashboard:

[Image: Node Metrics]

✅ Sample Container Metrics Dashboard:

[Image: Container Metrics]


8️⃣ Setting Up Alerts in Grafana

📌 Open Grafana → Alerting → Alert rules → New alert rule
🔹 Condition: Alert when CPU usage stays above 80% for 5 minutes
🔹 Notification: Email, Slack, or PagerDuty
🔹 Click Save to enable the rule

✅ Now, alerts will trigger on sustained CPU spikes!
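
The CPU condition above needs a concrete query. One way to express it in PromQL uses the node_cpu_seconds_total counter exposed by Node Exporter. The sketch below shows it in Prometheus rule-file format rather than as a Grafana rule, a swapped-in notation chosen only because it is plain YAML. The group name, alert name, and severity label are hypothetical.

```yaml
groups:
  - name: node-alerts                # hypothetical group name
    rules:
      - alert: HighNodeCPU           # hypothetical alert name
        # Percent of CPU time not spent idle, averaged across cores per node
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m                      # condition must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
```

In a Grafana alert rule, the expression without the trailing > 80 can serve as the query, with 80 set as the threshold and 5 minutes as the duration the condition must hold.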


9️⃣ Troubleshooting & Best Practices

🔹 Prometheus Not Collecting Metrics?

kubectl logs -l app=prometheus -n monitoring

📌 Ensure the scrape configs in prometheus-config.yaml are correct, and check the status of each target on the Targets page of the Prometheus UI


🔹 Node Exporter Not Running?

kubectl describe pods -l app=node-exporter -n monitoring

📌 Ensure port 9100 is free on every node; with hostNetwork enabled, the exporter binds directly to the node's port


🔹 cAdvisor Metrics Not Appearing?

kubectl logs -l app=cadvisor -n monitoring

📌 Ensure the target cadvisor.monitoring.svc.cluster.local:8080 in the Prometheus config is correct and resolves to an existing Service


🎯 Conclusion

🚀 Advanced Kubernetes monitoring is now fully set up!
✅ Prometheus collects Kubernetes & node metrics
✅ Node Exporter tracks hardware performance
✅ cAdvisor monitors per-container usage
✅ Grafana visualizes & alerts on key metrics

📌 Next Step: End-to-End CI/CD Automation for Kubernetes Using Jenkins, GitLab, AWS CodePipeline & Azure DevOps
