Monitoring and Logging with Prometheus and Grafana in Kubernetes

Table of contents
- Why Prometheus and Grafana?
- Architecture Overview
- Step 1: Setting Up Prometheus in Kubernetes
- Step 2: Deploying Grafana and Creating Dashboards
- Step 3: Setting Up Alerting with Alertmanager
- Step 4: Monitoring Logs
- Step 5: Integrating with Cloud Monitoring Systems
- Security Considerations
- Tips for Scaling
- Conclusion
Introduction
Modern application environments, especially those powered by Kubernetes, demand dynamic, resilient, and observable systems. Monitoring and logging are no longer luxury add-ons; they are mission-critical for production readiness. In the Kubernetes ecosystem, Prometheus and Grafana have emerged as the de facto open-source tools for monitoring and visualization, respectively. This article walks you through deploying Prometheus and Grafana in a Kubernetes cluster, setting up alerting using Alertmanager, and integrating with existing cloud monitoring systems to build a robust observability stack.
Why Prometheus and Grafana?
Prometheus is a metrics-based monitoring system originally built at SoundCloud. It scrapes time-series data from configured HTTP endpoints and provides a query language, PromQL, for analysis. It suits Kubernetes well, with native support for service discovery and label-based data collection.
Grafana is a leading open-source visualization tool that integrates seamlessly with Prometheus, allowing users to create rich, interactive dashboards to monitor the health of applications, nodes, and infrastructure.
Architecture Overview
Here's how the stack works:
- Prometheus scrapes metrics from Kubernetes nodes, pods, services, and applications.
- Alertmanager handles alerts generated by Prometheus rules.
- Grafana visualizes Prometheus metrics through dashboards.
- Optional integration with cloud monitoring systems (e.g., CloudWatch, Stackdriver) bridges your cloud-native and Kubernetes workloads.
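To make the scraping step concrete: a scrape target is just an HTTP endpoint that returns plain-text samples. Below is a minimal sketch in Python using only the standard library. The metric name demo_requests_total, its labels, the sample values, and port 8000 are all hypothetical; a real application would normally use an official Prometheus client library instead of formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter, keyed by label values (method, code).
REQUESTS = {("GET", "200"): 1027, ("POST", "500"): 3}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Total HTTP requests handled.",
        "# TYPE demo_requests_total counter",
    ]
    for (method, code), value in sorted(counters.items()):
        lines.append(
            f'demo_requests_total{{method="{method}",code="{code}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/metrics shows the same text Prometheus would read on each scrape.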
Step 1: Setting Up Prometheus in Kubernetes
You can either set up Prometheus manually or use the Prometheus Operator (highly recommended).
Using the Prometheus Operator (via kube-prometheus-stack)
Install the Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
This installs:
- Prometheus
- Alertmanager
- Node Exporter
- kube-state-metrics
- Grafana (optional but included)
Customizing Prometheus Configuration
Modify the values.yaml or use --set flags to tweak configurations:
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    resources:
      requests:
        memory: 400Mi
        cpu: 200m
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
Apply changes with:
helm upgrade prometheus prometheus-community/kube-prometheus-stack -f values.yaml
Step 2: Deploying Grafana and Creating Dashboards
Accessing Grafana
After deploying kube-prometheus-stack, Grafana is typically exposed as a service:
kubectl port-forward svc/prometheus-grafana 3000:80
Visit http://localhost:3000 and log in with default credentials:
Username: admin
Password: prom-operator
Adding Dashboards
Grafana includes several out-of-the-box dashboards for Kubernetes. You can import more from Grafana Labs Dashboards:
- Kubernetes Cluster Monitoring - ID: 315
- Node Exporter Full - ID: 1860
- Kubelet Metrics - ID: 3070
Creating a Custom Dashboard
- Go to + Create > Dashboard > Add new panel.
- Use a PromQL query like:
  sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
- Customize visualizations (graph, gauge, bar, etc.).
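The PromQL query in the steps above relies on rate(), which turns an ever-increasing counter into a per-second value. The sketch below works through that arithmetic in Python under simplifying assumptions: the sample values and 60-second scrape interval are made up, and unlike the real rate() it does not extrapolate to the edges of the window.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter over a window.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (for example, a
    container restart), which is also how Prometheus interprets it.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # After a reset the counter restarts from zero, so the whole
        # new value counts as increase.
        increase += value if value < prev else value - prev
        prev = value
    return increase / elapsed


# Hypothetical CPU-seconds counter over a 5m window, with one reset.
window = [(0, 100.0), (60, 112.0), (120, 124.0),
          (180, 6.0), (240, 18.0), (300, 30.0)]
print(simple_rate(window))  # 54 CPU-seconds over 300 s
```

A result of 0.18 here would mean the container used roughly 0.18 CPU cores on average over the window; sum(...) by (namespace) then adds those values up per namespace.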
Step 3: Setting Up Alerting with Alertmanager
Define Alerting Rules
Create a PrometheusRule custom resource:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts
  namespace: monitoring
spec:
  groups:
    - name: example.rules
      rules:
        - alert: HighCPUUsage
          expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[2m])) by (pod) > 0.5
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage detected"
            description: "Pod {{ $labels.pod }} has high CPU usage."
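Save the manifest as high-cpu-rule.yaml and apply it. The monitoring namespace named in the manifest must already exist; change it to match the namespace of your release if needed:
kubectl apply -f high-cpu-rule.yaml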
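The for: 1m field in the rule means the expression must stay true for a full minute before the alert fires; until then the alert is only pending. The following Python sketch models that behavior as a small state machine. The 30-second evaluation interval and the CPU values are assumptions chosen for illustration.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each rule evaluation.

    samples: list of (timestamp_seconds, value), one per evaluation.
    """
    states = []
    active_since = None  # when the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared, so the timer resets
            states.append("inactive")
    return states


# Hypothetical per-pod CPU usage, evaluated every 30 seconds.
evaluations = [(0, 0.4), (30, 0.6), (60, 0.7), (90, 0.8), (120, 0.3), (150, 0.9)]
print(alert_states(evaluations, threshold=0.5, for_seconds=60))
```

In this run the alert is pending at 30 s and 60 s, fires at 90 s, and drops back to inactive at 120 s, so a short spike never reaches Alertmanager.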
Configure Alertmanager
Alertmanager routes alerts via email, Slack, PagerDuty, etc.
Here’s an example alertmanager.yaml:
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
route:
  receiver: 'slack-notifications'
  group_wait: 10s
  group_interval: 30s
  repeat_interval: 1h
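Update the Alertmanager configuration. The Secret stores alertmanager.yaml base64-encoded, so editing it by hand means re-encoding the file:
kubectl edit secret alertmanager-prometheus-kube-prometheus-alertmanager
The Prometheus Operator normally reloads Alertmanager when the Secret changes; restart the Alertmanager pods if the new configuration is not picked up. Because the chart manages this Secret, a later helm upgrade can overwrite manual edits, so consider setting the same configuration under alertmanager.config in values.yaml.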
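Because Secret values are base64-encoded, hand-editing the configuration is error-prone. As a rough sketch, the Python below builds a Secret manifest from plain configuration text. It assumes the release lives in the default namespace and that the configuration is stored under the alertmanager.yaml key; the receiver name in the sample configuration is a placeholder.

```python
import base64
import json


def alertmanager_secret(name, namespace, config_text):
    """Build a Secret manifest (as a dict) holding alertmanager.yaml."""
    encoded = base64.b64encode(config_text.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"alertmanager.yaml": encoded},
    }


# Placeholder configuration; substitute the real alertmanager.yaml content.
config = "route:\n  receiver: 'slack-notifications'\nreceivers:\n  - name: 'slack-notifications'\n"

manifest = alertmanager_secret(
    "alertmanager-prometheus-kube-prometheus-alertmanager",
    "default",  # assumption: the chart was installed without --namespace
    config,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.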
Step 4: Monitoring Logs
While Prometheus excels at metrics, it doesn’t handle logs. Pair it with tools like Loki, Fluent Bit, or Elastic Stack.
Option: Loki + Promtail + Grafana
Deploy with Helm:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack
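Use Grafana to query logs with LogQL (the available labels depend on your Promtail configuration, so adjust the selector to match your log streams):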
{job="kubernetes-pods"} |= "error"
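The same LogQL query can also be sent to Loki's HTTP API, which is handy for scripts. This sketch only builds the request URL for the query_range endpoint. The base URL assumes Loki has been port-forwarded to localhost:3100, and the timestamps are arbitrary nanosecond Unix times chosen for illustration.

```python
from urllib.parse import urlencode


def loki_query_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a query_range URL for a LogQL query.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


url = loki_query_url(
    "http://localhost:3100",  # assumption: kubectl port-forward to Loki
    '{job="kubernetes-pods"} |= "error"',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_600_000_000_000,
)
print(url)
```

Fetching the URL with any HTTP client returns matching log lines as JSON; the |= "error" part keeps only lines that contain the literal text error.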
Step 5: Integrating with Cloud Monitoring Systems
You can bridge your Kubernetes monitoring with your cloud provider’s observability tools.
AWS CloudWatch Integration
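Use the CloudWatch agent to ship metrics to CloudWatch, or use Prometheus remote write to send them to Amazon Managed Service for Prometheus. The remote write example below goes under prometheus.prometheusSpec in values.yaml and uses the Prometheus Operator's camelCase field names:
remoteWrite:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-XXXX/api/v1/remote_write
    sigv4:
      region: us-east-1
    queueConfig:
      capacity: 500
      maxShards: 200
      minShards: 1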
Google Cloud Monitoring (formerly Stackdriver)
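Use Google Cloud’s Prometheus integration:
remoteWrite:
  - url: https://monitoring.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/prometheus/api/v1/write
    headers:
      Authorization: Bearer YOUR_ACCESS_TOKEN
Treat this as a quick experiment only: access tokens expire quickly, and Prometheus treats Authorization as a reserved header and may reject it under headers. For anything longer-lived, prefer a credential-based authentication setup.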
Security Considerations
- Restrict access to Prometheus and Grafana using Kubernetes RBAC and ingress rules.
- Always change default credentials.
- Use TLS/SSL on external access.
- Audit access logs via Grafana’s internal logs or external SIEM tools.
Tips for Scaling
- Federation: Use Prometheus federation to aggregate metrics across clusters.
- Long-term storage: Integrate with Thanos or Cortex for durable storage.
- Sharding: Scale horizontally by running multiple Prometheus instances and using Thanos or Cortex to query across them.
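Sharding boils down to giving each Prometheus instance a stable, non-overlapping subset of scrape targets. The Python sketch below is a simplified illustration of hash-based assignment, similar in spirit to Prometheus's hashmod relabel action but not guaranteed to produce identical shard numbers. The target addresses and shard count are hypothetical.

```python
import hashlib


def shard_for(target, total_shards):
    """Map a scrape target to a shard index in [0, total_shards)."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[-8:], "big") % total_shards


def assign(targets, total_shards):
    """Group targets by the shard that should scrape them."""
    shards = {i: [] for i in range(total_shards)}
    for target in targets:
        shards[shard_for(target, total_shards)].append(target)
    return shards


# Hypothetical scrape targets split across three Prometheus instances.
targets = ["10.0.0.1:9100", "10.0.0.2:9100", "10.0.0.3:9100", "10.0.0.4:9100"]
for shard, members in assign(targets, 3).items():
    print(shard, members)
```

Because the mapping depends only on the target address and the shard count, every instance can compute its own share independently, and a query layer such as Thanos can then present the shards as one data source.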
Conclusion
Combining Prometheus and Grafana provides a production-grade, open-source monitoring solution for Kubernetes. By leveraging service discovery, PromQL, and Alertmanager, you can build a highly observable infrastructure. Add Grafana’s visualization power and integrate logs with Loki or Elastic, and you have a complete observability platform. Integrating with cloud monitoring systems then lets you view Kubernetes and cloud-native workloads in one place.
Start small by deploying the kube-prometheus-stack, experiment with alert rules, and customize dashboards for your team’s needs. With proper setup, you’ll not only detect issues faster but also proactively address them before they impact your users.
