Introduction

Modern application environments, especially those powered by Kubernetes, demand dynamic, resilient, and observable systems. Monitoring and logging are no longer luxury add-ons; they are mission-critical for production readiness. In the Kubernetes ecosystem, Prometheus and Grafana have emerged as the de facto open-source tools for monitoring and visualization, respectively. This article walks you through deploying Prometheus and Grafana in a Kubernetes cluster, setting up alerting using Alertmanager, and integrating with existing cloud monitoring systems to build a robust observability stack.

Why Prometheus and Grafana?

Prometheus is a metrics-based monitoring system originally built at SoundCloud. It scrapes time-series data from configured endpoints and provides a query language, PromQL, for analysis. It fits Kubernetes like a glove, with native support for service discovery and label-based data collection.

Grafana is a leading open-source visualization tool that integrates seamlessly with Prometheus, allowing users to create rich, interactive dashboards to monitor the health of applications, nodes, and infrastructure.

Architecture Overview

Here's how the stack works:

Prometheus scrapes metrics from Kubernetes nodes, pods, services, and applications.
Alertmanager handles alerts generated by Prometheus rules.
Grafana visualizes Prometheus metrics through dashboards.
Optional integration with cloud monitoring systems (e.g., Amazon CloudWatch, Google Cloud Monitoring) brings cluster metrics and the rest of your cloud workloads into one view.
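To make the "scrapes metrics" step concrete, here is a minimal sketch of the text format a scrape target returns over HTTP. The metric name and label values are made up for illustration, and the helper skips label-value escaping, so treat it as a picture of the format rather than a replacement for a client library.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    Simplification: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric, for illustration only.
body = render_metric(
    "http_requests_total",
    "Total HTTP requests handled.",
    "counter",
    [({"method": "GET", "code": "200"}, 1027),
     ({"method": "POST", "code": "500"}, 3)],
)
print(body)
```

Each line after the HELP and TYPE comments is one time series: a metric name, a set of labels, and the current value. Prometheus adds the timestamp and target labels such as job and instance at scrape time.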

Step 1: Setting Up Prometheus in Kubernetes

You can either set up Prometheus manually or use the Prometheus Operator (highly recommended).

Using the Prometheus Operator (via kube-prometheus-stack)

Install the Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack

This installs:

Prometheus
Alertmanager
Node Exporter
kube-state-metrics
Grafana (enabled by default; can be turned off in the chart values)
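Once these components are running, a quick sanity check is to ask Prometheus which targets it can reach. The sketch below parses the JSON shape returned by an instant query for the up metric and lists targets reporting 0. The sample response is hand-written for illustration, not captured from a cluster, and the job and instance values are assumptions.

```python
import json


def down_targets(response_text):
    """Return (job, instance) pairs whose `up` sample is 0."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # sample values arrive as strings
        if float(value) == 0.0:
            labels = series["metric"]
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return down


# Hand-written sample in the shape of an instant-query response; not real output.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "kubelet", "instance": "10.0.0.1:10250"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node-exporter", "instance": "10.0.0.2:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
})
print(down_targets(sample))
```

Against a live cluster, the response text would come from a GET request to /api/v1/query?query=up on the Prometheus service, for example through a port-forward.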

Customizing Prometheus Configuration

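Modify values.yaml or use --set flags to tweak the configuration. In the example below, setting the two NilUsesHelmValues options to false lets Prometheus pick up ServiceMonitor and PrometheusRule objects regardless of their Helm release label, and the storageSpec block requests a 10Gi persistent volume so metrics survive pod restarts: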

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    resources:
      requests:
        memory: 400Mi
        cpu: 200m
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

Apply changes with:

helm upgrade prometheus prometheus-community/kube-prometheus-stack -f values.yaml

Step 2: Deploying Grafana and Creating Dashboards

Accessing Grafana

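After deploying kube-prometheus-stack, Grafana runs behind a ClusterIP service. Forward a local port to it (the service name below assumes the Helm release is named prometheus, as in Step 1):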

kubectl port-forward svc/prometheus-grafana 3000:80

Visit http://localhost:3000 and log in with default credentials:

Username: admin
Password: prom-operator

Adding Dashboards

The kube-prometheus-stack chart provisions Grafana with several out-of-the-box dashboards for Kubernetes. You can import more by ID from the Grafana Labs dashboard catalog:

Kubernetes Cluster Monitoring - ID: 315
Node Exporter Full - ID: 1860
Kubelet Metrics - ID: 3070

Creating a Custom Dashboard

Go to + Create > Dashboard > Add new panel
Use a PromQL query like:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)

Customize visualizations (graph, gauge, bar, etc.)
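If the PromQL above looks opaque, the following sketch shows roughly what it computes: a per-second rate for each container's CPU counter over the window, then a sum per namespace. It is a simplification. It handles counter resets but leaves out the boundary extrapolation that Prometheus applies in rate(), and the sample data is invented.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: accounts for counter resets, skips rate()'s extrapolation.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted; count the new value as the increase.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


def sum_rate_by(series, label):
    """Sum the per-series rates, grouped by one label."""
    totals = defaultdict(float)
    for labels, samples in series:
        rate = simple_rate(samples)
        if rate is not None:
            totals[labels[label]] += rate
    return dict(totals)


# Invented CPU-seconds counters sampled over a 300-second window.
series = [
    ({"namespace": "web", "pod": "web-1"}, [(0, 10.0), (150, 40.0), (300, 70.0)]),
    ({"namespace": "web", "pod": "web-2"}, [(0, 5.0), (150, 20.0), (300, 35.0)]),
    ({"namespace": "batch", "pod": "job-1"}, [(0, 100.0), (150, 130.0), (300, 15.0)]),
]
print(sum_rate_by(series, "namespace"))
```

The result is in CPU cores: a namespace total of roughly 0.3 means its containers together used about 30% of one core over the window.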

Step 3: Setting Up Alerting with Alertmanager

Define Alerting Rules

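Create a PrometheusRule custom resource and save it as high-cpu-rule.yaml. The manifest targets the monitoring namespace, so either create that namespace first or change it to the namespace where the chart is installed: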

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts
  namespace: monitoring
spec:
  groups:
  - name: example.rules
    rules:
    - alert: HighCPUUsage
      expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[2m])) by (pod) > 0.5
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "Pod {{ $labels.pod }} has high CPU usage."

Apply it:

kubectl apply -f high-cpu-rule.yaml
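The for: 1m clause in the rule is easy to misread, so here is a small sketch of how it behaves. An alert whose expression becomes true first goes to pending, and only fires once the expression has stayed true for the full duration. The 30-second evaluation interval and the sequence of results are assumptions made for the example.

```python
def alert_states(evaluations, for_seconds):
    """Trace the alert state at each rule evaluation.

    `evaluations` is a list of (timestamp_seconds, condition_is_true) pairs.
    """
    states = []
    active_since = None
    for ts, condition in evaluations:
        if not condition:
            # Any false evaluation resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Assumed: rules are evaluated every 30 seconds.
evals = [(0, False), (30, True), (60, True), (90, True), (120, False), (150, True)]
print(alert_states(evals, for_seconds=60))
```

A single false evaluation resets the timer, which is why a short for: duration filters out brief spikes without hiding sustained load.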

Configure Alertmanager

Alertmanager routes alerts via email, Slack, PagerDuty, etc.

Here’s an example alertmanager.yaml:

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
route:
  receiver: 'slack-notifications'
  group_wait: 10s
  group_interval: 30s
  repeat_interval: 1h

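Update the Alertmanager config. You can edit the Secret that holds it directly:

kubectl edit secret alertmanager-prometheus-kube-prometheus-alertmanager

A later helm upgrade may overwrite manual edits, so for lasting changes consider putting the same configuration under alertmanager.config in values.yaml instead. The config reloader that runs alongside Alertmanager normally picks up changes on its own; restart the Alertmanager pods if the new routes do not take effect.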
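When you edit the Secret by hand, the values under data are base64-encoded rather than plain YAML. The sketch below shows the encoding round trip with Python's standard library. The config fragment is a shortened stand-in for the example above, and the assumption is that the configuration lives under a key named alertmanager.yaml.

```python
import base64


def encode_secret_value(text):
    """Encode plain text the way a Secret's `data` field stores it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Decode a value copied from a Secret's `data` field."""
    return base64.b64decode(encoded).decode("utf-8")


# Shortened stand-in for the Alertmanager configuration.
config = "route:\n  receiver: 'slack-notifications'\n"
encoded = encode_secret_value(config)
print(encoded)
print(decode_secret_value(encoded))
```

Paste the encoded string as the value of the relevant key, and decode the existing value first if you want to see what is currently deployed.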

Step 4: Monitoring Logs

While Prometheus excels at metrics, it doesn’t handle logs. Pair it with tools like Loki, Fluent Bit, or Elastic Stack.

Option: Loki + Promtail + Grafana

Deploy with Helm:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack

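Use Grafana to query logs with LogQL (the job label value depends on how Promtail is configured, so adjust the selector to match your labels):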

{job="kubernetes-pods"} |= "error"
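The query has two parts: a stream selector in braces that picks log streams by label, and a line filter (|=) that keeps only lines containing the given text. The sketch below mimics that behavior on invented in-memory data, supporting only exact label matches, to show what comes back. Note that the filter is case-sensitive.

```python
def select_and_filter(streams, matchers, needle):
    """Return lines from streams whose labels match exactly and contain `needle`."""
    matched = []
    for labels, lines in streams:
        if all(labels.get(key) == value for key, value in matchers.items()):
            matched.extend(line for line in lines if needle in line)
    return matched


# Invented log streams: (labels, lines).
streams = [
    ({"job": "kubernetes-pods", "pod": "web-1"},
     ["GET /healthz 200", "error: upstream timed out", "ERROR: disk full"]),
    ({"job": "systemd-journal"},
     ["error: unit failed"]),
]
print(select_and_filter(streams, {"job": "kubernetes-pods"}, "error"))
```

Only the lowercase "error" line from the selected stream is returned. To catch "ERROR" as well, LogQL offers a regex filter such as |~ "(?i)error".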

Step 5: Integrating with Cloud Monitoring Systems

You can bridge your Kubernetes monitoring with your cloud provider’s observability tools.

AWS CloudWatch Integration

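The snippet below uses Prometheus remote write to send metrics to Amazon Managed Service for Prometheus; to land metrics in CloudWatch itself, use the CloudWatch Agent instead. With kube-prometheus-stack, place the block under prometheus.prometheusSpec in values.yaml, where field names are camelCase:

remoteWrite:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-XXXX/api/v1/remote_write
    sigv4:
      region: us-east-1
    queueConfig:
      capacity: 500
      maxShards: 200
      minShards: 1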
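The queue settings above (capacity and the shard limits) bound how many samples Prometheus may hold in memory while waiting on the remote endpoint. A back-of-the-envelope sketch, assuming capacity is counted per shard as the Prometheus documentation describes:

```python
def max_buffered_samples(capacity, max_shards):
    """Upper bound on samples queued in memory: per-shard capacity times shard count."""
    return capacity * max_shards


# Values from the example configuration.
print(max_buffered_samples(capacity=500, max_shards=200))  # 100000
```

If the remote endpoint is slow or unreachable, this is the rough ceiling the queues can reach before Prometheus stops reading further samples from its write-ahead log, so raise these numbers with memory limits in mind.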

Google Cloud Monitoring (formerly Stackdriver)

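Use Google Cloud’s Prometheus integration. Prometheus does not allow an Authorization entry under headers, so the token is referenced from a Kubernetes Secret instead (the Secret name and key below are placeholders). Google access tokens are short-lived, so a static token is suitable only for experiments:

remoteWrite:
  - url: https://monitoring.googleapis.com/v1/projects/YOUR_PROJECT_ID/location/global/prometheus/api/v1/write
    authorization:
      type: Bearer
      credentials:
        name: gcp-remote-write-token
        key: token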

Security Considerations

Restrict access to Prometheus and Grafana using Kubernetes RBAC and ingress rules.
Always change default credentials, such as the Grafana admin password shown in Step 2.
Use TLS for any endpoint exposed outside the cluster.
Audit access logs via Grafana’s internal logs or external SIEM tools.

Tips for Scaling

Federation: Use Prometheus federation to aggregate metrics across clusters.
Long-term storage: Integrate with Thanos or Cortex for durable storage.
Sharding: Split scrape targets across several Prometheus instances and use Thanos or Cortex to query them as one.
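For the federation tip, a central Prometheus scrapes the /federate endpoint of each cluster-level Prometheus and passes match[] parameters to choose which series to pull. The sketch below only builds such a URL. The host name and the selectors are hypothetical, and in practice these values would go into the central server's scrape configuration.

```python
from urllib.parse import urlencode


def federate_url(base_url, matchers):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical cluster-level Prometheus and selectors.
url = federate_url(
    "http://prometheus.example.internal:9090",
    ['{job="kubernetes-nodes"}', '{__name__=~"job:.*"}'],
)
print(url)
```

Pulling only aggregated series (for example, recording-rule outputs named job:...) keeps the central server small; federating every raw series tends to recreate the scaling problem one level up.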

Conclusion

Combining Prometheus and Grafana provides a production-grade, open-source monitoring solution for Kubernetes. By leveraging service discovery, PromQL, and Alertmanager, you can build a highly observable infrastructure. Add Grafana’s visualization power and integrate logs with Loki or Elastic, and you have a complete observability platform. Further, integrating with cloud monitoring systems helps centralize monitoring and supports compliance requirements.

Start small by deploying the kube-prometheus-stack, experiment with alert rules, and customize dashboards for your team’s needs. With proper setup, you’ll not only detect issues faster but also proactively address them before they impact your users.

