Centralized Monitoring for Multi-Cluster OpenShift with Prometheus, Thanos & Grafana

AKSHAY SIV

🚀 Overview

Monitoring OpenShift clusters is critical for visibility, reliability, and performance. OpenShift ships with Prometheus and Thanos out of the box, and you can use them to build a multi-cluster monitoring dashboard in Grafana.

This guide walks through each step, from exposing Thanos to building a centralized Grafana dashboard.


🧱 Architecture Diagram

Cluster A         Cluster B         Cluster C
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Thanos Querierβ”‚  β”‚ Thanos Querierβ”‚  β”‚ Thanos Querierβ”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β–Ό                  β–Ό                  β–Ό
   HTTPS Route A      HTTPS Route B      HTTPS Route C
        \                |                /
         \               |               /
          \              β–Ό              /
               🌐 External Grafana Dashboard

🔧 Prerequisites

  • OpenShift clusters with monitoring enabled (openshift-monitoring namespace present)

  • Access to each cluster via oc

  • External Grafana instance (VM or container)

  • cluster-admin access on each cluster to create the service account and role binding (the service account itself only needs cluster-monitoring-view)


✅ Step 1: Expose Thanos Querier in Each Cluster

Check if the Thanos route exists:

oc get route thanos-querier -n openshift-monitoring

If missing, create one:

oc expose svc thanos-querier -n openshift-monitoring --name=thanos-querier-route

Set TLS termination to reencrypt, since the thanos-querier service serves HTTPS behind the route:

oc patch route thanos-querier-route -n openshift-monitoring \
  -p '{"spec":{"tls":{"termination":"reencrypt"}}}'
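
To confirm the route and collect the hostname that Grafana will need, the checks can be wrapped in two small shell helpers. This is a minimal sketch: the function names thanos_route_url and thanos_route_tls are hypothetical, and it assumes oc is logged in to the cluster and that the route is either the default thanos-querier or the thanos-querier-route created above.

```shell
# Hypothetical helpers (not part of oc); both read the route from openshift-monitoring.

# Print the HTTPS base URL of the route.
# Usage: thanos_route_url [route-name]   (defaults to thanos-querier)
thanos_route_url() {
  local route="${1:-thanos-querier}"
  local host
  host="$(oc get route "$route" -n openshift-monitoring -o jsonpath='{.spec.host}')" || return 1
  if [ -z "$host" ]; then
    echo "route $route has no host" >&2
    return 1
  fi
  printf 'https://%s\n' "$host"
}

# Print the TLS termination type of the route (reencrypt is the value this guide expects).
# Usage: thanos_route_tls [route-name]
thanos_route_tls() {
  local route="${1:-thanos-querier}"
  oc get route "$route" -n openshift-monitoring -o jsonpath='{.spec.tls.termination}'
}
```

For example, thanos_route_tls thanos-querier-route should print reencrypt once the patch has been applied, and thanos_route_url thanos-querier-route prints the base URL to enter as the data source URL in Grafana.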

🔐 Step 2: Create a Service Account with Access

Create a minimal-privilege service account:

oc create sa grafana-sa -n openshift-monitoring
oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-sa -n openshift-monitoring

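Generate a token:

oc create token grafana-sa -n openshift-monitoring

Save this token; you'll use it in Grafana. Tokens created this way are time-limited, so pass --duration to request a longer lifetime (the cluster may cap it) and plan for rotation.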
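
Before moving to Grafana, it can help to confirm that the token is accepted by the route. The sketch below assumes the Thanos Querier route serves the Prometheus-compatible /api/v1/query endpoint and that curl is installed; thanos_query is a hypothetical helper name, and the URL and token are passed in as arguments.

```shell
# Hypothetical helper: run an instant PromQL query against a Thanos Querier route.
# Usage: thanos_query <base-url> <token> <promql>
thanos_query() {
  local url="$1" token="$2" query="$3"
  # -G sends the URL-encoded query as a GET parameter;
  # ${url%/} drops a trailing slash so the path is not doubled.
  curl -sS -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=${query}" \
    "${url%/}/api/v1/query"
}
```

Calling thanos_query with the route URL, the token, and up{job="kubelet"} should return JSON containing "status":"success". If the route presents a certificate that curl does not trust, add --cacert with the cluster's CA to the curl call rather than disabling verification.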


📡 Step 3: Set Up Grafana and Add Thanos as Data Sources

In the Grafana UI, go to Configuration > Data Sources (Connections > Data sources in newer versions) and add a Prometheus data source.

Repeat for each cluster with these settings:

  • Name: Thanos - Cluster X

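  • URL: https://<route-host>, e.g., https://thanos-querier-openshift-monitoring.apps.<cluster>.com for the default thanos-querier route (a route named thanos-querier-route gets a different hostname)

  • Auth: Bearer token, typically set as a custom HTTP header named Authorization with the value Bearer <token>

  • Skip TLS Verify: ✅ only if using self-signed certs; adding the cluster's CA certificate to the data source is safer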
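
If you would rather not click through the UI for every cluster, Grafana can also load data sources from provisioning files. The helper below is a sketch that prints one such file per cluster; grafana_datasource_yaml is a hypothetical name, the bearer token is sent as a custom Authorization header, and the example values and file path in the usage note are assumptions to adapt to your installation.

```shell
# Hypothetical helper: print a Grafana data source provisioning file for one cluster.
# Usage: grafana_datasource_yaml <name> <url> <token> [skip-tls-verify: true|false]
grafana_datasource_yaml() {
  local name="$1" url="$2" token="$3" skip="${4:-false}"
  # The token goes into secureJsonData so Grafana stores it encrypted after loading.
  cat <<EOF
apiVersion: 1
datasources:
  - name: "${name}"
    type: prometheus
    access: proxy
    url: "${url}"
    jsonData:
      httpHeaderName1: Authorization
      tlsSkipVerify: ${skip}
    secureJsonData:
      httpHeaderValue1: "Bearer ${token}"
EOF
}
```

For example, redirect the output of grafana_datasource_yaml "Thanos - Cluster A" "https://<route-host>" "<token>" true into a file under Grafana's provisioning/datasources directory (commonly /etc/grafana/provisioning/datasources on package installs) and restart Grafana. The file holds the token in clear text, so restrict its permissions.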


📊 Step 4: Create a Dynamic Multi-Cluster Dashboard

Create a template variable:

Dashboard Settings > Variables > Add Variable:

Name: datasource
Type: Datasource
Datasource type: Prometheus

Then, in each panel:

  • Use ${datasource} as the data source.

  • Query example: up{job="kubelet"}

  • Add label filters such as namespace and pod (and cluster, where the metric carries that label)

The dashboard now switches between clusters through the datasource dropdown.
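
The variable and panel wiring described above can also be expressed as dashboard JSON and imported through the Grafana UI. This is a minimal sketch, not a complete dashboard: multi_cluster_dashboard_json is a hypothetical helper, the panel layout is arbitrary, and the exact JSON schema (for example, the form of the panel's datasource field) varies between Grafana versions.

```shell
# Hypothetical helper: print a minimal dashboard with a data source variable and one panel.
# Usage: multi_cluster_dashboard_json [title]
multi_cluster_dashboard_json() {
  local title="${1:-Multi-Cluster Overview}"
  # \$ keeps ${datasource} literal so Grafana, not the shell, expands it.
  cat <<EOF
{
  "title": "${title}",
  "templating": {
    "list": [
      { "name": "datasource", "type": "datasource", "query": "prometheus" }
    ]
  },
  "panels": [
    {
      "type": "timeseries",
      "title": "Kubelet targets up",
      "datasource": { "type": "prometheus", "uid": "\${datasource}" },
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 0 },
      "targets": [ { "refId": "A", "expr": "sum(up{job=\"kubelet\"})" } ]
    }
  ]
}
EOF
}
```

Saving the output to a file and importing it under Dashboards > Import should give a dashboard whose datasource dropdown lists every Prometheus-type data source added in Step 3.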


🧠 Bonus: Use Label Filters for Cluster-Specific Views

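Metrics carry a cluster label only if your monitoring configuration adds one (for example, as an external label); a stock cluster may not set it. Where the label is present, use it in PromQL: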

sum(up{job="kubelet"}) by (cluster)

You can also add namespace or workload-level filters.


🔒 Security Tips

  • Use a dedicated grafana-sa service account with only the cluster-monitoring-view role

  • Store tokens securely (e.g., in a secrets manager)

  • Use HTTPS for all endpoints

  • Monitor for expired tokens
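
One way to monitor for expired tokens is to read the exp claim from the token itself. The sketch below assumes the service account token is a JWT with a numeric exp claim and that base64 -d and date +%s are available (GNU coreutils behaviour; other platforms may differ). jwt_exp and token_valid_for are hypothetical helper names.

```shell
# Hypothetical helper: print the "exp" claim (Unix seconds) from a JWT's payload.
# Usage: jwt_exp <token>
jwt_exp() {
  local payload
  # Take the middle (payload) segment and map base64url characters to standard base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it to a multiple of 4 characters.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | sed -n 's/.*"exp":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the token stays valid for at least <seconds> more (default 3600).
# Returns 2 if no exp claim could be read.
# Usage: token_valid_for <token> [seconds]
token_valid_for() {
  local exp now
  exp="$(jwt_exp "$1")"
  [ -n "$exp" ] || return 2
  now="$(date +%s)"
  [ $(( exp - now )) -ge "${2:-3600}" ]
}
```

A scheduled job could call token_valid_for "$TOKEN" 86400 and raise an alert or rotate the token when it returns non-zero.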


🎉 Conclusion

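With Thanos and Grafana, you can monitor all your OpenShift clusters from a single pane of glass. The setup relies on read-only service account tokens and HTTPS routes, scales by adding one data source per cluster, and can be extended with your own dashboards and queries.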


✅ Follow me for more OpenShift + DevOps content.

💬 Questions? Drop them in the comments!
