🚀 Overview

Monitoring OpenShift clusters is critical for visibility, reliability, and performance. OpenShift includes Prometheus and Thanos out-of-the-box, which can be leveraged to create a powerful multi-cluster monitoring dashboard using Grafana.

This guide walks you through the setup step by step, from exposing Thanos Querier on each cluster to building a centralized Grafana dashboard.

🧱 Architecture Diagram

    Cluster A           Cluster B           Cluster C
┌────────────────┐  ┌────────────────┐  ┌────────────────┐
│ Thanos Querier │  │ Thanos Querier │  │ Thanos Querier │
└───────┬────────┘  └───────┬────────┘  └───────┬────────┘
        ▼                   ▼                   ▼
  HTTPS Route A       HTTPS Route B       HTTPS Route C
         \                  |                  /
          \                 |                 /
           \                ▼                /
              🌐 External Grafana Dashboard

🔧 Prerequisites

OpenShift clusters with monitoring enabled (openshift-monitoring namespace present)
Access to each cluster via oc
External Grafana instance (VM or container)
Cluster-admin access on each cluster (needed to create the route, the service account, and the cluster role binding)

✅ Step 1: Expose Thanos Querier in Each Cluster

Check if the Thanos route exists:

oc get route thanos-querier -n openshift-monitoring

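If missing, create one. A plain oc expose produces an unencrypted route, so create a reencrypt route directly. The thanos-querier service has several ports, so name the target port explicitly (web is the port the default route uses; confirm with oc get svc thanos-querier -n openshift-monitoring):

oc create route reencrypt thanos-querier-route -n openshift-monitoring \
  --service=thanos-querier --port=web

If a route already exists with a different TLS mode, set it to reencrypt:

oc patch route thanos-querier-route -n openshift-monitoring \
  -p '{"spec":{"tls":{"termination":"reencrypt"}}}'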
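
To get the URL you will later paste into Grafana, you can read the host from the route rather than guessing it. The following is a minimal sketch; the function name thanos_url is made up for this example, and it assumes you are logged in to the cluster with oc:

```shell
# Print the HTTPS base URL of a Thanos Querier route.
# $1 = route name (defaults to the built-in thanos-querier route).
# thanos_url is a hypothetical helper name, not part of oc.
thanos_url() {
  local route="${1:-thanos-querier}"
  local host
  host="$(oc get route "$route" -n openshift-monitoring \
    -o jsonpath='{.spec.host}')" || return 1
  if [ -z "$host" ]; then
    echo "route $route has no host" >&2
    return 1
  fi
  printf 'https://%s\n' "$host"
}
```

Call thanos_url for the default route, or thanos_url thanos-querier-route if you created the route by hand in this step.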

🔐 Step 2: Create a Service Account with Access

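Create a dedicated service account with read-only access to metrics. Pass -n on both commands so that -z resolves the service account in openshift-monitoring and not in your current project:

oc create sa grafana-sa -n openshift-monitoring
oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-sa -n openshift-monitoring

Generate a token. Tokens from oc create token are short-lived by default; add --duration (for example --duration=720h) to request a longer lifetime, which the cluster may cap:

oc create token grafana-sa -n openshift-monitoring

Save this token. You'll use it in Grafana.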
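
Before moving on to Grafana, it can help to confirm that the token actually works against the route. The sketch below sends one instant query to the standard Prometheus HTTP API path /api/v1/query and interprets the status code. The function name check_thanos is hypothetical, and the meaning attached to each status is the usual one, not something specific to this setup:

```shell
# Send one instant query to a Thanos Querier URL and explain the HTTP status.
# $1 = base URL of the route, $2 = bearer token.
# check_thanos is a hypothetical helper name.
check_thanos() {
  local url="$1" token="$2" status
  status="$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${token}" \
    --get --data-urlencode 'query=up{job="kubelet"}' \
    "${url}/api/v1/query")" || status="000"
  case "$status" in
    200) echo "ok" ;;
    401) echo "unauthorized: token missing, invalid, or expired" ;;
    403) echo "forbidden: service account lacks cluster-monitoring-view" ;;
    *)   echo "unexpected HTTP status: ${status}" ;;
  esac
  # Return success only when the query was accepted.
  [ "$status" = "200" ]
}
```

If the route uses a certificate your machine does not trust, curl fails before any status is returned and the function reports status 000. In that case, add curl's --cacert option with the cluster's CA file.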

📡 Step 3: Set Up Grafana and Add Thanos as Data Sources

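In the Grafana UI, go to Configuration > Data Sources (Connections > Data sources in newer versions) and add a Prometheus data source.

Repeat for each cluster:

Name: Thanos - Cluster X
URL: https://<route host>, typically https://thanos-querier-openshift-monitoring.apps.<cluster-domain> (use the host shown by oc get route)
Auth: add a custom HTTP header named Authorization with the value Bearer <token>
Skip TLS Verify: ✅ only if the route uses a self-signed cert and you cannot add the cluster's CA certificate to the data source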
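
If you would rather not click through the UI for every cluster, Grafana can also load data sources from a provisioning file. The sketch below generates such a file from a list of clusters. The function names and the "name url token" input format are assumptions made for this example; the YAML keys (httpHeaderName1, httpHeaderValue1, access: proxy) are the ones Grafana's data source provisioning uses for custom headers:

```shell
# Emit one Grafana data source entry in provisioning-file YAML.
# $1 = cluster name, $2 = Thanos URL, $3 = bearer token.
datasource_entry() {
  cat <<EOF
  - name: "Thanos - $1"
    type: prometheus
    access: proxy
    url: "$2"
    jsonData:
      httpHeaderName1: Authorization
    secureJsonData:
      httpHeaderValue1: "Bearer $3"
EOF
}

# Read "name url token" triples from stdin (one cluster per line,
# no spaces inside a field) and emit a complete provisioning document.
provisioning_file() {
  printf 'apiVersion: 1\ndatasources:\n'
  local name url token
  while read -r name url token; do
    if [ -n "$name" ]; then
      datasource_entry "$name" "$url" "$token"
    fi
  done
}
```

Pipe one line per cluster into provisioning_file and write the result to a file in Grafana's provisioning/datasources directory. The output contains the tokens in plain text, so restrict the file's permissions.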

📊 Step 4: Create a Dynamic Multi-Cluster Dashboard

Create a template variable:

Dashboard Settings > Variables > Add Variable:

Name: datasource
Type: Datasource
Datasource type: Prometheus

Then, in each panel:

Select ${datasource} as the data source.
Query example: up{job="kubelet"}
Add label filters such as namespace or pod to narrow the view

The dashboard now shows a dropdown at the top, and choosing a data source there switches every panel to that cluster.

🧠 Bonus: Use Label Filters for Cluster-Specific Views

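Metrics only carry a cluster label if one has been configured for them (for example as an external label), so check that it exists before relying on it. Where it is present, you can use it in PromQL:

sum(up{job="kubelet"}) by (cluster)

With one data source per cluster, the datasource dropdown already separates the clusters; the cluster label matters mainly when a single data source aggregates several of them. You can also add namespace or workload-level filters.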

🔒 Security Tips

Use dedicated grafana-sa with only cluster-monitoring-view role
Store tokens securely (e.g., in secrets manager)
Use HTTPS for all endpoints
Monitor for expired tokens and rotate them before they lapse
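
One way to watch for expiry is to read the exp claim from the token itself. The sketch below assumes the token is a JWT with a numeric exp claim, which is the case for tokens issued by oc create token; the function names are hypothetical and the one-hour default threshold is arbitrary:

```shell
# Print the expiry time (Unix seconds) from a JWT's "exp" claim.
# Assumes the token has the usual three dot-separated JWT parts.
token_exp() {
  local payload
  # Take the payload part and convert base64url to standard base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d \
    | sed -n 's/.*"exp" *: *\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the token expires within the next $2 seconds (default 3600),
# or if no expiry could be read, so unreadable tokens also get flagged.
token_expiring() {
  local exp now
  exp="$(token_exp "$1")"
  now="$(date +%s)"
  [ -z "$exp" ] || [ $(( exp - now )) -lt "${2:-3600}" ]
}
```

A scheduled job could run token_expiring against each stored token and trigger a fresh oc create token when it succeeds. The check only reads the claim; it does not verify the token's signature.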

🎉 Conclusion

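With Thanos and Grafana, you can monitor all your OpenShift clusters from a single pane of glass. Access stays read-only through a dedicated service account, adding a cluster only takes one more route, token, and data source, and the dashboards remain fully customizable.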


Centralized Monitoring for Multi-Cluster OpenShift with Prometheus, Thanos & Grafana