Golden Signals with Google Cloud: Built-In Metrics + Managed Prometheus


Introduction: The Power of Golden Signals at the Edge
It’s late Friday afternoon—the kind of time you really don’t want something to go wrong—when the reports start coming in: the mobile app is painfully slow, eventually showing a generic “something went wrong” message.
I pull up the dashboards, and the problem is instantly visible at the edge: the API Gateway is returning a surge of 500 errors, with noticeable spikes in latency. It’s clear the issue is affecting all users.
Digging into the microservices, I find that one of the internal services—critical to processing user requests—has become extremely slow to respond. The upstream services that depend on it begin timing out and cancelling their HTTP requests, which in turn triggers error responses all the way up to the API Gateway. The result? A perfect storm of 5xx errors visible to every client.
The root cause is simple: the service is at capacity, and needs to be scaled up. Once I increase the replicas and give it some headroom, the latency drops, the errors disappear, and the app goes back to normal—with no issues throughout the weekend.
That experience reinforces something essential: problems at the edge are often the earliest and clearest indicators that something is wrong. If you’re only monitoring what’s happening deep inside your applications, you’re reacting too late.
That’s where the Golden Signals come in:
Latency – how long it takes to respond
Traffic – how much is going through
Errors – how many requests are failing
Saturation – how full or constrained your system is
These signals, when monitored at the HTTP Load Balancer or API Gateway level, give you instant visibility into how your system behaves from the user’s perspective. They also help quickly localize root causes and spot patterns—whether it’s a misbehaving deployment, a resource bottleneck, or an external abuse attempt.
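To make this concrete, here is a minimal PromQL sketch of the four signals, assuming your services expose a standard http_requests_total counter (with a code label) and an http_request_duration_seconds histogram; the metric and label names are illustrative, so adapt them to your own instrumentation:

# Traffic: requests per second over the last 5 minutes
sum(rate(http_requests_total[5m]))

# Errors: share of responses that are 5xx (assumes a "code" label)
sum(rate(http_requests_total{code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Latency: 95th percentile, assuming a request-duration histogram
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Saturation: CPU used per pod (from kubelet/cAdvisor metrics)
sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))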
In this post, I’ll walk through how we combined Google Cloud’s built-in metrics—like those from the HTTP Load Balancer—with Google Managed Prometheus (GMP) for a complete observability setup in GKE. You’ll see how to expose and scrape custom application metrics using GMP, while relying on GCP’s free, automatically collected signals for critical edge-level visibility. Together, these tools offer a powerful foundation for understanding performance issues, improving reliability, and even catching early signs of security problems—without over-complicating your monitoring stack or blowing your budget.
Why Google Managed Prometheus?
Prometheus is a fantastic tool—widely adopted, open-source, and supported by a rich ecosystem of exporters. But anyone who’s run it at scale knows: it starts simple, and gets complex fast.
In self-managed environments, you often end up with:
Multiple Prometheus instances across clusters
Manual federation or setup of Thanos for long-term storage and global querying
Separate Grafana stacks tied to each environment
Exposing /metrics endpoints across VPCs, dealing with firewall rules, TLS configs, IAM, and so on
Sure, self-managed Prometheus is flexible—you control retention, scrape frequency, and storage costs. If you're running in a cost-sensitive setup, you can scrape more aggressively and store locally to avoid egress or ingestion fees. But the operational overhead adds up quickly, especially with:
Private networks (harder to expose scrape targets)
Multi-cluster environments
Security concerns (do you really want a public metrics endpoint open to the world?)
That’s where Google Managed Prometheus (GMP) shines.
With GMP:
You deploy a lightweight collector in GKE, and it automatically pushes metrics to the backend over Google’s internal APIs—no need to expose your app’s /metrics endpoints or open up firewall rules.
You get Prometheus compatibility with PodMonitoring, ClusterPodMonitoring, and existing exporters.
All the hard parts—storage, scaling, retention, querying—are fully managed.
You can query across regions and clusters from a single Grafana or Cloud Monitoring dashboard.
And you don’t need to worry about running or patching a Prometheus server, ever.
💰 How GMP Pricing Works
Google Managed Prometheus is billed based on the number of ingested samples (see https://cloud.google.com/stackdriver/pricing).
As of writing, ingestion starts at $0.06 per million samples, with cheaper tiers at higher volumes (see the pricing page linked above for details).
A sample is basically a “row” in a /metrics endpoint. ⚠️ Histogram or distribution metrics can generate many time series per label combination—watch out for costs with these.
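To see why histograms add up, here is roughly what a single histogram looks like in the /metrics text exposition format; every line below is one sample per scrape, per label combination (the metric name and buckets are just an illustration):

http_request_duration_seconds_bucket{path="/api",le="0.1"} 240
http_request_duration_seconds_bucket{path="/api",le="0.5"} 310
http_request_duration_seconds_bucket{path="/api",le="+Inf"} 322
http_request_duration_seconds_sum{path="/api"} 48.7
http_request_duration_seconds_count{path="/api"} 322

That is already five rows for one label set and only three buckets; a real histogram with a dozen buckets and several label values multiplies quickly.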
Example: Scraping two simple gauge metrics every minute:
2 samples Ă— 60 (minutes per hour) Ă— 8760 (hours per year) = 1,051,200 samples/year
At $0.06 per million samples, that’s:
1.05 million samples/year × $0.06 per million samples ≈ $0.06/year
💡 Quick tip: Scraping every 30s is twice as expensive as scraping every minute.
Is it more expensive than running your own stack on a per-metric basis? Sometimes, yes. But in exchange, you save a huge amount of operational effort—and that’s not even counting the security benefits of keeping your metrics in GCP’s managed service. And if you're scraping smartly—only the metrics you care about, at a resolution that matches your use case—GMP can be surprisingly cost-efficient. For many teams, the trade-off is worth it: less infrastructure to maintain, fewer moving parts, and faster time to insight.
Architecture Overview and Practical Setup
🏗️ How Google Managed Prometheus Fits In
Let’s look at the overall architecture (👇 see diagram) to understand where GMP sits and how it connects:
Built-in GCP metrics (e.g., load balancer, VM, GKE, bucket stats) flow directly into Monarch, Google’s global time-series database.
Custom application metrics (from /metrics endpoints in your pods) are scraped by the GMP write proxy, which pushes them to Monarch over Google’s internal network.
Both sets of data—built-in and custom—can be queried:
via GCP’s Metrics Explorer, Dashboards, and Alert Policies
or through Grafana, using the GMP read proxy, with support for PromQL.
👉 This design lets you centralize visibility across built-in and custom metrics without exposing anything over the public internet.
🧭 Scoping Projects and Multi-Project Monitoring
If you're working in an organization with multiple GCP projects (per environment, per team, etc.), it's a good idea to designate a scoping project—a central Monitoring Workspace that can access metrics from multiple child projects.
This is just a regular GCP project where, in the Cloud Console → Monitoring, you configure the list of child projects it should observe. Once that's set up, the scoping project will be able to read metrics from all configured children.
You can then deploy your GMP read proxy in this project and use it as the data source in Grafana. That gives you centralized access to metrics from all environments, allowing you to build cross-project dashboards and alerts without needing to deploy Prometheus or Grafana everywhere.
This setup avoids the complexity of maintaining separate monitoring stacks per project or cluster—and gives platform teams a unified view of the entire system.
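As an illustration, a Grafana provisioning file for that data source could look like the sketch below; it assumes the GMP read proxy (often deployed as the prometheus-engine "frontend") is reachable at a cluster-internal address, so the URL, service name, and namespace are hypothetical and should be replaced with your own:

apiVersion: 1
datasources:
  - name: Google Managed Prometheus
    type: prometheus
    access: proxy
    # URL of the GMP read proxy ("frontend") running in the scoping project
    # (hypothetical service name and namespace)
    url: http://frontend.monitoring.svc.cluster.local:9090
    isDefault: true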
🔍 Quick tip: To query GCP’s built-in metrics with PromQL, check out the official mapping reference:
👉 GCP PromQL Mapping Guide
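For example, the load balancer request count metric (loadbalancing.googleapis.com/https/request_count) maps to a PromQL-style name under that convention, so an edge error-rate query might look roughly like this; verify the exact metric and label names against the mapping guide before relying on it:

# Share of 5xx responses at the external HTTP(S) load balancer
# (name derived from loadbalancing.googleapis.com/https/request_count
#  via the documented mapping convention)
  sum(rate(loadbalancing_googleapis_com:https_request_count{response_code_class="500"}[5m]))
/
  sum(rate(loadbalancing_googleapis_com:https_request_count[5m]))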
⚙️ Controlling What You Scrape (and What You Pay For)
By default, Prometheus can scrape a lot of metrics—many of which you don’t care about. That’s why it’s smart to use the OperatorConfig resource to control:
Which metrics are ingested
At what interval
Whether to scrape the kubelet, system components, etc.
Here’s an example:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: config
  namespace: gmp-public
spec:
  collection:
    filter:
      matchOneOf:
        - '{__name__="http_requests_total"}'
        - '{__name__="container_cpu_usage_seconds_total"}'
    kubeletScraping:
      interval: 180s
  features:
    targetStatus:
      enabled: true
This helps you stay under budget by reducing unnecessary ingestion, and it keeps your observability footprint under control.
🧵 PodMonitoring vs ClusterPodMonitoring
When defining targets to scrape, you have two options:
PodMonitoring is namespace-scoped. It’s great when teams define their own monitors in their own namespaces.
ClusterPodMonitoring is cluster-wide, and it’s especially useful for platform teams who manage observability centrally. For example, it’s ideal when you’ve standardized microservices—you can scrape all pods across all namespaces that match a common label set (e.g., app=api, team=platform) without needing to define a PodMonitoring in each namespace.
Here's a simple PodMonitoring example:
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kube-state-metrics
  namespace: gmp-public
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
    - port: http
      interval: 180s
Choose based on your team model—PodMonitoring for autonomy, ClusterPodMonitoring for central control.
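For the centralized model, a ClusterPodMonitoring sketch could look like this; the label selector reuses the app=api / team=platform convention from above, and the resource name and port name are illustrative:

apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: standard-microservices
spec:
  selector:
    matchLabels:
      app: api
      team: platform
  endpoints:
    - port: metrics   # name of the container port serving /metrics
      interval: 60s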
🧠 Conclusion
Monitoring isn’t just about uptime—it’s about understanding how your systems behave, where they break, and how quickly you can react. That visibility starts at the edge: with golden signals, built-in GCP metrics, and a clear view of what your users are experiencing.
By combining Google Cloud’s built-in metrics with Google Managed Prometheus, you can build a monitoring stack that’s both deep and wide—covering everything from infrastructure to application-level insights. You get Prometheus compatibility without the operational headache, and you avoid duplicating Grafana/Prometheus instances across every cluster or project.
When you layer in good practices—like using a scoping project, filtering metrics with OperatorConfig, and choosing the right PodMonitoring or ClusterPodMonitoring approach—you get cost-effective, secure, and scalable observability.
The end goal? Fewer Friday surprises, faster root cause detection, and smoother weekends, with unexpected behavior surfacing early—whether it’s a failing deploy or suspicious activity.
Written by Thibaut Tauveron
👋 Hi, I’m a cloud engineer and cybersecurity enthusiast based in Zürich. I’ve worn many hats over the years—developer, DevOps consultant, SRE, cloud architect—and what ties it all together is a passion for building secure, scalable systems that just work. I write about cloud infrastructure, DevSecOps, and anything that helps teams move faster without breaking things. I believe in automation, simplicity, and sharing knowledge—whether through blog posts, open source, or mentoring.