How to Set Up Alertmanager for Monitoring Slow Response Times


Previously, we created an Axum server with a metrics endpoint, scraped it with Prometheus, and visualized the request counts in a Grafana dashboard.

In this article we set up alert rules in Prometheus to detect slow endpoints and forward alerts to Alertmanager. We add a delayed-response endpoint to the Axum server, write a Prometheus rule that fires when response times cross a threshold, and point Alertmanager at a webhook receiver so we can see the alerts it sends. Along the way we cover the Docker Compose, Prometheus rule, and Alertmanager configuration needed, and finish by triggering the alert with deliberately slow requests.

Slow Endpoint

Create an endpoint in axum that waits an arbitrary period of time before returning:

use axum::{extract::Path, response::Html};

async fn delay(Path(milliseconds): Path<u64>) -> Html<String> {
    // sleep for the number of milliseconds given in the path parameter
    tokio::time::sleep(std::time::Duration::from_millis(milliseconds)).await;
    // return a simple HTML response
    Html(format!("<p>Delayed for <em>{milliseconds}</em> milliseconds</p>"))
}

Then add it to the axum router:

#[tokio::main]
async fn main() {
    // build our application with a route
    let app = Router::new()
        .route("/", get(handler))
        // add the new delay route
        .route("/delay/{milliseconds}", get(delay));

    // ...the rest of main (metrics setup and serving the app) is unchanged from the previous article
}

Now visit http://localhost:3000/delay/100 to trigger a delayed request. You should see Delayed for 100 milliseconds.

Prometheus Rules

Open the Prometheus query page in your browser (default http://localhost:9090). Use the query input to create and test an alert expression of your choice. We will build one for slow response times, but this is also a good chance to experiment with PromQL.

http_request_duration_seconds holds the request durations. Filtering to the hello-world job and the 0.5 quantile gives the median response time, which is 0 here because no requests have been made yet.

Next, let's query for median response times longer than 200ms. This will not return anything until we trigger some delayed requests, which you can do by visiting http://localhost:3000/delay/400.
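
For reference, these are the two expressions to try in the query box; they assume the hello-world job name and the duration metric from the previous article:

# median (0.5 quantile) request duration for the hello-world job
http_request_duration_seconds{job="hello-world",quantile="0.5"}

# only return series whose median response time is above 200ms (0.2 seconds)
http_request_duration_seconds{job="hello-world",quantile="0.5"} > 0.2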

Create a new file named rules.yml in the root of the project:

groups:
  - name: Delay greater than 200ms
    rules:
      - alert: DelayGreaterThan200ms
        expr: http_request_duration_seconds{job="hello-world",quantile="0.5"} > 0.2
        for: 1s

The rules file needs to be added to the volumes of the prometheus container; map it to /etc/prometheus/rules.yml:

  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules.yml:/etc/prometheus/rules.yml
      - prometheus-data:/prometheus

Now update the prometheus.yml file to include the rules file:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
rule_files:
  - rules.yml

Restart the Docker Compose stack and visit the Prometheus UI. The new alert should be visible, but Inactive, since no slow responses have been recorded yet.
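
If you are running the stack from the compose file in the project root, re-creating the services is one way to do that:

# re-create any services whose definition changed (picks up the new rules.yml mount)
docker compose up -d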

Alertmanager

Alerting with Prometheus is split into two parts. Alerting rules in the Prometheus server send alerts to an Alertmanager, which then handles them: silencing, inhibition, aggregation, and sending notifications through channels such as email, on-call systems, and chat platforms.

Let's configure Alertmanager to trigger a webhook receiver. Visit https://webhook.site and copy your unique URL. Create an alertmanager.yml file at the root of your project with the webhook URL:

global:
  resolve_timeout: 5m
route:
  receiver: webhook_receiver

receivers:
  - name: webhook_receiver
    # This is a webhook receiver that sends alerts to a specified URL.
    # It can be used to integrate with external systems or services.
    # create one at https://webhook.site/
    webhook_configs:
      - url: 'https://webhook.site/your-unique-webhook-id'
        send_resolved: false

Add Alertmanager to the docker-compose.yaml file:

services:
  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules.yml:/etc/prometheus/rules.yml
      - prometheus-data:/prometheus
  alert-manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager-data:/alertmanager
volumes:
  grafana-data:
    external: false
  prometheus-data:
    external: false
  alertmanager-data:
    external: false

Tell Prometheus how to reach Alertmanager by adding an alerting configuration to prometheus.yml:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
rule_files:
  - rules.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - host.docker.internal:9093

Now restart the Docker Compose containers and check that the Alertmanager console is visible at http://localhost:9093.
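
You can also check from the command line; Alertmanager exposes a status endpoint on its v2 API:

# returns JSON describing the running Alertmanager, including the loaded configuration
curl -s http://localhost:9093/api/v2/status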

Triggering an Alert

Visit http://localhost:3000/delay/400 and reload the page a few times until the Prometheus alert state changes from Inactive to Pending and then Firing.
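
If you would rather not keep hitting refresh, a quick loop from a terminal does the same thing:

# fire ten 400ms requests at the delay endpoint
for i in $(seq 1 10); do curl -s http://localhost:3000/delay/400 > /dev/null; done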

Alertmanager should show the alert after Prometheus has fired it.

Finally, the webhook should have been called; webhook.site shows the JSON body that Alertmanager posted.
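
The request body follows Alertmanager's webhook payload format. An abridged example of what arrives at the receiver looks roughly like this (labels, timestamps, and URLs will differ in your setup):

{
  "version": "4",
  "status": "firing",
  "receiver": "webhook_receiver",
  "commonLabels": {
    "alertname": "DelayGreaterThan200ms",
    "job": "hello-world"
  },
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "DelayGreaterThan200ms",
        "job": "hello-world",
        "quantile": "0.5"
      },
      "annotations": {},
      "startsAt": "2024-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090/graph?g0.expr=..."
    }
  ]
}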

Full source code: https://github.com/vicero/rust-prometheus-grafana
Diff for the entire article.

Written by James Kessler

I'm a software developer with over 20 years of experience building robust, scalable systems across a range of industries.