Building a Full Monitoring and Alerting Pipeline with Prometheus and PagerDuty

Subroto Sharma

Introduction

Modern infrastructure demands proactive monitoring and instant incident response. This guide walks through integrating Prometheus (monitoring), Grafana (visualization), and PagerDuty/Slack (alerting) to create a robust system that detects issues, visualizes metrics, and notifies teams in real time.

Core Components Overview

  • Node Exporter: Collects system metrics (CPU, memory, disk) from Linux servers.

  • Prometheus: Scrapes, stores, and analyzes time-series data; triggers alerts via rules.

  • Grafana: Visualizes Prometheus data into dashboards.

  • Alertmanager: Routes Prometheus alerts to Slack and PagerDuty.

Node Exporter

Node Exporter is a Prometheus exporter for collecting hardware and OS-level metrics from Linux machines. It gathers data like:

  • CPU usage

  • Memory usage

  • Disk I/O

  • Network stats

  • Filesystem usage

  • System load

👉 In simple terms:
It’s a small agent you install on your server to expose system metrics that Prometheus can scrape and monitor.

⚙️ How Does Node Exporter Work?

1️⃣ Node Exporter runs on your server. It collects system metrics using Linux’s /proc and /sys filesystems.

2️⃣ Exposes metrics via HTTP. The metrics are available at: http://<server_ip>:9100/metrics

3️⃣ Prometheus scrapes the data. Prometheus, your monitoring tool, pulls the metrics from Node Exporter at regular intervals.

4️⃣ Visualize in Grafana. Grafana can use Prometheus data to build beautiful dashboards.

Diagram Idea:

Here's how the pieces fit together:

[ Linux Server ]
      |
      |  (1️⃣ Collect)
[ Node Exporter ]
      |
      |  (2️⃣ Expose HTTP)
      |
[  Prometheus  ]
      |
      |  (3️⃣ Scrape)
      |
[ Time-Series DB ]
      |
      |  (4️⃣ Visualize)
      |
[ Grafana Dashboard ]
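
To sanity-check the exporter from the command line, you can hit the metrics endpoint directly. A minimal check, assuming Node Exporter is already running locally on the default port 9100:

# Fetch the raw metrics and show a few CPU-related samples
curl -s http://localhost:9100/metrics | grep "^node_cpu_seconds_total" | head -n 5

# Quick liveness check: the endpoint should return HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9100/metrics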

Prometheus

Prometheus is an open-source monitoring and alerting toolkit. It’s designed to collect metrics from your applications and infrastructure, store them in a time-series database, and let you query, visualize, and alert on that data.

👉 In simple terms:
*It keeps track of your systems’ health and performance over time.*

🔧 Key Features:

  • Time-series data: Metrics are stored with a timestamp.

  • Pull-based: Prometheus scrapes metrics from endpoints.

  • PromQL: Its powerful query language.

  • Built-in alerts: Trigger alerts based on metric thresholds.

  • Easy integration: Works great with Grafana for dashboards.

⚙️ How Does Prometheus Work?

1️⃣ Target exposes metrics: Your application (or Node Exporter, etc.) exposes data at an HTTP endpoint (e.g., /metrics).

2️⃣ Prometheus scrapes: Prometheus polls these endpoints at regular intervals and collects metrics.

3️⃣ Stores in time-series DB: All metrics are saved with labels and timestamps.

4️⃣ Query & visualize: You can run PromQL queries to explore data or use Grafana for dashboards.

5️⃣ Alerts: Prometheus can trigger alerts if metrics meet certain conditions (via Alertmanager).

🖼️ Diagram Idea:

[ Your App / Node Exporter ]
          |
          | (1️⃣ Expose metrics)
          |
[ HTTP Endpoint: /metrics ]
          |
          | (2️⃣ Scrape)
          |
[ Prometheus Server ]
    |        |         |
(3️⃣ Store) (4️⃣ Query) (5️⃣ Alert)
    |                  |
[ Time-Series DB ]  [ Alertmanager ]
          |
          |
(6️⃣ Visualize)
          |
[ Grafana Dashboard ]
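
Besides the built-in web UI, Prometheus also exposes an HTTP API that you can query with curl. A small sketch, assuming Prometheus is listening on localhost:9090 and already scraping at least one target:

# Is every scrape target up? (1 = up, 0 = down)
curl -s 'http://localhost:9090/api/v1/query?query=up'

# Average 5-minute CPU usage per instance, as a percentage
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'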

Grafana

Grafana is an open-source visualization and analytics platform. It takes data from multiple sources (like Prometheus, MySQL, Elasticsearch) and lets you build dashboards, graphs, and charts to monitor and analyze your data in real-time.

👉 In simple words:
*Grafana is your **dashboard tool** that turns raw metrics into beautiful, interactive visualizations.*

🎯 Key Features:

  • Multi-source: Supports Prometheus, InfluxDB, Elasticsearch, AWS CloudWatch, and more.

  • Custom Dashboards: Build interactive graphs, charts, and tables.

  • Alerting: Set thresholds and alerts (Slack, Email, PagerDuty, etc.).

  • Templating: Use variables to create dynamic dashboards.

  • User Management: Teams & permission control.

  • Plugins: Add panels & data sources.

⚙️ How Does Grafana Work?

1️⃣ Connect Data Source: You first connect Grafana to a data source (e.g., Prometheus).

2️⃣ Query Data: Grafana uses the query language of that data source (e.g., PromQL for Prometheus) to fetch data.

3️⃣ Visualize: You build dashboards with panels (graphs, tables, heatmaps, etc.).

4️⃣ Set Alerts: You can add alert rules on any panel to notify you when thresholds are hit.

5️⃣ Share: Dashboards can be shared or embedded in other tools.

🖼️ Diagram Idea:

[ Prometheus / Data Sources ]
          |
          | (1️⃣ Connect)
          |
[ Grafana Data Source Layer ]
          |
          | (2️⃣ Query)
          |
[ Query Engine (PromQL/SQL) ]
          |
          | (3️⃣ Visualize)
          |
[ Dashboards & Panels ]
    |            |
(4️⃣ Alerts)  (5️⃣ Share)

Simple architecture:

+---------------------+
|    Data Sources     |
| (Prometheus, etc.)  |
+---------------------+
          |
          v
+---------------------+
|       Grafana       |
|  - Query Engine     |
|  - Dashboards       |
|  - Alerts           |
+---------------------+
          |
          v
+---------------------+
|  You & Your Team    |
|  - View & Analyze   |
|  - Get Alerts       |
+---------------------+

Service Discovery

Service Discovery is the automatic detection of services in a network. It allows your applications (or systems like Prometheus) to find and connect to other services without manual configuration.

👉 In simple words:
*When new services (like web servers, databases) come online, Service Discovery makes sure they’re automatically found and monitored, without needing to edit config files every time!*

💡 Why Is It Important?

  • In dynamic environments (like Kubernetes, Docker Swarm, Cloud), services are constantly starting, stopping, scaling.

  • Instead of hardcoding IPs and ports, Service Discovery tracks these changes automatically.
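
For example, since this guide runs everything on EC2, Prometheus could discover the Node Exporter targets automatically instead of listing DNS names by hand. Below is a hedged sketch of an ec2_sd_configs scrape job; the region, the Monitoring tag, and the IAM permissions are assumptions, and the static_configs approach used later in this guide works just as well:

scrape_configs:
  - job_name: 'ec2-node-exporters'
    ec2_sd_configs:
      - region: us-east-1        # assumed region; needs ec2:DescribeInstances (e.g. via an instance IAM role)
        port: 9100               # Node Exporter port
    relabel_configs:
      # Keep only instances carrying a hypothetical Monitoring=enabled tag
      - source_labels: [__meta_ec2_tag_Monitoring]
        regex: enabled
        action: keep
      # Use the instance Name tag as the instance label on all metrics
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance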

Alertmanager

Alertmanager is a component of Prometheus that handles alerts generated by Prometheus. It manages:

  • Routing alerts

  • Silencing unnecessary alerts

  • Grouping similar alerts

  • Sending notifications via email, Slack, PagerDuty, Opsgenie, etc.

👉 In simple words:
*Prometheus watches your systems, and when something goes wrong, Alertmanager is the one that tells your team via a message!*

🔧 Key Features:

  • Multi-channel alerts: Email, Slack, PagerDuty, Opsgenie, Webhooks, etc.

  • Deduplication: Combines repeated alerts into one.

  • Grouping: Sends related alerts together.

  • Silencing: Mute alerts during maintenance.

  • Inhibition: Suppress lower-priority alerts if higher-priority ones are firing.

⚙️ How Does Alertmanager Work?

1️⃣ Prometheus Rules: Prometheus uses alerting rules (in YAML) to define when an alert should fire.

2️⃣ Alert Sent: When a condition is met, Prometheus sends the alert to Alertmanager.

3️⃣ Alertmanager Processing:

  • Groups similar alerts together.

  • Applies silences and inhibition rules.

  • Decides the routing (who/where to send).

4️⃣ Notification Sent:
Alertmanager sends notifications to the configured receivers (Slack, email, etc.).

Now it’s time for configuration.

Part 1: Launch Servers with Node Exporter

Let’s proceed with launching the server using EC2:

  1. Go to the EC2 dashboard and click on Launch EC2 Instance.

  2. Enter an appropriate name for your instance.

  3. Select the operating system. In my case, I am using Ubuntu (latest version 24.04).

  4. Choose the instance type (e.g., t2.medium).

  5. Select your .PEM key for SSH access.

  6. Choose your VPC and the appropriate public subnet.

  7. Enable Auto-assign Public IP.

  8. Select the Security Group.

  9. For storage, you can keep the default settings or increase the size as needed—it’s up to you.

  10. In the User Data section, insert the following installation script.

#!/bin/bash

# Update package lists
apt update

# Create node_exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz -P /tmp

# Extract the downloaded tarball
cd /tmp
tar xvf node_exporter-1.3.1.linux-amd64.tar.gz

# Move the binary to /usr/local/bin
sudo cp /tmp/node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

# Create a systemd service file for node_exporter
sudo bash -c 'cat <<EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF'

# Reload systemd, start and enable node_exporter
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Verify the service is running
sudo systemctl status node_exporter --no-pager

I have inserted the user data, and before clicking on Launch Instance, I increased the number of instances from 1 to 3 because I want to launch 3 instances at the same time. After launching, I will rename the instances accordingly.

Here is my Instance

We have installed Node Exporter on all the launched servers during the server launch by inserting the script in the User Data section.

After the servers are running, copy your server’s IP address and access it in your browser using port 9100 (e.g., http://<your-server-ip>:9100). You will see a screen similar to the screenshot below.

⚠️ Important: Make sure to allow port 9100 in your Security Group. If this port is not allowed, the page will not load.
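
If you prefer the CLI over the console, the rule can also be added with the AWS CLI. A sketch with placeholder values (replace the security group ID and CIDR with your own; in production you would restrict the source to your Prometheus server rather than 0.0.0.0/0):

# Allow inbound TCP 9100 (Node Exporter) on the instances' security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 9100 \
  --cidr 0.0.0.0/0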

Click on Metrics and you will see the raw metrics output shown below.

Part 2: Prometheus Installation and Configuration

Now we will install Prometheus using the script below.

Save the installation script below to a file, for example prometheus_install.sh (the name is arbitrary). In my case, I will create it in the /root directory, but you can create it anywhere you prefer.

#!/bin/bash

# Create Prometheus user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.36.2/prometheus-2.36.2.linux-amd64.tar.gz -P /tmp

# Extract the archive
cd /tmp
tar xvf prometheus-2.36.2.linux-amd64.tar.gz

# Move binaries to /usr/local/bin and set ownership
sudo cp /tmp/prometheus-2.36.2.linux-amd64/prometheus /usr/local/bin/
sudo cp /tmp/prometheus-2.36.2.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Move consoles and console libraries to /etc/prometheus and set ownership
sudo cp -r /tmp/prometheus-2.36.2.linux-amd64/consoles /etc/prometheus/
sudo cp -r /tmp/prometheus-2.36.2.linux-amd64/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries

Now make the script executable and run it:

chmod +x prometheus_install.sh

./prometheus_install.sh

Our installation process is now complete. Next, we will configure Prometheus.

To do this, follow the script provided below.

⚠️ Important: Make sure to replace the DNS with your launched server’s DNS and create the Prometheus configuration file at /etc/prometheus/prometheus.yml.

global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['ec2-34-235-127-49.compute-1.amazonaws.com:9093']

rule_files:
  - alert_rules.yml

scrape_configs:
  - job_name: 'Prometheus-Grafana'
    static_configs:
      - targets: ['ec2-34-235-127-49.compute-1.amazonaws.com:9090'] # Prometheus server

  - job_name: 'node-exporter-server-1'
    static_configs:
      - targets: ['ec2-34-235-127-49.compute-1.amazonaws.com:9100'] # Prometheus-Grafana-NodeExporter

  - job_name: 'node-exporter-server-2'
    static_configs:
      - targets: ['ec2-18-212-75-197.compute-1.amazonaws.com:9100'] # Node Exporter on server 2
 
  - job_name: 'node-exporter-server-3'
    static_configs:
      - targets: ['ec2-54-89-190-75.compute-1.amazonaws.com:9100'] # Node Exporter on server 3

  - job_name: 'All_Servers'
    static_configs:
      - targets:
          - 'ec2-34-235-127-49.compute-1.amazonaws.com:9100'
          - 'ec2-18-212-75-197.compute-1.amazonaws.com:9100'
          - 'ec2-54-89-190-75.compute-1.amazonaws.com:9100' # Node Exporter all servers
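
Before starting (or reloading) Prometheus, it's worth validating the file with promtool, which we copied to /usr/local/bin along with the prometheus binary. Note that it will complain about alert_rules.yml until that file is created in Part 5:

# Validate prometheus.yml (promtool also tries to load any files listed under rule_files)
promtool check config /etc/prometheus/prometheus.yml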

Now we will configure the Prometheus service by creating a service file at /etc/systemd/system/prometheus.service using the following script.

# Create a file called prometheus.service at /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.enable-lifecycle \
--web.enable-admin-api \
--log.level=info

[Install]
WantedBy=multi-user.target

Now we will run the following command to start the Prometheus service:

# Reload systemd and start Prometheus service
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus

# Verify Prometheus is running
sudo systemctl status prometheus

# Check Prometheus binaries
ls -ltr /usr/local/bin/ | grep prom

curl -s -XPOST localhost:9090/-/reload

Now we will check whether the Prometheus server is accessible in the browser.

Copy your server’s IP address and access it using port 9090 (e.g., http://<your-server-ip>:9090). You should see the following screen.

Next, click on the "Status" menu, and then select "Targets." There, you will see metrics from the other servers. In this section, you can run different queries to monitor server metrics such as memory, disk, CPU, and I/O device usage.
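
The same target health information is available over the HTTP API if you want to check it from a shell instead of the browser. A quick sketch, assuming Prometheus runs locally and jq is installed (sudo apt install jq):

# List every scrape target with its job, instance, and health status
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'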

⚠️ Note: Prometheus runs on port 9090, so make sure this port is allowed in your Security Group; otherwise, it will not be accessible.

Part 3: Grafana Installation and configuration

Now we will install Grafana using the commands below. Please run the following commands one by one.

sudo apt update
sudo apt-get install -y gnupg2 curl software-properties-common
curl https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt update


# NOTE: You might get a public-key error when you run the command above.
# You can resolve it by running the command below with the key shown in the error message:

# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 8C8C34C524098CB6

sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

sudo apt-get update

sudo apt-get -y install grafana

sudo systemctl enable --now grafana-server

sudo /bin/systemctl start grafana-server

sudo /bin/systemctl status grafana-server --no-pager


Now, copy your server’s IP address and access it in the browser using port 3000 (e.g., http://<your-server-ip>:3000). You will see the Grafana dashboard.

Log in using the default username and password (admin / admin), and make sure to change the password after logging in.

⚠️ Important: Remember to allow the Grafana port (3000) in your Security Group; otherwise, you won’t be able to access Grafana from the browser.

By default, Grafana doesn’t know where to collect metrics from—it’s your responsibility to configure it.

Now we will configure Grafana. Since Grafana is installed on our master server, we will use http://localhost:9090 as the data source URL. However, if you installed Grafana on a separate server, you should enter that server’s IP address instead of localhost.

To configure Grafana:

  1. From the left-hand menu, select Connections.

  2. Then select Data Sources.

  3. From there, choose Prometheus as the data source.

  4. In the URL field, enter: http://localhost:9090.

  5. Keep all other settings as default, and click on Save & Test.

This will connect Grafana to Prometheus successfully.
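
If you would rather not click through the UI (or you want the data source under version control), Grafana can also provision data sources from YAML files placed in /etc/grafana/provisioning/datasources/. A minimal sketch, assuming Prometheus runs on the same host on port 9090; Grafana picks the file up on the next restart (sudo systemctl restart grafana-server):

# /etc/grafana/provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true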

Now go back to Dashboards and select Import. We will import a custom dashboard that comes preconfigured with panels for metrics like server memory, RAM, CPU, and I/O devices.

To import the custom dashboard, use dashboard ID 10180 from the Grafana dashboard library:

##################### CUSTOM DASHBOARD ###################
https://grafana.com/grafana/dashboards/10180

Sometimes you may need to create a personalized dashboard where default templates or built-in panels won’t meet your requirements. For example, if your company wants to monitor only CPU usage, you will need to create a custom dashboard.

To do this:

  1. Go to the Dashboard section in Grafana.

  2. Click on New Dashboard.

  3. Then click Add Visualization.

  4. In the query editor, write the appropriate Prometheus query to display the CPU usage (for example: node_cpu_seconds_total or a custom query based on your need).

# Total CPU Usage
# This query calculates the total CPU usage by subtracting the idle CPU time from 100%.
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Select the code/query, insert it into the query editor, and click on Run. You will then see the dashboard update with your custom visualization as shown below.

If you want to add multiple queries, simply click on the Add Query option. You can add any command you want to visualize in your personalized dashboard.

For example, I’m going to add another query to monitor memory usage on this dashboard. Once added, it will be visualized along with the existing data.

I’m now going to run the following query:

# Memory Usage Percentage
# This query shows the percentage of memory used.
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

This way, you can build a personalized dashboard tailored to your company’s monitoring requirements.

Part 4: PagerDuty Setup and Configuration

🚀 Steps to Open a PagerDuty Account:

1️⃣ Go to the PagerDuty website: https://www.pagerduty.com

2️⃣ Start Free Trial:

  • Click on “Start a Free Trial” (usually located at the top-right of the homepage).

3️⃣ Fill Out the Registration Form:

  • Work Email: Enter your valid work email address.

  • Full Name: Provide your full name.

  • Company Name: Enter your company or organization name.

  • Phone Number: (Optional but recommended)

  • Password: Set a strong password.

4️⃣ Agree to Terms:

  • Accept the Terms of Service and Privacy Policy.

  • (Optional) You can opt in or out of marketing emails.

5️⃣ Click on “Start Free Trial” or “Sign Up”:

  • PagerDuty will now create your account.

6️⃣ Email Verification:

  • Check your inbox for a verification email.

  • Click the verification link to activate your account.

7️⃣ Basic Setup:

  • Once logged in, PagerDuty will guide you through initial setup:

    • Add a Service (this is what you’ll monitor).

    • Set up an Escalation Policy.

    • Invite team members if needed.

    • Integrate your monitoring tool (like Prometheus, Nagios, etc.).

8️⃣ Set Up Notification Preferences:

  • Go to User Settings → Notification Rules to configure how and when you’ll receive alerts (email, SMS, phone call, etc.).

When you log in, you will see the following screen.

Now we will add your colleagues.

After adding your colleague, they will receive an email. Ask them to click the link in the email to complete their account setup.

Now we will create a service.

Keep the settings at their defaults, as recommended.

Here’s an important point: I will select Prometheus because if any alert is triggered in Prometheus, it will be sent to PagerDuty, and PagerDuty will then notify the user. That’s why I am choosing Prometheus as the source.

Part 5: Alertmanager Installation and Configuration

Now I will configure Alertmanager. Since we want to send alerts from Prometheus to both PagerDuty and Slack, we will need the following:

  • Slack API URL,

  • PagerDuty API URL,

  • PagerDuty service key,

  • and the correct Slack channel name.

In the Slack App Directory, select Incoming WebHooks and click on Add. It will open in your browser; there, click on Add to Slack.

Next, choose the channel where you want to post alerts and click on Add Incoming Webhooks Integration. After clicking, you will receive an API URL.

Copy the API URL from there and save the settings.
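
You can confirm the webhook works before wiring it into Alertmanager by posting a test message to it. A sketch with a placeholder URL (use the API URL you just copied):

# Send a test message to the Slack channel behind the webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Test message from the Alertmanager setup"}' \
  https://hooks.slack.com/services/XXXX/YYYY/ZZZZ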

PagerDuty URL:

To get the PagerDuty API key and service key, go to the PagerDuty page. Once you integrate Prometheus with PagerDuty, you will see a page like the one shown below. From there, copy the PagerDuty API URL and the service key for use in your configuration.
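
Similarly, the PagerDuty integration can be smoke-tested with a manual event. A hedged sketch using the Events API v1 endpoint that the Alertmanager config below points at; replace the service_key placeholder with the integration key from your PagerDuty service, and note that this opens a real incident, so resolve it afterwards:

# Trigger a test incident through PagerDuty's generic Events API (v1)
curl -X POST https://events.pagerduty.com/generic/2010-04-15/create_event.json \
  -H 'Content-Type: application/json' \
  -d '{
        "service_key": "YOUR_PAGERDUTY_SERVICE_KEY",
        "event_type": "trigger",
        "description": "Test alert from Prometheus/Alertmanager setup"
      }'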

Now we will configure Alertmanager. Before configuring it, we need to install Alertmanager on the server using the following commands:

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz -P /tmp
cd /tmp
tar -xvzf alertmanager-0.24.0.linux-amd64.tar.gz
sudo mv alertmanager-0.24.0.linux-amd64/alertmanager /usr/local/bin
sudo mkdir /etc/alertmanager/
sudo chown prometheus:prometheus /etc/alertmanager/

After installing Alertmanager, we will configure it by editing the following file:

/etc/alertmanager/alertmanager.yml

global:
  # Change the Slack webhook URL and the PagerDuty URL here
  slack_api_url: 'https://hooks.slack.com/services/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'
  pagerduty_url: 'https://events.pagerduty.com/generic/2010-04-15/create_event.json'

route:
  receiver: 'pagerduty-notifications'
  group_by: ['alertname','instance','severity']


  routes:
    - receiver: "pagerduty-notifications"
      group_wait: 10s
      match_re:
        severity: critical|warning
      continue: true

    - receiver: "slack-notifications"
      group_wait: 10s
      match_re:
        severity: critical|warning
      continue: true


receivers:
- name: 'pagerduty-notifications'
  pagerduty_configs:
  #ENTER SERVICE KEY FROM PAGER DUTY HERE
  - service_key: beca6e0c1371420cd0fd713f1d873f17
    send_resolved: true


- name: 'slack-notifications'
  slack_configs:
  # CHANGE THE CHANNEL NAME
  - channel: '#aws-devops-kubernetes'
    send_resolved: true
    icon_url: https://avatars3.githubusercontent.com/u/3380462
    title: |-
      [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.instance }}
      {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
        {{" "}}(
        {{- with .CommonLabels.Remove .GroupLabels.Names }}
          {{- range $index, $label := .SortedPairs -}}
            {{ if $index }}, {{ end }}
            {{- $label.Name }}="{{ $label.Value -}}"
          {{- end }}
        {{- end -}}
        )
      {{- end }}
    text: >-
      {{ range .Alerts -}}
      Alert: {{ .Annotations.title }}{{ if .Labels.severity }} - {{ .Labels.severity }}{{ end }}

      Description: {{ .Annotations.description }}

      Details:
        {{ range .Labels.SortedPairs }} • {{ .Name }}: {{ .Value }}
        {{ end }}
      {{ end }}
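
Alertmanager ships with amtool in the same release archive, which can validate this file before we start the service. A quick check, assuming you also copy amtool out of the directory we extracted in /tmp:

# amtool lives in the extracted release directory; copy it next to alertmanager first
sudo cp /tmp/alertmanager-0.24.0.linux-amd64/amtool /usr/local/bin/

# Validate the Alertmanager configuration and show the parsed routes/receivers
amtool check-config /etc/alertmanager/alertmanager.yml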

After configuring Alertmanager, we will now set up the alert rules.

Before applying the alert rules, double-check /etc/prometheus/prometheus.yml: the alerting: and rule_files: sections shown in Part 2 must be in place (if you commented them out earlier, uncomment them now), and the Prometheus service must be restarted so the changes take effect. Once that is done, we can proceed to apply the alert rules.


Now we will configure the alert_rules.yml file at the following location:

/etc/prometheus/alert_rules.yml

groups:
- name: "All_Servers"
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 30s
    labels:
      severity: critical
  - alert: MemoryAvailableLessThan80Percent
    expr: node_memory_MemAvailable_bytes{job="All_Servers"} / node_memory_MemTotal_bytes{job="All_Servers"} * 100 < 80
    for: 1m
    labels:
      severity: warning
  - alert: HostHighCpuLoad
    expr: (100 * max(1 - rate(node_cpu_seconds_total{job="All_Servers",mode="idle"}[1m])) by (instance)) >= 90
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: Host high CPU load (instance {{ $labels.instance }})
      description: "CPU load is above 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: DiskUtilizationMoreThan50Percent
    expr: (node_filesystem_avail_bytes{job="All_Servers"} * 100) / node_filesystem_size_bytes{job="All_Servers"} < 50 and ON (instance, device, mountpoint) node_filesystem_readonly{job="All_Servers"} == 0
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: Host running out of disk space (instance {{ $labels.instance }})
      description: "Less than 50% of disk space left\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Next, we need to create a systemd unit file to run Alertmanager as a service. The file name must end in .service, so we will call it:

alertmanager.service

This file should be placed in the following location:

/etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager for Prometheus

[Service]
Restart=always
User=prometheus
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/etc/alertmanager/
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
Type=simple

[Install]
WantedBy=multi-user.target

Now that everything is configured, we will start the services using the following commands:

sudo systemctl daemon-reload
sudo systemctl start alertmanager.service
sudo systemctl enable alertmanager.service
sudo systemctl status alertmanager.service --no-pager



sudo systemctl restart alertmanager.service
sudo systemctl status alertmanager.service --no-pager


sudo systemctl restart prometheus
sudo /bin/systemctl restart grafana-server
sudo systemctl restart alertmanager.service
curl -s -XPOST localhost:9090/-/reload

After running Alertmanager, you will receive notifications in your Slack channel.

Now we will check whether the alert is working through Prometheus. To do this, open your browser and access:

http://<your-server-ip>:9090

Then click on the "Alerts" tab. There, you will be able to see if the alert has been triggered or not.

There are currently no alerts. So, what should we do next?

We will download a tool and run a loop to fill up the storage, which will help us observe how Prometheus and Alertmanager behave under alert conditions.

For this, I will visit the Packer website, download the Packer archive onto two of the monitored servers, and then run a loop to simulate the scenario, as sketched below. After that, we will monitor what happens.
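
Here is a minimal sketch of such a loop, assuming the downloaded Packer zip sits in /tmp (any large file works); it simply copies the file until the filesystem crosses the 50% threshold from the disk alert rule. Adjust the count to your disk size, and delete the copies afterwards so the alert resolves:

# Copy the archive repeatedly to consume disk space and trigger the disk alert
# (assumes the downloaded Packer zip is in /tmp; any large file works)
for i in $(seq 1 500); do
  cp /tmp/packer_*.zip /tmp/fill_disk_$i.zip || break
  df -h / | tail -n 1        # watch usage grow
done

# Clean up once the alert has fired
# rm -f /tmp/fill_disk_*.zip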

Now, we will open PagerDuty and check if an incident has been created automatically. It should display the incident details, including its priority and information about what happened.

In a real-time scenario, you would select the incident and click on Acknowledge. You can also choose a specific incident and reassign it to a particular person or group as needed. This is how it works.

Conclusion

By integrating Prometheus with PagerDuty and Slack, teams gain real-time visibility into infrastructure health and streamline incident response. This setup reduces downtime and ensures critical issues are never missed.
