Building a Full Monitoring and Alerting Pipeline with Prometheus and PagerDuty


Introduction
Modern infrastructure demands proactive monitoring and instant incident response. This guide walks through integrating Prometheus (monitoring), Grafana (visualization), and PagerDuty/Slack (alerting) to create a robust system that detects issues, visualizes metrics, and notifies teams in real time.
Core Components Overview
Node Exporter: Collects system metrics (CPU, memory, disk) from Linux servers.
Prometheus: Scrapes, stores, and analyzes time-series data; triggers alerts via rules.
Grafana: Visualizes Prometheus data into dashboards.
Alertmanager: Routes Prometheus alerts to Slack and PagerDuty.
Node Exporter
Node Exporter is a Prometheus exporter for collecting hardware and OS-level metrics from Linux machines. It gathers data like:
CPU usage
Memory usage
Disk I/O
Network stats
Filesystem usage
System load
👉 In simple terms:
It’s a small agent you install on your server to expose system metrics that Prometheus can scrape and monitor.
⚙️ How Does Node Exporter Work?
1️⃣ Node Exporter runs on your server. It collects system metrics using Linux’s /proc and /sys filesystems.
2️⃣ Exposes metrics via HTTP. The metrics are available at: http://<server_ip>:9100/metrics
3️⃣ Prometheus scrapes the data. Prometheus, your monitoring tool, pulls the metrics from Node Exporter at regular intervals.
4️⃣ Visualize in Grafana. Grafana can use Prometheus data to build beautiful dashboards.
Diagram:
[ Linux Server ] → [ Node Exporter :9100/metrics ] → [ Prometheus ] → [ Grafana ]
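Once Node Exporter is running, you can sanity-check the endpoint from any machine that can reach it, for example:

```bash
# Replace <server_ip> with your server's IP, then peek at a few CPU metrics
curl -s http://<server_ip>:9100/metrics | grep -m 5 '^node_cpu_seconds_total'
```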
Prometheus
Prometheus is an open-source monitoring and alerting toolkit. It’s designed to collect metrics from your applications and infrastructure, store them in a time-series database, and let you query, visualize, and alert on that data.
👉 In simple terms:
*It keeps track of your systems’ health and performance over time.*
🔧 Key Features:
Time-series data: Metrics are stored with a timestamp.
Pull-based: Prometheus scrapes metrics from endpoints.
PromQL: Its powerful query language.
Built-in alerts: Trigger alerts based on metric thresholds.
Easy integration: Works great with Grafana for dashboards.
⚙️ How Does Prometheus Work?
1️⃣ Target exposes metrics: Your application (or Node Exporter, etc.) exposes data at an HTTP endpoint (e.g., /metrics).
2️⃣ Prometheus scrapes: Prometheus polls these endpoints at regular intervals and collects metrics.
3️⃣ Stores in time-series DB: All metrics are saved with labels and timestamps.
4️⃣ Query & visualize: You can run PromQL queries to explore data or use Grafana for dashboards.
5️⃣ Alerts: Prometheus can trigger alerts if metrics meet certain conditions (via Alertmanager).
🖼️ Diagram:
[ Your App / Node Exporter ] → [ Prometheus (scrape & store) ] → [ Grafana dashboards / Alertmanager ]
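To get a feel for PromQL, here are two simple queries you could try in the Prometheus UI once your Node Exporter targets are being scraped (both use standard Node Exporter metric names):

```promql
# Is each scrape target up? (1 = up, 0 = down)
up

# Per-second network receive rate over the last 5 minutes (per instance and device)
rate(node_network_receive_bytes_total[5m])
```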
Grafana
Grafana is an open-source visualization and analytics platform. It takes data from multiple sources (like Prometheus, MySQL, Elasticsearch) and lets you build dashboards, graphs, and charts to monitor and analyze your data in real-time.
👉 In simple words:
*Grafana is your **dashboard tool** that turns raw metrics into beautiful, interactive visualizations.*
🎯 Key Features:
Multi-source: Supports Prometheus, InfluxDB, Elasticsearch, AWS CloudWatch, and more.
Custom Dashboards: Build interactive graphs, charts, and tables.
Alerting: Set thresholds and alerts (Slack, Email, PagerDuty, etc.).
Templating: Use variables to create dynamic dashboards.
User Management: Teams & permission control.
Plugins: Add panels & data sources.
⚙️ How Does Grafana Work?
1️⃣ Connect Data Source: You first connect Grafana to a data source (e.g., Prometheus).
2️⃣ Query Data: Grafana uses the query language of that data source (e.g., PromQL for Prometheus) to fetch data.
3️⃣ Visualize: You build dashboards with panels (graphs, tables, heatmaps, etc.).
4️⃣ Set Alerts: You can add alert rules on any panel to notify you when thresholds are hit.
5️⃣ Share: Dashboards can be shared or embedded in other tools.
Simple architecture:
+---------------------+
| Data Sources |
| (Prometheus, etc.) |
+---------------------+
|
v
+---------------------+
| Grafana |
| - Query Engine |
| - Dashboards |
| - Alerts |
+---------------------+
|
v
+---------------------+
| You & Your Team |
| - View & Analyze |
| - Get Alerts |
+---------------------+
Service Discovery
Service Discovery is the automatic detection of services in a network. It allows your applications (or systems like Prometheus) to find and connect to other services without manual configuration.
👉 In simple words:
*When new services (like web servers, databases) come online, Service Discovery makes sure they’re automatically found and monitored—without needing to edit config files every time!*
💡 Why Is It Important?
In dynamic environments (like Kubernetes, Docker Swarm, or the cloud), services are constantly starting, stopping, and scaling.
Instead of hardcoding IPs and ports, Service Discovery tracks these changes automatically.
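For example, with Prometheus on AWS you can let EC2 service discovery find instances instead of hardcoding IPs. A minimal sketch (the region, port, and tag filter are assumptions; it also assumes the Prometheus host has IAM permissions for ec2:DescribeInstances):

```yaml
# Hypothetical scrape job using EC2 service discovery instead of static IPs
scrape_configs:
  - job_name: "node_exporter_ec2"
    ec2_sd_configs:
      - region: us-east-1        # assumed region
        port: 9100               # Node Exporter port
    relabel_configs:
      # Keep only instances tagged Monitoring=enabled (example filter)
      - source_labels: [__meta_ec2_tag_Monitoring]
        regex: enabled
        action: keep
      # Use the instance Name tag as the instance label (nicer dashboards)
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
```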
Alertmanager
Alertmanager is a component of Prometheus that handles alerts generated by Prometheus. It manages:
Routing alerts
Silencing unnecessary alerts
Grouping similar alerts
Sending notifications via email, Slack, PagerDuty, Opsgenie, etc.
👉 In simple words:
*Prometheus watches your systems and when something goes wrong, Alertmanager is the one that tells your team via a message!*
🔧 Key Features:
Multi-channel alerts: Email, Slack, PagerDuty, Opsgenie, Webhooks, etc.
Deduplication: Combines repeated alerts into one.
Grouping: Sends related alerts together.
Silencing: Mute alerts during maintenance.
Inhibition: Suppress lower-priority alerts if higher-priority ones are firing.
⚙️ How Does Alertmanager Work?
1️⃣ Prometheus Rules: Prometheus uses alerting rules (in YAML) to define when an alert should fire.
2️⃣ Alert Sent: When a condition is met, Prometheus sends the alert to Alertmanager.
3️⃣ Alertmanager Processing:
Groups similar alerts together.
Applies silences and inhibition rules.
Decides the routing (who/where to send).
4️⃣ Notification Sent:
Alertmanager sends notifications to the configured receivers (Slack, email, etc.).
Now it’s time for configuration.
Part 1: Launch Servers with Node Exporter
Let’s proceed with launching the server using EC2:
Go to the EC2 dashboard and click on Launch EC2 Instance.
Enter an appropriate name for your instance.
Select the operating system. In my case, I am using Ubuntu (latest version 24.04).
Choose the instance type (e.g., t2.medium).
Select your .PEM key for SSH access.
Choose your VPC and the appropriate public subnet.
Enable Auto-assign Public IP.
Select the Security Group.
For storage, you can keep the default settings or increase the size as needed—it’s up to you.
In the User Data section, insert the following installation script.
#!/bin/bash
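A minimal sketch of such a user-data script, assuming Node Exporter v1.8.1 (adjust the version and architecture for your environment):

```bash
#!/bin/bash
# Download and install Node Exporter (version is an assumption -- check the releases page)
NE_VERSION="1.8.1"
cd /tmp
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NE_VERSION}/node_exporter-${NE_VERSION}.linux-amd64.tar.gz"
tar -xzf "node_exporter-${NE_VERSION}.linux-amd64.tar.gz"
mv "node_exporter-${NE_VERSION}.linux-amd64/node_exporter" /usr/local/bin/

# Dedicated system user
useradd --no-create-home --shell /usr/sbin/nologin node_exporter || true

# systemd unit so Node Exporter starts on boot and serves :9100/metrics
tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now node_exporter
```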
I have inserted the user data, and before clicking on Launch Instance, I increased the number of instances from 1 to 3 because I want to launch 3 instances at the same time. After launching, I will rename the instances accordingly.
Here are my instances:
We have installed Node Exporter on all the launched servers during the server launch by inserting the script in the User Data section.
After the servers are running, copy your server’s IP address and access it in your browser using port 9100 (e.g., http://<your-server-ip>:9100). You will see a screen similar to the screenshot below.
⚠️ Important: Make sure to allow port 9100 in your Security Group. If this port is not allowed, the page will not load.
Click on Metrics and you will see the metrics shown below.
Part 2: Prometheus Installation and Configuration
Now we will install Prometheus using the script below.
Create a script file with this content. In my case, I create it in the /root directory and call it install_prometheus.sh, but the name and location are up to you. (The actual Prometheus configuration file will be created later at /etc/prometheus/prometheus.yml.)
# Create Prometheus user and directories
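A minimal sketch of such an install script, assuming Prometheus v2.53.0 (substitute whichever release you want):

```bash
#!/bin/bash
# Create Prometheus user and directories
sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus || true
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Download and unpack Prometheus (version is an assumption -- check the releases page)
PROM_VERSION="2.53.0"
cd /tmp
wget -q "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
tar -xzf "prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
cd "prometheus-${PROM_VERSION}.linux-amd64"

# Install the binaries
sudo mv prometheus promtool /usr/local/bin/

# 2.x releases also ship console templates; copy them if present
[ -d consoles ] && sudo cp -r consoles console_libraries /etc/prometheus/

# Hand ownership to the prometheus user
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
```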
Now make the script executable and run it:
chmod +x install_prometheus.sh
./install_prometheus.sh
Our installation process is now complete. Next, we will configure Prometheus.
To do this, follow the script provided below.
⚠️ Important:
Make sure to replace the DNS with your launched server’s DNS and create the Prometheus configuration file at /etc/prometheus/prometheus.yml.
global:
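A sketch of what /etc/prometheus/prometheus.yml can look like at this stage (the DNS names are placeholders for your own servers; the alerting and rule_files sections are left commented out for now and will be enabled in Part 5):

```yaml
global:
  scrape_interval: 15s          # how often Prometheus scrapes targets

# Enabled later, in the Alertmanager part:
# alerting:
#   alertmanagers:
#     - static_configs:
#         - targets: ["localhost:9093"]
#
# rule_files:
#   - "/etc/prometheus/alert_rules.yml"

scrape_configs:
  # Prometheus scraping itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node Exporter on the monitored servers (replace with your servers' DNS names or IPs)
  - job_name: "node_exporter"
    static_configs:
      - targets:
          - "ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9100"
          - "ec2-yy-yy-yy-yy.compute-1.amazonaws.com:9100"
```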
Now we will configure the Prometheus service by creating a service file at /etc/systemd/system/prometheus.service using the following script.
# Create a file called prometheus.service at /etc/systemd/system/prometheus.service
[Unit]
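A sketch of a typical prometheus.service unit (it assumes the user, directories, and binary location from the install script above):

```ini
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.listen-address=0.0.0.0:9090
Restart=on-failure

[Install]
WantedBy=multi-user.target
```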
Now we will run the following command to start the Prometheus service:
# Reload systemd and start Prometheus service
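A typical sequence:

```bash
# Reload systemd so it sees the new unit, then start Prometheus and enable it on boot
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus --no-pager
```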
Now we will check whether the Prometheus server is accessible in the browser.
Copy your server’s IP address and access it using port 9090 (e.g., http://<your-server-ip>:9090). You should see the following screen.
Next, click on the "Status" menu, and then select "Targets." There, you will see metrics from the other servers. In this section, you can run different queries to monitor server metrics such as memory, disk, CPU, and I/O device usage.
⚠️ Note:
Prometheus runs on port 9090, so make sure this port is allowed in your Security Group; otherwise, it will not be accessible.
Part 3: Grafana Installation and Configuration
Now we will install Grafana using the commands below. Please run the following commands one by one.
sudo apt update
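A sketch of installing Grafana OSS on Ubuntu from the official APT repository (based on Grafana’s documented install steps; double-check against the current docs):

```bash
sudo apt update
sudo apt install -y apt-transport-https software-properties-common wget

# Add Grafana's GPG key and APT repository
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install and start Grafana (serves on port 3000)
sudo apt update
sudo apt install -y grafana
sudo systemctl enable --now grafana-server
```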
Now, copy your server’s IP address and access it in the browser using port 3000 (e.g., http://<your-server-ip>:3000). You will see the Grafana login page.
Log in using the default username and password (admin / admin), and make sure to change the password after logging in.
⚠️ Important:
Remember to allow the Grafana port (3000) in your Security Group; otherwise, you won’t be able to access Grafana from the browser.
By default, Grafana doesn’t know where to collect metrics from—it’s your responsibility to configure it.
Now we will configure Grafana. Since Grafana is installed on our master server, we will use http://localhost:9090 as the data source URL. However, if you installed Grafana on a separate server, you should enter that server’s IP address instead of localhost.
To configure Grafana:
From the left-hand menu, select Connections.
Then select Data Sources.
From there, choose Prometheus as the data source.
In the URL field, enter: http://localhost:9090.
Keep all other settings as default, and click Save & test.
This will connect Grafana to Prometheus successfully.
Now go back to Dashboards and select Import. We will import a prebuilt dashboard that comes preconfigured with panels for server memory, RAM, CPU, I/O devices, and more.
To import the dashboard, use the dashboard ID or JSON shown below (community dashboards such as Node Exporter Full, ID 1860 on grafana.com, are a popular choice).
##################### CUSTOM DASHBOARD #####################
Sometimes you may need to create a personalized dashboard where default templates or built-in panels won’t meet your requirements. For example, if your company wants to monitor only CPU usage, you will need to create a custom dashboard.
To do this:
Go to the Dashboard section in Grafana.
Click on New Dashboard.
Then click Add Visualization.
In the query editor, write the appropriate Prometheus query to display the CPU usage (for example: node_cpu_seconds_total or a custom query based on your need).
# Total CPU Usage
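One common PromQL formulation for total CPU usage (as a percentage, per instance; your exact query may differ):

```promql
# Total CPU usage (%) per instance: 100 minus the idle percentage over the last 5 minutes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```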
Select the code/query, insert it into the query editor, and click on Run. You will then see the dashboard update with your custom visualization as shown below.
If you want to add multiple queries, simply click on the Add Query option. You can add any command you want to visualize in your personalized dashboard.
For example, I’m going to add another query to monitor memory usage on this dashboard. Once added, it will be visualized along with the existing data.
I’m now going to run the following query:
# Memory Usage Percentage
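A common formulation for memory usage as a percentage:

```promql
# Memory usage (%): share of total memory that is not available
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
```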
This way, you can build a personalized dashboard tailored to your company’s monitoring requirements.
Part 4: PagerDuty Installation and Configuration
🚀 Steps to Open a PagerDuty Account:
1️⃣ Go to the PagerDuty Website:
- Visit: https://www.pagerduty.com
2️⃣ Start Free Trial:
- Click on “Start a Free Trial” (usually located at the top-right of the homepage).
3️⃣ Fill Out the Registration Form:
Work Email: Enter your valid work email address.
Full Name: Provide your full name.
Company Name: Enter your company or organization name.
Phone Number: (Optional but recommended)
Password: Set a strong password.
4️⃣ Agree to Terms:
Accept the Terms of Service and Privacy Policy.
(Optional) You can opt in or out of marketing emails.
5️⃣ Click on “Start Free Trial” or “Sign Up”:
- PagerDuty will now create your account.
6️⃣ Email Verification:
Check your inbox for a verification email.
Click the verification link to activate your account.
7️⃣ Basic Setup:
Once logged in, PagerDuty will guide you through initial setup:
Add a Service (this is what you’ll monitor).
Set up an Escalation Policy.
Invite team members if needed.
Integrate your monitoring tool (like Prometheus, Nagios, etc.).
8️⃣ Set Up Notification Preferences:
- Go to User Settings → Notification Rules to configure how and when you’ll receive alerts (email, SMS, phone call, etc.).
When you log in, you will see the following screen.
Now we will add your colleagues.
After you add a colleague, they will receive an email. Ask them to click the link in the email to complete their account setup.
Now we will create a service.
Keep this setting at its default.
Keep the remaining options at their recommended defaults.
Here’s an important point: I will select Prometheus because if any alert is triggered in Prometheus, it will be sent to PagerDuty, and PagerDuty will then notify the user. That’s why I am choosing Prometheus as the source.
Part 5: Alertmanager Installation and Configuration
Now I will configure Alertmanager. Since we want to send alerts from Prometheus to both PagerDuty and Slack, we will need the following:
Slack API URL,
PagerDuty API URL,
PagerDuty service key,
and the correct Slack channel name.
In the Slack App Directory, select Incoming WebHooks and click Add. It will open in your browser; there, click Add to Slack.
Next, choose the channel where you want to post alerts and click Add Incoming WebHooks Integration. You will then receive a webhook URL.
Copy this URL and save the settings.
PagerDuty URL:
To get the PagerDuty API key and service key, go to the PagerDuty page. Once you integrate Prometheus with PagerDuty, you will see a page like the one shown below. From there, copy the PagerDuty API URL and the service key for use in your configuration.
Now we will configure Alertmanager. Before configuring it, we need to install Alertmanager on the server using the following commands:
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz -P /tmp
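The remaining steps typically look like this (a sketch; the paths and dedicated user are conventions, not requirements):

```bash
# Download, unpack, and install Alertmanager v0.24.0
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz -P /tmp
cd /tmp
tar -xzf alertmanager-0.24.0.linux-amd64.tar.gz
sudo mv alertmanager-0.24.0.linux-amd64/alertmanager alertmanager-0.24.0.linux-amd64/amtool /usr/local/bin/

# Directories and a dedicated user
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager || true
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```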
After installing Alertmanager, we will configure it by editing the following file:
/etc/alertmanager/alertmanager.yml
global:
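A sketch of an alertmanager.yml that routes every alert to both Slack and PagerDuty (the webhook URL, integration/service key, and channel name are placeholders for the values you collected above):

```yaml
global:
  resolve_timeout: 5m
  slack_api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"   # your Slack webhook URL

route:
  receiver: "team-notifications"
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h

receivers:
  - name: "team-notifications"
    slack_configs:
      - channel: "#alerts"          # your Slack channel
        send_resolved: true
    pagerduty_configs:
      - service_key: "<your-pagerduty-integration-key>"   # from the Prometheus integration in PagerDuty
        send_resolved: true
```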
After configuring Alertmanager, we will now set up the alert rules.
Before applying the alert rules, we need to modify another file: prometheus.yml. As you saw earlier when I ran the configuration, I had commented out some lines in that file.
Since we are now configuring Alertmanager, we need to uncomment those lines and restart the Prometheus service first. Once that is done, we can proceed to apply the alert rules.
# Create Prometheus configuration file at /etc/prometheus/
global:
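The relevant change in /etc/prometheus/prometheus.yml is to uncomment the alerting and rule_files sections. A sketch of the updated file (scrape targets unchanged from Part 2):

```yaml
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]     # Alertmanager default port

rule_files:
  - "/etc/prometheus/alert_rules.yml"     # created in the next step

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets:
          - "ec2-xx-xx-xx-xx.compute-1.amazonaws.com:9100"
          - "ec2-yy-yy-yy-yy.compute-1.amazonaws.com:9100"
```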
Now we will configure the alert_rules.yml file at the following location:
/etc/prometheus/alert_rules.yml
groups:
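A sketch of alert_rules.yml with two example rules, one for a target going down and one for low disk space (the thresholds are illustrative):

```yaml
groups:
  - name: node_alerts
    rules:
      # Fires when a scrape target has been unreachable for 1 minute
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for more than 1 minute."

      # Fires when a filesystem has had less than 10% free space for 2 minutes
      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Filesystem {{ $labels.mountpoint }} has less than 10% space left."
```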
Next, we need to create another file to run Alertmanager as a service. The file name will be:
alertmanager.service
This file should be placed in the following location:
/etc/systemd/system/alertmanager.service
[Unit]
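A sketch of such a unit file (it assumes the user and directories created during the Alertmanager installation):

```ini
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/
Restart=on-failure

[Install]
WantedBy=multi-user.target
```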
Now that everything is configured, we will start the services using the following commands:
sudo systemctl daemon-reload
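For example:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
# Restart Prometheus so it picks up the uncommented alerting and rule_files sections
sudo systemctl restart prometheus
sudo systemctl status alertmanager --no-pager
```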
After running Alertmanager, you will receive notifications in your Slack channel.
Now we will check whether the alert is working through Prometheus. To do this, open your browser and access:
http://<your-server-ip>:9090
Then click on the "Alerts" tab. There, you will be able to see if the alert has been triggered or not.
There are currently no alerts. So, what should we do next?
We will download a file and run a loop to fill up the storage, which will help us observe how Prometheus and Alertmanager behave under alert conditions.
For this, I will visit the Packer website, download the Packer archive on two worker nodes, and then run a loop that copies it until disk usage crosses the alert threshold. After that, we will monitor what happens.
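A hypothetical loop for this test (the Packer version and URL are assumptions; stop it with Ctrl+C once the alert fires, and delete the copies afterwards):

```bash
# Download one Packer release archive, then copy it repeatedly to eat disk space
wget -q https://releases.hashicorp.com/packer/1.11.2/packer_1.11.2_linux_amd64.zip
i=0
while cp packer_1.11.2_linux_amd64.zip "packer_copy_${i}.zip"; do
  i=$((i + 1))
  df -h / | tail -n 1          # watch free space shrink
done
# Clean up when finished:
# rm -f packer_copy_*.zip packer_1.11.2_linux_amd64.zip
```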
Now, we will open PagerDuty and check if an incident has been created automatically. It should display the incident details, including its priority and information about what happened.
In a real-time scenario, you would select the incident and click Acknowledge. You can also choose a specific incident and reassign it to a particular person or group as needed. This is how it works.
Conclusion
By integrating Prometheus with PagerDuty and Slack, teams gain real-time visibility into infrastructure health and streamline incident response. This setup reduces downtime and ensures critical issues are never missed.