Build a Grafana dashboard

Vanshika Sharma

Task 01:

  • In the sidebar, hover your cursor over the Create (plus sign) icon and click Dashboard.

  • Click Add a new panel.

  • In the Query editor below the graph, enter the query from earlier and then press Shift + Enter:

sum(rate(tns_request_duration_seconds_count[5m])) by(route)

  • In the Legend field, enter {{route}} to rename the time series in the legend. The graph legend updates when you click outside the field.

  • In the Panel editor on the right, under Settings, change the panel title to “Traffic”.

  • Click Apply in the top-right corner to save the panel and return to the dashboard view.

  • Click the Save dashboard (disk) icon at the top of the dashboard to save your dashboard.

  • Enter a name in the Dashboard name field and then click Save.

Here’s a clear breakdown of Task 01 to guide you through building your first Grafana dashboard panel using the provided query:


Task 01: Create a Traffic Panel in Grafana

1. Create a New Dashboard

  • On the left sidebar, hover over the Create icon (➕).

  • Click Dashboard.

2. Add a New Panel

  • Click Add a new panel in the center.

3. Enter the Query

In the Query editor (below the graph area):

  • Paste this query:

      sum(rate(tns_request_duration_seconds_count[5m])) by(route)
    
  • Press Shift + Enter to run the query.
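To make the query less of a black box, here is a minimal Python sketch of what `sum(rate(...)) by(route)` computes. It is a simplification: the real `rate()` in Prometheus also handles counter resets and extrapolates to the window edges, which this version ignores. The sample values and the `status` label are made up for illustration.

```python
from collections import defaultdict

def simple_rate(samples):
    """Per-second increase between the first and last sample in the window.

    Simplified: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def sum_rate_by_route(series):
    """Add up the rates of all series that share the same `route` label."""
    totals = defaultdict(float)
    for labels, samples in series:
        totals[labels["route"]] += simple_rate(samples)
    return dict(totals)

# Hypothetical counter samples as (timestamp_seconds, counter_value)
# over a 5-minute (300 s) window.
series = [
    ({"route": "/", "status": "200"}, [(0, 100), (300, 400)]),
    ({"route": "/", "status": "500"}, [(0, 10), (300, 40)]),
    ({"route": "/post", "status": "200"}, [(0, 0), (300, 150)]),
]

result = sum_rate_by_route(series)
print(result)  # one requests-per-second value per route
```

The two series for `/` collapse into a single value because `by(route)` keeps only the `route` label, which is why the panel shows one line per route.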

4. Customize the Legend

  • In the Legend field (below the query), enter:

      {{route}}
    
  • Click outside the field to update the graph legend.

5. Change Panel Title

  • On the right, in the Panel editor > Settings section:

    • Change the Panel title to:

        Traffic
      

6. Apply Changes

  • Click Apply (top-right corner) to save the panel and return to the dashboard view.

7. Save the Dashboard

  • Click the Save dashboard icon (💾) at the top of the screen.

  • In the Dashboard name field, enter a name like:

      Web Traffic Dashboard
    
  • Click Save.
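
The same panel can also be described as data. The sketch below builds a request body of the kind Grafana's HTTP API accepts at `POST /api/dashboards/db`. It is a minimal example under assumptions: the data source reference and panel layout are left out, and Grafana fills in many defaults on save. The function name `traffic_dashboard` is hypothetical.

```python
import json

def traffic_dashboard(title):
    """Build a minimal dashboard payload with one "Traffic" panel."""
    panel = {
        "type": "timeseries",
        "title": "Traffic",
        "targets": [
            {
                "refId": "A",
                "expr": "sum(rate(tns_request_duration_seconds_count[5m])) by(route)",
                "legendFormat": "{{route}}",
            }
        ],
    }
    return {
        # id and uid are None so Grafana treats this as a new dashboard.
        "dashboard": {"id": None, "uid": None, "title": title, "panels": [panel]},
        "overwrite": False,
    }

payload = traffic_dashboard("Web Traffic Dashboard")
print(json.dumps(payload, indent=2))
```

Keeping dashboards as JSON like this makes them easy to store in version control, which the UI steps above do not give you.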


Alerting

What is Alerting?

Alerting is the process of continuously monitoring data from systems, services, or applications and generating notifications when certain predefined conditions or thresholds are met.

In the context of monitoring tools like Grafana, alerting helps identify issues such as high traffic, increased error rates, or resource exhaustion. It ensures that problems are detected early so that action can be taken promptly.


Key Components of Alerting:

  1. Alert Rule: Defines what condition should trigger an alert.

    • Example: If CPU usage is above 90% for 5 minutes.
  2. Condition: The specific logic that evaluates whether the threshold is breached.

  3. Evaluation Interval: How often the system checks the condition.

  4. Notification Channel (Contact Point): Where the alert is sent (e.g., email, Slack, webhook).

  5. Alert State: Indicates whether the rule is normal, pending (the condition is met but has not yet held for the required duration), or firing.
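
How these components fit together can be sketched as a small state machine. This is an illustration only, not Grafana's implementation: it assumes one value per evaluation, a simple "above threshold" condition, and a pending period counted in evaluations. The function and parameter names are hypothetical.

```python
def alert_states(values, threshold, pending_evals):
    """Return the alert state after each evaluation.

    The rule is Pending while the condition holds but has not yet held
    for `pending_evals` evaluation intervals, and Firing after that.
    """
    states = []
    breaches = 0  # consecutive evaluations where the condition held
    for value in values:
        if value > threshold:
            breaches += 1
            states.append("Firing" if breaches > pending_evals else "Pending")
        else:
            breaches = 0
            states.append("Normal")
    return states

# Hypothetical CPU readings, one per evaluation interval.
states = alert_states([95, 96, 97, 50, 99], threshold=90, pending_evals=2)
print(states)
```

The pending period is what prevents a single spike from paging anyone: the last reading of 99 puts the rule back into Pending rather than Firing.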


Task 02:


1. Set up Grafana Cloud

A. Create a Grafana Cloud Account

B. Create Your Stack

  • After signing in, click "Create Stack".

  • Choose:

    • Stack name: (e.g., vanshika-devops)

    • Region: Choose the nearest to you.

  • Click "Create Stack"

C. Access Grafana

  • From your dashboard, click "Launch Grafana."

  • You’ll be taken to your Grafana Cloud instance (e.g., https://your-stack.grafana.net)


2. Set Up a Sample Alert

We’ll simulate a basic alert using Grafana’s built-in TestData data source (or a Prometheus data source, if you have one connected).

A. Go to Alerting Section

  • In the left menu, click "Alerting" > "Alert rules."

  • Click "New alert rule."

B. Define the Alert Rule

  • Choose a data source. If no source is added, add Grafana TestData DB for demo purposes:

    • Go to Connections > Data sources

    • Add TestData DB

    • Return to alert creation

C. Configure the Query

  • Query A:

    • Choose TestData DB

    • Select scenario: Random walk

  • Set Condition: When avg() of query A is above 50

  • Set Evaluation interval: Every 1 minute
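
The condition itself is a reduce step followed by a threshold check. A minimal sketch, assuming the query returns a plain list of numbers (the values below are made up, not real Random walk output):

```python
from statistics import mean

def condition_met(values, threshold):
    """Reduce the series to its average, then compare with the threshold."""
    return mean(values) > threshold

# Hypothetical values returned by query A for one evaluation.
values = [42.0, 47.5, 55.0, 61.5, 58.0]

fires_at_50 = condition_met(values, threshold=50)
fires_at_60 = condition_met(values, threshold=60)
print(fires_at_50, fires_at_60)
```

Because the check runs on the average, a single high point does not trigger the alert unless it pulls the whole window above the threshold.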

D. Add Summary & Labels

  • Summary: Demo Alert: Random walk is too high

  • Labels: severity=low, team=devops

E. Add a Contact Point

  • Go to Alerting > Contact points

  • Click New contact point

  • Choose method: Email, Slack, Webhook, etc.

  • Enter destination details and save

F. Add a Notification Policy

  • Go to Alerting > Notification policies

  • Add a policy that matches your labels (e.g., severity=low)

  • Select the contact point you created
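
Label matching is what connects an alert to a contact point. The sketch below shows the idea with exact-match labels only. It is a simplification under assumptions: real notification policies are nested and support other matcher types. The contact point names are hypothetical.

```python
def route_alert(labels, policies, default_contact_point):
    """Return the contact point of the first policy whose matchers all match."""
    for matchers, contact_point in policies:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return contact_point
    return default_contact_point

# Hypothetical policies: (label matchers, contact point name).
policies = [
    ({"severity": "critical"}, "oncall-pager"),
    ({"severity": "low"}, "devops-email"),
]

chosen = route_alert(
    {"severity": "low", "team": "devops"}, policies, "default-email"
)
print(chosen)
```

An alert with no matching policy falls through to the default, which is why it is worth checking that the labels on the rule are spelled exactly as in the policy.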

G. Save the Alert Rule

  • Back in the alert rule editor, click Save rule

Final Step: Test the Alert

  • Temporarily lower the threshold to ensure the alert triggers.

  • Check your email or Slack to see if you receive a notification.


Grafana Cloud


Task 03:

  1. Set up alerts for EC2 instances.

  2. Set up alerts for AWS Billing Alerts.


Part 1: Set up Alerts for EC2 Instances

A. Connect AWS CloudWatch to Grafana Cloud

  1. In Grafana Cloud:

    • Go to Connections > Data Sources

    • Click "Add data source."

    • Choose CloudWatch

  2. Enter AWS credentials:

    • Choose Access & secret key or IAM role (via AWS plugin).

    • Provide:

      • Access Key ID

      • Secret Access Key

      • Region (e.g., us-east-1)

    • Click Save & Test

For production use, prefer IAM roles over long-lived access keys, or use the AWS CloudWatch integration provided by Grafana Cloud's AWS plugin.
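
Whichever credential type you pick, the identity needs read access to CloudWatch. The sketch below prints a small IAM policy document as a starting point. The list of actions is an assumption covering basic metric queries only. Check the Grafana CloudWatch data source documentation for the full set your setup needs.

```python
import json

# Assumed minimal read-only actions for querying metrics; not a complete list.
READ_ACTIONS = [
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricData",
    "ec2:DescribeRegions",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGrafanaReadMetrics",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": READ_ACTIONS,
            "Resource": "*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Granting only read actions keeps the blast radius small if the access key ever leaks.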


B. Create an Alert Rule for EC2

  1. Go to Alerting > Alert Rules

  2. Click New alert rule

  3. Select CloudWatch as data source

  4. Build query:

    • Namespace: AWS/EC2

    • Metric Name: CPUUtilization

    • Statistics: Average

    • Dimensions: Choose your instance ID

  5. Set condition:

    • WHEN avg() of query A IS ABOVE 80 for 5 minutes
  6. Add summary:

    • High CPU usage on EC2 instance {{ $labels.InstanceId }}
  7. Save and attach a notification policy/contact point.
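
For reference, the query built in step 4 corresponds roughly to the following CloudWatch metric query structure, the shape used by the `GetMetricData` API. This sketch only builds the dictionary and makes no AWS call. The instance ID is a placeholder and the 300-second period is an assumption chosen to line up with the 5-minute condition.

```python
def cpu_utilization_query(instance_id, period_seconds=300):
    """Build one GetMetricData-style query for average EC2 CPU usage."""
    return {
        "Id": "a",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            },
            "Period": period_seconds,
            "Stat": "Average",
        },
        "ReturnData": True,
    }

# Placeholder instance ID for illustration.
query = cpu_utilization_query("i-0123456789abcdef0")
print(query)
```

With boto3 installed and credentials configured, a dictionary like this could be passed in the `MetricDataQueries` list of `get_metric_data` to check the raw values outside Grafana.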


Part 2: Set up Alerts for AWS Billing

AWS Billing metrics are only available in the us-east-1 region. Ensure CloudWatch billing alerts are enabled in AWS first.

A. Enable Billing Metrics in AWS

  1. Go to AWS Console > Billing and Cost Management

  2. In the left sidebar, click Billing preferences

  3. Under Alert preferences, enable Receive CloudWatch billing alerts (if not already enabled)


B. Create an Alert Rule for Billing in Grafana

  1. In Grafana Cloud, go to Alerting > Alert rules

  2. Click New alert rule

  3. Select CloudWatch as the data source

  4. Build the query:

    • Namespace: AWS/Billing

    • Metric Name: EstimatedCharges

    • Dimensions:

      • Currency = USD

      • (Optional) ServiceName = AmazonEC2

  5. Set condition:

    • WHEN avg() of query A IS ABOVE 10
  6. Summary:

    • Billing alert: AWS charges have exceeded $10
  7. Save and assign to a contact point (e.g., email or Slack)
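
One thing to keep in mind is that EstimatedCharges is a running month-to-date total, so once it crosses the threshold the rule stays above it until the month rolls over. A minimal sketch with made-up figures:

```python
def first_breach(samples, threshold):
    """Return the timestamp of the first sample above the threshold, or None."""
    for timestamp, charge in samples:
        if charge > threshold:
            return timestamp
    return None

# Hypothetical month-to-date charges in USD (not real billing data).
samples = [
    ("2024-05-01T00:00Z", 0.42),
    ("2024-05-10T00:00Z", 6.10),
    ("2024-05-18T00:00Z", 10.75),
    ("2024-05-25T00:00Z", 14.20),
]

breach = first_breach(samples, threshold=10)
print(breach)
```

In this example every sample after the first breach is also above $10, so the alert would keep firing for the rest of the month unless you silence it or raise the threshold.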


Prometheus

Prometheus Monitoring Architecture:


  • Prometheus Server: Scrapes metrics from targets and stores them in a local time-series database.

  • Exporters: Expose metrics in a Prometheus-readable format (e.g., Node Exporter for system metrics).

  • Pushgateway: Lets short-lived jobs push their metrics to an intermediary, which Prometheus then scrapes.

  • Alertmanager: Manages alerts from Prometheus and sends notifications to channels like email or Slack.

  • Service Discovery: Automatically finds targets to scrape (Kubernetes, AWS EC2, etc.).

  • Grafana: Used for better visualization and dashboards by connecting to Prometheus.

Data Flow:
Targets → Prometheus (scrape) → Storage (TSDB) → Alert rules → Alertmanager → Notifications
Storage (TSDB) → PromQL queries → Grafana for visualization


Key Features of Prometheus:

  • Multi-dimensional Data Model: Metrics are stored as time series identified by metric name and labels.

  • Powerful Query Language (PromQL): Flexible queries to select and aggregate time series data.

  • Pull-based Metrics Collection: Prometheus scrapes targets over HTTP at regular intervals.

  • Time-Series Storage: Efficient, local data storage with optional remote storage integrations.

  • Alerting: Built-in alerting system with rules and integration with Alertmanager for notifications.

  • Service Discovery: Automatically discovers targets via Kubernetes, Consul, EC2, etc.

  • Visualization: Basic UI built-in, and excellent integration with tools like Grafana for rich dashboards.

  • Scalability and Reliability: Each Prometheus server runs standalone, with no dependency on distributed storage or other external services.

  • Open Source: Fully open-source with a strong, active community.
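
To see what pull-based collection actually retrieves, here is a minimal Python sketch that parses a few lines of the Prometheus text exposition format, the plain-text format a target serves at its metrics endpoint. It is deliberately simplified: it ignores timestamps and escaped characters in label values, and the sample metrics are made up.

```python
import re

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples

# Hypothetical scrape response from a target.
scrape = '''
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{route="/",status="200"} 1027
up 1
'''

samples = parse_exposition(scrape)
print(samples)
```

Each parsed tuple is one time series sample: the metric name plus its labels identify the series, which is the multi-dimensional data model described above.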


Components of Prometheus:

  • Prometheus Server: The core component that scrapes, stores, and queries metrics.

  • Exporters: Help expose application and system metrics in a format Prometheus can scrape (e.g., Node Exporter, Blackbox Exporter).

  • Pushgateway: Accepts metrics pushed by short-lived applications and exposes them for Prometheus to scrape.

  • Alertmanager: Handles alerts sent by the Prometheus server, managing routing and notifications.

  • Service Discovery: Automatically finds targets to monitor, integrating with systems like Kubernetes, Consul, and AWS.

  • Visualization Layer: Prometheus has a basic UI, but it integrates seamlessly with Grafana for advanced dashboards and visualizations.

Database used by Prometheus:

Prometheus stores its data in its own built-in time-series database (TSDB). This database is optimized for time-series data, which is data indexed by timestamps. The data is stored in a custom storage format designed to handle the high volume of metrics that Prometheus collects from various services.

The key characteristics of Prometheus' database are:

  1. Time-series optimized: It stores data with a timestamp and labels, allowing for efficient querying and aggregation of time-series metrics.

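  2. Append-only: Samples are appended as they are scraped and are not updated in place. Data leaves the database through retention, or through explicit deletion via the TSDB admin API, which simplifies the storage model.

  3. Efficient storage: Prometheus compresses samples to store data efficiently. It does not downsample older data itself; long-term storage systems built around Prometheus, such as Thanos, add that.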

  4. Retention-based: Prometheus supports retention policies, where you can set how long data is kept before it's automatically deleted.

While Prometheus is not a relational database like MySQL or PostgreSQL, its custom-built TSDB is highly suited to its use case of monitoring and metrics collection.
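
These characteristics can be illustrated with a toy in-memory store. This is a teaching sketch only: the real TSDB uses a write-ahead log, compressed on-disk blocks, and drops whole blocks at retention time. The class name `TinyTSDB` is hypothetical.

```python
class TinyTSDB:
    """Toy append-only time-series store with time-based retention."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.series = {}  # (metric name, frozenset of labels) -> [(ts, value)]

    def append(self, name, labels, timestamp, value):
        """Append a sample; existing samples are never modified."""
        key = (name, frozenset(labels.items()))
        samples = self.series.setdefault(key, [])
        if samples and timestamp <= samples[-1][0]:
            raise ValueError("out-of-order sample")
        samples.append((timestamp, value))

    def enforce_retention(self, now):
        """Drop samples older than the retention window."""
        cutoff = now - self.retention_seconds
        for key in list(self.series):
            kept = [(ts, v) for ts, v in self.series[key] if ts >= cutoff]
            if kept:
                self.series[key] = kept
            else:
                del self.series[key]

DAY = 86400
db = TinyTSDB(retention_seconds=15 * DAY)
db.append("up", {"job": "node"}, 0, 1.0)
db.append("up", {"job": "node"}, 20 * DAY, 1.0)
db.enforce_retention(now=20 * DAY)

remaining = db.series[("up", frozenset({"job": "node"}.items()))]
print(remaining)
```

The metric name together with the label set forms the series key, and retention is the normal way data leaves the store.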

Default data retention period in Prometheus:

The default data retention period in Prometheus is 15 days.

This means that by default, Prometheus will store metrics data for 15 days before automatically deleting the older data. However, you can customize this retention period by adjusting the --storage.tsdb.retention.time flag when starting Prometheus.

For example, to set a retention period of 30 days, you can start Prometheus with the following option:

--storage.tsdb.retention.time=30d

This allows you to retain data for a longer or shorter period depending on your needs.
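
Longer retention costs disk space. The Prometheus storage documentation gives a rough sizing formula: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (typically around 1 to 2 bytes). The sketch below applies it. The ingestion rate of 10,000 samples per second is an assumed example, not a measurement.

```python
SECONDS_PER_DAY = 86400

def needed_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2):
    """Rough disk estimate: retention seconds x ingest rate x bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample

# Assumed ingestion rate for illustration.
default_retention = needed_disk_bytes(15, samples_per_second=10_000)
extended_retention = needed_disk_bytes(30, samples_per_second=10_000)

print(default_retention / 1e9, "GB for 15 days")
print(extended_retention / 1e9, "GB for 30 days")
```

Doubling the retention period doubles the estimate, so it is worth checking available disk before raising the flag.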
