Task 01:

In the sidebar, hover your cursor over the Create (plus sign) icon and click Dashboard.
Click Add a new panel.
In the Query editor below the graph, enter the query from earlier and then press Shift + Enter:

sum(rate(tns_request_duration_seconds_count[5m])) by(route)

In the Legend field, enter {{route}} to rename the time series in the legend. The graph legend updates when you click outside the field.
In the Panel editor on the right, under Settings, change the panel title to “Traffic”.
Click Apply in the top-right corner to save the panel and return to the dashboard view.
Click the Save dashboard (disk) icon at the top of the dashboard to save your dashboard.
Enter a name in the Dashboard name field and then click Save.

The same steps for Task 01, broken down one at a time:

✅ Task 01: Create a Traffic Panel in Grafana

1. Create a New Dashboard

On the left sidebar, hover over the Create icon (➕).
Click Dashboard.

2. Add a New Panel

Click Add a new panel in the center.

3. Enter the Query

In the Query editor (below the graph area):

Paste this query:

  sum(rate(tns_request_duration_seconds_count[5m])) by(route)

Press Shift + Enter to run the query.
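
To see what the query computes, here is a rough Python sketch of `sum(rate(...[5m])) by(route)`. The sample values, the `method` label, and the `simple_rate` helper are hypothetical. The helper ignores counter resets and the window extrapolation that Prometheus applies, so it approximates the idea rather than reproducing the real algorithm.

```python
from collections import defaultdict

def simple_rate(samples):
    """Per-second increase between the first and last sample in the window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Simplification: no handling of counter resets or extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def sum_rate_by_route(series):
    """Add up the per-series rates, keeping only the 'route' label."""
    totals = defaultdict(float)
    for labels, samples in series:
        totals[labels["route"]] += simple_rate(samples)
    return dict(totals)

# Hypothetical counter samples covering one 5-minute (300 s) window.
series = [
    ({"route": "/", "method": "GET"}, [(0, 100), (300, 400)]),
    ({"route": "/", "method": "POST"}, [(0, 10), (300, 40)]),
    ({"route": "/login", "method": "POST"}, [(0, 50), (300, 80)]),
]

result = sum_rate_by_route(series)
print(result)
```

Each entry in `result` corresponds to one line on the graph, which is why `{{route}}` works as the legend in the next step.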

4. Customize the Legend

In the Legend field (below the query), enter:
```
  {{route}}
```
Click outside the field to update the graph legend.

5. Change Panel Title

On the right, in the Panel editor > Settings section:
- Change the Panel title to:
```
  Traffic
```

6. Apply Changes

Click Apply (top-right corner) to save the panel and return to the dashboard view.

7. Save the Dashboard

Click the Save dashboard icon (💾) at the top of the screen.
In the Dashboard name field, enter a name like:
```
  Web Traffic Dashboard
```
Click Save.

Alerting

What is Alerting?

Alerting is the process of continuously monitoring data from systems, services, or applications and generating notifications when certain predefined conditions or thresholds are met.

In the context of monitoring tools like Grafana, alerting helps identify issues such as high traffic, increased error rates, or resource exhaustion. It ensures that problems are detected early so that action can be taken promptly.

Key Components of Alerting:

Alert Rule: Defines what condition should trigger an alert.
- Example: If CPU usage is above 90% for 5 minutes.
Condition: The specific logic that evaluates whether the threshold is breached.
Evaluation Interval: How often the system checks the condition.
Notification Channel (Contact Point): Where the alert is sent (e.g., email, Slack, webhook).
Alert State: Indicates whether the rule is pending (condition met, but not yet for the required duration), firing (active), or resolved.
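
The way these components interact can be sketched in a few lines of Python. This is a simplified model, not Grafana's implementation: the `AlertRule` class, its field names, and the state strings are hypothetical, and real alerting also has states such as no-data and error.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertRule:
    threshold: float                 # condition: value above this breaches
    for_seconds: int                 # how long the breach must last before firing
    state: str = "normal"
    breached_since: Optional[float] = None

    def evaluate(self, now: float, value: float) -> str:
        """Run one evaluation; called once per evaluation interval."""
        if value > self.threshold:
            if self.breached_since is None:
                self.breached_since = now
            held = now - self.breached_since
            self.state = "firing" if held >= self.for_seconds else "pending"
        else:
            # Condition no longer met: a firing alert would be resolved here.
            self.breached_since = None
            self.state = "normal"
        return self.state

# Example from above: CPU usage above 90% for 5 minutes, checked every 60 s.
rule = AlertRule(threshold=90, for_seconds=300)
cpu_readings = [95, 96, 97, 98, 99, 99, 50]   # hypothetical values
states = [rule.evaluate(now=i * 60, value=v) for i, v in enumerate(cpu_readings)]
print(states)
```

The rule stays pending for the first five evaluations, fires once the breach has lasted the full 300 seconds, and returns to normal when the value drops.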

Task-02

Set up Grafana Cloud
Set up a sample alert

1. Set up Grafana Cloud

A. Create a Grafana Cloud Account

Go to: https://grafana.com/signup
Sign up using your email, GitHub, or Google account.
Choose the Grafana Cloud Free plan.

B. Create Your Stack

After signing in, click "Create Stack".
Choose:
- Stack name: (e.g., vanshika-devops)
- Region: Choose the nearest to you.
Click "Create Stack"

C. Access Grafana

From your dashboard, click "Launch Grafana."
You’ll be taken to your Grafana Cloud instance (e.g., https://your-stack.grafana.net)

2. Set Up a Sample Alert

We’ll create a basic alert using Grafana’s built-in TestData data source (or Prometheus, if you have connected it).

A. Go to Alerting Section

In the left menu, click "Alerting" > "Alert rules."
Click "New alert rule."

B. Define the Alert Rule

Choose a data source. If no source is added, add Grafana TestData DB for demo purposes:
- Go to Connections > Data sources
- Add TestData DB
- Return to alert creation

C. Configure the Query

Query A:
- Choose TestData DB
- Select scenario: Random walk
Set Condition: When avg() of query A is above 50
Set Evaluation interval: Every 1 minute

D. Add Summary & Labels

Summary: Demo Alert: Random walk is too high
Labels: severity=low, team=devops

E. Add a Contact Point

Go to Alerting > Contact points
Click New contact point
Choose method: Email, Slack, Webhook, etc.
Enter destination details and save

F. Link Contact Point to Alert Rule

Go to Alerting > Notification policies
Add a policy that matches your labels (e.g., severity=low)
Select the contact point you created
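
The label matching in step F can be sketched as below. This is a minimal model under stated assumptions: `route_alert`, the policy dictionaries, and the contact point names are all hypothetical, and only exact-match label matchers are handled (real notification policies form a tree and support other matcher types).

```python
def route_alert(alert_labels, policies, default_contact_point):
    """Return the contact point of the first policy whose matchers all match."""
    for policy in policies:
        matchers = policy["matchers"]
        if all(alert_labels.get(key) == value for key, value in matchers.items()):
            return policy["contact_point"]
    # No specific policy matched, so fall back to the default policy.
    return default_contact_point

# Hypothetical policies; the second one matches the labels from step D.
policies = [
    {"matchers": {"severity": "critical"}, "contact_point": "oncall-pager"},
    {"matchers": {"severity": "low"}, "contact_point": "devops-email"},
]

demo_alert = {"severity": "low", "team": "devops"}
chosen = route_alert(demo_alert, policies, default_contact_point="default-email")
unmatched = route_alert({"severity": "medium"}, policies, "default-email")
print(chosen, unmatched)
```

This is why the labels from step D matter: an alert whose labels match no policy is delivered through the default policy instead.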

G. Save the Alert Rule

Back in the alert rule editor, click Save rule

Final Step: Test the Alert

Temporarily lower the threshold to ensure the alert triggers.
Check your email or Slack to see if you receive a notification.

Grafana Cloud

Task - 03

Set up alerts for EC2 instances.
Set up AWS billing alerts.

Part 1: Set up Alerts for EC2 Instances

A. Connect AWS CloudWatch to Grafana Cloud

In Grafana Cloud:
- Go to Connections > Data Sources
- Click "Add data source."
- Choose CloudWatch
Enter AWS credentials:
- Choose an authentication method: Access & secret key, or an IAM role (via the AWS plugin).
- Provide:
  - Access Key ID
  - Secret Access Key
  - Region (e.g., us-east-1)
- Click Save & Test

For secure, production-ready setups, prefer IAM roles over long-lived access keys, or use the AWS CloudWatch integration available in Grafana Cloud.

B. Create an Alert Rule for EC2

Go to Alerting > Alert Rules
Click New alert rule
Select CloudWatch as data source
Build query:
- Namespace: AWS/EC2
- Metric Name: CPUUtilization
- Statistics: Average
- Dimensions: Choose your instance ID
Set condition:
- WHEN avg() of query A IS ABOVE 80 for 5 minutes
Add summary:
- High CPU usage on EC2 instance {{ $labels.InstanceId }}
Save and attach a notification policy/contact point.

Part 2: Set up Alerts for AWS Billing

AWS Billing metrics are only available in the us-east-1 region. Ensure CloudWatch billing alerts are enabled in AWS first.

A. Enable Billing Metrics in AWS

Go to AWS Console > Billing and Cost Management
In the left sidebar, click Billing preferences
Under Alert preferences, enable Receive CloudWatch billing alerts (if not already enabled)

B. Create an Alert Rule for Billing in Grafana

In Grafana Cloud, go to Alerting > Alert rules
Click New alert rule
Select CloudWatch as the data source
Build the query:
- Namespace: AWS/Billing
- Metric Name: EstimatedCharges
- Dimensions:
  - Currency = USD
  - (Optional) ServiceName = AmazonEC2
Set condition:
- WHEN avg() of query A IS ABOVE 10
Summary:
- Billing alert: AWS charges have exceeded $10
Save and assign to a contact point (e.g., email or Slack)
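
The condition reduces the query result to a single number and then compares it with the threshold, so the choice of reducer matters. The sketch below assumes that EstimatedCharges grows over the billing month; the sample values and the `reduce_and_check` helper are hypothetical.

```python
from statistics import mean

def reduce_and_check(values, reducer, threshold):
    """Collapse a series to one number, then compare it with the threshold."""
    reduced = reducer(values)
    return reduced, reduced > threshold

# Hypothetical EstimatedCharges samples (USD) within the query window.
charges = [7.5, 8.5, 9.5, 10.5]

avg_value, avg_fires = reduce_and_check(charges, mean, threshold=10)
last_value, last_fires = reduce_and_check(charges, lambda v: v[-1], threshold=10)
print(avg_value, avg_fires)    # the average lags behind a growing total
print(last_value, last_fires)  # the latest sample already exceeds $10
```

With a steadily growing metric, an average over the window can stay under the threshold after the latest value has crossed it, so a last or max reducer may suit a billing alert better.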

Prometheus

Prometheus Monitoring Architecture:

Prometheus Server: Scrapes metrics from targets and stores them in a local time-series database.
Exporters: Expose metrics in a Prometheus-readable format (e.g., Node Exporter for system metrics).
Pushgateway: Lets short-lived jobs push their metrics to an intermediary, which Prometheus then scrapes like any other target.
Alertmanager: Manages alerts from Prometheus and sends notifications to channels like email or Slack.
Service Discovery: Automatically finds targets to scrape (Kubernetes, AWS EC2, etc.).
Grafana: Used for better visualization and dashboards by connecting to Prometheus.

Data Flow:
Targets → Prometheus (scrape) → Local storage (TSDB) → Alert rules → Alertmanager → Notifications
Grafana queries Prometheus directly for visualization.

Key Features of Prometheus:

Multi-dimensional Data Model: Metrics are stored as time series identified by metric name and labels.
Powerful Query Language (PromQL): Flexible queries to select and aggregate time series data.
Pull-based Metrics Collection: Prometheus scrapes targets over HTTP at regular intervals.
Time-Series Storage: Efficient, local data storage with optional remote storage integrations.
Alerting: Built-in alerting system with rules and integration with Alertmanager for notifications.
Service Discovery: Automatically discovers targets via Kubernetes, Consul, EC2, etc.
Visualization: Basic UI built-in, and excellent integration with tools like Grafana for rich dashboards.
Scalability and Reliability: Designed to run standalone without external dependencies.
Open Source: Fully open-source with a strong, active community.
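
The multi-dimensional data model is the feature most of the others build on. Here is a minimal sketch of it, assuming an in-memory dictionary as the store; the `append` and `select` helpers and the metric and label names are hypothetical and only illustrate how a metric name plus labels identifies a series.

```python
from collections import defaultdict

# Each time series is identified by its metric name plus its full label set.
store = defaultdict(list)

def append(name, labels, timestamp, value):
    """Add one sample to the series identified by (name, labels)."""
    key = (name, frozenset(labels.items()))
    store[key].append((timestamp, value))

def select(name, **matchers):
    """Return every series with this name whose labels equal all matchers."""
    found = {}
    for (series_name, label_set), samples in store.items():
        labels = dict(label_set)
        if series_name == name and all(labels.get(k) == v for k, v in matchers.items()):
            found[(series_name, label_set)] = samples
    return found

# Same metric name, different label values: two separate series.
append("http_requests_total", {"route": "/", "status": "200"}, 1000, 5)
append("http_requests_total", {"route": "/", "status": "500"}, 1000, 1)
append("http_requests_total", {"route": "/", "status": "200"}, 1015, 9)

all_series = select("http_requests_total")
errors = select("http_requests_total", status="500")
print(len(all_series), len(errors))
```

Selecting by label, as `select(..., status="500")` does here, is roughly what a PromQL selector such as `http_requests_total{status="500"}` expresses.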

Components of Prometheus:

Prometheus Server: The core component that scrapes, stores, and queries metrics.
Exporters: Help expose application and system metrics in a format Prometheus can scrape (e.g., Node Exporter, Blackbox Exporter).
Pushgateway: Accepts metrics pushed by short-lived applications and exposes them for Prometheus to scrape.
Alertmanager: Handles alerts sent by the Prometheus server, managing routing and notifications.
Service Discovery: Automatically finds targets to monitor, integrating with systems like Kubernetes, Consul, and AWS.
Visualization Layer: Prometheus has a basic UI, but it integrates seamlessly with Grafana for advanced dashboards and visualizations.

Database used by Prometheus:

Prometheus stores its data in its own built-in time-series database (TSDB). This database is optimized for time-series data, that is, data indexed by timestamp. It uses a custom on-disk format designed to handle the high volume of metrics that Prometheus collects from various services.

The key characteristics of Prometheus' database are:

Time-series optimized: It stores data with a timestamp and labels, allowing for efficient querying and aggregation of time-series metrics.
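Append-only: Samples are appended as they are scraped and are not updated in place, which simplifies the storage model. Data is normally removed only by retention, although series can be deleted through the TSDB admin API.
Efficient storage: Prometheus compresses samples to store data efficiently. It does not downsample older data on its own; that requires an external system such as Thanos or another remote storage integration.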
Retention-based: Prometheus supports retention policies, where you can set how long data is kept before it's automatically deleted.

While Prometheus is not a relational database like MySQL or PostgreSQL, its custom-built TSDB is highly suited to its use case of monitoring and metrics collection.

Default data retention period in Prometheus:

The default data retention period in Prometheus is 15 days.

This means that by default, Prometheus will store metrics data for 15 days before automatically deleting the older data. However, you can customize this retention period by adjusting the --storage.tsdb.retention.time flag when starting Prometheus.

For example, to set a retention period of 30 days, you can start Prometheus with the following option:

--storage.tsdb.retention.time=30d

This allows you to retain data for a longer or shorter period depending on your needs.
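
As a rough illustration of what a retention period means, the sketch below converts a value such as `30d` to seconds and drops samples older than the cutoff. The `parse_retention` and `prune` helpers are hypothetical and handle only single-unit values. Prometheus itself removes whole blocks of data in the background rather than filtering individual samples.

```python
import re

# Seconds per unit, for single-unit durations such as "15d" or "12h".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_retention(text):
    """Convert a duration like '30d' to seconds (single unit only)."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]

def prune(samples, now, retention):
    """Keep only samples whose timestamp falls within the retention period."""
    cutoff = now - parse_retention(retention)
    return [(ts, value) for ts, value in samples if ts >= cutoff]

DAY = 86400
samples = [(0 * DAY, 1.0), (10 * DAY, 2.0), (20 * DAY, 3.0)]  # hypothetical data

kept_default = prune(samples, now=30 * DAY, retention="15d")
kept_longer = prune(samples, now=30 * DAY, retention="30d")
print(kept_default)
print(kept_longer)
```

With the default of 15 days only the most recent sample survives, while a 30-day retention keeps all three.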

Build a Grafana dashboard
