Full-Stack Observability on AWS EKS: Prometheus, Grafana & ELK with Helm

Introduction

In the world of DevOps, monitoring and logging are non-negotiable essentials. They provide real-time visibility into system health, help debug issues faster, and enable proactive performance tuning. Without robust observability, even the most resilient applications can fail silently.

For this full-day hands-on lab, my goal was to build a complete monitoring and logging pipeline on Kubernetes using open-source tools. I wanted to gain end-to-end visibility into pod performance, node resource usage, and container logs—all while hosting everything on AWS EKS.

To achieve this, I used the following stack:

  • Prometheus – for collecting and storing metrics

  • Grafana – for visualizing and alerting on metrics

  • Elasticsearch – for indexing and storing logs

  • Kibana – for log visualization and analytics

  • Filebeat – to ship Kubernetes logs to Elasticsearch

  • Helm – to simplify deployments of all components

  • EKS (Elastic Kubernetes Service) – as the managed Kubernetes platform

This blog walks through the exact steps I followed—from cluster setup to visualizing logs and metrics—and the key takeaways from this practical observability journey.

Step 1: Setting Up the EKS Admin EC2 Instance

To interact with the Kubernetes cluster on EKS, I first launched an EC2 instance that serves as my administration node. From this machine, I installed and used tools like kubectl, eksctl, and helm.

✅ EC2 Instance Configuration:

Field            Value
Name             eks-admin-ec2
AMI              Amazon Linux 2 (x86_64)
Instance Type    t2.medium (or t3.medium if not using free tier)
Key Pair         Created/used an existing key pair (e.g., eks-key)
Network          Default VPC
Security Group   Allowed SSH (22), HTTP (80), HTTPS (443)
Storage          20 GiB (gp3)

This instance ran in the us-east-1a availability zone and used a public IPv4 address for easy access.

Phase 1: Connect and Initial Setup

Once the EC2 admin instance was running, I SSH’d into it and installed all necessary tools to interact with my EKS cluster.
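For reference, the connection step looks roughly like this (key file name and public IP are placeholders; mine matched the eks-key pair from the table above):

# SSH into the admin instance (Amazon Linux 2 uses the ec2-user account)
chmod 400 eks-key.pem
ssh -i eks-key.pem ec2-user@<EC2_PUBLIC_IP>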

Step 1: Update the System

sudo yum update -y

Step 2: Install Required Tools

✅ Install Docker

sudo yum install docker -y
sudo systemctl enable docker
sudo systemctl start docker

✅ Install kubectl

curl -LO "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin/
kubectl version --client

✅ Install eksctl

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz
sudo mv eksctl /usr/local/bin

✅ Install Helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

✅ Install jq

sudo yum install jq -y

Step 3: Install AWS CLI v2

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version

Step 4: Configure AWS CLI

aws configure

You’ll be prompted to enter:

  • AWS Access Key ID

  • AWS Secret Access Key

  • Default region name: us-east-1

  • Default output format: json
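A quick sanity check that the credentials and region were picked up correctly (aws sts get-caller-identity prints the account and IAM identity the CLI is acting as):

aws sts get-caller-identity
aws configure get region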

Phase 2: Creating the EKS Cluster using eksctl

With all the required tools installed and the AWS CLI configured, I used eksctl to spin up a fully managed EKS cluster on AWS.

🚀 Cluster Creation Command

eksctl create cluster \
  --name devops-cluster \
  --region us-east-1 \
  --nodes 2 \
  --node-type t3.medium \
  --with-oidc \
  --managed

What this command does:

  • --name devops-cluster: Names the cluster.

  • --region us-east-1: Deploys the cluster in N. Virginia.

  • --nodes 2: Starts with 2 worker nodes.

  • --node-type t3.medium: Each node uses t3.medium instance type.

  • --with-oidc: Enables OIDC provider for IAM roles for service accounts (IRSA).

  • --managed: Uses AWS-managed node groups for easier upgrades and scaling.

The provisioning process automatically:

✅ Creates the EKS control plane
✅ Sets up a VPC (if not provided)
✅ Deploys managed worker nodes
✅ Configures the core Kubernetes add-ons: coredns, kube-proxy, and the VPC CNI (aws-node)

After around 15 minutes, the cluster and its node group were fully ready to use.

Verifying the EKS Cluster Nodes

kubectl get nodes
# Check the services and get the external IP of the LoadBalancer:
kubectl get svc

This confirmed that both nodes were in Ready state and running Kubernetes version v1.32.3-eks.

Deploying My Application to EKS

After creating the EKS cluster, I deployed my application using Kubernetes manifests directly from my GitHub repository. This helped me generate live traffic and logs, which were later used for observability.

I ran the following command:

kubectl apply -f https://raw.githubusercontent.com/PasupuletiBhavya/devsecops-project/master/Manifests/dss.yml

This deployed my application (YelpCamp-style app) to the cluster, making it accessible via a LoadBalancer. With this running app, I could proceed to set up Prometheus, Grafana, and the ELK stack to monitor logs and metrics.

Test the Load Balancer URL
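A minimal smoke test from the admin instance, assuming the Service in dss.yml exposes the app on port 3000 (the same port my Blackbox probe targets later); substitute the EXTERNAL-IP/hostname reported by kubectl get svc:

# Expect a 200 status code if the app is serving traffic
curl -s -o /dev/null -w '%{http_code}\n' http://<APP_EXTERNAL_DNS>:3000/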

Phase 3: Monitoring with Prometheus & Grafana

With the EKS cluster ready, I moved on to deploying a comprehensive monitoring solution using the kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, and several Kubernetes observability tools.

Step 1: Add Helm Repo for Prometheus Community Charts

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Install kube-prometheus-stack Chart

helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

This installs a complete monitoring suite, including:

  • Prometheus – for metrics collection

  • Grafana – for dashboards and visualizations

  • Alertmanager – for alert handling

  • Node Exporter – to expose node metrics

  • Kube State Metrics – for Kubernetes object monitoring

Step 3: Validate Deployment

I checked the pods and services in the monitoring namespace:

kubectl get pods -n monitoring
kubectl get svc -n monitoring

All components were up and running:

✅ Pods Status

Pod Name                                 Status
kube-prom-stack-grafana                  Running
kube-prom-stack-kube-prome-prometheus    Running
kube-state-metrics                       Running
node-exporter                            Running
alertmanager                             Running

Phase 4: Exposing Grafana & Prometheus via LoadBalancer

By default, the Grafana and Prometheus services are internal-only (ClusterIP). To access their UIs from the browser, I patched both services to be of type LoadBalancer.

Step 1: Patch Services

🔄 Expose Grafana:

kubectl patch svc kube-prom-stack-grafana -n monitoring \
  -p '{"spec": {"type": "LoadBalancer"}}'

🔄 Expose Prometheus:

kubectl patch svc kube-prom-stack-kube-prome-prometheus -n monitoring \
  -p '{"spec": {"type": "LoadBalancer"}}'

Step 2: Get External IPs

After patching, I waited a minute and ran:

kubectl get svc -n monitoring

This returned external DNS endpoints for both Grafana and Prometheus:

Service                                  External IP / URL
kube-prom-stack-grafana                  a865c57d922fe4de4b08cef4e3c0aeee-2006643144.us-east-1.elb.amazonaws.com (port 80)
kube-prom-stack-kube-prome-prometheus    a8a9d4669a7144b5caeddf178c5e8eab-21081589.us-east-1.elb.amazonaws.com (port 9090)

📌 You can now access:

  • Grafana: http://<grafana-external-ip>

  • Prometheus: http://<prometheus-external-ip>
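Before opening the browser, a quick smoke test from the admin instance confirms both UIs respond (Grafana exposes /api/health and Prometheus exposes /-/healthy):

# Grafana health check (returns a small JSON payload including "database": "ok")
curl -s http://<grafana-external-ip>/api/health

# Prometheus health endpoint (returns a short "Healthy" message)
curl -s http://<prometheus-external-ip>:9090/-/healthy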

Step 3: Retrieve Grafana Admin Password

To log in to the Grafana dashboard, I fetched the auto-generated admin password:

kubectl get secret kube-prom-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
echo

Credentials:

  • Username: admin

  • Password: (output from the above command)

Phase 5: Importing a Grafana Dashboard

After logging in, I imported a community dashboard (Node Exporter Full, ID 1860) to visualize node-level metrics.

👉 Steps to Import:

  1. Go to Grafana UI.

  2. In the left sidebar, click “+” ➝ Import.

  3. Enter the dashboard ID (e.g., 1860) and click Load.

  4. Select Prometheus as the data source and click Import.

Phase 6: External Endpoint Monitoring with Blackbox Exporter

In real-world production systems, it’s essential to monitor not just internal metrics but also the availability of external endpoints. For this, I used Blackbox Exporter with Prometheus to actively probe the HTTP status of my deployed app.
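The kube-prometheus-stack chart does not bundle the Blackbox Exporter itself, so it needs to be running in the cluster first. A minimal install sketch from the same prometheus-community repo (the fullnameOverride is my assumption, used so the generated Service name lines up with the prober.url in the Probe below):

helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  -n monitoring \
  --set fullnameOverride=blackbox-exporter

# Confirm the exporter Service exists; the default probe port is 9115
kubectl get svc -n monitoring | grep blackbox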

🚀 Step 1: Define a Probe Resource

I created a custom Probe object using the following YAML to monitor my external application running at port 3000:

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: campground-probe
  namespace: monitoring
  labels:
    probe: "true"   # ✅ Required for Prometheus to scrape this probe
spec:
  jobName: "blackbox-campground"
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc.cluster.local:9115
  targets:
    staticConfig:
      static:
        - http://a6df26611d1c84f4d9431caf2ebe7e1f-1142985076.us-east-1.elb.amazonaws.com:3000/

This config tells Prometheus to:

  • Check the URL every 30 seconds

  • Use http_2xx module to verify HTTP status

  • Use Blackbox Exporter inside the monitoring namespace

✅ Result: Probe is UP! 🟢

Prometheus successfully picked up the probe, and I could see it turn green (UP) in the Prometheus Targets UI. This means:

  • ✅ Prometheus is scraping the probe endpoint

  • ✅ Blackbox Exporter is reachable

  • ✅ The external app is up and responding correctly

Visualize Probe Status in Grafana

You can visualize Blackbox status directly in Grafana by creating a panel with the following PromQL:

probe_success{job="blackbox-campground"}

This will show a 1 (UP) or 0 (DOWN) based on the latest probe result.
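To get notified rather than just charting the value, the same metric can drive an alert. A sketch of a PrometheusRule, assuming the default kube-prometheus-stack behaviour of only selecting rules labelled with the Helm release name (kube-prom-stack here); adjust the label and threshold to your setup:

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: campground-probe-alerts
  namespace: monitoring
  labels:
    release: kube-prom-stack   # assumption: default ruleSelector matches the release label
spec:
  groups:
    - name: blackbox.rules
      rules:
        - alert: CampgroundEndpointDown
          expr: probe_success{job="blackbox-campground"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "External endpoint has been failing its Blackbox probe for 2 minutes"
EOF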

Phase 7: Deploying ELK Stack for Kubernetes Log Monitoring

While Prometheus and Grafana give us great metrics visibility, we also need to monitor application and system logs. That’s where the ELK stack—Elasticsearch, Logstash (optional), and Kibana—comes in. I deployed the ELK stack using Helm charts for simplicity.

Step 1: Add Elastic Helm Repository

helm repo add elastic https://helm.elastic.co
helm repo update

Step 2: Create Namespace for ELK

kubectl create namespace elk

Step 3: Install Elasticsearch

helm install elasticsearch elastic/elasticsearch \
  -n elk \
  --set volumeClaimTemplate.storageClassName=gp2 \
  --set replicas=1 \
  --set minimumMasterNodes=1 \
  --set resources.requests.memory=512Mi \
  --set resources.requests.cpu=100m \
  --set resources.limits.memory=1Gi \
  --set resources.limits.cpu=500m

⚠️ I initially faced an issue where the Elasticsearch pod was stuck in Pending due to EBS volume provisioning failure. The fix was to:

  • Install the AWS EBS CSI driver

  • Attach the AmazonEBSCSIDriverPolicy to my worker node IAM role

  • Then reattempt the deployment (the exact commands are sketched below)
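For reference, those fixes boiled down to roughly the following commands (a sketch; the node group and IAM role names are specific to your cluster, so look them up first):

# Find the IAM role attached to the managed node group
aws eks describe-nodegroup --cluster-name devops-cluster \
  --nodegroup-name <NODEGROUP_NAME> --region us-east-1 \
  --query 'nodegroup.nodeRole' --output text

# Attach the EBS CSI policy to that role (pass the role name, not the ARN)
aws iam attach-role-policy \
  --role-name <NODE_ROLE_NAME> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy

# Install the EBS CSI driver as an EKS managed add-on
eksctl create addon --name aws-ebs-csi-driver \
  --cluster devops-cluster --region us-east-1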

You can check pod status using:

kubectl get pods -n elk -l app=elasticsearch-master

Step 4: Install Kibana

helm install kibana elastic/kibana -n elk \
  --set service.type=LoadBalancer

Then get the external IP:

kubectl get svc -n elk | grep kibana

Access Kibana at:

http://<EXTERNAL-IP>:5601

Step 5: Install Filebeat (Log Forwarder)

helm install filebeat elastic/filebeat -n elk \
  --set daemonset.enabled=true \
  --set elasticsearch.hosts="{http://elasticsearch-master.elk.svc.cluster.local:9200}"

Check Filebeat pods:

kubectl get pods -n elk -l app=filebeat

Step 6: Access Logs in Kibana

Once Kibana is up:

  1. Open http://<EXTERNAL-IP>:5601

  2. Go to “Discover”

  3. You should start seeing Kubernetes logs (collected by Filebeat and indexed by Elasticsearch)

What Each Component Does:

Component        Role
Elasticsearch    Stores the logs indexed from the cluster
Filebeat         Collects logs from all Kubernetes nodes
Kibana           Provides a dashboard for visualizing and searching logs

✅ Elasticsearch is Up and Running!

After deploying the Elasticsearch Helm chart and patching the service to LoadBalancer, I accessed it via:

https://<elasticsearch-external-ip>:9200

As seen in the screenshot:

{
  "name": "elasticsearch-master-0",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "8.5.1",
    ...
  },
  "tagline": "You Know, for Search"
}

This confirms that:

  • Elasticsearch is accessible

  • The cluster responds with its name and version

  • Ready to receive logs from Filebeat

What This Confirms:

  • ✅ Your Elasticsearch Pod is running

  • ✅ The LoadBalancer service is working externally

  • ✅ Elasticsearch is responding correctly on port 9200

  • ✅ You have secure HTTPS access to the API

Elasticsearch Setup and External Access (via Load Balancer)

✅ What I Achieved:

  • Deployed Elasticsearch on Amazon EKS using Helm

  • Exposed Elasticsearch service using Load Balancer (not just port-forward)

  • Verified secure external access using the default elastic user credentials

Steps I Followed:

1. Check Pod Status

kubectl get pods -n elk

elasticsearch-master-0 was in Running state with 1/1 containers ready.

2. Get Elasticsearch Password

kubectl get secrets --namespace=elk elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d

🔐 Password: c6WWaP7tt26OGiY8

3. Expose Elasticsearch via LoadBalancer

kubectl patch svc elasticsearch-master -n elk \
  -p '{"spec": {"type": "LoadBalancer"}}'

4. Get External Access URL

kubectl get svc -n elk

🔗 URL:

https://ab42505e742ba4deab140d74087ed823-28402472.us-east-1.elb.amazonaws.com:9200

5. Verify from Browser / cURL

curl -u elastic:c6WWaP7tt26OGiY8 -k https://ab42505e742ba4deab140d74087ed823-28402472.us-east-1.elb.amazonaws.com:9200

📘 Notes:

  • I used self-signed certs, hence -k flag for cURL

  • Elastic Helm chart defaults to TLS enabled, which is ideal for production

Step-by-Step: Deploying Kibana & Verifying ELK Stack Access on Kubernetes (with Load Balancer)

After successfully exposing Elasticsearch, the next phase was to install Kibana and make it externally accessible via a LoadBalancer. Here's how I did it:

✅ Step 1: Install Kibana via Helm with LoadBalancer Enabled

helm install kibana elastic/kibana -n elk \
  --set service.type=LoadBalancer \
  --set elasticsearchHosts=https://elasticsearch-master:9200 \
  --set resources.requests.memory=512Mi \
  --set resources.requests.cpu=100m \
  --set resources.limits.memory=1Gi \
  --set resources.limits.cpu=500m

📝 Notes:

  • elasticsearchHosts points to your Elasticsearch service (inside cluster).

  • Resource limits help manage memory/CPU in Kubernetes.

Step 2: Wait for External IP

After a few minutes, I fetched the external IP:

kubectl get svc -n elk

✔️ I saw the EXTERNAL-IP assigned to kibana-kibana like:

kibana-kibana   LoadBalancer   10.100.x.x   abcd1234.elb.amazonaws.com   5601:xxxx/TCP

Step 3: Access the Kibana UI

Open your browser:

http://<EXTERNAL-IP>:5601

🧭 You’ll land on the Kibana dashboard, ready to visualize logs and metrics.

(Optional) Configure Public Base URL for Kibana

To avoid redirect issues, especially when accessing Kibana behind a LoadBalancer, set server.publicBaseUrl in kibana.yml. Since the chart's kibanaConfig value expects the kibana.yml content itself, this is easiest with a small values file rather than --set:

kibanaConfig:
  kibana.yml: |
    server.publicBaseUrl: "http://<EXTERNAL-IP>:5601"

Then apply it with:

helm upgrade kibana elastic/kibana -n elk --reuse-values -f kibana-values.yaml

💡 What’s Next?

Now that both Elasticsearch and Kibana are publicly accessible, you can:

  • Explore Elasticsearch data in Kibana

  • Create index patterns and dashboards

  • Add Filebeat or Logstash to ingest logs

  • Combine with Grafana dashboards or Prometheus alerts

Filebeat Setup for Real-Time Log Shipping

To send Kubernetes pod logs to Elasticsearch, I used Filebeat + Autodiscover.

✅ Add Elastic Helm repo:

helm repo add elastic https://helm.elastic.co
helm repo update

✅ Created filebeat-values.yaml for Autodiscover

filebeatConfig:
  filebeat.yml: |
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*.log
    output.elasticsearch:
      hosts: ["https://elasticsearch-master:9200"]
      username: "elastic"
      password: "c6WWaP7tt26OGiY8"
      ssl.verification_mode: "none"

Saved as filebeat-values.yaml

Install Filebeat

helm install filebeat elastic/filebeat -n elk \
  -f filebeat-values.yaml

Checked status:

kubectl get pods -n elk

✅ Filebeat pod was running!

Verifying Logs in Kibana

Once Filebeat was running:

  • Opened Kibana

  • Create an Index Pattern:

    • Go to "Stack Management" → "Index Patterns"

    • Click "Create index pattern"

    • Enter:

        filebeat-*
      
    • Select @timestamp as the time filter field

    • Click Create index pattern

  • Go to Discover Tab:

    • Navigate to “Discover”

    • Select your new index pattern (filebeat-*)

    • You should see logs coming in from your containers!

🎉 I could see real-time logs from Kubernetes pods being shipped to Elasticsearch via Filebeat!
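If Discover ever shows no data, it is worth confirming from the command line that Filebeat indices actually exist in Elasticsearch (reusing the LoadBalancer URL and elastic password from earlier; -k because of the self-signed certificate):

curl -u elastic:<PASSWORD> -k \
  "https://<elasticsearch-external-ip>:9200/_cat/indices/filebeat-*?v"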

✅ Final Setup Overview

Component         Status
Elasticsearch     ✅ Running (LB)
Kibana            ✅ Running (LB)
Filebeat          ✅ Installed
Logs in Kibana    ✅ Verified

Live Kubernetes Logs in Kibana Discover

After successfully installing Filebeat and connecting it to Elasticsearch, I moved to Kibana's Discover tab to view live logs.

📸 Here’s a snapshot of my Kibana dashboard:

As you can see:

  • I queried the filebeat-* index

  • Kibana showed live pod logs from my Kubernetes cluster

  • The logs contain metadata like:

    • agent.hostname: which Filebeat pod shipped the logs

    • kubernetes.namespace: elk

    • container.name: which container the log came from

    • timestamp, node, zone, topology, and more

🎉 Success: My entire EKS cluster logs are now searchable and filterable in Kibana.

What’s Happening Behind the Scenes:

  • Filebeat is running as a DaemonSet in Kubernetes

  • It autodetects containers using Kubernetes hints

  • Logs from /var/log/containers/*.log are shipped securely to Elasticsearch

  • Kibana queries the indexed logs from Elasticsearch and visualizes them (see the checks below)
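One way to sanity-check the DaemonSet behaviour is to compare its DESIRED count with the number of worker nodes (the resource names below assume the Helm release was called filebeat, which yields filebeat-filebeat):

kubectl get daemonset -n elk                       # DESIRED should equal the node count
kubectl get pods -n elk -o wide | grep filebeat    # one Filebeat pod per node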

Create Your First Visualization

After setting up Filebeat and confirming logs were reaching Elasticsearch, I used Kibana Lens to quickly visualize log volume.

What I Did:

  • Selected the filebeat-* index

  • Used @timestamp on the X-axis

  • Set Y-axis to show the count of records

  • Time range: Last 5 minutes

  • Chart type: Vertical bar

What It Shows:

This chart displays how many logs are coming in every few seconds. It helps confirm:

  • Filebeat is sending logs continuously

  • There are no big gaps or sudden spikes

Simple and effective way to monitor log flow in real time. ✅

🍩 Pod-wise Log Distribution (Donut Chart)

To understand which pods are generating the most logs, I created a donut chart in Kibana.

What I Did:

  • Index pattern: filebeat-*

  • Slice by: Top 5 values of kubernetes.pod.name

  • Size by: Count of records

  • Time range: Last 5 minutes

What It Shows:

This chart shows the log volume share from each pod.

For example:

  • filebeat-filebeat-77fs9 generated 25% of logs

  • elasticsearch-master-0 and grafana each contributed ~21%

  • Helps spot noisy pods or identify issues quickly

A great way to visually monitor log load across components!

Log Distribution by Pod – Latest View

To track how logs are distributed among the top 5 Kubernetes pods, I generated this donut chart.

Setup:

  • Index pattern: filebeat-*

  • Slice by: Top 5 values of kubernetes.pod.name

  • Size by: Count of records

Insights:

  • filebeat-filebeat-77fs9 contributed the most logs (~37%)

  • elasticsearch-master-0 and filebeat-filebeat-wg9ws each logged ~31%

  • This helps quickly identify the busiest pods in terms of logging

👉 Great for spotting potential log flooding or heavy activity.

Pod-Wise Log Distribution – Bar Chart View

To visualize log volume by pod, I created a vertical bar chart using Filebeat data in Kibana.

Configuration:

  • Index pattern: filebeat-*

  • X-axis: Top 5 values of kubernetes.pod.name

  • Y-axis: Count of log records

Quick Takeaway:

  • filebeat-filebeat-77fs9 generated the highest number of logs.

  • Other pods like filebeat-filebeat-wg9ws and elasticsearch-master-0 also show steady activity.

  • This helps in quickly identifying which pods are generating the most logs in near real time.

Namespace-Wise Log Activity Over Time

This bar chart shows how log events are distributed across Kubernetes namespaces (elk, kube-system, and monitoring) over time.

Configuration:

  • Index pattern: filebeat-*

  • X-axis: @timestamp (interval: 30 seconds)

  • Y-axis: Unique count of kubernetes.namespace_labels.kubernetes_io/metadata_name

  • Breakdown: Top 3 values of kubernetes.namespace

Insights:

  • Most activity came from the elk namespace, which includes Elasticsearch and Kibana.

  • kube-system and monitoring show consistent but lower activity.

  • This helps verify if logs are being collected from all critical namespaces.

Log Count per Namespace Over Time

This visualization shows log traffic trends in the elk and monitoring namespaces over the past 15 minutes.

Configuration:

  • Index pattern: filebeat-*

  • X-axis: @timestamp (interval: 30 seconds)

  • Y-axis: Count of records

  • Breakdown: Top 3 values of kubernetes.namespace

Key Observations:

  • The elk namespace is consistently generating logs, which makes sense as it runs Elasticsearch and Kibana.

  • The monitoring namespace shows periodic spikes, indicating bursts of log activity, possibly from Prometheus or Grafana.

This breakdown helps validate that Filebeat is capturing logs across key namespaces as expected.

Error and Failure Logs Tracked Over Time

This chart filters and visualizes logs that include error or failure indicators such as:

"error" OR log.level = "error" OR "ERR" OR "failed"

Configuration:

  • Index pattern: filebeat-*

  • X-axis: @timestamp (30-minute intervals)

  • Y-axis: Count of records

  • Breakdown: Top 3 values of kubernetes.container.name

Observation:

  • A spike in error-related logs was observed from the Grafana container during the recent time window.

  • This helps proactively identify which services are experiencing issues and when.

✅ This kind of filtering and visualization is essential for real-time troubleshooting and alerting.

Final Cleanup (No Billing Left)

Deleted Cluster

eksctl delete cluster --name devops-cluster --region us-east-1

Manually Cleaned:

  • Helm releases

  • EBS volumes (via EC2 dashboard)

  • Load balancers & security groups

  • IAM roles

  • CloudWatch log groups
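A few spot-checks helped confirm nothing billable was left behind (standard AWS CLI calls; adjust the region if yours differs):

# Orphaned EBS volumes still in the "available" state
aws ec2 describe-volumes --region us-east-1 \
  --filters Name=status,Values=available \
  --query 'Volumes[].VolumeId'

# Classic load balancers created by Service type=LoadBalancer
aws elb describe-load-balancers --region us-east-1 \
  --query 'LoadBalancerDescriptions[].DNSName'

# Leftover EKS-related CloudWatch log groups
aws logs describe-log-groups --region us-east-1 \
  --log-group-name-prefix /aws/eks \
  --query 'logGroups[].logGroupName'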

What I Did: E2E Kubernetes Observability Stack in a Day

✅ Why Monitoring & Logging Matter

In DevOps, real-time observability is critical for:

  • Detecting issues before users do

  • Troubleshooting failures quickly

  • Analyzing system health & resource usage

🗓️ My Goal

Build a complete monitoring + logging setup on Kubernetes using open-source tools and clean it up to avoid billing.

Tools Used

  • Amazon EKS – Kubernetes cluster on AWS

  • Helm – Easy deployment of complex apps

  • Prometheus + Grafana – Metrics monitoring & dashboards

  • Elasticsearch + Kibana + Filebeat (ELK) – Centralized log aggregation & analysis

Final Thoughts

Setting up an end-to-end observability stack on Kubernetes might seem overwhelming at first — but with the right tools and a structured approach, it becomes manageable and rewarding.

This hands-on exercise helped me:

  • Understand how logs and metrics flow in real-world clusters

  • Troubleshoot Helm installation issues

  • Visualize logs using Kibana and monitor metrics via Grafana

  • Clean up infrastructure to avoid unnecessary AWS billing

Whether you're learning DevOps or managing production-grade clusters, observability is a skill worth mastering.

🔗 Reference

I referred to this excellent guide during my setup process: Comprehensive AWS EKS Cluster Monitoring with Prometheus, Grafana, and EFK Stack – 10 Weeks of CloudOps. A big thanks to the author for such a well-structured walkthrough!

Thanks for reading!
If you found this helpful, feel free to connect with me or drop your thoughts in the comments.
