Full-Stack Observability on AWS EKS: Prometheus, Grafana & ELK with Helm


Introduction
In the world of DevOps, monitoring and logging are non-negotiable essentials. They provide real-time visibility into system health, help debug issues faster, and ensure proactive performance tuning. Without robust observability, even the most resilient applications can fail silently.
For this full-day hands-on lab, my goal was to build a complete monitoring and logging pipeline on Kubernetes using open-source tools. I wanted to gain end-to-end visibility into pod performance, node resource usage, and container logs—all while hosting everything on AWS EKS.
To achieve this, I used the following stack:
Prometheus – for collecting and storing metrics
Grafana – for visualizing and alerting on metrics
Elasticsearch – for indexing and storing logs
Kibana – for log visualization and analytics
Filebeat – to ship Kubernetes logs to Elasticsearch
Helm – to simplify deployments of all components
EKS (Elastic Kubernetes Service) – as the managed Kubernetes platform
This blog walks through the exact steps I followed—from cluster setup to visualizing logs and metrics—and the key takeaways from this practical observability journey.
Step 1: Setting Up the EKS Admin EC2 Instance
To interact with the Kubernetes cluster on EKS, I first launched an EC2 instance that serves as my administration node. From this machine, I installed and used tools like kubectl, eksctl, and helm.
✅ EC2 Instance Configuration:
| Field | Value |
| --- | --- |
| Name | eks-admin-ec2 |
| AMI | Amazon Linux 2 (x86_64) |
| Instance Type | t2.medium (or t3.medium if not using free tier) |
| Key Pair | Created/used existing key pair (e.g., eks-key) |
| Network | Default VPC selected |
| Security Group | Allowed SSH (22), HTTP (80), HTTPS (443) |
| Storage | 20 GiB (gp3) |
This instance ran in the us-east-1a availability zone and used a public IPv4 address for easy access.
Phase 1: Connect and Initial Setup
Once the EC2 admin instance was running, I SSH’d into it and installed all necessary tools to interact with my EKS cluster.
Step 1: Update the System
sudo yum update -y
Step 2: Install Required Tools
✅ Install Docker
sudo yum install docker -y
sudo systemctl enable docker
sudo systemctl start docker
✅ Install kubectl
curl -LO "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin/
kubectl version --client
✅ Install eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz
sudo mv eksctl /usr/local/bin
✅ Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
✅ Install jq
sudo yum install jq -y
Step 3: Install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version
Step 4: Configure AWS CLI
aws configure
You’ll be prompted to enter:
AWS Access Key ID
AWS Secret Access Key
Default region name: us-east-1
Default output format: json
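To confirm the CLI actually picked up the credentials, an optional sanity check is:
aws sts get-caller-identity
This should print the account ID and the ARN of the IAM identity you just configured.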
Phase 2: Creating the EKS Cluster using eksctl
With all the required tools installed and the AWS CLI configured, I used eksctl to spin up a fully managed EKS cluster on AWS.
🚀 Cluster Creation Command
eksctl create cluster \
--name devops-cluster \
--region us-east-1 \
--nodes 2 \
--node-type t3.medium \
--with-oidc \
--managed
What this command does:
--name devops-cluster: Names the cluster.
--region us-east-1: Deploys the cluster in N. Virginia.
--nodes 2: Starts with 2 worker nodes.
--node-type t3.medium: Each node uses the t3.medium instance type.
--with-oidc: Enables an OIDC provider for IAM Roles for Service Accounts (IRSA).
--managed: Uses AWS-managed node groups for easier upgrades and scaling.
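As a hedged alternative to the long flag list above, the same cluster can be described declaratively and created from a config file. A minimal sketch (values mirror the flags above; the file name and node group name are my own choices):
# cluster.yaml (hypothetical filename)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: devops-cluster
  region: us-east-1
iam:
  withOIDC: true
managedNodeGroups:
  - name: devops-nodes        # assumed node group name
    instanceType: t3.medium
    desiredCapacity: 2
Then run: eksctl create cluster -f cluster.yaml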
The provisioning process automatically:
✅ Creates the EKS control plane
✅ Sets up a VPC (if not provided)
✅ Deploys managed worker nodes
✅ Configures Kubernetes add-ons like coredns, kube-proxy, and metrics-server
After around 15 minutes, the cluster and its node group were fully ready to use.
Verifying the EKS Cluster Nodes
kubectl get nodes
#Check the service and get the external IP (LoadBalancer):
kubectl get svc
This confirmed that both nodes were in the Ready state and running Kubernetes version v1.32.3-eks.
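eksctl writes the kubeconfig automatically, but if kubectl ever loses track of the cluster (for example, from a fresh shell or another machine), it can be regenerated with:
aws eks update-kubeconfig --name devops-cluster --region us-east-1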
Deploying My Application to EKS
After creating the EKS cluster, I deployed my application using Kubernetes manifests directly from my GitHub repository. This helped me generate live traffic and logs, which were later used for observability.
I ran the following command:
kubectl apply -f https://raw.githubusercontent.com/PasupuletiBhavya/devsecops-project/master/Manifests/dss.yml
This deployed my application (YelpCamp-style app) to the cluster, making it accessible via a LoadBalancer. With this running app, I could proceed to set up Prometheus, Grafana, and the ELK stack to monitor logs and metrics.
Test the Load Balancer URL
Phase 3: Monitoring with Prometheus & Grafana
With the EKS cluster ready, I moved on to deploying a comprehensive monitoring solution using the kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, and several Kubernetes observability tools.
Step 1: Add Helm Repo for Prometheus Community Charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Step 2: Install the kube-prometheus-stack Chart
helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
-n monitoring --create-namespace
This installs a complete monitoring suite, including:
✅ Prometheus – for metrics collection
✅ Grafana – for dashboards and visualizations
✅ Alertmanager – for alert handling
✅ Node Exporter – to expose node metrics
✅ Kube State Metrics – for Kubernetes object monitoring
Step 3: Validate Deployment
I checked the pods and services in the monitoring namespace:
kubectl get pods -n monitoring
kubectl get svc -n monitoring
All components were up and running:
✅ Pods Status
| Pod Name | Status |
| --- | --- |
| kube-prom-stack-grafana | Running |
| kube-prom-stack-kube-prome-prometheus | Running |
| kube-state-metrics | Running |
| node-exporter | Running |
| alertmanager | Running |
Phase 4: Exposing Grafana & Prometheus via LoadBalancer
By default, the Grafana and Prometheus services are internal-only (ClusterIP). To access their UIs from the browser, I patched both services to be of type LoadBalancer.
Step 1: Patch Services
🔄 Expose Grafana:
kubectl patch svc kube-prom-stack-grafana -n monitoring \
-p '{"spec": {"type": "LoadBalancer"}}'
🔄 Expose Prometheus:
kubectl patch svc kube-prom-stack-kube-prome-prometheus -n monitoring \
-p '{"spec": {"type": "LoadBalancer"}}'
Step 2: Get External IPs
After patching, I waited a minute and ran:
kubectl get svc -n monitoring
This returned external DNS endpoints for both Grafana and Prometheus:
| Service | External IP / URL |
| --- | --- |
| kube-prom-stack-grafana | a865c57d922fe4de4b08cef4e3c0aeee-2006643144.us-east-1.elb.amazonaws.com (Port 80) |
| kube-prom-stack-kube-prome-prometheus | a8a9d4669a7144b5caeddf178c5e8eab-21081589.us-east-1.elb.amazonaws.com (Port 9090) |
📌 You can now access:
Grafana: http://<grafana-external-ip>
Prometheus: http://<prometheus-external-ip>:9090
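If you prefer not to expose these UIs through public load balancers, port-forwarding from the admin instance is a lighter-weight option (ports taken from the service table above):
kubectl port-forward svc/kube-prom-stack-grafana -n monitoring 3000:80
kubectl port-forward svc/kube-prom-stack-kube-prome-prometheus -n monitoring 9090:9090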
Step 3: Retrieve Grafana Admin Password
To log in to the Grafana dashboard, I fetched the auto-generated admin password:
kubectl get secret kube-prom-stack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
echo
Credentials:
Username: admin
Password: (output from the above command)
Step 4: Import a Grafana Dashboard
👉 Steps to Import:
Go to the Grafana UI.
In the left sidebar, click “+” ➝ Import.
Enter the dashboard ID (e.g., 1860 for Node Exporter Full) and click Load.
Select Prometheus as the data source and click Import.
Phase 6: External Endpoint Monitoring with Blackbox Exporter
In real-world production systems, it’s essential to monitor not just internal metrics but also the availability of external endpoints. For this, I used Blackbox Exporter with Prometheus to actively probe the HTTP status of my deployed app.
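The probe below assumes a Blackbox Exporter is already running in the monitoring namespace. If it isn't, one way to install it is the community Helm chart (a sketch; the fullnameOverride is my choice so the service name lines up with the prober URL used in the Probe):
helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  -n monitoring \
  --set fullnameOverride=blackbox-exporter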
🚀 Step 1: Define a Probe Resource
I created a custom Probe object using the following YAML to monitor my external application running on port 3000:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: campground-probe
  namespace: monitoring
  labels:
    probe: "true"   # ✅ Required for Prometheus to scrape this probe
spec:
  jobName: "blackbox-campground"
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc.cluster.local:9115
  targets:
    staticConfig:
      static:
        - http://a6df26611d1c84f4d9431caf2ebe7e1f-1142985076.us-east-1.elb.amazonaws.com:3000/
This config tells Prometheus to:
Check the URL every 30 seconds
Use the http_2xx module to verify the HTTP status
Use the Blackbox Exporter inside the monitoring namespace
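Assuming the manifest above is saved locally (I'll call it campground-probe.yaml; the filename is arbitrary), it is applied like any other resource:
kubectl apply -f campground-probe.yaml
Note that, depending on how the kube-prometheus-stack was configured, Prometheus may only pick up Probe objects whose labels match its probe selector.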
✅ Result: Probe is UP! 🟢
Prometheus successfully picked up the probe, and I could see it turn green (UP) in the Prometheus Targets UI. This means:
✅ Prometheus is scraping the probe endpoint
✅ Blackbox Exporter is reachable
✅ The external app is up and responding correctly
Visualize Probe Status in Grafana
You can visualize Blackbox status directly in Grafana by creating a panel with the following PromQL:
probe_success{job="blackbox-campground"}
This will show a 1 (UP) or 0 (DOWN) based on the latest probe result.
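Building on the same metric, here is a hedged sketch of an alert rule delivered as a PrometheusRule (the release label and threshold are assumptions to adapt to your setup):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: campground-availability
  namespace: monitoring
  labels:
    release: kube-prom-stack   # assumed: must match the stack's rule selector
spec:
  groups:
    - name: blackbox.rules
      rules:
        - alert: CampgroundEndpointDown
          expr: probe_success{job="blackbox-campground"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "External campground endpoint has been down for 2 minutes"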
Phase 7: Deploying ELK Stack for Kubernetes Log Monitoring
While Prometheus and Grafana give us great metrics visibility, we also need to monitor application and system logs. That’s where the ELK stack—Elasticsearch, Logstash (optional), and Kibana—comes in. I deployed the ELK stack using Helm charts for simplicity.
Step 1: Add Elastic Helm Repository
helm repo add elastic https://helm.elastic.co
helm repo update
Step 2: Create Namespace for ELK
kubectl create namespace elk
Step 3: Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
-n elk \
--set volumeClaimTemplate.storageClassName=gp2 \
--set replicas=1 \
--set minimumMasterNodes=1 \
--set resources.requests.memory=512Mi \
--set resources.requests.cpu=100m \
--set resources.limits.memory=1Gi \
--set resources.limits.cpu=500m
⚠️ I initially faced an issue where the Elasticsearch pod was stuck in Pending due to an EBS volume provisioning failure. The fix (sketched below) was to:
Install the AWS EBS CSI driver
Attach the AmazonEBSCSIDriverPolicy to my worker node IAM role
Then reattempt the deployment
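One common way to apply that fix with eksctl is sketched below (not necessarily the exact commands I ran; substitute your AWS account ID, and the role name is a choice):
# Create an IAM role for the EBS CSI controller service account (IRSA)
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster devops-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole
# Install the EBS CSI driver as an EKS managed add-on, bound to that role
eksctl create addon --name aws-ebs-csi-driver --cluster devops-cluster \
  --service-account-role-arn arn:aws:iam::<ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole --force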
You can check pod status using:
kubectl get pods -n elk -l app=elasticsearch-master
Step 4: Install Kibana
helm install kibana elastic/kibana -n elk \
--set service.type=LoadBalancer
Then get the external IP:
kubectl get svc -n elk | grep kibana
Access Kibana at:
http://<EXTERNAL-IP>:5601
Step 5: Install Filebeat (Log Forwarder)
helm install filebeat elastic/filebeat -n elk \
--set daemonset.enabled=true \
--set elasticsearch.hosts="{http://elasticsearch-master.elk.svc.cluster.local:9200}"
Check Filebeat pods:
kubectl get pods -n elk -l app=filebeat
Step 6: Access Logs in Kibana
Once Kibana is up:
Open http://<EXTERNAL-IP>:5601
Go to “Discover”
You should start seeing Kubernetes logs (collected by Filebeat and indexed by Elasticsearch)
What Each Component Does:
| Component | Role |
| --- | --- |
| Elasticsearch | Stores the logs indexed from the cluster |
| Filebeat | Collects logs from all Kubernetes nodes |
| Kibana | Provides a dashboard for visualizing and searching logs |
✅ Elasticsearch is Up and Running!
After deploying the Elasticsearch Helm chart and patching the service to LoadBalancer, I accessed it via:
http://<elasticsearch-external-ip>:9200
As seen in the screenshot:
{
  "name": "elasticsearch-master-0",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "8.5.1",
    ...
  },
  "tagline": "You Know, for Search"
}
This confirms that:
Elasticsearch is accessible
Cluster health is good
Ready to receive logs from Filebeat
What This Confirms:
✅ Your Elasticsearch Pod is running
✅ The LoadBalancer service is working externally
✅ Elasticsearch is responding correctly on port 9200
✅ You have secure HTTPS access to the API
Elasticsearch Setup and External Access (via Load Balancer)
✅ What I Achieved:
Deployed Elasticsearch on Amazon EKS using Helm
Exposed Elasticsearch service using Load Balancer (not just port-forward)
Verified secure external access using the default elastic user credentials
Steps I Followed:
1. Check Pod Status
kubectl get pods -n elk
✅ elasticsearch-master-0 was in the Running state with 1/1 containers ready.
2. Get Elasticsearch Password
kubectl get secrets --namespace=elk elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
🔐 Password: c6WWaP7tt26OGiY8
3. Expose Elasticsearch via LoadBalancer
kubectl patch svc elasticsearch-master -n elk \
-p '{"spec": {"type": "LoadBalancer"}}'
4. Get External Access URL
kubectl get svc -n elk
🔗 URL:
https://ab42505e742ba4deab140d74087ed823-28402472.us-east-1.elb.amazonaws.com:9200
5. Verify from Browser / cURL
curl -u elastic:c6WWaP7tt26OGiY8 -k https://ab42505e742ba4deab140d74087ed823-28402472.us-east-1.elb.amazonaws.com:9200
📘 Notes:
I used self-signed certs, hence the -k flag for cURL
The Elastic Helm chart defaults to TLS enabled, which is ideal for production
Step-by-Step: Deploying Kibana & Verifying ELK Stack Access on Kubernetes (with Load Balancer)
After successfully exposing Elasticsearch, the next phase was to install Kibana and make it externally accessible via a LoadBalancer. Here's how I did it:
✅ Step 1: Install Kibana via Helm with LoadBalancer Enabled
helm install kibana elastic/kibana -n elk \
--set service.type=LoadBalancer \
--set elasticsearchHosts=https://elasticsearch-master:9200 \
--set resources.requests.memory=512Mi \
--set resources.requests.cpu=100m \
--set resources.limits.memory=1Gi \
--set resources.limits.cpu=500m
📝 Notes:
elasticsearchHosts points to your Elasticsearch service (inside the cluster).
Resource limits help manage memory/CPU in Kubernetes.
Step 2: Wait for External IP
After a few minutes, I fetched the external IP:
kubectl get svc -n elk
✔️ I saw the EXTERNAL-IP assigned to kibana-kibana, like:
kibana-kibana LoadBalancer 10.100.x.x abcd1234.elb.amazonaws.com 5601:xxxx/TCP
Step 3: Access the Kibana UI
Open your browser:
http://<EXTERNAL-IP>:5601
🧭 You’ll land on the Kibana dashboard, ready to visualize logs and metrics.
(Optional) Configure Public Base URL for Kibana
To avoid redirect issues, especially when accessing Kibana behind a LoadBalancer:
--set kibanaConfig.kibana.yml.server.publicBaseUrl=http://<EXTERNAL-IP>:5601
💡 What’s Next?
Now that both Elasticsearch and Kibana are publicly accessible, you can:
Explore Elasticsearch data in Kibana
Create index patterns and dashboards
Add Filebeat or Logstash to ingest logs
Combine with Grafana dashboards or Prometheus alerts
Filebeat Setup for Real-Time Log Shipping
To send Kubernetes pod logs to Elasticsearch, I used Filebeat + Autodiscover.
✅ Add Elastic Helm repo :
helm repo add elastic https://helm.elastic.co
helm repo update
✅ Created filebeat-values.yaml for Autodiscover
filebeatConfig:
  filebeat.yml: |
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*.log
    output.elasticsearch:
      hosts: ["https://elasticsearch-master:9200"]
      username: "elastic"
      password: "c6WWaP7tt26OGiY8"
      ssl.verification_mode: "none"
Saved as filebeat-values.yaml
Install Filebeat
helm install filebeat elastic/filebeat -n elk \
-f filebeat-values.yaml
Checked status:
kubectl get pods -n elk
✅ Filebeat pod was running!
Verifying Logs in Kibana
Once Filebeat was running:
Opened Kibana
Create an Index Pattern:
Go to "Stack Management" → "Index Patterns"
Click "Create index pattern"
Enter: filebeat-*
Select @timestamp as the time filter field
Click Create index pattern
Go to Discover Tab:
Navigate to “Discover”
Select your new index pattern (filebeat-*)
You should see logs coming in from your containers!
🎉 I could see real-time logs from Kubernetes pods being shipped to Elasticsearch via Filebeat!
✅ Final Setup Overview
| Component | Status |
| --- | --- |
| Elasticsearch | ✅ Running (LB) |
| Kibana | ✅ Running (LB) |
| Filebeat | ✅ Installed |
| Logs in Kibana | ✅ Verified |
Live Kubernetes Logs in Kibana Discover
After successfully installing Filebeat and connecting it to Elasticsearch, I moved to Kibana's Discover tab to view live logs.
📸 Here’s a snapshot of my Kibana dashboard:
As you can see:
I queried the filebeat-* index
Kibana showed live pod logs from my Kubernetes cluster
The logs contain metadata like:
agent.hostname: which Filebeat pod shipped the logs
kubernetes.namespace: elk
container.name: which container the log came from
timestamp, node, zone, topology, and more
🎉 Success: My entire EKS cluster logs are now searchable and filterable in Kibana.
What’s Happening Behind the Scenes:
Filebeat is running as a DaemonSet in Kubernetes
It autodetects containers using Kubernetes hints
Logs from /var/log/containers/*.log are shipped securely to Elasticsearch
Elasticsearch indexes them, and Kibana makes them available for querying
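A quick way to double-check the pipeline from the Elasticsearch side is to list the Filebeat indices (same elastic credentials and -k flag as earlier; replace the placeholders with your endpoint and password):
curl -u elastic:<password> -k "https://<elasticsearch-endpoint>:9200/_cat/indices/filebeat-*?v"
A growing docs.count in the output means Filebeat is writing successfully.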
Create Your First Visualization
After setting up Filebeat and confirming logs were reaching Elasticsearch, I used Kibana Lens to quickly visualize log volume.
What I Did:
Selected the filebeat-* index
Used @timestamp on the X-axis
Set the Y-axis to show the count of records
Time range: Last 5 minutes
Chart type: Vertical bar
What It Shows:
This chart displays how many logs are coming in every few seconds. It helps confirm:
Filebeat is sending logs continuously
There are no big gaps or sudden spikes
Simple and effective way to monitor log flow in real time. ✅
🍩 Pod-wise Log Distribution (Donut Chart)
To understand which pods are generating the most logs, I created a donut chart in Kibana.
What I Did:
Index pattern: filebeat-*
Slice by: Top 5 values of kubernetes.pod.name
Size by: Count of records
Time range: Last 5 minutes
What It Shows:
This chart shows the log volume share from each pod.
For example:
filebeat-filebeat-77fs9 generated 25% of logs
elasticsearch-master-0 and grafana each contributed ~21%
Helps spot noisy pods or identify issues quickly
A great way to visually monitor log load across components!
Log Distribution by Pod – Latest View
To track how logs are distributed among the top 5 Kubernetes pods, I generated this donut chart.
Setup:
Index pattern: filebeat-*
Slice by: Top 5 values of kubernetes.pod.name
Size by: Count of records
Insights:
filebeat-filebeat-77fs9 contributed the most logs (~37%)
elasticsearch-master-0 and filebeat-filebeat-wg9ws each logged ~31%
This helps quickly identify the busiest pods in terms of logging
👉 Great for spotting potential log flooding or heavy activity.
Pod-Wise Log Distribution – Bar Chart View
To visualize log volume by pod, I created a vertical bar chart using Filebeat data in Kibana.
Configuration:
Index pattern: filebeat-*
X-axis: Top 5 values of kubernetes.pod.name
Y-axis: Count of log records
Quick Takeaway:
filebeat-filebeat-77fs9 generated the highest number of logs.
Other pods like filebeat-filebeat-wg9ws and elasticsearch-master-0 also show steady activity.
This helps in quickly identifying which pods are generating the most logs in near real time.
Namespace-Wise Log Activity Over Time
This bar chart shows how log events are distributed across Kubernetes namespaces (elk, kube-system, and monitoring) over time.
Configuration:
Index pattern: filebeat-*
X-axis: @timestamp (interval: 30 seconds)
Y-axis: Unique count of kubernetes.namespace_labels.kubernetes_io/metadata_name
Breakdown: Top 3 values of kubernetes.namespace
Insights:
Most activity came from the elk namespace, which includes Elasticsearch and Kibana.
kube-system and monitoring show consistent but lower activity.
This helps verify that logs are being collected from all critical namespaces.
Log Count per Namespace Over Time
This visualization shows log traffic trends in the elk and monitoring namespaces over the past 15 minutes.
Configuration:
Index pattern: filebeat-*
X-axis: @timestamp (interval: 30 seconds)
Y-axis: Count of records
Breakdown: Top 3 values of kubernetes.namespace
Key Observations:
The elk namespace is consistently generating logs, which makes sense as it runs Elasticsearch and Kibana.
The monitoring namespace shows periodic spikes, indicating bursts of log activity, possibly from Prometheus or Grafana.
This breakdown helps validate that Filebeat is capturing logs across key namespaces as expected.
Error and Failure Logs Tracked Over Time
This chart filters and visualizes logs that include error or failure indicators such as:
"error" OR log.level = "error" OR "ERR" OR "failed"
Configuration:
Index pattern: filebeat-*
X-axis: @timestamp (30-minute intervals)
Y-axis: Count of records
Breakdown: Top 3 values of kubernetes.container.name
Observation:
A spike in error-related logs was observed from the Grafana container during the recent time window.
This helps proactively identify which services are experiencing issues and when.
✅ This kind of filtering and visualization is essential for real-time troubleshooting and alerting.
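For ad-hoc digging in Discover, roughly the same filter can be typed as a KQL query (field names follow the usual Filebeat/ECS schema; adjust to whatever your documents actually contain):
log.level: "error" or message: *error* or message: *failed*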
Final Cleanup (No Billing Left)
Deleted Cluster
eksctl delete cluster --name devops-cluster --region us-east-1
Manually Cleaned (spot checks sketched after this list):
Helm releases
EBS volumes (via EC2 dashboard)
Load balancers & security groups
IAM roles
CloudWatch log groups
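To double-check nothing billable was left behind, a few AWS CLI spot checks help (region assumed to be us-east-1):
# Any load balancers still around?
aws elbv2 describe-load-balancers --region us-east-1 --query 'LoadBalancers[].LoadBalancerName'
aws elb describe-load-balancers --region us-east-1 --query 'LoadBalancerDescriptions[].LoadBalancerName'
# Any leftover, unattached EBS volumes?
aws ec2 describe-volumes --region us-east-1 --filters Name=status,Values=available --query 'Volumes[].VolumeId'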
What I Did: E2E Kubernetes Observability Stack in a Day
✅ Why Monitoring & Logging Matter
In DevOps, real-time observability is critical for:
Detecting issues before users do
Troubleshooting failures quickly
Analyzing system health & resource usage
🗓️ My Goal
Build a complete monitoring + logging setup on Kubernetes using open-source tools and clean it up to avoid billing.
Tools Used
Amazon EKS – Kubernetes cluster on AWS
Helm – Easy deployment of complex apps
Prometheus + Grafana – Metrics monitoring & dashboards
Elasticsearch + Kibana + Filebeat (ELK) – Centralized log aggregation & analysis
Final Thoughts
Setting up an end-to-end observability stack on Kubernetes might seem overwhelming at first — but with the right tools and a structured approach, it becomes manageable and rewarding.
This hands-on exercise helped me:
Understand how logs and metrics flow in real-world clusters
Troubleshoot Helm installation issues
Visualize logs using Kibana and monitor metrics via Grafana
Clean up infrastructure to avoid unnecessary AWS billing
Whether you're learning DevOps or managing production-grade clusters, observability is a skill worth mastering.
🔗 Reference
I referred to this excellent guide during my setup process:
Comprehensive AWS EKS Cluster Monitoring with Prometheus, Grafana, and EFK Stack – 10 Weeks of CloudOps
A big thanks to the author for such a well-structured walkthrough!
Thanks for reading!
If you found this helpful, feel free to connect with me or drop your thoughts in the comments.