2025 AWS Community Day Central Asia Almaty: EKS Cost Optimization Workshop

Maxat Akbanov

This blog post reviews the EKS workshop held at the AWS Community Day Central Asia conference in Almaty on 22–23 August 2025.

The workshop, titled “EKS. Optimization and FinOps. How to optimize resource usage and cut the costs in EKS,” was delivered by Alexander Dovnar.

The agenda of the talk covered the following topics:

  • Issues related to the overall cost of running Kubernetes clusters

  • The most common misconfigurations that lead to high costs

  • Autoscaling approaches

  • Monitoring and cost optimization best practices

Problem #1 – No Resource Requests or Limits

  • If you run a container without defining CPU and memory requests/limits, Kubernetes doesn’t know how many resources it needs.

  • Such pods are given the QoS class “BestEffort” – meaning they have no guaranteed resources.

  • Kubernetes can place them anywhere in the cluster, but under heavy load these pods are the first to be evicted (removed).

  • They may also cause trouble for neighboring pods, leading to CPU throttling or memory starvation.

  • Example:

    • You deploy a small “pause” app with no resources set.

    • A backup job also runs with no limits —> it can grab all available RAM, causing your important service to crash.

Takeaway:
👉 Always define CPU and memory requests/limits for your containers.
Otherwise, you risk instability: unpredictable performance, throttling, or even service outages.

X Bad Example – No Resource Limits

This is what happens when you don’t specify any resources:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause-app
  template:
    metadata:
      labels:
        app: pause-app
    spec:
      containers:
        - name: pause-container
          image: registry.k8s.io/pause:3.3
          resources: {}   # ⚠️ Nothing defined here
  • QoS Class: BestEffort

  • Problem: No guaranteed CPU or memory.

  • Risk: Can be evicted during load, or neighbor pods can hog all resources.


✅ Good Example – With Resource Requests and Limits

Here’s the recommended way:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause-app
  template:
    metadata:
      labels:
        app: pause-app
    spec:
      containers:
        - name: pause-container
          image: registry.k8s.io/pause:3.3
          resources:
            requests:
              cpu: "100m"   # Minimum guaranteed
              memory: "128Mi"
            limits:
              cpu: "200m"   # Hard cap
              memory: "256Mi"
  • QoS Class: Burstable here (requests < limits); it becomes Guaranteed only when requests equal limits

  • Benefit: Pod has stable performance, won’t steal everything from neighbors, and is less likely to be evicted.

What is the QoS class “BestEffort”?

In Kubernetes, "BestEffort" is a Quality of Service (QoS) class assigned to Pods. This class is characterized by the absence of any specified CPU or memory requests or limits for any of the Containers within the Pod.

In situations of resource pressure on a node (e.g., memory exhaustion), BestEffort Pods are the first candidates for eviction by the kubelet to free up resources for higher-priority Pods.

This class is suitable for applications that are non-critical and can tolerate interruptions or resource contention, such as development environments, batch jobs, or applications with high fault tolerance.
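
You can check which QoS class Kubernetes assigned to a pod straight from its status:

# Prints BestEffort, Burstable, or Guaranteed
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'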

Problem #2 – Limits Without Requests

At first, you might think: “Well, I at least set limits so the container won’t hog everything. That should be fine, right?”

Not exactly. Here’s what happens:

  • If you only set limits, Kubernetes automatically sets requests equal to limits.

  • This means Kubernetes will reserve the full limit amount for the pod, even if it doesn’t actually need it most of the time.

  • Result: wasted cluster capacity – fewer pods can be scheduled, and your nodes look “full” much sooner.

Example:

  • You set a limit of CPU: 1, Memory: 1Gi but no requests.

  • Kubernetes reserves 1 CPU and 1Gi memory for that pod as if it’s guaranteed usage.

  • Even if your container actually uses only 100m CPU and 128Mi memory, Kubernetes still blocks other pods from scheduling.

This leads to:

  • Lower cluster utilization (you pay for unused resources).

  • More nodes needed (higher cost).


X Example – Limits but No Requests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause-app
  template:
    metadata:
      labels:
        app: pause-app
    spec:
      containers:
        - name: pause-container
          image: registry.k8s.io/pause:3.3
          resources:
            limits:
              cpu: "10m"
              memory: "10Mi"
            # ⚠️ No requests defined

👉 Kubernetes will treat it as if requests = limits.
So in this example:

  • requests.cpu = 10m

  • requests.memory = 10Mi

Even though you didn’t explicitly write it.
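
You can verify this on the running pod itself: the requests show up in the Pod spec even though the manifest only declared limits.

# Effective resources of the pod from the example above
# (requests are filled in equal to the limits: 10m CPU / 10Mi memory)
kubectl get pod -l app=pause-app \
  -o jsonpath='{.items[0].spec.containers[0].resources}'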


✅ Best Practice – Define Both Requests and Limits

resources:
  requests:
    cpu: "10m"
    memory: "10Mi"
  limits:
    cpu: "20m"
    memory: "32Mi"
  • Here, the container guarantees the minimum it needs (10m CPU, 10Mi memory).

  • But it’s allowed to burst up to a higher cap (20m CPU, 32Mi memory).

  • This gives flexibility and ensures fair scheduling.


💡 Takeaway

  • Only limits set —> requests = limits (automatic).

  • This makes your cluster look busier than it is, leading to resource waste and higher costs.

  • Always set both requests and limits, with requests reflecting the baseline need and limits allowing some safe headroom.

Problem #3 – No Limits (The “Greedy Neighbor” Problem)

If you run a container without limits, Kubernetes will let it consume as much CPU or memory as it wants on that node.

Here’s why that’s dangerous:

  • No safety net: without a defined maximum, nothing stops the container from consuming all available resources on the node.

  • Impact on neighbors: Even if other pods have proper requests and limits, they can start suffering because the “unlimited” pod hogs everything.

    • CPU hogging —> throttling (your app slows down).

    • Memory hogging —> evictions (pods get kicked out).

    • Extreme case —> OOMKilled (out-of-memory kills your pods).

  • Real-world effect: A monitoring agent or job suddenly spikes in CPU usage —> all your production workloads experience lag or even crash.


X Example – Pod With No Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: greedy-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: greedy-app
  template:
    metadata:
      labels:
        app: greedy-app
    spec:
      containers:
        - name: greedy-container
          image: registry.k8s.io/pause:3.3
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            # ⚠️ No limits defined

👉 Even though the container requests a modest 100m CPU / 128Mi memory, it can spike to use everything available on the node.


✅ Best Practice – Always Define Limits

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "200m"
    memory: "256Mi"
  • Now the container has a guaranteed baseline (100m / 128Mi).

  • But it’s capped from going overboard (200m / 256Mi).

  • This keeps it from harming other workloads on the same node.


💡 Takeaway

  • No limits = danger. A single pod can monopolize the node.

  • Always define limits, leaving some safe headroom above requests.

  • This protects your neighbors, keeps the cluster stable, and prevents “noisy neighbor” issues.

👉 Think of limits as a circuit breaker for your pods: they keep one runaway container from taking the whole house down.

Problem #4 – Copy-Paste Resources

By now, we know that setting requests and limits is important.
But here’s another trap: many engineers just copy resource configs from another service, a Helm chart, or an online example, without adapting them to their app’s real needs.

Why this is a problem:

  • Not every app is the same. A monitoring agent, a web API, and a database all have very different CPU/memory usage.

  • Overestimating: If you set requests too high (just because you copied from somewhere), Kubernetes may reserve more than necessary → wasting cluster capacity and increasing costs.

  • Underestimating: If you copy very small values, your service may crash under real traffic.

  • CI/CD risk: Copy-pasted values spread quickly through Helm charts and manifests → bad defaults become cluster-wide.

Example:

  • You see a blog post where requests.cpu = 500m and limits.memory = 1Gi.

  • You paste it into your app.

  • But your service actually only needs 50m CPU and 128Mi memory → you end up paying for idle resources.


X Copy-Paste Example

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "2Gi"

👉 These numbers might work for someone else’s app, but they could be way off for yours.


✅ Best Practice – Measure, Don’t Copy

  1. Start small: give your pod a conservative baseline (e.g., 50m CPU, 128Mi memory).

  2. Monitor usage: use tools like kubectl top pod, Prometheus, or metrics in your cloud provider (see the example after this list).

  3. Adjust gradually: increase requests/limits if your app regularly hits 80–90% usage.

  4. Automate if possible: consider Kubernetes Vertical Pod Autoscaler (VPA) or cost-optimization tools to recommend right-sized values.

  5. Educate your team: Don’t copy and paste, measure and set instead.
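
As a minimal example of step 2 above, compare what a pod actually consumes with what it requested (pod and namespace names are placeholders):

# Actual usage right now (requires metrics-server)
kubectl top pod <pod-name> -n <namespace>

# What the pod asked for
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].resources.requests}'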


💡 Takeaway

👉 Don’t blindly copy resource configs.
Every application is unique, so resource requests/limits should be tuned to real workload behavior, not borrowed defaults.

Problem #5 – No Cluster-Level Limits

So far we’ve looked at mistakes inside a single pod or deployment.
But what if the entire cluster has no rules about how much CPU or memory different teams or projects can consume?

Here’s what happens:

  • Any developer can deploy a pod with no resources or with huge limits.

  • Kubernetes will happily try to run it, even if it means starving other workloads.

  • One “runaway” service can affect the whole cluster, making it unstable and expensive.

This is especially risky in shared environments where multiple teams or projects use the same cluster.


X Example

  • Team A deploys a pod with limits.cpu = 50 (50 CPUs !!!).

  • Kubernetes doesn’t block it, because there’s no higher-level quota.

  • Suddenly, other workloads get throttled, evicted, or can’t even start.


✅ Best Practices – How to Fix It

  1. Set Defaults with LimitRange

    • Automatically applies default requests and limits to any pod in a namespace that doesn’t specify them.

    • Prevents “BestEffort” pods from sneaking in.

  2. Enforce Quotas with ResourceQuota

    • Controls the total CPU, memory, and pod count per namespace/project.

    • Example: a namespace can’t consume more than 20 CPUs and 64Gi memory (see the manifests after this list).

  3. Use Monitoring + Alerts

    • Watch for quota violations or unusual resource usage.

    • Helps you plan for growth instead of reacting to outages.

  4. Organize by Namespace

    • Assign each team or project its own namespace.

    • Makes it easier to isolate workloads and apply quotas fairly.

  5. Cluster Autoscaler Limits

    • If you use autoscaling (e.g., Karpenter or the Kubernetes Cluster Autoscaler), set maximum node limits.

    • Prevents uncontrolled scaling and runaway costs.
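
As a sketch of practices 1 and 2 above, here is a LimitRange plus a ResourceQuota for a hypothetical team-a namespace (the quota mirrors the 20 CPU / 64Gi example; all values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:               # applied when a container omits limits
        cpu: "200m"
        memory: "256Mi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"       # namespace-wide caps
    requests.memory: 64Gi
    limits.cpu: "20"
    limits.memory: 64Gi
    pods: "100"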


💡 Takeaway

👉 Setting pod-level resources isn’t enough.
You also need cluster-level policies so that no single team, project, or pod can destabilize the entire system.

HPA vs KEDA

In this section, the speaker compares HPA (Horizontal Pod Autoscaler) with KEDA (Kubernetes Event-Driven Autoscaler).

Both HPA and KEDA are tools that automatically scale pods up or down in Kubernetes.
But they differ in how they decide to scale and what use cases they’re best for.


🔹 HPA (Horizontal Pod Autoscaler)


  • Scaling triggers: Uses internal metrics like CPU or memory usage.

  • Data sources: Can use custom metrics via the Kubernetes API, but setup can be tricky.

  • Queues/events support: ❌ Not supported.

  • Built-in: Comes natively with Kubernetes. No extra installation.

  • Scaling down to zero pods: ❌ Not possible (minimum is 1).

  • Prometheus integration: Possible, but requires custom metric adapters.

  • Best for: Constant, long-running services (e.g., APIs, web apps).
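
For comparison, a minimal autoscaling/v2 HPA scaling a hypothetical web-api Deployment on CPU utilization looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 1           # HPA cannot go below 1 (no scale-to-zero)
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # target 70% of the CPU requests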


🔹 KEDA (Kubernetes Event-Driven Autoscaler)


  • Scaling triggers: Uses external events (e.g., messages in a queue, Kafka topics, database load, etc.) in addition to standard HPA metrics.

  • Data sources: ✅ Supports 50+ event sources out of the box (e.g., RabbitMQ, SQS, Kafka, Prometheus).

  • Queues/events support: ✅ Yes. Perfect for event-driven workloads.

  • Installation: Requires installing the KEDA operator.

  • Scaling down to zero pods: ✅ Supported (via ScaledObject configs; see the sketch after this list).

  • Prometheus integration: Much easier — can directly use Prometheus queries.

  • Best for: Event-driven systems, job queues, and serverless-style patterns (scale up on demand, scale to zero when idle).
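
As mentioned above, KEDA is configured through ScaledObject resources. A sketch for a hypothetical worker Deployment scaled on SQS queue depth could look like this (the queue URL is a placeholder, and real setups also need a TriggerAuthentication or IRSA for AWS access):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker            # Deployment to scale
  minReplicaCount: 0        # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-north-1.amazonaws.com/123456789012/jobs   # placeholder
        queueLength: "5"    # target messages per replica
        awsRegion: eu-north-1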


✅ Takeaway

  • HPA = great for steady, always-on workloads (web servers, APIs).

  • KEDA = great for event-driven workloads (queues, jobs, batch tasks, serverless patterns).

  • Together: They complement each other → giving Kubernetes a flexible and powerful autoscaling system.

Vertical Pod Autoscaler (VPA)

Unlike HPA (which changes the number of pods), the VPA adjusts the resources (CPU/Memory) of existing pods.

Image source: TechOps Examples

🔹 How VPA Works

VPA has three components:

  1. Recommender

    • Collects usage metrics (CPU, memory) from Metrics Server or Prometheus.

    • Suggests new request/limit values.

  2. Updater

    • Decides if a pod’s resources are too far off from the recommended values.

    • If so, it can evict (restart) the pod with new resources.

  3. Admission Controller

    • Applies the recommended requests/limits when new pods are created.

🔹 VPA Workflow

  1. VPA watches pod usage metrics.

  2. After ~2–5 minutes, it starts producing recommendations.

  3. If Auto mode is enabled, VPA may evict pods to apply new resources (this = restarts ⚠️).

  4. Newly created pods start with the updated requests/limits automatically.


🔹 When to Use VPA

Safe / Recommended:

  • Batch jobs & CronJobs —> pods restart regularly anyway.

  • Staging/test environments —> good for finding a baseline.

  • New services —> when you don’t have historical usage data.

  • Non-critical services —> where occasional pod restarts are tolerable.

X Use with Caution (or avoid):

  • Stateful apps (databases, queues, Kafka, etc.) —> pod restarts = data loss or downtime.

  • Critical services —> unexpected evictions may hurt reliability.

  • With HPA on CPU at the same time —> can conflict (VPA adjusts requests, HPA scales based on request %).

  • Web apps with low tolerance for downtime


🔹 Best Practices

  • Start with updateMode: "Off" —> collect recommendations via:

      kubectl describe vpa <name>

    (check what VPA would recommend without applying it).

  • Don’t enable Auto mode unless you have PodDisruptionBudgets and readinessProbes in place.

  • Always set minAllowed / maxAllowed resources —> prevents outlier spikes.

  • Use VPA to establish baseline requests/limits instead of manual guessing.

👉 Think of VPA as a scalpel: powerful, but dangerous if used carelessly.


🔹 Example VPA Manifest

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: pause-deployment
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pause-deployment
  updatePolicy:
    updateMode: "Off"   # Start with Off: only collect recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: pause-container
      minAllowed:
        cpu: "25m"
        memory: "256Mi"
      maxAllowed:
        cpu: "500m"
        memory: "1Gi"

💡 Takeaway

  • VPA = adjusts pod size, HPA = adjusts pod count.

  • Best for batch jobs, staging, and finding baselines.

  • Risky for stateful or critical services (because of restarts).

  • Use VPA in Off mode first, review metrics, then carefully enable.

Cluster Autoscaler (CAS) vs Karpenter

Both CAS and Karpenter are tools that manage nodes in a Kubernetes cluster. They decide when to add/remove EC2 instances (in AWS) depending on pod scheduling needs.


🔹 Cluster Autoscaler (CAS)


  • How it works:

    • Watches unschedulable pods.

    • Adds nodes from pre-defined node groups (Auto Scaling Groups).

    • Scales down when nodes are underutilized.

  • Limitations:

    • Bound to fixed node group definitions (size, instance type).

    • Scaling decisions are slower (minutes).

    • Doesn’t optimize across multiple instance types easily.

👉 CAS = stable, battle-tested, but rigid.


🔹 Karpenter


  • How it works:

    • Looks at unschedulable pods.

    • Dynamically launches the best-fitting EC2 instances (size, family, AZ) in seconds.

    • Can consolidate workloads —> remove underutilized nodes automatically.

  • Benefits:

    • Much faster scaling (seconds, not minutes).

    • Flexibility: no need to predefine rigid node groups.

    • Cost optimization: picks cheapest matching instance types (spot + on-demand mix).

    • Supports workload-specific requirements (GPUs, large-memory nodes).

👉 Karpenter = modern, flexible, cloud-native scaling (a NodePool sketch follows below).
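
For illustration, a Karpenter NodePool allowing a Spot/On-Demand mix with consolidation could look roughly like this (field names follow the Karpenter v1 API; the referenced EC2NodeClass is assumed to be defined separately — check the docs for your version):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # assumed to exist
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer Spot, fall back to On-Demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
  limits:
    cpu: "100"                 # cap total provisioned CPU for this pool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized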


✅ Speaker Prefers Karpenter

  • In most projects today, especially on AWS, Karpenter offers:

    • Lower cost (via right-sized, spot-friendly nodes).

    • Better performance (faster scaling decisions).

    • Less manual work (no need to manage multiple node groups).

  • CAS is still fine for legacy setups or multi-cloud clusters, but Karpenter is becoming the default choice for AWS EKS.


💡 Takeaway

👉 CAS = classic tool, good for simple, stable setups.
👉 Karpenter = modern tool, better for cost savings, flexibility, and fast scaling.

Best Practices Roadmap for Kubernetes Resource Management

Tomorrow – Quick Wins

Goal: Reduce obvious waste and enforce basic guardrails.

  • Measure actual usage

    • kubectl top pods

    • Tools like Kubecost / OpenCost / Grafana (CPU vs requests).

  • Set defaults automatically

    • Enable LimitRange in each namespace —> auto-assign requests/limits to pods.

  • Ban pods without resource requests/limits (e.g., via an admission policy; see the sketch below)

  • Add basic monitoring

    • Prometheus + kube-state-metrics + Grafana dashboards.

    • Start reporting by namespace and labels.

👉 Result: No more “BestEffort” pods, basic visibility into who uses what.
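
The talk doesn’t prescribe a specific tool for banning pods without resources; one common approach is an admission policy. A sketch with Kyverno (assuming Kyverno is installed in the cluster) could look like this:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce   # reject non-compliant pods
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"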


Next Month – Mid-Term Improvements

Goal: Tune workloads based on real data and educate the team.

  • Test workloads under load —> find realistic baselines.

  • Introduce VPA in Off mode to collect recommendations (no auto-evictions yet).

  • Standardize templates/Helm charts → enforce consistent resource configs.

  • Set up regular reviews of resource usage stats (e.g., every quarter).

  • Add GitOps validation (PR checks for resource configs).

  • Enable ResourceQuotas —> set per-namespace CPU/memory/pod caps.

  • Educate the team with internal docs:

    “How to read metrics and assign resources.”

👉 Result: Resources tuned to real workloads, not guesses. Team starts building a resource-first mindset.


Later – Mature Practices

Goal: Optimize cost and efficiency cluster-wide.

  • Financial planning (chargeback / showback):

    • Transparency —> who uses how many “slices” of the cluster.

  • Policy as Code

  • Track Key Performance Indicators (KPIs):

    • Efficiency score: actual usage vs requests.

    • Target ratios (e.g., 70–80%).

  • Automated optimization:

    • Combine VPA + custom rules for different workloads.

    • Automated audits integrated into CI/CD pipelines (linters and templates validations).

  • Proactive alerts:

    • Slack/Teams notifications for overspending, idle workloads, or misconfigured resources.

👉 Result: Resources are continuously optimized, costs predictable, teams accountable.

Workshop Scenario

GitHub repo: github.com/DovnarAlexander/community-day-kz-2025-eks-workshop

  • A small startup runs an EKS cluster with 3 nodes:

    • Each node = c5.4xlarge (16 vCPUs, 32 GB RAM).

    • Running On-Demand in eu-north-1.

    • Cost = $0.728/hour per node —> $2.184/hour total (~$1,576/month).

  • Problem:

    • Cluster cannot support more than 25 replicas of a worker in production, but the team sometimes needs 30+.

    • Nodes are large and expensive.

    • No resource requests/limits, no autoscaling, no binpacking.

    • Costs are high, utilization is poor, scaling is rigid.


Optimizations Covered in the Workshop

1. Karpenter

  • What it does:

    • Replaces static Auto Scaling Groups with dynamic provisioning.

    • Launches nodes on-demand, across multiple instance families/types.

    • Consolidates underutilized nodes and can choose Spot instances for cheaper compute.

  • Impact:

    • Faster scaling (seconds, not minutes).

    • Reduces costs by switching from On-Demand c5.4xlarge —> diversified Spot mix.

    • Example: final nodes include t4g.xlarge, i7ie.large, r8g.medium, etc., at ~$0.35/hour total.

Savings: From $2.18/hour —> $0.35/hour (~84% reduction).


2. Goldilocks (VPA Recommendations)

  • What it does:

    • Uses the Vertical Pod Autoscaler (VPA) in recommendation mode together with the Goldilocks tool.

    • Analyzes real CPU/memory usage and suggests optimal requests/limits.

  • Impact:

    • Avoids over-provisioning.

    • Improves binpacking —> more pods per node.

    • Prevents “greedy” workloads from taking all resources.


3. Resources & Autoscaling

  • What it does:

    • Applies VPA recommendations manually.

    • Adds requests/limits across workloads.

  • Impact:

    • Stabilizes workloads.

    • Improves utilization —> avoids waste.


4. Kube-green

  • What it does:

    • Suspends non-critical workloads (like dev/test pods) during off-hours (see the SleepInfo sketch below).

  • Impact:

    • Saves costs outside working hours.

    • Example: no dev pods running overnight/weekends.
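
Under the hood, kube-green works through a SleepInfo resource. A sketch for a hypothetical dev namespace (schedule and timezone are placeholders) could look like this:

apiVersion: kube-green.com/v1alpha1
kind: SleepInfo
metadata:
  name: working-hours
  namespace: dev
spec:
  weekdays: "1-5"          # Monday to Friday
  sleepAt: "20:00"         # scale workloads down in the evening
  wakeUpAt: "08:00"        # scale them back up in the morning
  timeZone: "Asia/Almaty"
  suspendCronJobs: true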


5. KEDA (Event-Driven Autoscaling)

  • What it does:

    • Extends HPA with event-driven scaling.

    • Scales pods based on external events (e.g., queue length, SQS, Kafka, Prometheus).

    • Supports scale-to-zero.

  • Impact:

    • Workloads consume resources only when events occur.

    • Avoids idle pods running 24/7.


6. Binpacking

  • What it does:

    • Optimizes pods per node with right-sized requests/limits.

    • Uses smaller, cheaper Spot instances instead of large On-Demand.

  • Impact:

    • Nodes fit workloads more efficiently.

    • Avoids paying for unused headroom in oversized nodes.


7. Kubecost

  • What it does:

    • Provides cost visibility (namespace/team/workload level).

    • Helps track efficiency and spot waste.

  • Impact:

    • Transparency: which workloads drive costs.

    • Enables chargeback/showback later.
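
For reference, Kubecost is typically installed with Helm; a minimal sketch (chart name and repo per the Kubecost docs at the time of writing) is:

# Installs Kubecost into its own namespace; check the Kubecost docs for current chart options
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace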


Before vs After

Metric | Before (c5.4xlarge × 3, On-Demand) | After (mixed Spot, binpacking, autoscaling)
------ | ---------------------------------- | -------------------------------------------
Nodes | 3 fixed | Dynamic, multiple smaller Spot instances
Cost per hour | $2.184 | ~$0.350
Cost per month | ~$1,576 | ~$252
Scaling | Slow, rigid (25 pods max) | Fast, flexible (more pods per node, scale-to-zero)
Utilization | Poor | High (right-sized workloads, binpacking)

Total savings: ~84% (>$1,300/month).


Key Takeaways

  1. Karpenter is the biggest cost saver —> dynamic Spot nodes reduce infra cost massively.

  2. Goldilocks + VPA ensures workloads aren’t over/under-provisioned.

  3. KEDA adds event-driven scaling —> scale down to zero when idle.

  4. Kube-green cuts costs during off-hours.

  5. Kubecost provides visibility —> essential for ongoing optimization.

  6. Binpacking + smaller nodes improves efficiency.

This mirrors real-world cost optimization journeys:

  • Start with visibility (Kubecost, Grafana).

  • Apply requests/limits (Goldilocks, VPA).

  • Improve node-level efficiency (binpacking, Karpenter).

  • Automate with KEDA + Kube-green.
