How to Optimize EC2 Costs with Auto-Scaling in Kubernetes Using Karpenter on EKS
TL;DR — Optimizing EC2 costs in Kubernetes using Karpenter on AWS EKS involves leveraging auto-scaling features to adjust resource allocation dynamically, reducing operational costs. Key steps include setting up Karpenter, configuring NodeClass and NodePool, and testing the setup with deployments to ensure efficient scaling and cost management. This approach enhances application responsiveness while optimizing cloud infrastructure expenses.
Introduction
In today's digital landscape, where efficiency is paramount, technology teams are tasked with optimizing cloud services without sacrificing the availability, resilience, and quality of applications. This post explores how to reduce costs by optimizing autoscaling in Kubernetes clusters, with a focus on using Karpenter on AWS EKS. While this demonstration is specific to AWS, it's worth noting that Karpenter also has providers for other clouds, such as Azure. Let's delve into the essential resources employed in this guide.
Below are some definitions of the resources that will be used:
EKS:
Elastic Kubernetes Service is a managed Kubernetes service provided by AWS that makes it easier for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. It handles much of the complexity of managing a Kubernetes cluster by automating tasks such as patching, node provisioning, and updates.
Karpenter:
Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built by AWS. It aims to optimize the provisioning and scaling of compute resources by quickly launching right-sized instances in response to application needs and resource utilization. Karpenter is designed to improve upon the limitations of the Kubernetes Cluster Autoscaler, offering more responsive scaling decisions and better integration with cloud provider capabilities.
EC2:
Elastic Compute Cloud is an AWS service that provides resizable compute capacity in the cloud, designed to make web-scale computing easier for developers. EC2 offers several types of instances optimized for different tasks, and it includes options for cost savings such as:
On-Demand Instances: Pay for compute capacity by the hour or second (minimum of 60 seconds) with no long-term commitments. This provides flexibility for applications with short-term, spiky, or unpredictable workloads that cannot be interrupted.
Reserved Instances: Provide a significant discount (up to 75%) compared to On-Demand pricing and are best for applications with steady state or predictable usage.
Spot Instances: Allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. Suitable for flexible start and end times, applications that are only feasible at very low compute prices, and users with urgent computing needs for large amounts of additional capacity.
Savings Plans: Offer significant savings over On-Demand pricing, like Reserved Instances, but with more flexibility in how you use your compute capacity.
Dedicated Hosts: Physical servers with EC2 instance capacity fully dedicated to your use. They can help you reduce costs by allowing you to use your existing server-bound software licenses.
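To put those discounts in perspective, here is a back-of-the-envelope monthly cost comparison in shell. The hourly price and the 70% average spot discount are illustrative assumptions for the sketch, not live AWS rates:

```shell
# Rough monthly cost of one instance, on-demand vs. spot.
# The price and discount below are assumptions, not live AWS rates.
awk 'BEGIN {
  od_hourly = 0.0907          # assumed on-demand $/hr (e.g. an r6a.large-class instance)
  hours     = 730             # hours in an average month
  discount  = 0.70            # assumed average spot discount

  od_monthly   = od_hourly * hours
  spot_monthly = od_monthly * (1 - discount)

  printf "on-demand: $%.2f/month\n", od_monthly
  printf "spot:      $%.2f/month\n", spot_monthly
}'
```

With these assumed numbers, spot comes out around $20/month versus roughly $66 on-demand for a single instance — the gap compounds quickly across a fleet.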
Helm:
Helm is a package manager for Kubernetes that allows developers to package, configure, and deploy applications and services onto Kubernetes clusters. It uses packages called charts, which are collections of files that describe a related set of Kubernetes resources. Helm helps in managing Kubernetes applications through Helm Charts which simplify the deployment and management of applications on Kubernetes.
Karpenter installation
To install Karpenter on AWS, we need an EKS cluster, a role for Karpenter's serviceAccount, another role for Karpenter's custom NodePool, an SQS queue, and Helm installed.
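The commands in this guide reference a few shell variables. A minimal setup sketch, using hypothetical values that you should replace with your own:

```shell
# Hypothetical values -- replace with your own cluster name, account ID, and region.
export EKS_CLUSTER_NAME="my-cluster"
export AWS_ACCOUNT_ID="123456789012"
export AWS_REGION="us-east-1"

# Derived cluster ARN, used when tagging the cluster later on.
export EKS_ARN="arn:aws:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${EKS_CLUSTER_NAME}"
```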
Creating an SQS queue for Karpenter
```shell
aws sqs create-queue \
  --queue-name karpenter-interruption-queue \
  --tags "karpenter.sh/discovery=${EKS_CLUSTER_NAME}"
```
Creating a role in AWS for Karpenter's custom NodePool
We create our trust policy and save it in a file called karpenter-nodePool-trust-policy.json:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
We create our role with the trust policy above, replacing EKS_CLUSTER_NAME, then attach the managed policies worker nodes need:

```shell
aws iam create-role \
  --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} \
  --assume-role-policy-document file://karpenter-nodePool-trust-policy.json \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```
Creating a role in AWS for the serviceAccount that Karpenter will use:
Getting the EKS OIDC issuer URL:

```shell
aws eks describe-cluster --name ${EKS_CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text
```
Then we replace the OIDC issuer URL, AWS_ACCOUNT_ID, and AWS_REGION in our trust policy and save it in a file called karpenter-trust-policy.json:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/XXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/XXXXXX:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
```
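One convenient way to perform those replacements is to render the file from a heredoc so the shell expands the variables for you. A sketch with hypothetical issuer and account values — in a real run, OIDC_ISSUER comes from the `aws eks describe-cluster` command above:

```shell
# Hypothetical values for illustration; substitute your real OIDC issuer and account ID.
OIDC_ISSUER="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
AWS_ACCOUNT_ID="123456789012"

cat > karpenter-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ISSUER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ISSUER}:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
EOF
```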
Now we create the Karpenter controller policy, replacing AWS_REGION, EKS_CLUSTER_NAME, and AWS_ACCOUNT_ID. We will save it in a file called karpenter-policy.json:
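The full controller policy is lengthy, so below is an abbreviated sketch of the kinds of permissions it grants: EC2 provisioning, pricing and SSM lookups, reading the interruption queue, and passing the node role. Treat it as illustrative only — use the complete policy from the Karpenter documentation for a real cluster:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KarpenterCompute",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateTags",
        "ec2:TerminateInstances",
        "ec2:Describe*",
        "pricing:GetProducts",
        "ssm:GetParameter"
      ],
      "Resource": "*"
    },
    {
      "Sid": "KarpenterInterruptionQueue",
      "Effect": "Allow",
      "Action": ["sqs:DeleteMessage", "sqs:GetQueueUrl", "sqs:ReceiveMessage"],
      "Resource": "arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:karpenter-interruption-queue"
    },
    {
      "Sid": "PassNodeRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${EKS_CLUSTER_NAME}"
    },
    {
      "Sid": "EKSClusterEndpointLookup",
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:aws:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${EKS_CLUSTER_NAME}"
    }
  ]
}
```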
We create the role, attach the previously created policy, and replace EKS_CLUSTER_NAME:

```shell
aws iam create-role \
  --role-name karpenterSARole \
  --assume-role-policy-document file://karpenter-trust-policy.json \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
aws iam put-role-policy \
  --role-name karpenterSARole \
  --policy-name KarpenterSAPolicy \
  --policy-document file://karpenter-policy.json
```
Installing Karpenter using Helm:
```shell
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${EKS_CLUSTER_NAME} --query "cluster.endpoint" --output text)"

helm upgrade --install --namespace karpenter --create-namespace \
  karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 0.36.0 \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/karpenterSARole \
  --set settings.clusterName=${EKS_CLUSTER_NAME} \
  --set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.interruptionQueue=karpenter-interruption-queue
```

Note that the serviceAccount annotation must be the full ARN of the karpenterSARole role, not just its name.
Creating NodeClass and NodePool
Tagging EKS resources (the cluster, VPC, private subnets, and EKS security groups) so our NodeClass can discover them:

```shell
aws eks tag-resource \
  --resource-arn ${EKS_ARN} \
  --tags karpenter.sh/discovery=${EKS_CLUSTER_NAME}
aws ec2 create-tags \
  --resources ${VPC_ID} \
    ${PRIVATE_SUBNET1_ID} ${PRIVATE_SUBNET2_ID} ${PRIVATE_SUBNET3_ID} \
    ${EKS_SG1_ID} ${EKS_SG2_ID} \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
```
Creating the NodeClass. This resource defines the AMI family for our EC2 instances, in our case AL2 (Amazon Linux 2), as well as the role we previously created, which the nodes Karpenter launches will assume. We can also filter by tags; for us it will be the tag karpenter.sh/discovery: "${EKS_CLUSTER_NAME}". Additionally, the status section shows the private subnets where Karpenter will deploy the instances it needs, along with the availability zone each belongs to. We will save it as nodeclass.yaml.

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: my-nodeclass
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${EKS_CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${EKS_CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${EKS_CLUSTER_NAME}" # replace with your cluster name
status: # populated by Karpenter once the resource is applied
  subnets:
    - id: subnet-XXXXXX
      zone: ${AZ1}
    - id: subnet-XXXXXX
      zone: ${AZ2}
    - id: subnet-XXXXXX
      zone: ${AZ3}
```
Run:

```shell
kubectl apply -f nodeclass.yaml -n karpenter
```
Creating the NodePool. This resource configures the requirements for our EC2 instances: you can choose from various capacity types, instance categories, families, and sizes, and Karpenter will provision whatever fits the cluster load. We will save it as nodepool.yaml. In this case, we will use EC2 Spot instances, since they can save up to 90% of the cost of On-Demand EC2.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        app: my-app
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"] # ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"] # ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["r6a"] # ["c7a", "m5", "r7a"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r6a.large", "r6a.xlarge"] # ["c7a.large", "m5.xlarge", "r7a.large"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["xx-xxxx-xx", "xx-xxxx-yy", "xx-xxxx-zz"] # ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: my-nodeclass
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidationPolicy: WhenEmpty
    # consolidateAfter: 30s
    expireAfter: 720h # 30 * 24h = 720h
```
Run:

```shell
kubectl apply -f nodepool.yaml -n karpenter
```
Testing Karpenter
Running a deployment:

```shell
cat <<EOF > test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 0
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: nginx
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
EOF
kubectl apply -f test.yaml
```
Now we will scale this deployment to see how Karpenter does its job: it will provision new instances to meet the demand for resources.

```shell
kubectl scale deploy test --replicas=8
```
We can watch the Karpenter logs in action:

```shell
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```
After seeing how it has scaled up, we can delete the deployment so Karpenter scales the nodes back down:

```shell
kubectl delete deployment test
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```
Conclusion
In conclusion, implementing Karpenter on AWS EKS is a robust solution for optimizing EC2 costs through efficient auto-scaling. By leveraging different EC2 options like Spot Instances and integrating with Kubernetes, Karpenter enhances the responsiveness and cost-effectiveness of resource allocation. This setup not only reduces operational costs but also ensures that applications run smoothly by dynamically adjusting to workload demands. As cloud technologies evolve, tools like Karpenter represent a significant advancement in managing cloud resources more effectively, making them indispensable for businesses looking to optimize their cloud infrastructure.