How to Optimize EC2 Costs with Auto-Scaling in Kubernetes Using Karpenter on EKS
TL;DR — Optimizing EC2 costs in Kubernetes using Karpenter on AWS EKS involves leveraging auto-scaling features to adjust resource allocation dynamically, reducing operational costs. Key steps include setting up Karpenter, configuring NodeClass and NodePool, and testing the setup with deployments to ensure efficient scaling and cost management. This approach enhances application responsiveness while optimizing cloud infrastructure expenses.
Introduction
In today's digital landscape, where efficiency is paramount, technology teams are tasked with optimizing cloud services without sacrificing the availability, resilience, and quality of applications. This post explores how to reduce costs by optimizing autoscaling in Kubernetes clusters, with a focus on using Karpenter on AWS EKS. While this demonstration is specific to AWS, it's worth noting that Karpenter also has providers for other clouds, such as Azure. Let's delve into the essential resources employed in this guide.
Below are some definitions of the resources that will be used:
EKS:
Elastic Kubernetes Service is a managed Kubernetes service provided by AWS that makes it easier for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. It handles much of the complexity of managing a Kubernetes cluster by automating tasks such as patching, node provisioning, and updates.
Karpenter:
Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built by AWS. It aims to optimize the provisioning and scaling of compute resources by quickly launching right-sized instances in response to application needs and resource utilization. Karpenter is designed to improve upon the limitations of the Kubernetes Cluster Autoscaler, offering more responsive scaling decisions and better integration with cloud provider capabilities.
EC2:
Elastic Compute Cloud is an AWS service that provides resizable compute capacity in the cloud, designed to make web-scale computing easier for developers. EC2 offers several types of instances optimized for different tasks, and it includes options for cost savings such as:
On-Demand Instances: Pay for compute capacity by the hour or second (minimum of 60 seconds) with no long-term commitments. This provides flexibility for applications with short-term, spiky, or unpredictable workloads that cannot be interrupted.
Reserved Instances: Provide a significant discount (up to 75%) compared to On-Demand pricing and are best for applications with steady state or predictable usage.
Spot Instances: Allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. Suitable for flexible start and end times, applications that are only feasible at very low compute prices, and users with urgent computing needs for large amounts of additional capacity.
Savings Plans: Offer significant savings over On-Demand pricing, like Reserved Instances, but with more flexibility in how you use your compute capacity.
Dedicated Hosts: Physical servers with EC2 instance capacity fully dedicated to your use. They can help you reduce costs by allowing you to use your existing server-bound software licenses.
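To put those discounts in perspective, here is a back-of-the-envelope monthly cost comparison in shell. The hourly price and the 70% average spot discount are illustrative assumptions for the sketch, not live AWS rates:

```shell
# Rough monthly cost of one instance, on-demand vs. spot.
# The price and discount below are assumptions, not live AWS rates.
awk 'BEGIN {
  od_hourly = 0.0907          # assumed on-demand $/hr (e.g. an r6a.large-class instance)
  hours     = 730             # hours in an average month
  discount  = 0.70            # assumed average spot discount

  od_monthly   = od_hourly * hours
  spot_monthly = od_monthly * (1 - discount)

  printf "on-demand: $%.2f/month\n", od_monthly
  printf "spot:      $%.2f/month\n", spot_monthly
}'
```

With these assumed numbers, spot comes out around $20/month versus roughly $66 on-demand for a single instance — the gap compounds quickly across a fleet.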
Helm:
Helm is a package manager for Kubernetes that allows developers to package, configure, and deploy applications and services onto Kubernetes clusters. It uses packages called charts, which are collections of files that describe a related set of Kubernetes resources. Helm helps in managing Kubernetes applications through Helm Charts which simplify the deployment and management of applications on Kubernetes.
Karpenter installation
To install Karpenter on AWS, we need an EKS cluster, a role for Karpenter's serviceAccount, another role for Karpenter's custom NodePool, an SQS queue, and Helm installed.
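The commands in this guide reference a few shell variables. A minimal setup sketch, using hypothetical values that you should replace with your own:

```shell
# Hypothetical values -- replace with your own cluster name, account ID, and region.
export EKS_CLUSTER_NAME="my-cluster"
export AWS_ACCOUNT_ID="123456789012"
export AWS_REGION="us-east-1"

# Derived cluster ARN, used when tagging the cluster later on.
export EKS_ARN="arn:aws:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${EKS_CLUSTER_NAME}"
```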
Creating an SQS queue for Karpenter
```shell
aws sqs create-queue \
  --queue-name karpenter-interruption-queue \
  --tags "karpenter.sh/discovery=${EKS_CLUSTER_NAME}"
```
Creating a role in AWS for Karpenter's custom NodePool
We create our trust policy and save it in a file called karpenter-nodePool-trust-policy.json:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
We create our role with the trust policy above, replacing EKS_CLUSTER_NAME, then attach the managed policies worker nodes need:

```shell
aws iam create-role \
  --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} \
  --assume-role-policy-document file://karpenter-nodePool-trust-policy.json \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name KarpenterNodeRole-${EKS_CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```
Creating a role in AWS for the serviceAccount that Karpenter will use:
Getting the EKS OIDC issuer URL:

```shell
aws eks describe-cluster --name ${EKS_CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text
```
Then we replace the OIDC issuer URL, AWS_ACCOUNT_ID, and AWS_REGION in our trust policy and save it in a file called karpenter-trust-policy.json:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/XXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/XXXXXX:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
```
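One convenient way to perform those replacements is to render the file from a heredoc so the shell expands the variables for you. A sketch with hypothetical issuer and account values — in a real run, OIDC_ISSUER comes from the `aws eks describe-cluster` command above:

```shell
# Hypothetical values for illustration; substitute your real OIDC issuer and account ID.
OIDC_ISSUER="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
AWS_ACCOUNT_ID="123456789012"

cat > karpenter-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ISSUER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ISSUER}:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
EOF
```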
Now we create the Karpenter controller policy, replacing AWS_REGION, EKS_CLUSTER_NAME, and AWS_ACCOUNT_ID. We will save it in a file called karpenter-policy.json:
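The full controller policy is lengthy, so below is an abbreviated sketch of the kinds of permissions it grants: EC2 provisioning, pricing and SSM lookups, reading the interruption queue, and passing the node role. Treat it as illustrative only — use the complete policy from the Karpenter documentation for a real cluster:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KarpenterCompute",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateTags",
        "ec2:TerminateInstances",
        "ec2:Describe*",
        "pricing:GetProducts",
        "ssm:GetParameter"
      ],
      "Resource": "*"
    },
    {
      "Sid": "KarpenterInterruptionQueue",
      "Effect": "Allow",
      "Action": ["sqs:DeleteMessage", "sqs:GetQueueUrl", "sqs:ReceiveMessage"],
      "Resource": "arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:karpenter-interruption-queue"
    },
    {
      "Sid": "PassNodeRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${EKS_CLUSTER_NAME}"
    },
    {
      "Sid": "EKSClusterEndpointLookup",
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:aws:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${EKS_CLUSTER_NAME}"
    }
  ]
}
```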
We create the role, attach the previously created policy, and replace EKS_CLUSTER_NAME:

```shell
aws iam create-role \
  --role-name karpenterSARole \
  --assume-role-policy-document file://karpenter-trust-policy.json \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
aws iam put-role-policy \
  --role-name karpenterSARole \
  --policy-name KarpenterSAPolicy \
  --policy-document file://karpenter-policy.json
```
Installing Karpenter using Helm:
```shell
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${EKS_CLUSTER_NAME} --query "cluster.endpoint" --output text)"

helm upgrade --install --namespace karpenter --create-namespace \
  karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 0.36.0 \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/karpenterSARole \
  --set settings.clusterName=${EKS_CLUSTER_NAME} \
  --set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.interruptionQueue=karpenter-interruption-queue
```

Note that the serviceAccount annotation must be the full ARN of the karpenterSARole role, not just its name.
Creating NodeClass and NodePool
Tagging EKS resources (the cluster, VPC, private subnets, and EKS security groups) so our NodeClass can discover them:

```shell
aws eks tag-resource \
  --resource-arn ${EKS_ARN} \
  --tags karpenter.sh/discovery=${EKS_CLUSTER_NAME}
aws ec2 create-tags \
  --resources ${VPC_ID} \
    ${PRIVATE_SUBNET1_ID} ${PRIVATE_SUBNET2_ID} ${PRIVATE_SUBNET3_ID} \
    ${EKS_SG1_ID} ${EKS_SG2_ID} \
  --tags Key=karpenter.sh/discovery,Value=${EKS_CLUSTER_NAME}
```
Creating the NodeClass. This resource defines the AMI family for our EC2 instances, in our case AL2 (Amazon Linux 2), as well as the role we previously created, which the nodes Karpenter launches will assume. We can also filter by tags; for us it will be the tag karpenter.sh/discovery: "${EKS_CLUSTER_NAME}". Additionally, the status section shows the private subnets where Karpenter will deploy the instances it needs, along with the availability zone each belongs to. We will save it as nodeclass.yaml.

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: my-nodeclass
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${EKS_CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${EKS_CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${EKS_CLUSTER_NAME}" # replace with your cluster name
status: # populated by Karpenter once the resource is applied
  subnets:
    - id: subnet-XXXXXX
      zone: ${AZ1}
    - id: subnet-XXXXXX
      zone: ${AZ2}
    - id: subnet-XXXXXX
      zone: ${AZ3}
```
Run:

```shell
kubectl apply -f nodeclass.yaml -n karpenter
```
Creating the NodePool. This resource configures the requirements for our EC2 instances: you can choose from various capacity types, instance categories, families, and sizes, and Karpenter will provision whatever fits the cluster load. We will save it as nodepool.yaml. In this case, we will use EC2 Spot instances, since they can save up to 90% of the cost of On-Demand EC2.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        app: my-app
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"] # ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"] # ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["r6a"] # ["c7a", "m5", "r7a"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r6a.large", "r6a.xlarge"] # ["c7a.large", "m5.xlarge", "r7a.large"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["xx-xxxx-xx", "xx-xxxx-yy", "xx-xxxx-zz"] # ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: my-nodeclass
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidationPolicy: WhenEmpty
    # consolidateAfter: 30s
    expireAfter: 720h # 30 * 24h = 720h
```
Run:

```shell
kubectl apply -f nodepool.yaml -n karpenter
```
Testing Karpenter
Running a deployment:

```shell
cat <<EOF > test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 0
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: nginx
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
EOF
kubectl apply -f test.yaml
```
Now we will scale this deployment to see how Karpenter does its job: it will provision new instances to meet the demand for resources.

```shell
kubectl scale deploy test --replicas=8
```
We can watch the Karpenter logs in action:

```shell
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```
After seeing how it has scaled up, we can delete the deployment so Karpenter scales the nodes back down:

```shell
kubectl delete deployment test
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```
Conclusion
In conclusion, implementing Karpenter on AWS EKS is a robust solution for optimizing EC2 costs through efficient auto-scaling. By leveraging different EC2 options like Spot Instances and integrating with Kubernetes, Karpenter enhances the responsiveness and cost-effectiveness of resource allocation. This setup not only reduces operational costs but also ensures that applications run smoothly by dynamically adjusting to workload demands. As cloud technologies evolve, tools like Karpenter represent a significant advancement in managing cloud resources more effectively, making them indispensable for businesses looking to optimize their cloud infrastructure.