Deploying Karpenter on EKS: A Beginner's Guide That Grows With You

vikash kumar
13 min read

You know that feeling when your phone buzzes at 3 AM with alerts about insufficient capacity? When you're frantically trying to scale your Kubernetes nodes while half-asleep, desperately hoping your application doesn't crash before the new instances come online?

That's exactly what drove me to Karpenter last year. After one particularly brutal week of scaling nightmares that had me questioning my career choices, I finally decided enough was enough.

What is Karpenter (and why should you care)?

In simple terms, Karpenter is like having a super smart assistant who watches your Kubernetes cluster and automatically adds or removes nodes based on what your applications need. No more manually scaling or waking up to those dreaded "insufficient capacity" alerts!

Unlike the default Kubernetes cluster autoscaler (which just scales pre-defined node groups up or down), Karpenter is much smarter:

  • It creates exactly the right type of nodes for your specific workloads

  • It can choose the most cost-effective instance types automatically

  • It scales down nodes when they're no longer needed

  • It responds to pending pods directly, not just CPU/memory metrics

I've been using Karpenter in production for about a year now, and the difference is night and day. My clusters scale faster, we waste less money on idle resources, and I sleep better knowing Karpenter has my back.

Prerequisites - What You'll Need

Before we get our hands dirty, make sure you have:

  • An existing EKS cluster (with at least one node group and two worker nodes)

  • OIDC provider configured for your cluster

  • AWS CLI installed and configured with appropriate permissions

  • kubectl configured to talk to your cluster

  • eksctl (v0.202.0 or later)

  • Helm installed

Here's how to verify each prerequisite:

# Check AWS CLI
aws --version
# Should return aws-cli/2.x.x or higher

# Check kubectl
kubectl version --client
# Should return a client version within one minor version of your cluster's Kubernetes version

# Check eksctl
eksctl version
# Should return at least 0.202.0

# Check Helm
helm version
# Should return v3.x.x

# Verify your AWS credentials
aws sts get-caller-identity
# Should return your account ID, user ID, and ARN

# Check your EKS cluster status
aws eks describe-cluster --name <your-cluster-name> --query "cluster.status"
# Should return "ACTIVE"

# Verify OIDC provider
aws eks describe-cluster --name <your-cluster-name> --query "cluster.identity.oidc.issuer"
# Should return a URL - if empty, you need to create an OIDC provider
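
If that last command comes back empty, you can associate an OIDC provider with the cluster using eksctl (one of the prerequisites above):

# Associate an IAM OIDC provider with the cluster (only needed if the check above returned nothing)
eksctl utils associate-iam-oidc-provider \
  --cluster <your-cluster-name> \
  --approve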

Here's a high-level map of the CLI route before we dive in:

Karpenter CLI Deployment
├── Prerequisites Verification
│   ├── EKS cluster status
│   ├── Node groups existence
│   └── OIDC provider check
├── IAM Configuration
│   ├── Create Node Role
│   │   ├── Trust policy
│   │   └── Managed policies
│   ├── Create Controller Role
│   │   ├── Trust policy (OIDC)
│   │   └── Controller policy
│   └── Create Instance Profile
│       └── Attach Node Role
├── Helm Installation
│   ├── Add Karpenter repo
│   └── Install with cluster config
├── NodePool Setup
│   ├── Create NodePool
│   ├── Create EC2NodeClass
│   └── Tag subnets/SGs
└── Verification & Testing
    ├── Check pods/logs
    ├── Verify metrics
    └── Test scaling

Deployment Options: CLI vs Console

There are two ways to deploy Karpenter: using the AWS CLI or the AWS Management Console. I'll cover both approaches, so you can choose whichever makes you most comfortable.

Option 1: Deploying Karpenter Using AWS CLI

Step 1: Set Environment Variables

First, let's set some environment variables that we'll use throughout the installation:

export CLUSTER_NAME=<your-cluster-name>
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=$(aws configure get region)
export OIDC_PROVIDER=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
export CLUSTER_ENDPOINT=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)
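
Before moving on, do a quick sanity check that none of these came back empty - a blank OIDC_PROVIDER in particular will silently produce a broken trust policy later:

# Every line should print a value
echo "CLUSTER_NAME:     ${CLUSTER_NAME}"
echo "AWS_ACCOUNT_ID:   ${AWS_ACCOUNT_ID}"
echo "AWS_REGION:       ${AWS_REGION}"
echo "OIDC_PROVIDER:    ${OIDC_PROVIDER}"
echo "CLUSTER_ENDPOINT: ${CLUSTER_ENDPOINT}"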

Step 2: Set up IAM Roles and Policies

This is the trickiest part of the whole process, but it's super important. Karpenter needs two IAM roles: a node role that the EC2 instances it launches will assume, and a controller role that the Karpenter pod itself assumes through your cluster's OIDC provider (IRSA).

Create the Node Role

First, create a file named trust-policy-node.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then run:

# Create the role
aws iam create-role \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --assume-role-policy-document file://trust-policy-node.json

# Attach necessary policies
aws iam attach-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

aws iam attach-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

aws iam attach-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

aws iam attach-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

The SSM policy is particularly important: it gives AWS Systems Manager access to your nodes, which makes troubleshooting much easier - see the example below.
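
As a concrete example of what that buys you: once a Karpenter-launched node is up, you can open a shell on it through Session Manager with no SSH keys or bastion (assuming the Session Manager plugin for the AWS CLI is installed):

# Open an interactive shell on a node via SSM
aws ssm start-session --target <instance-id>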

Create the Controller Role

Create a file named trust-policy-controller.json:

cat << EOF > trust-policy-controller.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:kube-system:karpenter",
          "${OIDC_PROVIDER}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF

Now create the policy:

cat << EOF > karpenter-controller-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:RunInstances",
        "ec2:CreateTags",
        "ec2:TerminateInstances",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeSpotPriceHistory",
        "iam:PassRole",
        "iam:CreateServiceLinkedRole",
        "sqs:SendMessage",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage"
      ],
      "Resource": "*"
    }
  ]
}
EOF

# Create the policy
aws iam create-policy \
  --policy-name KarpenterControllerPolicy-${CLUSTER_NAME} \
  --policy-document file://karpenter-controller-policy.json

# Create the role
aws iam create-role \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --assume-role-policy-document file://trust-policy-controller.json

# Attach the policy
aws iam attach-role-policy \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}

This policy grants Karpenter the permissions it needs to discover and launch EC2 instances: looking up instance types, prices, and AMIs (the ssm and pricing actions), creating and tagging instances, terminating them when no longer needed, and managing the instance profiles it attaches to them. It's a simplified, broadly-scoped version of the reference policy in the Karpenter documentation, which locks the same actions down with resource conditions - fine for a first install, but worth tightening for production.
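
A quick way to double-check the wiring before moving on (plain IAM read calls):

# The trust policy should reference your cluster's OIDC provider
aws iam get-role \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --query "Role.AssumeRolePolicyDocument"

# The controller policy should show up as attached
aws iam list-attached-role-policies \
  --role-name KarpenterControllerRole-${CLUSTER_NAME}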

Create an Instance Profile for the Node Role

aws iam create-instance-profile \
  --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME}

aws iam add-role-to-instance-profile \
  --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --role-name KarpenterNodeRole-${CLUSTER_NAME}

The instance profile is what EC2 instances actually use to assume the role. With recent Karpenter versions, setting role: in the EC2NodeClass (as we do in Step 6) lets Karpenter create and manage instance profiles on its own, so treat this manually created one as a known-good reference rather than something Karpenter strictly depends on.
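
You can confirm the role actually landed in the profile with:

# The output should list KarpenterNodeRole-<your-cluster-name>
aws iam get-instance-profile \
  --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --query "InstanceProfile.Roles[].RoleName"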

Step 3: Tag Your Subnets and Security Groups

Karpenter needs to find your cluster resources. Let's tag them:

# Tag subnets
SUBNET_IDS=$(aws ec2 describe-subnets \
    --filters "Name=tag:kubernetes.io/cluster/${CLUSTER_NAME},Values=owned" \
    --query "Subnets[].SubnetId" \
    --output text)

for SUBNET_ID in ${SUBNET_IDS[@]}; do
    aws ec2 create-tags \
        --resources ${SUBNET_ID} \
        --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}"
done

# Tag security groups
SG_ID=$(aws ec2 describe-security-groups \
    --filters "Name=tag:kubernetes.io/cluster/${CLUSTER_NAME},Values=owned" \
    --query "SecurityGroups[].GroupId" \
    --output text)

aws ec2 create-tags \
    --resources ${SG_ID} \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}"

These discovery tags help Karpenter identify which AWS resources belong to your cluster.
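
To confirm the tags landed where Karpenter will look for them:

# Both commands should return the resources you just tagged
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query "Subnets[].SubnetId"

aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query "SecurityGroups[].GroupId"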

Step 4: Update the aws-auth ConfigMap

Karpenter's nodes need permission to join your cluster:

kubectl edit configmap aws-auth -n kube-system

Add this entry under the mapRoles section, substituting your actual account ID and cluster name (the editor won't expand shell variables for you):

- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}

Save and exit the editor. This authorizes EC2 instances with the KarpenterNodeRole to join your cluster as worker nodes.
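
If you'd rather not hand-edit the ConfigMap, eksctl can add the same mapping non-interactively (using the variables from Step 1):

eksctl create iamidentitymapping \
  --cluster ${CLUSTER_NAME} \
  --arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME} \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group system:bootstrappers \
  --group system:nodes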

Step 5: Install Karpenter with Helm

Now we'll install Karpenter:

# Current Karpenter releases are published as an OCI chart at public.ecr.aws,
# not the old https://charts.karpenter.sh repo, so no "helm repo add" is needed.
# Pin the version you want to install - 1.0.0 is used here as an example;
# check the Karpenter releases page for the latest.
export KARPENTER_VERSION="1.0.0"

# Note: settings.interruptionQueue assumes an SQS interruption queue named after your
# cluster already exists (the Karpenter getting-started CloudFormation creates one).
# Omit that line if you haven't set one up.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.interruptionQueue=${CLUSTER_NAME} \
  --wait

This installs the Karpenter controller and its supporting components from the official chart; on a first install the chart also brings in the NodePool, EC2NodeClass, and NodeClaim CRDs we'll use in the next step.
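
Before configuring anything else, confirm the release actually rolled out (with the release name above, the deployment is simply called karpenter):

# The release should show a "deployed" status
helm list -n kube-system

# Wait for the controller deployment to become available
kubectl rollout status deployment/karpenter -n kube-system --timeout=120s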

Step 6: Configure Karpenter NodePool and EC2NodeClass

Now let's tell Karpenter how we want our nodes to be created. Create a file named nodepool.yaml:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
  limits:
    resources:
      cpu: "100"
      memory: 100Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

And a file named ec2nodeclass.yaml:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # amiSelectorTerms is required in the v1 API; the alias selects the latest
  # EKS-optimized Amazon Linux 2023 AMI
  amiSelectorTerms:
    - alias: al2023@latest
  role: KarpenterNodeRole-${CLUSTER_NAME}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"

Apply them both. Since ec2nodeclass.yaml references ${CLUSTER_NAME}, run it through envsubst (or swap in your actual cluster name by hand) so the placeholder gets filled in:

kubectl apply -f nodepool.yaml
envsubst < ec2nodeclass.yaml | kubectl apply -f -

This NodePool allows Karpenter to use both Spot and On-Demand instances without restricting instance types, giving it maximum flexibility to find the most suitable and cost-effective capacity, while its nodeClassRef ties it to the EC2NodeClass that defines the AWS-specific details.
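
A quick sanity check that both resources were accepted - the EC2NodeClass reports a Ready condition once it has resolved your subnets, security groups, and AMIs:

# Look for Ready=True in the conditions, plus the resolved subnets, security groups, and AMIs
kubectl describe ec2nodeclass default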

Step 7: Verify Your Installation

Let's make sure everything's running properly:

# Check Karpenter pods
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter

# Check logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller

# Check that your NodePool and EC2NodeClass are created
kubectl get nodepools
kubectl get ec2nodeclasses

Step 8: Test Karpenter in Action!

Time for the moment of truth! Let's create a deployment that will trigger Karpenter to create some nodes.

Create a file named inflate.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
              memory: 1Gi

Apply it and watch the magic happen:

kubectl apply -f inflate.yaml

# Watch Karpenter logs
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

# In another terminal, watch nodes being created
kubectl get nodes -w

You should see Karpenter spring into action, creating nodes to run your new pods. Once the nodes are created, check that your pods are running:

kubectl get pods -o wide

The coolest part to watch in the logs is how Karpenter evaluates different instance types and chooses the most efficient one for your workload. This is far more intelligent than the standard cluster autoscaler.
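
To watch the other half of the story, scale the test workload back down - with the WhenEmpty consolidation policy from Step 6, the empty nodes should be drained and removed about 30 seconds later:

# Scale the test workload away
kubectl scale deployment inflate --replicas 0

# In another terminal, watch Karpenter drain and remove the now-empty nodes
# (NodeClaims are Karpenter's record of each machine it launched)
kubectl get nodeclaims -w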

Option 2: Deploying Karpenter Using AWS Console

Here's the equivalent roadmap for the console route:

Karpenter UI Deployment
├── AWS Console Navigation
│   ├── IAM Section
│   │   ├── Create Web Identity Role
│   │   └── Configure Instance Profile
│   └── EKS Section
│       └── Verify Cluster Details
├── Helm Installation (CLI required)
│   ├── Repository setup
│   └── Chart installation
├── NodePool Configuration
│   ├── YAML application
│   └── Resource Tagging
│       ├── Subnets
│       └── Security Groups
└── Monitoring
    ├── CloudWatch Metrics
    └── EKS Node View

If you prefer clicking through the AWS Console instead of typing commands, here's how to do it:

Step 1: Create IAM Roles via Console

Create the Node Role:

  1. Go to the AWS Console, navigate to IAM service

  2. Click "Roles" in the left sidebar, then "Create role"

  3. Select "AWS service" as the trusted entity and "EC2" as the use case

  4. Click "Next"

  5. Search for and select these policies:

    • AmazonEKSWorkerNodePolicy

    • AmazonEKS_CNI_Policy

    • AmazonEC2ContainerRegistryReadOnly

    • AmazonSSMManagedInstanceCore

  6. Click "Next", name your role "KarpenterNodeRole-<your-cluster-name>"

  7. Click "Create role"

Create the Controller Role:

  1. In IAM, click "Create role"

  2. Select "Web identity"

  3. For Identity provider, select the OIDC provider for your EKS cluster

  4. For Audience, enter "sts.amazonaws.com"

  5. Click "Next"

  6. On the permissions page, click "Next" (we'll attach a policy later)

  7. Name the role "KarpenterControllerRole-<your-cluster-name>" and create it

Create the Controller Policy:

  1. In IAM, go to "Policies" and click "Create policy"

  2. Select the JSON tab

  3. Paste the same policy JSON we used in the CLI section

  4. Click "Next", name it "KarpenterControllerPolicy-<your-cluster-name>"

  5. Click "Create policy"

Attach the Policy to the Role:

  1. Go back to "Roles" and find "KarpenterControllerRole-<your-cluster-name>"

  2. Click on it, then click "Add permissions" and "Attach policies"

  3. Search for and select "KarpenterControllerPolicy-<your-cluster-name>"

  4. Click "Attach policies"

Create an Instance Profile:

The IAM console doesn't expose standalone instance profile creation, so you'll need AWS CloudShell for this bit:

  1. Open CloudShell from the AWS Console (icon in the top navigation bar)

  2. Run these commands:

export CLUSTER_NAME=<your-cluster-name>
aws iam create-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME}
aws iam add-role-to-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} --role-name KarpenterNodeRole-${CLUSTER_NAME}

Step 2: Tag Subnets and Security Groups

  1. Go to EC2 service in the AWS Console

  2. Click "Subnets" in the left sidebar

  3. Filter by typing your cluster name

  4. Select each subnet associated with your cluster

  5. Click the "Tags" tab, then "Add/Edit tags"

  6. Add a tag with key "karpenter.sh/discovery" and value "<your-cluster-name>"

  7. Click "Save"

  8. Repeat for Security Groups: go to "Security Groups" in the left sidebar, filter, select, and add the same tag

These discovery tags are critical - they tell Karpenter which resources belong to your cluster.

Step 3: Update aws-auth ConfigMap

The easiest way to do this is through CloudShell (you may need to install kubectl there and run aws eks update-kubeconfig --name <your-cluster-name> first so it can reach your cluster):

  1. Open CloudShell from the AWS Console

  2. Run: kubectl edit configmap aws-auth -n kube-system

  3. Add the same entry as in the CLI section under mapRoles

  4. Save and exit

If you skip this step, your nodes won't be able to join the cluster, even though Karpenter will successfully launch them.
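
A quick check that the mapping took effect:

# The Karpenter node role ARN should appear under mapRoles
kubectl get configmap aws-auth -n kube-system -o yaml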

Step 4: Install Karpenter with Helm

Again using CloudShell (install Helm there first if it isn't already available):

export CLUSTER_NAME=<your-cluster-name>
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export CLUSTER_ENDPOINT=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)

export KARPENTER_VERSION="1.0.0"  # example - check the Karpenter releases page for the latest

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.interruptionQueue=${CLUSTER_NAME} \
  --wait

Step 5: Configure and Verify Karpenter

Create and apply the same nodepool.yaml and ec2nodeclass.yaml files in CloudShell as shown in the CLI section.

Step 6: Test Karpenter

Follow the same testing procedure as in the CLI section, using CloudShell to create and apply the inflate.yaml deployment.

Beyond the Basics: Optimizing Your Karpenter Configuration

Now that you've got Karpenter working, let me share a few tips from my experience:

Spot Instances for Cost Savings

Want to save money? Modify your NodePool to use spot instances:

spec:
  template:
    spec:
      requirements:
        # Other requirements...
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]

I've seen this cut our EC2 costs by up to 70%! But be aware that Spot instances can be reclaimed with only a two-minute interruption warning, so make sure your applications can handle interruptions gracefully - a PodDisruptionBudget like the sketch below goes a long way.
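
One simple guardrail: give anything that can't tolerate sudden evictions a PodDisruptionBudget, so voluntary disruptions (including Karpenter consolidation) can't drain too many replicas at once. A minimal sketch, assuming a hypothetical workload labeled app: my-api:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
spec:
  minAvailable: 2            # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: my-api            # hypothetical label - match it to your own workload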

Instance Type Filtering

Karpenter will choose from all available instance types by default, but you can limit it:

spec:
  template:
    spec:
      requirements:
        # Other requirements...
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "r5.large", "r5.xlarge"]

This is great when you know certain instance types work well for your workloads. In our production environment, we found that for our specific applications, a mix of memory-optimized and compute-optimized instances gave us the best performance/cost ratio.
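
If you'd rather not maintain a hard-coded list, Karpenter also exposes well-known labels for instance attributes, so you can constrain by family or size instead of naming every type. A sketch using those labels:

spec:
  template:
    spec:
      requirements:
        # Other requirements...
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "r"]         # general-purpose and memory-optimized families
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8"]    # cap node size by vCPU count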

Fast Node Termination

Karpenter can quickly remove empty nodes when they're no longer needed:

spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

This ensures you're not paying for idle capacity. I recommend setting this to a value that makes sense for your workload patterns - if you have frequently fluctuating load, a short value like 30s works well, but if your load is more stable with occasional spikes, you might want a longer value like 10m to avoid frequent scale up/down cycles.
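
If you also want Karpenter to repack under-utilized (not just empty) nodes, while capping how much it can disrupt at once, the disruption block supports a WhenEmptyOrUnderutilized policy plus budgets - a sketch:

spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "10%"         # disrupt at most 10% of this NodePool's nodes at a time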

Common Issues and Troubleshooting

I've hit a few bumps along the way with Karpenter, so let me save you some time:

Nodes Not Joining the Cluster

If Karpenter provisions EC2 instances but they don't join your cluster:

  • Check your aws-auth ConfigMap

  • Verify that your KarpenterNodeRole has the right permissions

  • Look at the EC2 instance's system logs in the AWS Console

One time I spent hours debugging this only to find I had a typo in the role ARN in the aws-auth ConfigMap. The instances were launching just fine, but they couldn't authenticate to the cluster!
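
Two commands I now reach for first in this situation (NodeClaims are Karpenter's own record of every machine it launches):

# Shows each machine Karpenter launched and whether it registered as a node
kubectl get nodeclaims -o wide

# Pull the instance's boot log straight from EC2 - kubelet and bootstrap errors show up here
aws ec2 get-console-output --instance-id <instance-id> --latest --output text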

Provisioning the Wrong Instance Types

If Karpenter is selecting instance types you don't want:

  • Add more specific requirements to your NodePool

  • Check if you're hitting capacity constraints in your region

We once had an issue where Karpenter kept launching t3.micro instances for memory-intensive workloads. Turns out we needed to be more explicit about our memory requirements in the pod specs.
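
The fix was embarrassingly simple: honest resource requests, because requests are the only sizing signal Karpenter gets. Something like this fragment of a pod template (names and numbers are illustrative):

    spec:
      containers:
        - name: worker               # hypothetical container name
          image: <your-image>
          resources:
            requests:
              cpu: "500m"
              memory: 4Gi            # tell Karpenter this workload really needs memory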

Controller Not Starting

If the Karpenter controller pod isn't starting:

  • Check the pod logs: kubectl logs -n kube-system karpenter-xxxxx -c controller

  • Verify your IAM roles and trust policies

  • Make sure your OIDC provider is configured correctly

A quick kubectl describe pod -n kube-system <karpenter-pod-name> can often reveal permission issues or configuration problems.
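
Sorting recent events in the namespace also surfaces IRSA and scheduling problems quickly:

# Most recent events last
kubectl get events -n kube-system --sort-by=.lastTimestamp | tail -n 20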

Conclusion

Karpenter has truly been a game-changer for me and my team. We've gone from manually managing node groups and dealing with scaling headaches to simply letting Karpenter handle everything. Our applications scale faster, we use fewer resources, and we've reduced our AWS bill significantly.

If you're just starting with Kubernetes and AWS, Karpenter might seem like an advanced topic - and it is! - but the benefits are well worth the investment. Take it slow, follow this guide step by step, and don't be afraid to experiment in a test cluster first.

For more information, check out the official Karpenter documentation at https://karpenter.sh/docs/ and the Karpenter section of the EKS Best Practices Guide.

Have you tried Karpenter yet? What has your experience been like? I'd love to hear from you about your own Kubernetes scaling journeys.

Happy scaling! ⚡


Written by

vikash kumar

Hey folks! 👋 I'm Vikash Kumar, a seasoned DevOps Engineer navigating the thrilling landscapes of DevOps and Cloud ☁️. My passion? Simplifying and automating processes to enhance our tech experiences. By day, I'm a Terraform wizard; by night, a Kubernetes aficionado crafting ingenious solutions with the latest DevOps methodologies 🚀. From troubleshooting deployment snags to orchestrating seamless CI/CD pipelines, I've got your back. Fluent in scripts and infrastructure as code. With AWS ☁️ expertise, I'm your go-to guide in the cloud. And when it comes to monitoring and observability 📊, Prometheus and Grafana are my trusty allies. In the realm of source code management, I'm at ease with GitLab, Bitbucket, and Git. Eager to stay ahead of the curve 📚, I'm committed to exploring the ever-evolving domains of DevOps and Cloud. Let's connect and embark on this journey together! Drop me a line at thenameisvikash@gmail.com.