Part 1: From Zero to Production – Build a Scalable Amazon EKS Cluster with Terraform

Welcome to the first article of this Amazon EKS Production-Ready Series! In this hands-on guide, we’ll build a fully functional and scalable Amazon EKS (Elastic Kubernetes Service) cluster using Terraform. From configuring the network to deploying the control plane and worker nodes, this tutorial is a must-read for DevOps engineers and cloud professionals who want to set up Kubernetes in a real-world, production-grade environment.

Whether you're a DevOps engineer, cloud architect, or a Kubernetes enthusiast, this series is designed to enhance your skills and help you deploy like a pro, with confidence, automation, and scalability built in from the beginning.


🧱 Goal of This Article

In this tutorial, we'll create a production-grade EKS cluster using Terraform. This includes:

  • VPC

  • Subnets (public and private)

  • NAT Gateway & Internet Gateway

  • Routing tables

  • EKS Control Plane

  • Node Group with IAM roles

This setup mirrors what many real-world production environments use to balance scalability, high availability, and security.


✅ Prerequisites

Before diving into the Terraform configuration, ensure you have the following set up:

🔹 AWS Account – with appropriate IAM permissions to create VPC, EKS, IAM roles, etc.
🔹 AWS CLI – installed and configured using aws configure.
🔹 Terraform CLI – version >= 1.0.
🔹 kubectl – the Kubernetes command-line tool used to interact with the EKS cluster.
🔹 IAM user or role – with full access to EKS, EC2, IAM, and VPC services.

Once the prerequisites are ready, you're good to go! ✅
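A quick way to confirm the CLI tools are installed and on your PATH is to print their versions:

aws --version
terraform version
kubectl version --client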


πŸ“ Project Structure

terraform-eks-production-cluster/
├── 0-locals.tf                   # Reusable local variables (env, region, AZs, etc.)
├── 1-providers.tf                # Terraform & AWS provider configuration
├── 2-vpc.tf                      # VPC resource
├── 3-igw.tf                      # Internet Gateway
├── 4-subnets.tf                  # Public & private subnets across 2 AZs
├── 5-nat.tf                      # NAT Gateway + Elastic IP for private subnet egress
├── 6-routes.tf                   # Route tables and subnet associations
├── 7-eks.tf                      # EKS control plane + IAM role
├── 8-nodes.tf                    # EKS managed node group + IAM role for nodes
├── iam/
│   └── AWSLoadBalancerController.json    # IAM policy for ALB controller
├── values/
│   ├── metrics-server.yaml               # Helm values for Metrics Server
│   └── nginx-ingress.yaml                # Helm values for NGINX Ingress
└── .gitignore                    # Ignore Terraform state, .terraform, secrets, etc.

🔗 GitHub Repository: https://github.com/neamulkabiremon/terraform-eks-production-cluster.git
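If you'd like to follow along with the finished code, clone the repository first:

git clone https://github.com/neamulkabiremon/terraform-eks-production-cluster.git
cd terraform-eks-production-cluster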

🔧 Step-by-Step Explanation of Each Terraform File

✅ 0-locals.tf

Defines centralized reusable variables:

locals {
  env         = "production"
  region      = "us-east-1"
  zone1       = "us-east-1a"
  zone2       = "us-east-1b"
  eks_name    = "demo"
  eks_version = "1.30"
}

We define all reusable values here. Think of this as your centralized configuration:

  • env: your environment name (e.g., staging, production)

  • region: AWS region to deploy resources

  • zone1 & zone2: AZs for high availability

  • eks_name: cluster name

  • eks_version: EKS Kubernetes version

These values are used throughout other resources to avoid duplication and support easy environment changes.
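As a quick illustration (this snippet is not part of the repo), any .tf file in the project can interpolate these locals; for example, the cluster name used later in 7-eks.tf is derived like this:

# Hypothetical output, for illustration only: shows how locals compose the cluster name
output "derived_cluster_name" {
  value = "${local.env}-${local.eks_name}" # evaluates to "production-demo"
}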

✅ 1-providers.tf

Specifies the AWS provider and Terraform version:

provider "aws" {
  region = "us-east-1"
}

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.49"
    }
  }
}

Declares:

  • The AWS provider region (could be moved to locals)

  • Terraform version

  • AWS provider version pinning, which avoids unexpected breaking changes in future versions.

This ensures your Terraform setup uses compatible versions of both AWS and Terraform.
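As noted above, the region is hard-coded in the provider block. If you want a single source of truth, one optional tweak (a suggestion of mine, not how the repo is written) is to reference the value from 0-locals.tf instead:

provider "aws" {
  region = local.region # reuse the region defined in 0-locals.tf
}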

✅ 2-vpc.tf

Creates a VPC with DNS support for Kubernetes:

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  enable_dns_support = true
  enable_dns_hostnames = true

  tags = {
    Name = "${local.env}-main"
  }
}

Creates a Virtual Private Cloud:

  • CIDR: 10.0.0.0/16, big enough for multiple subnets

  • Enables DNS support and hostname resolution (essential for service discovery in Kubernetes).

✅ 3-igw.tf

Provisions an Internet Gateway so public subnets can reach the internet:

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${local.env}-igw"
  }
}

An Internet Gateway allows public subnets to access the internet. We attach it to the VPC and tag it for visibility.

✅ 4-subnets.tf

Creates 4 subnets: 2 private, 2 public across 2 zones:

# Sample from private_zone1
resource "aws_subnet" "private_zone1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/19"
  availability_zone = local.zone1

  tags = {
    "Name"                                                 = "${local.env}-private-${local.zone1}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "private_zone2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = local.zone2

  tags = {
    "Name"                                                 = "${local.env}-private-${local.zone2}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "public_zone1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.64.0/19"
  availability_zone       = local.zone1
  map_public_ip_on_launch = true

  tags = {
    "Name"                                                 = "${local.env}-public-${local.zone1}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "public_zone2" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.96.0/19"
  availability_zone       = local.zone2
  map_public_ip_on_launch = true

  tags = {
    "Name"                                                 = "${local.env}-public-${local.zone2}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}
  • 2 private: for worker nodes (kept off the public internet)

  • 2 public: for the NAT Gateway and ingress/egress traffic

We also apply AWS- and Kubernetes-specific tags:

  • kubernetes.io/role/internal-elb (on private subnets) and kubernetes.io/role/elb (on public subnets) let load balancer controllers auto-discover the right subnets.

  • The kubernetes.io/cluster/<cluster-name> = owned tag marks the subnets as belonging to this cluster.

  • map_public_ip_on_launch = true is set on the public subnets so instances launched there receive public IPs.
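A quick note on sizing: each /19 subnet contains 2^(32-19) = 8,192 addresses (AWS reserves 5 per subnet, so about 8,187 are usable), and the four /19 blocks together occupy only half of the 10.0.0.0/16 range (10.0.0.0–10.0.127.255), leaving room to add more subnets later.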

✅ 5-nat.tf

Adds NAT Gateway and Elastic IP for private subnet internet access:

resource "aws_eip" "nat" {
  domain = "vpc"

  tags = {
    Name = "${local.env}-nat"
  }
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_zone1.id

  tags = {
    Name = "${local.env}-nat"
  }

  depends_on = [aws_internet_gateway.igw]
}

Private subnets can’t reach the internet unless we:

  1. Allocate an Elastic IP

  2. Create a NAT Gateway in a public subnet

This setup ensures outbound internet access without exposing workloads.

✅ 6-routes.tf

Defines route tables and subnet associations:

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }

  tags = {
    Name = "${local.env}-private"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name = "${local.env}-public"
  }
}

resource "aws_route_table_association" "private_zone1" {
  subnet_id      = aws_subnet.private_zone1.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "private_zone2" {
  subnet_id      = aws_subnet.private_zone2.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "public_zone1" {
  subnet_id      = aws_subnet.public_zone1.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public_zone2" {
  subnet_id      = aws_subnet.public_zone2.id
  route_table_id = aws_route_table.public.id
}

Route tables control where each subnet sends its outbound traffic:

  • Private route table: Sends 0.0.0.0/0 to NAT Gateway

  • Public route table: Sends 0.0.0.0/0 to Internet Gateway

  • Each route table is then associated with corresponding subnets.

✅ 7-eks.tf

Provisions the EKS cluster control plane with proper IAM role:

resource "aws_iam_role" "eks" {
  name = "${local.env}-${local.eks_name}-eks-cluster"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "eks.amazonaws.com"
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role      = aws_iam_role.eks.name
}

resource "aws_eks_cluster" "eks" {
  name = "${local.env}-${local.eks_name}"
  version = local.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    endpoint_private_access = false
    endpoint_public_access = true

    subnet_ids = [
      aws_subnet.private_zone1.id,
      aws_subnet.private_zone2.id
    ]
  }

  access_config {
    authentication_mode = "API"
    bootstrap_cluster_creator_admin_permissions = true
  }

  depends_on = [ aws_iam_role_policy_attachment.eks ]
}

This creates the EKS Control Plane:

  • We create an IAM role with a trust policy that allows eks.amazonaws.com to assume it

  • EKS cluster is created in private subnets

  • access_config enables API access and bootstraps the creator as an admin
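Optionally (this is my addition, not a file in the repo), you can expose a few cluster attributes as Terraform outputs so they are printed after terraform apply:

# Hypothetical outputs.tf: surface cluster details after provisioning
output "eks_cluster_name" {
  value = aws_eks_cluster.eks.name
}

output "eks_cluster_endpoint" {
  value = aws_eks_cluster.eks.endpoint
}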

✅ 8-nodes.tf

Creates a managed EKS node group with correct IAM roles:

resource "aws_iam_role" "nodes" {
  name = "${local.env}-${local.eks_name}-eks-nodes"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      }
    }
  ]
}
POLICY
}

# This policy now includes AssumeRoleForPodIdentity for the Pod Identity Agent
resource "aws_iam_role_policy_attachment" "amazon_eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_ec2_container_registry_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.eks.name
  version         = local.eks_version
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.private_zone1.id,
    aws_subnet.private_zone2.id
  ]

  capacity_type  = "ON_DEMAND"
  instance_types = ["t3.medium"]

  scaling_config {
    desired_size = 1
    max_size     = 10
    min_size     = 0
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.amazon_eks_worker_node_policy,
    aws_iam_role_policy_attachment.amazon_eks_cni_policy,
    aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only,
  ]

  # Allow external changes without Terraform plan difference
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}

Here we define the EKS Node Group:

  • Node IAM Role is assumed by EC2 instances

  • IAM policies allow:

    • Worker node management

    • CNI networking

    • Pulling images from ECR

  • Nodes are deployed in private subnets with desired/min/max scaling configs

  • Labels help with grouping nodes by role (like general, app, etc.)

🧪 Validate & Apply the Terraform Configuration

Run the following commands:

terraform init

terraform apply -auto-approve

Terraform will create the infrastructure, and it may take some time. In my case, it took 15 minutes to provision.
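If you'd rather review the changes before creating anything, you can also format-check, validate, and inspect the plan first:

terraform fmt -check
terraform validate
terraform plan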

πŸ” Authenticate the Cluster

Once the cluster is created, update your kubeconfig and verify access using the AWS CLI:

aws eks update-kubeconfig --region us-east-1 --name production-demo
kubectl get nodes

If nodes are listed, your EKS cluster is running 🎉
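To confirm the node group came up with the label we set in 8-nodes.tf, you can also filter nodes by that label:

kubectl get nodes -l role=general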


⏭️ What’s Next?

This is just Day 1 of our series. In the upcoming days, we’ll enhance this cluster and cover:

  • Role-Based Access Control (RBAC)

  • Deploying the AWS ALB Ingress Controller

  • Setting up the Ingress Controller with NGINX

  • Enabling Cluster Autoscaling

  • Configuring Persistent Volume Claims (PVC)

  • Managing Secrets securely

  • TLS Certificates using Cert-Manager

📌 Follow me to stay updated and get notified when the next article is published!

#EKS #Terraform #AWS #DevOps #Kubernetes #InfrastructureAsCode #CloudEngineering #CI_CD #IaC #Observability
