Infrastructure as Code (IaC): Best Practices for Cloud Production

Samuel Aniekeme

The Foundation for Scalable, Reliable, and Reproducible Infrastructure

Welcome to the Cloud Production Series! This series dives into the tools, workflows, and practices that define what makes cloud production truly production-grade—scalable, reliable, and maintainable.

In this first post, we’re tackling Infrastructure as Code (IaC). IaC isn’t just a buzzword; it’s the backbone of modern cloud production environments. We’ll explore why IaC is critical, share best practices, highlight tools, and show you how to structure your repositories for scalability.

Whether you're setting up infrastructure for a growing startup or managing an enterprise-scale deployment, this post will help you approach IaC the right way.


What is Infrastructure as Code (IaC)?

Infrastructure as Code means managing and provisioning cloud infrastructure using machine-readable, version-controlled files rather than manual processes. IaC automates the provisioning of infrastructure components like servers, networks, databases, and load balancers. It defines infrastructure declaratively or imperatively using code. By defining infrastructure in code, you gain:

  • Consistency: The same code produces identical infrastructure across environments.

  • Automation: Resources are provisioned programmatically, removing manual errors.

  • Reproducibility: Infrastructure can be rebuilt or rolled back on demand.

For example, instead of manually creating an S3 bucket, you define it in code:

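resource "aws_s3_bucket" "example" {
  bucket = "production-bucket"
}

# With AWS provider v4 and later, versioning is configured in its own resource;
# the inline versioning block on aws_s3_bucket is deprecated.
resource "aws_s3_bucket_versioning" "example" {
  bucket = aws_s3_bucket.example.id

  versioning_configuration {
    status = "Enabled"
  }
}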

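Because the definition lives in code, the bucket can be reviewed, versioned, and recreated like any other change, which is what makes this approach scalable and reliable in cloud production.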


Why is IaC Essential in Cloud Production?

In production environments, infrastructure isn’t static. Teams need to scale resources, roll out updates, and recover from failures seamlessly. IaC enables this through:

  1. Speed and Automation: Deploy changes quickly while avoiding manual errors.

  2. Scalability: Dynamically provision resources to meet workload demands.

  3. Consistency Across Environments: Eliminate configuration drift.

  4. Auditability: All infrastructure changes are tracked in version control.

  5. Disaster Recovery: Rebuild entire environments from scratch when needed.


Best Practices for Infrastructure as Code

Here are actionable best practices to adopt IaC effectively in cloud production:

  1. Adopt a Declarative Approach:
    Use tools like Terraform or CloudFormation to define what you want (desired state) instead of scripting how to achieve it. This reduces complexity.

  2. Version Control Your Infrastructure:

    • Use Git to store all IaC files.

    • Implement code reviews via pull requests to ensure quality and security.

    • Adopt branching strategies (e.g., main for production, develop for testing).

  3. Modularize Code for Reusability:
    Break IaC into reusable modules for components like networking, compute, and storage.

  4. Separate Environments Clearly:
    Use environment-specific configurations for dev, staging, testing and production.

  5. Automate Testing and Validation:

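    • Static Analysis: terraform validate catches syntax and internal consistency errors, while linters and scanners such as TFLint, tfsec, or Checkov flag security issues and misconfigurations.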

    • Integration Tests: Use tools like Terratest to verify infrastructure after changes.

  6. Manage State Files Securely:
    For tools like Terraform, store state files in remote backends (e.g., AWS S3 with DynamoDB locking) to ensure consistency and avoid conflicts in multi-team setups.

  7. Secure Secrets Management:
    Never hardcode secrets. Use:

    • AWS Secrets Manager

    • HashiCorp Vault

    • Encrypted files (e.g., SOPS)
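
To make practice 6 concrete, here is a minimal sketch of a remote backend using S3 with DynamoDB locking. The bucket name, state key, and table name are hypothetical placeholders, and both the bucket and the table (with a string partition key named LockID) are assumed to exist before terraform init is run:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"    # hypothetical bucket name
    key            = "production/terraform.tfstate" # hypothetical state path
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"        # hypothetical lock table
    encrypt        = true                           # encrypt the state object at rest
  }
}
```

Backend blocks cannot reference variables, so these values are written literally or supplied at init time.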


Key Tools for Infrastructure as Code

Here’s a breakdown of tools that play a significant role in cloud production IaC workflows, with relevant documentation for further reading:

| Tool | Best For | Language | Strengths | Documentation |
| --- | --- | --- | --- | --- |
| Terraform | Multi-cloud environments | HCL | Cloud-agnostic, modular, scalable | Terraform Docs |
| AWS CloudFormation | AWS-only infrastructure | JSON/YAML | Native AWS integrations | CloudFormation Docs |
| Pulumi | Code-centric IaC | Python, TypeScript | Leverage general-purpose coding | Pulumi Docs |
| Ansible | Configuration Management | YAML (declarative) | Post-deployment server configs | Ansible Docs |

Step by Step Process Example

This section walks through an example of a standard IaC process in a cloud production setup, using AWS and Terraform.

my-terraform-project/
├── main.tf
├── provider.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── security/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── compute/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── load_balancing/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf

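This setup provisions:

  • An AWS VPC with two public subnets

  • An Internet Gateway and Route Table

  • A Security Group for HTTP and SSH access

  • An Auto Scaling Group of EC2 instances running the Apache web server

  • An Application Load Balancer in front of the instances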

Prerequisites

  1. Install Terraform from the Terraform download page.

  2. Install the AWS CLI from the AWS website.

  3. Configure your AWS credentials by running aws configure.


Building a Scalable AWS Infrastructure with Terraform Modules

In this guide, we’ll walk through how to build a scalable AWS infrastructure using Terraform modules. Modules allow us to organize our code into reusable components, making it easier to manage and maintain. We’ll create a VPC, subnets, an Auto Scaling Group (ASG), an Application Load Balancer (ALB), and more.


1. provider.tf (AWS Provider Configuration)

This file configures the AWS provider for Terraform. It specifies the required provider version and sets the AWS region. This is the foundation for all AWS resources managed by Terraform.

Create a provider.tf file to configure the AWS provider.

touch provider.tf

Edit provider.tf:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  required_version = ">= 1.3.0"
}

provider "aws" {
  region = var.region
}
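
As an optional variant of the provider block above (a replacement for it, not an addition), the provider can also apply a common set of tags to every resource that supports them. This is a sketch only; the tag keys and values are illustrative assumptions and are not used elsewhere in this walkthrough:

```hcl
provider "aws" {
  region = var.region

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      Project   = "my-terraform-project" # hypothetical tag value
      ManagedBy = "Terraform"
    }
  }
}
```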

2. variables.tf (Define Configurable Variables)

This file defines input variables for the Terraform configuration. It includes parameters like the AWS region, instance type, and key pair name. These variables make the configuration flexible and reusable.

Create a variables.tf file to define input variables for the project.

touch variables.tf

Edit variables.tf:

variable "region" {
  description = "AWS region"
  default     = "us-east-1"
}

variable "instance_type" {
  description = "EC2 instance type"
  default     = "t2.micro"
}

variable "key_name" {
  description = "AWS Key Pair Name for SSH access"
  type        = string
}

variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  default     = "10.0.0.0/16"
}

variable "public_subnets" {
  description = "Map of public subnets"
  type        = map(object({
    cidr_block = string
  }))
  default = {
    "us-east-1a" = { cidr_block = "10.0.1.0/24" }
    "us-east-1b" = { cidr_block = "10.0.2.0/24" }
  }
}

variable "desired_capacity" {
  description = "Desired capacity for the Auto Scaling Group"
  default     = 2
}

variable "min_size" {
  description = "Minimum size for the Auto Scaling Group"
  default     = 2
}

variable "max_size" {
  description = "Maximum size for the Auto Scaling Group"
  default     = 5
}
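
Variables can also reject bad input before any resource is planned. The sketch below is an alternative definition of vpc_cidr_block (it would replace the one above, not sit beside it) that adds a validation block. The check used here is one possible choice, not part of the original configuration:

```hcl
variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on a malformed CIDR; can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr_block, 0))
    error_message = "The vpc_cidr_block value must be a valid IPv4 CIDR block, such as 10.0.0.0/16."
  }
}
```

With this in place, terraform plan stops with the error message if someone passes a value like "10.0.0.0/99".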

3. main.tf (Root Module Configuration)

This is the root module that calls the child modules (networking, security, compute, and load_balancing). It passes the necessary inputs to each module and ties everything together to create the infrastructure.

The main.tf file will now call the child modules to create the infrastructure.

touch main.tf

Edit main.tf:

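# The AWS provider is already configured in provider.tf. Declaring it again
# here would fail with a duplicate provider configuration error.
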
module "networking" {
  source = "./modules/networking"

  vpc_cidr_block    = var.vpc_cidr_block
  vpc_name          = "MyVPC"
  public_subnets    = var.public_subnets
  igw_name          = "MyInternetGateway"
  public_rt_name    = "PublicRouteTable"
}

module "security" {
  source = "./modules/security"

  vpc_id  = module.networking.vpc_id
  sg_name = "WebSecurityGroup"
}

module "compute" {
  source = "./modules/compute"

  launch_template_name = "web-server-template"
  instance_type        = var.instance_type
  key_name            = var.key_name
  sg_id               = module.security.sg_id
  user_data           = <<-EOF
    #!/bin/bash
    apt update -y
    apt install -y apache2
    systemctl start apache2
    systemctl enable apache2
    echo "Hello, Auto Scaling!" > /var/www/html/index.html
  EOF
  instance_name       = "WebServer"
  public_subnet_ids   = module.networking.public_subnet_ids
  desired_capacity    = var.desired_capacity
  min_size           = var.min_size
  max_size           = var.max_size
  asg_name           = "AutoScaledWebServer"
}

module "load_balancing" {
  source = "./modules/load_balancing"

  alb_name          = "web-load-balancer"
  sg_id             = module.security.sg_id
  public_subnet_ids = module.networking.public_subnet_ids
  vpc_id            = module.networking.vpc_id
  target_group_name = "web-target-group"
  asg_id            = module.compute.asg_id
}

# The alb_dns_name output is declared in outputs.tf. Declaring it here as well
# would fail with a duplicate output definition error.

4. outputs.tf (Retrieve Important Information)

This file defines outputs that provide useful information after Terraform applies the configuration. For example, it outputs the DNS name of the Application Load Balancer (ALB).

Create an outputs.tf file to output key details like the ALB DNS name.

touch outputs.tf

Edit outputs.tf:

output "alb_dns_name" {
  description = "DNS Name of the Application Load Balancer"
  value       = module.load_balancing.alb_dns_name
}

5. terraform.tfvars (Store Variable Values)

This file provides values for the variables defined in variables.tf. It allows you to customize the configuration without modifying the main code.

Create a terraform.tfvars file to provide values for the variables.

touch terraform.tfvars

Edit terraform.tfvars:

region          = "us-east-1"
instance_type   = "t2.micro"
key_name        = "my-key-pair"  # Replace with your actual key pair name
vpc_cidr_block  = "10.0.0.0/16"
desired_capacity = 1
min_size        = 1
max_size        = 5
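
To keep environments separate, as recommended in the best practices, one common pattern is a variable file per environment. The sketch below assumes a hypothetical prod.tfvars file; the file name, key pair name, and sizes are illustrative and not part of the original project:

```hcl
# prod.tfvars (hypothetical file for the production environment)
region           = "us-east-1"
instance_type    = "t2.micro"
key_name         = "prod-key-pair" # hypothetical key pair name
vpc_cidr_block   = "10.0.0.0/16"
desired_capacity = 2
min_size         = 2
max_size         = 5
```

It would be selected with terraform apply -var-file="prod.tfvars". Values passed this way override those in terraform.tfvars, which Terraform loads automatically. Each environment also needs its own state, for example a separate backend key, so that applying one environment never touches another.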

6. Modules

Now, let’s create the child modules for each component of the infrastructure.

Folder Structure

my-terraform-project/
├── main.tf
├── provider.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── security/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── compute/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   ├── load_balancing/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf

a. Networking Module

modules/networking/main.tf

This file defines the networking resources, including the VPC, subnets, internet gateway, and route tables. It creates the foundational network infrastructure.

resource "aws_vpc" "my_vpc" {
  cidr_block = var.vpc_cidr_block

  tags = {
    Name = var.vpc_name
  }
}

resource "aws_subnet" "public_subnets" {
  for_each = var.public_subnets

  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = each.value.cidr_block
  availability_zone       = each.key
  map_public_ip_on_launch = true

  tags = {
    Name = "PublicSubnet-${each.key}"
  }
}

resource "aws_internet_gateway" "my_igw" {
  vpc_id = aws_vpc.my_vpc.id

  tags = {
    Name = var.igw_name
  }
}

resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.my_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.my_igw.id
  }

  tags = {
    Name = var.public_rt_name
  }
}

resource "aws_route_table_association" "public_assoc" {
  for_each = aws_subnet.public_subnets

  subnet_id      = each.value.id
  route_table_id = aws_route_table.public_rt.id
}

modules/networking/variables.tf

This file defines the input variables for the networking module, such as the VPC CIDR block and subnet configurations. It ensures the module is reusable and configurable.

variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "vpc_name" {
  description = "Name tag for the VPC"
  type        = string
}

variable "public_subnets" {
  description = "Map of public subnets"
  type        = map(object({
    cidr_block = string
  }))
}

variable "igw_name" {
  description = "Name tag for the Internet Gateway"
  type        = string
}

variable "public_rt_name" {
  description = "Name tag for the Public Route Table"
  type        = string
}

modules/networking/outputs.tf

This file outputs key networking details, such as the VPC ID and subnet IDs. These outputs are used by other modules to reference the networking resources.

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.my_vpc.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = [for subnet in aws_subnet.public_subnets : subnet.id]
}

b. Security Module

modules/security/main.tf

This file defines the security group for the EC2 instances and ALB. It configures inbound and outbound traffic rules to secure the infrastructure.

resource "aws_security_group" "web_sg" {
  vpc_id = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

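  # SSH is open to the whole internet here for simplicity.
  # In production, restrict cidr_blocks to a trusted range such as a VPN or bastion.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }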

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = var.sg_name
  }
}
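
The module above shares one security group between the ALB and the instances. A tighter alternative, sketched here under assumptions, uses two groups so that instances accept HTTP only from the ALB. The resource names alb_sg and instance_sg and the admin_cidr_blocks variable are hypothetical, and the module outputs and root module wiring would need to change to match:

```hcl
variable "admin_cidr_blocks" {
  description = "Trusted CIDR ranges allowed to SSH (hypothetical variable)"
  type        = list(string)
}

resource "aws_security_group" "alb_sg" {
  vpc_id = var.vpc_id

  # Public HTTP traffic reaches only the load balancer.
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "instance_sg" {
  vpc_id = var.vpc_id

  # Instances accept HTTP only from the ALB's security group.
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  # SSH is limited to the trusted ranges supplied by the caller.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = var.admin_cidr_blocks
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```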

modules/security/variables.tf

This file defines the input variables for the security module, such as the VPC ID and security group name. It ensures the module is flexible and reusable.

variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}

variable "sg_name" {
  description = "Name tag for the Security Group"
  type        = string
}

modules/security/outputs.tf

This file outputs the security group ID, which is used by other modules to associate the security group with resources like EC2 instances and the ALB.

output "sg_id" {
  description = "ID of the Security Group"
  value       = aws_security_group.web_sg.id
}

c. Compute Module

modules/compute/main.tf

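This file defines the compute resources, including the launch template and Auto Scaling Group (ASG). The ASG keeps the number of EC2 instances between min_size and max_size and replaces instances that fail EC2 health checks. No scaling policy is defined in this module, so the group stays at desired_capacity unless a policy is added.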

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_launch_template" "web_template" {
  name          = var.launch_template_name
  image_id      = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  key_name      = var.key_name

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [var.sg_id]
  }

  user_data = base64encode(var.user_data)

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = var.instance_name
    }
  }
}

resource "aws_autoscaling_group" "web_asg" {
  vpc_zone_identifier = var.public_subnet_ids
  desired_capacity    = var.desired_capacity
  min_size           = var.min_size
  max_size           = var.max_size

  launch_template {
    id      = aws_launch_template.web_template.id
    version = "$Latest"
  }

  health_check_type         = "EC2"
  health_check_grace_period = 300

  tag {
    key                 = "Name"
    value               = var.asg_name
    propagate_at_launch = true
  }
}
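
To make the group actually scale with load, a scaling policy can be attached to it. The sketch below is one possible addition to this file, using target tracking on average CPU. The policy name and the 50 percent target are assumptions, not values from the original setup:

```hcl
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking" # hypothetical policy name
  autoscaling_group_name = aws_autoscaling_group.web_asg.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Add or remove instances to hold average CPU near this percentage.
    target_value = 50.0
  }
}
```

Once a policy changes the instance count, the next terraform apply may try to reset desired_capacity to the configured value. A lifecycle block with ignore_changes = [desired_capacity] on the ASG is a common way to avoid that.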

modules/compute/variables.tf

This file defines the input variables for the compute module, such as the instance type, key pair name, and user data. It makes the module configurable and reusable.

variable "launch_template_name" {
  description = "Name of the Launch Template"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
}

variable "key_name" {
  description = "AWS Key Pair Name for SSH access"
  type        = string
}

variable "sg_id" {
  description = "ID of the Security Group"
  type        = string
}

variable "user_data" {
  description = "User data script for the EC2 instances"
  type        = string
}

variable "instance_name" {
  description = "Name tag for the EC2 instances"
  type        = string
}

variable "public_subnet_ids" {
  description = "IDs of the public subnets"
  type        = list(string)
}

variable "desired_capacity" {
  description = "Desired capacity for the Auto Scaling Group"
  type        = number
}

variable "min_size" {
  description = "Minimum size for the Auto Scaling Group"
  type        = number
}

variable "max_size" {
  description = "Maximum size for the Auto Scaling Group"
  type        = number
}

variable "asg_name" {
  description = "Name tag for the Auto Scaling Group"
  type        = string
}

modules/compute/outputs.tf

This file outputs the Auto Scaling Group ID, which for an ASG is the same as its name. The load balancing module uses it to attach the ASG to the ALB target group.

output "asg_id" {
  description = "ID of the Auto Scaling Group"
  value       = aws_autoscaling_group.web_asg.id
}

d. Load Balancing Module

modules/load_balancing/main.tf

This file defines the load balancing resources, including the Application Load Balancer (ALB), target group, and listener. It distributes traffic across the EC2 instances.

resource "aws_lb" "web_alb" {
  name               = var.alb_name
  internal           = false
  load_balancer_type = "application"
  security_groups    = [var.sg_id]
  subnets           = var.public_subnet_ids

  enable_deletion_protection = false
}

resource "aws_lb_target_group" "web_tg" {
  name     = var.target_group_name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web_tg.arn
  }
}

resource "aws_autoscaling_attachment" "asg_attachment" {
  autoscaling_group_name = var.asg_id
  lb_target_group_arn   = aws_lb_target_group.web_tg.arn
}
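
The target group above relies on the default health check settings. As a sketch, the same resource can declare them explicitly so that the ALB only forwards traffic to instances that answer on the Apache index page. This would replace the web_tg definition above, and the thresholds shown are illustrative assumptions:

```hcl
resource "aws_lb_target_group" "web_tg" {
  name     = var.target_group_name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/"   # page written by the user_data script
    matcher             = "200" # expected HTTP status code
    interval            = 30    # seconds between checks
    timeout             = 5     # seconds before a check counts as failed
    healthy_threshold   = 2     # consecutive successes to mark healthy
    unhealthy_threshold = 3     # consecutive failures to mark unhealthy
  }
}
```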

modules/load_balancing/variables.tf

This file defines the input variables for the load balancing module, such as the ALB name, security group ID, and subnet IDs. It ensures the module is reusable.

variable "alb_name" {
  description = "Name of the Application Load Balancer"
  type        = string
}

variable "sg_id" {
  description = "ID of the Security Group"
  type        = string
}

variable "public_subnet_ids" {
  description = "IDs of the public subnets"
  type        = list(string)
}

variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}

variable "target_group_name" {
  description = "Name of the Target Group"
  type        = string
}

variable "asg_id" {
  description = "ID of the Auto Scaling Group"
  type        = string
}

modules/load_balancing/outputs.tf

This file outputs the ALB DNS name, which is used to access the application after deployment. It provides a convenient way to retrieve this information.

output "alb_dns_name" {
  description = "DNS Name of the Application Load Balancer"
  value       = aws_lb.web_alb.dns_name
}

7. Deploy the Infrastructure

  1. Initialize Terraform:

     terraform init
    
  2. Plan the Deployment:

     terraform plan
    
  3. Apply the Configuration:

     terraform apply
    
  4. Access the ALB: Open the ALB DNS name that Terraform prints after the apply (or run terraform output alb_dns_name) in a browser to reach your application.

This modular approach makes your Terraform code reusable, scalable, and easy to maintain.


Cleaning Up AWS Resources

Once you're done, destroy everything:

terraform destroy

Conclusion

Infrastructure as Code (IaC) is indispensable for cloud production. By automating infrastructure, enforcing best practices, and modularizing your code, you can build systems that scale reliably and minimize downtime.

In the next post of the Cloud Production Series, we’ll tackle Configuration Management in Cloud Production, exploring best practices, tools, and strategies for managing configurations in cloud production environments effectively.

Have thoughts or questions on implementing IaC? Drop a comment below and let’s discuss!

