Infrastructure as Code (IaC): Best Practices for Cloud Production

The Foundation for Scalable, Reliable, and Reproducible Infrastructure
Welcome to the Cloud Production Series! This series dives into the tools, workflows, and practices that define what makes cloud production truly production-grade—scalable, reliable, and maintainable.
In this first post, we’re tackling Infrastructure as Code (IaC). IaC isn’t just a buzzword; it’s the backbone of modern cloud production environments. We’ll explore why IaC is critical, share best practices, highlight tools, and show you how to structure your repositories for scalability.
Whether you're setting up infrastructure for a growing startup or managing an enterprise-scale deployment, this post will help you approach IaC the right way.
What is Infrastructure as Code (IaC)?
Infrastructure as Code means managing and provisioning cloud infrastructure using machine-readable, version-controlled files rather than manual processes. IaC automates the provisioning of infrastructure components like servers, networks, databases, and load balancers. It defines infrastructure declaratively or imperatively using code. By defining infrastructure in code, you gain:
Consistency: The same code produces identical infrastructure across environments.
Automation: Resources are provisioned programmatically, removing manual errors.
Reproducibility: Infrastructure can be rebuilt or rolled back on demand.
For example, instead of manually creating an S3 bucket, you define it in code:
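resource "aws_s3_bucket" "example" {
  bucket = "production-bucket"
}

# In AWS provider v4 and later, versioning is managed as a separate resource
resource "aws_s3_bucket_versioning" "example" {
  bucket = aws_s3_bucket.example.id
  versioning_configuration {
    status = "Enabled"
  }
}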
This approach ensures scalability and reliability, both essential for cloud production.
Why is IaC Essential in Cloud Production?
In production environments, infrastructure isn’t static. Teams need to scale resources, roll out updates, and recover from failures seamlessly. IaC enables this through:
Speed and Automation: Deploy changes quickly while avoiding manual errors.
Scalability: Dynamically provision resources to meet workload demands.
Consistency Across Environments: Eliminate configuration drift.
Auditability: All infrastructure changes are tracked in version control.
Disaster Recovery: Rebuild entire environments from scratch when needed.
Best Practices for Infrastructure as Code
Here are actionable best practices to adopt IaC effectively in cloud production:
Adopt a Declarative Approach:
Use tools like Terraform or CloudFormation to define what you want (the desired state) instead of scripting how to achieve it. This reduces complexity.
Version Control Your Infrastructure:
Use Git to store all IaC files.
Implement code reviews via pull requests to ensure quality and security.
Adopt a branching strategy (e.g., main for production, develop for testing).
Modularize Code for Reusability:
Break IaC into reusable modules for components like networking, compute, and storage.
Separate Environments Clearly:
Use environment-specific configurations for dev, staging, testing, and production.
Automate Testing and Validation:
Static Analysis: terraform validate catches syntax errors and internal inconsistencies. Security scanners such as tfsec or Checkov flag insecure settings.
Integration Tests: Use tools like Terratest to verify infrastructure after changes.
Manage State Files Securely:
For tools like Terraform, store state files in remote backends (e.g., AWS S3 with DynamoDB locking) to ensure consistency and avoid conflicts in multi-team setups.
Secure Secrets Management:
Never hardcode secrets. Use:
AWS Secrets Manager
HashiCorp Vault
Encrypted files (e.g., SOPS)
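To make the last two practices concrete, here is a minimal sketch of an S3 remote backend with DynamoDB locking, plus a secret read from AWS Secrets Manager instead of being hardcoded. The bucket name, table name, state key, and secret ID are hypothetical placeholders, and the sketch assumes the bucket and lock table already exist before terraform init is run.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"      # hypothetical bucket, must already exist
    key            = "production/terraform.tfstate" # hypothetical path of the state object
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"      # hypothetical table with a "LockID" string hash key
    encrypt        = true                           # encrypt the state object at rest
  }
}

# Read the secret at plan/apply time rather than committing it to Git
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/db/password" # hypothetical secret ID
}

locals {
  db_password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

Values read through a data source are still written to the state file, which is one more reason to keep the state in an encrypted, access-controlled backend.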
Key Tools for Infrastructure as Code
Here’s a breakdown of tools that play a significant role in cloud production IaC workflows, with relevant documentation for further reading:
| Tool | Best For | Language | Strengths | Documentation |
| --- | --- | --- | --- | --- |
| Terraform | Multi-cloud environments | HCL | Cloud-agnostic, modular, scalable | Terraform Docs |
| AWS CloudFormation | AWS-only infrastructure | JSON/YAML | Native AWS integrations | CloudFormation Docs |
| Pulumi | Code-centric IaC | Python, TypeScript | Leverage general-purpose coding | Pulumi Docs |
| Ansible | Configuration Management | YAML (declarative) | Post-deployment server configs | Ansible Docs |
Step-by-Step Process Example
This section walks through a standard IaC process in a cloud production setup using AWS and Terraform. The project is laid out as follows:
my-terraform-project/
├── provider.tf
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
└── modules/
    ├── networking/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── security/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── compute/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── load_balancing/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
This setup provisions:
AWS VPC with two public subnets
Internet Gateway & Route Table
Security Group for HTTP & SSH access
Auto Scaling Group of EC2 instances running the Apache web server
Application Load Balancer in front of the instances
Prerequisites
Install Terraform from the Terraform download page.
Install the AWS CLI from the AWS website.
Configure your AWS credentials by running aws configure.
Building a Scalable AWS Infrastructure with Terraform Modules
In this guide, we’ll walk through how to build a scalable AWS infrastructure using Terraform modules. Modules allow us to organize our code into reusable components, making it easier to manage and maintain. We’ll create a VPC, subnets, an Auto Scaling Group (ASG), an Application Load Balancer (ALB), and more.
1. provider.tf (AWS Provider Configuration)
This file configures the AWS provider for Terraform. It specifies the required provider version and sets the AWS region. This is the foundation for all AWS resources managed by Terraform.
Create a provider.tf file to configure the AWS provider.
touch provider.tf
Edit provider.tf:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.3.0"
}

provider "aws" {
  region = var.region
}
2. variables.tf (Define Configurable Variables)
This file defines input variables for the Terraform configuration. It includes parameters like the AWS region, instance type, and key pair name. These variables make the configuration flexible and reusable.
Create a variables.tf file to define input variables for the project.
touch variables.tf
Edit variables.tf:
variable "region" {
  description = "AWS region"
  default     = "us-east-1"
}

variable "instance_type" {
  description = "EC2 instance type"
  default     = "t2.micro"
}

variable "key_name" {
  description = "AWS Key Pair Name for SSH access"
  type        = string
}

variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  default     = "10.0.0.0/16"
}

variable "public_subnets" {
  description = "Map of public subnets"
  type = map(object({
    cidr_block = string
  }))
  default = {
    "us-east-1a" = { cidr_block = "10.0.1.0/24" }
    "us-east-1b" = { cidr_block = "10.0.2.0/24" }
  }
}

variable "desired_capacity" {
  description = "Desired capacity for the Auto Scaling Group"
  default     = 2
}

variable "min_size" {
  description = "Minimum size for the Auto Scaling Group"
  default     = 2
}

variable "max_size" {
  description = "Maximum size for the Auto Scaling Group"
  default     = 5
}
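Input variables can also reject bad values before any resource is planned. As a hedged sketch (not part of the original configuration), the vpc_cidr_block declaration above could be replaced with a version that adds an explicit type and a validation block. The check relies on the built-in cidrhost function failing for malformed CIDR strings.

```hcl
variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # can() turns an evaluation error from cidrhost() into false
    condition     = can(cidrhost(var.vpc_cidr_block, 0))
    error_message = "The vpc_cidr_block value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}
```

With this in place, a typo such as "10.0.0.0/33" should fail during terraform plan with the message above instead of surfacing later as an AWS API error.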
3. main.tf (Root Module Configuration)
This is the root module that calls the child modules (networking, security, compute, and load_balancing). It passes the necessary inputs to each module and ties everything together to create the infrastructure.
Create a main.tf file that calls the child modules.
touch main.tf
Edit main.tf:
# The AWS provider is configured in provider.tf, so it is not repeated here.

module "networking" {
  source         = "./modules/networking"
  vpc_cidr_block = var.vpc_cidr_block
  vpc_name       = "MyVPC"
  public_subnets = var.public_subnets
  igw_name       = "MyInternetGateway"
  public_rt_name = "PublicRouteTable"
}

module "security" {
  source  = "./modules/security"
  vpc_id  = module.networking.vpc_id
  sg_name = "WebSecurityGroup"
}

module "compute" {
  source               = "./modules/compute"
  launch_template_name = "web-server-template"
  instance_type        = var.instance_type
  key_name             = var.key_name
  sg_id                = module.security.sg_id
  # Bootstrap script: installs and starts Apache on first boot
  user_data            = <<-EOF
    #!/bin/bash
    apt update -y
    apt install -y apache2
    systemctl start apache2
    systemctl enable apache2
    echo "Hello, Auto Scaling!" > /var/www/html/index.html
  EOF
  instance_name        = "WebServer"
  public_subnet_ids    = module.networking.public_subnet_ids
  desired_capacity     = var.desired_capacity
  min_size             = var.min_size
  max_size             = var.max_size
  asg_name             = "AutoScaledWebServer"
}

module "load_balancing" {
  source            = "./modules/load_balancing"
  alb_name          = "web-load-balancer"
  sg_id             = module.security.sg_id
  public_subnet_ids = module.networking.public_subnet_ids
  vpc_id            = module.networking.vpc_id
  target_group_name = "web-target-group"
  asg_id            = module.compute.asg_id
}
4. outputs.tf (Retrieve Important Information)
This file defines outputs that provide useful information after Terraform applies the configuration. For example, it outputs the DNS name of the Application Load Balancer (ALB).
Create an outputs.tf file to output key details like the ALB DNS name.
touch outputs.tf
Edit outputs.tf:
output "alb_dns_name" {
  description = "DNS Name of the Application Load Balancer"
  value       = module.load_balancing.alb_dns_name
}
5. terraform.tfvars (Store Variable Values)
This file provides values for the variables defined in variables.tf. It allows you to customize the configuration without modifying the main code.
Create a terraform.tfvars file to provide values for the variables.
touch terraform.tfvars
Edit terraform.tfvars:
region           = "us-east-1"
instance_type    = "t2.micro"
key_name         = "my-key-pair" # Replace with your actual key pair name
vpc_cidr_block   = "10.0.0.0/16"
desired_capacity = 1
min_size         = 1
max_size         = 5
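The same mechanism supports the "separate environments" practice: each environment can have its own variable file. The sketch below assumes a hypothetical production.tfvars file; the file name and every value in it are illustrative and do not come from the original setup.

```hcl
# production.tfvars (hypothetical file; all values are illustrative)
region           = "us-east-1"
instance_type    = "t3.small"
key_name         = "prod-key-pair" # hypothetical key pair name
vpc_cidr_block   = "10.0.0.0/16"
desired_capacity = 3
min_size         = 2
max_size         = 10
```

It would be selected explicitly with terraform plan -var-file="production.tfvars". Values passed through -var-file take precedence over the automatically loaded terraform.tfvars, so the default file can keep development-sized settings.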
6. Modules
Now, let’s create the child modules for each component of the infrastructure.
a. Networking Module
modules/networking/main.tf
This file defines the networking resources, including the VPC, subnets, internet gateway, and route tables. It creates the foundational network infrastructure.
resource "aws_vpc" "my_vpc" {
  cidr_block = var.vpc_cidr_block
  tags = {
    Name = var.vpc_name
  }
}

resource "aws_subnet" "public_subnets" {
  for_each                = var.public_subnets
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = each.value.cidr_block
  availability_zone       = each.key
  map_public_ip_on_launch = true
  tags = {
    Name = "PublicSubnet-${each.key}"
  }
}

resource "aws_internet_gateway" "my_igw" {
  vpc_id = aws_vpc.my_vpc.id
  tags = {
    Name = var.igw_name
  }
}

resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.my_vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.my_igw.id
  }
  tags = {
    Name = var.public_rt_name
  }
}

resource "aws_route_table_association" "public_assoc" {
  for_each       = aws_subnet.public_subnets
  subnet_id      = each.value.id
  route_table_id = aws_route_table.public_rt.id
}
modules/networking/variables.tf
This file defines the input variables for the networking module, such as the VPC CIDR block and subnet configurations. It ensures the module is reusable and configurable.
variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "vpc_name" {
  description = "Name tag for the VPC"
  type        = string
}

variable "public_subnets" {
  description = "Map of public subnets"
  type = map(object({
    cidr_block = string
  }))
}

variable "igw_name" {
  description = "Name tag for the Internet Gateway"
  type        = string
}

variable "public_rt_name" {
  description = "Name tag for the Public Route Table"
  type        = string
}
modules/networking/outputs.tf
This file outputs key networking details, such as the VPC ID and subnet IDs. These outputs are used by other modules to reference the networking resources.
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.my_vpc.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = [for subnet in aws_subnet.public_subnets : subnet.id]
}
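Because the subnets are created with for_each keyed by availability zone, the for expression in public_subnet_ids flattens that map into a plain list. If a caller needs to know which subnet lives in which zone, a map-valued output is one option. The output name below is hypothetical and is not used elsewhere in this guide.

```hcl
# Optional addition to modules/networking/outputs.tf (hypothetical output name)
output "public_subnet_ids_by_az" {
  description = "Public subnet IDs keyed by availability zone"
  # The for_each keys of aws_subnet.public_subnets are the availability zone names
  value = { for az, subnet in aws_subnet.public_subnets : az => subnet.id }
}
```

With the default public_subnets variable, this would yield a map with the keys "us-east-1a" and "us-east-1b".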
b. Security Module
modules/security/main.tf
This file defines the security group for the EC2 instances and ALB. It configures inbound and outbound traffic rules to secure the infrastructure.
resource "aws_security_group" "web_sg" {
  vpc_id = var.vpc_id

  # Allow HTTP from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow SSH from anywhere; restrict this to a trusted CIDR range in production
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = var.sg_name
  }
}
modules/security/variables.tf
This file defines the input variables for the security module, such as the VPC ID and security group name. It ensures the module is flexible and reusable.
variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}

variable "sg_name" {
  description = "Name tag for the Security Group"
  type        = string
}
modules/security/outputs.tf
This file outputs the security group ID, which is used by other modules to associate the security group with resources like EC2 instances and the ALB.
output "sg_id" {
  description = "ID of the Security Group"
  value       = aws_security_group.web_sg.id
}
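The module above shares one security group between the ALB and the instances. A common hardening step, shown here only as a sketch of an alternative and not as part of the original setup, is to split it in two so that instances accept HTTP only from the load balancer. The resource names alb_sg and instance_sg are hypothetical, and adopting this would mean exposing two outputs and updating the wiring in main.tf.

```hcl
# Security group for the load balancer: HTTP open to the internet
resource "aws_security_group" "alb_sg" {
  vpc_id = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Security group for the instances: HTTP allowed only from the ALB's group
resource "aws_security_group" "instance_sg" {
  vpc_id = var.vpc_id

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

This sketch omits SSH entirely; if shell access is needed, a separate rule limited to a trusted CIDR range could be added to instance_sg.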
c. Compute Module
modules/compute/main.tf
This file defines the compute resources, including the launch template and Auto Scaling Group (ASG). The ASG keeps the number of running EC2 instances between the configured minimum and maximum sizes.
# Look up the most recent Ubuntu 20.04 AMI published by Canonical
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_launch_template" "web_template" {
  name          = var.launch_template_name
  image_id      = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  key_name      = var.key_name

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = [var.sg_id]
  }

  # Launch templates expect user data to be base64-encoded
  user_data = base64encode(var.user_data)

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = var.instance_name
    }
  }
}

resource "aws_autoscaling_group" "web_asg" {
  vpc_zone_identifier = var.public_subnet_ids
  desired_capacity    = var.desired_capacity
  min_size            = var.min_size
  max_size            = var.max_size

  launch_template {
    id      = aws_launch_template.web_template.id
    version = "$Latest"
  }

  health_check_type         = "EC2"
  health_check_grace_period = 300

  tag {
    key                 = "Name"
    value               = var.asg_name
    propagate_at_launch = true
  }
}
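As written, the Auto Scaling Group holds a fixed desired capacity, because no scaling policy is attached to it. One way to make it react to load is a target tracking policy. The following is a sketch under assumptions: the policy name and the 50% CPU target are illustrative values, not settings from the original configuration.

```hcl
# Optional addition to modules/compute/main.tf
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking" # hypothetical policy name
  autoscaling_group_name = aws_autoscaling_group.web_asg.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      # Average CPU utilization across all instances in the group
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0 # illustrative target, in percent
  }
}
```

Once a policy adjusts the group, Terraform would try to reset desired_capacity on the next apply. A lifecycle block with ignore_changes = [desired_capacity] on the Auto Scaling Group is a commonly used way to avoid that.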
modules/compute/variables.tf
This file defines the input variables for the compute module, such as the instance type, key pair name, and user data. It makes the module configurable and reusable.
variable "launch_template_name" {
  description = "Name of the Launch Template"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
}

variable "key_name" {
  description = "AWS Key Pair Name for SSH access"
  type        = string
}

variable "sg_id" {
  description = "ID of the Security Group"
  type        = string
}

variable "user_data" {
  description = "User data script for the EC2 instances"
  type        = string
}

variable "instance_name" {
  description = "Name tag for the EC2 instances"
  type        = string
}

variable "public_subnet_ids" {
  description = "IDs of the public subnets"
  type        = list(string)
}

variable "desired_capacity" {
  description = "Desired capacity for the Auto Scaling Group"
  type        = number
}

variable "min_size" {
  description = "Minimum size for the Auto Scaling Group"
  type        = number
}

variable "max_size" {
  description = "Maximum size for the Auto Scaling Group"
  type        = number
}

variable "asg_name" {
  description = "Name tag for the Auto Scaling Group"
  type        = string
}
modules/compute/outputs.tf
This file outputs the Auto Scaling Group ID, which is used by the load balancing module to attach the ASG to the ALB.
output "asg_id" {
  description = "ID of the Auto Scaling Group"
  value       = aws_autoscaling_group.web_asg.id
}
d. Load Balancing Module
modules/load_balancing/main.tf
This file defines the load balancing resources, including the Application Load Balancer (ALB), target group, and listener. It distributes traffic across the EC2 instances.
resource "aws_lb" "web_alb" {
  name                       = var.alb_name
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [var.sg_id]
  subnets                    = var.public_subnet_ids
  enable_deletion_protection = false
}

resource "aws_lb_target_group" "web_tg" {
  name     = var.target_group_name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web_tg.arn
  }
}

resource "aws_autoscaling_attachment" "asg_attachment" {
  autoscaling_group_name = var.asg_id
  lb_target_group_arn    = aws_lb_target_group.web_tg.arn
}
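The target group above relies on the default health check settings. For production it can help to spell them out. The sketch below is a variant of the web_tg resource (it would replace the block above, not sit beside it), and the path, thresholds, and timings are illustrative assumptions.

```hcl
resource "aws_lb_target_group" "web_tg" {
  name     = var.target_group_name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/"   # Apache serves index.html here
    protocol            = "HTTP"
    matcher             = "200" # HTTP status code that counts as healthy
    interval            = 30    # seconds between checks
    timeout             = 5     # seconds before a check is considered failed
    healthy_threshold   = 2     # consecutive successes to mark healthy
    unhealthy_threshold = 3     # consecutive failures to mark unhealthy
  }
}
```

To have the Auto Scaling Group replace instances that fail these checks, its health_check_type in the compute module could be set to "ELB" instead of "EC2".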
modules/load_balancing/variables.tf
This file defines the input variables for the load balancing module, such as the ALB name, security group ID, and subnet IDs. It ensures the module is reusable.
variable "alb_name" {
  description = "Name of the Application Load Balancer"
  type        = string
}

variable "sg_id" {
  description = "ID of the Security Group"
  type        = string
}

variable "public_subnet_ids" {
  description = "IDs of the public subnets"
  type        = list(string)
}

variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}

variable "target_group_name" {
  description = "Name of the Target Group"
  type        = string
}

variable "asg_id" {
  description = "ID of the Auto Scaling Group"
  type        = string
}
modules/load_balancing/outputs.tf
This file outputs the ALB DNS name, which is used to access the application after deployment. It provides a convenient way to retrieve this information.
output "alb_dns_name" {
  description = "DNS Name of the Application Load Balancer"
  value       = aws_lb.web_alb.dns_name
}
7. Deploy the Infrastructure
Initialize Terraform:
terraform init
Plan the Deployment:
terraform plan
Apply the Configuration:
terraform apply
Access the ALB: Use the ALB DNS name output by Terraform to access your application.
This modular approach makes your Terraform code reusable, scalable, and easy to maintain.
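The automated testing practice from earlier can also be expressed in HCL using Terraform's built-in test framework. This is a minimal sketch that assumes Terraform 1.6 or later (newer than the 1.3.0 minimum pinned in provider.tf) and a hypothetical file at tests/capacity.tftest.hcl. Because the plan reads the AMI data source, valid AWS credentials would still be required.

```hcl
# tests/capacity.tftest.hcl (hypothetical file; assumes Terraform 1.6 or later)

# Variable values used for every run block in this file
variables {
  key_name         = "test-key-pair" # hypothetical key pair name
  desired_capacity = 2
  min_size         = 1
  max_size         = 5
}

run "capacity_bounds_are_consistent" {
  # Only build a plan; no resources are created
  command = plan

  assert {
    condition     = var.min_size <= var.desired_capacity && var.desired_capacity <= var.max_size
    error_message = "desired_capacity must lie between min_size and max_size."
  }
}
```

Running terraform test from the project root would execute the run block and report whether the assertion holds.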
Cleaning Up AWS Resources
Once you're done, destroy everything:
terraform destroy
Conclusion
Infrastructure as Code (IaC) is indispensable for cloud production. By automating infrastructure, enforcing best practices, and modularizing your code, you can build systems that scale reliably and minimize downtime.
In the next post of the Cloud Production Series, we’ll tackle Configuration Management in Cloud Production, exploring best practices, tools, and strategies for managing configurations in cloud production environments effectively.
Have thoughts or questions on implementing IaC? Drop a comment below and let’s discuss!
Written by Samuel Aniekeme
