End-to-End Cloud-Native Deployment

Introduction

This technical walkthrough demonstrates how to deploy a secure, scalable microservice on AWS using infrastructure-as-code (Terraform), containerization (Docker), and serverless computing (ECS Fargate). We’ll dissect the architecture, code, and DevOps practices that ensure reliability and security in production environments.


Architecture Overview

Key Components:

  • Public Subnets (2): Host Application Load Balancer (ALB)

  • Private Subnets (2): Run ECS Fargate tasks (isolated)

  • NAT Gateway: Allows outbound traffic from private subnets

  • Security Groups: Layer 4 firewall rules


Docker Implementation

Dockerfile

# Stage 1: Build environment
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install into /root/.local so the dependencies can be copied as one directory
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Runtime environment
FROM python:3.9-slim
WORKDIR /app

# Create non-root user
RUN useradd -m appuser && \
    mkdir -p /home/appuser/.local && \
    chown -R appuser:appuser /app /home/appuser

# Copy dependencies and code
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser src/ .

# Drop privileges (Dockerfile comments must sit on their own line)
USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH

EXPOSE 8000
# Start FastAPI on the exposed port
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Best Practices:

  • Multi-Stage Build: Keeps build-time layers out of the final image (98 MB vs 350 MB)

  • Non-Root User: Limits what an attacker can do after compromising the container

  • Dependency Isolation: Prevents version conflicts


Terraform Infrastructure

1. VPC Module (modules/vpc/main.tf)

# Availability Zones used to spread the subnets
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true  # Required for ECS service discovery
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 2)  # 10.0.2.0/24, 10.0.3.0/24
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

# aws_eip.nat and aws_subnet.public are declared elsewhere in this module
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id  # Single NAT for cost optimization
}

Network Design:

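  • CIDR: 10.0.0.0/16 (65,536 addresses); the private subnets are 10.0.2.0/24 and 10.0.3.0/24

  • AZs: Subnets span two Availability Zones for high availability

  • NAT: One shared gateway for outbound traffic; cheaper, but a single-AZ point of failure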
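
The excerpt refers to aws_subnet.public and aws_eip.nat without showing them. A minimal sketch of the public side and the routing might look like this. The internet gateway, the route table names, and the use of netnums 0 and 1 for the public subnets are assumptions, not taken from the original module.

```hcl
# Assumed: data.aws_availability_zones.available is declared in this module,
# as the private-subnet resource already requires.

# Internet gateway: gives the public subnets a route to the internet.
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Two public subnets (assumed netnums 0 and 1: 10.0.0.0/24 and 10.0.1.0/24).
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
}

# Elastic IP consumed by the NAT gateway.
resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5+; older versions use `vpc = true`
}

# Public route table: default route via the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private route table: default route via the single NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```

Both private subnets share one route table here because there is only one NAT gateway. A per-AZ NAT design would need one private route table per zone.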

2. ECS Cluster (modules/ecs/main.tf)

resource "aws_ecs_cluster" "main" {
  name = "${var.env}-cluster"
  setting {
    name  = "containerInsights"
    value = "enabled"  # CloudWatch metrics
  }
}

resource "aws_ecs_task_definition" "app" {
  family                   = "${var.env}-task"
  requires_compatibilities = ["FARGATE"]  # Required to run the task on Fargate
  cpu                      = 256   # CPU units (256 = 0.25 vCPU)
  memory                   = 512   # In MiB
  network_mode             = "awsvpc"
  execution_role_arn       = aws_iam_role.ecs_exec.arn

  container_definitions = jsonencode([{
    name      = "app",
    image     = var.container_image,
    essential = true,
    portMappings = [{
      containerPort = 8000,
      hostPort      = 8000  # Must equal containerPort in awsvpc mode
    }],
    logConfiguration = {
      logDriver = "awslogs",
      options = {
        "awslogs-group"         = "/ecs/${var.env}-task",
        "awslogs-region"        = var.region,
        "awslogs-stream-prefix" = "app"  # Required on Fargate
      }
    }
  }])
}

Fargate Configuration:

  • vCPU/Memory: 0.25 vCPU and 512 MiB (0.5 GB), sized to the task's requirements

  • Networking: awsvpc mode gives each task its own ENI

  • Logging: CloudWatch integration
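
The awslogs driver does not create its log group unless the awslogs-create-group option is set, so the group named in the task definition has to exist before the first task starts. A minimal sketch, assuming the group is managed in the same module; the resource name and the 14-day retention are illustrative choices, not values from the original.

```hcl
# Log group that the awslogs driver writes to. The name must match the
# "awslogs-group" option in the task definition.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${var.env}-task"
  retention_in_days = 14 # assumed retention; pick what your policy requires
}
```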

3. Load Balancer (modules/ecs/alb.tf)

resource "aws_lb" "main" {
  name               = "${var.env}-alb"
  subnets            = var.public_subnet_ids
  security_groups    = [aws_security_group.alb.id]
  internal           = false  # Internet-facing
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}

Traffic Flow:

  1. ALB receives HTTP traffic on port 80

  2. Routes to target group on port 8000

  3. Target group health-checks the /health endpoint on each task
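
The listener forwards to aws_lb_target_group.main, but neither the target group nor the service that registers tasks with it appears in the excerpts. The flow above could be wired up roughly as follows. var.vpc_id, var.private_subnet_ids, the resource names, the health check thresholds, and the desired count of 2 are assumptions for illustration.

```hcl
# Target group for Fargate tasks. awsvpc tasks register by IP address,
# so target_type must be "ip". var.vpc_id is an assumed module input.
resource "aws_lb_target_group" "main" {
  name        = "${var.env}-tg"
  port        = 8000
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path                = "/health"
    matcher             = "200"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

# Service that keeps tasks running in the private subnets and registers
# them with the target group. var.private_subnet_ids is an assumed input.
resource "aws_ecs_service" "app" {
  name            = "${var.env}-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.main.arn
    container_name   = "app" # must match the container name in the task definition
    container_port   = 8000
  }

  # The target group must be attached to a listener before the service starts.
  depends_on = [aws_lb_listener.http]
}
```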


Security Implementation

1. IAM Roles

resource "aws_iam_role" "ecs_exec" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = {
        Service = "ecs-tasks.amazonaws.com"  # Only ECS tasks may assume this role
      }
    }]
  })
}

Policy Restrictions:

  • ECS Tasks: The role grants no permissions to modify infrastructure

  • Secrets: Can be pulled from AWS Secrets Manager (optional)
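
The role above only defines who may assume it. To pull images and write logs, the execution role also needs permissions attached. A sketch of one way to do that, using the AWS-managed execution policy plus an optional inline policy for a single secret. var.secret_arn and the resource names are hypothetical and not part of the original module.

```hcl
# Managed policy that lets ECS pull images from ECR and write logs to
# CloudWatch on the task's behalf.
resource "aws_iam_role_policy_attachment" "ecs_exec" {
  role       = aws_iam_role.ecs_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Optional: allow the execution role to read one secret so it can be
# injected through the container definition's `secrets` field.
# var.secret_arn is a hypothetical input.
resource "aws_iam_role_policy" "read_secret" {
  name = "${var.env}-read-secret"
  role = aws_iam_role.ecs_exec.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [var.secret_arn]
    }]
  })
}
```

Scoping the Resource to one ARN instead of "*" keeps the grant in line with the least-privilege goal stated above.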

2. Security Groups

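# ALB Security Group
resource "aws_security_group" "alb" {
  name   = "${var.env}-alb-sg"
  vpc_id = var.vpc_id  # Assumed module input; without it the group lands in the default VPC

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # Public access
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS Security Group
resource "aws_security_group" "ecs" {
  name   = "${var.env}-ecs-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]  # Only ALB access
  }

  # Terraform removes AWS's default allow-all egress rule, so outbound access
  # (image pulls and CloudWatch Logs through the NAT gateway) must be declared.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}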

Zero-Trust Model:

  • ALB: Open inbound HTTP (port 80)

  • ECS: Only allows ALB traffic (port 8000)


Deployment Workflow

# 1. Build & Push Docker Image
docker build -t myrepo/simple-time-service:latest .
docker push myrepo/simple-time-service:latest

# 2. Terraform Deployment
terraform init
terraform plan -var="container_image=myrepo/simple-time-service:latest"
terraform apply -var="container_image=myrepo/simple-time-service:latest"

# 3. Verify
ALB_DNS=$(terraform output -raw alb_dns_name)
curl -v "http://${ALB_DNS}/health"  # Expected: {"status": "healthy"}
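
The `terraform output -raw` calls in this walkthrough only work if the outputs alb_dns_name and target_group_arn are exported from the root configuration. A minimal sketch of that wiring, assuming the ALB lives in a module instantiated as "ecs"; the file names and the module name are assumptions.

```hcl
# --- modules/ecs/outputs.tf (assumed file name) ---
output "alb_dns_name" {
  description = "Public DNS name of the ALB"
  value       = aws_lb.main.dns_name
}

output "target_group_arn" {
  description = "ARN of the target group, useful when inspecting target health"
  value       = aws_lb_target_group.main.arn
}

# --- outputs.tf in the root configuration (separate file) ---
# Re-export the module outputs so `terraform output` can read them.
# The module name "ecs" is an assumption.
output "alb_dns_name" {
  value = module.ecs.alb_dns_name
}

output "target_group_arn" {
  value = module.ecs.target_group_arn
}
```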

Troubleshooting 503 Errors

Diagnosis Steps:

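  1. Target Group Health

aws elbv2 describe-target-health --target-group-arn "$(terraform output -raw target_group_arn)"

  2. ECS Task Logs

# Log group name assumes var.env = "prod"
aws logs tail "/ecs/prod-task" --follow

  3. Network Connectivity

# Fargate tasks have no host to log in to, so test from an instance in the same VPC.
# i-12345 is a placeholder instance ID; <task-private-ip> is the task's ENI address.
aws ec2-instance-connect ssh --instance-id i-12345
curl http://<task-private-ip>:8000/health   # run inside the SSH session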

Common Fixes:

  • Security Groups: Allow ALB → ECS traffic

  • Task Definition: Correct containerPort mapping

  • IAM Roles: Add ecs-tasks.amazonaws.com trust


Best Practices Checklist

Category      | Practice                      | Implementation Example
--------------|-------------------------------|----------------------------------------
Security      | Non-root containers           | USER appuser in Dockerfile
Cost          | Fargate Spot capacity         | Add capacity_provider_strategy
Reliability   | Multi-AZ deployment           | aws_subnet.private[*].availability_zone
Observability | CloudWatch Container Insights | setting { name = "containerInsights" }
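
The Fargate Spot row can be sketched at the cluster level as shown here. The 1:3 weighting and the base of one on-demand task are assumed values chosen for illustration, not recommendations from the original.

```hcl
# Make both Fargate capacity providers available to the cluster.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  # Assumed split: the first task always runs on regular Fargate, the rest
  # are weighted 1:3 in favour of Spot.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }
}
```

A service only follows this default strategy when it does not set launch_type. The alternative is to put capacity_provider_strategy blocks on the aws_ecs_service itself. Spot tasks can be interrupted with short notice, so this suits stateless services.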

Conclusion

This implementation showcases critical DevOps principles:

  1. Infrastructure-as-Code: Terraform manages 20+ AWS resources declaratively

  2. Secure by Default: Zero-trust networking, least privilege IAM

  3. Cloud-Native: Serverless Fargate tasks need no host management and can scale through ECS Service Auto Scaling


GitHub Repository: Particle41 DevOps Challenge
AWS Documentation: ECS Best Practices


Written by

Subhanshu Mohan Gupta