Debugging ECS Task Failures: How Terraform Remote State Saved My Deployment

Deepak Kumar
4 min read

Deploying to AWS ECS with Terraform sounded simple—until it wasn’t.

A few weeks ago, I was building a Dockerized app, pushing it to ECR, and deploying it via ECS Fargate using Terraform. Everything was fine... until suddenly, my ECS task started showing "STOPPED" status every time I deployed.

This blog shares exactly what went wrong, what I tried, what worked, and most importantly—what I learned.


The Setup

I'm learning DevOps hands-on by building real infrastructure using:

  • Terraform for infrastructure as code

  • Docker for containerization

  • AWS ECS (Fargate) to run containers

  • ECR to store Docker images

  • NGINX as the test app inside the container

I had two separate Terraform folders:

  • terraform/network — created VPC, public subnets, route table, and security group

  • terraform/ecs — defined ECS cluster, task definition, and service

Everything worked well initially.
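
For context, the two stacks lived side by side, roughly like this (file names are approximate):

terraform/
├── network/
│   ├── main.tf             # VPC, subnets, route table, security group
│   ├── outputs.tf
│   └── terraform.tfstate   # local state, read by the ecs stack later
└── ecs/
    └── main.tf             # ECS cluster, task definition, service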


The Problem: ECS Tasks Always Stopped

After making some changes in my terraform/network stack (replacing the security group to open multiple ports), I applied the changes successfully.
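
To give a sense of the change (this is a sketch, not my exact code — the ports and the VPC reference are illustrative), the new security group looked something like this. The key detail: some changes, like renaming the group, force Terraform to destroy and recreate it, which means a brand-new sg- ID.

resource "aws_security_group" "ecs_sg" {
  name   = "ecs-sg-multi-port"   # a rename like this forces destroy-and-recreate, producing a new sg- ID
  vpc_id = aws_vpc.main.id       # illustrative reference

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}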

But when I ran terraform apply in the ecs/ folder to re-deploy my app, ECS launched the task... and it immediately went to STOPPED state.

I kept refreshing the ECS console, checking logs, trying different container ports, even rebuilding and pushing my Docker image several times. Nothing helped.


The Root Cause

After digging deeper, I discovered that:

  • My ECS service was still using the old security group ID, which no longer existed.

  • In my ECS Terraform file, the network_configuration block was hardcoded:

  subnets          = ["subnet-abc123", "subnet-def456"]
  security_groups  = ["sg-0123456789"]

  • Since the old SG was gone (replaced during the network changes), ECS couldn't attach it to the task, and the task failed before it ever started.

This was a classic example of a broken dependency between infrastructure components.


The Fix: Terraform Remote State

Instead of hardcoding subnet and security group IDs, I needed a way for the ecs/ Terraform stack to dynamically read the correct, up-to-date values from the network/ stack.

The cleanest solution: terraform_remote_state

Step 1: Output subnet and SG from network

In terraform/network/outputs.tf:

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "security_group_id" {
  value = aws_security_group.ecs_sg.id
}

Step 2: Read remote state in ECS

In terraform/ecs/main.tf:

data "terraform_remote_state" "network" {
  backend = "local"
  config = {
    path = "../network/terraform.tfstate"
  }
}
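
A quick aside: the same data source works with shared backends too. If the network state lived in S3 instead of a local file, it would look roughly like this (bucket name and region are placeholders):

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"          # placeholder bucket name
    key    = "network/terraform.tfstate"
    region = "us-east-1"                   # placeholder region
  }
}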

Step 3: Use those values in ECS service

  network_configuration {
    subnets          = data.terraform_remote_state.network.outputs.public_subnet_ids
    assign_public_ip = true
    security_groups  = [data.terraform_remote_state.network.outputs.security_group_id]
  }
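
For clarity, that block sits inside the ECS service resource. A minimal sketch of the full resource, with illustrative names for the service, cluster, and task definition:

resource "aws_ecs_service" "app" {
  name            = "nginx-service"                    # illustrative
  cluster         = aws_ecs_cluster.main.id            # illustrative
  task_definition = aws_ecs_task_definition.app.arn    # illustrative
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = data.terraform_remote_state.network.outputs.public_subnet_ids
    assign_public_ip = true
    security_groups  = [data.terraform_remote_state.network.outputs.security_group_id]
  }
}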

Once I made this change, everything just worked.

I applied the ECS Terraform again, refreshed the console, and this time the ECS task moved to RUNNING state. Opening the public IP showed:

🚀 Hello from Deepak's Docker container (served via NGINX)!

🎉 Finally!


What I Learned (So You Don't Repeat It)

✅ Don't hardcode resource IDs

  • Use variables, outputs, and remote state instead

  • Hardcoded values become stale quickly, especially with Terraform's resource replacements

✅ Use outputs.tf as a contract

  • Think of it like a public API between your Terraform stacks

✅ Use terraform_remote_state to wire your infra

  • It allows clean separation of concerns: one stack defines, the other consumes

✅ A stopped ECS task means something failed at launch

  • Check IAM roles, SGs, subnets, logs

  • If nothing shows in logs, it's often a networking or IAM issue
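
On that last point: if the task definition doesn't ship logs anywhere, the Logs tab stays empty no matter what broke. A minimal sketch of wiring the awslogs driver into a Fargate task definition (names, region, and image URI are placeholders, not my exact setup):

resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/nginx-app"   # placeholder log group name
  retention_in_days = 7
}

resource "aws_ecs_task_definition" "app" {
  family                   = "nginx-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn   # role that lets ECS pull from ECR and write logs

  container_definitions = jsonencode([
    {
      name         = "nginx"
      image        = "<account-id>.dkr.ecr.<region>.amazonaws.com/nginx-app:latest"   # placeholder image URI
      essential    = true
      portMappings = [{ containerPort = 80 }]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/nginx-app"
          "awslogs-region"        = "us-east-1"   # placeholder region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}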


What’s Next

This was my first real-world style ECS deployment. I learned more from this single issue than from hours of tutorials.

In upcoming blogs, I’ll cover:

  • Adding a Load Balancer (ALB) in front of ECS

  • Setting up CI/CD with GitHub Actions

  • Deploying a real Drupal app instead of a placeholder

If you're just starting your DevOps journey: don't worry if things break. That’s when you learn.

Thanks for reading 🙏


✅ You can view the full codebase here: GitHub - deepakaryan1988/Drupal-AWS

💬 DM me or comment if you’ve faced similar ECS issues — happy to share more!

#AWS #Terraform #DevOps #ECS #RemoteState #Debugging #Hashnode #LinkedIn
