Debugging ECS Task Failures: How Terraform Remote State Saved My Deployment

Deepak Kumar
4 min read

Deploying to AWS ECS with Terraform sounded simple—until it wasn’t.

A few weeks ago, I was building a Dockerized app, pushing it to ECR, and deploying it via ECS Fargate using Terraform. Everything was fine... until suddenly, my ECS task started showing "STOPPED" status every time I deployed.

This blog shares exactly what went wrong, what I tried, what worked, and most importantly—what I learned.


The Setup

I'm learning DevOps hands-on by building real infrastructure using:

  • Terraform for infrastructure as code

  • Docker for containerization

  • AWS ECS (Fargate) to run containers

  • ECR to store Docker images

  • NGINX as the test app inside the container

I had two separate Terraform folders:

  • terraform/network — created VPC, public subnets, route table, and security group

  • terraform/ecs — defined ECS cluster, task definition, and service

Everything worked well initially.
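
For context, the two stacks lived side by side, roughly like this (file names are approximate):

terraform/
├── network/
│   ├── main.tf             # VPC, subnets, route table, security group
│   ├── outputs.tf
│   └── terraform.tfstate   # local state, read by the ecs stack later
└── ecs/
    └── main.tf             # ECS cluster, task definition, service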


The Problem: ECS Tasks Always Stopped

After making some changes in my terraform/network stack (replacing the security group to open multiple ports), I applied the changes successfully.
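
To give a sense of the change (this is a sketch, not my exact code — the ports and the VPC reference are illustrative), the new security group looked something like this. The key detail: some changes, like renaming the group, force Terraform to destroy and recreate it, which means a brand-new sg- ID.

resource "aws_security_group" "ecs_sg" {
  name   = "ecs-sg-multi-port"   # a rename like this forces destroy-and-recreate, producing a new sg- ID
  vpc_id = aws_vpc.main.id       # illustrative reference

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}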

But when I ran terraform apply in the ecs/ folder to re-deploy my app, ECS launched the task... and it immediately went to STOPPED state.

I kept refreshing the ECS console, checking logs, trying different container ports, even rebuilding and pushing my Docker image several times. Nothing helped.


The Root Cause

After digging deeper, I discovered that:

  • My ECS service was still using the old security group ID, which no longer existed.

  • In my ECS Terraform file, the network_configuration block was hardcoded:

  subnets          = ["subnet-abc123", "subnet-def456"]
  security_groups  = ["sg-0123456789"]

  • Since the old SG was gone (replaced during the network changes), ECS couldn't attach it to the task, and the task failed before it ever started.

This was a classic example of a broken dependency between infrastructure components.


The Fix: Terraform Remote State

Instead of hardcoding subnet and security group IDs, I needed a way for the ecs/ Terraform stack to dynamically read the correct, up-to-date values from the network/ stack.

The cleanest solution: terraform_remote_state

Step 1: Output subnet and SG from network

In terraform/network/outputs.tf:

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "security_group_id" {
  value = aws_security_group.ecs_sg.id
}

Step 2: Read remote state in ECS

In terraform/ecs/main.tf:

data "terraform_remote_state" "network" {
  backend = "local"
  config = {
    path = "../network/terraform.tfstate"
  }
}
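
A quick aside: the same data source works with shared backends too. If the network state lived in S3 instead of a local file, it would look roughly like this (bucket name and region are placeholders):

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"          # placeholder bucket name
    key    = "network/terraform.tfstate"
    region = "us-east-1"                   # placeholder region
  }
}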

Step 3: Use those values in ECS service

  network_configuration {
    subnets          = data.terraform_remote_state.network.outputs.public_subnet_ids
    assign_public_ip = true
    security_groups  = [data.terraform_remote_state.network.outputs.security_group_id]
  }
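
For clarity, that block sits inside the ECS service resource. A minimal sketch of the full resource, with illustrative names for the service, cluster, and task definition:

resource "aws_ecs_service" "app" {
  name            = "nginx-service"                    # illustrative
  cluster         = aws_ecs_cluster.main.id            # illustrative
  task_definition = aws_ecs_task_definition.app.arn    # illustrative
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = data.terraform_remote_state.network.outputs.public_subnet_ids
    assign_public_ip = true
    security_groups  = [data.terraform_remote_state.network.outputs.security_group_id]
  }
}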

Once I made this change, everything just worked.

I applied the ECS Terraform again, refreshed the console, and this time the ECS task moved to RUNNING state. Opening the public IP showed:

🚀 Hello from Deepak's Docker container (served via NGINX)!

🎉 Finally!


What I Learned (So You Don't Repeat It)

✅ Don't hardcode resource IDs

  • Use variables, outputs, and remote state instead

  • Hardcoded values become stale quickly, especially with Terraform's resource replacements

✅ Use outputs.tf as a contract

  • Think of it like a public API between your Terraform stacks

✅ Use terraform_remote_state to wire your infra

  • It allows clean separation of concerns: one stack defines, the other consumes

✅ A stopped ECS task means something failed at launch

  • Check IAM roles, SGs, subnets, logs

  • If nothing shows in logs, it's often a networking or IAM issue
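
On that last point: if the task definition doesn't ship logs anywhere, the Logs tab stays empty no matter what broke. A minimal sketch of wiring the awslogs driver into a Fargate task definition (names, region, and image URI are placeholders, not my exact setup):

resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/nginx-app"   # placeholder log group name
  retention_in_days = 7
}

resource "aws_ecs_task_definition" "app" {
  family                   = "nginx-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn   # role that lets ECS pull from ECR and write logs

  container_definitions = jsonencode([
    {
      name         = "nginx"
      image        = "<account-id>.dkr.ecr.<region>.amazonaws.com/nginx-app:latest"   # placeholder image URI
      essential    = true
      portMappings = [{ containerPort = 80 }]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/nginx-app"
          "awslogs-region"        = "us-east-1"   # placeholder region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}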


What’s Next

This was my first real-world style ECS deployment. I learned more from this single issue than from hours of tutorials.

In upcoming blogs, I’ll cover:

  • Adding a Load Balancer (ALB) in front of ECS

  • Setting up CI/CD with GitHub Actions

  • Deploying a real Drupal app instead of a placeholder

If you're just starting your DevOps journey: don't worry if things break. That’s when you learn.

Thanks for reading 🙏


✅ You can view the full codebase here: GitHub - deepakaryan1988/Drupal-AWS

💬 DM me or comment if you’ve faced similar ECS issues — happy to share more!

#AWS #Terraform #DevOps #ECS #RemoteState #Debugging #Hashnode #LinkedIn
