From ECS Timeout to CI/CD Green: A Real-World DevOps Journey

Deepak Kumar

🚀 Introduction – The Problem

Our FeedbackHub project runs on AWS ECS Fargate with infrastructure managed by Terraform and deployments triggered through GitHub Actions.

Sounds neat, right? Except our first deployments kept getting stuck.

  • ECS deployment status: IN_PROGRESS forever

  • Tasks: 3 running instead of the desired 2

  • GitHub Actions: timed out during aws ecs wait services-stable (the waiter gives up after 40 polls, 15 seconds apart, so roughly 10 minutes)
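
When a rollout hangs like this, the service's deployment list is a reasonable first place to look. Here is a minimal sketch, assuming the AWS CLI is installed and authenticated. The cluster and service names in the usage line are placeholders, not the project's real names.

```shell
# Print one line per deployment: status, rollout state, desired, running.
# In a stuck rolling update you would expect the new PRIMARY deployment to
# stay IN_PROGRESS next to the old ACTIVE deployment that still serves traffic.
show_rollout() {
  cluster="$1"
  service="$2"
  aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].deployments[].[status,rolloutState,desiredCount,runningCount]' \
    --output text
}

# Usage (hypothetical names):
#   show_rollout feedbackhub-cluster feedbackhub-service
```

An extra running task is consistent with ECS starting a replacement task before stopping the old ones, then never seeing that replacement pass its health checks.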

This was a real-world DevOps headache. Here’s how we diagnosed, fixed, and iterated.

(Note: The app occasionally “takes naps” to save AWS costs. Think of it as serverless beauty sleep.)


🔍 Phase 1 – OIDC & Pipeline Setup

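We started by securing GitHub Actions authentication to AWS with an OIDC-based IAM role:

  • Created a dedicated IAM role for GitHub Actions

  • Configured aws-actions/configure-aws-credentials@v4 to assume that role (the workflow also needs permissions: id-token: write)

  • Triggered the first deployment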

✅ OIDC worked flawlessly. ECS rollout… not so much.
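
One quick way to confirm that a workflow step is really running as the OIDC role is to ask STS who the caller is. This is a sketch, assuming the AWS CLI is available on the runner. The role name github-actions-deploy in the usage line is made up for illustration.

```shell
# Succeed only if the current credentials belong to the expected assumed role.
# STS reports assumed-role sessions as:
#   arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>
verify_assumed_role() {
  expected_role="$1"
  arn="$(aws sts get-caller-identity --query Arn --output text)" || return 1
  case "$arn" in
    *":assumed-role/${expected_role}/"*)
      echo "ok: $arn"
      ;;
    *)
      echo "unexpected identity: $arn" >&2
      return 1
      ;;
  esac
}

# Usage (hypothetical role name):
#   verify_assumed_role github-actions-deploy
```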


🛠 Phase 2 – ALB Health Check Fix

Problem:

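  • The ALB health check requested /, which answered with a redirect (301/302), while the default matcher accepts only 200

  • The health check timeout was too short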

Fix:

  • Changed path to /api/health

  • Increased timeout to 10s

  • Temporarily allowed success codes 200-302

✅ ALB targets healthy.
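
The project manages these settings in Terraform, so the AWS CLI call below only illustrates the same three changes. The target group ARN is passed in by the caller and none is assumed here. In Terraform the equivalent knobs would be path, timeout, and matcher in the health_check block of aws_lb_target_group.

```shell
# Apply the Phase 2 health check settings to an ALB target group.
# $1: target group ARN (supplied by the caller).
tune_alb_health_check() {
  tg_arn="$1"
  aws elbv2 modify-target-group \
    --target-group-arn "$tg_arn" \
    --health-check-path /api/health \
    --health-check-timeout-seconds 10 \
    --matcher HttpCode=200-302
}

# Usage:
#   tune_alb_health_check "$TARGET_GROUP_ARN"
```

Keep in mind that the timeout has to stay below the health check interval, so a longer timeout may require a longer interval too.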


⚙️ Phase 3 – ECS Container Health Check Fix

Problem:

  • The ECS health check ran before the app had fully started (Next.js warmup + DB connection)

  • curl failed in the Alpine-based image (Alpine does not ship curl by default)

Fix:

  • Installed wget in Dockerfile

  • Updated health check command:

    wget --no-verbose --tries=1 --spider http://localhost:3000/api/health || exit 1

  • Increased startPeriod to 120s (planning 300s, the maximum ECS allows)

  • Temporarily disabled ECS health checks to unblock the deployment

✅ ECS service stable, 2/2 running.
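
For when the check gets re-enabled, this is roughly what the healthCheck object in the container definition could look like. It is a sketch, not the project's actual task definition. The command and the 120s startPeriod come from the steps above, while interval, timeout, and retries are simply the ECS defaults filled in as assumptions.

```shell
# Print a container-definition healthCheck fragment as JSON.
# $1: startPeriod in seconds (defaults to the 120s used above; ECS caps it at 300).
# interval/timeout/retries below are ECS defaults, assumed for illustration.
print_health_check() {
  start_period="${1:-120}"
  cat <<EOF
{
  "command": ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": ${start_period}
}
EOF
}

# Usage:
#   print_health_check 300
```

Failed checks during startPeriod do not count toward retries, which is what gives a slow-booting Next.js app room to warm up.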


🎯 The Partial Win

  • App: Works perfectly

  • ECS: Deployment COMPLETED

  • Pipeline: No more timeout

(The app still naps when not in use. Cost optimization, but make it cozy.)


📚 Lessons Learned

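  1. Align ALB & ECS health checks (here, both point at /api/health)

  2. Allow realistic container boot times: startPeriod should cover app warmup and the DB connection

  3. It’s okay to relax checks temporarily, as long as tightening them again stays on the list

  4. Pipelines must account for ECS timing: a slow rollout can outlast the waiter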
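
On that last point: a single aws ecs wait services-stable call polls every 15 seconds and fails after 40 attempts. One way to give a slow rollout more room is to run the waiter more than once. This is a sketch under that assumption. The function name and the default of 2 rounds are arbitrary choices, and the cluster and service names in the usage line are placeholders.

```shell
# Run the services-stable waiter up to N times before giving up.
# $1: cluster, $2: service, $3: number of rounds (default 2, ~10 minutes each).
wait_for_stable() {
  cluster="$1"
  service="$2"
  rounds="${3:-2}"
  i=1
  while [ "$i" -le "$rounds" ]; do
    if aws ecs wait services-stable --cluster "$cluster" --services "$service"; then
      echo "stable after round $i"
      return 0
    fi
    echo "round $i timed out" >&2
    i=$((i + 1))
  done
  return 1
}

# Usage (hypothetical names):
#   wait_for_stable feedbackhub-cluster feedbackhub-service 3
```

Extra rounds only buy time. If the new tasks never turn healthy, the function still returns non-zero and the job fails, just later.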


🔮 Phase 4 – What’s Next

  • Re-enable tuned ECS health checks

  • Add blue/green deployments

  • Tighten ALB success codes to 200

  • Add ECS and ALB checks to the GitHub Actions workflow that must pass before it reports success
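
A pre-success ALB check could be as small as counting healthy targets before the job goes green. This is a sketch, assuming the AWS CLI is available in the workflow. Both function names are made up for illustration, and the target group ARN comes from the caller.

```shell
# Count targets the ALB currently reports as healthy.
# $1: target group ARN.
healthy_target_count() {
  tg_arn="$1"
  aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --query "length(TargetHealthDescriptions[?TargetHealth.State=='healthy'])" \
    --output text
}

# Fail unless at least $2 targets are healthy.
require_healthy_targets() {
  tg_arn="$1"
  want="$2"
  have="$(healthy_target_count "$tg_arn")" || return 1
  if [ "$have" -ge "$want" ]; then
    echo "ok: $have/$want healthy"
  else
    echo "only $have/$want healthy" >&2
    return 1
  fi
}

# Usage:
#   require_healthy_targets "$TARGET_GROUP_ARN" 2
```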


✅ Conclusion

This was real-world DevOps: unblock → iterate → stabilize.

💻 Repo: GitHub
🌐 App: Live FeedbackHub (May be snoozing to save the AWS bill!)
🔗 LinkedIn: Connect with me (Come for the DevOps talk, stay for the ECS nap jokes)
