Part 1: 3 ECS Health Check Mistakes I’ll Never Make Again

Alamin Islam

This is Part 1 of 5 in my series on keeping ECS deployments rock-solid — covering best practices, hidden pitfalls, and the sneaky issues that cause downtime.

After nearly 2 years in DevOps, one thing is clear:
A “running” ECS task doesn’t always mean a healthy app.

Early on, I deployed a service to Amazon ECS.
Everything looked perfect — tasks running, console green — until I tried to open the app.
Blank screen. Endless loading. Sometimes a nice 503 or 502 for variety.

That’s when I realized: Health checks aren’t just a tick-box — they’re critical to application reliability.

Over time, I’ve seen (and fixed) the same few issues again and again.
Here are 3 ECS health check mistakes to avoid.


1️⃣ Wrong Health Check Path

The Problem:
If your ALB health check points to / but your app’s real “I’m alive” route is /health, you’re in trouble.
The ALB might report everything as fine even when important parts of your app are broken, or it might fail tasks that are actually working.
In my case, / was returning a redirect. By default the ALB only counts a 200 response as healthy, so it marked the targets unhealthy and ECS kept restarting the tasks.

Fix:

  • Set the target group health check path to a dedicated /health endpoint.

  • Make sure /health returns a fast 200 OK without relying on slow upstreams.

  • If you have critical dependencies (like a database), include lightweight checks for them too.
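To make the /health idea concrete, here is a minimal sketch using only Python's standard library. It is not taken from any particular app: check_database, health_status, HealthHandler, serve, and port 8080 are all hypothetical names and values chosen for illustration. The endpoint returns 200 when every lightweight check passes and 503 otherwise.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database() -> bool:
    """Hypothetical dependency check; swap in something cheap like SELECT 1."""
    return True


def health_status(checks: dict) -> tuple:
    """Run each named check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    code = 200 if all(results.values()) else 503
    return code, json.dumps({"ok": code == 200, "checks": results})


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_status({"database": check_database})
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the server; the port is an assumption, use your container port."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the listener. Keeping the checks in a plain function like health_status makes them easy to unit test without opening a socket, and keeping each check cheap is what stops the health endpoint itself from timing out.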


2️⃣ Not Allowing for Startup Time

The Problem:
During one deployment, my ALB started checking the new ECS tasks before the app was ready.
The result? It marked them unhealthy almost instantly, ECS killed them, and I ended up in a restart loop.

Fix:

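  • Set a health check grace period on the ECS service (healthCheckGracePeriodSeconds) so ECS ignores failing load balancer health checks while the app boots.

  • If that is still not enough, raise the target group’s health check interval and unhealthy threshold. Raising the healthy threshold only delays the moment a task counts as healthy; it does not prevent an early unhealthy verdict.

  • In Kubernetes terms, think of this like a startup probe: the task gets time to come up before failed checks count against it.

  • If the app takes ~30 seconds to start, make the grace period comfortably longer than that, with room for the first health checks to pass.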
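The arithmetic behind "give it enough time" can be sketched in a few lines. This is a rough heuristic, not an AWS formula: it assumes a new task needs its startup time plus enough check intervals to collect the required consecutive passes, and then adds a safety margin. The function names and the 1.5 margin are my own assumptions.

```python
import math


def time_to_healthy(startup_seconds: int, interval_seconds: int,
                    healthy_threshold: int) -> int:
    """Rough time until the ALB can report a freshly started task as healthy."""
    # The app must be up first, then pass `healthy_threshold` checks in a row.
    return startup_seconds + interval_seconds * healthy_threshold


def recommended_grace_period(startup_seconds: int, interval_seconds: int,
                             healthy_threshold: int, margin: float = 1.5) -> int:
    """Suggested ECS health check grace period, padded by a safety margin."""
    base = time_to_healthy(startup_seconds, interval_seconds, healthy_threshold)
    return math.ceil(base * margin)
```

For an app that starts in about 30 seconds, with a 10 second interval and a healthy threshold of 3, recommended_grace_period(30, 10, 3) returns 90. Treat the number as a starting point and adjust it after watching a few real deployments.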


3️⃣ No Alerts for Unhealthy Targets

The Problem:
Without alerts, your ALB could be quietly flagging unhealthy targets and you’d never know.
This happened to me once, and it took a user report to even realize anything was wrong.

Fix:

  • Create a CloudWatch alarm on the UnHealthyHostCount metric.

  • Send notifications via SNS to Slack, email, or PagerDuty.

  • Test it by breaking a container on purpose and confirming the alert fires.
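Here is one way the alarm could be defined with boto3, as a sketch rather than a drop-in config. The helper names are hypothetical, and the period, evaluation periods, and threshold are assumptions to tune for your own service. The load balancer and target group values are the dimension strings shown in the console (the part of each ARN starting with app/ and targetgroup/).

```python
def unhealthy_host_alarm(alarm_name: str, load_balancer: str,
                         target_group: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "LoadBalancer", "Value": load_balancer},
            {"Name": "TargetGroup", "Value": target_group},
        ],
        "Statistic": "Maximum",
        "Period": 60,             # assumption: evaluate once a minute
        "EvaluationPeriods": 2,   # assumption: two bad minutes in a row
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict) -> None:
    """Send the alarm to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Splitting the definition from the API call means you can review or test the alarm settings without touching AWS, then pass the result to create_alarm once the values look right.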


💡 Bonus Tip:
During rolling updates, bad health check settings can drain old tasks before new ones are ready — leaving the ALB with nothing to route to.
Always test rolling updates in staging with health checks enabled.


Final Thought

Green dashboards can lie.
Real reliability comes from health checks that are:

  • Well-configured

  • Tested in staging

  • Backed by good alerting

So the next time ECS says Running, make sure it actually means Healthy.
