How to Set Up Kafka and ZooKeeper with High Availability on AWS ECS

Let's now dive into a practical deployment guide for running Kafka and ZooKeeper in a containerized, HA-ready setup using AWS ECS (EC2 or Fargate).
Architecture Overview
We will deploy:
Apache ZooKeeper cluster (3 nodes for HA quorum)
Apache Kafka brokers (3 brokers for replication and load balancing)
Each broker runs in its own ECS task
Internal networking via ECS Service Discovery
Data persistence with Amazon EBS (EC2) or EFS (Fargate)
              +----------------------+
              |    Kafka Clients     |
              +----------------------+
                         |
            +--------------------------+
            |     AWS ECS Cluster      |
            |   (Kafka + ZooKeeper)    |
            +--------------------------+
+-------------------+   +-------------------+
|  Kafka Broker 1   |   |  Kafka Broker 2   |
|  zookeeper:2181   |   |  zookeeper:2181   |
+-------------------+   +-------------------+
          |                       |
+-------------------+   +-------------------+
| ZooKeeper Node 1  |   | ZooKeeper Node 2  |
+-------------------+   +-------------------+
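The choice of three ZooKeeper nodes follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its nodes can talk to each other. The sketch below is a small illustration of that rule; the function names are hypothetical and not part of any Kafka or ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble: floor(n / 2) + 1."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of nodes that can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size, and tolerated failures.
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

A two-node ensemble tolerates no failures at all, which is why three is the usual minimum and why even ensemble sizes add cost without adding fault tolerance.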
Step-by-Step: Kafka & ZooKeeper on ECS
1. Containerize Kafka and ZooKeeper
Use Bitnami or Confluent Docker images (or build your own):
# docker-compose.yml (for local testing)
version: '3'
services:
  zookeeper:
    image: bitnami/zookeeper:3.9
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
      # The Bitnami image reads ZOO_SERVER_ID and host:port:port entries;
      # the official zookeeper image uses ZOO_MY_ID and server.N=... instead.
      - ZOO_SERVER_ID=1
      - ZOO_SERVERS=zookeeper:2888:3888
  kafka:
    image: bitnami/kafka:3.6
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - ALLOW_PLAINTEXT_LISTENER=yes
Push these images to Amazon ECR.
2. Create ECS Cluster (Fargate or EC2)
Use ECS Console or IaC (Terraform/CloudFormation)
For HA, prefer EC2 + Auto Scaling Group
Ensure subnets span multiple AZs
Attach ECS instances to a shared security group
3. Set Up IAM, Security Groups & Networking
ECS task execution role (the role the ECS agent uses on the task's behalf) with permissions for:
ECR pull
CloudWatch Logs
Security Groups:
Kafka → allow port 9092
ZooKeeper → allow ports 2181, 2888, 3888
Enable ECS Service Discovery (via AWS Cloud Map or Route 53)
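The port list above can be turned into security group rules programmatically. The following is a minimal sketch that only builds the rule dictionaries in the shape accepted by the `IpPermissions` parameter of boto3's `authorize_security_group_ingress`; it makes no AWS calls. The helper name `ingress_rules`, the port descriptions, and the security group ID are illustrative assumptions.

```python
from typing import Dict, List

# Ports taken from the list above; the descriptions are illustrative.
KAFKA_PORTS: Dict[int, str] = {9092: "Kafka client and inter-broker traffic"}
ZOOKEEPER_PORTS: Dict[int, str] = {
    2181: "ZooKeeper client connections",
    2888: "ZooKeeper follower-to-leader traffic",
    3888: "ZooKeeper leader election",
}


def ingress_rules(ports: Dict[int, str], source_group_id: str) -> List[dict]:
    """Build one TCP ingress rule per port, allowing traffic only from
    members of source_group_id (for example the shared ECS security group)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [
                {"GroupId": source_group_id, "Description": description}
            ],
        }
        for port, description in sorted(ports.items())
    ]


if __name__ == "__main__":
    # "sg-0123456789abcdef0" is a placeholder, not a real group ID.
    for rule in ingress_rules(ZOOKEEPER_PORTS, "sg-0123456789abcdef0"):
        print(rule)
```

Referencing a source security group rather than a CIDR range keeps the ports closed to everything outside the cluster, which matters here because the listeners are plaintext.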
4. Deploy ECS Services
Deploy each Kafka and ZooKeeper instance as a separate ECS service:
ZooKeeper:
Node count: 3 (for quorum), one service with a desired count of 1 per node
Static DNS names via ECS Cloud Map (zookeeper1.service.local, etc.)
Kafka:
Broker count: 3, again one service per broker
Each configured with:
KAFKA_CFG_BROKER_ID (unique per broker)
KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper1.service.local:2181,...
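Because each broker needs its own ID but the same ZooKeeper connection string, it can help to generate the per-broker environment rather than hand-edit it. This is a rough sketch: the helper names are hypothetical, and it assumes the ZooKeeper nodes follow the `zookeeperN.service.local` naming pattern used above.

```python
from typing import Dict, List


def zookeeper_connect(node_count: int, domain: str = "service.local",
                      port: int = 2181) -> str:
    """Comma-separated host:port list for zookeeper1..zookeeperN."""
    return ",".join(
        f"zookeeper{i}.{domain}:{port}" for i in range(1, node_count + 1)
    )


def broker_environment(broker_id: int, zk_nodes: int = 3) -> List[Dict[str, str]]:
    """Environment entries in the name/value shape used by ECS task definitions."""
    return [
        {"name": "KAFKA_CFG_BROKER_ID", "value": str(broker_id)},
        {"name": "KAFKA_CFG_ZOOKEEPER_CONNECT", "value": zookeeper_connect(zk_nodes)},
        {"name": "ALLOW_PLAINTEXT_LISTENER", "value": "yes"},
    ]


if __name__ == "__main__":
    for broker_id in (1, 2, 3):
        print(broker_environment(broker_id))
```

The returned list can be dropped into the `environment` field of a container definition for each of the three broker services.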
Sample ECS Task Definition Snippet (Kafka)
{
  "containerDefinitions": [
    {
      "name": "kafka-broker",
      "image": "your_account_id.dkr.ecr.region.amazonaws.com/kafka:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 9092, "hostPort": 9092 }
      ],
      "environment": [
        { "name": "KAFKA_CFG_BROKER_ID", "value": "1" },
        { "name": "KAFKA_CFG_ZOOKEEPER_CONNECT", "value": "zookeeper1.service.local:2181,zookeeper2.service.local:2181,zookeeper3.service.local:2181" },
        { "name": "ALLOW_PLAINTEXT_LISTENER", "value": "yes" }
      ]
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "memory": "2048",
  "cpu": "1024"
}
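Tip: Use placement settings to keep tasks in different Availability Zones for HA. With one service per node, a memberOf placementConstraints entry can pin each service to its own AZ; for a service that runs several tasks, a spread placementStrategy on attribute:ecs.availability-zone does the same job. Both apply to the EC2 launch type.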
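The two placement approaches for EC2-backed services can be sketched as plain dictionaries in the shape that the ECS `CreateService` API accepts for `placementConstraints` and `placementStrategy`. The helper names and the zone names are illustrative assumptions, and nothing here calls AWS.

```python
from typing import Dict, List


def assign_zones(node_count: int, zones: List[str]) -> List[str]:
    """Round-robin nodes over the available zones."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    return [zones[i % len(zones)] for i in range(node_count)]


def pinned_placement(zone: str) -> Dict[str, list]:
    """One service per node: pin the service's single task to one AZ."""
    return {
        "placementConstraints": [
            {
                "type": "memberOf",
                "expression": f"attribute:ecs.availability-zone == {zone}",
            }
        ]
    }


def spread_placement() -> Dict[str, list]:
    """One service with several tasks: spread over AZs, one task per host."""
    return {
        "placementStrategy": [
            {"type": "spread", "field": "attribute:ecs.availability-zone"}
        ],
        "placementConstraints": [{"type": "distinctInstance"}],
    }


if __name__ == "__main__":
    # Zone names are placeholders for the zones your subnets actually span.
    zones = assign_zones(3, ["us-east-1a", "us-east-1b", "us-east-1c"])
    for broker_id, zone in enumerate(zones, start=1):
        print(broker_id, pinned_placement(zone))
```

Either dictionary can be merged into the keyword arguments of a service definition. Pinning is the better fit when each broker also owns an AZ-local EBS volume.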
5. Enable Data Persistence
ZooKeeper & Kafka require persistent storage
Use EBS volumes mounted to ECS tasks (if EC2-backed)
Or use Amazon EFS with Fargate
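In a task definition, persistence comes down to a `volumes` entry plus a matching `mountPoints` entry on the container. The sketch below builds both variants as dictionaries. The helper names, the file system ID, the host path, and the container path `/bitnami/kafka` (assumed here to be the Bitnami image's data directory) are assumptions to verify against your own images and infrastructure.

```python
from typing import Dict


def efs_volume(name: str, file_system_id: str) -> Dict[str, object]:
    """Task-level volume backed by Amazon EFS (usable with Fargate)."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "transitEncryption": "ENABLED",
        },
    }


def host_volume(name: str, host_path: str) -> Dict[str, object]:
    """Task-level bind mount of a host directory, for example a path where
    an EBS volume is mounted on an EC2 container instance."""
    return {"name": name, "host": {"sourcePath": host_path}}


def mount_point(volume_name: str, container_path: str) -> Dict[str, object]:
    """Container-level mount that references a task-level volume by name."""
    return {
        "sourceVolume": volume_name,
        "containerPath": container_path,
        "readOnly": False,
    }


if __name__ == "__main__":
    # All identifiers below are placeholders.
    print(efs_volume("kafka-data", "fs-0123456789abcdef0"))
    print(host_volume("kafka-data", "/mnt/kafka-data"))
    print(mount_point("kafka-data", "/bitnami/kafka"))
```

The volume goes in the task definition's top-level `volumes` list and the mount point in the container definition's `mountPoints` list; the two are linked only by the volume name.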
6. Validate the Cluster
Use the Kafka CLI or a client library to produce and consume test messages:
kafka-console-producer.sh --bootstrap-server kafka1:9092 --topic test
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic test --from-beginning
Use CloudWatch Logs, kafka-topics.sh, and ZooKeeper CLI tools for health checks
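Before reaching for the Kafka tools, a quick TCP reachability check can confirm that service discovery and security groups are wired correctly. This is a minimal sketch using only the standard library; it verifies that a port accepts connections, not that the broker or ZooKeeper node is healthy. The `kafka2`, `kafka3`, and `zookeeper*` hostnames are assumed to follow the naming used earlier.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints: Iterable[Tuple[str, int]],
                    timeout: float = 3.0) -> Dict[str, bool]:
    """Map each 'host:port' string to its reachability."""
    return {f"{host}:{port}": port_open(host, port, timeout)
            for host, port in endpoints}


if __name__ == "__main__":
    # Hostnames are placeholders; run this from inside the VPC.
    targets = [(f"kafka{i}", 9092) for i in (1, 2, 3)]
    targets += [(f"zookeeper{i}.service.local", 2181) for i in (1, 2, 3)]
    for endpoint, ok in check_endpoints(targets).items():
        print(endpoint, "reachable" if ok else "UNREACHABLE")
```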
High Availability Best Practices
| Component | HA Strategy |
|---|---|
| ZooKeeper | Minimum 3 nodes for quorum; spread across AZs |
| Kafka Brokers | 3+ brokers with replication (min.insync.replicas=2) |
| Storage | EBS or EFS with redundancy |
| ECS | Auto Scaling across AZs |
| DNS | Use Cloud Map or Route 53 for service discovery |
| Monitoring | CloudWatch, Kafka Exporter + Prometheus |
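The `min.insync.replicas=2` recommendation is easiest to reason about as arithmetic: for producers using `acks=all`, a partition keeps accepting writes as long as at least `min.insync.replicas` replicas are in sync. The helper below is a hypothetical illustration of that relationship, assuming a replication factor of 3 to match the three brokers.

```python
def tolerated_broker_failures(replication_factor: int,
                              min_insync_replicas: int) -> int:
    """Replicas that can be lost before acks=all writes start failing."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError(
            "min_insync_replicas must be between 1 and replication_factor"
        )
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    # Assumed setup: replication factor 3 on three brokers.
    print(tolerated_broker_failures(3, 2))
```

With a replication factor of 3, setting `min.insync.replicas=3` would leave no room for a single broker restart, while `1` would allow acknowledged writes that exist on only one broker.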
Conclusion
Apache Kafka and ZooKeeper bring critical capabilities to microservices: event streaming, durability, and fault-tolerant coordination. Deploying them on AWS ECS with high availability provides:
Scalability across services
Resilience to AZ or task failures
Decoupled, event-driven communication
With ECS, you offload much of the orchestration while gaining flexibility to manage Kafka clusters as containerized services, without locking into managed services too early.
Written by
Naren Malireddy