Cloud Event-Driven Architecture


Definition of event-driven architecture
As the name suggests, event-driven architecture is a paradigm for building highly scalable applications in which every action is driven by an event that has occurred in the system, triggered either externally or internally.
Overview of cloud-native applications
Cloud-native apps are systems specifically designed to be deployed and run in a cloud environment, where the infrastructure the application runs on can scale dynamically.
Microservices are modular components deployed in the cloud environment, each dedicated to a particular task. They depend on infrastructure such as databases, caches, or other services to do their work, but they are designed to scale independently.
Characteristics of cloud-native apps
Software-as-a-service: These apps are usually not shipped to the customer; instead, they are deployed on the provider's infrastructure and offer their functionality as a service.
Microservice based: Made up of multiple services that are intended to do a particular task in the application.
Containerized: Docker images based systems that package code and all its dependencies making it platform independent.
DevOps based: Can be deployed through a CI/CD (continuous integration/continuous deployment) pipeline.
APIs: REST/gRPC based standard communication bringing in modularity between services.
Mostly stateless: The apps themselves do not hold application state, which makes instances easy to replace.
Scalable: Statelessness makes them easy to scale out by running replicas of the same instance.
Resilient and fault-tolerant: Because these apps run on distributed systems, they remain resilient and fault-tolerant during node outages.
Observable: App’s performance can be monitored individually.
Maintainable: Apps are modular, and adding new services to the system is easy.
Role of message queues in event-based systems
Message queues are the conveyor belt of the whole "factory of services"; they are the backbone of event-driven architectures.
Message queues are scalable systems that act as intermediaries through which services talk to each other. They are designed to be fault tolerant, to cope with large volumes of messages, and to hold and distribute messages based on how the system consumes them.
In an end-to-end (request-reply) design, where each service sends and receives messages directly (for example over REST or gRPC), the number of communication channels grows roughly quadratically while the number of services grows linearly. This makes it hard for the system to handle the load, reducing throughput and slowing down responses and actions.
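To put rough numbers on that growth, here is a quick back-of-the-envelope sketch in plain Python (no Kafka needed; the function names are just for illustration):

```python
# Direct point-to-point wiring: every service may need a channel to every
# other service, so the channel count grows with the square of the count.
def point_to_point_links(n_services: int) -> int:
    return n_services * (n_services - 1) // 2

# With a central broker, each service needs only one link to the queue.
def broker_links(n_services: int) -> int:
    return n_services

for n in (5, 20, 50):
    print(f"{n} services: {point_to_point_links(n)} direct links vs {broker_links(n)} broker links")
```

At 50 services that is 1225 direct channels versus 50 broker connections, which is why the broker model stays manageable as the system grows.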
Fundamentals of Event-Based Architecture
From the application's business end, the main purpose of the event-driven approach is scalability.
From a development end, its main purpose is enabling teams to build services that are loosely coupled, asynchronous, and modular.
Key components
High level Components:
Message queue: The appliance through which the other components relay messages to each other.
Producers/Publishers: Components that create the events which are sent to the message queue.
Subscribers/Consumers : Components that read the events from the message queue.
Contracts: An agreement between component owners on what each will produce or consume for the purpose of building the application.
Event: The actual message transferred between publisher and subscriber. Events can be JSON, Protobufs, primitive data types, and many other formats.
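As a small illustration of what an event might look like on the wire, here is a sketch using JSON (the event schema and field names are made up for this example):

```python
import json

# A hypothetical "order created" event; the schema here is illustrative.
# Real contracts are agreed between producer and consumer owners.
event = {
    "type": "order.created",
    "order_id": "A-1042",
    "amount": 49.99,
}

# The producer serializes the event to bytes before publishing it...
payload = json.dumps(event).encode("utf-8")

# ...and the consumer deserializes it back on the other side.
received = json.loads(payload.decode("utf-8"))
assert received == event
```

The same round trip applies to Protobuf or any other format; only the serializer/deserializer pair changes.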
Benefits and challenges
Like any other architectural choice, this one has multiple advantages but also drawbacks.
Benefits include scalability, modularity, and loosely coupled components.
Challenges occur when systems are not designed correctly, or when infrastructure hurdles arise from resource constraints or other limitations.
Incorrectly designed event-driven systems usually lead to huge tech debt, many bugs, and a lot of back and forth between service owners.
Comparison with traditional request-response models
Traditional request-response models use point-to-point communication, usually over REST or gRPC.
REST carries extra overhead on each request, while gRPC is comparatively lighter, but the two are quite similar in nature.
The problem is that they become overwhelming as the number of messages or services increases: each message has to travel independently, putting stress on the network (here, the Kubernetes network).
A message queue, on the other hand, keeps messages on a shared queue, like a river from which multiple households draw water through intake pipes dipped in from the bank.
Advantages of message queues:
Services are not responsible for handling message movement.
If a consuming service goes down at some point in time, the messages wait in the queue until the service comes back up. In a request-response model, messages sent during the outage would be lost.
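That buffering behaviour can be sketched with a plain in-memory queue (a deque standing in for the broker; no real queue involved):

```python
from collections import deque

queue = deque()  # stand-in for the message queue

# The producer keeps publishing even while the consumer is down.
for i in range(3):
    queue.append(f"event-{i}")

# When the consumer comes back up, it drains the backlog; nothing is lost.
processed = []
while queue:
    processed.append(queue.popleft())

print(processed)  # ['event-0', 'event-1', 'event-2']
```

In a request-response model there is no such buffer, so those three events would simply have failed.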
Use cases in modern applications
Event-driven architecture is useful when the application can tolerate asynchronous behaviour.
This is because messages are processed from queues, first come, first served. If the architecture demands immediate action on a request, this architecture may not be suitable.
Use cases:
Analytical services for any app: analytics rarely requires immediate processing of the events/messages that occur.
Requests that can take time to process: for example, recording the timestamp at which a user is watching a Netflix show, so playback can resume from that point the next time it is played; delays are acceptable.
Job-based workloads: actions on systems external to the application, like enterprise on-prem components talking to their cloud services.
ChatGPT's image generation: since these workloads run asynchronously, event-driven design is an ideal way to schedule tasks across internal services.
Cab-hailing geolocation services: a delay of 30-40 seconds in a cab's exact location is acceptable to users.
Applications that should store a history of events/messages, for example for machine learning and AI training.
Examples where it shouldn’t be used:
Chat applications: users will feel the delay if the messaging backend goes through message queues.
Trading: asset trading needs millisecond precision. Without immediate processing, everything runs asynchronously, and message queues cannot guarantee that kind of latency.
Overview of Apache Kafka
Kafka is an open-source project from Apache for implementing message queues.
Introduction to Apache Kafka
Apache Kafka acts as the message queue between services, applications, appliances, devices, and more.
When building scalable containerized applications, we use Kubernetes. Kubernetes is an ideal place to apply event-driven architecture because of its scalable nature.
Kafka can run as a service inside Kubernetes itself, or it can be deployed as its own cluster for higher performance and throughput.
When deploying to the cloud, it is recommended to use the provider's managed service or Confluent Kafka, since configuration can be an overhead. Kafka should be owned and maintained by the org's infrastructure team, just like any other infrastructure component.
Kafka's architecture
In Kafka, messages are segregated by "topics". Topics are an abstraction of lanes on the queue through which messages pass. Each message/event is tagged with the name of the topic it is sent to, so that subscribers of that topic can pick it up and act on it.
Kafka communication happens between two kinds of entities: the producer, which triggers the event and emits the message, and the consumer, which reads it. Subscribers are consumers that opt in to read the messages of a topic; every message published to a topic is delivered to that topic's subscribers.
This model is highly scalable in the cloud world because there is a central stream of data handled by one entity (Kafka), from which individual services can pick the messages from the topics they want to act upon and do their work.
Kafka, like any message queue, has its own software infrastructure. It is a distributed system for managing events. A key component is ZooKeeper, a distributed coordination service that maintains the leader of the Kafka cluster (newer Kafka versions can replace it with KRaft). Kafka brokers are the components that actually handle the movement of messages.
AWS provides MSK (Managed Streaming for Apache Kafka), a managed Kafka service that integrates well with workloads running on EKS (Elastic Kubernetes Service).
Consumer groups
A consumer group combines event subscribers into a set that acts as a single logical consumer: each event delivered to the group is processed by only one consumer in the group.
This avoids the same event being consumed twice within the application.
Kubernetes has the concept of replicas for scaling an application. Replicas of the same consumer service can be placed in a single consumer group, which helps scale the app when the rate of produced events crosses a threshold.
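The "one event, one handler" behaviour of a consumer group can be mimicked with a toy round-robin dispatcher (a simplification: real Kafka assigns partitions to members, not individual events):

```python
import itertools

# Three replicas of the same consumer service, all in one group.
group_members = ["consumer-a", "consumer-b", "consumer-c"]

# Deliver each event to exactly ONE member of the group.
assignment = {}
rr = itertools.cycle(group_members)
for event_id in range(6):
    assignment[event_id] = next(rr)

print(assignment)
# {0: 'consumer-a', 1: 'consumer-b', 2: 'consumer-c',
#  3: 'consumer-a', 4: 'consumer-b', 5: 'consumer-c'}
```

Plain pub/sub without groups would instead hand every event to all three replicas, triplicating the work.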
Partitions
Topics are divided into partitions, which provide scalability, fault tolerance, and parallel processing. Each consumer in a group latches onto one or more partitions of a topic.
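For keyed messages, Kafka picks the partition from a hash of the key (murmur2 in the default Java client). A simplified stand-in using CRC32 shows the important property, namely that the same key always lands on the same partition:

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Simplified partitioner; real Kafka clients use murmur2, but the
    # property is the same: a stable key-to-partition mapping.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one user go to one partition, preserving per-key order.
assert partition_for("user-42") == partition_for("user-42")
```

This is also why per-key ordering is guaranteed only within a partition, not across the whole topic.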
Designing the architecture
Event-based architecture should be implemented where data trends need attention, speed is not the main goal, modularity of services is a concern, and services do not care about the response to a request.
The architecture has to be fault tolerant, with defensive code that is less aggressive in taking action and more careful in considering that an event may be wrong or missed.
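In code, that defensiveness usually means treating every incoming event as potentially malformed. A minimal sketch (plain Python, no broker; the schema check is just an example):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)

def handle_event(raw):
    # Never assume the producer sent a valid payload.
    try:
        event = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        logging.warning("Dropping malformed event: %r", raw)
        return None
    # Minimal contract check before acting on the event.
    if "type" not in event:
        logging.warning("Dropping event without a type: %r", event)
        return None
    return event

assert handle_event(b'{"type": "ping"}') == {"type": "ping"}
assert handle_event(b"not json") is None
```

Dropping (or dead-lettering) a bad event and logging it is usually safer than letting one poisoned message crash the consumer loop.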
Playing around with Kafka in Kubernetes: Step-by-step guide
Let's walk through a small event-driven tutorial.
For this tutorial, start with a kind cluster and deploy Kafka using the Kafka Kubernetes manifests.
Set up two Python streaming services: one producer and one consumer.
The consumer service will run with 3 replicas, demonstrating Kafka's consumer groups in action.
Below are the YAML manifests for ZooKeeper and Kafka, along with sample Python code (producer/consumer), Dockerfiles, and Kubernetes manifests to deploy with replicas.
Start with a K8s cluster; we use kind here:
kind create cluster
Once the cluster is created, verify the node is up:
kubectl get nodes
Organize your files as follows:
eventdriven
├── consumer
│   ├── consumer.py
│   ├── deploy.yaml
│   └── Dockerfile
├── kafka
│   ├── kafka.yaml
│   └── zookeeper.yaml
├── Makefile
└── producer
    ├── deploy.yaml
    ├── Dockerfile
    └── producer.py
Makefile
PRODUCER_DIR=producer
CONSUMER_DIR=consumer
REGISTRY=eventdriven

.PHONY: all build-images build-producer build-consumer load-images-to-kind deploy-apps deploy-producer deploy-consumer delete-deploy-apps deploy-zookeeper deploy-kafka delete-kafka delete-all

all: build-producer build-consumer deploy-apps

build-images: build-producer build-consumer

build-producer:
	docker build -t $(REGISTRY)/producer $(PRODUCER_DIR)

build-consumer:
	docker build -t $(REGISTRY)/consumer $(CONSUMER_DIR)

load-images-to-kind: build-images
	kind load docker-image $(REGISTRY)/producer
	kind load docker-image $(REGISTRY)/consumer

deploy-apps: deploy-consumer deploy-producer

deploy-producer: load-images-to-kind
	kubectl apply -f $(PRODUCER_DIR)/deploy.yaml

deploy-consumer: load-images-to-kind
	kubectl apply -f $(CONSUMER_DIR)/deploy.yaml

delete-deploy-apps:
	kubectl delete -f $(PRODUCER_DIR)/deploy.yaml
	kubectl delete -f $(CONSUMER_DIR)/deploy.yaml

deploy-zookeeper:
	kubectl apply -f kafka/zookeeper.yaml

deploy-kafka:
	kubectl apply -f kafka/kafka.yaml

delete-kafka:
	kubectl delete -f kafka/zookeeper.yaml
	kubectl delete -f kafka/kafka.yaml

delete-all: delete-deploy-apps delete-kafka
Zookeeper Deployment (zookeeper.yaml):
apiVersion: v1
kind: Service
metadata:
  labels:
    app: zookeeper
  name: zookeeper-service
spec:
  type: ClusterIP
  ports:
    - name: client
      port: 2181
      targetPort: 2181
  selector:
    app: zookeeper
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: zookeeper
  name: zookeeper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: wurstmeister/zookeeper
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2181
Kafka Deployment (kafka.yaml):
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka
  name: kafka-service
spec:
  ports:
    - port: 9092
  selector:
    app: kafka
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kafka
  name: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      hostname: kafka
      containers:
        - name: kafka
          image: wurstmeister/kafka
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9092
          env:
            - name: KAFKA_BROKER_ID
              value: "1"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zookeeper-service:2181
            - name: KAFKA_LISTENERS
              value: PLAINTEXT://:9092
            - name: KAFKA_ADVERTISED_LISTENERS
              value: PLAINTEXT://kafka-service:9092
Python Producer (producer.py):
import sys
from kafka import KafkaProducer
import json
import time
import os
import logging

KAFKA_BROKER = os.environ.get("KAFKA_BROKER", "kafka-service:9092")
TOPIC = os.environ.get("KAFKA_TOPIC", "demo-topic")

# Set up logging to stdout so kubectl logs can capture it
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logging.info("Starting Producer...")

producer = KafkaProducer(
    bootstrap_servers=[KAFKA_BROKER],
    value_serializer=lambda v: json.dumps(v).encode("utf-8")
)

i = 0
while True:
    msg = {"number": i}
    producer.send(TOPIC, msg)
    logging.info(f"Produced: {msg}")
    time.sleep(5)
    i += 1
Dockerfile for Producer
FROM python:3.9-slim
WORKDIR /app
COPY producer.py .
RUN pip install kafka-python
CMD ["python", "producer.py"]
Producer deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: producer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: producer
  template:
    metadata:
      labels:
        app: producer
    spec:
      containers:
        - name: producer
          image: eventdriven/producer:latest
          imagePullPolicy: Never
          env:
            - name: KAFKA_BROKER
              value: kafka-service:9092
            - name: KAFKA_TOPIC
              value: demo-topic
Python Consumer (consumer.py):
from kafka import KafkaConsumer
import json
import os
import logging
import sys

KAFKA_BROKER = os.environ.get("KAFKA_BROKER", "kafka-service:9092")
TOPIC = os.environ.get("KAFKA_TOPIC", "demo-topic")
GROUP = os.environ.get("KAFKA_GROUP", "demo-group")

# Set up logging to stdout so kubectl logs can capture it
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logging.info("Starting Consumer...")

consumer = KafkaConsumer(
    TOPIC,
    group_id=GROUP,
    bootstrap_servers=[KAFKA_BROKER],
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    enable_auto_commit=True
)

for msg in consumer:
    logging.info(f"Consumed: {msg.value}")
Dockerfile for Consumer
FROM python:3.9-slim
WORKDIR /app
COPY consumer.py .
RUN pip install kafka-python
CMD ["python", "consumer.py"]
Consumer deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
        - name: consumer
          image: eventdriven/consumer:latest
          imagePullPolicy: Never
          env:
            - name: KAFKA_BROKER
              value: kafka-service:9092
            - name: KAFKA_TOPIC
              value: demo-topic
            - name: KAFKA_GROUP
              value: demo-group
Commands
make deploy-zookeeper
# wait for zookeeper to come up
make deploy-kafka
# these deployments will take time as they need to pull the images
[Screenshot: Kafka ready]
Bringing up the apps
make deploy-apps
# this takes time as docker needs to build the apps
Once the apps are up, check the logs using:
kubectl logs <pod-name> -f
[Screenshot, four terminal columns: producer logs; consumer 1 logs; consumer 2 logs; consumer 3 logs]
As you can see, all events are being read only by the first consumer. This is because demo-topic is created with 1 partition by default, so only one consumer in the group can latch onto it.
To fix this, let's configure the topic with 3 partitions. First, exec into the Kafka pod and inspect the consumer group:
kubectl exec -it <kafka-pod> -- bash
root@kafka:/# kafka-consumer-groups.sh --describe --group demo-group --bootstrap-server localhost:9092
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
demo-group demo-topic 0 33 34 1 kafka-python-2.2.15-4773fc7e-6f67-4799-9dab-9364e9e8b7aa /10.244.0.111 kafka-python-2.2.15
Configuring 3 partitions:
root@kafka:/# kafka-topics.sh --alter --topic demo-topic --partitions 3 --bootstrap-server localhost:9092
After around a minute, check the group again:
root@kafka:/# kafka-consumer-groups.sh --describe --group demo-group --bootstrap-server localhost:9092
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
demo-group demo-topic 2 7 7 0 kafka-python-2.2.15-f0a58b19-5c29-4f66-9ddd-9da4e4d94573 /10.244.0.113 kafka-python-2.2.15
demo-group demo-topic 0 66 66 0 kafka-python-2.2.15-4773fc7e-6f67-4799-9dab-9364e9e8b7aa /10.244.0.111 kafka-python-2.2.15
demo-group demo-topic 1 2 2 0 kafka-python-2.2.15-8c5fefd1-d653-4b27-af1c-aa13e6b60032 /10.244.0.110 kafka-python-2.2.15
Events start flowing to the other consumers as well.
[Video: event flow, the producer sends, then Kafka distributes to partition 0, partition 1, and later partition 2]
Cleaning everything
make delete-all
Challenges faced
The tutorial above is a simple way to see events moving, but in a big, complex mesh of services things can get very complicated because of the volume of data moving between services.
Issues such as:
Event loss
Data corruption
Schema violations
Event processing failures at the consumer
can lead to time-consuming debugging.
Kafka's operational maintenance at scale is also a headache, and depending on your replication and retention settings, your cloud bills can get huge.
P.S. Application architecture plays a key role in how smoothly you can deploy and maintain your services in the cloud, and in having peaceful weekends as a software engineer.
Conclusion
Event-driven architecture enables systems to be scalable, resilient, and responsive, making it a popular choice across the tech industry. Its loosely-coupled design is beginner-friendly and future-proof. Adopting this approach helps teams build robust applications that adapt to changing needs.
Written by Abhiram Puranik