Apache Kafka Architecture and Components: Setting Up with Docker Compose on Windows 11


Let’s start by answering the question “What is Kafka?”.
Apache Kafka is an open-source stream processing platform used for building real-time data pipelines and streaming applications. It is a highly scalable, fault-tolerant distributed system.
Why do we need Kafka, and how does it solve problems?
Traditional Approach: Databases and Message Queues
In many traditional systems, you might use a relational database or a message queue to handle communication between components or to store and process data. However, these approaches have limitations when dealing with real-time, high-throughput data.
How Kafka is Used in Real-World Systems like Uber and Ola
Traditional System (Without Kafka):
The Passenger and Driver location updates could be sent through API calls to a centralized system or database.
Real-time ride matching would involve querying a database to check for available drivers, which may involve significant delays in high-traffic areas.
Notifications would need to be queued and sent at a later time (or when the conditions are met), and delays in processing may occur during high demand.
Kafka-Enabled System:
- Driver and passenger locations are continuously sent to Kafka topics (e.g., driver-location and passenger-location).
- Each location update from both the driver and the passenger is published as an event in Kafka, which is then persisted for a configurable retention period.
Real-time Streaming: The location data is now processed in real time as an event stream, and Kafka sustains high throughput without requiring API calls to a database.
No Delay in Location Updates: The system doesn't need to wait for API calls or database updates to be processed. The data is immediately available in Kafka topics and can be consumed by different systems.
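To make this concrete, here is a minimal sketch of publishing one location update as an event, using Kafka's console producer (covered later in this post). The topic name driver-location and the JSON payload are purely illustrative, and the command assumes a broker reachable at localhost:9092, for example from inside the Kafka container set up below:
# Publish one driver location update as an event to the driver-location topic
echo '{"driverId": "D123", "lat": 19.07, "lon": 72.87}' | ./kafka-console-producer.sh --broker-list localhost:9092 --topic driver-location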
Apache Kafka Architecture:
Apache Kafka Components:
1. Kafka Producer
The producer is the source of data. It does not send messages directly to consumers; instead, it pushes messages to the Kafka server (broker), where they are stored. Multiple producers can send messages to the same Kafka topic or to different topics.
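As a quick illustration, the console producer below pushes messages to a broker rather than to any consumer. This is a minimal sketch assuming you are inside the Kafka container's bin directory (see the setup section later in this post); the topic name orders is just an example:
# Start an interactive producer; each line you type is sent to the broker as a message
./kafka-console-producer.sh --broker-list localhost:9092 --topic orders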
2. Kafka Consumer
The consumer acts as the receiver: it is responsible for consuming messages. It does not receive messages directly from the producer; the producer pushes messages to the Kafka broker, and the consumer pulls them from there.
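The matching console consumer pulls those messages from the broker, never from the producer itself. Again, a sketch assuming the same container setup and the example orders topic:
# Read every message stored in the topic, starting from the earliest offset
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders --from-beginning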
3. Kafka Broker
A broker is a server that is responsible for storing and managing Kafka messages. It receives messages from producers and delivers them to consumers.
Kafka topics are divided into partitions, and each broker stores one or more partitions. Each partition is replicated across multiple brokers to ensure fault tolerance and high availability.
The consumer can then retrieve data from these partitions by connecting to one or more brokers.
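You can see this layout for yourself with the topic describe command, which lists the leader broker, replicas, and in-sync replicas for each partition. This is a sketch assuming the single-broker demo cluster set up later in this post (so every partition will show a replication factor of 1) and the example orders topic:
# Show which broker leads each partition and where its replicas live
./kafka-topics.sh --describe --zookeeper zookeeper:2181 --topic orders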
4. Kafka Topic
A Topic is a logical channel to which Kafka producers send data and from which consumers read data. Kafka topics are split into partitions for scalability.
A Kafka topic is a unique name given to a data or message stream. Conceptually it resembles a queue in a database or message-queue system; once data is written to a topic, it becomes available to consumers through the brokers that host its partitions. Topics are named by the user, and a Kafka cluster can contain thousands of them.
5. Kafka Partition
A Partition is a unit of parallelism in Kafka. Kafka topics are divided into partitions to allow for parallel processing and to increase the throughput of the system.
Partitions allow Kafka to distribute data across multiple brokers; each partition of a topic can be stored on a different broker in the cluster.
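For example, the command below creates a topic with three partitions so that messages can be consumed in parallel. The topic name ride-events is illustrative, and the replication factor is 1 only because the demo cluster later in this post runs a single broker:
# Create a topic split into 3 partitions for parallel consumption
./kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 3 --topic ride-events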
6. Kafka Consumer Group
A Consumer Group is a collection of consumers that work together to read data from Kafka topics. Each consumer in a group reads data from different partitions of the topic to provide parallel processing.
Kafka tracks the offsets for each consumer group to keep track of which messages have been processed, so the group can resume reading from the last acknowledged message if any consumer fails.
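As a sketch, starting two copies of the console consumer with the same group id makes Kafka split the topic's partitions between them, and the consumer-groups tool shows the committed offsets and lag per partition. The group name ride-processors and topic ride-events are illustrative:
# Run this in two terminals with the same group id; each consumer gets a share of the partitions
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ride-events --group ride-processors
# Inspect the group's committed offsets and lag for each partition
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group ride-processors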
7. Kafka Zookeeper
Zookeeper is a distributed coordination system that Kafka uses to manage its metadata and keep track of the cluster's state. Kafka relies on Zookeeper to manage broker metadata (which brokers exist, which partitions are assigned to which brokers, etc.).
Zookeeper ensures that the Kafka cluster remains highly available and consistent, even in the case of broker failures.
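If you are curious, you can peek at the metadata Kafka registers in Zookeeper with the zookeeper-shell script that ships in the same Kafka bin directory; listing /brokers/ids should show the id of each registered broker. This assumes the Docker setup described below, where Zookeeper is reachable as zookeeper:2181:
# List the ids of the brokers currently registered in Zookeeper
./zookeeper-shell.sh zookeeper:2181 ls /brokers/ids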
Setting Up a Kafka Cluster on Windows 11 with Docker Compose
Stage 1: Install and Set Up Docker Desktop on Your Windows Machine
Step 1: Docker on Windows requires WSL, so install WSL on your machine before setting up Docker.
Step 2: Install WSL (Windows Subsystem for Linux)
To run Docker, you need to install WSL. Open PowerShell and run the following command:
wsl.exe --install
To verify if WSL is installed successfully, run the following commands in command prompt:
wsl --version
wsl --help
Step 3: Enable Windows Features for WSL
To enable the necessary Windows features for WSL, open the Search menu, type “Turn Windows features on or off,” and select it.
In the window that opens, ensure that the following features are enabled:
- Windows Subsystem for Linux
Click OK and restart your computer if prompted.
Step 4: Check if Virtualization is Enabled
You need to ensure that virtualization is enabled on your machine. If it's not enabled, you’ll need to turn it on in the BIOS/UEFI settings.
To check if virtualization is enabled:
Open Task Manager (you can do this by pressing Ctrl + Shift + Esc).
Go to the Performance tab.
Under the CPU section, check if Virtualization is listed as "Enabled." (Refer to the screenshot below for guidance.)
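Alternatively, you can check from the command line: running systeminfo in Command Prompt prints a "Hyper-V Requirements" section that should include a line such as "Virtualization Enabled In Firmware: Yes" (the exact wording can vary by Windows build):
systeminfo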
Step 5: Install Docker Desktop
Go to the official Docker website and download the Docker Desktop installer using the following link:
Docker Desktop for Windows
Once the installer has finished downloading, navigate to your Downloads folder and run the Docker installer.
After the installation is finished, Docker will be installed and ready to use on your machine.
Step 6: Restart Your Device and Verify Docker Installation
Restart your computer to ensure Docker is properly set up.
After the restart, open a terminal (Command Prompt or PowerShell) and run the following commands to verify that Docker was installed successfully:
docker --version
docker info
If Docker is installed correctly, these commands should display information about your Docker installation.
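As an optional extra check, you can run Docker's small hello-world test image; if Docker can pull and run it, your installation works end to end:
docker run hello-world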
Install and Run Kafka with Docker Desktop Using Docker Compose
Step 1: Set Up Kafka Using a docker-compose.yml File
To set up Kafka, we will use a docker-compose.yml file to deploy Kafka and Zookeeper. Below is a simple Docker Compose file that defines the services:
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9093:9093"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: INSIDE for clients on the Docker network, OUTSIDE for the host
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9092,OUTSIDE://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9092,OUTSIDE://localhost:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    depends_on:
      - zookeeper
Zookeeper: Kafka depends on Zookeeper for cluster coordination. This service uses the wurstmeister/zookeeper image and exposes port 2181.
Kafka: This service uses the wurstmeister/kafka image and exposes port 9093 to the host. The environment variables configure Kafka's listeners (an internal one for containers on the Docker network and an external one for the host) and tell it how to connect to the Zookeeper service.
Step 2: Run Kafka and Zookeeper Using Docker Compose
Save the docker-compose.yml file on your system. Open Command Prompt or PowerShell, and navigate to the directory where the docker-compose.yml file is located. Then run the following command to start Kafka and Zookeeper in detached mode:
docker-compose -f docker-compose.yml up -d
This command will start both the Kafka and Zookeeper services as defined in the docker-compose.yml file.
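To confirm that both services came up, you can list the running containers and tail the Kafka logs (the container name kafka comes from the container_name setting in the Compose file above):
docker ps
docker logs kafka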
Step 3: Create Kafka Components for Data Processing
To create Kafka components like topics, partitions, replication, producers, and consumers, follow these steps:
First, enter the Kafka container by running the following command:
docker exec -it <containerID> /bin/sh
Replace <containerID> with the actual ID or name of the Kafka container. You can find the container ID by running docker ps.
Once inside the container, navigate to the Kafka bin directory. This is typically located at:
cd /opt/kafka/bin/
In the bin directory, you'll find all the necessary Kafka scripts for managing topics, partitions, replication, producers, and consumers. You can use these scripts to configure and interact with your Kafka cluster.
Step 4: To create Kafka topics, configure replication and partitions, produce messages, and consume messages, run the following commands:
Create a Kafka Topic
This command creates a new topic named "delivery" with one partition and a replication factor of 1:
./kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic delivery
List All Kafka Topics
To list all available Kafka topics:
./kafka-topics.sh --list --zookeeper zookeeper:2181
Produce Messages to Kafka Topics
Send a message to the bank-transactions topic (this topic was not created above; with the broker's default settings, Kafka will auto-create it on first use because automatic topic creation is enabled by default):
echo "your bank data" | ./kafka-console-producer.sh --broker-list localhost:9092 --topic bank-transactions
Send a message to the delivery topic:
echo "your location data" | ./kafka-console-producer.sh --broker-list localhost:9092 --topic delivery
Consume Messages from Kafka Topics
To consume messages from the delivery topic starting from the beginning:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic delivery --from-beginning
Produce Another Message to the Bank Transactions Topic
To send another message to the bank-transactions topic:
echo "your bank account is null" | ./kafka-console-producer.sh --broker-list localhost:9092 --topic bank-transactions
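As a quick check that the producers above worked, you can replay everything stored in the bank-transactions topic from the beginning; you should see both messages that were sent:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic bank-transactions --from-beginning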
If you encounter any issues during setup or have additional questions, check out the official Kafka Documentation or refer to community forums and resources for troubleshooting.
In the next blog, I will walk you through setting up a complete end-to-end project for a distributed system, integrating Kafka with other components to build a robust data pipeline.
Stay tuned for the next update, where we will dive deeper into distributed architectures and practical use cases.
Don't forget to follow me on LinkedIn for future updates!