Step-by-Step Guide: How to Set Up a Kafka Cluster for High-Performance Distributed Data Processing

This guide is written for Ubuntu 22.04. On other Linux distributions some commands may differ, so check the equivalents for your distribution.

Kafka & ZooKeeper Installation Steps

In this guide, we set up a 3-node cluster on Ubuntu 22.04.

  1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache Kafka website (https://kafka.apache.org/downloads).

  2. Java must be installed before installing Kafka (skip if already installed):

# update the apt package index
sudo apt-get update

# install the JDK on the local system
sudo apt-get install default-jdk -y

# check the Java version (no sudo needed)
java --version
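  3. Set JAVA_HOME system-wide and export it in the current shell:
# append JAVA_HOME to /etc/profile (tee runs as root; a plain ">>" redirect runs as the unprivileged user and fails)
echo 'export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"' | sudo tee -a /etc/profile

# export JAVA_HOME in the current shell session
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"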
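The hard-coded path above assumes OpenJDK 11 on an amd64 machine. A minimal sketch for deriving the value from whichever java binary is on the PATH follows; java_home_from_bin is a hypothetical helper name used for illustration, not part of any JDK tooling.

```shell
#!/usr/bin/env bash
# java_home_from_bin: hypothetical helper. Strips the trailing /bin/java
# from a resolved java binary path to obtain a JAVA_HOME candidate.
java_home_from_bin() {
  local java_bin="$1"
  # Remove the shortest trailing match of "/bin/java".
  printf '%s\n' "${java_bin%/bin/java}"
}

# Resolve symlinks first (/usr/bin/java usually points into /usr/lib/jvm/...).
if command -v java >/dev/null 2>&1; then
  real_java="$(readlink -f "$(command -v java)")"
  echo "Detected JAVA_HOME candidate: $(java_home_from_bin "$real_java")"
else
  echo "java not found on PATH; install the JDK first" >&2
fi
```

If the detected value differs from the path used in this guide, use the detected one in /etc/profile and in the kafka.service unit.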
  4. Create folders for Kafka & ZooKeeper (/opt/data/ is our installation directory):
# create the kafka and zookeeper folders under /opt/data
# (-p also creates /opt/data if it is missing; a bare "cd" would go to your home directory, not to /)
sudo mkdir -p /opt/data/kafka /opt/data/zookeeper

# navigate to the kafka folder
cd /opt/data/kafka

# download the Kafka release (ZooKeeper is bundled with it)
# if this URL returns 404, older releases are moved to https://archive.apache.org/dist/kafka/
sudo wget https://downloads.apache.org/kafka/3.6.1/kafka_2.12-3.6.1.tgz

# extract the downloaded .tgz file
sudo tar xzf kafka_2.12-3.6.1.tgz

# move the extracted contents into the kafka dir => /opt/data/kafka
sudo mv kafka_2.12-3.6.1/* /opt/data/kafka
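Before extracting a downloaded archive, it is worth comparing its SHA-512 digest with the one published next to the release on the Apache download page. The sketch below assumes you paste the published digest in by hand; verify_sha512 is a hypothetical helper name. It ignores case and whitespace in the expected value, since published digests may be uppercase or grouped with spaces.

```shell
#!/usr/bin/env bash
# verify_sha512: hypothetical helper. Compares a file's SHA-512 digest with an
# expected hex string, ignoring case and whitespace in the expected value.
verify_sha512() {
  local file="$1" expected="$2" actual
  # Normalize the expected digest: drop whitespace, lowercase the hex digits.
  expected="$(printf '%s' "$expected" | tr -d '[:space:]' | tr 'A-F' 'a-f')"
  # sha512sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha512sum "$file" | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the digest from the download page):
# verify_sha512 kafka_2.12-3.6.1.tgz "<published digest>" && echo "checksum OK"
```

The function returns a non-zero status on mismatch, so it can gate the tar step with && in a script.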
  5. Create the zookeeper.service unit file in systemd:
# create a file named "zookeeper.service" in "/etc/systemd/system/"
sudo nano /etc/systemd/system/zookeeper.service

# copy the content below into the file created above
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/opt/data/kafka/bin/zookeeper-server-start.sh /opt/data/kafka/config/zookeeper.properties
ExecStop=/opt/data/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
  6. Create the kafka.service unit file in systemd:
# create a file named "kafka.service" in "/etc/systemd/system/"
sudo nano /etc/systemd/system/kafka.service

# copy the content below into the file created above
# (After= is needed as well as Requires=, otherwise systemd may start both units at the same time)
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/opt/data/kafka/bin/kafka-server-start.sh /opt/data/kafka/config/server.properties
ExecStop=/opt/data/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
  7. Reload the systemd daemon so that it picks up the new Kafka and ZooKeeper units:
# reload systemd so that the kafka & zookeeper unit files are loaded
sudo systemctl daemon-reload
  8. Create the myid file in /opt/data/zookeeper (the folder created earlier). Run only the line that matches the node you are on.
# navigate to /opt/data/zookeeper
cd /opt/data/zookeeper

# write myid value 1 (on server 1 only); tee is used because the folder is owned by root
echo '1' | sudo tee myid

# write myid value 2 (on server 2 only)
echo '2' | sudo tee myid

# write myid value 3 (on server 3 only)
echo '3' | sudo tee myid
  9. Update the ZooKeeper configuration in zookeeper.properties.
# open the file for editing
sudo nano /opt/data/kafka/config/zookeeper.properties

# in zookeeper.properties, set the fields below
################### CONFIG_START ################

dataDir=/opt/data/zookeeper
clientPort=2181
admin.enableServer=false
maxClientCnxns=300
tickTime=2000
# server.N=<IP address of node N>:2888:3888
# (keep comments on their own lines; a trailing comment becomes part of the value)
server.1=10.103.5.7:2888:3888
server.2=10.103.5.8:2888:3888
server.3=10.103.5.9:2888:3888
initLimit=40
syncLimit=20

################### CONFIG_END ################

Note: Use the same ZooKeeper configuration on all nodes; nothing in this file is node-specific.
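The myid value on each node has to match the N of that node's server.N line in zookeeper.properties. As a sketch of how to avoid typing the wrong number, the hypothetical helper below (myid_for_ip is an illustrative name) looks the id up from the file by IP address.

```shell
#!/usr/bin/env bash
# myid_for_ip: hypothetical helper. Prints N for the "server.N=IP:..." line in
# a zookeeper.properties-style file whose host part equals the given IP.
# Prints nothing if no line matches.
myid_for_ip() {
  local ip="$1" props="$2"
  awk -F= -v ip="$ip" '
    /^server\.[0-9]+=/ {
      split($1, key, ".")    # key[2] is N from "server.N"
      split($2, addr, ":")   # addr[1] is the host part of "host:2888:3888"
      if (addr[1] == ip) { print key[2]; exit }
    }
  ' "$props"
}

# Usage on a node (assumes the first address from "hostname -I" is the one
# listed in the file; check this on multi-homed hosts):
# myid_for_ip "$(hostname -I | awk '{print $1}')" \
#   /opt/data/kafka/config/zookeeper.properties | sudo tee /opt/data/zookeeper/myid
```

An empty result means the node's address is not listed in the file, which is worth catching before starting the service.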
  10. Update the Kafka configuration in server.properties.
# open the file for editing
sudo nano /opt/data/kafka/config/server.properties

# in server.properties, edit the fields below
# (keep comments on their own lines; a trailing comment becomes part of the value)
################### CONFIG_START ################

# in the Server Basics block, set the broker id (use only the line for the node you are on)
# node-1:
broker.id=0
# node-2:
broker.id=1
# node-3:
broker.id=2

# in the Socket Server Settings block, set listeners to the node's own IP address
# node-1:
listeners=PLAINTEXT://10.103.5.7:9092
# node-2:
listeners=PLAINTEXT://10.103.5.8:9092
# node-3:
listeners=PLAINTEXT://10.103.5.9:9092

# in the Log Retention Policy block, make sure the line below is uncommented
log.segment.bytes=1073741824

# in the Zookeeper block, comment out the default line:
# zookeeper.connect=localhost:2181

# and add this line listing all three nodes:
zookeeper.connect=10.103.5.7:2181,10.103.5.8:2181,10.103.5.9:2181

################### CONFIG_END ################

Note: Some Kafka settings are node-specific, e.g. broker.id and listeners.
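Because broker.id and listeners differ per node while zookeeper.connect is shared, the three values can be generated from the node number and the list of IPs. This is only a sketch: render_broker_settings is a hypothetical helper, and it assumes the ports (9092, 2181) and the 0-based broker ids used in this guide.

```shell
#!/usr/bin/env bash
# render_broker_settings: hypothetical helper. Given a 1-based node number and
# the cluster IPs in node order, prints that node's broker.id and listeners
# plus the shared zookeeper.connect string.
render_broker_settings() {
  local node="$1"; shift
  local ip zk="" node_ip="" i=0
  for ip in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$node" ]; then node_ip="$ip"; fi
    # Append "ip:2181", comma-separated after the first entry.
    zk="${zk:+$zk,}$ip:2181"
  done
  if [ -z "$node_ip" ]; then
    echo "node $node is out of range" >&2
    return 1
  fi
  printf 'broker.id=%s\n' "$((node - 1))"
  printf 'listeners=PLAINTEXT://%s:9092\n' "$node_ip"
  printf 'zookeeper.connect=%s\n' "$zk"
}

# Print the settings for node-2 of the example cluster.
render_broker_settings 2 10.103.5.7 10.103.5.8 10.103.5.9
```

The output is meant to be compared against, or pasted into, server.properties by hand; the helper does not edit the file.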
  11. Start the ZooKeeper and Kafka services on all three nodes.
# start the zookeeper and kafka services
sudo systemctl daemon-reload
sudo systemctl start zookeeper.service
sudo systemctl start kafka.service
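The units declare WantedBy=multi-user.target, but that only takes effect once they are enabled, so a plain start does not survive a reboot. A hedged sketch of a start script that also enables the units follows; run_cmd and start_cluster_services are hypothetical helper names, and DRY_RUN defaults to 1 so the script only prints what it would run.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# real node to execute them.
DRY_RUN="${DRY_RUN:-1}"

# run_cmd: hypothetical helper that either echoes or executes its arguments.
run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# start_cluster_services: hypothetical helper for one node.
start_cluster_services() {
  run_cmd sudo systemctl daemon-reload
  # "enable --now" starts the unit immediately and on every subsequent boot.
  run_cmd sudo systemctl enable --now zookeeper.service
  run_cmd sudo systemctl enable --now kafka.service
}

start_cluster_services
```

Run it with DRY_RUN=0 on each node once the printed commands look right.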
  12. Execute all of the above steps and commands on all three nodes.

After successful configuration and service startup, Kafka should be up and running. You can check with: sudo systemctl status kafka.service --no-pager
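The systemctl status output only covers the local service. To see whether all three brokers joined the cluster, one option is to list the broker ids registered in ZooKeeper with the zookeeper-shell.sh script that ships with Kafka. The sketch below assumes the command's last bracketed line looks like [0, 1, 2]; count_registered_brokers is a hypothetical helper that counts the ids in that line.

```shell
#!/usr/bin/env bash
# count_registered_brokers: hypothetical helper. Reads zookeeper-shell output
# on stdin, finds the last line of the form "[0, 1, 2]" and prints how many
# ids it contains (0 if there is no such line or the list is empty).
count_registered_brokers() {
  local ids
  ids="$(grep -E '^\[[0-9, ]*\]$' | tail -n 1 | sed 's/[][ ]//g')"
  if [ -z "$ids" ]; then
    echo 0
    return
  fi
  # One id per line, then count the non-empty lines.
  printf '%s\n' "$ids" | tr ',' '\n' | grep -c .
}

# Usage on any node (expects 3 for the cluster in this guide):
# /opt/data/kafka/bin/zookeeper-shell.sh 10.103.5.7:2181 ls /brokers/ids \
#   | count_registered_brokers
```

A count below 3 usually points at a broker whose broker.id, listeners, or zookeeper.connect line needs another look.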

Feel free to ask questions related to this topic; I will be happy to help.

Connect with me: utkarshsri0701@gmail.com / serv-ar-tistry Studio

Written by

utkarsh srivastava