Step-by-Step Guide: Running Kafka Producers and Consumers on Oracle Cloud VPS with Databricks


In a recent Big Data Analytics lab assignment, we were tasked with ingesting (consuming) data into Databricks from an external system. To tackle this, I set up data producers on a cloud server, since such servers already have public IPs, which makes external communication much easier. After working through the configuration and testing, I put together this concise guide to help others set up a similar pipeline.
Allow External Connections on Your VPS
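First, enable external connections on your VPS. I've already covered that in a separate guide, which you can refer to here. For this setup, the port that must be reachable from outside is 9093, the external Kafka listener configured later in this guide.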
Setting Up Your Server
Log in to your Linux server using SSH
Install Java
sudo apt install default-jdk
Download Kafka
wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -xzf kafka_2.12-3.2.0.tgz
sudo mv kafka_2.12-3.2.0 /usr/local/kafka
Create systemd unit files
sudo nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
sudo nano /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
Modify the Java path in Environment="JAVA_HOME=<Your_Path>" as required. You can find it with the command below; JAVA_HOME is the printed path without the trailing /bin/java.
readlink -f $(which java)
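The readlink output points at the java binary itself, while the unit file wants the directory two levels up. As a minimal sketch, assuming the usual JDK layout where the binary sits at <JAVA_HOME>/bin/java, the path can be trimmed like this. The function name java_home_from_bin is a hypothetical helper, not part of any JDK tooling.

```shell
# java_home_from_bin: hypothetical helper that turns the resolved path of the
# java binary into a JAVA_HOME value by dropping the trailing /bin/java.
java_home_from_bin() {
    # Two dirname calls walk up two directory levels.
    dirname "$(dirname "$1")"
}

# On the server, feed it the real binary path (skipped if java is not installed).
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is what goes after JAVA_HOME= in kafka.service.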
Modify server config
nano /usr/local/kafka/config/server.properties
Paste the following under Socket Server Settings.
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://<YOUR_PUBLIC_IP>:9093
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT
Replace <YOUR_PUBLIC_IP> with the public IP of your VPS.
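The reason for two listeners is that a Kafka broker hands clients its advertised address in the metadata response, and clients then connect to that address. Local tools on the VPS use localhost:9092, while Databricks must be told the public IP on 9093. The sketch below only prints the three settings with the IP filled in, so they can be reviewed before pasting. kafka_listener_config is a hypothetical helper, and 203.0.113.10 is a documentation-only placeholder address.

```shell
# kafka_listener_config: hypothetical helper that prints the listener settings
# from this guide for a given public IP. It does not edit any file.
kafka_listener_config() {
    public_ip="$1"
    cat <<EOF
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://${public_ip}:9093
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT
EOF
}

# 203.0.113.10 is a placeholder; substitute the public IP of your VPS.
kafka_listener_config 203.0.113.10
```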
Reload systemd daemon
sudo systemctl daemon-reload
Start ZooKeeper & Kafka server
sudo systemctl start zookeeper
sudo systemctl start kafka
sudo systemctl status kafka
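Once the services report as active, it can help to confirm that both listener ports actually accept connections. The following is a rough sketch, assuming bash (with its /dev/tcp redirection) and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
# port_open: hypothetical helper. Succeeds if a TCP connection to host ($1)
# on port ($2) can be opened within 3 seconds, fails otherwise.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# On the VPS, both listeners should answer on localhost.
for port in 9092 9093; do
    if port_open localhost "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not reachable"
    fi
done
```

Running the same check from another machine against <YOUR_PUBLIC_IP> and port 9093 shows whether the firewall rules let external clients through.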
Create a Topic
/usr/local/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
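With the topic in place, the producer side can be exercised using the console producer that ships with Kafka. This is a minimal sketch, assuming Kafka lives in /usr/local/kafka as above; send_lines is a hypothetical helper and the sample messages are made up.

```shell
# Installation directory used in this guide; override KAFKA_HOME if yours differs.
KAFKA_HOME="${KAFKA_HOME:-/usr/local/kafka}"

# send_lines: hypothetical helper. Publishes each line read from stdin as one
# message to the topic named in $1, via the internal listener.
send_lines() {
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --topic "$1" --bootstrap-server localhost:9092
}

# Run only where Kafka is installed at KAFKA_HOME.
if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
    # Confirm the topic exists and show its partition layout.
    "$KAFKA_HOME/bin/kafka-topics.sh" --describe --topic myTopic \
        --bootstrap-server localhost:9092
    # Publish two sample messages.
    printf 'hello from the VPS\nsecond message\n' | send_lines myTopic
fi
```

Any program that writes lines to stdout can be piped into send_lines in the same way, which is a simple way to turn a script on the VPS into a data producer.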
On Databricks
After initializing a cluster, create a new notebook with the following cells:
%sh
java -version
%sh
wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
ls -ltr ./
%sh
tar -xzf kafka_2.12-3.2.0.tgz
%sh
cd kafka_2.12-3.2.0
bin/kafka-console-consumer.sh --topic myTopic --bootstrap-server <YOUR_PUBLIC_IP>:9093
Replace <YOUR_PUBLIC_IP> with the public IP of your VPS.
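The consumer command above keeps running until the cell is cancelled, and it only shows messages produced after it starts. For a quick check inside a notebook, a bounded variant may be more convenient. The sketch below assumes the archive was extracted into the cell's working directory as in the earlier cells; consume_sample and KAFKA_DIR are hypothetical names, and the limits of 10 messages and 30 seconds are arbitrary. Paste it into a cell that starts with %sh.

```shell
# Directory created by the tar step above; adjust if you extracted elsewhere.
KAFKA_DIR="${KAFKA_DIR:-kafka_2.12-3.2.0}"

# consume_sample: hypothetical helper. Reads the topic from its first offset
# and exits after 10 messages, or after 30 seconds without a new message.
# $1 = bootstrap server as host:port, $2 = topic name.
consume_sample() {
    "$KAFKA_DIR/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$1" --topic "$2" \
        --from-beginning --max-messages 10 --timeout-ms 30000
}

# Run only where the Kafka archive has been extracted.
if [ -x "$KAFKA_DIR/bin/kafka-console-consumer.sh" ]; then
    consume_sample "<YOUR_PUBLIC_IP>:9093" myTopic
fi
```

As before, replace <YOUR_PUBLIC_IP> with the public IP of your VPS.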
Written by Saumya Talwani
