In a recent Big Data Analytics lab assignment, we were tasked with ingesting (consuming) data into Databricks from an external system. To tackle this, I set up the data producers on a cloud server, since these servers already have public IPs, which makes external communication much easier. After working through the configuration and testing, I put together this concise guide to help others set up a similar pipeline.

Allow External Connections on Your VPS

First, we need to enable external connections on our VPS. I’ve already covered that in a separate guide, which you can refer to here.

Setting Up Your Server

Log in to your Linux server using SSH
Install Java
```
 sudo apt install default-jdk
```

Download Kafka

 wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
 tar -xzf kafka_2.12-3.2.0.tgz
 sudo mv kafka_2.12-3.2.0 /usr/local/kafka

Create systemd unit files

 sudo nano /etc/systemd/system/zookeeper.service

 [Unit]
 Description=Apache Zookeeper server
 Documentation=http://zookeeper.apache.org
 Requires=network.target remote-fs.target
 After=network.target remote-fs.target

 [Service]
 Type=simple
 ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
 ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
 Restart=on-abnormal

 [Install]
 WantedBy=multi-user.target

 sudo nano /etc/systemd/system/kafka.service

 [Unit]
 Description=Apache Kafka Server
 Documentation=http://kafka.apache.org/documentation.html
 Requires=zookeeper.service
 After=zookeeper.service

 [Service]
 Type=simple
 Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64"
 ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
 ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh

 [Install]
 WantedBy=multi-user.target

Modify the Java path in Environment="JAVA_HOME=<Your_Path>" as required. The command below prints the full path of the java binary; JAVA_HOME is that path without the trailing /bin/java.

 readlink -f $(which java)
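
A minimal sketch of deriving the `JAVA_HOME` value from that output, assuming the usual `<JAVA_HOME>/bin/java` layout. The function name `java_home_from_bin` is only illustrative:

```shell
# Derive JAVA_HOME from the resolved path of the java binary.
# Assumes the usual layout <JAVA_HOME>/bin/java.
java_home_from_bin() {
  # Strip the trailing /bin/java component from the given path.
  printf '%s\n' "${1%/bin/java}"
}

# Resolve the real binary (if Java is installed) and print the directory
# to use as JAVA_HOME in kafka.service.
if command -v java >/dev/null 2>&1; then
  java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

If the printed directory differs from the one in the unit file, update the `Environment=` line in `kafka.service` accordingly.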

Modify server config

 nano /usr/local/kafka/config/server.properties

Paste the following under Socket Server Settings.

 listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
 advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://<YOUR_PUBLIC_IP>:9093
 listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT

Replace <YOUR_PUBLIC_IP> with your server's public IP address.
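
To make the roles of the two listeners explicit: port 9092 is advertised as `localhost` for clients running on the server itself, while port 9093 is advertised with the public IP, which is the address the broker hands back to remote clients such as Databricks. As a rough sketch, the three lines can be generated for a given IP like this. The function name `render_listeners` is hypothetical, and `203.0.113.10` is a placeholder from the documentation address range, not a real server:

```shell
# Print the listener settings for server.properties for a given public IP.
render_listeners() {
  cat <<EOF
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://$1:9093
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT
EOF
}

# Placeholder IP; pass your server's public IP instead.
render_listeners 203.0.113.10
```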

Reload systemd daemon
```
 sudo systemctl daemon-reload
```

Start the ZooKeeper and Kafka services

 sudo systemctl start zookeeper
 sudo systemctl start kafka
 sudo systemctl status kafka
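
Beyond `systemctl status`, it can help to confirm that the broker is actually listening on both ports. The following is a small sketch that filters the output of `ss -ltn`; the helper name `has_listener` is an assumption made for illustration:

```shell
# Succeed if the `ss -ltn` output on stdin shows a socket listening on the given port.
has_listener() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Check both Kafka listeners, but only where ss is available.
if command -v ss >/dev/null 2>&1; then
  for port in 9092 9093; do
    if ss -ltn | has_listener "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: not listening"
    fi
  done
fi
```

If either port shows as not listening, `sudo journalctl -u kafka` is a reasonable place to look for startup errors.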

Create a Topic

 /usr/local/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
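
With the topic in place, you can confirm it exists and send a test message from the server. Here is a sketch using the `kafka-topics.sh` and `kafka-console-producer.sh` scripts that ship with Kafka. The `KAFKA_BIN` variable and the function names are illustrative, and the block only defines the functions without running them:

```shell
# Directory holding the Kafka CLI scripts (matches the install path used in this guide).
KAFKA_BIN="${KAFKA_BIN:-/usr/local/kafka/bin}"

# List the topics known to the local broker.
list_topics() {
  "$KAFKA_BIN/kafka-topics.sh" --list --bootstrap-server localhost:9092
}

# Send one message to a topic through the internal listener.
# Usage: produce_message <topic> <message>
produce_message() {
  printf '%s\n' "$2" |
    "$KAFKA_BIN/kafka-console-producer.sh" --topic "$1" --bootstrap-server localhost:9092
}
```

Running `list_topics` should include `myTopic`, and `produce_message myTopic "hello from the VPS"` sends a single line that a running console consumer should display.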

On Databricks

After initializing a cluster, create a new notebook with the following cells.

%sh
java -version

%sh
wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
ls -ltr ./

%sh
tar -xzf kafka_2.12-3.2.0.tgz

%sh
cd kafka_2.12-3.2.0
bin/kafka-console-consumer.sh --topic myTopic --bootstrap-server <YOUR_PUBLIC_IP>:9093

Replace <YOUR_PUBLIC_IP> with your server's public IP address.
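
If the consumer hangs without printing anything, a quick reachability check from another `%sh` cell can tell you whether port 9093 is open at all. This is a sketch that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. The function name `can_connect` and the `BROKER_HOST` variable are hypothetical:

```shell
# Succeed if a TCP connection to host:port opens within 5 seconds.
# Relies on bash's /dev/tcp redirection, so bash is invoked explicitly.
can_connect() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# BROKER_HOST is a hypothetical variable: set it to your server's public IP.
if [ -n "${BROKER_HOST:-}" ]; then
  if can_connect "$BROKER_HOST" 9093; then
    echo "port 9093 is reachable"
  else
    echo "port 9093 is not reachable (firewall rule or broker down?)"
  fi
fi
```

A failed check usually points at the firewall or security rules on the VPS rather than at Kafka itself. A successful check with a silent consumer suggests the `advertised.listeners` value is worth a second look.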

Step-by-Step Guide: Running Kafka Producers and Consumers on Oracle Cloud VPS with Databricks


Saumya Talwani
