Step-by-Step Guide: Running Kafka Producers and Consumers on Oracle Cloud VPS with Databricks

Saumya Talwani

In a recent Big Data Analytics lab assignment, we were tasked with ingesting data into Databricks from an external system. To tackle this, I set up the data producers on a cloud server, since a VPS already has a public IP, which makes external communication much easier. After working through the configuration and testing, I put together this concise guide to help others set up a similar pipeline.

Allow External Connections on Your VPS

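First, we need to enable external connections on our VPS. I’ve already covered that in a separate guide, which you can refer to here. For this setup, TCP port 9093 must be reachable from outside, since that is the port of the external Kafka listener configured later in this guide.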

Setting Up Your Server

  1. Log in to your Linux server using SSH

  2. Install Java

     sudo apt install default-jdk
    
  3. Download Kafka

     wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
     tar -xzf kafka_2.12-3.2.0.tgz
     sudo mv kafka_2.12-3.2.0 /usr/local/kafka
    
  4. Create systemd unit files

     sudo nano /etc/systemd/system/zookeeper.service
    
     [Unit]
     Description=Apache Zookeeper server
     Documentation=http://zookeeper.apache.org
     Requires=network.target remote-fs.target
     After=network.target remote-fs.target
    
     [Service]
     Type=simple
     ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
     ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
     Restart=on-abnormal
    
     [Install]
     WantedBy=multi-user.target
    
     sudo nano /etc/systemd/system/kafka.service
    
     [Unit]
     Description=Apache Kafka Server
     Documentation=http://kafka.apache.org/documentation.html
     Requires=zookeeper.service
     After=zookeeper.service
    
     [Service]
     Type=simple
     Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64"
     ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
     ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
    
     [Install]
     WantedBy=multi-user.target
    

    Set the Java path in Environment="JAVA_HOME=<Your_Path>" to match your system. You can find the java binary with the command below; JAVA_HOME is the printed path without the trailing /bin/java.

     readlink -f $(which java)
    
  5. Modify server config

     nano /usr/local/kafka/config/server.properties
    

    Paste the following under Socket Server Settings.

     listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
     advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://<YOUR_PUBLIC_IP>:9093
     listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT
    

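    Replace <YOUR_PUBLIC_IP> with the public IP of your VPS. The first listener (port 9092) serves clients running on the server itself; the second (port 9093) is the one external clients such as Databricks connect to, so it has to advertise the public IP.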

  6. Reload systemd daemon

     sudo systemctl daemon-reload
    
  7. Start the ZooKeeper and Kafka servers

     sudo systemctl start zookeeper
     sudo systemctl start kafka
     sudo systemctl status kafka
    
  8. Create a Topic

     /usr/local/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
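
Steps 4 and 5 each contain one machine-specific value: the JAVA_HOME path in kafka.service and the public IP in the listener settings. The sketch below shows one way to derive both with plain shell. The helper names `java_home_from_bin` and `render_listeners` are hypothetical, and 203.0.113.10 is a documentation-only example address, not a real server.

```shell
#!/bin/sh
# Hypothetical helpers for the two machine-specific values in steps 4 and 5.

# Turn the resolved path of the java binary into a JAVA_HOME candidate
# by stripping the trailing /bin/java.
java_home_from_bin() {
  java_bin="$1"
  printf '%s\n' "${java_bin%/bin/java}"
}

# Print the three listener lines from step 5 for a given public IP.
render_listeners() {
  public_ip="$1"
  cat <<EOF
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_EXTERNAL://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092,PLAINTEXT_EXTERNAL://${public_ip}:9093
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_EXTERNAL:PLAINTEXT
EOF
}

# Example with the path used in kafka.service in this guide.
java_home_from_bin "/usr/lib/jvm/java-11-openjdk-arm64/bin/java"
# prints /usr/lib/jvm/java-11-openjdk-arm64

# Example with a placeholder address (replace with your own public IP).
render_listeners "203.0.113.10"
```

On a real server, the first helper could be fed the output of the readlink command from step 4, for example `java_home_from_bin "$(readlink -f "$(which java)")"`. The printed listener lines are meant to be pasted into server.properties by hand, so nothing is written to disk by this sketch.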
    

On Databricks

After initializing a cluster, create a new notebook with the following cells:

%sh
java -version

%sh
wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
ls -ltr ./

%sh
tar -xzf kafka_2.12-3.2.0.tgz

%sh
cd kafka_2.12-3.2.0
bin/kafka-console-consumer.sh --topic myTopic --bootstrap-server <YOUR_PUBLIC_IP>:9093

Replace <YOUR_PUBLIC_IP> with the public IP of your VPS.
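
With the consumer cell running, one way to confirm the pipeline end to end is to send a few test messages from the VPS and watch them appear in the notebook output. The sketch below, run on the server, pipes generated lines into the console producer that ships with Kafka, using the internal listener on localhost:9092. The helper name `sample_messages` and the `KAFKA_HOME` and `TOPIC` variables are assumptions made for illustration; the defaults match the install path and topic used in this guide.

```shell
#!/bin/sh
# Assumed defaults: install path and topic name as used earlier in this guide.
KAFKA_HOME="${KAFKA_HOME:-/usr/local/kafka}"
TOPIC="${TOPIC:-myTopic}"

# Print N numbered test messages, one per line (hypothetical helper).
sample_messages() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'test message %s\n' "$i"
    i=$((i + 1))
  done
}

# Send five messages if the console producer is present on this machine;
# otherwise report that nothing was sent.
if [ -x "$KAFKA_HOME/bin/kafka-console-producer.sh" ]; then
  sample_messages 5 | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --topic "$TOPIC" --bootstrap-server localhost:9092
else
  echo "kafka-console-producer.sh not found under $KAFKA_HOME; nothing sent" >&2
fi
```

The console producer treats each input line as one message and exits when its input ends, so the script returns on its own. If nothing shows up on the Databricks side, the usual suspects are the firewall rule for port 9093 and the public IP in advertised.listeners.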
