Kafka Interview Questions and Answers

Harsh Singh
4 min read

Recently, while preparing for a Kafka interview, I stumbled upon a bunch of really insightful questions and answers. Whether you're a beginner just stepping into the Kafka world or someone brushing up for interviews, this guide will help you understand both the basics and some advanced concepts. Here's what I found:

1. How do you create a topic in Kafka using the Confluent CLI?

kafka-topics --bootstrap-server localhost:9092 --create --topic my_topic --partitions 3 --replication-factor 1
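The command above uses the kafka-topics tool that ships with Apache Kafka and Confluent Platform. If the interviewer specifically means the confluent CLI, the rough equivalent for a cluster you are logged into is the following (flag names can vary by CLI version, so treat this as a sketch):

confluent kafka topic create my_topic --partitions 3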

2. Explain the role of the Schema Registry in Kafka.

The Schema Registry manages and enforces schemas for data in Kafka. It ensures producers and consumers agree on data structure, supports schema evolution, and avoids serialization issues.

3. How do you register a new schema in the Schema Registry?

You can register a schema through the Schema Registry's REST API, the Confluent CLI, or client libraries such as kafkajs or confluent-kafka-python. Example using curl:

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\": \"record\", \"name\": \"User\", \"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}"}' http://localhost:8081/subjects/my_topic-value/versions
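As a sketch of the client-library route, confluent-kafka-python ships a SchemaRegistryClient; the URL and subject below mirror the curl example and are assumptions about your setup:

from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

# Assumes Schema Registry is reachable locally, as in the curl example
client = SchemaRegistryClient({'url': 'http://localhost:8081'})

user_schema = Schema(
    '{"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}',
    schema_type='AVRO'
)

# Registers the schema under the topic's value subject and returns its schema id
schema_id = client.register_schema('my_topic-value', user_schema)
print(schema_id)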

4. What is the importance of key-value messages in Kafka?

Keys help determine message partitioning. Kafka uses the key to hash and assign messages to partitions, ensuring order per key and enabling parallelism across consumers.
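For example, with kafka-python (used elsewhere in this post) you can attach a key so that every message for the same user lands on the same partition; the key and topic names here are just illustrative:

from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Messages sharing a key are hashed to the same partition, preserving their order
producer.send('my_topic', key='user-123', value={'action': 'login'})
producer.flush()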

5. Describe a scenario where using a random key for messages is beneficial.

Using a random key is useful when you want to distribute messages evenly across partitions to balance the load and maximize throughput.
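A quick sketch of that idea, using a fresh UUID as the key so the partitioner spreads messages across all partitions (topic name is illustrative):

import uuid
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# A new UUID key per message spreads the load evenly across partitions,
# at the cost of any per-key ordering guarantee
producer.send('my_topic', key=str(uuid.uuid4()).encode('utf-8'), value=b'{"event": "click"}')
producer.flush()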

6. Provide an example where using a constant key for messages is necessary.

A constant key ensures all messages go to the same partition, which is useful when order must be preserved, like for a specific user session or transaction log.
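A minimal sketch of the constant-key case, where the session id below is hypothetical:

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# Every message for this session uses the same key, so all of them land on
# the same partition and keep their relative order
session_key = b'session-42'   # hypothetical session id
producer.send('my_topic', key=session_key, value=b'{"step": 1}')
producer.send('my_topic', key=session_key, value=b'{"step": 2}')
producer.flush()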

7. Write a simple Kafka producer code that sends JSON messages to a topic (Python).

from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

data = {"name": "Alice", "age": 30}
producer.send('my_topic', value=data)
producer.flush()

8. How do you serialize a custom object before sending it to a Kafka topic?

Use a serializer function or library (e.g., JSON or Avro) to convert the object to bytes before sending. KafkaProducer requires bytes, so serialization is critical.
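For instance, a small dataclass can be turned into JSON bytes in the producer's value_serializer; this is one common approach, not the only one, and the class and topic names are just examples:

from dataclasses import dataclass, asdict
from kafka import KafkaProducer
import json

@dataclass
class User:
    name: str
    age: int

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    # Turn the dataclass into a dict, then into JSON bytes
    value_serializer=lambda obj: json.dumps(asdict(obj)).encode('utf-8')
)

producer.send('my_topic', value=User(name='Alice', age=30))
producer.flush()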

9. Describe how you can handle serialization errors in Kafka producers.

  • Use try-except blocks around the serialization logic.
  • Log and skip faulty messages.
  • Use DLQs (Dead Letter Queues) for further inspection (see the sketch below).
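A minimal sketch combining all three ideas; the DLQ topic name here is made up for illustration:

import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

def send_safely(topic, obj, dlq_topic='my_topic.dlq'):  # DLQ topic name is hypothetical
    try:
        payload = json.dumps(obj).encode('utf-8')
    except (TypeError, ValueError) as err:
        # Log the failure and park a textual copy of the object on the DLQ
        print(f"Could not serialize message: {err}")
        producer.send(dlq_topic, value=repr(obj).encode('utf-8'))
        return
    producer.send(topic, value=payload)

send_safely('my_topic', {'name': 'Alice'})        # serializes fine
send_safely('my_topic', {'payload': object()})    # not JSON-serializable, goes to the DLQ
producer.flush()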

10. Write a Kafka consumer code that reads messages from a topic and deserializes them from JSON.

from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'my_topic',
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    print(message.value)

11. How do you handle deserialization errors in Kafka consumers?

Use try-except blocks in the message processing loop to catch and handle faulty messages, logging or sending them to a DLQ.
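One way to do this with kafka-python is to consume raw bytes and deserialize inside the loop, so a single bad message does not break the iterator; the DLQ topic name is again made up:

from kafka import KafkaConsumer, KafkaProducer
import json

# Consume raw bytes and deserialize manually, so failures can be caught per message
consumer = KafkaConsumer('my_topic', bootstrap_servers='localhost:9092')
dlq_producer = KafkaProducer(bootstrap_servers='localhost:9092')

for message in consumer:
    try:
        data = json.loads(message.value.decode('utf-8'))
    except (json.JSONDecodeError, UnicodeDecodeError) as err:
        # Log the failure and park the raw bytes on the dead letter queue
        print(f"Skipping bad message at offset {message.offset}: {err}")
        dlq_producer.send('my_topic.dlq', value=message.value)
        continue
    print(data)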

12. Explain the process of deserializing messages into custom objects.

Deserialize the message bytes using a library (e.g., JSON, Avro), then use the resulting dict to construct custom objects using class constructors or from_dict methods.
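A small sketch of that flow, mapping the decoded dict onto a dataclass via a from_dict-style constructor; the class and field names are assumptions for illustration:

from dataclasses import dataclass
import json

@dataclass
class User:
    name: str
    age: int

    @classmethod
    def from_dict(cls, d):
        return cls(name=d['name'], age=d['age'])

raw = b'{"name": "Alice", "age": 30}'   # e.g. message.value from a consumer
user = User.from_dict(json.loads(raw.decode('utf-8')))
print(user)   # User(name='Alice', age=30)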

13. What is a consumer group in Kafka, and why is it important?

A consumer group allows multiple consumers to read from a topic in parallel. Each partition is read by only one consumer in the group, enabling scalability and fault tolerance.
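In kafka-python the group is set with group_id; two processes started with the same group_id split the topic's partitions between them. The group name below is just an example:

from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'my_topic',
    bootstrap_servers='localhost:9092',
    group_id='order-processors',   # consumers that share this id split the partitions
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    print(message.partition, message.value)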

14. Describe a scenario where multiple consumer groups are used for a single topic.

Analytics, logging, and alerting systems might consume the same topic with different consumer groups to process messages independently.

15. How does Kafka ensure load balancing among consumers in a group?

Kafka assigns partitions evenly across consumers in a group. When consumers join or leave, Kafka rebalances to redistribute partitions.

16. How do you send JSON data to a Kafka topic and ensure it is properly serialized?

Use a JSON serializer when configuring the producer. Ensure the data is JSON-serializable.

value_serializer=lambda v: json.dumps(v).encode('utf-8')

17. Describe the process of consuming JSON data from a Kafka topic and converting it to a usable format.

Use a consumer with a JSON deserializer and parse the data into a Python dict, then map it to your application's data models.

18. Explain how you can work with CSV data in Kafka, including serialization and deserialization.

  • Serialize CSV by converting rows to comma-separated strings before sending.
  • Deserialize by splitting the string and mapping to fields or using the csv module.

19. Write a Kafka producer code snippet that sends CSV data to a topic.

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# A header row plus one data row, sent as a single message
csv_data = "name,age\nAlice,30"
producer.send('csv_topic', value=csv_data.encode('utf-8'))
producer.flush()

20. Write a Kafka consumer code snippet that reads and processes CSV data from a topic.

import csv
from kafka import KafkaConsumer

consumer = KafkaConsumer('csv_topic', bootstrap_servers='localhost:9092')

for message in consumer:
    csv_text = message.value.decode('utf-8')
    # Parse every line of the payload as a CSV row
    for row in csv.reader(csv_text.splitlines()):
        print(row)

This list was curated during my Kafka interview prep. I hope it helps others preparing for interviews or looking to strengthen their Kafka concepts!
