Building real-time applications with MongoDB and Redpanda
MongoDB needs no introduction. Since its debut in 2007, the document database has steadily risen in popularity and is now the fifth-ranked database in the world according to DB-Engines, trailing only Oracle, MySQL, SQL Server, and PostgreSQL. This is all the more impressive when you consider that the top four are traditional relational databases that came out in the previous century. MongoDB is number one among the new generation of scalable, distributed “NoSQL” databases.
MongoDB’s meteoric growth can be attributed to a strong focus on developer experience. The product itself is simple, easy to use, and provides a happy path from prototype to production. Developers who try MongoDB tend to succeed in moving working prototypes to full-blown production deployments. That is the story to remember.
While MongoDB simplified database development for millions of developers worldwide, Redpanda is on a similar mission to simplify real-time streaming applications. From the start, Redpanda was built with the same dedication to simplicity, ease of use, and most importantly, developer productivity.
Different but same
While MongoDB and Redpanda address different parts of the tech stack, it is interesting to note the similarities in how they deliver an enjoyable developer experience individually, and how they complement each other when combined. To wit:
- MongoDB ships as a single binary. So does Redpanda. Unlike other comparable streaming platforms, Redpanda has no dependencies on external services like Apache ZooKeeper. This makes Redpanda as easy to install on a server via package managers as it is on a developer’s laptop. It also ships as a Docker image so you can spin it up via Docker Compose, integrate with CI/CD pipelines, or deploy in Kubernetes with a simple controller.
- Both Redpanda and MongoDB are distributed systems. To achieve scale and resiliency, you simply install the exact same binary on multiple servers and point them at each other. This allows developers and architects to defer decisions on scalability and performance until later in the development process. The mental model and code for the application are the same whether running on a laptop, a single-core virtual machine, or on several high-powered production servers.
- Both Redpanda and MongoDB ship with sensible defaults. This minimizes the number of knobs that developers or administrators need to tweak, whether they are bringing up a prototyping environment (as with Docker Compose) or provisioning a production cluster. Redpanda goes even further with the ability to autotune against underlying hardware, allowing it to maximize available CPU, memory, and disk resources with minimal effort.
- The cognitive load for developers comes not just from bytes-per-second scalability but also from deciding what shape to give the data and which types to choose. MongoDB gives developers the flexibility to evolve the schema over time with a friendly JSON-like data model. Similarly, events and messages sent to Redpanda do not require a schema up front. However, a schema can be used when needed and evolved to match changing business needs.
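To make the last point concrete, here is a minimal Python sketch of a schema-less event that gains a field over time. The event shape, the field names (`symbol`, `price`, `currency`), and the default value are assumptions chosen for illustration; neither Redpanda nor MongoDB prescribes them.

```python
import json

# Two generations of the same hypothetical event. The second adds a
# "currency" field without any up-front schema change.
EVENT_V1 = b'{"symbol": "ACME", "price": 12.5}'
EVENT_V2 = b'{"symbol": "ACME", "price": 12.75, "currency": "EUR"}'


def to_document(raw: bytes) -> dict:
    """Decode a JSON event into a dict that could be stored as a document.

    Fields the function does not know about are kept as-is, and a field
    missing from older events gets a default, so both generations can
    share one topic and one collection.
    """
    doc = json.loads(raw)
    doc.setdefault("currency", "USD")  # assumed default for older events
    return doc


if __name__ == "__main__":
    for raw in (EVENT_V1, EVENT_V2):
        print(to_document(raw))
```

Because nothing in the pipeline rejects the extra field, producers and consumers can be upgraded independently. A schema can still be introduced later if stricter guarantees are needed.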
Connecting MongoDB to Redpanda
MongoDB integrates with Redpanda in two ways: as a sink, whereby Redpanda events are consumed and sent to MongoDB for inserts or updates, or as a CDC source, where MongoDB externalizes its changelog to a Redpanda topic for others (including other MongoDB instances) to consume. The integration is done via Kafka Connect. Since Redpanda is wire-compatible with Apache Kafka, the existing Kafka connectors work seamlessly. This ability to leverage the vast Kafka ecosystem is yet another way Redpanda makes developers’ lives easier!
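The two directions can be sketched as two Kafka Connect connector configurations, written here as Python dictionaries. The connector class names come from the MongoDB Kafka Connector; the connection URI, database, collection, and topic names are placeholders assumed for illustration, and the property list is a minimal subset that should be checked against the connector documentation.

```python
# Placeholder values, assumed for illustration only.
MONGO_URI = "mongodb://mongo1:27017"
DATABASE = "demo_db"
COLLECTION = "events"


def sink_config(topic: str) -> dict:
    """Sink direction: records from a Redpanda topic are written to MongoDB."""
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": MONGO_URI,
        "database": DATABASE,
        "collection": COLLECTION,
        "topics": topic,  # topic(s) to read from
    }


def source_config(topic_prefix: str) -> dict:
    """Source direction: MongoDB change events are published to Redpanda."""
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": MONGO_URI,
        "database": DATABASE,
        "collection": COLLECTION,
        "topic.prefix": topic_prefix,  # prepended to the output topic name
    }
```

The only structural difference is which side names the topic: the sink lists the topics it reads, while the source derives its output topic from the prefix, database, and collection. Note that the source direction relies on MongoDB change streams, which require MongoDB to run as a replica set.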
Our friends at MongoDB have put together a stock ticker demo that integrates MongoDB and Redpanda via Kafka Connect. The demo requires Docker Compose, and the docker-compose.yml file looks something like this:
version: '3.7'
services:
  redpanda:
    command:
      - redpanda
      - start
      - --smp
      - '1'
      - --reserve-memory
      - 0M
      - --overprovisioned
      - --node-id
      - '0'
      - --kafka-addr
      - PLAINTEXT://0.0.0.0:9092,OUTSIDE://0.0.0.0:9093
      - --advertise-kafka-addr
      - PLAINTEXT://redpanda:9092,OUTSIDE://localhost:9093
    image: docker.vectorized.io/vectorized/redpanda:v21.9.3
    ports:
      - 9093:9093
  connect:
    image: confluentinc/cp-kafka-connect-base:latest
    build:
      context: .
      dockerfile: Dockerfile-MongoConnect
    depends_on:
      - redpanda
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'redpanda:9092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: connect-cluster-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
      CONNECT_AUTO_CREATE_TOPICS_ENABLE: "true"
      CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
  mongo1:
    image: "mongo:5.0.3"
    volumes:
      - /data/db
    ports:
      - "27017:27017"
    restart: always
  nodesvr:
    image: node:16
    build:
      context: .
      dockerfile: Dockerfile-Nodesvr
    depends_on:
      - redpanda
      - mongo1
    ports:
      - "4000:4000"
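Once the stack is up, connectors are registered through the Kafka Connect REST API, which the `connect` service publishes on port 8083. The following Python sketch shows one way to do that using only the standard library. The helper names `build_request` and `register` are hypothetical, and the code assumes it runs on the host machine, where Connect is reachable at `localhost:8083`.

```python
import json
import urllib.request

# Port 8083 is published by the `connect` service in the Compose file.
CONNECT_URL = "http://localhost:8083"


def build_request(name: str, config: dict,
                  base_url: str = CONNECT_URL) -> urllib.request.Request:
    """Build the POST /connectors request without sending it."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(name: str, config: dict) -> int:
    """Send the request to Kafka Connect and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, for
    example when a connector with the same name already exists.
    """
    with urllib.request.urlopen(build_request(name, config)) as resp:
        return resp.status
```

Calling `register("mongo-sink", config)` with a connector configuration dictionary would create the connector; inside the Compose network the connector itself would reach MongoDB at `mongodb://mongo1:27017` and Redpanda at `redpanda:9092`, while clients on the host use the `OUTSIDE` listener at `localhost:9093`.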
For a full tutorial and walkthrough, visit the complementary blog post at MongoDB. We welcome everyone from the MongoDB community to try Redpanda and join the Redpanda Community Slack, where you can engage with the engineers who are building Redpanda and shaping the future of real-time streaming!
Written by
Redpanda Data
API-compatible with Apache Kafka, Redpanda is the modern streaming data platform for (all) developers.