RisingWave Turns Four: Our Journey Beyond Democratizing Stream Processing


When we started RisingWave four years ago, we set out with a bold mission: to democratize stream processing (check our original blog here). Back then, building real-time streaming applications felt like climbing a mountain. It required specialized infrastructure, deep engineering know-how, and a hefty operational commitment. Stream processing had incredible potential, but its sheer complexity kept it locked away from most companies.
We didn't think it had to be that way. As businesses increasingly embraced event-driven architectures and real-time analytics, the need for stream processing was exploding. Yet, actually adopting it remained a huge hurdle. Existing tools often demanded large engineering teams just to manage the internal state of streaming jobs, constantly tweak performance, and wrestle with fault tolerance. The simple question we asked ourselves four years ago was: Can we break down these barriers and make stream processing genuinely easy to use?
Fast forward four years, and countless hours of work later, we're incredibly proud of the journey. RisingWave isn't just an idea anymore; it's a robust, production-ready stream processing system trusted by over 1000 enterprises and fast-growing startups.
But the world of data never stands still. Over the past few years, we have seen some significant shifts:
Companies went through aggressive cost optimization cycles with their data infrastructure, though we're seeing spending pick up again recently.
The idea of using "S3 as primary storage" has fundamentally changed how many data systems are built.
Apache Iceberg has quickly become the go-to standard for open table formats in data lakes, altering how organizations store and query data.
There's a growing appetite for developer-friendly systems – tools that simplify workflows, not add more complexity.
Looking ahead, our path is clearer than ever. Making stream processing easy is still central to our mission, but our ambitions have grown alongside the evolving landscape. We now aim to equip every streaming data user with a complete end-to-end platform – one that doesn't just process data, but also serves it, stores it effectively, and plugs seamlessly into the wider data ecosystem.
In this post, I want to share the RisingWave story – how we started, where we stand today, and where we're excited to go next.
Why We Started RisingWave: The Need for Simplicity
The Stream Processing World Before RisingWave
Stream processing isn't new; it's been around for decades. Early pioneers in capital markets and telecom relied on it for things like finding market advantages (alpha) and detecting fraud in real time. They used powerful enterprise tools like TIBCO and Sybase, but these systems were often notoriously expensive, inflexible, and a headache to manage.
The last decade saw the rise of open-source frameworks like Apache Flink, Spark Streaming, and Apache Samza. These offered more flexibility but still demanded significant engineering muscle to run effectively at scale. Companies using them often needed specialized stream processing engineers just to manage internal state, tune performance, and handle the day-to-day operational challenges. The barrier to entry remained stubbornly high.
At the same time, the demand for real-time insights was skyrocketing. Businesses everywhere realized the power of acting on data as it happens, rather than waiting for overnight batch jobs. The move towards event-driven systems and cloud-native infrastructure made real-time processing seem more achievable. The success of companies like Confluent, marked by its 2021 IPO, sent a clear message: real-time data wasn't a niche anymore; it was becoming essential.
However, despite the clear demand, one challenge remained: stream processing was still too difficult to implement. Many organizations desperate for real-time insights couldn't actually use stream processing effectively because it was so complex. They faced a tough choice: either hire expensive, specialized teams or settle for less powerful workarounds.
The Core Question: Could We Make Stream Processing Easy?
When we started RisingWave, we were driven by a fundamental question: Could we make stream processing accessible to every developer and every company, regardless of size or technical expertise? We believed the answer was yes, but it required rethinking stream processing from the ground up.
Traditional systems seemed built for specialists, assuming users had the time and expertise to fine-tune arcane parameters, optimize performance manually, and manage complex distributed state. We wanted to flip that script and create something:
Easy to pick up: No specialized knowledge needed beyond standard SQL.
Simple to operate: Get rid of the need for constant performance tuning.
Cost-effective and scalable: Leverage modern cloud architecture to adapt resources automatically.
This led us to two core design principles that shaped RisingWave: PostgreSQL compatibility and S3-as-primary-storage architecture.
How We Built RisingWave: Two Key Design Decisions
PostgreSQL Compatibility: Putting Developers First
One of the biggest hurdles to adopting stream processing was the steep learning curve. Developers were often forced to learn entirely new frameworks, APIs, and ways of thinking about operations. We asked ourselves: Why not make stream processing feel familiar, like working with a standard database?
By making RisingWave compatible with PostgreSQL, we ensured that any developer familiar with SQL could immediately start writing streaming queries. This wasn't just about syntax; it meant RisingWave could plug seamlessly into existing data workflows and connect easily with a vast ecosystem of familiar tools like DBeaver, Grafana, Apache Superset, dbt, and countless others.
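To make this concrete, here is a sketch of what that familiarity looks like in practice. The topic, column names, and broker address are illustrative, not taken from any particular deployment, but the statements follow RisingWave's PostgreSQL-flavored SQL: define a source over a Kafka topic, then declare a materialized view that RisingWave keeps continuously up to date.

```sql
-- Ingest a Kafka topic as a streaming source
-- (topic, columns, and broker address are hypothetical).
CREATE SOURCE orders (
    order_id BIGINT,
    amount   DOUBLE PRECISION,
    ts       TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- A continuously maintained per-minute aggregate, written in
-- ordinary SQL; RisingWave updates it incrementally as events arrive.
CREATE MATERIALIZED VIEW orders_per_minute AS
SELECT window_start,
       COUNT(*)    AS order_count,
       SUM(amount) AS revenue
FROM TUMBLE(orders, ts, INTERVAL '1 minute')
GROUP BY window_start;
```

A developer who knows PostgreSQL can read and write this without learning a new framework, and the resulting view can be queried with `psql` or any Postgres-compatible client.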
S3 as Primary Storage: Cloud-Native to the Core
Traditional stream processing systems often juggle state in memory or on local disks. This can be necessary for ultra-low latency scenarios, but it demands constant manual tuning and operational vigilance, adding significant overhead for many common use cases. We took a different approach.
By adopting an S3-as-primary-storage architecture, RisingWave automatically and durably persists state changes. This design eliminates the burden on users to manually manage checkpoints and state recovery. It inherently provides durability, scalability, and cost-efficiency. And importantly, RisingWave still intelligently uses local memory and disk as caches to ensure high performance where it matters.
These two core decisions allowed RisingWave to dramatically simplify stream processing without sacrificing the power and flexibility needed for real-world applications.
Where We Are Today: A Proven, Production-Ready System
Four years on, RisingWave has grown into a battle-tested, production-grade system used by thousands of companies across various industries. Our original vision of lowering the barrier to stream processing has become a reality – a platform that's powerful, scalable, intuitive, and cost-efficient. This combination has made it a go-to choice for organizations large and small.
RisingWave is no longer an emerging technology; it's a proven solution. We see it in production handling massive real-time analytics at Fortune 50 enterprises, and we also see it empowering high-growth startups who need agility without the operational headaches of older frameworks.
For instance, one of the world's largest multi-trillion-dollar financial institutions has adopted RisingWave internally for mission-critical workloads, and its use continues to expand across divisions. They're driven by the need for faster decisions, scalable processing, and better cost control. Their adoption highlights a key industry shift: businesses are less willing to tolerate the complexity of legacy systems when modern, cloud-native alternatives like RisingWave offer a simpler path.
Patterns in RisingWave Adoption
Watching how organizations adopt RisingWave, we’ve noticed three main patterns emerge, each highlighting different strengths:
1. Companies New to Stream Processing
For businesses just dipping their toes into real-time data, RisingWave often acts as the perfect on-ramp. Unlike systems like Apache Flink or Spark Streaming, which can require significant engineering effort just to get started, RisingWave offers a familiar SQL interface and a developer-friendly experience.
Many of these companies first tried achieving real-time results with batch systems like Snowflake or BigQuery. But they quickly found that even five-minute batch intervals weren't fast enough for today's event-driven needs. They turned to RisingWave for its simplicity, low operational burden, and easy integration with their existing PostgreSQL-based infrastructure.
2. Companies Moving Away from Legacy Stream Processing Systems
A second category of users consists of companies with existing stream processing deployments, often based on Apache Flink, Spark Streaming, or even homegrown systems. These teams already understood the value of real-time data, but were bumping up against significant challenges:
Operational pain: Managing clusters and tuning jobs required dedicated, hard-to-find stream processing experts.
State management headaches: Many struggled with checkpointing, state recovery, and backpressure issues, leading to instability and unexpected outages.
Runaway costs: Running self-managed streaming clusters at scale often became prohibitively expensive.
For these organizations, RisingWave represents a simpler, more cost-effective path forward without compromising performance. Unlike systems where developers manually wrestle with complex distributed state, RisingWave's automatic state management using cloud storage drastically reduces the burden on engineers and often significantly cuts infrastructure costs.
3. Companies Operating at Massive Scale
The third group includes global enterprises handling enormous volumes of real-time data. They use RisingWave to power demanding applications like high-frequency trading platforms, real-time risk analysis, sophisticated AI recommendation engines, and large-scale observability systems.
For these users, the headline benefits are raw performance and scalability. RisingWave's ability to process millions of events per second, scale resources up and down elastically, and integrate smoothly with cloud storage and data lakes makes it ideal for these critical workloads. The dynamic scaling, in particular, gives them unprecedented control over resource use and costs.
Through these adoption patterns, RisingWave has carved out its place as a key piece of modern data infrastructure. We've made stream processing accessible, but we know this is just the beginning.
What’s Next: Moving Beyond Just Stream Processing
Looking ahead, we see the role of stream processing evolving. It's no longer just about processing events in real time. Businesses increasingly need end-to-end solutions that handle the entire lifecycle of streaming data. Put simply, they don't just want to process events; they need an integrated way to serve, store, and analyze that data effortlessly.
The rise of AI and ML applications, in particular, places immense pressure on data infrastructure. Organizations need faster insights, lower costs, and real-time decision-making capabilities at a scale we haven't seen before. Traditional batch systems are struggling to keep up, and even standalone streaming tools often aren't enough. Businesses are looking for a holistic approach where stream processing isn't an isolated task but a tightly integrated part of their overall data strategy.
This is why we believe RisingWave's next chapter involves evolving from a stream processing system into a comprehensive streaming data platform. One that not only crunches the numbers but also delivers insights quickly, stores data intelligently, and fuels real-time decisions.
The Next Chapter: An End-to-End Streaming Data Platform
Looking at how companies use RisingWave today, we find that most follow a two-step data flow:
Real-time serving: Many push processed data into low-latency serving layers like Redis to power applications needing instant responses (think fraud detection, live recommendations, financial dashboards).
Long-term storage and analytics: Others sink their streaming results into data lakes (often using formats like Apache Iceberg) for historical analysis, compliance, or training machine learning models over time.
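The second pattern is already expressible as a sink today. As an illustrative sketch (the view, bucket, and table names are hypothetical, and the exact connector parameters depend on your RisingWave version and catalog setup), streaming results can be written continuously into an Iceberg table:

```sql
-- Continuously sink a materialized view's results into an
-- Apache Iceberg table for long-term storage and analytics.
-- Names and paths below are placeholders, not a real deployment.
CREATE SINK orders_archive FROM orders_per_minute
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'orders_per_minute'
);
```

Once the sink is created, RisingWave keeps the Iceberg table in sync with the view, so downstream engines can query the historical data without any custom export pipeline.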
While RisingWave facilitates these patterns today, we believe we can make them much smoother and more integrated. Our roadmap focuses on two major areas:
1. Expanding Real-Time Serving Capabilities
Many real-time applications need more than just pre-computed results; they require ultra-fast responses to ad-hoc queries on the freshest data. While RisingWave already processes data with millisecond latency, we're working to extend its serving power to include:
Sub-10ms query execution for point lookups and simple filters directly within RisingWave. This opens doors for powering interactive dashboards, trading systems, and real-time AI feature lookups without needing a separate serving database in many cases.
Deeper integration with vector databases and AI models, enabling real-time event streams to directly update embeddings, trigger model retraining, or drive immediate intelligent actions.
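To give a feel for the kind of query this serving work targets, here is a sketch against a hypothetical `user_features` materialized view (the view, columns, and index are illustrative). An index over the lookup key is what makes this a candidate for low-millisecond point reads:

```sql
-- Index the lookup key of a hypothetical per-user feature view
-- so point lookups are index-backed rather than full scans.
CREATE INDEX idx_user_features ON user_features (user_id);

-- The kind of ad-hoc point lookup targeted for sub-10ms serving,
-- e.g. fetching fresh features for a real-time ML prediction.
SELECT user_id, clicks_last_hour, avg_spend
FROM user_features
WHERE user_id = 42;
```

Serving this class of query directly from RisingWave is what would let many applications skip a separate Redis-style serving layer.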
These enhancements aim to make RisingWave itself a powerful real-time data serving engine for a new wave of applications.
2. Enhancing Iceberg Integration and Storage Optimization
On the storage side, our goal is to deepen our integration with Apache Iceberg, making it the natural, seamless destination for historical streaming data within RisingWave. We already support streaming data into Iceberg tables, but we're adding features like:
Automated data compaction: Tackling the common "small file problem" that plagues streaming writes to data lakes, improving query performance and storage efficiency without manual intervention.
Federated query execution: Allowing users to seamlessly query both fresh, real-time data (in RisingWave's state) and historical data (in Iceberg) using a single query interface, eliminating the need to stitch results together from different systems.
These improvements will help organizations treat their streaming and historical data as one unified resource, simplifying analytics and decision-making.
With these advancements, RisingWave is heading towards being the first truly end-to-end platform designed for the entire streaming data lifecycle – processing, serving, storing, and analyzing, all within one cohesive system.
Conclusion: The Future is Streaming, End-to-End
For four years, RisingWave has focused on making the power of stream processing accessible to everyone. Now, we're embarking on the next stage of our journey. The future isn't just about efficient stream processing; it's about building integrated, end-to-end platforms that seamlessly connect real-time processing with real-time serving and long-term storage.
That's where RisingWave is headed. And honestly, we feel like we're just getting started.
Written by Yingjun Wu