RisingWave Roadmap Q4 2023

RisingWave Labs

Tao Wu | Product Manager

One and a half years ago, in April 2022, we open-sourced RisingWave, the distributed SQL streaming database. A quarter ago, in July 2023, we released RisingWave 1.0, the first official version of RisingWave and a battle-tested system ready for production use. More recently, we released RisingWave 1.3.

RisingWave is an open-source streaming database released under the Apache 2.0 license. The team behind it actively collects feedback from users and strives to democratize stream processing: to make it simple, affordable, and accessible.

RisingWave is now deployed in production at dozens of enterprises and fast-growing startups. So how will it evolve? To keep the process transparent, we plan to publish our roadmap and update it periodically. Here’s what you can expect in future releases of RisingWave.

Note that this roadmap is not final. We will update it frequently to reflect changing priorities so that we can better serve our users.

Short-term goals (within the next 3 months)

  • Adaptive Scaling
    Implement adaptive scaling to automatically adjust materialized view parallelism based on the number of CPU cores in the cluster.

  • Improvements to the Existing External Sinks
    Optimize the performance and improve the stability of supported external sinks such as Doris, ClickHouse, and Elasticsearch. We’ll also expand the encoding formats supported by the Kafka sink to include Protobuf and Avro, along with Schema Registry support.

  • Iceberg Sink V2
    We recently introduced a native integration with Iceberg that no longer relies on the official Java library; it has been fully rewritten in Rust for better performance and stability. We plan to stabilize it over the next few months.

  • Enhanced Observability
    Expand system tables and add metrics for stateful operators to provide greater visibility into system health and performance.

  • Improved Open-source Web UI
    Enhance RisingWave's open-source web UI with additional system information and monitoring capabilities.

  • Sink into Table
    Users may want to dynamically union the results of multiple views into a single table. For example, each view may correspond to a department in a company, and new departments are added from time to time. With this feature, users can seamlessly merge data from new views into the table as those views are added.

  • CDC Connection Sharing
    RisingWave currently creates one CDC connection per table. Each connection consumes the replication log independently, and that log contains transactions not only for the source table but also for every other table in the same database. As a result, multiple connections read the same log repeatedly and place a heavy load on the upstream database. Sharing one CDC connection across tables reduces this load and improves the stability of CDC ingestion.

  • Recoverable CREATE MATERIALIZED VIEW
    Persist the progress of materialized view creation so that it can resume after a failure without losing work already completed.

  • CDC Transaction Atomicity
    RisingWave currently applies CDC changes event by event, so at any given moment only part of an upstream transaction may have been applied. With this feature, RisingWave will buffer all CDC events within a transaction and apply them atomically once the whole transaction is available.

  • Parallel CDC Snapshot Loading
    Introduce parallelism during CDC snapshot loading to improve the user experience for large upstream tables.
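To make the CDC Connection Sharing and CDC Transaction Atomicity items above more concrete, here is a minimal Python sketch of both ideas together. It is an illustration under our own assumptions, not RisingWave's implementation: one reader consumes the upstream replication log once, keeps only events for subscribed tables, and buffers each transaction's events until its COMMIT before applying them. The `ChangeEvent` and `SharedCdcReader` names and the BEGIN/COMMIT event shape are hypothetical, and for simplicity the sketch assumes the log contains only committed transactions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical change event read from an upstream replication log."""
    txn_id: int
    op: str                      # "BEGIN", "INSERT", "UPDATE", "DELETE" or "COMMIT"
    table: Optional[str] = None  # None for BEGIN/COMMIT markers
    row: Optional[dict] = None


class SharedCdcReader:
    """Sketch: one shared log reader serving many tables, applying each
    upstream transaction atomically at COMMIT."""

    def __init__(self, subscribed_tables: Iterable[str]):
        self.subscribed = set(subscribed_tables)
        # Buffered events of transactions that have not committed yet.
        self.pending: Dict[int, List[ChangeEvent]] = defaultdict(list)
        # Changes made visible downstream, per table.
        self.applied: Dict[str, List[Tuple[str, Optional[dict]]]] = defaultdict(list)

    def on_event(self, ev: ChangeEvent) -> None:
        if ev.op == "BEGIN":
            self.pending[ev.txn_id] = []
        elif ev.op == "COMMIT":
            # Apply the whole transaction in one step; none of its
            # changes are visible downstream before this point.
            for buffered in self.pending.pop(ev.txn_id, []):
                self.applied[buffered.table].append((buffered.op, buffered.row))
        elif ev.table in self.subscribed:
            self.pending[ev.txn_id].append(ev)
        # Changes to unsubscribed tables are dropped. Because one reader
        # serves all subscribed tables, the log is read once, not once per table.


# Usage: one transaction touching two subscribed tables and one other table.
reader = SharedCdcReader(["orders", "users"])
for ev in [
    ChangeEvent(1, "BEGIN"),
    ChangeEvent(1, "INSERT", "orders", {"id": 1}),
    ChangeEvent(1, "INSERT", "audit_log", {"msg": "x"}),
    ChangeEvent(1, "UPDATE", "users", {"id": 7}),
]:
    reader.on_event(ev)
print(dict(reader.applied))  # prints {} because the transaction is still open
reader.on_event(ChangeEvent(1, "COMMIT"))
print(dict(reader.applied))  # prints {'orders': [('INSERT', {'id': 1})], 'users': [('UPDATE', {'id': 7})]}
```

Buffering per transaction trades some memory and latency for atomic visibility; a very large upstream transaction would need a spill or size limit, which this sketch leaves out.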

Mid-term goals (within the next 6 months)

  • SSL/TLS Secured Connection
    Implement SSL/TLS encryption for client/server communications to enhance security.

  • Alter Materialized View
    Add the ability to modify existing materialized views.

  • Session Window
    Introduce session window functionality for advanced streaming analytics.

  • MemTable Spill
    A refresh of a small table can suddenly amplify write throughput by 1,000 times. This typically happens in queries that join 10 or more tables. One way to mitigate this is to use the local disk as a buffer for the flood of writes, thus avoiding out-of-memory (OOM) errors.

  • Dedicated Computes for Materialized View Creation
    Some users have reported that creating a materialized view in RisingWave is too slow, because it requires a resource-intensive, ad-hoc computation. In contrast, the ongoing streaming (incremental) computation is long-running but needs fewer resources at any given time. It is therefore possible to allocate dedicated resources for MV creation when needed and release them once creation finishes.

  • More External Sinks
    Redshift and Snowflake sinks are planned.

  • Recursive CTE
    Enable recursive common table expressions (CTEs) to traverse hierarchical data, such as the organizational tree of a company.

  • Shared Meta Plane
    Enable RisingWave clusters to share the meta plane, including etcd (or Postgres in the future), to make better use of compute resources across clusters.
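The MemTable Spill idea, using the local disk as an overflow buffer for a burst of writes, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not RisingWave's storage engine: `SpillableMemTable` and `max_entries` are hypothetical names, the memory limit is counted in entries rather than bytes, and each spilled run is a key-sorted file of pickled pairs. Reading the table back streams all runs plus the in-memory buffer through a k-way merge, where newer writes to the same key win.

```python
import heapq
import os
import pickle
import tempfile
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def _read_run(path: str) -> Iterator[Tuple[str, Any]]:
    """Stream (key, value) pairs back from a spilled run file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def _tag(source: Iterable[Tuple[str, Any]], age: int):
    """Tag entries with -age so that, for equal keys, newer sources sort first."""
    for key, value in source:
        yield key, -age, value


class SpillableMemTable:
    """Sketch: an in-memory write buffer that spills to local disk instead
    of growing without bound (and risking OOM) under a burst of writes."""

    def __init__(self, max_entries: int, spill_dir: str):
        self.max_entries = max_entries
        self.spill_dir = spill_dir
        self.mem: Dict[str, Any] = {}
        self.runs: List[str] = []  # spilled run files, oldest first

    def put(self, key: str, value: Any) -> None:
        self.mem[key] = value
        if len(self.mem) >= self.max_entries:
            self._spill()

    def _spill(self) -> None:
        # Write the buffer to disk as one key-sorted run, then free the memory.
        path = os.path.join(self.spill_dir, f"run-{len(self.runs)}.pkl")
        with open(path, "wb") as f:
            for item in sorted(self.mem.items()):
                pickle.dump(item, f)
        self.runs.append(path)
        self.mem.clear()

    def drain(self) -> Iterator[Tuple[str, Any]]:
        """Yield every key once, in key order, with its most recent value."""
        sources = [_read_run(p) for p in self.runs] + [iter(sorted(self.mem.items()))]
        merged = heapq.merge(*(_tag(src, age) for age, src in enumerate(sources)))
        last_key = None
        for key, _neg_age, value in merged:
            if key != last_key:  # the first occurrence of a key is the newest
                yield key, value
                last_key = key


# Usage: a tiny limit forces two spills during five writes.
with tempfile.TemporaryDirectory() as spill_dir:
    table = SpillableMemTable(max_entries=2, spill_dir=spill_dir)
    for k, v in [("b", 1), ("a", 2), ("a", 3), ("c", 4), ("b", 5)]:
        table.put(k, v)
    print(len(table.runs))      # prints 2
    print(list(table.drain()))  # prints [('a', 3), ('b', 5), ('c', 4)]
```

Keeping each run sorted is what lets the read path merge runs lazily instead of loading them all back into memory at once.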

Long-term goals

  • Optimize analytical query performance on third-party systems like Presto and Trino

  • GraphQL API
    Allow users to retrieve results from RisingWave directly through a browser.

  • Serverless Compaction
    Automatically scale Compactor instances in and out to match workload demands in a serverless model.

Conclusion

RisingWave is an open-source streaming database that aims to democratize stream processing: to make it easy to use and cost-efficient. Its development direction is strongly shaped by user requests, and we would love to hear from the community and update our agenda accordingly. If you have any questions or comments about RisingWave’s roadmap, please don’t hesitate to let us know in the comments. Your voice will help shape the future of real-time stream processing!

About RisingWave Labs

RisingWave is an open-source distributed SQL database for stream processing. It is designed to reduce the complexity and cost of building real-time applications. RisingWave offers users a PostgreSQL-like experience specifically tailored for distributed stream processing.

Official Website: https://www.risingwave.com/

Documentation: https://docs.risingwave.com/docs/current/intro/

Tutorial: https://tutorials.risingwave.com/

Slack: https://risingwave-community.slack.com

GitHub: https://github.com/risingwavelabs/risingwave

LinkedIn: linkedin.com/company/risingwave-labs
