Highlights of RisingWave v2.5


We’re excited to announce the release of RisingWave v2.5! This update delivers major enhancements across multiple areas, including deeper Apache Iceberg integration, fine-grained control over materialized view backfill, improved join handling under pressure, new SQL syntax features, and much more.
Follow along to learn more about some of the standout features in this version release. If you are interested in the full list of v2.5 updates, see the full release note.
Enhanced Apache Iceberg Integration
This release brings significant enhancements to our native Apache Iceberg integration, focusing on improved data management, simplified configuration, and more powerful data sinking capabilities.
Iceberg compaction: RisingWave now supports automatic compaction for Iceberg table engine and Iceberg sink. You can enable periodic compaction and snapshot expiration directly in your table or sink configs using simple options like
enable_compaction=true
. This crucial maintenance operation merges small data files into larger, more optimal ones, significantly improving read performance and eliminates the need for external services like AWS Glue or Spark.Custom data partitioning: You can now define a partitioning strategy using the
partition_by
clause when creating an Iceberg table. This allows you to organize data into a logical structure, enabling efficient partition pruning and faster queries on your data.External configuration files: Use the new
enable_config_load
option to load Iceberg properties directly from configuration files on the host system. This keeps your DDL clean and makes it easier to manage credentials and complex settings across sinks, sources, connections, and the table engine.
For more information, see Interact with Iceberg.
Backfill order control
backfill_order option
is now available in the WITH
clause for CREATE MATERIALIZED VIEW
. This feature is designed to mitigate critical performance issues in production workloads, such as join amplification and unnecessary updates.
To control the backfill sequence, use the backfill_order = FIXED(...)
option in the WITH
clause. The ->
operator defines a dependency, ensuring the left-side table finishes backfilling before the right-side table starts.
CREATE TABLE fact(k INT, d1 INT, d2 INT, v INT);
CREATE TABLE dim(k INT);
CREATE TABLE dim2(k INT);
-- Backfill dim, dim2 before fact
EXPLAIN (BACKFILL, FORMAT DOT) CREATE MATERIALIZED VIEW m1 WITH (backfill_order=FIXED(dim->fact, dim2->fact)) AS
SELECT fact.v FROM fact JOIN dim ON fact.d1 = dim.k JOIN dim2 ON fact.d2 = dim2.k;
For observability, you can inspect the fragment-level dependency graph with DESCRIBE FRAGMENTS
and monitor progress via rw_catalog.rw_fragment_backfill_progress
.
For more information, see Backfill behavior and controls.
Isolate high-amplification joins
High join amplification in streaming queries can create significant backpressure, slowing down your entire pipeline. To combat this, RisingWave v2.5 introduces unaligned joins, a powerful optimization that isolates problematic joins from downstream operators.
SET streaming_enable_unaligned_join = true;
When enabled, this feature automatically inserts a buffer after a join operator. The join writes its output to this buffer, which allows checkpoint barriers to pass through immediately instead of waiting for a slow downstream operator to process the data.
This comes with a trade-off: slightly increased end-to-end latency due to buffering. While unaligned joins help with barrier latency, scaling the cluster may still be necessary to handle the high data volume. Monitoring for potential data lag in the buffer is recommended.
For more information, see Isolating high-amplification joins.
Generate embeddings with openai_embedding
RisingWave v2.5 adds native support for generating text embeddings directly in SQL using the new openai_embedding
function. This makes it easy to integrate AI-powered workflows, such as semantic search or recommendation systems, without leaving your database.
openai_embedding ( api_key constant text, model constant text, input text ) → real[]
This makes it straightforward to build retrieval-augmented generation (RAG) pipelines—store and search embeddings inside RisingWave, then feed the most relevant context back into an LLM.
For more information, see openai_embedding
and Building a RAG system on RisingWave.
ALTER DEFAULT PRIVILEGES
command
To create and drop tables and materialized views within a schema, users need to have the corresponding schema’s usage
and create
privileges. If you want to dynamically grant users privileges for newly created schemas or allow them to operate on relations created by other users, the “default privileges” feature is required.
ALTER DEFAULT PRIVILEGES
[ FOR USER target_user [, ...] ]
[ IN SCHEMA schema_name [, ...] ]
abbreviated_grant_or_revoke
The default privilege operates as an automatic privilege assignment template, which only applies to newly created schemas, tables, materialized views, etc. For existing schemas and relations, manual granting is still required.
For more information, see ALTER DEFAULT PRIVILEGES
.
Enhanced WINDOW
clause
The WINDOW
clause allows you to define a named window specification once and reuse it for multiple window functions within the same query. This significantly improves query readability and maintainability by reducing repetition. Consider a query where you need to calculate both the sum and the row number over the same window:
- Without the
WINDOW
clause, the window specification (PARTITION BY col1 ORDER BY col2
) is repeated for each function.
SELECT
sum(col2) OVER (PARTITION BY col1 ORDER BY col2),
row_number() OVER (PARTITION BY col1 ORDER BY col2)
FROM t;
- Now with the
WINDOW
clause, the window specification is defined once, namedw
, and then reused, making the query cleaner.
SELECT
sum(col2) OVER w,
row_number() OVER w
FROM t
WINDOW w AS (PARTITION BY col1 ORDER BY col2);
Meanwhile, WINDOW
is now a reserved keyword since v2.5 and can no longer be used as an unquoted identifier (e.g., a table alias). If you need to use “window” as an alias, you must enclose it in double quotes.
For more information, see Named window.
Conclusion
These are some of the highlight features included in v2.5. To see the entire list of updates, which includes updates to source and sink connectors, please refer to the full release note.
Stay tuned for next month’s updates as we continue to enhance RisingWave with new features. Visit the RisingWave GitHub repository to explore the latest developments and planned releases.
Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and LinkedIn, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.
Subscribe to my newsletter
Read articles from RisingWave Labs directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

RisingWave Labs
RisingWave Labs
RisingWave is an open-source distributed SQL database for stream processing. It is designed to reduce the complexity and cost of building real-time applications. RisingWave offers users a PostgreSQL-like experience specifically tailored for distributed stream processing. Learn more: https://risingwave.com/github. RisingWave Cloud is a fully managed cloud service that encompasses the entire functionality of RisingWave. By leveraging RisingWave Cloud, users can effortlessly engage in cloud-based stream processing, free from the challenges associated with deploying and maintaining their own infrastructure. Learn more: https://risingwave.cloud/. Talk to us: https://risingwave.com/slack.