Introducing REFRESH TABLE: Manual Control Over Table Refresh

RisingWave is built as a streaming-first system, designed to react to changes as they happen. However, not all real-world data behaves like a stream. Many external batch data sources, such as snapshot-based tables, do not emit change events at all. Even in a streaming engine, changes to these sources become visible only when the data is explicitly reloaded.

This difference in data behavior leads to two fundamentally different source models and motivates REFRESH TABLE as a way to intentionally control data freshness.

Streaming vs. Batch Sources

Streaming sources, such as Kafka, are:

Continuous and event-driven
Able to emit change events as data arrives
Designed for low-latency, incremental processing

Once connected, RisingWave can react to each new event automatically, keeping downstream results continuously up to date.

Batch sources, such as Iceberg or Snowflake, are fundamentally different:

Snapshot-based rather than event-driven
Do not emit change events when data is updated
Updated on schedules or by upstream jobs outside RisingWave’s control

For these sources, RisingWave must explicitly reload the latest snapshot in order to observe updates.

Introducing REFRESH TABLE

By introducing REFRESH TABLE, RisingWave bridges these two worlds—preserving its streaming-first architecture while offering a practical, explicit refresh mechanism for real-world batch data.

With REFRESH TABLE, you can manually trigger a full reload of data from an external source for tables created with refresh_mode = 'FULL_RELOAD'. Instead of relying entirely on periodic refresh intervals, you decide exactly when the latest snapshot should be pulled into RisingWave.

This provides:

Immediate access to the freshest data after upstream changes
The flexibility to refresh on demand, even without a scheduled interval
More predictable and accurate query results, as each refresh loads the latest snapshot from the external system

How It Works

Rather than adding new concepts, REFRESH TABLE builds on the table definitions you already have.

Step 1: Create a table with full reload mode

First, define an external batch table, such as an Iceberg table, with refresh_mode = 'FULL_RELOAD'. Optionally, set refresh_interval_sec to also refresh the table automatically at a fixed interval.

CREATE TABLE iceberg_batch_table (
    id int primary key,
    name varchar
) WITH (
    connector = 'iceberg',
    catalog.type = 'storage',
    warehouse.path = 's3://my-data-lake/warehouse',
    table.name = 'my_iceberg_table',
    database.name = 'public',
    refresh_mode = 'FULL_RELOAD',
    refresh_interval_sec = '60'
);

Step 2: Trigger a manual refresh

Whenever you need the latest data, run:

REFRESH TABLE iceberg_batch_table;

This immediately reloads the table from the external source.

Step 3: Check refresh status

You can inspect the refresh state by querying the rw_refresh_table_state system catalog view:

SELECT
    table_id,
    current_status,
    last_trigger_time,
    last_success_time,
    trigger_interval_secs
FROM rw_catalog.rw_refresh_table_state;

This helps you verify when a refresh was triggered and whether it completed successfully.
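
The status query can also be wrapped in a small polling helper for scripts that need to wait for a refresh to finish. The following is a minimal Python sketch, not an official client. It assumes a PEP 249 (DB-API) connection opened with a PostgreSQL-compatible driver, that last_success_time changes when a refresh completes, and that you already know the table_id. The names read_last_success and wait_for_refresh are hypothetical.

```python
import time


def read_last_success(conn, table_id):
    """Return last_success_time for one table, or None if no row exists."""
    cur = conn.cursor()
    try:
        # int() keeps the interpolated value safe regardless of driver paramstyle.
        cur.execute(
            "SELECT last_success_time "
            "FROM rw_catalog.rw_refresh_table_state "
            f"WHERE table_id = {int(table_id)}"
        )
        row = cur.fetchone()
    finally:
        cur.close()
    return row[0] if row else None


def wait_for_refresh(conn, table_id, previous, timeout_s=300.0, poll_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until last_success_time differs from `previous`.

    Returns the new value, or raises TimeoutError if nothing changed in time.
    """
    deadline = clock() + timeout_s
    while True:
        current = read_last_success(conn, table_id)
        if current is not None and current != previous:
            return current
        if clock() >= deadline:
            raise TimeoutError(f"table {table_id} did not finish refreshing")
        sleep(poll_s)
```

A typical sequence would be to call read_last_success before issuing REFRESH TABLE, run the statement, and then pass the earlier value as previous to wait_for_refresh. Comparing against the earlier value avoids any assumption about clock or time zone agreement between the client and RisingWave.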

What This Unlocks

By adding manual refresh as a first-class capability, REFRESH TABLE unlocks a more flexible and intentional way to work with external data in RisingWave. Instead of treating data refresh as a background process that runs on a fixed schedule, you can now align refresh behavior with real operational needs.

Ad-hoc data synchronization

When upstream data changes unexpectedly or outside a fixed schedule, you can refresh immediately to ensure downstream queries and dashboards reflect the latest state.

Event-driven workflows

REFRESH TABLE can be triggered as part of a larger workflow—such as after a batch job completes, a data quality check passes, or a pipeline stage finishes—making it easier to integrate RisingWave into orchestration tools and automation.
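
As an illustration of this pattern, here is a hedged Python sketch of a pipeline stage that refreshes a table only after an upstream job finishes and a quality check passes. It assumes a PEP 249 (DB-API) connection from a PostgreSQL-compatible driver with autocommit enabled. The functions refresh_table and run_stage, and the upstream_job and quality_check callables, are hypothetical placeholders for whatever your orchestrator provides.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def refresh_table(conn, table_name):
    """Issue REFRESH TABLE for a validated, unquoted table name."""
    # Identifiers cannot be bound as query parameters, so validate before formatting.
    if not _IDENT.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"REFRESH TABLE {table_name};")
    finally:
        cur.close()


def run_stage(conn, table_name, upstream_job, quality_check):
    """Run the upstream job, then refresh only if the quality check passes."""
    result = upstream_job()
    if not quality_check(result):
        return False  # skip the refresh; the table keeps its previous snapshot
    refresh_table(conn, table_name)
    return True
```

Keeping the gate outside refresh_table means the same helper can be reused from a scheduler task, a webhook handler, or a one-off script.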

Cost- and performance-aware refreshes

Instead of refreshing on a fixed interval regardless of data changes, you can refresh only when needed, reducing unnecessary compute and I/O overhead.
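
One way to refresh only when needed is to compare an upstream change marker before reloading. The sketch below is an assumption-laden illustration in Python: SnapshotGate, read_marker, and refresh are hypothetical names, and the marker could be anything your upstream exposes, such as a snapshot ID or a job run ID. How to obtain that marker is outside what this post describes.

```python
class SnapshotGate:
    """Call `refresh` only when the upstream marker differs from the last one seen."""

    def __init__(self, read_marker, refresh):
        self._read_marker = read_marker  # returns e.g. a snapshot ID or job run ID
        self._refresh = refresh          # callable that issues REFRESH TABLE
        self._last_marker = None

    def maybe_refresh(self):
        marker = self._read_marker()
        if marker == self._last_marker:
            return False  # nothing new upstream, skip the reload
        self._refresh()
        # Record the marker only after the refresh call returns without raising,
        # so a failed attempt is retried on the next call.
        self._last_marker = marker
        return True
```

Calling maybe_refresh on a short timer costs only a marker lookup when nothing has changed, while the full reload runs just once per new upstream snapshot.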

Testing and validation

In development or staging environments, on-demand refreshes make it easier to test against known data states and validate behavior after upstream changes.

Rethinking Data Refresh

REFRESH TABLE makes data freshness an explicit choice rather than a side effect of scheduling. By letting you decide exactly when external data is reloaded, RisingWave helps you build pipelines that respond to real-world events instead of fixed timers.

Whether you’re reacting to upstream corrections, coordinating snapshot data with streaming workloads, or simply avoiding unnecessary refreshes, REFRESH TABLE gives you a clearer and more intentional way to manage data freshness as part of your workflow.

Get Started with RisingWave

Try REFRESH TABLE in your next RisingWave project and see how manual refresh simplifies your workflow.

For more detailed information, please see the official documentation.
Try RisingWave Today:
- Download the open-source version of RisingWave to deploy on your own infrastructure.
- Get started quickly with RisingWave Cloud for a fully managed experience.
Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.
Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.
