Streaming SQL in 2026: Materialized Views, Windows, and What Actually Works



The idea of querying a stream with SQL is not new. Early streaming SQL research dates back to the late 1990s and early 2000s, including Stanford's STREAM project (the origin of the CQL language), UC Berkeley's TelegraphCQ, and AT&T's Gigascope. For most of the following two decades, streaming SQL existed as an academic concept or a narrow feature inside proprietary systems.

In 2026, it is a production standard. Teams build pipelines entirely in SQL that would have required custom Java code five years ago. The syntax is standardized enough to be portable. The semantics are well-understood. The tooling (RisingWave, Flink SQL) handles the hard cases around exactly-once processing, watermarks, and late data that previously required deep expertise to get right.

This article covers what streaming SQL is, how it differs from batch SQL, the specific syntax that matters in 2026, and where the genuine limitations still are.

What Streaming SQL Is and How It Differs from Batch SQL

Batch SQL runs once. You submit a SELECT statement, the database scans the relevant data, computes a result, returns it, and the query is finished. The result is a snapshot of the data at the moment the query ran. If the underlying data changes, the result does not change until you run the query again.

Streaming SQL defines a continuous query. You write a CREATE MATERIALIZED VIEW statement once. The database maintains the result of that query as new data arrives, updating only the rows affected by each new event. When you query the view later, the result is already computed and served immediately from the maintained state.

The shift in model is significant. In batch SQL, computation happens at read time: the database computes when you ask it to. In streaming SQL, computation happens at write time: the database recomputes the relevant portion of the result each time new data arrives. Queries against materialized views are served from pre-computed state, making them fast regardless of how much data the underlying pipeline has processed.

This distinction has practical consequences:

  • A batch SQL query over 1 billion rows takes time proportional to the data volume. A streaming materialized view query over the same dataset takes milliseconds, because the result was pre-computed.
  • A batch SQL query returns data as of when it ran. A streaming materialized view returns data as of the last committed event, which can be less than one second ago.
  • Batch SQL requires no ongoing computation between queries. Streaming SQL requires continuous computation as long as the source produces events.

The cost model is inverted: streaming SQL pays at write time (constant computation cost per event), batch SQL pays at read time (computation cost proportional to data scanned). For high-read, frequently-refreshed analytics, streaming SQL is more efficient. For low-read, infrequent queries over large historical datasets, batch SQL may be more appropriate.
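The inverted cost model can be made concrete with a minimal Python sketch (a toy, not RisingWave internals): the streaming side does constant work per event at write time and serves pre-computed state at read time, while the batch side scans everything at read time.

```python
# Toy contrast between the two cost models: batch pays at read time
# (cost proportional to data scanned); streaming pays a small constant
# at write time and serves maintained state at read time.

class CountView:
    def __init__(self):
        self.events = []   # raw data, as a batch system would store it
        self.count = 0     # incrementally maintained "materialized view"

    def ingest(self, event):
        # Write path: constant work per event keeps the view current.
        self.events.append(event)
        self.count += 1

    def batch_query(self):
        # Batch read path: scan all data at query time, O(n).
        return sum(1 for _ in self.events)

    def view_query(self):
        # Streaming read path: serve pre-computed state, O(1).
        return self.count

v = CountView()
for e in range(1000):
    v.ingest(e)
print(v.batch_query(), v.view_query())  # both 1000; only the cost differs
```

Both queries return the same answer; the difference is where the computation happened and when you paid for it.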

The 2026 Standard for Streaming SQL

The streaming SQL implementations that are production-ready in 2026 share a common feature set. Understanding what "mature" looks like helps you evaluate whether a given tool covers your use case.

Time window functions. A mature streaming SQL implementation supports at minimum three window types: TUMBLE (fixed non-overlapping windows), HOP (sliding overlapping windows), and SESSION (gap-based windows). These are the primitives for grouping unbounded event streams into finite chunks that can be aggregated.

Streaming JOINs. Joining two streams, or a stream with a slowly-changing dimension table, must work and produce correct results as both sides update. The implementation must handle the case where one side of the join arrives late relative to the other.

Watermarks and late data handling. A watermark defines how much lateness the system tolerates before advancing the event-time clock. Without watermarks, the system cannot know when a time window is complete, because late events could always arrive. With watermarks, the system emits window results when the watermark passes the window boundary, and drops (or optionally processes) events that arrive after the watermark has moved past their timestamp.

CDC integration. Treating database change logs as streams -- so that a PostgreSQL table can be queried in streaming SQL just like a Kafka topic -- is now a baseline capability, not an advanced feature.

Exactly-once semantics. Results must be correct even when the system restarts. Duplicate events or missed events produce wrong aggregations. Production streaming SQL systems guarantee that each event is processed exactly once, using checkpointing, transaction log replay, or similar mechanisms.

PostgreSQL-compatible query interface. In 2026, the expectation is that a streaming database speaks the PostgreSQL wire protocol so that any existing tool, driver, or application connects without modification. RisingWave runs on port 4566 and is compatible with standard PostgreSQL clients.

RisingWave satisfies all of these requirements. Flink SQL satisfies most of them except the built-in serving layer -- results go to external sinks, not to a queryable database interface.

Window Functions: The Core of Time-Based Streaming SQL

Window functions are how streaming SQL turns an infinite event stream into finite, aggregatable chunks. There are three time-based window types that matter.

TUMBLE: Fixed Non-Overlapping Windows

A tumbling window divides the event stream into equal, non-overlapping intervals. An event at 10:47 goes into the 10:00-11:00 window if the window size is one hour. It goes into the 10:45-10:50 window if the window size is five minutes. Each event belongs to exactly one window.
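The window assignment itself is simple arithmetic, which a short Python sketch makes explicit (timestamps here are seconds since midnight for readability; real engines do the same with epoch timestamps):

```python
# Assigning an event to its tumbling window is floor division on the
# timestamp: round down to the nearest multiple of the window size.

def tumble(ts: int, size: int) -> tuple[int, int]:
    """Return (window_start, window_end) for a tumbling window of `size` seconds."""
    start = (ts // size) * size
    return start, start + size

# An event at 10:47 with a 1-hour window lands in [10:00, 11:00)...
assert tumble(10 * 3600 + 47 * 60, 3600) == (10 * 3600, 11 * 3600)
# ...and with a 5-minute window in [10:45, 10:50).
assert tumble(10 * 3600 + 47 * 60, 300) == (38700, 39000)
```

Because the mapping is deterministic and each timestamp rounds down to exactly one start, every event belongs to exactly one window.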

RisingWave syntax:

SELECT
    user_id,
    COUNT(*) AS event_count,
    window_start,
    window_end
FROM TUMBLE(user_events, event_time, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;

The TUMBLE() function takes three arguments: the table or source, the time column used as the event timestamp, and the window size. It adds window_start and window_end columns to each row that indicate which window the event falls into. You must include both window_start and window_end in the GROUP BY clause.

Wrapping this in a materialized view makes it continuous:

CREATE MATERIALIZED VIEW hourly_user_events AS
SELECT
    user_id,
    COUNT(*) AS event_count,
    window_start,
    window_end
FROM TUMBLE(user_events, event_time, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;

RisingWave maintains this view incrementally. As each new event arrives, only the row for the affected (user_id, window_start, window_end) combination is updated -- the rest of the view is untouched.

When to use TUMBLE: Fixed interval reporting (orders per hour, errors per minute, revenue per day), billing cycle aggregations, any use case where events must be counted exactly once within a defined period.

HOP: Sliding Overlapping Windows

A hopping window has a fixed size but starts a new window on a shorter interval, causing windows to overlap. An event can appear in multiple windows, which is what makes HOP useful for rolling metrics.

A HOP window with a size of one hour and a hop of five minutes produces 12 overlapping windows per hour, each one hour wide, advancing by five minutes at a time.
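The "event appears in multiple windows" behavior can be checked with a small Python sketch (an illustration of the semantics, not engine code), enumerating every hop window that contains a given timestamp:

```python
# All hop windows of width `size`, starting every `hop` seconds, that
# contain the timestamp `ts`. Windows are half-open: [start, start + size).

def hop_windows(ts: int, hop: int, size: int) -> list[tuple[int, int]]:
    # The earliest containing window starts at the first hop boundary
    # strictly greater than ts - size (floor division handles negatives).
    first = ((ts - size) // hop + 1) * hop
    return [(s, s + size) for s in range(first, ts + 1, hop)
            if s <= ts < s + size]

# A 1-hour window hopping every 5 minutes: each event is in 12 windows.
windows = hop_windows(ts=38820, hop=300, size=3600)  # an event at 10:47:00
assert len(windows) == 12
```

With size equal to hop, this degenerates to exactly one window per event, which is why TUMBLE is the special case of HOP with no overlap.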

RisingWave syntax:

SELECT
    user_id,
    COUNT(*) AS event_count,
    window_start,
    window_end
FROM HOP(user_events, event_time, INTERVAL '5' MINUTE, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;
-- 1-hour window that slides every 5 minutes

The HOP function takes four arguments: the table, the time column, the hop interval (how often a new window starts), and the window size (how wide each window is).

Practical example -- rolling error rate:

CREATE MATERIALIZED VIEW rolling_error_rate_5min AS
SELECT
    service_name,
    COUNT(*) AS total_requests,
    COUNT(*) FILTER (WHERE status_code >= 500) AS error_count,
    ROUND(
        COUNT(*) FILTER (WHERE status_code >= 500)::DECIMAL
        / NULLIF(COUNT(*), 0) * 100, 2
    ) AS error_rate_pct,
    window_start,
    window_end
FROM HOP(api_requests, request_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)
GROUP BY service_name, window_start, window_end;

This produces a rolling five-minute error rate updated every minute. Because windows overlap, a spike in errors appears in multiple windows, giving you better sensitivity to short-duration incidents than a non-overlapping tumbling window.

When to use HOP: Moving averages, rolling SLO metrics, trend detection where you need more frequent updates than the window size, any case where you want to smooth out noise from a tumbling window.

SESSION: Gap-Based Dynamic Windows

A session window closes when no event arrives within a defined gap period. The window size is not fixed -- it grows as long as events keep arriving within the gap, and closes when the gap is exceeded.

Session windows are natural for user activity analysis: a session ends when a user is inactive for more than 30 minutes, regardless of how long they were active before.
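The gap rule is easy to state in code. A minimal Python sketch of the grouping logic (the concept, not RisingWave's implementation), using minutes as the time unit:

```python
# Group sorted event timestamps into sessions: a new session starts
# whenever the gap to the previous event exceeds the threshold.

def sessionize(timestamps: list[int], gap: int) -> list[list[int]]:
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # within the gap: extend current session
        else:
            sessions.append([ts])     # gap exceeded: open a new session
    return sessions

# Page views at minutes 0, 5, 12, then 40 minutes of silence, then 52, 55:
# with a 30-minute gap this is two sessions, not one fixed-size window.
assert sessionize([0, 5, 12, 52, 55], gap=30) == [[0, 5, 12], [52, 55]]
```

Note that the first session above lasted 12 minutes and the second 3; unlike TUMBLE and HOP, the window size is a property of the data, not the query.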

In RisingWave, session windows are supported in batch mode and in streaming mode with the EMIT ON WINDOW CLOSE clause:

SELECT
    user_id,
    COUNT(*) AS pages_viewed,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    window_start,
    window_end
FROM SESSION(page_views, event_time, INTERVAL '30' MINUTE)
GROUP BY user_id, window_start, window_end
EMIT ON WINDOW CLOSE;

When to use SESSION: User session analytics, IoT device communication bursts, call center interactions, any case where the natural grouping unit is a period of activity separated by gaps.

Streaming JOINs: What Works and What to Watch For

Joining streams is one of the most powerful features of streaming SQL and one of the most subtle. Understanding the different join types prevents correctness bugs in production.

Stream-Stream JOIN (Both Sides Are Streams)

When you join two streaming sources in RisingWave, the result updates whenever either side receives a new event. If a new purchase event arrives and its matching product record was loaded earlier, the join result includes both.

CREATE MATERIALIZED VIEW purchases_with_products AS
SELECT
    p.purchase_id,
    p.user_id,
    p.quantity,
    p.purchase_time,
    pr.name AS product_name,
    pr.price,
    p.quantity * pr.price AS line_total
FROM purchases p
JOIN products pr ON p.product_id = pr.product_id;

If products is a CDC-backed table that receives updates when product prices change, and purchases is a Kafka-backed stream of new purchase events, this materialized view joins them correctly. A new purchase gets the current product price; a price update does not retroactively change past purchase records (because the join is based on when events arrive, not on a time-travel semantic).

The state accumulation concern: A stream-stream join with no time bounds accumulates state indefinitely. Every row from the left side must be retained until a matching row from the right side arrives, and vice versa. For streams where both sides grow without bound, this can consume significant memory.

Adding time-bounded conditions limits state growth:

CREATE MATERIALIZED VIEW purchases_with_products_bounded AS
SELECT
    p.purchase_id,
    p.user_id,
    p.quantity * pr.price AS line_total
FROM purchases p
JOIN products pr ON p.product_id = pr.product_id
AND p.purchase_time BETWEEN pr.valid_from AND COALESCE(pr.valid_until, NOW());

Temporal JOIN: Stream with a Slowly-Changing Dimension

A temporal join in streaming SQL matches each event with the version of a dimension record that was current at the time of the event. This is the correct approach when a dimension table changes over time and you want historical accuracy.

CREATE MATERIALIZED VIEW purchases_with_historical_price AS
SELECT
    p.purchase_id,
    p.user_id,
    p.quantity,
    p.purchase_time,
    pr.price AS price_at_purchase_time,
    p.quantity * pr.price AS line_total
FROM purchases p
JOIN products FOR SYSTEM_TIME AS OF p.purchase_time pr
    ON p.product_id = pr.product_id;

This ensures that if a product's price changes after a purchase was made, the historical purchase record reflects the price at the time of purchase, not the current price.

What Streaming JOINs Cannot Do

Streaming JOINs cannot guarantee that all matching records have arrived before emitting a result. If a purchase arrives before the matching product record, the join result may be incomplete or delayed until the product record arrives.

For workloads where this arrival order matters, designing the pipeline so that dimension data arrives before fact data (typically by loading dimensions from CDC before starting the event stream) is the most reliable approach.

Incremental Materialized Views: The Most Important Concept

Incremental view maintenance is what makes streaming SQL practical at scale, and it is worth understanding precisely.

Traditional materialized views (in PostgreSQL, Oracle, and similar databases) are computed by re-running the full defining query over all data on each refresh. If a view aggregates 1 billion rows and one new row arrives, the refresh re-scans all 1 billion rows plus the new one. Refresh cost is proportional to total data volume.

Streaming materialized views in RisingWave maintain a computation graph that tracks which output rows depend on which input rows. When a new event arrives, RisingWave propagates only the delta -- the change caused by that event -- through the computation graph. If one row changes in a 1-billion-row aggregation, RisingWave recomputes only the affected aggregate groups, not the entire dataset.

This makes streaming materialized views efficient for large-scale aggregations that would be impractical to refresh from scratch. A COUNT(*) over 1 billion rows, maintained incrementally, takes microseconds per new event: the count goes up by one, the count for the relevant GROUP BY key is updated, and the view reflects the change immediately.
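A toy Python model makes the delta propagation concrete (a sketch of the idea, not RisingWave's executor): each arriving change touches only its own group's aggregate, and an update is modeled as a retraction of the old row plus an insert of the new one.

```python
from collections import defaultdict

# Minimal incremental maintenance of a GROUP BY COUNT/SUM: one delta
# touches one group's running totals; no other group and no historical
# row is recomputed.

class GroupedSumView:
    def __init__(self):
        self.state = defaultdict(lambda: [0, 0.0])  # key -> [count, sum]

    def apply_delta(self, key, amount, sign=+1):
        # sign=+1 for an insert, -1 for a delete/retraction; an update is
        # a retraction of the old value followed by an insert of the new.
        agg = self.state[key]
        agg[0] += sign
        agg[1] += sign * amount

    def query(self, key):
        count, total = self.state[key]
        return count, total

view = GroupedSumView()
view.apply_delta("electronics", 99.0)
view.apply_delta("books", 12.0)
view.apply_delta("electronics", 1.0)
view.apply_delta("electronics", 99.0, sign=-1)  # the 99.0 order is cancelled
print(view.query("electronics"))  # (1, 1.0): only this group was touched
```

COUNT and SUM are the easy cases because the delta is purely additive; aggregates like MIN and MAX need extra state to handle retractions, which is part of what a streaming engine manages for you.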

Chaining materialized views: RisingWave allows materialized views to be defined on top of other materialized views. Each view in the chain maintains only its incremental delta, so a three-level chain of aggregations is still efficient.

-- Level 1: per-event enrichment
CREATE MATERIALIZED VIEW purchase_enriched AS
SELECT
    p.purchase_id,
    p.user_id,
    p.product_id,
    p.quantity * pr.price AS line_total,
    p.purchase_time,
    pr.category
FROM purchases p
JOIN products pr ON p.product_id = pr.product_id;

-- Level 2: per-user daily aggregation (built on level 1)
CREATE MATERIALIZED VIEW user_daily_spend AS
SELECT
    user_id,
    purchase_time::DATE AS day,
    SUM(line_total) AS daily_spend,
    COUNT(*) AS purchase_count
FROM purchase_enriched
GROUP BY user_id, purchase_time::DATE;

-- Level 3: cross-user daily summary (built on level 2)
CREATE MATERIALIZED VIEW daily_platform_metrics AS
SELECT
    day,
    COUNT(DISTINCT user_id) AS active_buyers,
    SUM(daily_spend) AS total_revenue,
    AVG(daily_spend) AS avg_spend_per_buyer
FROM user_daily_spend
GROUP BY day;

When a new purchase event arrives, it propagates through all three levels: level 1 enriches it, level 2 updates the user's daily spend, level 3 updates the platform's daily metrics. All three updates happen incrementally, without re-scanning historical data.

CDC as a Stream: Treating Your Database as a Source

One of the most practically useful features of streaming SQL in 2026 is treating a relational database's change log as a stream. Instead of building a batch pipeline to copy data from PostgreSQL to an analytical system, you define a CDC source and the streaming database continuously applies the changes.

Setting up a CDC source in RisingWave:

CREATE SOURCE pg_source WITH (
    connector = 'postgres-cdc',
    hostname = 'localhost',
    port = '5432',
    username = 'postgres',
    password = '${POSTGRES_PASSWORD}',
    database.name = 'mydb',
    schema.name = 'public',
    publication.name = 'my_pub'
);

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ
) FROM pg_source TABLE 'public.orders';

After these two statements, the orders table in RisingWave mirrors the orders table in PostgreSQL in real time. Inserts, updates, and deletes in PostgreSQL are reflected in RisingWave within milliseconds. The orders table can then be used in streaming SQL exactly like any other table or source.

Building analytics on CDC data:

-- Real-time order status distribution
CREATE MATERIALIZED VIEW order_status_counts AS
SELECT
    status,
    COUNT(*) AS count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY status;

-- Revenue per 15-minute window
CREATE MATERIALIZED VIEW revenue_per_15min AS
SELECT
    SUM(amount) AS revenue,
    COUNT(*) AS order_count,
    window_start,
    window_end
FROM TUMBLE(orders, created_at, INTERVAL '15' MINUTE)
GROUP BY window_start, window_end;

The CDC source handles the operational complexity: reading the PostgreSQL replication slot, decoding the WAL (write-ahead log), handling reconnections, and maintaining exactly-once delivery. From the streaming SQL perspective, orders is just a table that happens to update continuously.

RisingWave's native CDC connectors support:

  • PostgreSQL (reads from a replication slot)
  • MySQL (reads the binary log)
  • MongoDB (reads the change stream)
  • SQL Server (reads the change data capture log)

For each of these, no Debezium, no Kafka, and no intermediate format conversion is needed. The CDC pipeline is defined in two SQL statements.

RisingWave vs. Flink SQL

It is worth being precise about this comparison, because both tools are used in production and serve overlapping but distinct use cases.

What Flink SQL gets right:

Flink SQL has the most mature implementation of the streaming SQL standard. It handles complex stateful operations that go beyond SQL: pattern matching with MATCH_RECOGNIZE, complex event processing with the DataStream API underneath, and custom windowing logic that does not fit the standard TUMBLE/HOP/SESSION primitives.

Flink's fault tolerance model (checkpointing to distributed storage) is battle-tested at very large scale. Its state backend options (RocksDB for local state, remote storage for large state) give operators fine-grained control over memory and disk usage.

Where RisingWave is different:

The fundamental difference is the serving model. Flink SQL is a stream processor: it reads input streams, applies transformations, and writes results to external sinks (Kafka topics, JDBC tables, Iceberg files). There is no built-in query interface. To query Flink's results, you need a downstream system (ClickHouse, PostgreSQL, Redis) that Flink writes to, and you query that system.

RisingWave is a streaming database: it processes streams and serves results via the PostgreSQL wire protocol. You connect with psql and query a materialized view directly. The view is maintained incrementally, so the query returns immediately.

This difference eliminates one component from the architecture for SQL-expressible workloads. Instead of Flink + ClickHouse (stream processor + serving database), you have RisingWave (stream processor and serving database). If your use case fits within SQL and you need the results to be directly queryable, RisingWave simplifies the stack.

Choosing between them:

Use Flink if:

  • You need the Java DataStream API for custom operators
  • You have complex CEP requirements that MATCH_RECOGNIZE does not cover
  • You have an existing Flink infrastructure investment
  • Your team has deep JVM and Flink operational expertise

Use RisingWave if:

  • You want SQL-only streaming with a built-in serving layer
  • You need PostgreSQL wire protocol compatibility for existing applications
  • You want built-in CDC connectors without routing through Kafka
  • You need built-in vector support for AI workloads

Practical Limitations: Be Honest About What Still Requires Care

Streaming SQL in 2026 is production-ready, but production-ready does not mean problem-free. Understanding the genuine limitations prevents architectural mistakes.

Unbounded State Accumulation

Any streaming JOIN without time bounds accumulates state indefinitely. A JOIN between two event streams keeps all records from both sides in memory (or on disk) until they either match or are explicitly expired. In a long-running pipeline over high-volume streams, this state can grow to hundreds of gigabytes.

The solution is to add time-bounded conditions that allow old state to be expired:

-- Unbounded: state grows forever
SELECT * FROM stream_a a JOIN stream_b b ON a.key = b.key;

-- Bounded: state is limited to a 1-hour window
SELECT * FROM stream_a a JOIN stream_b b
    ON a.key = b.key
    AND a.event_time BETWEEN b.event_time - INTERVAL '1' HOUR
                         AND b.event_time + INTERVAL '1' HOUR;

The business logic usually dictates what the time bound should be. A fraud detection JOIN between transactions and login events might only care about events within the same 30-minute session. An inventory JOIN might care about events within the same business day.
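Why a time bound makes state expirable is easier to see in a toy Python model of a symmetric hash join (an illustration of the mechanism, not RisingWave's operator): once the watermark has moved more than `bound` past a row's timestamp, that row can never match again and is dropped.

```python
from collections import defaultdict

# Toy time-bounded symmetric hash join: state on each side is kept only
# while it could still match, so state size tracks the time bound, not
# the full history of both streams.

class BoundedStreamJoin:
    def __init__(self, bound):
        self.bound = bound
        self.left = defaultdict(list)    # key -> [(ts, row), ...]
        self.right = defaultdict(list)

    def _expire(self, side, watermark):
        for key in list(side):
            side[key] = [(ts, r) for ts, r in side[key]
                         if ts >= watermark - self.bound]
            if not side[key]:
                del side[key]

    def insert(self, side_name, key, ts, row, watermark):
        mine, other = ((self.left, self.right) if side_name == "left"
                       else (self.right, self.left))
        mine[key].append((ts, row))
        self._expire(self.left, watermark)
        self._expire(self.right, watermark)
        # Emit (left_row, right_row) pairs within the time bound.
        return [((r, row) if side_name == "right" else (row, r))
                for ots, r in other.get(key, [])
                if abs(ts - ots) <= self.bound]

join = BoundedStreamJoin(bound=3600)
join.insert("left", "k", ts=0, row="a", watermark=0)
assert join.insert("right", "k", ts=1800, row="b", watermark=1800) == [("a", "b")]
join.insert("right", "k", ts=10000, row="c", watermark=10000)
assert "k" not in join.left   # the old left row expired with the watermark
```

Without the `bound`, the `_expire` step has nothing to go on, and both side tables grow for as long as the pipeline runs; that is the unbounded-state failure mode in miniature.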

High-Cardinality GROUP BY

A GROUP BY on a column with billions of unique values (such as raw session IDs or device fingerprints) creates a row in the materialized view for each unique key. At billions of rows, the memory footprint of maintaining that view becomes a significant resource planning concern.

The mitigation depends on the use case. If you do not truly need a row per raw key, aggregate at a coarser granularity (by user segment, by region, by hour) so the view maintains far fewer groups. If you only need the top-N or recently active keys, add a filter or a time window to limit the rows the view maintains.

Watermark Configuration and Late Data

Watermarks are necessary for correct time-window semantics, but they require tuning. Setting the watermark tolerance too low causes late events to be dropped and windows to close before all data arrives, producing incorrect aggregations. Setting it too high delays window results.

The right watermark setting depends on your event time distribution: how late can a valid event arrive? This is a question about your source systems, network latency, and acceptable result delay, not a question the streaming SQL engine can answer for you.

In RisingWave, watermarks are defined on the source:

CREATE SOURCE events (
    event_id BIGINT,
    user_id BIGINT,
    event_time TIMESTAMPTZ,
    WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
) WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

This configuration tells RisingWave to tolerate up to 30 seconds of lateness before advancing the watermark. Events that arrive more than 30 seconds late relative to the current watermark are dropped.
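The rule the source configuration encodes can be stated in a few lines of Python (a toy model of the semantics described above, not the engine's watermark machinery): the watermark trails the maximum observed event time by the allowed lateness, and events with timestamps behind the current watermark are dropped.

```python
# Toy watermark tracker: accept events unless their timestamp is behind
# the watermark, which trails the max event time by the allowed lateness.

class WatermarkedSource:
    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.watermark = float("-inf")

    def ingest(self, event_time):
        """Return True if the event is accepted, False if dropped as late."""
        if event_time < self.watermark:
            return False  # the watermark already passed this timestamp
        self.watermark = max(self.watermark,
                             event_time - self.allowed_lateness)
        return True

src = WatermarkedSource(allowed_lateness=30)
assert src.ingest(100) is True   # watermark advances to 70
assert src.ingest(75) is True    # 75 >= 70: within the 30-second tolerance
assert src.ingest(60) is False   # behind the watermark: dropped
```

This also shows why the watermark only moves forward: once it has advanced past a window boundary, that window can close and emit, and nothing that arrives later can reopen it.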

Consistency Across Multiple Materialized Views

If your application queries multiple materialized views in a single request and expects them to reflect the same point in time, you need to understand how those views update relative to each other.

In RisingWave, each materialized view is updated independently as events arrive. Two views derived from the same source will be close in time but not guaranteed to be at exactly the same offset at the moment of any given query. For most use cases this is acceptable. For use cases requiring strict cross-view consistency, design the application to query a single view that contains all the relevant data, or accept a small window of inconsistency between views.

The 2026 Maturity Assessment

Streaming SQL in 2026 is production-ready. This is not a hedged statement. The tools work, the semantics are sound, and the operational patterns are established.

The landscape looks like this:

What is solved: Window functions (TUMBLE, HOP, SESSION), incremental materialized views, exactly-once semantics, CDC integration, PostgreSQL wire protocol serving, Iceberg sinks with exactly-once delivery, watermarks and late data handling. These features work reliably in production at RisingWave and Flink, with a large enough user base that edge cases are known and documented.

What requires configuration and knowledge: State management for unbounded joins, watermark tuning for late data, memory planning for high-cardinality aggregations, understanding window close timing. These are not unsolved problems -- they have clear solutions -- but they require understanding the streaming execution model to get right.

What remains research or specialized: Sub-millisecond latency at very high throughput with complex stateful operations, full automatic state size optimization, semantic correctness guarantees across highly complex multi-stream joins. For most production use cases, these are not constraints. They matter at the extreme edges of scale.

The main remaining challenge for teams adopting streaming SQL is not the tooling. It is the operational knowledge: understanding event time versus processing time, configuring watermarks correctly, designing materialized views that bound their state, and building the monitoring to detect when a streaming pipeline is falling behind.

That knowledge is learnable, documentable, and increasingly well-understood in the community. The era of needing a streaming expert on the team to operate a streaming SQL pipeline is ending. What was specialist knowledge in 2022 is becoming standard data engineering practice in 2026.

Putting It All Together: A Complete Streaming SQL Pipeline

Here is a complete streaming SQL pipeline that combines CDC, Kafka, window functions, joins, and serving, illustrating the full stack in practice:

-- CDC source: product catalog from PostgreSQL
CREATE SOURCE pg_source WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'risingwave',
    password = '${POSTGRES_PASSWORD}',
    database.name = 'store',
    schema.name = 'public',
    publication.name = 'rw_pub'
);

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    name VARCHAR,
    category VARCHAR,
    price DECIMAL
) FROM pg_source TABLE 'public.products';

-- Kafka source: real-time order events
CREATE SOURCE orders_stream (
    order_id BIGINT,
    user_id BIGINT,
    product_id BIGINT,
    quantity INT,
    order_time TIMESTAMPTZ,
    WATERMARK FOR order_time AS order_time - INTERVAL '10' SECOND
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Enriched order stream: join Kafka events with CDC dimension table
CREATE MATERIALIZED VIEW orders_enriched AS
SELECT
    o.order_id,
    o.user_id,
    o.quantity,
    o.order_time,
    p.name AS product_name,
    p.category,
    p.price,
    o.quantity * p.price AS line_total
FROM orders_stream o
JOIN products p ON o.product_id = p.product_id;

-- Real-time revenue by category in 5-minute tumbling windows
CREATE MATERIALIZED VIEW category_revenue_5min AS
SELECT
    category,
    SUM(line_total) AS revenue,
    COUNT(*) AS order_count,
    window_start,
    window_end
FROM TUMBLE(orders_enriched, order_time, INTERVAL '5' MINUTE)
GROUP BY category, window_start, window_end;

-- Rolling 1-hour revenue with 5-minute hop (for trend smoothing)
CREATE MATERIALIZED VIEW category_revenue_rolling_1h AS
SELECT
    category,
    SUM(line_total) AS revenue,
    COUNT(*) AS order_count,
    window_start,
    window_end
FROM HOP(orders_enriched, order_time, INTERVAL '5' MINUTE, INTERVAL '1' HOUR)
GROUP BY category, window_start, window_end;

-- Application query: current top categories (last 5 minutes)
SELECT
    category,
    revenue,
    order_count
FROM category_revenue_5min
WHERE window_end > NOW() - INTERVAL '10 minutes'
ORDER BY revenue DESC
LIMIT 10;

This pipeline has no batch jobs, no ETL schedules, no separate serving database, and no Java. The CDC connector keeps products synchronized with PostgreSQL. The Kafka source ingests order events continuously. The materialized views maintain the aggregations incrementally. The application queries the views over a standard PostgreSQL connection.

Conclusion

Streaming SQL in 2026 is a complete discipline with known syntax, established tooling, and well-understood operational patterns. The days of needing to choose between "write Java for streaming" and "accept hours of latency" are over.

The key concepts to internalize are: materialized views as the primary primitive (not ad hoc streaming queries), incremental view maintenance as the efficiency model, window functions as the tool for time-based aggregation, watermarks as the mechanism for handling late data, and CDC as the bridge between operational databases and the streaming world.

RisingWave and Flink SQL are the two mature implementations of streaming SQL in production in 2026. They serve overlapping but distinct use cases. For SQL-expressible workloads that need a built-in serving layer and PostgreSQL wire protocol compatibility, RisingWave eliminates the need for a separate downstream serving database. For complex stateful processing requiring the Java DataStream API, Flink is more powerful.

In either case, the fundamental streaming SQL primitives -- TUMBLE, HOP, SESSION, streaming JOINs, incremental materialized views, and CDC integration -- work reliably in production today. The remaining challenges are operational knowledge challenges, not tooling challenges.
