From Flink to SQL: How to Simplify Your Stream Processing Stack

Your streaming pipeline is running. Events flow in from Kafka, windowed aggregations compute, and metrics land in your downstream database on schedule. On paper, success.

But the hidden costs are accumulating. The aggregation that took one data engineer three weeks to write in Flink's DataStream API now requires a full-time platform engineer to operate. RocksDB state backends need tuning. Checkpoints fail at 2 AM. JVM garbage collection pauses create backpressure cascades that wake someone up. Every new streaming requirement means a Java ticket, a code review, a JAR build, a deployment, and a waiting period measured in weeks rather than hours.

This is the Flink-to-SQL simplification that teams across data engineering are working through right now. The path is straightforward: replace Flink's Java-based processing framework with a SQL-first streaming database. This guide shows you exactly what changes, why the tradeoff is worth it for most workloads, and how to execute the migration step by step.

Before switching anything, it helps to name the specific friction points. Flink's complexity is real and documented, not just a perception problem.

The Java Topology Problem

A Flink job is a directed acyclic graph (DAG) of operators written in Java or Scala. Adding a new streaming metric means touching Java source, managing Maven dependencies, running tests, building a fat JAR, and deploying it to a cluster. The feedback loop is long.

Here is what a canonical Flink DataStream job looks like for a windowed revenue aggregation, a task that comes up in almost every e-commerce or SaaS streaming stack:

// Flink DataStream API: 5-minute tumbling window revenue aggregation
// Roughly 60 lines of Java for what is logically a GROUP BY + SUM
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

// The Order and CategoryRevenue POJOs, kafkaProperties, and the
// (de)serialization schemas are omitted for brevity.

public class RevenueByCategory {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Order> orders = env
            .addSource(new FlinkKafkaConsumer<>(
                "orders",
                new OrderDeserializationSchema(),
                kafkaProperties));

        DataStream<CategoryRevenue> revenue = orders
            .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<Order>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((e, t) -> e.getOrderTime().toEpochMilli()))
            .keyBy(Order::getCategory)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .aggregate(new RevenueAggregator());

        revenue.addSink(new FlinkKafkaProducer<>(
            "revenue-output",
            new RevenueSerializationSchema(),
            kafkaProperties));

        env.execute("Revenue By Category");
    }

    // AggregateFunction takes three type parameters and requires four method overrides
    static class RevenueAggregator implements
            AggregateFunction<Order, double[], CategoryRevenue> {

        @Override
        public double[] createAccumulator() { return new double[]{0, 0}; }

        @Override
        public double[] add(Order order, double[] acc) {
            acc[0] += order.getAmount();
            acc[1]++;
            return acc;
        }

        @Override
        public CategoryRevenue getResult(double[] acc) {
            return new CategoryRevenue(acc[0], (long) acc[1]);
        }

        @Override
        public double[] merge(double[] a, double[] b) {
            return new double[]{a[0] + b[0], a[1] + b[1]};
        }
    }
}

This is a stripped-down version. A production job adds error handling, metrics emission, dead-letter queues, and retry logic. The final line count is typically 200-400 lines for logic that is conceptually simple: sum revenue by category in 5-minute windows.

The JVM Operations Burden

Beyond the code itself, a Flink cluster in production carries a significant operational surface area. A minimal production deployment requires:

  • A JobManager (two for high availability) running as a JVM process
  • Multiple TaskManagers with tuned heap sizes, managed memory fractions, and network buffer configurations
  • A RocksDB state backend per TaskManager, with its own block cache and write buffer settings
  • Checkpoint coordination with S3 or HDFS, including timeout tuning and minimum pause intervals
  • ZooKeeper or etcd for HA coordination

When a TaskManager fails, all jobs on that node restart simultaneously. This can cascade: state restoration saturates your storage backend, causing other TaskManagers to miss heartbeats, triggering additional restarts. Platform teams call this a "thundering herd" recovery scenario. It is not hypothetical; it is a regular occurrence in large Flink deployments.

JVM garbage collection is a separate category of pain. Flink's TaskManagers create enormous numbers of short-lived objects (event records, serialization buffers, intermediate aggregations). Stop-the-world GC pauses translate directly into checkpoint timeouts and backpressure. Even with modern collectors like ZGC or Shenandoah, you spend real engineering time tuning GC flags to minimize pause frequency.

For a deeper look at the operational cost comparison, see Flink vs RisingWave: Total Cost of Ownership.

The SQL-First Alternative

RisingWave is a streaming database that uses PostgreSQL-compatible SQL as its only interface. Instead of DataStream operators compiled into JARs, you write CREATE MATERIALIZED VIEW statements. Instead of managing RocksDB state backends, state lives on object storage automatically. Instead of deploying a JobManager and TaskManagers, you connect with psql and run SQL.

The architecture is built from scratch in Rust, which matters for two reasons: no JVM means no garbage collection pauses, and compute-storage separation means state scales on S3 without local disk provisioning.

To understand how this changes the developer experience, walk through the same use cases from the Flink code above, rewritten as SQL that runs and produces verified output.

Building the Demo Pipeline

The examples below are verified against a local RisingWave 2.8.0 instance. All queries run exactly as shown.

Step 1: Define the Event Source

First, create a table representing the incoming order event stream. In a production setup this would connect to Kafka using CREATE SOURCE; here we use a regular table to demonstrate the logic without connector configuration:

CREATE TABLE f2sql_orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    category     VARCHAR,
    region       VARCHAR,
    amount       NUMERIC,
    order_time   TIMESTAMP
);

Output:

CREATE_TABLE

In production, replacing this with a Kafka source requires only swapping the DDL command and adding the connection details. The downstream materialized views require no changes because they reference the table name, not its connector.

Step 2: Running Aggregation with a Materialized View

The equivalent of Flink's keyed aggregation with no window is a streaming materialized view. This view continuously updates as new orders arrive:

CREATE MATERIALIZED VIEW f2sql_order_stats_by_region AS
SELECT
    region,
    COUNT(*)                          AS total_orders,
    SUM(amount)                       AS total_revenue,
    AVG(amount)                       AS avg_order_value,
    MAX(amount)                       AS max_order_value
FROM f2sql_orders
GROUP BY region;

Output:

CREATE_MATERIALIZED_VIEW
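
Sample data can be loaded with ordinary INSERT statements. The rows below are illustrative placeholders, not the exact dataset behind the outputs shown in this section:

```sql
-- Illustrative rows only; values are placeholders.
INSERT INTO f2sql_orders VALUES
    (1001, 201, 'electronics', 'us-west', 599.99, '2026-04-01 10:01:00'),
    (1002, 202, 'books',       'us-east',  33.99, '2026-04-01 10:02:30');

-- RisingWave: force a checkpoint so freshly written rows are
-- immediately visible to subsequent queries.
FLUSH;
```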

After inserting sample data, querying the view returns live results with no manual refresh:

SELECT region, total_orders, total_revenue, ROUND(avg_order_value, 2) AS avg_order_value
FROM f2sql_order_stats_by_region
ORDER BY total_revenue DESC;

Output:

   region   | total_orders | total_revenue | avg_order_value
------------+--------------+---------------+-----------------
 us-west    |            4 |       3428.97 |          857.24
 us-east    |            4 |       1449.44 |          362.36
 eu-central |            3 |        539.99 |          180.00
(3 rows)

The view is always up to date. There is no batch job, no scheduled refresh, and no cache invalidation to manage.

Step 3: Windowed Aggregation with TUMBLE

The same 5-minute tumbling window aggregation that required the entire Java job and custom AggregateFunction above becomes a single SQL statement:

CREATE MATERIALIZED VIEW f2sql_order_window_stats AS
SELECT
    category,
    window_start,
    window_end,
    COUNT(*)       AS order_count,
    SUM(amount)    AS window_revenue
FROM TUMBLE(f2sql_orders, order_time, INTERVAL '5 MINUTES')
GROUP BY category, window_start, window_end;

Output:

CREATE_MATERIALIZED_VIEW

Query the results:

SELECT category, window_start, window_end, order_count, window_revenue
FROM f2sql_order_window_stats
ORDER BY window_start, category;

Output:

  category   |    window_start     |     window_end      | order_count | window_revenue
-------------+---------------------+---------------------+-------------+----------------
 books       | 2026-04-01 10:00:00 | 2026-04-01 10:05:00 |           2 |          48.98
 clothing    | 2026-04-01 10:00:00 | 2026-04-01 10:05:00 |           3 |         207.45
 electronics | 2026-04-01 10:00:00 | 2026-04-01 10:05:00 |           3 |        1347.99
 books       | 2026-04-01 10:05:00 | 2026-04-01 10:10:00 |           1 |          14.99
 electronics | 2026-04-01 10:05:00 | 2026-04-01 10:10:00 |           2 |        3798.99
(5 rows)

The window boundaries are computed automatically. RisingWave tracks which window each event belongs to as it arrives, without requiring you to define watermark strategies in code.
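When a real connector does need an explicit bound on out-of-orderness, for example to let windows close promptly, the equivalent of Flink's watermark strategy is a declarative clause in the source DDL rather than code. A sketch mirroring the 10-second bound from the Java example; the topic and broker values are hypothetical placeholders:

```sql
-- Hypothetical Kafka-backed source with a declarative watermark
-- (column list trimmed for brevity).
CREATE SOURCE f2sql_orders_wm (
    order_id   BIGINT,
    category   VARCHAR,
    amount     NUMERIC,
    order_time TIMESTAMP,
    WATERMARK FOR order_time AS order_time - INTERVAL '10' SECOND
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```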

Step 4: Cascading Materialized Views for Multi-Stage Logic

One of the most powerful patterns in RisingWave is cascading materialized views: a view that reads from another view. This lets you build multi-stage pipelines entirely in SQL. In Flink, this requires chaining DataStream operators, which means more Java and another deployment unit.

The following view joins the raw orders stream with the running regional averages computed in the previous view, and identifies orders that are more than twice the regional average. This is a common pattern for anomaly detection or high-value order flagging:

CREATE MATERIALIZED VIEW f2sql_high_value_orders AS
SELECT
    o.order_id,
    o.customer_id,
    o.category,
    o.region,
    o.amount,
    s.avg_order_value                                               AS region_avg,
    ROUND((o.amount / NULLIF(s.avg_order_value, 0) - 1) * 100, 1) AS pct_above_avg
FROM f2sql_orders AS o
JOIN f2sql_order_stats_by_region AS s
  ON o.region = s.region
WHERE o.amount > s.avg_order_value * 2;

Output:

CREATE_MATERIALIZED_VIEW

Query results:

SELECT order_id, customer_id, category, region, amount,
       ROUND(region_avg, 2) AS region_avg, pct_above_avg
FROM f2sql_high_value_orders
ORDER BY pct_above_avg DESC;

Output:

 order_id | customer_id |  category   |   region   | amount  | region_avg | pct_above_avg
----------+-------------+-------------+------------+---------+------------+---------------
     1009 |         209 | electronics | us-east    | 1299.00 |     362.36 |         258.5
     1011 |         211 | electronics | us-west    | 2499.99 |     857.24 |         191.6
     1006 |         206 | electronics | eu-central |  449.00 |     180.00 |         149.4
(3 rows)

Three orders are flagged as anomalies, each more than 149% above their regional average. The view updates incrementally: when a new order arrives, RisingWave recalculates only the affected rows, not the entire result set.

In Flink, this multi-stage logic would require two separate jobs (or a complex side-output chain), each with its own state backend, checkpoint configuration, and deployment lifecycle.

To understand how incremental maintenance works under the hood, see Incremental Materialized Views Explained.

What the Migration Actually Looks Like

Moving a Flink pipeline to RisingWave is not a lift-and-shift of code. It is a conceptual shift from building a processing topology to writing SQL queries against a streaming database.

 Flink Concept               | RisingWave Equivalent
-----------------------------+--------------------------------------
 DataStream API (Java/Scala) | CREATE MATERIALIZED VIEW (SQL)
 Flink SQL                   | PostgreSQL-compatible SQL
 KeyedState / RocksDB        | Hummock (S3-based, automatic)
 Checkpointing to S3         | Built-in barrier-based checkpointing
 CREATE TABLE (source)       | CREATE SOURCE or CREATE TABLE
 CREATE TABLE (sink)         | CREATE SINK
 TUMBLE / HOP TVFs           | TUMBLE() / HOP() table functions
 JobManager + TaskManagers   | Single binary or Kubernetes pods
 Savepoints                  | Automatic S3 snapshots
 Watermark strategies        | WATERMARK FOR clause in source DDL

The Migration Steps

Audit your Flink jobs. List every job running in production. For each one, identify what it reads, what transformations it applies, and what it writes. This audit will surface which jobs are straightforward SQL candidates (aggregations, windowing, joins) and which might need custom logic (MATCH_RECOGNIZE for CEP, low-level ProcessFunction with timers).

Rewrite sources as CREATE SOURCE or CREATE TABLE. Flink uses CREATE TABLE ... WITH (connector = 'kafka', ...) for both sources and sinks. RisingWave separates these. A Kafka source becomes CREATE SOURCE with a FORMAT ... ENCODE ... clause. If you need to query the raw source data as well as use it in materialized views, use CREATE TABLE with the connector option.
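
A sketch of that rewrite for the orders stream; the topic, broker address, and JSON encoding are assumptions about your deployment:

```sql
-- Hypothetical Kafka source; connection details are placeholders.
CREATE SOURCE f2sql_orders_kafka (
    order_id     BIGINT,
    customer_id  BIGINT,
    category     VARCHAR,
    region       VARCHAR,
    amount       NUMERIC,
    order_time   TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```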

Translate DataStream logic to materialized views. Each Flink job becomes one or more CREATE MATERIALIZED VIEW statements. Multi-stage jobs with intermediate topics become cascading views. You eliminate the intermediate Kafka topics entirely in many cases.
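
Where a downstream consumer still needs results on a Kafka topic, the Flink producer from the earlier Java job maps to a CREATE SINK. A sketch reusing the window view from Step 3; the topic, broker, and key columns are assumptions:

```sql
-- Hypothetical sink streaming view changes back to Kafka;
-- connection details and primary_key are placeholders.
CREATE SINK f2sql_revenue_sink FROM f2sql_order_window_stats
WITH (
    connector = 'kafka',
    topic = 'revenue-output',
    properties.bootstrap.server = 'broker:9092',
    primary_key = 'category,window_start'
) FORMAT UPSERT ENCODE JSON;
```

Because the view updates continuously, the sketch uses an upsert format keyed on the group-by columns rather than append-only output.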

Point your serving layer at RisingWave directly. Because RisingWave is a PostgreSQL-compatible database, your existing BI tools, dashboards, and application queries can connect directly without a separate serving database. This removes an entire infrastructure tier that Flink setups typically require.

Validate results. Run both systems in parallel for a period, comparing output. RisingWave's exactly-once semantics match Flink's, so numerical results should be identical for the same input.

For a complete walkthrough with more complex pipeline examples, see Migrating from Apache Flink to RisingWave: A Step-by-Step Guide.

Where Flink Still Has the Edge

This guide is not arguing that Flink is obsolete. Flink retains capabilities that RisingWave does not match:

Complex Event Processing (CEP). Flink's MATCH_RECOGNIZE clause and the DataStream CEP library let you detect patterns across event sequences with full control over timing and state. RisingWave does not support MATCH_RECOGNIZE. If your use case depends on sequence matching across events, Flink remains the right tool.

Custom Java operators. The DataStream API lets you write arbitrary Java operators with access to raw state, timers, and side outputs. RisingWave offers UDFs in Python, Java, JavaScript, and Rust for extending SQL, but these run as external processes, not as native streaming operators.

Batch unification. Flink supports batch and streaming in a single API. RisingWave is a streaming database that also supports historical queries via SQL, but its primary design is incremental streaming computation.

For workloads that need CEP or deeply customized operator logic, evaluate Flink alongside RisingWave. For the majority of analytical streaming workloads (aggregations, windowing, joins, CDC, real-time dashboards), SQL-first streaming removes more complexity than it adds.

The Developer Experience Shift

The clearest way to measure the improvement is to count what disappears from your engineering stack:

  • No JAR build step before testing a streaming logic change
  • No Maven dependency conflict resolution
  • No JVM heap size configuration
  • No RocksDB tuning parameters
  • No checkpoint timeout alerts
  • No dedicated platform engineer to operate the cluster
  • No separate serving database for query results

What replaces all of this is a psql connection and SQL. Data engineers who already know SQL can write and deploy streaming pipelines independently. Iteration cycles go from days to minutes. Schema changes become an ALTER TABLE statement rather than a code change and redeployment.
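
As a small illustration of that last point, adding a field to the order stream is one statement; the column name here is invented for the example:

```sql
ALTER TABLE f2sql_orders ADD COLUMN coupon_code VARCHAR;
```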

For a broader look at the developer experience comparison across multiple dimensions, see Apache Flink vs RisingWave: A Practical Comparison for 2026.

Frequently Asked Questions

Does RisingWave support exactly-once processing like Flink?

Yes. RisingWave uses barrier-based checkpointing with automatic recovery to guarantee exactly-once semantics. State is durably stored in S3-compatible object storage and recovered automatically after any failure. Unlike Flink, where checkpoint interval configuration is a common source of operational issues, RisingWave manages checkpoint barriers internally with a default interval of one second.

What happens to my Kafka topics when I migrate to RisingWave?

Your Kafka topics remain unchanged. RisingWave connects to Kafka as a consumer via CREATE SOURCE, reading events without modifying your existing topic configurations. You can run RisingWave alongside your Flink jobs during a parallel validation period, with both systems reading from the same topics independently.

Can I use my existing dashboards and BI tools with RisingWave?

Yes. RisingWave is PostgreSQL-compatible, which means any tool that supports PostgreSQL works without modification: Grafana, Metabase, Superset, Redash, Tableau, and standard JDBC/ODBC drivers. Your dashboards can query materialized view results directly, removing the need for a separate downstream database tier.

What if some of my Flink jobs use MATCH_RECOGNIZE for complex event processing?

MATCH_RECOGNIZE is not supported in RisingWave. For jobs that depend on CEP pattern matching, keep those workloads on Flink and migrate the rest. A hybrid approach is valid: Flink handles CEP pipelines, RisingWave handles the aggregation and serving workloads. The two systems can coexist in the same data architecture.
