If you are a PostgreSQL developer who needs real-time data processing, you have probably been told to look at Apache Flink. Flink is the industry standard for stream processing, the default recommendation for anyone who wants to analyze data the moment it arrives. But Flink was built by and for Java engineers, and the gap between what you know and what Flink requires is wider than most tutorials admit.
This guide is for PostgreSQL developers who already know how to write queries, model data, and operate a database. It explains exactly why Flink is a difficult match for your skill set, what a better alternative looks like, and how to build a working stream processing pipeline using nothing but the SQL you already know.
All SQL examples in this post are verified against RisingWave 2.8.0.
What PostgreSQL Developers Already Know
Before looking at what Flink demands, it helps to inventory what you already have.
A PostgreSQL developer can:
- Write CREATE TABLE, INSERT, UPDATE, and DELETE statements from memory
- Model data with foreign keys, indexes, and appropriate types
- Write complex SELECT queries with JOIN, GROUP BY, WINDOW, and subqueries
- Connect to a database with psql, JDBC, psycopg2, or any PostgreSQL-compatible driver
- Operate a database: configure connections, monitor slow queries, manage backups
These skills transfer almost entirely to RisingWave, a streaming database that uses PostgreSQL-compatible SQL as its only interface. They do not transfer to Apache Flink.
Why Flink Is a Poor Match for PostgreSQL Developers
Flink is a distributed stream processing framework, not a database. That distinction matters more than it might appear.
Flink Requires a Different Programming Language
Flink's primary API is Java. A realistic streaming job requires a Maven or Gradle project, explicit type annotations, a StreamExecutionEnvironment, sources and sinks defined in code, and a JAR submission to a running cluster.
Here is what even a simple per-minute aggregation looks like in Flink's DataStream API:
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Set up the execution environment and a Kafka source
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("broker:9092")
    .setTopics("orders")
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();

DataStream<String> stream = env.fromSource(
    source,
    WatermarkStrategy.noWatermarks(),
    "Kafka Source"
);

// Parse JSON, key by region, apply window, aggregate...
// ~80 more lines of Java before you get results
Flink SQL was added later to address exactly this complaint. But Flink SQL is not a standalone tool. You still need a running Flink cluster (JobManager and TaskManagers), a catalog to register connectors, and connector JARs on the classpath. The SQL layer sits on top of the Java infrastructure, not instead of it.
Flink Requires a Different Mental Model
PostgreSQL developers think in terms of tables, rows, indexes, and queries. You write a SELECT statement, it returns a result set, done.
Flink thinks in terms of directed acyclic graphs (DAGs) of streaming operators. A Flink job is a compiled dataflow where each node is a parallel operator and edges are data channels. You reason about:
- Parallelism - How many parallel instances of each operator to run
- Keying - How to partition state across operator instances with keyBy()
- Windowing - How to group unbounded streams into finite time buckets
- Watermarks - How Flink estimates event-time progress across partitions
- State backends - Where operator state lives (heap, RocksDB, remote storage)
- Checkpointing - How Flink periodically snapshots state for fault tolerance
These are not concepts that map to anything in PostgreSQL. You are learning a new discipline, not extending an existing one.
Flink Requires Separate Infrastructure to Operate
Running Flink in production means running a cluster. A minimal production setup includes:
- JobManager - Coordinates job submission, checkpointing, and failure recovery
- TaskManagers - Execute the actual operators; you provision one per parallel slot
- ZooKeeper (for HA) - Provides leader election and high-availability coordination
- State backend - Either in-memory, RocksDB on local disk, or S3 via Flink 2.0's ForSt
- Checkpoint storage - HDFS or S3 for periodic state snapshots
When a TaskManager fails, every job that had operators on that node restarts from the last checkpoint. If your checkpoints are large or infrequent, recovery can take minutes. Checkpoint failures cascade: a failed checkpoint causes a retry, which can saturate storage, which delays the next checkpoint.
Maintaining this infrastructure is a full-time job for a platform engineer. For a PostgreSQL developer who just wants to query streaming data, it is a significant distraction from the actual work.
How RisingWave Transfers Your PostgreSQL Knowledge
RisingWave is a streaming database. It exposes the PostgreSQL wire protocol and accepts standard PostgreSQL-compatible SQL. You connect to it exactly like you connect to PostgreSQL.
psql -h localhost -p 4566 -U root -d dev
That is the entire connection step. No cluster configuration, no connector JARs, no DAG definition.
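Standard introspection works too. If you want to confirm what you are connected to, the usual PostgreSQL function is available (the exact version string varies by release):
SELECT version();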
Tables Work Like PostgreSQL Tables
Create a table for order data the same way you would in PostgreSQL:
CREATE TABLE pguser_orders (
order_id BIGINT,
customer_id BIGINT,
region VARCHAR,
product VARCHAR,
amount NUMERIC,
status VARCHAR,
created_at TIMESTAMPTZ
);
Insert rows the same way:
INSERT INTO pguser_orders VALUES
(1, 101, 'us-east', 'widget-pro', 149.99, 'completed', NOW() - INTERVAL '5 minutes'),
(2, 102, 'us-west', 'widget-basic', 49.99, 'completed', NOW() - INTERVAL '4 minutes'),
(3, 103, 'eu-west', 'widget-pro', 149.99, 'completed', NOW() - INTERVAL '3 minutes'),
(4, 101, 'us-east', 'gadget-x', 299.99, 'pending', NOW() - INTERVAL '2 minutes'),
(5, 104, 'ap-south', 'widget-basic', 49.99, 'completed', NOW() - INTERVAL '1 minute'),
(6, 102, 'us-west', 'gadget-x', 299.99, 'completed', NOW());
Query the table the same way:
SELECT
order_id,
customer_id,
region,
amount,
status
FROM pguser_orders
WHERE status = 'completed'
ORDER BY amount DESC
LIMIT 5;
order_id | customer_id | region | amount | status
----------+-------------+---------+--------+-----------
6 | 102 | us-west | 299.99 | completed
3 | 103 | eu-west | 149.99 | completed
1 | 101 | us-east | 149.99 | completed
2 | 102 | us-west | 49.99 | completed
5 | 104 | ap-south| 49.99 | completed
(5 rows)
Everything you already know works. The difference comes when you want to process data as it arrives.
Materialized Views Replace Streaming Jobs
In PostgreSQL, a materialized view is a snapshot of a query result that you refresh manually with REFRESH MATERIALIZED VIEW. In RisingWave, a materialized view is continuously and incrementally maintained as new data arrives. There is no refresh step - the results are always current.
A materialized view is a precomputed query result that RisingWave updates incrementally as upstream data changes, without recomputing from scratch.
To compute revenue by region in real time:
CREATE MATERIALIZED VIEW pguser_revenue_by_region AS
SELECT
region,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
ROUND(AVG(amount),2) AS avg_order_value
FROM pguser_orders
GROUP BY region;
Query it immediately:
SELECT * FROM pguser_revenue_by_region ORDER BY total_revenue DESC;
region | total_orders | total_revenue | avg_order_value
----------+--------------+---------------+-----------------
us-east | 2 | 449.98 | 224.99
us-west | 2 | 349.98 | 174.99
eu-west | 1 | 149.99 | 149.99
ap-south | 1 | 49.99 | 49.99
(4 rows)
Now insert two more orders:
INSERT INTO pguser_orders VALUES
(7, 105, 'us-east', 'gadget-x', 299.99, 'completed', NOW()),
(8, 106, 'eu-west', 'gadget-x', 299.99, 'completed', NOW());
Query the view again without any refresh:
SELECT * FROM pguser_revenue_by_region ORDER BY total_revenue DESC;
region | total_orders | total_revenue | avg_order_value
----------+--------------+---------------+-----------------
us-east | 3 | 749.97 | 249.99
eu-west | 2 | 449.98 | 224.99
us-west | 2 | 349.98 | 174.99
ap-south | 1 | 49.99 | 49.99
(4 rows)
The us-east row updated from 2 orders / $449.98 to 3 orders / $749.97. The eu-west row updated from 1 order / $149.99 to 2 orders / $449.98. RisingWave applied the new rows to the existing aggregation incrementally - it did not recompute from all 8 rows each time.
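Incremental maintenance also covers updates and deletes, not just inserts. As a quick check - illustrative, and reverted at the end so the later outputs in this post still match - change an order and query the view again:
-- Change an existing order; the us-east aggregate adjusts immediately
UPDATE pguser_orders SET amount = 199.99 WHERE order_id = 1;
SELECT * FROM pguser_revenue_by_region WHERE region = 'us-east';
-- Revert so the remaining examples in this post still match
UPDATE pguser_orders SET amount = 149.99 WHERE order_id = 1;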
For more on how incremental maintenance works under the hood, see Incremental Materialized Views Explained.
Cascading Materialized Views Replace Multi-Stage Pipelines
In Flink, a multi-stage pipeline is a chain of operators: source, filter, keyBy, window, aggregate, sink. Each stage is compiled and linked at job submission time.
In RisingWave, you build multi-stage pipelines by creating materialized views that read from other materialized views. Consider adding a per-product breakdown on top of the regional revenue view:
CREATE MATERIALIZED VIEW pguser_order_stats AS
SELECT
product,
status,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM pguser_orders
GROUP BY product, status;
SELECT * FROM pguser_order_stats ORDER BY product, status;
product | status | order_count | total_amount
--------------+-----------+-------------+--------------
gadget-x | completed | 3 | 899.97
gadget-x | pending | 1 | 299.99
widget-basic | completed | 2 | 99.98
widget-pro | completed | 2 | 299.98
(4 rows)
Then build a summary view on top of that:
CREATE MATERIALIZED VIEW pguser_top_products AS
SELECT
product,
SUM(total_amount) AS revenue
FROM pguser_order_stats
GROUP BY product
ORDER BY revenue DESC;
SELECT * FROM pguser_top_products ORDER BY revenue DESC;
product | revenue
--------------+----------
gadget-x | 1199.96
widget-pro | 299.98
widget-basic | 99.98
(3 rows)
Each materialized view is independently queryable and independently maintained. This is exactly how you would structure a multi-step analytics pipeline in PostgreSQL - staging tables and views - except every layer stays current automatically.
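Because each layer is an ordinary catalog object, you can inspect the pipeline with standard metadata commands:
SHOW MATERIALIZED VIEWS;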
Time-Windowed Aggregations: Streaming-Specific SQL
Stream processing adds one concept that PostgreSQL does not have: time-windowed aggregations over unbounded data. RisingWave extends standard SQL with TUMBLE and HOP window functions. The syntax is close to what you would write for a date_trunc group-by in PostgreSQL, but it handles streaming semantics automatically.
A tumbling window divides a stream into non-overlapping, fixed-size time buckets. To count orders and revenue per minute per region:
CREATE MATERIALIZED VIEW pguser_orders_per_minute AS
SELECT
window_start,
window_end,
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM TUMBLE(pguser_orders, created_at, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, region;
SELECT window_start, window_end, region, order_count, total_revenue
FROM pguser_orders_per_minute
ORDER BY window_start, region;
window_start | window_end | region | order_count | total_revenue
---------------------------+---------------------------+----------+-------------+---------------
2026-04-02 07:22:00+00:00 | 2026-04-02 07:23:00+00:00 | us-east | 1 | 149.99
2026-04-02 07:23:00+00:00 | 2026-04-02 07:24:00+00:00 | us-west | 1 | 49.99
2026-04-02 07:24:00+00:00 | 2026-04-02 07:25:00+00:00 | eu-west | 1 | 149.99
2026-04-02 07:25:00+00:00 | 2026-04-02 07:26:00+00:00 | us-east | 1 | 299.99
2026-04-02 07:26:00+00:00 | 2026-04-02 07:27:00+00:00 | ap-south | 1 | 49.99
2026-04-02 07:27:00+00:00 | 2026-04-02 07:28:00+00:00 | us-west | 1 | 299.99
2026-04-02 07:31:00+00:00 | 2026-04-02 07:32:00+00:00 | eu-west | 1 | 299.99
2026-04-02 07:31:00+00:00 | 2026-04-02 07:32:00+00:00 | us-east | 1 | 299.99
(8 rows)
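A hopping (sliding) window follows the same pattern. As a sketch - in RisingWave's HOP the third argument is the slide and the fourth is the window size - a five-minute window that advances every minute looks like this:
CREATE MATERIALIZED VIEW pguser_orders_sliding AS
SELECT
window_start,
window_end,
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM HOP(pguser_orders, created_at, INTERVAL '1 MINUTE', INTERVAL '5 MINUTE')
GROUP BY window_start, window_end, region;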
Compare the tumbling-window view to the Flink SQL equivalent:
-- Flink SQL - requires a running Flink cluster and catalog
CREATE TABLE flink_orders (
order_id BIGINT,
region STRING,
amount DOUBLE,
created_at TIMESTAMP(3),
WATERMARK FOR created_at AS created_at - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'orders',
'properties.bootstrap.servers' = 'broker:9092',
'format' = 'json'
);
-- Then the aggregation, using the windowing TVF
-- (the TVF itself produces the window_start and window_end columns):
SELECT
window_start,
window_end,
region,
COUNT(*),
SUM(amount)
FROM TABLE(TUMBLE(TABLE flink_orders, DESCRIPTOR(created_at), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end, region;
The RisingWave version is shorter and reads more like standard SQL. More importantly, the RisingWave version runs against a system you already know how to connect to and operate. For a detailed side-by-side comparison of windowing syntax and other SQL features, see Flink SQL vs RisingWave SQL: Syntax and Feature Comparison.
Connecting to Real Streaming Sources
In production, you typically ingest from Kafka rather than direct inserts. In RisingWave, a Kafka source looks like a table with a WITH clause:
-- Connect to a Kafka topic as a streaming source
CREATE TABLE orders_stream (
order_id BIGINT,
customer_id BIGINT,
region VARCHAR,
product VARCHAR,
amount NUMERIC,
status VARCHAR,
created_at TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'orders',
properties.bootstrap.server = 'broker:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
After that, everything is the same SQL you already wrote. You CREATE MATERIALIZED VIEW on top of orders_stream the same way you would on a regular table. The streaming source is an implementation detail; your analytics logic is just SQL.
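For example, a live revenue rollup over the Kafka-fed table reuses the exact view logic from earlier in this post:
-- Same aggregation as before, now fed by Kafka instead of INSERTs
CREATE MATERIALIZED VIEW stream_revenue_by_region AS
SELECT
region,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue
FROM orders_stream
GROUP BY region;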
RisingWave also supports direct CDC ingestion from PostgreSQL using logical replication, without requiring Kafka or Debezium as middleware. If your data starts in PostgreSQL, you can stream changes directly into RisingWave. See PostgreSQL CDC to Streaming SQL: A Complete Tutorial for step-by-step instructions.
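As a sketch of the direct-CDC setup - the hostnames, credentials, and table names below are placeholders, and the linked tutorial covers the PostgreSQL-side prerequisites - you create a shared CDC source, then materialize individual upstream tables from it:
-- Placeholder connection details
CREATE SOURCE pg_cdc_source WITH (
connector = 'postgres-cdc',
hostname = 'your-postgres-host',
port = '5432',
username = 'replication_user',
password = 'your-password',
database.name = 'mydb',
schema.name = 'public'
);
-- CDC tables require a primary key
CREATE TABLE orders_cdc (
order_id BIGINT PRIMARY KEY,
region VARCHAR,
amount NUMERIC
) FROM pg_cdc_source TABLE 'public.orders';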
What You Do Not Need With RisingWave
Here is a concrete list of what Flink requires and RisingWave removes:
| Capability | Apache Flink | RisingWave |
| --- | --- | --- |
| Language | Java / Scala (primary), Python / SQL (secondary) | SQL only |
| Cluster setup | JobManager + TaskManagers + ZooKeeper | Single deployment or managed cloud |
| State backend | RocksDB configuration, checkpoint tuning | Automatic (S3-backed via Hummock) |
| Driver / client | Flink CLI, REST API, or Java code | Any PostgreSQL client (psql, JDBC, psycopg2) |
| Mental model | DAG of streaming operators | Tables and materialized views |
| Serving layer | External database (e.g., PostgreSQL, Redis) | Built-in - query results directly |
| Recovery | Checkpoint restore (can take minutes) | Automatic, sub-second |
| Code deployment | JAR packaging, cluster submission | CREATE MATERIALIZED VIEW statement |
The serving layer difference is particularly important for PostgreSQL developers. In a Flink pipeline, the aggregation results go to a sink - usually an external database like Redis or PostgreSQL. You maintain two systems: Flink for processing and a database for serving. With RisingWave, the processing and serving happen in the same system. You query materialized views directly, exactly like you query tables in PostgreSQL.
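In practice, that means a dashboard or API can hit the streaming result directly, with no intermediate sink:
-- Serve a point query straight from the continuously maintained view
SELECT region, total_orders, total_revenue
FROM pguser_revenue_by_region
WHERE region = 'us-east';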
When Flink Still Makes Sense
RisingWave is not a drop-in replacement for every Flink use case. Flink has genuine advantages in specific scenarios:
- Custom Java operators - If your pipeline requires logic that cannot be expressed in SQL, Flink's DataStream API gives you full programmatic control over state, timers, and side outputs.
- Complex event processing (CEP) - Flink's MATCH_RECOGNIZE clause enables pattern detection across event sequences that has no equivalent in standard SQL.
- Extreme throughput at scale - At millions of events per second with very large state, Flink's mature operator model and checkpointing have a longer track record in production.
- Existing Java/Scala investment - If your team has years of Flink expertise and established tooling, rewriting in SQL has a cost that may not be justified.
For teams whose primary expertise is SQL and database operations, and whose use cases involve aggregations, joins, CDC, and time-windowed analytics, RisingWave covers the majority of real-world stream processing workloads with significantly less operational overhead. Read 5 Reasons Teams Switch from Flink to Streaming SQL for concrete migration stories from teams that made this transition.
Architecture: How RisingWave Handles the Hard Parts
When PostgreSQL developers ask "but how does it stay consistent?" or "what happens when a node fails?", the answer is similar to how PostgreSQL handles these problems - through a well-designed internal architecture, not through user configuration.
flowchart LR
A[psql / JDBC / psycopg2] -->|PostgreSQL protocol| B[RisingWave Frontend]
B --> C[Compute Nodes]
B --> D[Meta Service]
C -->|streaming operators| E[Hummock - LSM-tree]
E -->|object storage| F[S3 / GCS / MinIO]
G[Compactor Nodes] -->|background compaction| E
- Frontend accepts SQL over the PostgreSQL wire protocol and generates query plans
- Compute nodes run streaming operators and serve batch queries
- Meta service coordinates checkpointing, scheduling, and failure recovery
- Hummock is a purpose-built LSM-tree storage engine that writes all state to object storage (S3, GCS, or MinIO)
- Compactor nodes handle background storage compaction without blocking query serving
Because all state lives in object storage, node failures are non-catastrophic. A replacement node starts from the last checkpoint in S3, and recovery happens automatically without operator intervention. There is no RocksDB to tune, no disk capacity to plan, and no checkpoint interval to configure. The defaults work for most workloads.
Getting Started
If you have Homebrew and a PostgreSQL client installed, you can run RisingWave locally in a couple of commands:
brew tap risingwavelabs/risingwave
brew install risingwave
risingwave single-node
Then connect exactly as you would to PostgreSQL:
psql -h localhost -p 4566 -U root -d dev
From that point, every CREATE TABLE, INSERT, SELECT, and CREATE MATERIALIZED VIEW statement in this post runs without modification.
For a managed deployment without any local setup, RisingWave Cloud provides a free tier that includes all features covered in this post.
FAQ
What is stream processing for PostgreSQL users?
Stream processing for PostgreSQL users is real-time data processing using familiar SQL syntax and tools. RisingWave allows PostgreSQL developers to build streaming pipelines with the same SQL knowledge they already have, connecting via the PostgreSQL wire protocol and using CREATE MATERIALIZED VIEW instead of Flink jobs.
Why is Apache Flink difficult for PostgreSQL developers?
Apache Flink requires Java or Scala for its primary API, a separate cluster infrastructure (JobManager, TaskManagers, ZooKeeper), and a mental model based on DAGs of streaming operators - none of which maps to PostgreSQL concepts. Flink SQL helps but still requires the cluster infrastructure underneath.
How does RisingWave's materialized view differ from PostgreSQL's materialized view?
In PostgreSQL, a materialized view stores a snapshot that you refresh manually with REFRESH MATERIALIZED VIEW. In RisingWave, a materialized view is incrementally maintained: as new rows arrive or existing rows change, RisingWave applies the delta to the precomputed result without recomputing from scratch. The view is always current with no refresh step required.
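The difference in two statements (the PostgreSQL-side view name here is hypothetical):
-- PostgreSQL: results go stale until you refresh
REFRESH MATERIALIZED VIEW revenue_by_region;
-- RisingWave: there is no refresh statement; the view is already current
SELECT * FROM pguser_revenue_by_region;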
Can RisingWave connect to the same Kafka topics Flink uses?
Yes. RisingWave connects to Kafka using a CREATE TABLE statement with connector = 'kafka'. You can point it at any Kafka topic your Flink jobs currently consume, without changing producers or topic configuration. RisingWave also supports Kinesis, Pulsar, and direct PostgreSQL CDC as streaming sources.
Ready to try this yourself? Try RisingWave Cloud free, no credit card required. Sign up →
Join our Slack community to ask questions and connect with other stream processing developers.

