If you are a PostgreSQL developer who needs real-time data processing, you have probably been told to look at Apache Flink. Flink is the industry standard for stream processing, the default recommendation for anyone who wants to analyze data the moment it arrives. But Flink was built by and for Java engineers, and the gap between what you know and what Flink requires is wider than most tutorials admit.
This guide is for PostgreSQL developers who already know how to write queries, model data, and operate a database. It explains exactly why Flink is a difficult match for your skill set, what a better alternative looks like, and how to build a working stream processing pipeline using nothing but the SQL you already know.
All SQL examples in this post are verified against RisingWave 2.8.0.
What PostgreSQL Developers Already Know
Before looking at what Flink demands, it helps to inventory what you already have.
A PostgreSQL developer can:
- Write CREATE TABLE, INSERT, UPDATE, and DELETE statements from memory
- Model data with foreign keys, indexes, and appropriate types
- Write complex SELECT queries with JOIN, GROUP BY, WINDOW, and subqueries
- Connect to a database with psql, JDBC, psycopg2, or any PostgreSQL-compatible driver
- Operate a database: configure connections, monitor slow queries, manage backups
These skills transfer almost entirely to RisingWave, a streaming database that uses PostgreSQL-compatible SQL as its only interface. They do not transfer to Apache Flink.
Why Flink Is a Poor Match for PostgreSQL Developers
Flink is a distributed stream processing framework, not a database. That distinction matters more than it might appear.
Flink Requires a Different Programming Language
Flink's primary API is Java. A realistic streaming job requires a Maven or Gradle project, explicit type annotations, a StreamExecutionEnvironment, sources and sinks defined in code, and a JAR submission to a running cluster.
Here is what even a simple per-minute aggregation looks like in Flink's DataStream API:
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Set up the execution environment and a Kafka source
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("broker:9092")
    .setTopics("orders")
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();

DataStream<String> stream = env.fromSource(
    source,
    WatermarkStrategy.noWatermarks(),
    "Kafka Source"
);

// Parse JSON, key by region, apply window, aggregate...
// ~80 more lines of Java before you get results
Flink SQL was added later to address exactly this complaint. But Flink SQL is not a standalone tool. You still need a running Flink cluster (JobManager and TaskManagers), a catalog to register connectors, and connector JARs on the classpath. The SQL layer sits on top of the Java infrastructure, not instead of it.
Flink Requires a Different Mental Model
PostgreSQL developers think in terms of tables, rows, indexes, and queries. You write a SELECT statement, it returns a result set, done.
Flink thinks in terms of directed acyclic graphs (DAGs) of streaming operators. A Flink job is a compiled dataflow where each node is a parallel operator and edges are data channels. You reason about:
- Parallelism - How many parallel instances of each operator to run
- Keying - How to partition state across operator instances with keyBy()
- Windowing - How to group unbounded streams into finite time buckets
- Watermarks - How Flink estimates event-time progress across partitions
- State backends - Where operator state lives (heap, RocksDB, remote storage)
- Checkpointing - How Flink periodically snapshots state for fault tolerance
These are not concepts that map to anything in PostgreSQL. You are learning a new discipline, not extending an existing one.
Flink Requires Separate Infrastructure to Operate
Running Flink in production means running a cluster. A minimal production setup includes:
- JobManager - Coordinates job submission, checkpointing, and failure recovery
- TaskManagers - Execute the actual operators; you provision one per parallel slot
- ZooKeeper (for HA) - Provides leader election and high-availability coordination
- State backend - Either in-memory, RocksDB on local disk, or S3 via Flink 2.0's ForSt
- Checkpoint storage - HDFS or S3 for periodic state snapshots
When a TaskManager fails, every job that had operators on that node restarts from the last checkpoint. If your checkpoints are large or infrequent, recovery can take minutes. Checkpoint failures cascade: a failed checkpoint causes a retry, which can saturate storage, which delays the next checkpoint.
Maintaining this infrastructure is a full-time job for a platform engineer. For a PostgreSQL developer who just wants to query streaming data, it is a significant distraction from the actual work.
How RisingWave Transfers Your PostgreSQL Knowledge
RisingWave is a streaming database. It exposes the PostgreSQL wire protocol and accepts standard PostgreSQL-compatible SQL. You connect to it exactly like you connect to PostgreSQL.
psql -h localhost -p 4566 -U root -d dev
That is the entire connection step. No cluster configuration, no connector JARs, no DAG definition.
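Standard introspection works too. If you want to confirm what you are connected to, the usual PostgreSQL function is available (the exact version string varies by release):
SELECT version();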
Tables Work Like PostgreSQL Tables
Create a table for order data the same way you would in PostgreSQL:
CREATE TABLE pguser_orders (
order_id BIGINT,
customer_id BIGINT,
region VARCHAR,
product VARCHAR,
amount NUMERIC,
status VARCHAR,
created_at TIMESTAMPTZ
);
Insert rows the same way:
INSERT INTO pguser_orders VALUES
(1, 101, 'us-east', 'widget-pro', 149.99, 'completed', NOW() - INTERVAL '5 minutes'),
(2, 102, 'us-west', 'widget-basic', 49.99, 'completed', NOW() - INTERVAL '4 minutes'),
(3, 103, 'eu-west', 'widget-pro', 149.99, 'completed', NOW() - INTERVAL '3 minutes'),
(4, 101, 'us-east', 'gadget-x', 299.99, 'pending', NOW() - INTERVAL '2 minutes'),
(5, 104, 'ap-south', 'widget-basic', 49.99, 'completed', NOW() - INTERVAL '1 minute'),
(6, 102, 'us-west', 'gadget-x', 299.99, 'completed', NOW());
Query the table the same way:
SELECT
order_id,
customer_id,
region,
amount,
status
FROM pguser_orders
WHERE status = 'completed'
ORDER BY amount DESC
LIMIT 5;
order_id | customer_id | region | amount | status
----------+-------------+---------+--------+-----------
6 | 102 | us-west | 299.99 | completed
3 | 103 | eu-west | 149.99 | completed
1 | 101 | us-east | 149.99 | completed
2 | 102 | us-west | 49.99 | completed
5 | 104 | ap-south| 49.99 | completed
(5 rows)
Everything you already know works. The difference comes when you want to process data as it arrives.
Materialized Views Replace Streaming Jobs
In PostgreSQL, a materialized view is a snapshot of a query result that you refresh manually with REFRESH MATERIALIZED VIEW. In RisingWave, a materialized view is continuously and incrementally maintained as new data arrives. There is no refresh step - the results are always current.
A materialized view is a precomputed query result that RisingWave updates incrementally as upstream data changes, without recomputing from scratch.
To compute revenue by region in real time:
CREATE MATERIALIZED VIEW pguser_revenue_by_region AS
SELECT
region,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
ROUND(AVG(amount),2) AS avg_order_value
FROM pguser_orders
GROUP BY region;
Query it immediately:
SELECT * FROM pguser_revenue_by_region ORDER BY total_revenue DESC;
region | total_orders | total_revenue | avg_order_value
----------+--------------+---------------+-----------------
us-east | 2 | 449.98 | 224.99
us-west | 2 | 349.98 | 174.99
eu-west | 1 | 149.99 | 149.99
ap-south | 1 | 49.99 | 49.99
(4 rows)
Now insert two more orders:
INSERT INTO pguser_orders VALUES
(7, 105, 'us-east', 'gadget-x', 299.99, 'completed', NOW()),
(8, 106, 'eu-west', 'gadget-x', 299.99, 'completed', NOW());
Query the view again without any refresh:
SELECT * FROM pguser_revenue_by_region ORDER BY total_revenue DESC;
region | total_orders | total_revenue | avg_order_value
----------+--------------+---------------+-----------------
us-east | 3 | 749.97 | 249.99
eu-west | 2 | 449.98 | 224.99
us-west | 2 | 349.98 | 174.99
ap-south | 1 | 49.99 | 49.99
(4 rows)
The us-east row updated from 2 orders / $449.98 to 3 orders / $749.97. The eu-west row updated from 1 order / $149.99 to 2 orders / $449.98. RisingWave applied the new rows to the existing aggregation incrementally - it did not recompute from all 8 rows each time.
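Incremental maintenance also covers updates and deletes, not just inserts. As a quick check - illustrative, and reverted at the end so the later outputs in this post still match - change an order and query the view again:
-- Change an existing order; the us-east aggregate adjusts immediately
UPDATE pguser_orders SET amount = 199.99 WHERE order_id = 1;
SELECT * FROM pguser_revenue_by_region WHERE region = 'us-east';
-- Revert so the remaining examples in this post still match
UPDATE pguser_orders SET amount = 149.99 WHERE order_id = 1;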
For more on how incremental maintenance works under the hood, see Incremental Materialized Views Explained.
Cascading Materialized Views Replace Multi-Stage Pipelines
In Flink, a multi-stage pipeline is a chain of operators: source, filter, keyBy, window, aggregate, sink. Each stage is compiled and linked at job submission time.
In RisingWave, you build multi-stage pipelines by creating materialized views that read from other materialized views. Consider adding a per-product breakdown on top of the regional revenue view:
CREATE MATERIALIZED VIEW pguser_order_stats AS
SELECT
product,
status,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM pguser_orders
GROUP BY product, status;
SELECT * FROM pguser_order_stats ORDER BY product, status;
product | status | order_count | total_amount
--------------+-----------+-------------+--------------
gadget-x | completed | 3 | 899.97
gadget-x | pending | 1 | 299.99
widget-basic | completed | 2 | 99.98
widget-pro | completed | 2 | 299.98
(4 rows)
Then build a summary view on top of that:
CREATE MATERIALIZED VIEW pguser_top_products AS
SELECT
product,
SUM(total_amount) AS revenue
FROM pguser_order_stats
GROUP BY product
ORDER BY revenue DESC;
SELECT * FROM pguser_top_products ORDER BY revenue DESC;
product | revenue
--------------+----------
gadget-x | 1199.96
widget-pro | 299.98
widget-basic | 99.98
(3 rows)
Each materialized view is independently queryable and independently maintained. This is exactly how you would structure a multi-step analytics pipeline in PostgreSQL - staging tables and views - except every layer stays current automatically.
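Because each layer is an ordinary catalog object, you can inspect the pipeline with standard metadata commands:
SHOW MATERIALIZED VIEWS;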
Time-Windowed Aggregations: Streaming-Specific SQL
Stream processing adds one concept that PostgreSQL does not have: time-windowed aggregations over unbounded data. RisingWave extends standard SQL with TUMBLE and HOP window functions. The syntax is close to what you would write for a date_trunc group-by in PostgreSQL, but it handles streaming semantics automatically.
A tumbling window divides a stream into non-overlapping, fixed-size time buckets. To count orders and revenue per minute per region:
CREATE MATERIALIZED VIEW pguser_orders_per_minute AS
SELECT
window_start,
window_end,
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM TUMBLE(pguser_orders, created_at, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, region;
SELECT window_start, window_end, region, order_count, total_revenue
FROM pguser_orders_per_minute
ORDER BY window_start, region;
window_start | window_end | region | order_count | total_revenue
---------------------------+---------------------------+----------+-------------+---------------
2026-04-02 07:22:00+00:00 | 2026-04-02 07:23:00+00:00 | us-east | 1 | 149.99
2026-04-02 07:23:00+00:00 | 2026-04-02 07:24:00+00:00 | us-west | 1 | 49.99
2026-04-02 07:24:00+00:00 | 2026-04-02 07:25:00+00:00 | eu-west | 1 | 149.99
2026-04-02 07:25:00+00:00 | 2026-04-02 07:26:00+00:00 | us-east | 1 | 299.99
2026-04-02 07:26:00+00:00 | 2026-04-02 07:27:00+00:00 | ap-south | 1 | 49.99
2026-04-02 07:27:00+00:00 | 2026-04-02 07:28:00+00:00 | us-west | 1 | 299.99
2026-04-02 07:31:00+00:00 | 2026-04-02 07:32:00+00:00 | eu-west | 1 | 299.99
2026-04-02 07:31:00+00:00 | 2026-04-02 07:32:00+00:00 | us-east | 1 | 299.99
(8 rows)
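A hopping (sliding) window follows the same pattern. As a sketch - in RisingWave's HOP the third argument is the slide and the fourth is the window size - a five-minute window that advances every minute looks like this:
CREATE MATERIALIZED VIEW pguser_orders_sliding AS
SELECT
window_start,
window_end,
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM HOP(pguser_orders, created_at, INTERVAL '1 MINUTE', INTERVAL '5 MINUTE')
GROUP BY window_start, window_end, region;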
Compare the tumbling-window view to the Flink SQL equivalent:
-- Flink SQL - requires a running Flink cluster and catalog
CREATE TABLE flink_orders (
order_id BIGINT,
region STRING,
amount DOUBLE,
created_at TIMESTAMP(3),
WATERMARK FOR created_at AS created_at - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'orders',
'properties.bootstrap.servers' = 'broker:9092',
'format' = 'json'
);
-- Then the aggregation, using the windowing TVF
-- (the TVF itself produces the window_start and window_end columns):
SELECT
window_start,
window_end,
region,
COUNT(*),
SUM(amount)
FROM TABLE(TUMBLE(TABLE flink_orders, DESCRIPTOR(created_at), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end, region;
The RisingWave version is shorter and reads more like standard SQL. More importantly, the RisingWave version runs against a system you already know how to connect to and operate. For a detailed side-by-side comparison of windowing syntax and other SQL features, see Flink SQL vs RisingWave SQL: Syntax and Feature Comparison.
Connecting to Real Streaming Sources
In production, you typically ingest from Kafka rather than direct inserts. In RisingWave, a Kafka source looks like a table with a WITH clause:
-- Connect to a Kafka topic as a streaming source
CREATE TABLE orders_stream (
order_id BIGINT,
customer_id BIGINT,
region VARCHAR,
product VARCHAR,
amount NUMERIC,
status VARCHAR,
created_at TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'orders',
properties.bootstrap.server = 'broker:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
After that, everything is the same SQL you already wrote. You CREATE MATERIALIZED VIEW on top of orders_stream the same way you would on a regular table. The streaming source is an implementation detail; your analytics logic is just SQL.
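For example, a live revenue rollup over the Kafka-fed table reuses the exact view logic from earlier in this post:
-- Same aggregation as before, now fed by Kafka instead of INSERTs
CREATE MATERIALIZED VIEW stream_revenue_by_region AS
SELECT
region,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue
FROM orders_stream
GROUP BY region;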
RisingWave also supports direct CDC ingestion from PostgreSQL using logical replication, without requiring Kafka or Debezium as middleware. If your data starts in PostgreSQL, you can stream changes directly into RisingWave. See PostgreSQL CDC to Streaming SQL: A Complete Tutorial for step-by-step instructions.
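As a sketch of the direct-CDC setup - the hostnames, credentials, and table names below are placeholders, and the linked tutorial covers the PostgreSQL-side prerequisites - you create a shared CDC source, then materialize individual upstream tables from it:
-- Placeholder connection details
CREATE SOURCE pg_cdc_source WITH (
connector = 'postgres-cdc',
hostname = 'your-postgres-host',
port = '5432',
username = 'replication_user',
password = 'your-password',
database.name = 'mydb',
schema.name = 'public'
);
-- CDC tables require a primary key
CREATE TABLE orders_cdc (
order_id BIGINT PRIMARY KEY,
region VARCHAR,
amount NUMERIC
) FROM pg_cdc_source TABLE 'public.orders';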
What You Do Not Need With RisingWave
Here is a concrete list of what Flink requires and RisingWave removes:
| Capability | Apache Flink | RisingWave |
| --- | --- | --- |
| Language | Java / Scala (primary), Python / SQL (secondary) | SQL only |
| Cluster setup | JobManager + TaskManagers + ZooKeeper | Single deployment or managed cloud |
| State backend | RocksDB configuration, checkpoint tuning | Automatic (S3-backed via Hummock) |
| Driver / client | Flink CLI, REST API, or Java code | Any PostgreSQL client (psql, JDBC, psycopg2) |
| Mental model | DAG of streaming operators | Tables and materialized views |
| Serving layer | External database (e.g., PostgreSQL, Redis) | Built-in - query results directly |
| Recovery | Checkpoint restore (can take minutes) | Automatic, sub-second |
| Code deployment | JAR packaging, cluster submission | CREATE MATERIALIZED VIEW statement |
The serving layer difference is particularly important for PostgreSQL developers. In a Flink pipeline, the aggregation results go to a sink - usually an external database like Redis or PostgreSQL. You maintain two systems: Flink for processing and a database for serving. With RisingWave, the processing and serving happen in the same system. You query materialized views directly, exactly like you query tables in PostgreSQL.
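In practice, that means a dashboard or API can hit the streaming result directly, with no intermediate sink:
-- Serve a point query straight from the continuously maintained view
SELECT region, total_orders, total_revenue
FROM pguser_revenue_by_region
WHERE region = 'us-east';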
When Flink Still Makes Sense
RisingWave is not a drop-in replacement for every Flink use case. Flink has genuine advantages in specific scenarios:
- Custom Java operators - If your pipeline requires logic that cannot be expressed in SQL, Flink's DataStream API gives you full programmatic control over state, timers, and side outputs.
- Complex event processing (CEP) - Flink's MATCH_RECOGNIZE clause enables pattern detection across event sequences that has no equivalent in standard SQL.
- Extreme throughput at scale - At millions of events per second with very large state, Flink's mature operator model and checkpointing have a longer track record in production.
- Existing Java/Scala investment - If your team has years of Flink expertise and established tooling, rewriting in SQL has a cost that may not be justified.
For teams whose primary expertise is SQL and database operations, and whose use cases involve aggregations, joins, CDC, and time-windowed analytics, RisingWave covers the majority of real-world stream processing workloads with significantly less operational overhead. Read 5 Reasons Teams Switch from Flink to Streaming SQL for concrete migration stories from teams that made this transition.
Architecture: How RisingWave Handles the Hard Parts
When PostgreSQL developers ask "but how does it stay consistent?" or "what happens when a node fails?", the answer is similar to how PostgreSQL handles these problems - through a well-designed internal architecture, not through user configuration.
flowchart LR
A[psql / JDBC / psycopg2] -->|PostgreSQL protocol| B[RisingWave Frontend]
B --> C[Compute Nodes]
B --> D[Meta Service]
C -->|streaming operators| E[Hummock - LSM-tree]
E -->|object storage| F[S3 / GCS / MinIO]
G[Compactor Nodes] -->|background compaction| E
- Frontend accepts SQL over the PostgreSQL wire protocol and generates query plans
- Compute nodes run streaming operators and serve batch queries
- Meta service coordinates checkpointing, scheduling, and failure recovery
- Hummock is a purpose-built LSM-tree storage engine that writes all state to object storage (S3, GCS, or MinIO)
- Compactor nodes handle background storage compaction without blocking query serving
Because all state lives in object storage, node failures are non-catastrophic. A replacement node starts from the last checkpoint in S3, and recovery happens automatically without operator intervention. There is no RocksDB to tune, no disk capacity to plan, and no checkpoint interval to configure. The defaults work for most workloads.
Getting Started
If you have Homebrew and a PostgreSQL client installed, you can run RisingWave locally in a couple of commands:
brew tap risingwavelabs/risingwave
brew install risingwave
risingwave single-node
Then connect exactly as you would to PostgreSQL:
psql -h localhost -p 4566 -U root -d dev
From that point, every CREATE TABLE, INSERT, SELECT, and CREATE MATERIALIZED VIEW statement in this post runs without modification.
For a managed deployment without any local setup, RisingWave Cloud provides a free tier that includes all features covered in this post.
FAQ
What is stream processing for PostgreSQL users?
Stream processing for PostgreSQL users is real-time data processing using familiar SQL syntax and tools. RisingWave allows PostgreSQL developers to build streaming pipelines with the same SQL knowledge they already have, connecting via the PostgreSQL wire protocol and using CREATE MATERIALIZED VIEW instead of Flink jobs.
Why is Apache Flink difficult for PostgreSQL developers?
Apache Flink requires Java or Scala for its primary API, a separate cluster infrastructure (JobManager, TaskManagers, ZooKeeper), and a mental model based on DAGs of streaming operators - none of which maps to PostgreSQL concepts. Flink SQL helps but still requires the cluster infrastructure underneath.
How does RisingWave's materialized view differ from PostgreSQL's materialized view?
In PostgreSQL, a materialized view stores a snapshot that you refresh manually with REFRESH MATERIALIZED VIEW. In RisingWave, a materialized view is incrementally maintained: as new rows arrive or existing rows change, RisingWave applies the delta to the precomputed result without recomputing from scratch. The view is always current with no refresh step required.
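The difference in two statements (the PostgreSQL-side view name here is hypothetical):
-- PostgreSQL: results go stale until you refresh
REFRESH MATERIALIZED VIEW revenue_by_region;
-- RisingWave: there is no refresh statement; the view is already current
SELECT * FROM pguser_revenue_by_region;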
Can RisingWave connect to the same Kafka topics Flink uses?
Yes. RisingWave connects to Kafka using a CREATE TABLE statement with connector = 'kafka'. You can point it at any Kafka topic your Flink jobs currently consume, without changing producers or topic configuration. RisingWave also supports Kinesis, Pulsar, and direct PostgreSQL CDC as streaming sources.
Ready to try this yourself? Try RisingWave Cloud free, no credit card required. Sign up →
Join our Slack community to ask questions and connect with other stream processing developers.

