If you have ever stood up an Apache Flink cluster just to run a few aggregation jobs on Kafka topics, you know the pain. There is the JVM tuning, the cluster manager, the serialization boilerplate, and the multi-hundred-line Java class that does little more than count orders per minute. For teams whose core expertise is SQL and data modeling rather than distributed Java systems, Flink's power comes with a steep operational tax.
The good news: you do not need Flink or Java to process Kafka streams anymore. A streaming database like RisingWave lets you connect to Kafka, define transformations, and serve real-time results, all with standard PostgreSQL-compatible SQL. No JVM. No cluster manager. No custom serialization code.
In this article, you will see a concrete side-by-side comparison of Flink Java versus RisingWave SQL for a realistic Kafka processing pipeline. You will also walk through verified SQL examples, learn how materialized views replace stateful operators, and understand when this approach makes sense for your team.
The Flink Java Tax: What You Are Really Paying For
Apache Flink is a powerful, battle-tested stream processing framework. It handles complex event processing, exactly-once semantics, and massive scale. But for a large category of Kafka processing workloads, teams pay a disproportionate cost in operational complexity.
Here is what a typical Flink-based Kafka pipeline requires:
- JVM infrastructure - A running Flink cluster (JobManager + TaskManagers), either on Kubernetes, YARN, or standalone.
- Java/Scala codebase - Even a simple aggregation requires a full Maven or Gradle project, dependency management, and a build pipeline.
- Serialization boilerplate - Custom deserializers, POJO classes, and type information for every Kafka topic schema.
- State management configuration - RocksDB tuning, checkpoint intervals, savepoint strategies.
- Deployment pipeline - JAR packaging, cluster submission, rolling upgrades for every logic change.
For teams that primarily work in SQL (analytics engineers, data engineers, BI developers), this stack is a significant barrier. Changing a single aggregation window from 1 minute to 5 minutes means modifying Java code, rebuilding the JAR, and redeploying to the cluster.
Side-by-Side: Flink Java vs. RisingWave SQL
Let us compare the two approaches for a common use case: consuming order events from Kafka and computing revenue per minute by region.
Flink Java Approach
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import com.fasterxml.jackson.databind.ObjectMapper;
public class RevenuePerMinute {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
// 1. Configure Kafka source
KafkaSource<String> source = KafkaSource.<String>builder()
.setBootstrapServers("broker:9092")
.setTopics("order_events")
.setGroupId("revenue-consumer")
.setStartingOffsets(OffsetsInitializer.earliest())
.setValueOnlyDeserializer(new SimpleStringSchema())
.build();
// 2. Ingest and parse JSON
DataStream<String> raw = env.fromSource(
source,
WatermarkStrategy.forMonotonousTimestamps(),
"Kafka Source"
);
// Reuse a single ObjectMapper rather than allocating one per record
ObjectMapper mapper = new ObjectMapper();
DataStream<OrderEvent> orders = raw.map(
json -> mapper.readValue(json, OrderEvent.class));
// 3. Key by region, window, aggregate
orders
.keyBy(OrderEvent::getRegion)
.window(TumblingEventTimeWindows.of(Time.minutes(1)))
.aggregate(new RevenueAggregator()) // custom AggregateFunction
.print();
env.execute("Revenue Per Minute");
}
}
// Plus: OrderEvent POJO, RevenueAggregator class,
// pom.xml with ~10 dependencies, build scripts...
This is roughly 40 lines of application code, and it does not include the OrderEvent POJO class, the RevenueAggregator custom aggregate function, the Maven pom.xml with Flink and Kafka connector dependencies, or the deployment configuration. A realistic project clocks in at 150-300 lines across multiple files.
RisingWave SQL Approach
-- 1. Connect to Kafka and define the schema (one statement)
CREATE TABLE order_events (
order_id VARCHAR,
customer_id INT,
product_id VARCHAR,
quantity INT,
unit_price NUMERIC,
event_time TIMESTAMP,
region VARCHAR
) WITH (
connector = 'kafka',
topic = 'order_events',
properties.bootstrap.server = 'broker:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
-- 2. Compute revenue per minute by region (one statement)
CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT
region,
window_start,
window_end,
SUM(quantity * unit_price) AS total_revenue,
COUNT(*) AS order_count
FROM TUMBLE(order_events, event_time, INTERVAL '1' MINUTE)
GROUP BY region, window_start, window_end;
Two SQL statements. No JVM. No build pipeline. No custom serializer. The CREATE TABLE with the Kafka connector handles ingestion. The CREATE MATERIALIZED VIEW defines the continuous computation. RisingWave incrementally maintains the result as new events arrive from Kafka.
Comparison at a Glance
| Dimension | Flink Java | RisingWave SQL |
| Language | Java/Scala | PostgreSQL-compatible SQL |
| Lines of code | 150-300+ (across files) | 15-20 |
| Kafka connector setup | Maven dependency + config class | WITH (connector='kafka', ...) |
| Schema definition | POJO class + deserializer | Column list in CREATE TABLE |
| Windowed aggregation | keyBy().window().aggregate() | TUMBLE() in a SELECT |
| State management | RocksDB config, checkpoints | Automatic (built-in) |
| Deployment | Build JAR, submit to cluster | Run SQL via psql |
| Change a window size | Edit Java, rebuild, redeploy | ALTER or recreate the view |
| Runtime dependency | JVM + Flink cluster | RisingWave single binary |
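To make the "change a window size" row concrete: switching the earlier example from a 1-minute to a 5-minute tumble window is a drop-and-recreate in SQL, with no rebuild or redeploy. A sketch, reusing the revenue_per_minute view defined above:

```sql
-- Drop the 1-minute version and recreate with a 5-minute window.
DROP MATERIALIZED VIEW revenue_per_minute;

CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT
    region,
    window_start,
    window_end,
    SUM(quantity * unit_price) AS total_revenue,
    COUNT(*) AS order_count
FROM TUMBLE(order_events, event_time, INTERVAL '5' MINUTE)
GROUP BY region, window_start, window_end;
```

RisingWave backfills the new view from the table's existing data, so the recreated view is consistent with history, not just with events that arrive after the change.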
How RisingWave Processes Kafka Data Under the Hood
RisingWave is a streaming database that is wire-compatible with PostgreSQL. You connect to it with psql, DBeaver, or any PostgreSQL driver. But unlike a traditional database, it continuously ingests from streaming sources (Kafka, Pulsar, Kinesis, and others) and incrementally maintains materialized views as new data arrives.
Here is the data flow:
graph LR
A[Kafka Topic] -->|connector='kafka'| B[RisingWave Table]
B --> C[Materialized View 1]
B --> D[Materialized View 2]
C --> E[Query via psql / app]
D --> E
When you create a table with the Kafka connector, RisingWave spawns a background consumer that reads from the specified topic. Each new message is parsed according to the FORMAT and ENCODE you specified (JSON, Avro, Protobuf are all supported) and inserted into the table.
Materialized views defined on top of that table are updated incrementally. RisingWave does not re-scan the entire table on every new event. Instead, it uses incremental computation to propagate only the changes through the dataflow graph. This is the same fundamental approach Flink uses (dataflow processing), but expressed entirely in SQL.
What "Incremental" Means in Practice
Suppose your revenue_per_minute view currently shows us-east with $59.98 for the 10:01 window. When a new order arrives for us-east in that same window, RisingWave does not recompute the entire window from scratch. It adds the new order's revenue to the existing sum. This is identical to how Flink's stateful operators work internally, but you never have to think about ValueState, ReducingState, or checkpoint configuration.
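The idea can be sketched in a few lines of Python. This is a conceptual illustration of incremental maintenance, not RisingWave's actual engine; the class and window size are invented for the example:

```python
# Conceptual sketch: a SUM/COUNT aggregate per (region, window) that is
# updated by deltas. Each new event touches one entry in O(1) instead of
# rescanning all prior events.
from collections import defaultdict

WINDOW_SECONDS = 60

class IncrementalRevenueView:
    def __init__(self):
        # state: (region, window_start) -> [total_revenue, order_count]
        self.state = defaultdict(lambda: [0.0, 0])

    def on_event(self, region, event_ts, quantity, unit_price):
        # Assign the event to its tumbling window, then apply the delta.
        window_start = event_ts - (event_ts % WINDOW_SECONDS)
        agg = self.state[(region, window_start)]
        agg[0] += quantity * unit_price
        agg[1] += 1

view = IncrementalRevenueView()
view.on_event("us-east", 36060, 2, 29.99)  # first order in the window
view.on_event("us-east", 36090, 1, 29.99)  # same window: added to existing sum
total, count = view.state[("us-east", 36060)]
print(round(total, 2), count)  # 89.97 2
```

The second event only adds 29.99 to the stored sum; the first event is never reprocessed. That is the essence of what the dataflow graph does for every materialized view.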
Hands-On: Building a Real-Time Order Analytics Pipeline
Let us build a complete pipeline step by step. All SQL below has been verified on RisingWave 2.8.0.
Step 1: Connect to Kafka and Define Your Schema
CREATE TABLE order_events (
order_id VARCHAR,
customer_id INT,
product_id VARCHAR,
quantity INT,
unit_price NUMERIC,
event_time TIMESTAMP,
region VARCHAR
) WITH (
connector = 'kafka',
topic = 'order_events',
properties.bootstrap.server = 'broker:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
This single statement tells RisingWave to continuously consume from the order_events Kafka topic, parse each message as JSON, and map the fields to the specified columns. If your Kafka cluster uses authentication, you can add properties.sasl.mechanism and properties.security.protocol in the WITH clause. RisingWave supports SASL/SCRAM, SSL/TLS, and AWS IAM authentication for Kafka.
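As a sketch of what an authenticated source looks like, here is the same table with SASL/SCRAM settings added. The broker address and credentials are placeholders; the property names follow RisingWave's Kafka connector options, but check the docs for your version:

```sql
CREATE TABLE order_events_secure (
    order_id VARCHAR,
    customer_id INT,
    product_id VARCHAR,
    quantity INT,
    unit_price NUMERIC,
    event_time TIMESTAMP,
    region VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'order_events',
    properties.bootstrap.server = 'broker:9092',
    -- SASL/SCRAM over TLS; username and password are placeholders
    properties.security.protocol = 'SASL_SSL',
    properties.sasl.mechanism = 'SCRAM-SHA-256',
    properties.sasl.username = 'svc-risingwave',
    properties.sasl.password = 'secret',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```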
Step 2: Create Materialized Views for Real-Time Analytics
With the data flowing in, you can define any number of materialized views. Each one is a continuous query that RisingWave keeps up to date automatically.
Revenue per minute by region:
CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT
region,
window_start,
window_end,
SUM(quantity * unit_price) AS total_revenue,
COUNT(*) AS order_count
FROM TUMBLE(order_events, event_time, INTERVAL '1' MINUTE)
GROUP BY region, window_start, window_end;
Top products by revenue:
CREATE MATERIALIZED VIEW top_products AS
SELECT
product_id,
SUM(quantity) AS total_units_sold,
SUM(quantity * unit_price) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM order_events
GROUP BY product_id;
High-value order alerts (orders over $200):
CREATE MATERIALIZED VIEW high_value_alerts AS
SELECT
order_id,
customer_id,
product_id,
quantity * unit_price AS order_total,
event_time,
region
FROM order_events
WHERE quantity * unit_price > 200;
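A view like this can also push its rows to a downstream system instead of (or in addition to) being queried. A hedged sketch of a Kafka sink, following RisingWave's CREATE SINK syntax; the topic name and broker are placeholders, and force_append_only tells RisingWave to emit only inserts:

```sql
CREATE SINK high_value_alerts_sink
FROM high_value_alerts
WITH (
    connector = 'kafka',
    topic = 'high-value-alerts',
    properties.bootstrap.server = 'broker:9092',
    force_append_only = 'true'
) FORMAT PLAIN ENCODE JSON;
```

Each qualifying order then lands on the high-value-alerts topic as a JSON message, where a notification service can consume it.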
Step 3: Enrich Streams With Dimension Data
One of the most common Kafka processing patterns is enriching events with reference data. In Flink, this typically requires an AsyncDataStream or a BroadcastState pattern, both of which add significant code complexity.
In RisingWave, it is a standard SQL JOIN:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR,
tier VARCHAR
);
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
o.order_id,
c.name AS customer_name,
c.tier AS customer_tier,
o.product_id,
o.quantity * o.unit_price AS order_total,
o.event_time,
o.region
FROM order_events o
JOIN customers c ON o.customer_id = c.customer_id;
The customers table can be populated from a PostgreSQL CDC source, a CSV upload, or manual inserts. The materialized view continuously joins every incoming order event with the customer record. When you query it:
SELECT * FROM enriched_orders ORDER BY event_time;
order_id | customer_name | customer_tier | product_id | order_total | event_time | region
----------+---------------+---------------+------------+-------------+---------------------+----------
ORD-1001 | Alice Chen | gold | SKU-A100 | 59.98 | 2026-04-01 10:01:00 | us-east
ORD-1002 | Bob Martinez | silver | SKU-B200 | 149.00 | 2026-04-01 10:01:30 | eu-west
ORD-1003 | Alice Chen | gold | SKU-A100 | 89.97 | 2026-04-01 10:02:15 | us-east
ORD-1004 | Carol Wang | platinum | SKU-C300 | 499.99 | 2026-04-01 10:02:45 | ap-south
ORD-1005 | Bob Martinez | silver | SKU-A100 | 149.95 | 2026-04-01 10:03:10 | eu-west
ORD-1006 | David Kim | bronze | SKU-B200 | 298.00 | 2026-04-01 10:03:55 | us-east
Step 4: Query Results Like a Regular Database
Every materialized view in RisingWave is queryable with standard SELECT statements. You can use psql, connect a BI tool like Grafana or Metabase, or use any PostgreSQL client library in your application code.
-- Which region/window combinations generated the most revenue?
SELECT * FROM revenue_per_minute
ORDER BY total_revenue DESC
LIMIT 5;
region | window_start | window_end | total_revenue | order_count
----------+---------------------+---------------------+---------------+-------------
ap-south | 2026-04-01 10:02:00 | 2026-04-01 10:03:00 | 499.99 | 1
us-east | 2026-04-01 10:03:00 | 2026-04-01 10:04:00 | 298.00 | 1
eu-west | 2026-04-01 10:03:00 | 2026-04-01 10:04:00 | 149.95 | 1
eu-west | 2026-04-01 10:01:00 | 2026-04-01 10:02:00 | 149.00 | 1
us-east | 2026-04-01 10:02:00 | 2026-04-01 10:03:00 | 89.97 | 1
-- Top products ranked by revenue
SELECT * FROM top_products ORDER BY total_revenue DESC;
product_id | total_units_sold | total_revenue | unique_customers
------------+------------------+---------------+------------------
SKU-C300 | 1 | 499.99 | 1
SKU-B200 | 3 | 447.00 | 2
SKU-A100 | 10 | 299.90 | 2
These results update continuously as new Kafka messages arrive. There is no batch job to schedule, no refresh button to click.
When to Use SQL-Based Kafka Processing (and When Not To)
SQL-based stream processing with RisingWave is an excellent fit for:
- Real-time analytics dashboards - Aggregate Kafka events into materialized views that power Grafana, Metabase, or custom dashboards.
- Event-driven microservices - Use RisingWave's sink connectors to push processed results to downstream Kafka topics, PostgreSQL, or other systems.
- Data enrichment pipelines - Join streaming events with dimension tables (customer records, product catalogs, geo-IP databases).
- Alerting and anomaly detection - Define threshold-based alerts as materialized views, or use CREATE SINK to push filtered events to notification systems.
- Teams with SQL expertise - If your team is primarily data engineers and analysts who work in SQL daily, the learning curve is near zero.
Flink or a custom Java solution may still be the better choice if:
- You need custom ML model inference in the processing pipeline - Flink's DataStream API supports arbitrary Java/Python UDFs. RisingWave supports user-defined functions in Python, Java, and JavaScript, but deeply custom model serving may be more natural in Flink.
- You have complex event processing (CEP) requirements - Flink's CEP library handles multi-event pattern matching natively. RisingWave can express many of these patterns with SQL window functions and self-joins, but Flink's CEP DSL is purpose-built for it.
- You are processing millions of events per second per node - Both systems handle high throughput, but Flink has a longer track record at extreme scale. RisingWave's architecture is optimized for a wide range of streaming workloads, but you should benchmark your specific throughput requirements.
What About ksqlDB?
If you are evaluating SQL alternatives to Flink, ksqlDB likely came up in your research. ksqlDB is Confluent's SQL interface for Kafka Streams. It shares the "SQL over Kafka" concept with RisingWave, but there are important differences:
- Architecture: ksqlDB embeds Kafka Streams under the hood, so it still runs on the JVM and is tightly coupled to the Kafka ecosystem. RisingWave is an independent streaming database with its own storage engine.
- Query capabilities: RisingWave supports full PostgreSQL-compatible SQL, including complex joins, subqueries, and window functions. ksqlDB has a more limited SQL dialect.
- Storage and serving: RisingWave stores materialized view results in its own storage layer, so you can serve low-latency point queries without touching Kafka. ksqlDB materializes state in embedded RocksDB stores backed by Kafka changelog topics.
- Connectors: ksqlDB is Kafka-only. RisingWave supports Kafka, Pulsar, Kinesis, PostgreSQL CDC, MySQL CDC, S3, and more.
- License: ksqlDB uses the Confluent Community License, which has usage restrictions. RisingWave is Apache 2.0 licensed.
For a deeper comparison, see our comprehensive guide to ksqlDB.
How Does RisingWave Handle Kafka Consumer Groups and Offsets?
RisingWave manages Kafka consumer offsets automatically. When you create a table with the Kafka connector, RisingWave creates a consumer group and tracks offsets as part of its internal checkpointing mechanism. If RisingWave restarts, it resumes from the last committed offset, providing exactly-once processing semantics for the data it has ingested.
You do not need to manually configure group.id, manage offset commits, or implement custom checkpoint logic. This is all handled internally, similar to how Flink manages checkpoints, but without requiring you to configure checkpoint intervals, state backends, or savepoint directories.
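The recovery guarantee comes from committing offsets and derived state atomically. A conceptual Python sketch (not RisingWave internals; the class and method names are invented for illustration):

```python
# Conceptual sketch: offset and aggregate state are checkpointed together,
# so a restart rolls both back to the same consistent point. Events after
# the checkpoint are re-read and re-applied exactly once, never dropped
# or double counted.
class CheckpointedConsumer:
    def __init__(self):
        self.committed = {"offset": 0, "total": 0}  # last durable checkpoint
        self.offset = 0
        self.total = 0

    def process(self, values):
        for v in values:
            self.total += v
            self.offset += 1

    def checkpoint(self):
        # Persist offset and state as one atomic unit.
        self.committed = {"offset": self.offset, "total": self.total}

    def recover(self):
        # After a crash, both roll back to the checkpoint together.
        self.offset = self.committed["offset"]
        self.total = self.committed["total"]

c = CheckpointedConsumer()
c.process([10, 20])
c.checkpoint()
c.process([30])   # crash happens before the next checkpoint...
c.recover()       # ...so this event will be re-read from offset 2
print(c.offset, c.total)  # 2 30
```

If offsets were committed separately from state (as with manual Kafka offset management), a crash between the two commits could drop or double count events; checkpointing them together is what makes the semantics exactly-once.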
You can control the starting position with scan.startup.mode:
- earliest - Start from the beginning of the topic
- latest - Start from the most recent offset
- timestamp - Start from a specific timestamp
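For example, to replay from a point in time rather than an offset, pair the timestamp mode with a millisecond timestamp. The option name used here (scan.startup.timestamp.millis) follows RisingWave's Kafka connector docs but may vary by version, and the value is a placeholder:

```sql
CREATE TABLE order_events_replay (
    order_id VARCHAR,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'order_events',
    properties.bootstrap.server = 'broker:9092',
    -- Start consuming at this epoch-milliseconds timestamp (placeholder value)
    scan.startup.mode = 'timestamp',
    scan.startup.timestamp.millis = '1764547200000'
) FORMAT PLAIN ENCODE JSON;
```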
Can RisingWave Replace Flink for All Kafka Processing Use Cases?
RisingWave can replace Flink for a large category of Kafka processing workloads, particularly those that center on SQL-expressible transformations: aggregations, joins, filters, windowed computations, and data enrichment. For many data engineering teams, this covers 80-90% of their Kafka processing needs.
Where Flink retains an advantage is in use cases that require arbitrary imperative code in the processing pipeline (custom ML model inference, complex event processing with Flink CEP, or highly custom stateful operators written in Java). Even in these scenarios, some teams adopt a hybrid approach: use RisingWave for the SQL-heavy parts of the pipeline and Flink for the parts that genuinely need imperative code.
How Do You Get Started Processing Kafka Data With RisingWave?
Getting started takes minutes, not days. Here is the fastest path:
Install RisingWave - A single binary, no JVM required:
# macOS
brew install risingwave

# Or use Docker
docker run -it --pull=always -p 4566:4566 risingwavelabs/risingwave:latest single_node

Connect with psql - Any PostgreSQL client works:

psql -h localhost -p 4566 -U root -d dev

Create a Kafka source - Point RisingWave at your Kafka topic:

CREATE TABLE my_events ( ... ) WITH (connector='kafka', topic='...', properties.bootstrap.server='...') FORMAT PLAIN ENCODE JSON;

Define your processing logic - Create materialized views:

CREATE MATERIALIZED VIEW my_aggregation AS SELECT ... FROM my_events GROUP BY ...;

Query the results - Directly with SQL, or connect Grafana, Metabase, or your application.
For a complete setup guide, see the RisingWave quickstart.
Conclusion
Processing Kafka streams does not require a Flink cluster, a Java codebase, or a team of JVM specialists. For the many workloads that are fundamentally SQL-expressible (aggregations, joins, filters, windowed computations, enrichment), a streaming database like RisingWave offers a dramatically simpler alternative.
Key takeaways:
- Two SQL statements replace hundreds of lines of Java - A CREATE TABLE with the Kafka connector and a CREATE MATERIALIZED VIEW is all you need to go from raw Kafka topic to continuously updated analytics.
- No JVM, no cluster manager - RisingWave runs as a single binary or a simple Docker container. Connect with psql.
- Full PostgreSQL compatibility - Use your existing SQL skills, BI tools, and PostgreSQL client libraries.
- Incremental computation built in - Materialized views update automatically as new Kafka events arrive, with no manual state management.
- Not a toy - RisingWave supports exactly-once semantics, complex joins, window functions, and sinks to downstream systems.
Ready to try this yourself? Get started with RisingWave in 5 minutes. Quickstart ->
Join our Slack community to ask questions and connect with other stream processing developers.

