Kafka Stream Processing with SQL: Beyond ksqlDB (2026)

Kafka Stream Processing with SQL: Beyond ksqlDB (2026)

Kafka Stream Processing with SQL: Beyond ksqlDB (2026)

You can process Apache Kafka data with SQL using ksqlDB, Apache Flink SQL, or RisingWave. While ksqlDB is the default choice for Kafka-centric teams, it has significant limitations: non-standard SQL, no data shuffling, tight Kafka coupling, and restrictive licensing. RisingWave offers a more capable alternative — PostgreSQL-compatible SQL, native CDC support beyond Kafka, managed state on S3, and open-source Apache 2.0 licensing.

This guide compares all three SQL-over-Kafka options and shows how to process Kafka streams with standard SQL.

Why SQL Over Kafka?

Apache Kafka is the dominant event streaming platform, but Kafka itself is not a processing engine. It stores and transports events. To transform, aggregate, join, and analyze Kafka data, you need a processing layer.

Traditionally, this meant writing Java applications using Kafka Streams or Apache Flink. SQL-based alternatives emerged because most data teams know SQL, not Java, and SQL dramatically reduces the time from concept to production streaming pipeline.

Three Ways to Process Kafka with SQL

ksqlDB: Built for Kafka, Limited to Kafka

ksqlDB is Confluent's SQL engine built on Kafka Streams. It speaks KSQL (a non-standard SQL dialect) and processes data from Kafka topics.

Strengths:

  • Purpose-built for Kafka — tight integration
  • Pull and push queries
  • Managed option on Confluent Cloud

Limitations:

  • Non-standard SQL. KSQL is not PostgreSQL or ANSI SQL. Standard database tools don't connect.
  • No data shuffling. Tasks run on a single Kafka partition without redistribution. Complex joins and aggregations require manual repartition topics.
  • Kafka-only. All data must flow through Kafka. No direct database CDC.
  • Resource-heavy state. Every state change writes a Kafka changelog topic, consuming several times more resources than alternatives.
  • Scaling constraints. Maximum 40 persistent queries per Confluent Cloud cluster. No dynamic scaling.
  • No UDFs on Cloud. User-defined functions are unavailable on Confluent Cloud.
  • Licensing. Confluent Community License restricts offering ksqlDB as a competing service.

Apache Flink SQL provides a standards-based SQL interface over Flink's distributed stream processing engine.

Strengths:

  • Rich SQL feature set with MATCH_RECOGNIZE for complex event processing
  • Broad connector ecosystem
  • Apache 2.0 open source

Limitations:

  • Requires a Flink cluster. Deploying and managing Flink is a significant operational burden.
  • State management complexity. RocksDB tuning, checkpoint configuration, savepoint management.
  • Fragile upgrades. Changing a SQL query can break savepoint compatibility.
  • No built-in serving. Results must be sunk to an external database for querying.

RisingWave: PostgreSQL-Compatible Streaming Database

RisingWave connects to Kafka as a data source and processes streams using PostgreSQL-compatible SQL.

Strengths:

  • PostgreSQL compatibility. Connect with psql, use any PostgreSQL driver.
  • Beyond Kafka. Also ingests directly from PostgreSQL CDC, MySQL CDC, S3, and other sources.
  • Managed state on S3. No RocksDB tuning. Elastic scaling. Fast recovery.
  • Built-in serving. Query materialized views directly — no downstream database needed.
  • Apache 2.0 license. No restrictions.

Processing Kafka Data with RisingWave

Connect to Kafka

CREATE SOURCE user_events (
  user_id INT,
  event_type VARCHAR,
  page_url VARCHAR,
  event_time TIMESTAMP WITH TIME ZONE
) WITH (
  connector = 'kafka',
  topic = 'user-events',
  properties.bootstrap.server = 'kafka:9092',
  scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Real-Time Aggregations

CREATE MATERIALIZED VIEW events_per_minute AS
SELECT
  window_start,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users
FROM TUMBLE(user_events, event_time, INTERVAL '1 MINUTE')
GROUP BY window_start, event_type;

Stream Joins

-- Join Kafka events with CDC data from PostgreSQL
CREATE MATERIALIZED VIEW enriched_events AS
SELECT
  e.user_id,
  e.event_type,
  e.page_url,
  u.name,
  u.subscription_plan,
  u.country
FROM user_events e
JOIN users u ON e.user_id = u.user_id;

Sink Back to Kafka

CREATE SINK alerts_to_kafka AS
SELECT user_id, event_type, event_time
FROM user_events
WHERE event_type = 'error'
WITH (
  connector = 'kafka',
  topic = 'error-alerts',
  properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

Head-to-Head Comparison

FeatureRisingWaveksqlDBFlink SQL
SQL dialectPostgreSQLKSQL (non-standard)Flink SQL (ANSI-like)
Connect withpsql, any PG driverksqlDB CLI, RESTFlink SQL Client
Beyond KafkaYes (CDC, S3, etc.)No (Kafka only)Yes (many connectors)
State storageS3 (managed)RocksDB + KafkaRocksDB or Heap
Built-in servingYes (PG protocol)Yes (pull queries)No
Complex joinsMulti-stream, efficientManual repartition neededYes (state-heavy)
UDFsPython, Java, RustJava (not on Cloud)Java, Python
Max queriesUnlimited40 per cluster (Cloud)Unlimited
LicenseApache 2.0Confluent CommunityApache 2.0
Cluster managementMinimalMediumHigh

When to Choose What

Choose RisingWave if:

  • You want PostgreSQL-compatible SQL over Kafka
  • You also need CDC from databases — not just Kafka
  • You want built-in serving without a downstream database
  • Open-source licensing matters

Choose ksqlDB if:

  • You are fully committed to Confluent Cloud
  • Your needs are simple: filters, basic aggregations, straightforward joins
  • You want tight Kafka ecosystem integration above all else

Choose Flink SQL if:

  • You need MATCH_RECOGNIZE for complex event processing
  • You have an existing Flink platform team
  • You need the broadest connector ecosystem

Frequently Asked Questions

Can I process Kafka data without Java?

Yes. RisingWave, ksqlDB, and Flink SQL all provide SQL interfaces for processing Kafka data without writing Java code. RisingWave uses PostgreSQL-compatible SQL, ksqlDB uses its own KSQL dialect, and Flink SQL uses an ANSI-like SQL dialect.

Is ksqlDB open source?

ksqlDB uses the Confluent Community License, which is not a true open-source license. It restricts you from offering ksqlDB as a competing SaaS service. RisingWave (Apache 2.0) and Flink SQL (Apache 2.0) are fully open source with no usage restrictions.

Can RisingWave replace ksqlDB?

For most use cases, yes. RisingWave processes Kafka streams with PostgreSQL-compatible SQL, supports complex multi-stream joins without manual repartitioning, offers built-in serving, and has no query limits. The main scenario where ksqlDB is preferable is when you need deep Confluent Cloud integration.

Do I need Kafka to use RisingWave?

No. While RisingWave integrates seamlessly with Kafka, it can also ingest directly from PostgreSQL CDC, MySQL CDC, Pulsar, Kinesis, S3, and other sources. This makes RisingWave more flexible than Kafka-only solutions like ksqlDB.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.