
Kafka to Iceberg — Stream, Transform, and Sink with SQL

Stream Kafka topics into Apache Iceberg tables with SQL transformations. Replace Kafka Connect + Spark with a single SQL-based streaming pipeline.

  • 3→1 — Systems Consolidated: Replace Kafka Connect + Spark + Airflow with a single SQL-based streaming pipeline
  • SQL — Replaces Java: Write streaming transforms in SQL instead of the Java DataStream API or Connect configs
  • Exactly-Once — Built-In Guarantee: Coordinated checkpointing ensures no duplicates without complex multi-system configuration
  • Transform — Filter, Join, Aggregate: Clean, enrich, and reshape Kafka data with full SQL before it reaches Iceberg

The Problem

Why is Kafka Connect not enough for production Iceberg pipelines?

Kafka Connect sinks data to Iceberg but cannot transform it. Production pipelines need filtering, joins, aggregations, and schema mapping — which forces teams to bolt on Spark or Flink jobs, creating a fragile multi-tool architecture. Each additional tool adds latency, operational burden, and failure modes.

  • Kafka Connect has no transformation capabilities — it is a pure data mover
  • Adding Spark for transforms creates a batch layer that increases latency to hours
  • Multi-tool pipelines (Connect + Spark + Airflow) triple the operational surface area
  • Schema mismatches between Kafka and Iceberg require custom glue code
  • Exactly-once semantics across multiple tools are extremely difficult to guarantee

The Solution

How does RisingWave simplify the Kafka-to-Iceberg pipeline?

RisingWave replaces the entire Kafka Connect + Spark + Airflow stack with a single SQL statement. You define a source (Kafka topic), write SQL transformations, and create an Iceberg sink — all in one system. RisingWave handles state management, exactly-once delivery, and schema evolution automatically.
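As a sketch, the whole pipeline comes down to three statements — a source, a transform, and a sink. Topic names, column names, and the warehouse path below are placeholders, and S3 credentials are omitted for brevity:

```sql
-- 1. Ingest: declare the Kafka topic as a streaming source
CREATE SOURCE user_events (
    user_id    BIGINT,
    event_type VARCHAR,
    event_ts   TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- 2. Transform: filter and reshape with plain SQL
CREATE MATERIALIZED VIEW clean_events AS
SELECT user_id, event_type, event_ts
FROM user_events
WHERE event_type IS NOT NULL;

-- 3. Sink: continuously write the view to an Iceberg table
CREATE SINK iceberg_events FROM clean_events WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true',
    catalog.type = 'storage',
    warehouse.path = 's3://my-bucket/warehouse',  -- placeholder path
    database.name = 'analytics',
    table.name = 'clean_events'
);
```

Once the sink is created, the pipeline runs continuously — no scheduler or orchestration layer is involved.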

SQL Transforms

Filter, join, aggregate, and reshape Kafka data using standard SQL before it reaches Iceberg
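For instance, a time-windowed aggregation can be expressed as a materialized view over the Kafka source. The source and column names below are assumptions:

```sql
-- Hypothetical example: per-minute event counts, assuming a source
-- named user_events with event_type and event_ts columns
CREATE MATERIALIZED VIEW events_per_minute AS
SELECT
    event_type,
    window_start,
    COUNT(*) AS event_count
FROM TUMBLE(user_events, event_ts, INTERVAL '1 minute')
GROUP BY event_type, window_start;
```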

Schema Evolution

Automatically handle schema changes from Kafka Schema Registry without pipeline restarts
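A minimal sketch of a registry-backed source — the registry URL and topic are placeholders, and no columns are declared because they are resolved from the registered Avro schema:

```sql
CREATE SOURCE orders WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://schema-registry:8081'  -- placeholder URL
);
```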

Partitioning

Leverage Iceberg hidden partitioning with time-based and bucket transforms for optimized queries
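The partition spec itself lives on the Iceberg table. As a sketch, a table with hidden time-based and bucket partitioning might be created with Iceberg DDL (shown here in Spark SQL syntax; names are assumptions) before RisingWave writes into it:

```sql
-- Iceberg DDL (e.g. via Spark SQL), not RisingWave SQL: hidden
-- partitioning on the target table; the sink then writes into it
CREATE TABLE analytics.clean_events (
    user_id    BIGINT,
    event_type STRING,
    event_ts   TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(event_ts), bucket(16, user_id));
```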

Compaction-Friendly

Write data in optimally-sized Parquet files that minimize Iceberg compaction overhead

Architecture

What does a production Kafka-to-Iceberg architecture look like?

A production architecture typically involves Kafka Connect, Flink, or RisingWave to move and transform data. Kafka Connect requires companion tools for any transformation. Flink offers full power but demands Java expertise. RisingWave provides the same capabilities through SQL with significantly lower operational complexity.

Factor           | Kafka Connect            | Apache Flink                   | RisingWave
Language         | Config (JSON)            | Java / Scala                   | SQL
Transforms       | None (SMTs are limited)  | Full (DataStream API)          | Full (SQL joins, aggs, UDFs)
State Management | N/A                      | Manual (RocksDB)               | Automatic
Exactly-Once     | Partial (sink-dependent) | Yes (complex config)           | Yes (built-in)
Ops Complexity   | Low (no transforms)      | High (JVM tuning, checkpoints) | Low (SQL-only)

Frequently Asked Questions

Can RisingWave filter and transform Kafka data before sinking to Iceberg?
Does RisingWave support Kafka Schema Registry?
How does partitioning work for Iceberg sinks?
Can I join multiple Kafka topics before sinking?
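On the last point, a multi-topic join is just a SQL join across two sources. In a sketch, assuming `clicks` and `users` sources already exist with the columns shown:

```sql
-- Hypothetical: enrich clickstream events with profile data from a
-- second topic before sinking to Iceberg
CREATE MATERIALIZED VIEW enriched_clicks AS
SELECT c.user_id, c.page, c.event_ts, u.country
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id;
```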

Ready to stream Kafka to Iceberg?

Replace your multi-tool pipeline with a single SQL statement.

Start Kafka to Iceberg Pipeline