
Kafka to Iceberg — Stream, Transform, and Sink with SQL

Stream Kafka topics into Apache Iceberg tables with SQL transformations. Replace Kafka Connect + Spark with a single SQL-based streaming pipeline.

  • 3→1 — Systems Consolidated: Replace Kafka Connect + Spark + Airflow with a single SQL-based streaming pipeline
  • SQL — Replaces Java: Write streaming transforms in SQL instead of the Java DataStream API or Connect configs
  • Exactly-Once — Built-In Guarantee: Coordinated checkpointing ensures no duplicates without complex multi-system configuration
  • Transform — Filter, Join, Aggregate: Clean, enrich, and reshape Kafka data with full SQL before it reaches Iceberg

The Problem

Why is Kafka Connect not enough for production Iceberg pipelines?

Kafka Connect sinks data to Iceberg but cannot transform it. Production pipelines need filtering, joins, aggregations, and schema mapping — which forces teams to bolt on Spark or Flink jobs, creating a fragile multi-tool architecture. Each additional tool adds latency, operational burden, and failure modes.

  • Kafka Connect has no transformation capabilities — it is a pure data mover
  • Adding Spark for transforms creates a batch layer that increases latency to hours
  • Multi-tool pipelines (Connect + Spark + Airflow) triple the operational surface area
  • Schema mismatches between Kafka and Iceberg require custom glue code
  • Exactly-once semantics across multiple tools are extremely difficult to guarantee

The Solution

How does RisingWave simplify the Kafka-to-Iceberg pipeline?

RisingWave replaces the entire Kafka Connect + Spark + Airflow stack with a single SQL statement. You define a source (Kafka topic), write SQL transformations, and create an Iceberg sink — all in one system. RisingWave handles state management, exactly-once delivery, and schema evolution automatically.
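As a sketch, the whole pipeline comes down to three statements — a source, a transform, and a sink. Topic names, column names, and the warehouse path below are placeholders, and S3 credentials are omitted for brevity:

```sql
-- 1. Ingest: declare the Kafka topic as a streaming source
CREATE SOURCE user_events (
    user_id    BIGINT,
    event_type VARCHAR,
    event_ts   TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- 2. Transform: filter and reshape with plain SQL
CREATE MATERIALIZED VIEW clean_events AS
SELECT user_id, event_type, event_ts
FROM user_events
WHERE event_type IS NOT NULL;

-- 3. Sink: continuously write the view to an Iceberg table
CREATE SINK iceberg_events FROM clean_events WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true',
    catalog.type = 'storage',
    warehouse.path = 's3://my-bucket/warehouse',  -- placeholder path
    database.name = 'analytics',
    table.name = 'clean_events'
);
```

Once the sink is created, the pipeline runs continuously — no scheduler or orchestration layer is involved.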

SQL Transforms

Filter, join, aggregate, and reshape Kafka data using standard SQL before it reaches Iceberg
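For instance, a time-windowed aggregation can be expressed as a materialized view over the Kafka source. The source and column names below are assumptions:

```sql
-- Hypothetical example: per-minute event counts, assuming a source
-- named user_events with event_type and event_ts columns
CREATE MATERIALIZED VIEW events_per_minute AS
SELECT
    event_type,
    window_start,
    COUNT(*) AS event_count
FROM TUMBLE(user_events, event_ts, INTERVAL '1 minute')
GROUP BY event_type, window_start;
```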

Schema Evolution

Automatically handle schema changes from Kafka Schema Registry without pipeline restarts
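A minimal sketch of a registry-backed source — the registry URL and topic are placeholders, and no columns are declared because they are resolved from the registered Avro schema:

```sql
CREATE SOURCE orders WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://schema-registry:8081'  -- placeholder URL
);
```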

Partitioning

Leverage Iceberg hidden partitioning with time-based and bucket transforms for optimized queries
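The partition spec itself lives on the Iceberg table. As a sketch, a table with hidden time-based and bucket partitioning might be created with Iceberg DDL (shown here in Spark SQL syntax; names are assumptions) before RisingWave writes into it:

```sql
-- Iceberg DDL (e.g. via Spark SQL), not RisingWave SQL: hidden
-- partitioning on the target table; the sink then writes into it
CREATE TABLE analytics.clean_events (
    user_id    BIGINT,
    event_type STRING,
    event_ts   TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(event_ts), bucket(16, user_id));
```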

Compaction-Friendly

Write data in optimally-sized Parquet files that minimize Iceberg compaction overhead

Architecture

What does a production Kafka-to-Iceberg architecture look like?

A production architecture typically involves Kafka Connect, Flink, or RisingWave to move and transform data. Kafka Connect requires companion tools for any transformation. Flink offers full power but demands Java expertise. RisingWave provides the same capabilities through SQL with significantly lower operational complexity.

Factor           | Kafka Connect            | Apache Flink                   | RisingWave
Language         | Config (JSON)            | Java / Scala                   | SQL
Transforms       | None (SMTs are limited)  | Full (DataStream API)          | Full (SQL joins, aggs, UDFs)
State Management | N/A                      | Manual (RocksDB)               | Automatic
Exactly-Once     | Partial (sink-dependent) | Yes (complex config)           | Yes (built-in)
Ops Complexity   | Low (no transforms)      | High (JVM tuning, checkpoints) | Low (SQL-only)

Frequently Asked Questions

Can RisingWave filter and transform Kafka data before sinking to Iceberg?
Does RisingWave support Kafka Schema Registry?
How does partitioning work for Iceberg sinks?
Can I join multiple Kafka topics before sinking?
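On the last point, a multi-topic join is just a SQL join across two sources. In a sketch, assuming `clicks` and `users` sources already exist with the columns shown:

```sql
-- Hypothetical: enrich clickstream events with profile data from a
-- second topic before sinking to Iceberg
CREATE MATERIALIZED VIEW enriched_clicks AS
SELECT c.user_id, c.page, c.event_ts, u.country
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id;
```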

Ready to stream Kafka to Iceberg?

Replace your multi-tool pipeline with a single SQL statement.

Start Kafka to Iceberg Pipeline