Kafka Has Reached a Turning Point

Kafka Is No Longer What It Was 10 Years Ago

Created at LinkedIn about 13 years ago, Kafka was initially positioned as a distributed, persistent, high-throughput messaging system designed to collect and deliver massive volumes of data with low latency. It was open-sourced in 2011 and later donated to the Apache Software Foundation, evolving into one of the most successful open-source projects in modern software. In 2014, its creators left LinkedIn to found Confluent, which went public in 2021 and has become a leader in data streaming. However, the world has changed drastically since Kafka’s early days, and Kafka itself is navigating a much more complex ecosystem today.

![](https://risingwave9.wpcomstaging.com/wp-content/uploads/2024/09/image-4.png)

Back in 2011, Kafka was primarily used for three things:

  • Collecting log data

  • Feeding log data to online applications

  • Deploying derived data into production environments

Kafka’s log-structured design made perfect sense in this context. But while its core use cases remain similar today, the surrounding infrastructure, applications, and data stacks have evolved rapidly. This evolution has created new challenges and opportunities for Kafka as it faces new demands from modern data architectures.

![](https://risingwave9.wpcomstaging.com/wp-content/uploads/2024/09/image-5.png)

A Few Key Shifts

  1. Diverse latency requirements: The latency expectations for modern systems have become more polarized. While financial services demand microsecond-level latency for stock trading, other use cases—such as logging or syncing data between operational databases and analytical systems—are fine with second-level latency. A one-size-fits-all solution doesn’t work anymore. Why should a company using Kafka for simple logging pay the same costs as one building mission-critical low-latency applications?

  2. Batch systems are building their own ingestion tools: Platforms like Snowflake (with Snowpipe), Amazon Redshift (with zero-ETL integrations), and ClickHouse (which recently acquired PeerDB) now offer built-in streaming data ingestion. These developments reduce the need for Kafka as the go-to system for moving data between environments. Kafka is no longer the only option for feeding data into analytical systems, leading to natural fragmentation in its traditional use cases.

  3. Cloud infrastructure has made storage cheaper: Object storage services like Amazon S3 have become significantly cheaper than the local and block storage attached to compute instances such as EC2. This makes it increasingly hard to justify keeping all data on more expensive storage, especially in a world where companies are constantly optimizing their cloud costs. As a result, Kafka needs to embrace architectures that take advantage of cheaper storage options or risk becoming an overly expensive component in data pipelines.

Given these shifts, Kafka can no longer rely on its original architecture. It needs to evolve and adapt to the demands of the modern data ecosystem to remain relevant in an increasingly cloud-first world.

Kafka Will Inevitably Become 10x Cheaper

One of the most exciting announcements at Current 2024 was Confluent’s acquisition of WarpStream, a Kafka-compatible system that reduces the cost of running Kafka by a staggering 10x. The concept is both simple and elegant: WarpStream uses S3 as the primary storage, eliminating the need for local disks and significantly reducing network transfer fees. By offloading the storage layer to cheaper, cloud-based object stores, WarpStream makes Kafka much more affordable.

The strategic goal behind acquiring WarpStream is clear—absorb a rising competitor before it disrupts the market and forces everyone’s margins lower. By integrating WarpStream’s cost-effective architecture into the broader Kafka ecosystem, Confluent can maintain its leadership position without being undercut by cheaper alternatives.

However, it’s not just WarpStream making Kafka more affordable. Here’s a look at other companies pushing Kafka costs down:

| Company | Description |
| --- | --- |
| Aiven | Provides open-source Kafka hosting with tiered storage, ensuring scalability and flexibility. |
| Buf | Specializes in Protobuf-first Kafka and APIs, enhancing communication and data serialization. |
| Confluent | Delivers managed Kafka services, focusing on cost efficiency and enterprise-level scalability. |
| NATS | A lightweight, secure, and high-performance data layer for cloud-native apps, IoT, and microservices. |
| Redpanda | A cost-effective, Kafka-compatible alternative built for simplicity and high performance. |
| StreamNative | A Kafka-compatible platform designed for scalability, performance, and optimized cost efficiency. |
| WarpStream | Uses S3-based storage to provide a Kafka-compatible system that drastically reduces expenses. |

Running Kafka will inevitably become 10x cheaper, especially for users who only need Kafka for logging or system decoupling. For these users, adopting an architecture where S3 serves as the primary storage layer is a clear win, cutting costs by eliminating the need for expensive local disks.

Kafka Has to Enter the Batch World

Kafka has always been the go-to tool for real-time streaming, but today, it’s clear that Kafka must expand its role to include batch processing. The opportunity is huge, but there’s also a threat—data warehouses and analytical databases are increasingly building their own ingestion tools, which means users don’t necessarily need Kafka for moving data.

| Data Warehouse / Analytical System | Ingestion Tool | Description |
| --- | --- | --- |
| Amazon Redshift | Zero-ETL integration | Simplifies streaming data ingestion directly into Redshift without needing Kafka. |
| ClickHouse | PeerDB | Allows direct ingestion from operational databases, bypassing Kafka. |
| Snowflake | Snowpipe | Enables real-time data ingestion, reducing reliance on Kafka for moving data. |
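
To make that threat concrete, here is a minimal sketch of warehouse-native ingestion using Snowflake’s Snowpipe; the table, stage, and pipe names are hypothetical, and it assumes an external stage already points at the S3 prefix where new files arrive:

```sql
-- Hypothetical names; assumes an external stage (@events_stage) already
-- points at the S3 prefix where upstream systems drop JSON files.
CREATE TABLE raw_events (payload VARIANT);

CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE            -- load new files as S3 event notifications arrive
AS
  COPY INTO raw_events
  FROM @events_stage
  FILE_FORMAT = (TYPE = 'JSON');
```

With a pipe like this, data lands in the warehouse continuously without a Kafka cluster sitting in the middle, which is exactly the pattern eroding Kafka’s role as the default transport layer.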

If Kafka doesn’t evolve to include batch processing, its importance in the data ecosystem may gradually decline.

As data architectures shift toward storing cold data on S3 to save costs, Kafka must position itself to handle both real-time and historical data. This is where Apache Iceberg comes in. Iceberg is an open table format designed for handling large-scale datasets, and it's rapidly gaining popularity as a critical component in modern data lakes. By embracing Iceberg, Kafka can offer users a seamless way to store and query data in both streaming and batch modes.

Rather than relying solely on Kafka for short-term data retention while using a separate data warehouse or lakehouse for historical data, Kafka could serve as the central repository for all data—both streaming and batch. By extending its capabilities to handle long-term storage without the need for retention policies, Kafka could evolve into a true data lake. Iceberg plays a pivotal role in this transformation, offering users benefits such as schema evolution, partitioning, and optimized query performance. As Iceberg adoption grows, organizations could leverage it to query Kafka data in open formats, unlocking new use cases for both streaming and batch workloads.
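
As a rough illustration of what that could unlock, the sketch below uses Trino-style SQL against a hypothetical Iceberg table (lake.events.orders) that Kafka data is assumed to have been materialized into; the catalog, table, and column names are assumptions rather than an existing integration:

```sql
-- Schema evolution: add a column without rewriting existing data files.
ALTER TABLE lake.events.orders ADD COLUMN discount DOUBLE;

-- Query streaming-derived history with ordinary batch SQL,
-- pruning by the (assumed) date partition.
SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
FROM lake.events.orders
WHERE order_date >= DATE '2024-09-01'
GROUP BY order_date;

-- Time travel: read the table as it looked at an earlier point in time.
SELECT COUNT(*)
FROM lake.events.orders
FOR TIMESTAMP AS OF TIMESTAMP '2024-09-01 00:00:00 UTC';
```

The point is not the specific engine but the pattern: once Kafka data sits in an open table format, any Iceberg-aware query engine can treat it as ordinary, long-lived analytical data.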

Without embracing this batch functionality and integrating with formats like Iceberg, Kafka risks being sidelined in favor of systems that can do both batch and streaming in a unified, cost-effective way. Iceberg’s ability to bridge this gap is one of the main reasons for its growing popularity among data engineers and architects.

Kafka Needs a New Query Engine

To fully realize Kafka's potential in both streaming and batch worlds, it needs a query engine that treats streaming data as a first-class citizen. While batch-first query engines like Trino are excellent at running ad-hoc queries on large datasets, they fall short when it comes to continuous queries—a critical requirement for Kafka users. This gap creates an opportunity for the emergence of streaming-first query engines specifically designed for Kafka’s needs.

Several engines have stepped up to fill this gap. RisingWave and Flink are prominent examples of streaming-first platforms that excel at processing real-time, continuously updated data. RisingWave is particularly well-suited for running SQL queries on event streams, making it easier for users to gain insights in real time without dealing with the complexity of traditional stream processing. Similarly, Flink’s distributed stream processing capabilities make it a powerful tool for low-latency, high-throughput applications.
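
For a sense of what this looks like in practice, here is a minimal RisingWave-style sketch, with the topic, broker address, and column names as illustrative assumptions:

```sql
-- Declare a Kafka topic as a streaming source (names are hypothetical).
CREATE SOURCE orders (
    order_id   BIGINT,
    amount     DOUBLE PRECISION,
    created_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- A continuously maintained result: per-minute revenue, updated
-- incrementally as new events arrive on the topic.
CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT window_start, COUNT(*) AS orders, SUM(amount) AS revenue
FROM TUMBLE(orders, created_at, INTERVAL '1 minute')
GROUP BY window_start;
```

Querying the materialized view with an ordinary SELECT then returns the latest windowed results without re-scanning the topic.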

Meanwhile, ksqlDB, built on Kafka Streams, offers a simple SQL-based interface for querying and processing Kafka streams in real time. Its deep integration with Kafka makes it an attractive option for those already invested in the Kafka ecosystem.
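
As a small illustration, here is a minimal ksqlDB-style sketch (topic and column names are assumptions); EMIT CHANGES is what turns the query into a continuous, push-style result rather than a one-off scan:

```sql
-- Expose an existing Kafka topic as a ksqlDB stream (names are hypothetical).
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- A continuous (push) query: results are emitted as new events arrive,
-- rather than computed once over a static snapshot.
SELECT url, COUNT(*) AS views
FROM pageviews
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY url
EMIT CHANGES;
```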

| Technology | Description | Key Features | Use Cases |
| --- | --- | --- | --- |
| Flink | A distributed stream processing framework with support for event-driven applications. | True stream processing (event-by-event), low latency, exactly-once semantics, and fault tolerance. | Real-time data pipelines, low-latency event-driven applications, stateful stream processing. |
| ksqlDB | A streaming SQL engine for Kafka, enabling real-time queries on Kafka streams. | SQL interface for Kafka streams, integrates directly with Kafka, supports real-time and historical queries. | Real-time data processing on Kafka, stream transformations, event-driven microservices. |
| RisingWave | A streaming SQL engine designed to run real-time queries on event streams. | SQL-based, scalable, cloud-native, focuses on simplifying stream processing through SQL. | Real-time analytics, streaming SQL queries, continuous data integration, and feature engineering. |
| Spark Streaming | A micro-batch stream processing engine built on top of Apache Spark. | Integrates with Spark’s batch engine, fault-tolerant, and scalable. Supports real-time and batch processing via micro-batches. | Real-time analytics, ETL pipelines, and machine learning on streaming data. |

These streaming-first engines are critical for making Kafka data instantly queryable—whether the data is recent or from days ago. By unlocking the ability to seamlessly query both real-time and historical datasets, RisingWave, Flink, and ksqlDB are paving the way for Kafka to remain relevant as the lines between streaming and batch processing blur.

Kafka stands at a crossroads. As the data landscape changes, with cloud-native infrastructure, cheaper object storage like S3, and new batch ingestion tools, Kafka must evolve to keep pace. The rise of more cost-efficient solutions, like WarpStream, signals a shift toward cheaper Kafka deployments. Additionally, expanding into the batch world and supporting both real-time and historical data queries will be key to its future relevance.

Critical to this evolution are streaming-first query engines like RisingWave, Flink, and ksqlDB. These technologies are leading the way in making continuous queries and real-time insights a reality, something Kafka needs to fully embrace.

By adapting to these trends—lowering costs, integrating batch processing, and adopting next-gen query engines—Kafka can remain at the heart of modern data infrastructures for years to come.
