PostgreSQL CDC Without Kafka: RisingWave vs Debezium + Kafka Connect
PostgreSQL CDC without Kafka is possible and increasingly practical. RisingWave connects directly to PostgreSQL's logical replication stream using the Debezium Embedded Engine, eliminating the need for Kafka brokers, Kafka Connect workers, and connector configuration. If you only need one downstream consumer of your change data, removing Kafka simplifies the stack significantly.
The Standard Debezium + Kafka Stack
Most PostgreSQL CDC tutorials describe this architecture:
PostgreSQL (WAL/logical replication)
↓
Debezium PostgreSQL Connector (Kafka Connect worker)
↓
Kafka broker (topic: server.schema.table)
↓
Consumer application / stream processor / data warehouse
This stack is mature and battle-tested. It handles high throughput, supports multiple independent consumers, and retains events for replay. It also requires operating:
- One or more Kafka brokers (with ZooKeeper or KRaft)
- Kafka Connect workers
- Connector configuration and management
- Topic retention and compaction policies
- Network connectivity between all components
For many teams, this is appropriate. For teams that need CDC primarily for analytics, materialized views, or real-time queries — with a single downstream consumer — it is significant overhead.
The Full Kafka Setup for PostgreSQL CDC
Here is what the Debezium + Kafka path looks like end to end.
Step 1: Configure PostgreSQL
-- postgresql.conf
-- wal_level = logical (requires restart)
-- In psql
CREATE PUBLICATION debezium_pub FOR TABLE orders, customers, products;
ALTER ROLE debezium_user REPLICATION LOGIN;
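Before deploying anything downstream, it is worth confirming the prerequisites from psql. A quick check, using the names from the statements above:

```sql
-- Should return 'logical' after the restart
SHOW wal_level;

-- The publication should list exactly the captured tables
SELECT * FROM pg_publication_tables WHERE pubname = 'debezium_pub';

-- The connector role needs both REPLICATION and LOGIN
SELECT rolreplication, rolcanlogin FROM pg_roles WHERE rolname = 'debezium_user';
```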
Step 2: Deploy Kafka and Kafka Connect
# docker-compose.yml (simplified)
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
  kafka-connect:
    image: debezium/connect:2.5
    depends_on: [kafka]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_status
Step 3: Register the Debezium Connector
curl -X POST http://kafka-connect:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pg-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "debezium_user",
      "database.password": "secret",
      "database.dbname": "shop",
      "table.include.list": "public.orders,public.customers,public.products",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_slot",
      "publication.name": "debezium_pub",
      "topic.prefix": "shop_server"
    }
  }'

Note that Debezium 2.x uses topic.prefix in place of the older database.server.name property; only the former is needed here.
Step 4: Build a Consumer
With events flowing into Kafka, you now need a consumer. For analytics, this might be a Spark Structured Streaming job, a Flink application, ksqlDB, or a custom application that reads from the topic and writes to a data warehouse.
That consumer is a separate piece of infrastructure to build, deploy, and maintain.
The RisingWave Path (No Kafka)
RisingWave connects to PostgreSQL directly. It uses the Debezium Embedded Engine internally, so it does the same log reading that Debezium would do — without the Kafka layer.
Step 1: Configure PostgreSQL (Same as Before)
ALTER SYSTEM SET wal_level = logical;
-- (requires PostgreSQL restart)
CREATE PUBLICATION risingwave_pub FOR TABLE orders, customers, products;
ALTER ROLE risingwave_user REPLICATION LOGIN;
Step 2: Create a Source in RisingWave
CREATE SOURCE pg_shop WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres',
    port = '5432',
    username = 'risingwave_user',
    password = 'secret',
    database.name = 'shop',
    slot.name = 'risingwave_slot',
    publication.name = 'risingwave_pub'
);
RisingWave creates the replication slot automatically; the initial snapshot of each table begins when you create a table from the source in the next step.
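To verify the slot exists, query PostgreSQL directly (slot name as above):

```sql
-- Run on PostgreSQL; 'active' becomes true once RisingWave is streaming
SELECT slot_name, plugin, active
FROM pg_replication_slots
WHERE slot_name = 'risingwave_slot';
```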
Step 3: Create Tables from the Source
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    total_amt DECIMAL(10, 2),
    status VARCHAR,
    created_at TIMESTAMPTZ
) FROM pg_shop TABLE 'public.orders';

CREATE TABLE customers (
    id BIGINT PRIMARY KEY,
    email VARCHAR,
    region VARCHAR,
    status VARCHAR
) FROM pg_shop TABLE 'public.customers';
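Each table backfills from the snapshot and then stays in sync with the upstream database. A simple sanity check is to compare row counts against PostgreSQL:

```sql
-- Run in RisingWave; should converge to the row count of public.orders upstream
SELECT COUNT(*) AS order_rows FROM orders;
```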
Step 4: Write SQL Queries
-- Continuously maintained aggregation
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT
    c.region,
    SUM(o.total_amt) AS total_revenue,
    COUNT(*) AS order_count,
    COUNT(DISTINCT o.customer_id) AS unique_customers
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.region;

-- Query it like any table
SELECT * FROM revenue_by_region ORDER BY total_revenue DESC;
This view reflects the current state of the PostgreSQL tables. As rows are inserted, updated, or deleted in PostgreSQL, revenue_by_region updates automatically — typically within milliseconds.
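Materialized views can also be layered: one view can read from another, and the whole chain is maintained incrementally. A sketch building on revenue_by_region:

```sql
-- Top region by revenue, updated incrementally as the base view changes
CREATE MATERIALIZED VIEW top_region AS
SELECT region, total_revenue
FROM revenue_by_region
ORDER BY total_revenue DESC
LIMIT 1;
```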
Architecture Comparison
| | Debezium + Kafka | RisingWave (built-in CDC) |
| --- | --- | --- |
| Components to operate | PostgreSQL, Kafka, ZooKeeper/KRaft, Kafka Connect, consumer app | PostgreSQL, RisingWave |
| Config files / APIs | docker-compose, connector JSON, consumer code | SQL |
| Fan-out to N consumers | Yes | No |
| SQL analytics built in | No (need separate processor) | Yes |
| Materialized views | No (need separate processor) | Yes |
| Event replay | Yes (Kafka retention) | No |
| End-to-end latency | 50–500 ms (typical) | Under 100 ms (typical) |
| Operational expertise needed | Kafka administration | SQL |
When Removing Kafka Makes Sense
Removing Kafka is the right call when all of the following are true:
- There is one downstream consumer of the CDC stream. Kafka's fan-out value is zero if only one system reads the topic.
- The destination is a SQL system. RisingWave's entire interface is SQL. If you want to query CDC data, write materialized views, or join multiple tables, SQL is more expressive than managing Kafka consumer code.
- Your team does not already operate Kafka. If Kafka is not already in the stack, standing it up for a single CDC pipeline is significant infrastructure investment.
- Replay requirements are limited. If you don't need to re-consume historical events for new consumers, Kafka's retention model adds no value.
Keeping Kafka makes sense when:
- Multiple downstream systems read from the same change stream.
- Some downstream consumers are not SQL-based (search engines, cache invalidation, notification services).
- Long-term event retention and replay are required.
- Kafka is already in the infrastructure and marginal cost is low.
Managed PostgreSQL Considerations
The CREATE SOURCE syntax above works identically for managed PostgreSQL offerings, with minor prerequisite differences:
Amazon RDS for PostgreSQL:
-- Set in the DB parameter group (static parameter, requires reboot):
--   rds.logical_replication = 1
-- Then, as a user with rds_superuser, grant the replication role:
GRANT rds_replication TO risingwave_user;
Amazon Aurora PostgreSQL:
-- Aurora requires a custom DB cluster parameter group
-- rds.logical_replication = 1 (cluster parameter, requires reboot)
-- wal_level is set automatically when logical replication is enabled
Google Cloud SQL for PostgreSQL:
-- Enable: cloudsql.logical_decoding = on (instance flag)
-- Create publication as normal
Once the source database is configured, the RisingWave CREATE SOURCE statement is identical regardless of provider.
FAQ
Do I need to install anything extra to use RisingWave's PostgreSQL CDC connector? No. The CDC connector is built into RisingWave. No separate installation, plugin, or worker process is required beyond RisingWave itself.
How does RisingWave handle the initial snapshot for large tables? RisingWave (via the Debezium Embedded Engine) performs a consistent snapshot using a repeatable-read transaction. For very large tables, this can take minutes. Changes that occur during the snapshot are retained in the WAL by the replication slot and replayed once the snapshot finishes, so nothing is missed.
What is the WAL retention risk with RisingWave's replication slot?
PostgreSQL retains WAL segments until the replication slot's LSN advances. If RisingWave stops consuming (e.g., the cluster is offline), WAL accumulates on the PostgreSQL host and can fill the disk. Monitor pg_replication_slots.confirmed_flush_lsn and set max_slot_wal_keep_size (PostgreSQL 13+) to cap WAL retention.
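A monitoring query along these lines (run on PostgreSQL) shows how much WAL each slot is holding back, and the cap can be applied without a restart:

```sql
-- WAL retained on behalf of each replication slot
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Example cap (PostgreSQL 13+): slots needing more than 10 GB are invalidated
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
```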
Can I run Debezium Standalone and RisingWave CDC on the same PostgreSQL instance simultaneously? Yes. Each requires its own replication slot. Both will receive the full change stream independently. Monitor total WAL retention pressure, as each slot extends WAL independently.
Does RisingWave support TRUNCATE events from PostgreSQL CDC?
TRUNCATE is handled as a special case. PostgreSQL's logical replication does emit truncate events (in PostgreSQL 11+). RisingWave processes them by truncating the corresponding internal table state.

