E-Commerce CDC: Syncing Your Database to Real-Time Analytics

Change Data Capture (CDC) lets you stream every INSERT, UPDATE, and DELETE from your transactional database into a streaming analytics layer in real time — without modifying your application code, without scheduled exports, and without the stale-data problem of periodic ETL jobs.

Why This Matters for E-Commerce

Most e-commerce systems are built on a PostgreSQL or MySQL transactional database. Orders, customers, products, inventory — everything lives there. The analytics team wants to analyze this data, but querying the transactional database directly for analytics risks degrading application performance.

The standard solution is ETL: periodically export data to a data warehouse and run analytics there. But ETL introduces latency. A nightly export means analytics are up to 24 hours stale. Hourly exports narrow the gap but add operational overhead and still leave one.

CDC solves this cleanly. Instead of periodic exports, CDC reads the database's binary log (binlog in MySQL, WAL in PostgreSQL) and streams every change as an event. Your analytics layer processes these events continuously, maintaining an always-current replica.

For e-commerce, this means:

  • Order status changes in your application database appear in analytics within seconds
  • Inventory adjustments made by your ERP system stream into inventory analytics immediately
  • Customer profile updates sync to your CLV and segmentation models in real time
  • Price changes in your product catalog propagate to pricing analytics instantly

No Kafka producer code required — RisingWave reads directly from the database binary log.

How Streaming SQL Solves This

RisingWave is a PostgreSQL-compatible streaming database with built-in CDC connectors for PostgreSQL and MySQL. You configure a source pointing to your transactional database, and RisingWave reads the binary log stream directly. Change events arrive as upserts (INSERTs and UPDATEs) and deletes, which RisingWave applies to its internal state.

On top of the CDC source, you define materialized views exactly as you would for a Kafka-based pipeline. RisingWave handles the complexity of upsert semantics — ensuring that when a row changes in the source database, the downstream aggregations update correctly.

Step-by-Step Tutorial

Step 1: Data Source — PostgreSQL CDC

Connect RisingWave directly to your PostgreSQL transactional database using the postgres-cdc connector. Enable logical replication on the PostgreSQL instance first (wal_level = logical). Each CDC table declares the columns and primary key of the upstream table so RisingWave can apply upserts and deletes correctly.
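
If you manage the PostgreSQL instance yourself, preparation looks roughly like the sketch below; the replicator role, password, and grants are placeholders, and managed database services usually expose wal_level through their own configuration interfaces.

-- Run on the source PostgreSQL instance (sketch; role name and password are placeholders)
ALTER SYSTEM SET wal_level = 'logical';  -- takes effect after a restart
CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON public.orders TO replicator;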

-- CDC table backed by the PostgreSQL orders table
-- (column list mirrors the upstream table; data types here are illustrative)
CREATE TABLE orders_cdc (
    id              INT PRIMARY KEY,
    customer_id     INT,
    product_id      VARCHAR,
    quantity        INT,
    status          VARCHAR,
    channel         VARCHAR,
    total_amount    DECIMAL,
    created_at      TIMESTAMP
) WITH (
    connector       = 'postgres-cdc',
    hostname        = 'prod-db.internal',
    port            = '5432',
    username        = 'replicator',
    password        = 'secret',
    database.name   = 'ecommerce',
    schema.name     = 'public',
    table.name      = 'orders',
    slot.name       = 'risingwave_orders_slot'
);

-- CDC table backed by the MySQL inventory table (binlog streaming)
CREATE TABLE inventory_cdc (
    sku              VARCHAR PRIMARY KEY,
    quantity_on_hand INT,
    reorder_point    INT
) WITH (
    connector       = 'mysql-cdc',
    hostname        = 'mysql-db.internal',
    port            = '3306',
    username        = 'replicator',
    password        = 'secret',
    database.name   = 'warehouse',
    table.name      = 'inventory_ledger',
    server.id       = '1'
);
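
Once both connectors are running, a quick sanity check is to query the CDC tables directly; after the initial snapshot completes, row counts should match the upstream tables.

-- Sanity check: CDC tables are queryable like any other RisingWave table
SELECT COUNT(*) FROM orders_cdc;
SELECT COUNT(*) FROM inventory_cdc;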

Step 2: Core Materialized View — Real-Time Order Analytics from CDC

Build analytics directly on CDC-sourced data. The orders_cdc table reflects the current state of your orders table in the source database, and every change propagates immediately.

-- Real-time order summary built on CDC source
CREATE MATERIALIZED VIEW order_analytics_cdc AS
SELECT
    DATE_TRUNC('hour', created_at)  AS hour_bucket,
    status,
    channel,
    COUNT(id)                       AS order_count,
    SUM(total_amount)               AS revenue,
    AVG(total_amount)               AS avg_order_value,
    COUNT(DISTINCT customer_id)     AS unique_customers
FROM orders_cdc
GROUP BY DATE_TRUNC('hour', created_at), status, channel;

-- Customer order history from CDC (always current)
CREATE MATERIALIZED VIEW customer_order_summary_cdc AS
SELECT
    customer_id,
    COUNT(id)                               AS total_orders,
    SUM(total_amount)                       AS lifetime_value,
    MAX(created_at)                         AS last_order_ts,
    AVG(total_amount)                       AS avg_order_value,
    COUNT(id) FILTER (WHERE status = 'cancelled') AS cancelled_orders
FROM orders_cdc
GROUP BY customer_id;
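
Because RisingWave is PostgreSQL-compatible, these materialized views can be queried from any psql client or BI tool. An illustrative ad-hoc look at the last 24 hours of the hourly rollup:

-- Ad-hoc query against the continuously maintained view
SELECT hour_bucket, status, channel, order_count, revenue
FROM order_analytics_cdc
WHERE hour_bucket >= NOW() - INTERVAL '24 hours'
ORDER BY hour_bucket DESC, revenue DESC;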

Step 3: Cross-Database Joins — Orders with Inventory

One of the most powerful CDC patterns: join change streams from multiple databases to create analytics that span your transactional systems.

-- Join order CDC with inventory CDC for real-time availability analytics
CREATE MATERIALIZED VIEW order_inventory_view AS
SELECT
    o.id              AS order_id,
    o.product_id,
    o.quantity        AS ordered_quantity,
    o.status          AS order_status,
    i.quantity_on_hand,
    i.reorder_point,
    CASE
        -- check stockout first so it is not shadowed by the reorder-point branch
        WHEN i.quantity_on_hand <= 0              THEN 'stockout'
        WHEN i.quantity_on_hand < i.reorder_point THEN 'below_reorder_point'
        ELSE 'healthy'
    END               AS inventory_status,
    o.created_at
FROM orders_cdc o
LEFT JOIN inventory_cdc i ON o.product_id = i.sku;

-- Flag orders where inventory health is concerning
CREATE MATERIALIZED VIEW orders_with_inventory_risk AS
SELECT
    order_id,
    product_id,
    ordered_quantity,
    quantity_on_hand,
    inventory_status,
    created_at
FROM order_inventory_view
WHERE inventory_status IN ('below_reorder_point', 'stockout')
  AND order_status NOT IN ('cancelled', 'refunded');
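
To see how much open demand is currently exposed, grouped by product, an illustrative ad-hoc query (not part of the pipeline) works directly against the view:

-- Which products have the most open orders at risk right now?
SELECT product_id, inventory_status, COUNT(*) AS at_risk_orders
FROM orders_with_inventory_risk
GROUP BY product_id, inventory_status
ORDER BY at_risk_orders DESC;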

Step 4: Serving Layer — Sink Enriched Analytics to a Reporting DB

Push CDC-derived analytics to a reporting database separate from your transactional system — giving analytics teams full query access without impacting application performance.

CREATE SINK order_analytics_sink
FROM order_analytics_cdc
WITH (
    connector   = 'jdbc',
    jdbc.url    = 'jdbc:postgresql://analyticsdb:5432/reporting?user=rw&password=secret',
    table.name  = 'order_analytics_hourly',
    type        = 'upsert',
    primary_key = 'hour_bucket,status,channel'
);

CREATE SINK customer_summary_sink
FROM customer_order_summary_cdc
WITH (
    connector   = 'jdbc',
    jdbc.url    = 'jdbc:postgresql://analyticsdb:5432/reporting?user=rw&password=secret',
    table.name  = 'customer_order_summary_live',
    type        = 'upsert',
    primary_key = 'customer_id'
);
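
The JDBC sink writes into tables that already exist in the reporting database, so create matching targets there first. A sketch of the table behind order_analytics_sink, with types chosen to match the view's output columns:

-- Run on the reporting PostgreSQL database (sketch; types should match the view's columns)
CREATE TABLE order_analytics_hourly (
    hour_bucket       TIMESTAMP,
    status            VARCHAR,
    channel           VARCHAR,
    order_count       BIGINT,
    revenue           NUMERIC,
    avg_order_value   NUMERIC,
    unique_customers  BIGINT,
    PRIMARY KEY (hour_bucket, status, channel)
);

customer_order_summary_live needs a similar table keyed on customer_id.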

Comparison Table

| Approach                      | Latency  | App Impact            | Setup Complexity | Handles Deletes   |
|-------------------------------|----------|-----------------------|------------------|-------------------|
| Nightly ETL export            | 24 hours | Low (off-hours)       | Low              | Soft deletes only |
| Hourly incremental export     | 1 hour   | Low                   | Medium           | Soft deletes only |
| Kafka producer in app code    | Seconds  | Medium (code changes) | High             | Yes (manual)      |
| CDC (RisingWave postgres-cdc) | Seconds  | None                  | Low              | Yes (native)      |

FAQ

Q: Does CDC require changes to my application code? No. CDC reads the database binary log (WAL for PostgreSQL, binlog for MySQL) at the database level. Your application continues writing to the database normally. RisingWave's CDC connector reads the log stream independently without any application-side code changes.

Q: How does RisingWave handle UPDATEs and DELETEs in CDC sources? RisingWave applies upsert semantics for UPDATEs and propagates DELETE events through the materialized view pipeline. Downstream aggregations adjust automatically — if an order row is updated from status = 'pending' to status = 'completed', the materialized view shifts the count from pending to completed incrementally.
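
As a concrete illustration (the order id below is hypothetical), a single UPDATE in the source database is enough to move a row between buckets in the hourly rollup:

-- In the source PostgreSQL database (hypothetical order id)
UPDATE orders SET status = 'completed' WHERE id = 1001;

-- Moments later in RisingWave, the counts have shifted incrementally
SELECT hour_bucket, status, order_count
FROM order_analytics_cdc
ORDER BY hour_bucket DESC
LIMIT 10;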

Q: What's the replication lag for CDC sources? Replication lag depends on network latency between your transactional database and RisingWave. In a co-located deployment, CDC lag is typically under 1 second. In cross-region setups, latency is bounded by network round-trip time. For most e-commerce analytics use cases, sub-10-second lag is achievable.

Key Takeaways

  • CDC lets you stream your existing PostgreSQL or MySQL transactional data into real-time analytics without modifying application code
  • RisingWave's built-in postgres-cdc and mysql-cdc connectors handle binary log reading, upsert semantics, and DELETE propagation natively
  • Cross-database CDC joins enable analytics that span your order system, inventory system, and other transactional databases in a single SQL layer
  • The analytics database receiving CDC-derived results is completely isolated from your transactional database — no query performance impact
  • SHOPLINE's migration from Lambda to Kappa architecture demonstrates the end state: a single streaming SQL layer handling all data transformations, replacing separate batch and streaming paths with one unified pipeline
