How to Implement Slowly Changing Dimensions in Streaming

How to Implement Slowly Changing Dimensions in Streaming

How to Implement Slowly Changing Dimensions in Streaming

Slowly Changing Dimensions (SCDs) track how dimensional data (customer attributes, product categories, pricing tiers) changes over time. In streaming, SCDs are handled naturally through CDC — every change is captured as an event, and streaming SQL can maintain both current and historical state.

SCD Types in Streaming

SCD TypeBehaviorStreaming Implementation
Type 1Overwrite (no history)CDC table = always current
Type 2Add new row (keep history)Event stream + window function
Type 3Add columns (limited history)CDC table with previous_value column

Type 1 (Current State Only)

-- CDC table automatically reflects current state
CREATE TABLE customers (...) FROM pg_source TABLE 'public.customers';
-- Always current, no history
SELECT * FROM customers WHERE customer_id = 123;

Type 2 (Full History)

-- Track all changes with effective dates
CREATE MATERIALIZED VIEW customer_history AS
SELECT customer_id, name, plan, updated_at as effective_from,
  LEAD(updated_at) OVER (PARTITION BY customer_id ORDER BY updated_at) as effective_to,
  CASE WHEN LEAD(updated_at) OVER (PARTITION BY customer_id ORDER BY updated_at) IS NULL
    THEN true ELSE false END as is_current
FROM customer_changes;

Frequently Asked Questions

Does streaming handle SCD Type 2 well?

Yes. CDC captures every change as an event. Streaming SQL with window functions (LAG, LEAD) naturally tracks effective dates and current flags — no batch SCD job needed.

How does SCD in streaming compare to dbt snapshots?

dbt snapshots run on a schedule (hourly/daily) and may miss rapid changes. Streaming CDC captures every change immediately, providing complete history without gaps.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.