Debezium Oracle CDC: Streaming Oracle Changes in Real Time

The Debezium Oracle connector uses Oracle LogMiner to read the redo log and emit row-level change events for every INSERT, UPDATE, and DELETE. These events flow into Kafka and can be consumed by RisingWave as a continuous SQL-queryable stream, turning your Oracle database into a real-time data source without any application changes.

Why Oracle CDC Is Challenging—and Why It's Worth It

Oracle databases house some of the most critical enterprise data on the planet: financial ledgers, supply chain records, insurance claims. Getting that data out in near real time has historically required expensive Oracle GoldenGate licenses or brittle trigger-based solutions.

Debezium's Oracle connector changes the equation. It reads Oracle's redo log through LogMiner, an Oracle-provided utility, to reconstruct change events. The result is a standards-based, open-source pipeline with no per-row fees and no application-level triggers.

How Debezium Oracle CDC Works

Oracle CDC relies on two foundational Oracle features:

Supplemental logging ensures that the redo log contains enough column data to reconstruct a full before/after row image. Without it, Debezium can only see the changed columns, not the complete row.

LogMiner is an Oracle-provided PL/SQL API (DBMS_LOGMNR) that parses the binary redo log and returns SQL-like change records. Debezium connects to LogMiner, registers the redo log files, and reads V$LOGMNR_CONTENTS to extract change events.

SCN (System Change Number) is Oracle's monotonically increasing transaction ID. Debezium stores the last processed SCN as its offset, enabling precise resumption after a restart.

PDB/CDB support: For Oracle 19c+ multitenant deployments, Debezium can connect to either a pluggable database (PDB) or the container database (CDB) root, depending on the database.pdb.name configuration.
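The mechanics above can be seen by driving LogMiner by hand. The following is a simplified sketch of the session Debezium manages internally; the redo log path and the SCN value are illustrative:

```sql
-- Register a redo log and start LogMiner from a saved SCN
-- (roughly what Debezium does when resuming from its stored offset)
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/oradata/ORCLCDB/redo01.log',
    options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(
    startScn => 3012345,  -- last committed SCN from the offset
    options  => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- Row-level changes at or after the start SCN
SELECT scn, operation, seg_owner, table_name, sql_redo
FROM   v$logmnr_contents
WHERE  seg_owner = 'HR';

EXECUTE DBMS_LOGMNR.END_LOGMNR;
```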

Step-by-Step Tutorial

Step 1: Configure Oracle for CDC

-- Enable supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Enable full supplemental logging for the target table
ALTER TABLE hr.employees ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Create a Debezium user with required privileges
CREATE USER c##dbzuser IDENTIFIED BY dbzpassword
  DEFAULT TABLESPACE users
  QUOTA UNLIMITED ON users
  CONTAINER = ALL;

GRANT CREATE SESSION TO c##dbzuser CONTAINER = ALL;
GRANT SET CONTAINER TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ON V_$DATABASE TO c##dbzuser CONTAINER = ALL;
GRANT FLASHBACK ANY TABLE TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ANY TABLE TO c##dbzuser CONTAINER = ALL;
GRANT SELECT_CATALOG_ROLE TO c##dbzuser CONTAINER = ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##dbzuser CONTAINER = ALL;
GRANT SELECT ANY TRANSACTION TO c##dbzuser CONTAINER = ALL;
GRANT LOGMINING TO c##dbzuser CONTAINER = ALL;

For a non-CDB Oracle instance, the CONTAINER = ALL clauses and c## prefix are not needed.
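Before deploying the connector, it is worth confirming that the logging configuration took effect. These checks run as a DBA:

```sql
-- Database-level supplemental logging: expect 'YES'
SELECT supplemental_log_data_min FROM v$database;

-- Table-level supplemental log groups for the target table
SELECT log_group_name, table_name, log_group_type
FROM   dba_log_groups
WHERE  owner = 'HR' AND table_name = 'EMPLOYEES';
```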

Step 2: Deploy the Debezium Oracle Connector

{
  "name": "oracle-hr-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.hostname": "oracle-host",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbzpassword",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "database.server.name": "oraserver1",
    "table.include.list": "HR.EMPLOYEES,HR.DEPARTMENTS",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.oracle",
    "snapshot.mode": "initial",
    "log.mining.strategy": "online_catalog"
  }
}

log.mining.strategy: online_catalog is the recommended mode for most deployments; it uses the online data dictionary to resolve object names.
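The JSON above is registered with a Kafka Connect worker through its REST API. The hostname and port below are assumptions for a typical deployment:

```shell
# Register the connector (save the JSON above as oracle-hr-connector.json)
curl -X POST http://connect:8083/connectors \
  -H "Content-Type: application/json" \
  -d @oracle-hr-connector.json

# Check connector and task status
curl http://connect:8083/connectors/oracle-hr-connector/status
```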

Step 3: Connect RisingWave to Oracle CDC Stream

-- For a Debezium → Kafka → RisingWave pipeline, FORMAT DEBEZIUM requires a
-- table with a primary key so RisingWave can apply updates and deletes:
CREATE TABLE employees_cdc (
    employee_id   BIGINT,
    first_name    VARCHAR,
    last_name     VARCHAR,
    email         VARCHAR,
    salary        NUMERIC,
    department_id BIGINT,
    hire_date     DATE,
    PRIMARY KEY (employee_id)
) WITH (
    connector = 'kafka',
    topic = 'oraserver1.HR.EMPLOYEES',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
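Each Kafka message carries the Debezium envelope that FORMAT DEBEZIUM unwraps. An abbreviated update event looks roughly like this (fields trimmed for brevity):

```json
{
  "payload": {
    "before": {"EMPLOYEE_ID": 101, "SALARY": 90000},
    "after":  {"EMPLOYEE_ID": 101, "SALARY": 95000},
    "source": {"scn": "3012345", "table": "EMPLOYEES"},
    "op": "u",
    "ts_ms": 1700000000000
  }
}
```

The op field is c for insert, u for update, d for delete, and r for snapshot reads; RisingWave maps each onto an insert, update, or delete against the table's primary key.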

Step 4: Query in Real Time

-- Headcount by department, updated as changes stream in
CREATE MATERIALIZED VIEW headcount_by_dept AS
SELECT
    department_id,
    COUNT(*)         AS headcount,
    AVG(salary)      AS avg_salary,
    MAX(salary)      AS max_salary
FROM employees_cdc
GROUP BY department_id;

-- High-salary hires in the last 7 days; now() in the WHERE clause is a
-- temporal filter, so the window slides forward as time passes
CREATE MATERIALIZED VIEW recent_hires AS
SELECT employee_id, first_name, last_name, salary, hire_date
FROM employees_cdc
WHERE hire_date >= now() - INTERVAL '7 days'
  AND salary > 100000;
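Both materialized views are incrementally maintained and can be queried like ordinary tables; results always reflect the latest streamed changes:

```sql
-- Top departments by average salary, as of the most recent change events
SELECT * FROM headcount_by_dept ORDER BY avg_salary DESC LIMIT 5;
```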

Comparison Table

Aspect                    | Debezium + LogMiner           | Oracle GoldenGate        | Trigger-based CDC
Cost                      | Free (open source)            | Very high license cost   | Free
Latency                   | Sub-second                    | Sub-second               | Sub-second
Impact on Oracle          | Low (LogMiner reads redo)     | Low                      | Medium-high (trigger overhead)
Schema changes            | Automatic with schema history | Requires reconfiguration | Manual
Supported Oracle versions | 11g, 12c, 19c, 21c            | 11g+                     | Any
CDB/PDB support           | Yes                           | Yes                      | Limited

FAQ

Q: Does supplemental logging affect Oracle performance? Enabling supplemental logging increases the volume of redo log data, which adds a small amount of I/O overhead—typically 1–5% depending on update frequency. ALL COLUMNS supplemental logging has a larger footprint than MINIMAL; use it only on tables that need full before-images.

Q: What is the difference between log.mining.strategy: online_catalog and redo_log_catalog? online_catalog uses the live database dictionary to resolve table/column names—fast but can't handle DDL changes that occurred in archived logs. redo_log_catalog embeds the dictionary in the redo log, supporting DDL history replay at the cost of increased log volume.

Q: How does Debezium recover if the SCN falls outside the retained redo logs? Debezium will fail with an error if the stored SCN is older than the earliest available redo log. You'll need to trigger a new snapshot (snapshot.mode: initial after resetting the connector offsets). Configure your archive log retention (for example, the RMAN archived log deletion policy) to cover your longest expected downtime.
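One way to check how much headroom the archived logs provide is to compare the connector's stored offset SCN against the oldest SCN still on disk, for example:

```sql
-- Oldest change still covered by archived redo; if the connector's stored
-- offset SCN is below this value, a fresh snapshot is required
SELECT MIN(first_change#) AS earliest_available_scn
FROM   v$archived_log
WHERE  deleted = 'NO';
```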

Key Takeaways

  • Supplemental logging must be enabled at both the database and table levels before the connector can produce complete change events.
  • SCN offsets give Debezium a reliable position marker for resumption without data loss or duplication.
  • For Oracle 19c+ multitenant setups, configure database.pdb.name to target the correct pluggable database.
  • RisingWave's FORMAT DEBEZIUM ENCODE JSON automatically handles the Oracle Debezium envelope, including before/after payloads for updates and deletes.
  • For smaller Oracle deployments without Kafka, evaluate whether the full Debezium + Kafka stack is warranted; otherwise a direct JDBC pull with deduplication may suffice.
