Debezium Kubernetes Deployment: Production-Ready CDC Setup

Running Debezium on Kubernetes with the Strimzi operator gives you a declarative, GitOps-friendly CDC deployment with automatic rolling updates, connector lifecycle management via CRDs, and native Kubernetes health probes—everything needed for a production-grade pipeline.

Why Kubernetes for Debezium?

Running Debezium on bare VMs or Docker Compose works for development, but production CDC pipelines need:

  • High availability: Kafka Connect workers in a distributed cluster; connector tasks automatically rebalanced if a worker fails.
  • Resource guarantees: CPU and memory limits prevent a runaway connector from starving other workloads.
  • Declarative configuration: KafkaConnector CRDs checked into git, reviewed in pull requests, applied via CI/CD.
  • Observability: Kubernetes-native liveness and readiness probes, metrics scraped by Prometheus.

The Strimzi operator brings all of this to Kafka and Kafka Connect on Kubernetes, and it's the recommended way to run Debezium in production on K8s.

Architecture Overview

┌───────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                     │
│  ┌──────────────────┐   ┌──────────────────────────┐  │
│  │ Strimzi Operator │   │ KafkaConnect cluster     │  │
│  │ (watches CRDs)   │──▶│ (3 replicas, Debezium    │  │
│  └──────────────────┘   │  plugin image)           │  │
│                         └────────────┬─────────────┘  │
│  ┌──────────────────────┐            │                │
│  │ KafkaConnector CRDs  │────────────┘                │
│  │ - postgres-orders    │   REST API (port 8083)      │
│  │ - mysql-inventory    │                             │
│  └──────────────────────┘                             │
└───────────────────────────────────────────────────────┘

Step-by-Step Tutorial

Step 1: Install Strimzi and Create a Kafka Cluster

# Install Strimzi operator
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

# Create a minimal Kafka cluster for development/staging
kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: debezium-kafka
  namespace: kafka
spec:
  kafka:
    version: 3.7.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.retention.hours: 168
    storage:
      type: persistent-claim
      size: 100Gi
      class: gp3
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: gp3
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF
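
Strimzi reports readiness on the Kafka custom resource, so you can block until the brokers and entity operator are up before moving on:

# Wait for the cluster to become Ready (first start typically takes a few minutes)
kubectl wait kafka/debezium-kafka --for=condition=Ready --timeout=300s -n kafka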

Step 2: Deploy Kafka Connect with Debezium Plugin

Build a custom Kafka Connect image that bundles the Debezium connector plugin. A minimal Dockerfile might look like the following sketch; the Strimzi base image tag and Debezium artifact version are illustrative:
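
FROM quay.io/strimzi/kafka:0.41.0-kafka-3.7.0
USER root:root
# Unpack the Debezium PostgreSQL connector plugin into Strimzi's plugin path
RUN mkdir -p /opt/kafka/plugins/debezium && \
    curl -fsSL https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.7.0.Final/debezium-connector-postgres-2.7.0.Final-plugin.tar.gz \
    | tar xz -C /opt/kafka/plugins/debezium
USER 1001

Push the image to your registry, then reference it in the KafkaConnect resource: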

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: debezium-connect
  namespace: kafka
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 3.7.0
  replicas: 3
  bootstrapServers: debezium-kafka-kafka-bootstrap:9092
  image: my-registry/debezium-connect:2.7.0
  config:
    group.id: debezium-connect-cluster
    offset.storage.topic: debezium-connect-offsets
    config.storage.topic: debezium-connect-configs
    status.storage.topic: debezium-connect-status
    config.storage.replication.factor: 3
    offset.storage.replication.factor: 3
    status.storage.replication.factor: 3
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: "false"
    value.converter.schemas.enable: "false"
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 2Gi
  # Strimzi manages the probe HTTP endpoints on the Connect REST port (8083);
  # only timing and thresholds are configurable here
  livenessProbe:
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 3
  readinessProbe:
    initialDelaySeconds: 30
    periodSeconds: 15
    timeoutSeconds: 5
    failureThreshold: 3
  jvmOptions:
    "-Xms": "512m"
    "-Xmx": "1536m"
  metricsConfig:
    type: jmxPrometheusExporter
    valueFrom:
      configMapKeyRef:
        name: debezium-connect-metrics
        key: metrics-config.yaml
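
The metricsConfig block expects a ConfigMap that holds a JMX Prometheus Exporter ruleset under the referenced key. A minimal sketch (the Debezium rule is adapted from Debezium's monitoring examples; extend it for worker-level metrics as needed):

apiVersion: v1
kind: ConfigMap
metadata:
  name: debezium-connect-metrics
  namespace: kafka
data:
  metrics-config.yaml: |
    lowercaseOutputName: true
    rules:
      # Debezium snapshot/streaming metrics, e.g. MilliSecondsSinceLastEvent, TotalNumberOfEventsSeen
      - pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^,]+)>([^:]+)"
        name: debezium_metrics_$4
        labels:
          plugin: "$1"
          context: "$2"
          server: "$3"
      # Fallback: expose remaining JMX metrics as-is
      - pattern: ".*"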

Step 3: Deploy a Connector Using KafkaConnector CRD

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: postgres-orders-connector
  namespace: kafka
  labels:
    strimzi.io/cluster: debezium-connect
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  tasksMax: 1
  autoRestart:
    enabled: true
    maxRestarts: 10
  config:
    database.hostname: postgres-service
    database.port: "5432"
    database.user: debezium
    database.password: ${directory:/opt/kafka/external-configuration/db-secret:password}
    database.dbname: shop
    topic.prefix: pgserver1
    table.include.list: public.orders
    plugin.name: pgoutput
    snapshot.mode: initial
    heartbeat.interval.ms: "10000"
    errors.tolerance: all
    errors.deadletterqueue.topic.name: dlq.orders-cdc
    errors.deadletterqueue.context.headers.enable: "true"

Reference secrets safely by mounting them with Strimzi's externalConfiguration and enabling Kafka's DirectoryConfigProvider, both in the KafkaConnect spec; the ${directory:...} placeholder above resolves keys against the mounted directory:

  config:
    # in addition to the worker config shown in Step 2
    config.providers: directory
    config.providers.directory.class: org.apache.kafka.common.config.provider.DirectoryConfigProvider
  externalConfiguration:
    volumes:
      - name: db-secret
        secret:
          secretName: postgres-debezium-credentials
Step 4: Connect RisingWave to Process the Stream

-- For the Debezium → Kafka → RisingWave pipeline. Debezium is a changelog
-- format, so RisingWave ingests it into a table keyed by the upstream primary key:
CREATE TABLE orders_cdc (
    id          BIGINT,
    customer_id BIGINT,
    total       NUMERIC,
    status      VARCHAR,
    updated_at  TIMESTAMPTZ,
    PRIMARY KEY (id)
) WITH (
    connector = 'kafka',
    topic = 'pgserver1.public.orders',
    properties.bootstrap.server = 'debezium-kafka-kafka-bootstrap.kafka.svc.cluster.local:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;

CREATE MATERIALIZED VIEW order_pipeline_health AS
SELECT
    status,
    COUNT(*)        AS order_count,
    MAX(updated_at) AS last_updated
FROM orders_cdc
GROUP BY status;
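
Because now() in RisingWave streaming queries is limited to temporal filters, compute event lag at query time instead. For an end-to-end check, query the view over a port-forward (the risingwave-frontend service name and namespace are assumptions; adjust to your install):

# Forward RisingWave's SQL port locally, then check per-status freshness
kubectl -n risingwave port-forward svc/risingwave-frontend 4566:4566 &
psql -h localhost -p 4566 -d dev -U root -c \
  "SELECT status, order_count,
          EXTRACT(EPOCH FROM (NOW() - last_updated)) AS seconds_since_last_event
   FROM order_pipeline_health
   ORDER BY 3 DESC;"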

Production Checklist

Concern                    Configuration
Connector HA               replicas: 3 in KafkaConnect; tasks rebalanced automatically
Secret management          Strimzi externalConfiguration + Kubernetes Secrets
Resource limits            requests/limits in the KafkaConnect spec
Liveness probe             Strimzi-managed HTTP probe on the Connect REST port (8083); timing tuned via livenessProbe
Readiness probe            Strimzi-managed HTTP probe on port 8083; timing tuned via readinessProbe
Auto-restart on failure    autoRestart.enabled: true in KafkaConnector
Metrics                    JMX Prometheus Exporter via metricsConfig
DLQ for bad events         errors.tolerance: all + DLQ topic
Heartbeat monitoring       heartbeat.interval.ms: 10000
Offset storage             Kafka internal topics with replication factor 3
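
If you run the Prometheus Operator, the exported metrics can be scraped with a PodMonitor targeting Strimzi-managed Connect pods, along the lines of Strimzi's example monitoring manifests:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: debezium-connect-metrics
  namespace: kafka
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaConnect
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus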

FAQ

Q: How does Strimzi manage connector upgrades without downtime? Update the KafkaConnect image reference, and Strimzi performs a rolling restart of Connect workers. Connectors are automatically rebalanced to running workers during the rollout. The autoRestart feature in KafkaConnector ensures any connector that fails to resume after a worker restart is automatically retried.
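
For example, bumping the image tag (the tag here is illustrative) is enough to trigger a rolling restart of the workers:

# Point the Connect cluster at a newer Debezium image; Strimzi rolls workers one at a time
kubectl -n kafka patch kafkaconnect debezium-connect --type merge \
  -p '{"spec":{"image":"my-registry/debezium-connect:2.7.1"}}'

# Watch the rollout
kubectl -n kafka get pods -l strimzi.io/cluster=debezium-connect -w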

Q: What resource limits should I set for Debezium Connect workers? A good starting point for a single connector per worker: requests: {cpu: 500m, memory: 1Gi}, limits: {cpu: 2, memory: 2Gi}. Increase memory for connectors handling large snapshots or many tables. The JVM heap (-Xmx) should be set to roughly 75% of the container memory limit.

Q: Can I run multiple connectors on the same Connect cluster? Yes. Kafka Connect distributes connector tasks across all available workers. Each connector can be independently configured, paused, or restarted via its KafkaConnector CRD without affecting the others. Monitor per-connector task counts and size the worker pool (replicas, CPU, memory) for the aggregate number of tasks it has to run.
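
As a sketch, the mysql-inventory connector from the architecture diagram could run on the same Connect cluster as a second KafkaConnector (hostnames, credentials volume, server id, and table lists below are placeholders):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mysql-inventory-connector
  namespace: kafka
  labels:
    strimzi.io/cluster: debezium-connect
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql-service
    database.port: "3306"
    database.user: debezium
    database.password: ${directory:/opt/kafka/external-configuration/mysql-secret:password}
    database.server.id: "184054"
    topic.prefix: mysqlserver1
    database.include.list: inventory
    table.include.list: inventory.products,inventory.stock
    # The MySQL connector stores schema history in a Kafka topic
    schema.history.internal.kafka.bootstrap.servers: debezium-kafka-kafka-bootstrap:9092
    schema.history.internal.kafka.topic: schema-history.inventory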

Key Takeaways

  • Strimzi's KafkaConnector CRD enables GitOps-style connector management—connector configs live in version control and are applied declaratively.
  • Set explicit CPU and memory requests and limits on Connect workers to prevent noisy-neighbor issues in shared clusters.
  • Use autoRestart.enabled: true to recover from transient connector failures without manual intervention.
  • Strimzi wires HTTP liveness and readiness probes to the Connect REST port (8083); tune their timing in the KafkaConnect spec so Kubernetes restarts unhealthy workers and holds traffic until they are ready.
  • Store database credentials in Kubernetes Secrets and reference them via Strimzi's ExternalConfiguration—never hardcode passwords in KafkaConnector CRDs.
