Change Data Capture with Debezium — Real-Time Data Sync for Microservices

Posted on: 4/18/2026 12:10:44 PM

  • < 1s — average latency for log-based CDC
  • 200+ — community-supported connectors
  • 0% — overhead on the source DB (log-based)
  • 100K+ — GitHub stars across the Debezium ecosystem

1. What Is Change Data Capture and Why It Matters

Change Data Capture (CDC) is the technique of detecting and recording every change (INSERT, UPDATE, DELETE) that happens inside a database, then streaming those changes as events to downstream systems. Instead of downstream systems constantly polling the database and asking "anything new?", CDC pushes changes out as soon as they are committed — turning the database into a real-time event source.

In modern microservices architecture, CDC solves a core problem: how do we keep data in sync between services without tight coupling? When Service A writes data to PostgreSQL, Service B (search), Service C (analytics), and Service D (cache) all need to know about that change — but Service A shouldn't have to call each of them. CDC solves this by reading the database's transaction log and emitting events automatically.

💡 Why not use Dual Write?

Many teams go with "write to DB, then publish an event" (dual write). But this pattern doesn't guarantee atomicity — if the DB commit succeeds but the message broker fails, data becomes inconsistent. CDC eliminates this entirely because it only reads from the already-committed transaction log.
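The failure window is easy to demonstrate with a toy Python sketch — in-memory dicts stand in for the database and the broker, and all names here are illustrative, not a real client API:

```python
# Dual write vs. outbox, simulated with in-memory state (illustrative only)
db = {}
published = []

def publish(event, broker_up=True):
    if not broker_up:
        raise ConnectionError("broker unavailable")
    published.append(event)

def dual_write(order_id, broker_up):
    db[order_id] = {"status": "pending"}   # DB commit succeeds...
    try:
        publish({"type": "OrderCreated", "id": order_id}, broker_up)
    except ConnectionError:
        pass  # ...but the event is lost: DB and downstream now disagree

dual_write("ord-1", broker_up=False)
assert "ord-1" in db and published == []   # inconsistent state

# Outbox: the event row commits (or rolls back) with the business row.
def with_outbox(order_id):
    tx = {order_id: {"status": "pending"},
          f"outbox:{order_id}": {"type": "OrderCreated"}}
    db.update(tx)  # one atomic commit; CDC relays the outbox row later

with_outbox("ord-2")
assert "outbox:ord-2" in db   # event cannot be lost if the data committed
```

The dual write leaves `db` and `published` out of sync; the outbox variant makes that divergence impossible because the event is part of the same commit.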

Main CDC Use Cases

CDC isn't only for data sync. Below are common production scenarios:

  • Event-Driven Architecture: Turn every database change into a domain event so microservices can react in real time without calling each other's APIs
  • Cache Invalidation: Automatically invalidate or update Redis/Memcached when source data changes — definitively solving cache consistency
  • Search Index Sync: Synchronize data from an OLTP database to Elasticsearch/OpenSearch in near real time, without batch reindexing
  • Data Warehouse / Lakehouse: Stream data from operational DBs to ClickHouse, Apache Iceberg, or BigQuery for analytics
  • Audit Trail: Capture a complete change history — who changed what, when, and with what old/new values
  • Database Migration: Zero-downtime migration between engines by CDC-ing from source to target, then cutting over once synced

2. CDC Methods: From Crude to Professional

There are four main CDC methods, each with different trade-offs in performance, accuracy, and complexity:

graph LR
    A["Timestamp-based<br/>Polling"] --> B["Trigger-based<br/>CDC"]
    B --> C["Query-based<br/>CDC"]
    C --> D["Log-based<br/>CDC"]
    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#e94560,stroke:#fff,color:#fff

CDC methods ranked by sophistication — Log-based is the production-grade option

Method | How it works | Pros | Cons
Timestamp-based | Query records with updated_at > last check | Simple, no special tools | Misses DELETEs, relies on a timestamp column, loads the DB
Trigger-based | DB triggers write changes into a shadow table | Captures every kind of change | Heavy overhead on the write path, hard to maintain, not portable
Query-based | Periodic polling with diff detection | Works with any DB | High latency (minutes), loads the DB for large tables, misses intermediate changes
Log-based | Reads the transaction log (WAL / binlog / oplog) | Near-zero impact on the DB, captures every change, exact ordering, sub-second latency | More complex, depends on DB-specific log formats

✅ Best Practice

For any production workload, choose log-based CDC. The other methods only suit development/prototyping or databases without logical replication. Log-based CDC is the only approach that combines near-zero overhead on the source, captured DELETEs, intact transaction order, and sub-second latency.
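The classic weakness of timestamp-based polling — it can never see a DELETE — is easy to show with a toy Python poller over an in-memory "table" (illustrative only):

```python
# Toy timestamp-based poller: SELECT * FROM t WHERE updated_at > :last_seen
table = {
    1: {"name": "a", "updated_at": 100},
    2: {"name": "b", "updated_at": 105},
}
last_seen = 0

def poll():
    """Return rows changed since the last poll."""
    global last_seen
    changed = [r for r in table.values() if r["updated_at"] > last_seen]
    if changed:
        last_seen = max(r["updated_at"] for r in changed)
    return changed

assert len(poll()) == 2   # initial poll sees both rows
del table[2]              # a DELETE leaves no row behind to carry a timestamp...
assert poll() == []       # ...so timestamp polling can never observe it
```

Log-based CDC has no such blind spot: the DELETE is written to the transaction log like any other change.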

3. Debezium: Architecture and How It Works

Debezium is the most popular open-source CDC platform today, developed by Red Hat and the community. Debezium reads a database's transaction log, converts each row-level change into a structured event, and publishes it to Kafka (or other messaging systems via Debezium Server).

Overall Architecture

graph TB
    subgraph Sources["Source Databases"]
        PG["PostgreSQL<br/>WAL"]
        MY["MySQL<br/>Binlog"]
        MG["MongoDB<br/>Oplog"]
        SS["SQL Server<br/>CT/CDC"]
    end
    subgraph KC["Kafka Connect Cluster"]
        DC1["Debezium<br/>Connector 1"]
        DC2["Debezium<br/>Connector 2"]
        DC3["Debezium<br/>Connector N"]
    end
    subgraph Kafka["Apache Kafka"]
        T1["Topic: db.schema.users"]
        T2["Topic: db.schema.orders"]
        T3["Topic: db.schema.products"]
        SR["Schema Registry"]
    end
    subgraph Consumers["Downstream Consumers"]
        ES["Elasticsearch"]
        RD["Redis Cache"]
        CH["ClickHouse"]
        MS["Microservices"]
    end
    PG --> DC1
    MY --> DC2
    MG --> DC3
    SS --> DC1
    DC1 --> T1
    DC2 --> T2
    DC3 --> T3
    DC1 --> SR
    DC2 --> SR
    T1 --> ES
    T2 --> RD
    T1 --> CH
    T3 --> MS
    style PG fill:#336791,stroke:#fff,color:#fff
    style MY fill:#4479A1,stroke:#fff,color:#fff
    style MG fill:#4DB33D,stroke:#fff,color:#fff
    style SS fill:#CC2927,stroke:#fff,color:#fff
    style DC1 fill:#e94560,stroke:#fff,color:#fff
    style DC2 fill:#e94560,stroke:#fff,color:#fff
    style DC3 fill:#e94560,stroke:#fff,color:#fff
    style T1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style T2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style T3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style SR fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style ES fill:#2c3e50,stroke:#fff,color:#fff
    style RD fill:#DC382D,stroke:#fff,color:#fff
    style CH fill:#FFCC01,stroke:#2c3e50,color:#2c3e50
    style MS fill:#2c3e50,stroke:#fff,color:#fff

End-to-end architecture: Debezium reads the transaction log → publishes events to Kafka → downstream consumers process them

Three main components in the Debezium architecture:

  • Debezium Connector: Runs as a Kafka Connect plugin. Each connector watches one database instance, reads its transaction log, parses it into change events, and produces them to Kafka topics
  • Apache Kafka: Acts as the central message broker. Each table in the source DB maps to one Kafka topic. Kafka provides durability, ordering, and lets many consumers read the same stream
  • Schema Registry: Stores the change-event schema (Avro / JSON Schema / Protobuf), so consumers can deserialize correctly and handle schema evolution

Structure of a Change Event

Each Debezium event has a standard structure with a key (record primary key) and a value (change details):

{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 1001,
      "name": "Nguyen Van A",
      "email": "a@example.com",
      "status": "active"
    },
    "after": {
      "id": 1001,
      "name": "Nguyen Van A",
      "email": "newemail@example.com",
      "status": "active"
    },
    "source": {
      "version": "2.7.0",
      "connector": "postgresql",
      "name": "myapp",
      "ts_ms": 1713436800000,
      "db": "mydb",
      "schema": "public",
      "table": "users",
      "lsn": 123456789,
      "txId": 5678
    },
    "op": "u",
    "ts_ms": 1713436800123,
    "transaction": {
      "id": "5678",
      "total_order": 3,
      "data_collection_order": 1
    }
  }
}

The op field indicates the operation type: "c" = create (INSERT), "u" = update, "d" = delete, "r" = read (snapshot). The before and after fields hold the row state before and after the change — particularly useful for audit trails and diff detection.
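A consumer's first job is to unpack this envelope. Here is a minimal Python sketch (field names taken from the example event above) that classifies the operation and diffs before/after:

```python
import json

# The envelope from the example above, trimmed to the payload we need
event = json.loads("""{
  "payload": {
    "op": "u",
    "before": {"id": 1001, "email": "a@example.com", "status": "active"},
    "after":  {"id": 1001, "email": "newemail@example.com", "status": "active"},
    "source": {"table": "users", "lsn": 123456789}
  }
}""")

p = event["payload"]
ops = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}

# Diff before/after — exactly the information an audit trail needs
changed = {k: (p["before"][k], p["after"][k])
           for k in p["after"]
           if p["before"] and p["before"][k] != p["after"][k]}

assert ops[p["op"]] == "UPDATE"
assert changed == {"email": ("a@example.com", "newemail@example.com")}
```

For a DELETE, `after` is null and `before` holds the last row state; for an INSERT it is the reverse.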

4. Debezium Connectors: Multi-Database Support

Debezium supports most popular databases, with each connector leveraging that engine's native replication:

Database | CDC mechanism | Notable details
PostgreSQL | Logical replication (pgoutput / wal2json) | Best-supported; use publications to filter tables. Requires wal_level=logical
MySQL/MariaDB | Binlog (ROW format) | Reads binary logs with GTID tracking. Requires binlog_format=ROW and binlog_row_image=FULL
SQL Server | Change Tracking / CDC tables | Uses SQL Server's built-in CDC feature. Requires CDC enabled on both the database and each target table
MongoDB | Oplog / Change Streams | Uses the Change Streams API (MongoDB 3.6+). Supports resume tokens to avoid event loss
Oracle | LogMiner / XStream | Reads redo logs via the LogMiner API. Requires supplemental logging enabled
Cassandra | Commit Log | Community connector that reads commit-log segments. Suitable for wide-column store CDC

⚠️ SQL Server caveat

SQL Server CDC requires the SQL Server Agent to be running and CDC enabled on the database (sys.sp_cdc_enable_db). For Azure SQL Database you need the Standard tier or higher. Debezium reads from the change tables (cdc.*) rather than the transaction log directly.

5. Hands-on: Debezium + PostgreSQL

Below is a complete CDC pipeline setup with Docker Compose — from a PostgreSQL source to a Kafka consumer:

Docker Compose for a CDC environment

version: '3.8'
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
      - "-c"
      - "max_replication_slots=4"
      - "-c"
      - "max_wal_senders=4"
    ports:
      - "5432:5432"

  kafka:
    image: bitnami/kafka:3.7
    environment:
      KAFKA_CFG_NODE_ID: 0
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
    ports:
      - "9092:9092"

  connect:
    image: debezium/connect:2.7
    depends_on:
      - kafka
      - postgres
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: cdc-connect-cluster
      CONFIG_STORAGE_TOPIC: _connect-configs
      OFFSET_STORAGE_TOPIC: _connect-offsets
      STATUS_STORAGE_TOPIC: _connect-status
    ports:
      - "8083:8083"

Registering the Debezium Connector

Once containers are up, register the connector via its REST API:

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "myapp-postgres-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "appuser",
      "database.password": "secret",
      "database.dbname": "myapp",
      "topic.prefix": "myapp",
      "schema.include.list": "public",
      "table.include.list": "public.users,public.orders",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_slot",
      "publication.name": "debezium_pub",
      "snapshot.mode": "initial",
      "transforms": "unwrap",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
      "transforms.unwrap.drop.tombstones": "false",
      "transforms.unwrap.delete.handling.mode": "rewrite",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.schemas.enable": "false",
      "value.converter.schemas.enable": "false"
    }
  }'

Important config keys:

  • topic.prefix: Prefix for Kafka topic names — events from public.users land in topic myapp.public.users
  • snapshot.mode=initial: On first run, Debezium snapshots existing data before switching to WAL streaming
  • plugin.name=pgoutput: Uses PostgreSQL's built-in logical decoding plugin (no extra extension needed)
  • ExtractNewRecordState: Transform that flattens events — instead of a before/after envelope, consumers receive the new record state directly
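To make the flattening concrete, here is a small Python sketch (not Debezium code) of what ExtractNewRecordState produces when delete.handling.mode=rewrite, as configured above:

```python
# Sketch of the ExtractNewRecordState SMT with delete.handling.mode=rewrite:
# collapse the before/after envelope into a single flat record.
def extract_new_record_state(envelope):
    op, before, after = envelope["op"], envelope["before"], envelope["after"]
    if op == "d":
        # rewrite mode: emit the old row state, flagged as deleted
        return {**before, "__deleted": "true"}
    return {**after, "__deleted": "false"}

update = {"op": "u",
          "before": {"id": 1, "email": "old"},
          "after":  {"id": 1, "email": "new"}}
delete = {"op": "d",
          "before": {"id": 1, "email": "new"},
          "after":  None}

assert extract_new_record_state(update) == \
    {"id": 1, "email": "new", "__deleted": "false"}
assert extract_new_record_state(delete) == \
    {"id": 1, "email": "new", "__deleted": "true"}
```

Consumers that only care about current row state can then treat every event as a plain upsert (or a tombstone when __deleted is "true").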

6. Advanced CDC Patterns in Production

The Transactional Outbox Pattern

The Outbox Pattern is one of the most important CDC applications. Instead of publishing events directly to a broker (dual write), the service writes events into an outbox table inside the same transaction as the business data. Debezium reads the outbox table and publishes events to Kafka.

sequenceDiagram
    participant App as Application
    participant DB as PostgreSQL
    participant DBZ as Debezium
    participant K as Kafka
    participant C as Consumer

    App->>DB: BEGIN TRANSACTION
    App->>DB: INSERT INTO orders (...)
    App->>DB: INSERT INTO outbox (aggregate_type, payload)
    App->>DB: COMMIT

    DBZ->>DB: Read WAL (outbox table changes)
    DBZ->>K: Publish OrderCreated event
    K->>C: Consume event
    C->>C: Update search index / Send email / Update cache

Outbox Pattern: events are written atomically with business data; Debezium relays them to Kafka

Debezium ships an Outbox Event Router SMT (Single Message Transform) for this pattern:

-- Outbox table schema
CREATE TABLE outbox (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id VARCHAR(255) NOT NULL,
    type VARCHAR(255) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Example: creating an order
BEGIN;
  INSERT INTO orders (id, customer_id, total, status)
    VALUES ('ord-001', 'cust-123', 599000, 'pending');

  INSERT INTO outbox (aggregate_type, aggregate_id, type, payload)
    VALUES (
      'Order', 'ord-001', 'OrderCreated',
      '{"orderId":"ord-001","customerId":"cust-123","total":599000}'::jsonb
    );
COMMIT;

Connector config for the Outbox:

{
  "transforms": "outbox",
  "transforms.outbox.type":
    "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.table.field.event.id": "id",
  "transforms.outbox.table.field.event.key": "aggregate_id",
  "transforms.outbox.table.field.event.type": "type",
  "transforms.outbox.table.field.event.payload": "payload",
  "transforms.outbox.route.by.field": "aggregate_type",
  "transforms.outbox.route.topic.regex": "(.*)Order(.*)",
  "transforms.outbox.route.topic.replacement": "order-events"
}

✅ Why is the Outbox Pattern better than Dual Write?

Atomicity: business data and the event live in the same transaction — you can't end up with committed data and a lost event. Resumability: Debezium tracks its position in the WAL, so after a restart it resumes from where it left off; delivery is at-least-once, so consumers should be idempotent (for example, deduplicate on the outbox event id). Ordering: events preserve the exact transaction commit order recorded in the WAL.
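The routing done by the EventRouter config above can be sketched in a few lines of Python (a simplification of the SMT, using the regex and field names from that config; the helper name is mine):

```python
import re

route_regex = r"(.*)Order(.*)"      # transforms.outbox.route.topic.regex
route_replacement = "order-events"  # transforms.outbox.route.topic.replacement

def route(outbox_row):
    """Map an outbox row to (topic, message key, payload)."""
    by = outbox_row["aggregate_type"]  # transforms.outbox.route.by.field
    if re.fullmatch(route_regex, by):
        topic = re.sub(route_regex, route_replacement, by)
    else:
        topic = by  # non-matching aggregates keep their own routing value
    return topic, outbox_row["aggregate_id"], outbox_row["payload"]

row = {"aggregate_type": "Order", "aggregate_id": "ord-001",
       "type": "OrderCreated", "payload": '{"orderId":"ord-001"}'}
topic, key, payload = route(row)
assert (topic, key) == ("order-events", "ord-001")
```

Keying by aggregate_id means all events for one aggregate land in the same Kafka partition, preserving their order for consumers.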

CQRS + Materialized View with CDC

CDC is a natural bridge for CQRS (Command Query Responsibility Segregation). The write side writes to an OLTP database (PostgreSQL, SQL Server); CDC streams changes to read-optimized stores (Elasticsearch, Redis, ClickHouse):

graph LR
    subgraph Write["Write Side"]
        API["API<br/>.NET Core"] --> PG["PostgreSQL<br/>OLTP"]
    end
    subgraph CDC["CDC Pipeline"]
        PG --> DBZ["Debezium"]
        DBZ --> KF["Kafka"]
    end
    subgraph Read["Read Side (Materialized Views)"]
        KF --> ES["Elasticsearch<br/>Full-text Search"]
        KF --> RD["Redis<br/>Hot Data Cache"]
        KF --> CK["ClickHouse<br/>Analytics"]
    end
    style API fill:#512BD4,stroke:#fff,color:#fff
    style PG fill:#336791,stroke:#fff,color:#fff
    style DBZ fill:#e94560,stroke:#fff,color:#fff
    style KF fill:#2c3e50,stroke:#fff,color:#fff
    style ES fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RD fill:#DC382D,stroke:#fff,color:#fff
    style CK fill:#FFCC01,stroke:#2c3e50,color:#2c3e50

CQRS with CDC: write to PostgreSQL, then CDC automatically syncs to read stores

CDC-based Cache Invalidation

Cache invalidation is one of the hardest problems in distributed systems. With CDC you don't need invalidation logic in application code — Debezium emits events on data changes, and consumers listen and update/invalidate the cache:

// .NET consumer that invalidates Redis cache from CDC events
using System.Text.Json;
using Confluent.Kafka;
using StackExchange.Redis;

public class CdcCacheInvalidator : BackgroundService
{
    private readonly IConsumer<string, string> _consumer;
    private readonly IConnectionMultiplexer _redis;

    public CdcCacheInvalidator(
        IConsumer<string, string> consumer,
        IConnectionMultiplexer redis)
    {
        _consumer = consumer;
        _redis = redis;
    }

    protected override async Task ExecuteAsync(
        CancellationToken stoppingToken)
    {
        _consumer.Subscribe("myapp.public.products");

        while (!stoppingToken.IsCancellationRequested)
        {
            var result = _consumer.Consume(stoppingToken);
            var change = JsonSerializer
                .Deserialize<DebeziumEvent>(result.Message.Value);

            var db = _redis.GetDatabase();
            var productId = change.After?.Id ?? change.Before?.Id;

            switch (change.Op)
            {
                case "r": // snapshot read — treat like an insert
                case "c":
                case "u":
                    // Upsert into cache with TTL
                    await db.StringSetAsync(
                        $"product:{productId}",
                        JsonSerializer.Serialize(change.After),
                        TimeSpan.FromMinutes(30));
                    break;

                case "d":
                    // Remove from cache
                    await db.KeyDeleteAsync($"product:{productId}");
                    break;
            }
        }
    }
}

7. Debezium Server: CDC Without Kafka

Not every organization wants to operate a Kafka cluster. Debezium Server is a standalone application that runs Debezium connectors without Kafka Connect — instead, events go directly to sinks such as Redis Streams, Amazon Kinesis, Google Pub/Sub, Azure Event Hubs, or HTTP webhooks.

graph LR
    DB["Source DB"] --> DS["Debezium<br/>Server"]
    DS --> RS["Redis Streams"]
    DS --> KN["Amazon Kinesis"]
    DS --> PB["Google Pub/Sub"]
    DS --> EH["Azure Event Hubs"]
    DS --> HT["HTTP Webhook"]
    style DB fill:#336791,stroke:#fff,color:#fff
    style DS fill:#e94560,stroke:#fff,color:#fff
    style RS fill:#DC382D,stroke:#fff,color:#fff
    style KN fill:#FF9900,stroke:#fff,color:#fff
    style PB fill:#4285F4,stroke:#fff,color:#fff
    style EH fill:#0078D4,stroke:#fff,color:#fff
    style HT fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Debezium Server: lightweight deployment sinking directly to cloud services

# application.properties for Debezium Server
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=appuser
debezium.source.database.password=secret
debezium.source.database.dbname=myapp
debezium.source.topic.prefix=myapp
debezium.source.plugin.name=pgoutput
debezium.source.offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.storage.file.filename=/data/offsets.dat

# Sink: Redis Streams
debezium.sink.type=redis
debezium.sink.redis.address=localhost:6379

# Or sink: HTTP Webhook
# debezium.sink.type=http
# debezium.sink.http.url=https://api.example.com/webhooks/cdc

💡 When to use Debezium Server instead of Kafka Connect?

Pick Debezium Server when the team is small and doesn't want to operate Kafka, you're already on managed cloud messaging (Kinesis, Event Hubs), or you only need CDC for 1-2 simple databases. Pick Kafka Connect when you need many connectors, require exactly-once semantics, need to replay events from Kafka topics, or already have Kafka infrastructure.

8. Production Best Practices

Snapshot Strategy

When a Debezium connector first starts, it needs to snapshot the existing data. There are several snapshot modes:

Mode | Description | Use case
initial | Snapshot everything, then switch to streaming | Initial setup, need all existing data
initial_only | Snapshot only, don't stream afterward | One-time data migration
when_needed | Re-snapshot if the offset is lost | Fault-tolerant production setups
no_data | Capture schema only, skip existing data | Only interested in changes after setup
recovery | Re-snapshot when a WAL gap is detected | Recovery after WAL truncation incidents

Monitoring and Observability

A CDC pipeline is a critical component — if Debezium stops or lags, downstream systems serve stale data. Metrics to monitor:

  • Lag: MilliSecondsBehindSource — distance between source event time and processing time
  • Queue: QueueTotalCapacity vs QueueRemainingCapacity — buffer overflow risk
  • Errors: NumberOfErroneousEvents — parse/transform failures
  • Throughput: TotalNumberOfEventsSeen — events processed per second

# Prometheus scrape config for Debezium metrics
- job_name: 'debezium'
  metrics_path: '/metrics'
  static_configs:
    - targets: ['connect:8083']

# Important alert rules
groups:
  - name: debezium_alerts
    rules:
      - alert: DebeziumHighLag
        expr: debezium_metrics_MilliSecondsBehindSource > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Debezium lag > 30s on {{ $labels.context }}"

      - alert: DebeziumConnectorDown
        expr: debezium_metrics_Connected == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Debezium connector lost DB connection"

Schema Evolution

Database schemas change constantly (ALTER TABLE, ADD COLUMN...). Debezium handles this via the Schema History Topic — storing every DDL change so a restarted connector can rebuild the schema. Combined with a Schema Registry, consumers can handle schema evolution gracefully:

  • Adding columns: New events contain the new field; older consumers ignore unknown fields (forward compatibility)
  • Removing columns: New consumers handle missing fields using default values (backward compatibility)
  • Renaming columns: Handle with extra care — you usually have to coordinate the schema change and consumer update deployments
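These compatibility rules boil down to the "tolerant reader" pattern: ignore fields you don't know, default fields that disappeared. A small Python sketch (the projection function and default value are mine, not a Debezium API):

```python
# Tolerant reader: a projection that survives columns being added or dropped
def project_user(record):
    return {
        "id": record["id"],
        # default if the email column is dropped upstream
        "email": record.get("email", "unknown@example.invalid"),
        # any extra (newly added) columns in `record` are simply ignored
    }

v1 = {"id": 1, "email": "a@example.com"}
v2 = {"id": 1, "email": "a@example.com", "phone": "+1-555"}  # column added
v3 = {"id": 1}                                               # email dropped

assert project_user(v2) == project_user(v1)                   # forward compatible
assert project_user(v3)["email"] == "unknown@example.invalid" # backward compatible
```

A rename defeats this pattern — to the reader it looks like one column dropped and an unrelated one added — which is why renames need coordinated deployments.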

⚠️ WAL retention matters

Debezium holds a replication slot, so PostgreSQL retains every WAL segment the connector has not yet consumed — which means a connector that is down for too long can fill the disk with accumulated WAL. In production, set max_slot_wal_keep_size to cap that retention (knowing that exceeding the cap invalidates the slot and forces a re-snapshot), and monitor pg_replication_slots regularly so you catch a lagging or abandoned slot early.
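The slot-lag arithmetic behind functions like pg_wal_lsn_diff() is simple: an LSN such as 16/B374D848 is two hex numbers, high and low, giving a byte position in the WAL stream. A Python sketch of that calculation (helper names are mine):

```python
# pg_wal_lsn_diff in miniature: LSN 'HIGH/LOW' -> high * 2**32 + low bytes
def lsn_to_bytes(lsn: str) -> int:
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def retained_wal(current_lsn: str, slot_restart_lsn: str) -> int:
    """Bytes of WAL pinned by a slot that is behind the current position."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(slot_restart_lsn)

# A slot stuck exactly 1 GiB behind the current WAL insert position:
lag = retained_wal("16/B374D848", "16/7374D848")
assert lag == 1 << 30  # 1 GiB of WAL pinned by the slot
```

Alerting on this difference (current WAL position minus each slot's restart_lsn) is what catches a dead connector before the disk fills.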

9. Consuming CDC in a .NET Application

In the .NET ecosystem, there are several ways to consume CDC events. Below is a pattern using the Confluent.Kafka NuGet package combined with a hosted service:

// Program.cs - Register the CDC consumer service
builder.Services.AddSingleton<IConsumer<string, string>>(sp =>
{
    var config = new ConsumerConfig
    {
        BootstrapServers = "kafka:9092",
        GroupId = "order-projection-service",
        AutoOffsetReset = AutoOffsetReset.Earliest,
        EnableAutoCommit = false, // Manual commit for at-least-once processing
        EnablePartitionEof = true
    };
    return new ConsumerBuilder<string, string>(config).Build();
});

builder.Services.AddHostedService<OrderProjectionService>();

// OrderProjectionService.cs
public class OrderProjectionService : BackgroundService
{
    private readonly IConsumer<string, string> _consumer;
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger<OrderProjectionService> _logger;

    protected override async Task ExecuteAsync(
        CancellationToken ct)
    {
        _consumer.Subscribe(new[] {
            "myapp.public.orders",
            "myapp.public.order_items"
        });

        while (!ct.IsCancellationRequested)
        {
            try
            {
                var result = _consumer.Consume(ct);
                if (result.IsPartitionEOF) continue;

                var evt = JsonSerializer
                    .Deserialize<CdcEvent>(result.Message.Value);

                using var scope = _scopeFactory.CreateScope();
                var handler = scope.ServiceProvider
                    .GetRequiredService<ICdcEventHandler>();

                await handler.HandleAsync(evt);

                // Commit offset after successful processing
                _consumer.Commit(result);
            }
            catch (ConsumeException ex)
            {
                _logger.LogError(ex,
                    "CDC consume error on {Topic}",
                    ex.ConsumerRecord?.Topic);
                // Implement retry / dead letter logic
            }
        }
    }
}

// ICdcEventHandler implementation
public class OrderCdcHandler : ICdcEventHandler
{
    private readonly ElasticClient _elastic;

    public async Task HandleAsync(CdcEvent evt)
    {
        if (evt.Source.Table == "orders")
        {
            switch (evt.Op)
            {
                case "r": // snapshot read
                case "c":
                case "u":
                    await _elastic.IndexDocumentAsync(
                        MapToSearchModel(evt.After));
                    break;
                case "d":
                    await _elastic.DeleteAsync<OrderSearch>(
                        evt.Before.Id);
                    break;
            }
        }
    }
}

10. CDC Tools in 2026

Tool | Type | Supported databases | Sinks | Pricing
Debezium | Open source | PostgreSQL, MySQL, SQL Server, MongoDB, Oracle, Cassandra, Db2 | Kafka, Redis, HTTP, Kinesis, Pub/Sub, Event Hubs | Free
AWS DMS | Managed | Most RDBMSes + MongoDB + S3 | RDS, Redshift, S3, Kinesis, OpenSearch | Pay-per-use (instance hours)
Fivetran | SaaS | 200+ connectors | Snowflake, BigQuery, Redshift, Databricks | Credits-based ($$)
Airbyte | Open source + Cloud | 300+ connectors | Warehouses, lakehouses | Free (self-hosted) / Credits (cloud)
RisingWave | Open source | PostgreSQL, MySQL via CDC | Stream processing + materialized views | Free (self-hosted)

💡 Which one should you pick?

Debezium — when you need full control, self-hosting, event-driven architecture, or already run Kafka. AWS DMS — for migration or replication between AWS services. Fivetran/Airbyte — when you focus on data warehouse/lakehouse with many SaaS connectors. RisingWave — when you need stream processing directly on the CDC stream.

11. Common Pitfalls and Fixes

1. A "stuck" Replication Slot

Symptom: Disk usage climbs continuously because PostgreSQL retains WAL for an inactive slot.
Cause: Connector crash or massive consumer lag.
Fix: Monitor pg_replication_slots, set max_slot_wal_keep_size, and alert when pg_wal_lsn_diff() exceeds thresholds.

2. Snapshots too slow on huge tables

Symptom: Initial snapshot takes hours on tables with hundreds of millions of rows.
Fix: Use snapshot.mode=no_data and backfill separately with a batch job. Or tune snapshot.fetch.size and snapshot.max.threads.

3. Wrong event ordering

Symptom: Consumers receive UPDATE before INSERT.
Cause: Multiple Kafka partitions cause events for the same entity to land in different partitions.
Fix: Make the message key the primary key — Kafka preserves order within a partition, and a consistent key routes all events for an entity to the same partition. (Debezium uses the table's primary key as the message key by default.)
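The fix works because partition assignment is a pure function of the key. A simplified Python sketch (Kafka's Java client actually uses murmur2, but the principle is identical):

```python
import zlib

# Simplified partitioner: the same key always hashes to the same partition,
# so all events for one entity stay in one partition, in order.
def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

p1 = partition_for(b"order-123", 6)
assert all(partition_for(b"order-123", 6) == p1 for _ in range(100))
# Different keys may land on any partition; the ordering guarantee is per key.
```

If events for the same row are keyed differently (or not keyed at all), they round-robin across partitions and arrival order is no longer guaranteed.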

4. Schema changes crash the connector

Symptom: The connector crashes after an ALTER TABLE.
Fix: Enable schema.history.internal topic; set schema.history.internal.store.only.captured.tables.ddl=true to reduce noise. Test schema changes on staging first.

12. Conclusion

Change Data Capture with Debezium has become an indispensable building block in modern microservices architecture. Instead of forcing applications to "know about" every downstream system, CDC lets the database become the single event source — simple, reliable, and easy to scale.

Key takeaways:

  • Log-based CDC is the right choice for production — near-zero overhead, captures every change, preserves order
  • Outbox Pattern + Debezium solves the atomic event-publishing problem that dual write can't
  • Debezium Server opens up Kafka-less CDC for small teams or cloud-native workloads
  • Monitoring is mandatory — CDC lag = stale downstream data; alert on MilliSecondsBehindSource and WAL growth
  • Combining CDC with CQRS, cache invalidation, and search sync produces a complete event-driven architecture without modifying application code

CDC is not a solution for every data integration problem. But when you need real-time data synchronization with strong consistency guarantees, Debezium + Kafka (or Debezium Server) is the most worthwhile stack to invest in as of 2026.
