Change Data Capture with Debezium — Real-Time Data Sync for Microservices
Posted on: 4/18/2026 12:10:44 PM
Table of contents
- 1. What Is Change Data Capture and Why It Matters
- 2. CDC Methods: From Crude to Professional
- 3. Debezium: Architecture and How It Works
- 4. Debezium Connectors: Multi-Database Support
- 5. Hands-on: Debezium + PostgreSQL
- 6. Advanced CDC Patterns in Production
- 7. Debezium Server: CDC Without Kafka
- 8. Production Best Practices
- 9. Consuming CDC in a .NET Application
- 10. CDC Tools in 2026
- 11. Common Pitfalls and Fixes
- 12. Conclusion
- References
1. What Is Change Data Capture and Why It Matters
Change Data Capture (CDC) is the technique of detecting and recording every change (INSERT, UPDATE, DELETE) that happens inside a database, then streaming them as events to downstream systems. Instead of systems constantly polling the database asking "anything new?", CDC pushes changes out as soon as they happen — turning the database into a real-time event source.
In modern microservices architecture, CDC solves a core problem: how do we keep data in sync between services without tight coupling? When Service A writes data to PostgreSQL, Service B (search), Service C (analytics), and Service D (cache) all need to know about that change — but Service A shouldn't have to call each of them. CDC solves this by reading the database's transaction log and emitting events automatically.
💡 Why not use Dual Write?
Many teams go with "write to DB, then publish an event" (dual write). But this pattern doesn't guarantee atomicity — if the DB commit succeeds but the message broker fails, data becomes inconsistent. CDC eliminates this entirely because it only reads from the already-committed transaction log.
Main CDC Use Cases
CDC isn't only for data sync. Below are common production scenarios:
- Event-Driven Architecture: Turn every database change into a domain event so microservices can react in real time without calling each other's APIs
- Cache Invalidation: Automatically invalidate or update Redis/Memcached when source data changes — definitively solving cache consistency
- Search Index Sync: Synchronize data from an OLTP database to Elasticsearch/OpenSearch in near real time, without batch reindexing
- Data Warehouse / Lakehouse: Stream data from operational DBs to ClickHouse, Apache Iceberg, or BigQuery for analytics
- Audit Trail: Capture a complete change history — who changed what, when, and with what old/new values
- Database Migration: Zero-downtime migration between engines by CDC-ing from source to target, then cutting over once synced
2. CDC Methods: From Crude to Professional
There are 4 main CDC methods, each with trade-offs in performance, accuracy, and complexity:
graph LR
A["Timestamp-based
Polling"] --> B["Trigger-based
CDC"]
B --> C["Query-based
CDC"]
C --> D["Log-based
CDC"]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#e94560,stroke:#fff,color:#fff
CDC methods ranked by sophistication — Log-based is the production-grade option
| Method | How it works | Pros | Cons |
|---|---|---|---|
| Timestamp-based | Query records with updated_at > last check | Simple, no special tools | Misses DELETEs, relies on a timestamp column, loads the DB |
| Trigger-based | DB triggers write changes into a shadow table | Captures every kind of change | Heavy overhead on the write path, hard to maintain, not portable |
| Query-based | Periodic polling with diff detection | Works with any DB | High latency (minutes), loads the DB for large tables, misses intermediate changes |
| Log-based | Reads the transaction log (WAL / binlog / oplog) | Near-zero impact on the write path, captures every change, exact ordering, sub-second latency | More complex, depends on DB-specific log formats |
✅ Best Practice
For any production workload, always choose log-based CDC. The other methods only suit development/prototyping or databases without logical replication. Log-based CDC is the only method that captures DELETEs, keeps transaction order intact, and delivers sub-second latency with near-zero overhead on the write path — reading the log never touches the write transaction itself, though it does consume I/O and requires WAL retention.
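The DELETE-blindness of timestamp polling is easy to demonstrate. A minimal Python sketch of the polling query against a toy in-memory "table" (illustrative only, not tied to any real driver):

```python
from datetime import datetime, timedelta

# A toy "table" of rows with updated_at timestamps.
now = datetime(2026, 4, 18, 12, 0, 0)
table = {
    1: {"id": 1, "name": "alice", "updated_at": now},
    2: {"id": 2, "name": "bob", "updated_at": now},
}

def poll_changes(table, last_check):
    """Timestamp-based CDC: SELECT * WHERE updated_at > last_check."""
    return [row for row in table.values() if row["updated_at"] > last_check]

last_check = now - timedelta(minutes=1)
print(len(poll_changes(table, last_check)))  # 2 -- both rows look "new"

# A hard DELETE leaves no row behind, so the next poll sees nothing:
del table[2]
last_check = now
print(len(poll_changes(table, last_check)))  # 0 -- the DELETE is invisible
```

A log-based reader would instead see an explicit delete record in the WAL/binlog, which is exactly the gap this sketch exposes.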
3. Debezium: Architecture and How It Works
Debezium is the most popular open-source CDC platform today, developed by Red Hat and the community. Debezium reads a database's transaction log, converts each row-level change into a structured event, and publishes it to Kafka (or other messaging systems via Debezium Server).
Overall Architecture
graph TB
subgraph Sources["Source Databases"]
PG["PostgreSQL
WAL"]
MY["MySQL
Binlog"]
MG["MongoDB
Oplog"]
SS["SQL Server
CT/CDC"]
end
subgraph KC["Kafka Connect Cluster"]
DC1["Debezium
Connector 1"]
DC2["Debezium
Connector 2"]
DC3["Debezium
Connector N"]
end
subgraph Kafka["Apache Kafka"]
T1["Topic: db.schema.users"]
T2["Topic: db.schema.orders"]
T3["Topic: db.schema.products"]
SR["Schema Registry"]
end
subgraph Consumers["Downstream Consumers"]
ES["Elasticsearch"]
RD["Redis Cache"]
CH["ClickHouse"]
MS["Microservices"]
end
PG --> DC1
MY --> DC2
MG --> DC3
SS --> DC1
DC1 --> T1
DC2 --> T2
DC3 --> T3
DC1 --> SR
DC2 --> SR
T1 --> ES
T2 --> RD
T1 --> CH
T3 --> MS
style PG fill:#336791,stroke:#fff,color:#fff
style MY fill:#4479A1,stroke:#fff,color:#fff
style MG fill:#4DB33D,stroke:#fff,color:#fff
style SS fill:#CC2927,stroke:#fff,color:#fff
style DC1 fill:#e94560,stroke:#fff,color:#fff
style DC2 fill:#e94560,stroke:#fff,color:#fff
style DC3 fill:#e94560,stroke:#fff,color:#fff
style T1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style T2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style T3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style SR fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style ES fill:#2c3e50,stroke:#fff,color:#fff
style RD fill:#DC382D,stroke:#fff,color:#fff
style CH fill:#FFCC01,stroke:#2c3e50,color:#2c3e50
style MS fill:#2c3e50,stroke:#fff,color:#fff
End-to-end architecture: Debezium reads the transaction log → publishes events to Kafka → downstream consumers process them
Three main components in the Debezium architecture:
- Debezium Connector: Runs as a Kafka Connect plugin. Each connector watches one database instance, reads its transaction log, parses it into change events, and produces them to Kafka topics
- Apache Kafka: Acts as the central message broker. Each table in the source DB maps to one Kafka topic. Kafka provides durability, ordering, and lets many consumers read the same stream
- Schema Registry: Stores the change-event schema (Avro / JSON Schema / Protobuf), so consumers can deserialize correctly and handle schema evolution
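The table-to-topic mapping is deterministic: for relational connectors, Debezium names each topic {topic.prefix}.{schema}.{table}. A one-liner to make the convention concrete (the myapp prefix matches the connector example later in this post):

```python
def change_topic(prefix: str, schema: str, table: str) -> str:
    """Debezium's default topic naming for relational connectors."""
    return f"{prefix}.{schema}.{table}"

print(change_topic("myapp", "public", "users"))   # myapp.public.users
print(change_topic("myapp", "public", "orders"))  # myapp.public.orders
```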
Structure of a Change Event
Each Debezium event has a standard structure with a key (record primary key) and a value (change details):
{
"schema": { ... },
"payload": {
"before": {
"id": 1001,
"name": "Nguyen Van A",
"email": "a@example.com",
"status": "active"
},
"after": {
"id": 1001,
"name": "Nguyen Van A",
"email": "newemail@example.com",
"status": "active"
},
"source": {
"version": "2.7.0",
"connector": "postgresql",
"name": "myapp",
"ts_ms": 1713436800000,
"db": "mydb",
"schema": "public",
"table": "users",
"lsn": 123456789,
"txId": 5678
},
"op": "u",
"ts_ms": 1713436800123,
"transaction": {
"id": "5678",
"total_order": 3,
"data_collection_order": 1
}
}
}
The op field indicates the operation type: "c" = create (INSERT), "u" = update, "d" = delete, "r" = read (snapshot). The before and after fields hold the row state before and after the change — particularly useful for audit trails and diff detection.
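Because the envelope is plain JSON, dispatching on op is straightforward. A minimal Python sketch (the sample event mirrors the payload above; the describe helper is hypothetical, and note that for DELETE events after is null, so the key must come from before):

```python
import json

event_json = """{
  "payload": {
    "before": {"id": 1001, "email": "a@example.com"},
    "after":  {"id": 1001, "email": "newemail@example.com"},
    "op": "u",
    "ts_ms": 1713436800123
  }
}"""

OPS = {"c": "create", "u": "update", "d": "delete", "r": "snapshot read"}

def describe(event: dict) -> str:
    """Summarize a Debezium change event from its payload envelope."""
    p = event["payload"]
    # DELETE events carry the row in 'before' and have 'after' = null.
    key = (p["after"] or p["before"])["id"]
    return f"{OPS[p['op']]} on id={key}"

print(describe(json.loads(event_json)))  # update on id=1001
```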
4. Debezium Connectors: Multi-Database Support
Debezium supports most popular databases, with each connector leveraging that engine's native replication:
| Database | CDC mechanism | Notable details |
|---|---|---|
| PostgreSQL | Logical Replication (pgoutput / wal2json) | Best-supported; use publications to filter tables. Requires wal_level=logical |
| MySQL/MariaDB | Binlog (ROW format) | Reads binary logs with GTID tracking. Requires binlog_format=ROW and binlog_row_image=FULL |
| SQL Server | Change Tracking / CDC tables | Uses SQL Server's built-in CDC feature. Requires CDC enabled on both the database and each target table |
| MongoDB | Oplog / Change Streams | Uses the Change Streams API (MongoDB 3.6+). Supports resume tokens to avoid event loss |
| Oracle | LogMiner / XStream | Reads redo logs via the LogMiner API. Requires supplemental logging enabled |
| Cassandra | Commit Log | Community connector that reads commit-log segments. Suitable for wide-column store CDC |
⚠️ SQL Server caveat
SQL Server CDC requires the SQL Server Agent to be running and CDC enabled on the database (sys.sp_cdc_enable_db). For Azure SQL Database you need the Standard tier or higher. Debezium reads from the change tables (cdc.*) rather than the transaction log directly.
5. Hands-on: Debezium + PostgreSQL
Below is a complete CDC pipeline setup with Docker Compose — from a PostgreSQL source to a Kafka consumer:
Docker Compose for a CDC environment
version: '3.8'
services:
postgres:
image: postgres:16
environment:
POSTGRES_DB: myapp
POSTGRES_USER: appuser
POSTGRES_PASSWORD: secret
command:
- "postgres"
- "-c"
- "wal_level=logical"
- "-c"
- "max_replication_slots=4"
- "-c"
- "max_wal_senders=4"
ports:
- "5432:5432"
kafka:
image: bitnami/kafka:3.7
environment:
KAFKA_CFG_NODE_ID: 0
KAFKA_CFG_PROCESS_ROLES: controller,broker
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093
KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
ports:
- "9092:9092"
connect:
image: debezium/connect:2.7
depends_on:
- kafka
- postgres
environment:
BOOTSTRAP_SERVERS: kafka:9092
GROUP_ID: cdc-connect-cluster
CONFIG_STORAGE_TOPIC: _connect-configs
OFFSET_STORAGE_TOPIC: _connect-offsets
STATUS_STORAGE_TOPIC: _connect-status
ports:
- "8083:8083"
Registering the Debezium Connector
Once containers are up, register the connector via its REST API:
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
"name": "myapp-postgres-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "appuser",
"database.password": "secret",
"database.dbname": "myapp",
"topic.prefix": "myapp",
"schema.include.list": "public",
"table.include.list": "public.users,public.orders",
"plugin.name": "pgoutput",
"slot.name": "debezium_slot",
"publication.name": "debezium_pub",
"snapshot.mode": "initial",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false"
}
}'
Important config keys:
- topic.prefix: Prefix for Kafka topic names — events from public.users land in topic myapp.public.users
- snapshot.mode: initial: On first run, Debezium snapshots existing data before switching to WAL streaming
- plugin.name: pgoutput: Uses PostgreSQL's built-in logical decoding plugin (no extra extension needed)
- ExtractNewRecordState: Transform that flattens events — instead of a before/after envelope, consumers receive the new record state directly
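The effect of ExtractNewRecordState is easiest to see as a transformation on the envelope. A Python approximation of what the SMT emits when delete.handling.mode=rewrite (illustrative, not Debezium's actual code):

```python
def extract_new_record_state(payload: dict) -> dict:
    """Approximate Debezium's ExtractNewRecordState SMT.

    Returns the flattened 'after' state; with delete.handling.mode=rewrite,
    DELETEs become the old row plus a __deleted marker instead of a tombstone.
    """
    if payload["op"] == "d":
        return {**payload["before"], "__deleted": "true"}
    return {**payload["after"], "__deleted": "false"}

update = {"op": "u", "before": {"id": 1, "email": "old@x.com"},
          "after": {"id": 1, "email": "new@x.com"}}
delete = {"op": "d", "before": {"id": 1, "email": "new@x.com"}, "after": None}

print(extract_new_record_state(update))
# {'id': 1, 'email': 'new@x.com', '__deleted': 'false'}
print(extract_new_record_state(delete))
# {'id': 1, 'email': 'new@x.com', '__deleted': 'true'}
```

Consumers then read a flat record and only need to check __deleted, instead of unpacking the full before/after envelope.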
6. Advanced CDC Patterns in Production
The Transactional Outbox Pattern
The Outbox Pattern is one of the most important CDC applications. Instead of publishing events directly to a broker (dual write), the service writes events into an outbox table inside the same transaction as the business data. Debezium reads the outbox table and publishes events to Kafka.
sequenceDiagram
participant App as Application
participant DB as PostgreSQL
participant DBZ as Debezium
participant K as Kafka
participant C as Consumer
App->>DB: BEGIN TRANSACTION
App->>DB: INSERT INTO orders (...)
App->>DB: INSERT INTO outbox (aggregate_type, payload)
App->>DB: COMMIT
DBZ->>DB: Read WAL (outbox table changes)
DBZ->>K: Publish OrderCreated event
K->>C: Consume event
C->>C: Update search index / Send email / Update cache
Outbox Pattern: events are written atomically with business data; Debezium relays them to Kafka
Debezium ships an Outbox Event Router SMT (Single Message Transform) for this pattern:
-- Outbox table schema
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(255) NOT NULL,
aggregate_id VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Example: creating an order
BEGIN;
INSERT INTO orders (id, customer_id, total, status)
VALUES ('ord-001', 'cust-123', 599000, 'pending');
INSERT INTO outbox (aggregate_type, aggregate_id, type, payload)
VALUES (
'Order', 'ord-001', 'OrderCreated',
'{"orderId":"ord-001","customerId":"cust-123","total":599000}'::jsonb
);
COMMIT;
Connector config for the Outbox:
{
"transforms": "outbox",
"transforms.outbox.type":
"io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.table.field.event.id": "id",
"transforms.outbox.table.field.event.key": "aggregate_id",
"transforms.outbox.table.field.event.type": "type",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.route.topic.regex": "(.*)Order(.*)",
"transforms.outbox.route.topic.replacement": "order-events"
}
✅ Why is the Outbox Pattern better than Dual Write?
Atomicity: Business data and event live in the same transaction — you can't have committed data with a lost event. Delivery: Debezium tracks its position in the WAL, so after a restart it resumes from where it left off; delivery is at-least-once, meaning nothing is lost but duplicates are possible around a crash, so consumers should be idempotent. Ordering: Events preserve exact transaction commit order as written in the WAL.
CQRS + Materialized View with CDC
CDC is a natural bridge for CQRS (Command Query Responsibility Segregation). The write side writes to an OLTP database (PostgreSQL, SQL Server); CDC streams changes to read-optimized stores (Elasticsearch, Redis, ClickHouse):
graph LR
subgraph Write["Write Side"]
API["API
.NET Core"] --> PG["PostgreSQL
OLTP"]
end
subgraph CDC["CDC Pipeline"]
PG --> DBZ["Debezium"]
DBZ --> KF["Kafka"]
end
subgraph Read["Read Side (Materialized Views)"]
KF --> ES["Elasticsearch
Full-text Search"]
KF --> RD["Redis
Hot Data Cache"]
KF --> CK["ClickHouse
Analytics"]
end
style API fill:#512BD4,stroke:#fff,color:#fff
style PG fill:#336791,stroke:#fff,color:#fff
style DBZ fill:#e94560,stroke:#fff,color:#fff
style KF fill:#2c3e50,stroke:#fff,color:#fff
style ES fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style RD fill:#DC382D,stroke:#fff,color:#fff
style CK fill:#FFCC01,stroke:#2c3e50,color:#2c3e50
CQRS with CDC: write to PostgreSQL, then CDC automatically syncs to read stores
CDC-based Cache Invalidation
Cache invalidation is one of the hardest problems in distributed systems. With CDC you don't need invalidation logic in application code — Debezium emits events on data changes, and consumers listen and update/invalidate the cache:
// .NET consumer that invalidates Redis cache from CDC events
public class CdcCacheInvalidator : BackgroundService
{
private readonly IConsumer<string, string> _consumer;
private readonly IConnectionMultiplexer _redis;
protected override async Task ExecuteAsync(
CancellationToken stoppingToken)
{
_consumer.Subscribe("myapp.public.products");
while (!stoppingToken.IsCancellationRequested)
{
var result = _consumer.Consume(stoppingToken);
var change = JsonSerializer
.Deserialize<DebeziumEvent>(result.Message.Value);
var db = _redis.GetDatabase();
var productId = change.After?.Id ?? change.Before?.Id;
switch (change.Op)
{
case "c":
case "u":
// Upsert into cache with TTL
await db.StringSetAsync(
$"product:{productId}",
JsonSerializer.Serialize(change.After),
TimeSpan.FromMinutes(30));
break;
case "d":
// Remove from cache
await db.KeyDeleteAsync($"product:{productId}");
break;
}
}
}
}
7. Debezium Server: CDC Without Kafka
Not every organization wants to operate a Kafka cluster. Debezium Server is a standalone application that runs Debezium connectors without Kafka Connect — instead, events go directly to sinks such as Redis Streams, Amazon Kinesis, Google Pub/Sub, Azure Event Hubs, or HTTP webhooks.
graph LR
DB["Source DB"] --> DS["Debezium
Server"]
DS --> RS["Redis Streams"]
DS --> KN["Amazon Kinesis"]
DS --> PB["Google Pub/Sub"]
DS --> EH["Azure Event Hubs"]
DS --> HT["HTTP Webhook"]
style DB fill:#336791,stroke:#fff,color:#fff
style DS fill:#e94560,stroke:#fff,color:#fff
style RS fill:#DC382D,stroke:#fff,color:#fff
style KN fill:#FF9900,stroke:#fff,color:#fff
style PB fill:#4285F4,stroke:#fff,color:#fff
style EH fill:#0078D4,stroke:#fff,color:#fff
style HT fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Debezium Server: lightweight deployment sinking directly to cloud services
# application.properties for Debezium Server
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=appuser
debezium.source.database.password=secret
debezium.source.database.dbname=myapp
debezium.source.topic.prefix=myapp
debezium.source.plugin.name=pgoutput
debezium.source.offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.storage.file.filename=/data/offsets.dat
# Sink: Redis Streams
debezium.sink.type=redis
debezium.sink.redis.address=localhost:6379
# Or sink: HTTP Webhook
# debezium.sink.type=http
# debezium.sink.http.url=https://api.example.com/webhooks/cdc
💡 When to use Debezium Server instead of Kafka Connect?
Pick Debezium Server when the team is small and doesn't want to operate Kafka, you're already on managed cloud messaging (Kinesis, Event Hubs), or you only need CDC for 1-2 simple databases. Pick Kafka Connect when you need many connectors, require exactly-once semantics, need to replay events from Kafka topics, or already have Kafka infrastructure.
8. Production Best Practices
Snapshot Strategy
When a Debezium connector first starts, it needs to snapshot the existing data. There are several snapshot modes:
| Mode | Description | Use case |
|---|---|---|
| initial | Snapshot everything, then switch to streaming | Initial setup, need all existing data |
| initial_only | Snapshot only, don't stream afterward | One-time data migration |
| when_needed | Re-snapshot if offset is lost | Fault-tolerant production setups |
| no_data | Capture schema only, skip existing data | Only interested in changes after setup |
| recovery | Re-snapshot when a WAL gap is detected | Recovery after WAL truncation incidents |
Monitoring and Observability
A CDC pipeline is a critical component — if Debezium stops or lags, downstream systems serve stale data. Metrics to monitor:
# Prometheus scrape config for Debezium metrics
- job_name: 'debezium'
metrics_path: '/metrics'
static_configs:
- targets: ['connect:8083']
# Important alert rules
groups:
- name: debezium_alerts
rules:
- alert: DebeziumHighLag
expr: debezium_metrics_MilliSecondsBehindSource > 30000
for: 5m
labels:
severity: warning
annotations:
summary: "Debezium lag > 30s on {{ $labels.context }}"
- alert: DebeziumConnectorDown
expr: debezium_metrics_Connected == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Debezium connector lost DB connection"
Schema Evolution
Database schemas change constantly (ALTER TABLE, ADD COLUMN...). Debezium handles this via the Schema History Topic — storing every DDL change so a restarted connector can rebuild the schema. Combined with a Schema Registry, consumers can handle schema evolution gracefully:
- Adding columns: New events contain the new field; older consumers ignore unknown fields (forward compatibility)
- Removing columns: New consumers handle missing fields using default values (backward compatibility)
- Renaming columns: Handle with extra care — you usually have to coordinate the schema change and consumer update deployments
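Both compatibility directions boil down to defensive field access on the consumer side. A hedged Python sketch of a projection that tolerates schema drift (the column names here are hypothetical):

```python
def project_user(after: dict) -> dict:
    """Build a read-model row from a CDC event's 'after' state,
    tolerating schema evolution in the source table."""
    return {
        "id": after["id"],        # stable primary key, always required
        "email": after["email"],
        # Backward compatibility: default a column that may not exist yet
        # (or was dropped) instead of crashing with a KeyError.
        "loyalty_tier": after.get("loyalty_tier", "standard"),
        # Forward compatibility: any fields we don't list here
        # (e.g. newly added columns) are simply ignored.
    }

old_event = {"id": 1, "email": "a@x.com"}                        # pre-migration
new_event = {"id": 2, "email": "b@x.com", "loyalty_tier": "gold",
             "brand_new_col": 42}                                # post-migration

print(project_user(old_event)["loyalty_tier"])  # standard
print(project_user(new_event)["loyalty_tier"])  # gold
```

A renamed column defeats both defaults and ignoring, which is why renames need the coordinated rollout described above.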
⚠️ WAL retention matters
If the Debezium connector is offline for too long, PostgreSQL may recycle WAL segments. When it restarts, the old offset can't be found → forced re-snapshot. Set wal_keep_size large enough (or use a replication slot) and monitor pg_replication_slots regularly. In production, set max_slot_wal_keep_size to avoid disks filling up from accumulated WAL.
9. Consuming CDC in a .NET Application
In the .NET ecosystem, there are several ways to consume CDC events. Below is a pattern using the Confluent.Kafka NuGet package combined with a hosted service:
// Program.cs - Register the CDC consumer service
builder.Services.AddSingleton<IConsumer<string, string>>(sp =>
{
var config = new ConsumerConfig
{
BootstrapServers = "kafka:9092",
GroupId = "order-projection-service",
AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoCommit = false, // Commit only after processing succeeds (at-least-once)
EnablePartitionEof = true
};
return new ConsumerBuilder<string, string>(config).Build();
});
builder.Services.AddHostedService<OrderProjectionService>();
// OrderProjectionService.cs
public class OrderProjectionService : BackgroundService
{
private readonly IConsumer<string, string> _consumer;
private readonly IServiceScopeFactory _scopeFactory;
private readonly ILogger<OrderProjectionService> _logger;
protected override async Task ExecuteAsync(
CancellationToken ct)
{
_consumer.Subscribe(new[] {
"myapp.public.orders",
"myapp.public.order_items"
});
while (!ct.IsCancellationRequested)
{
try
{
var result = _consumer.Consume(ct);
if (result.IsPartitionEOF) continue;
var evt = JsonSerializer
.Deserialize<CdcEvent>(result.Message.Value);
using var scope = _scopeFactory.CreateScope();
var handler = scope.ServiceProvider
.GetRequiredService<ICdcEventHandler>();
await handler.HandleAsync(evt);
// Commit offset after successful processing
_consumer.Commit(result);
}
catch (ConsumeException ex)
{
_logger.LogError(ex,
"CDC consume error on {Topic}",
ex.ConsumerRecord?.Topic);
// Implement retry / dead letter logic
}
}
}
}
// ICdcEventHandler implementation
public class OrderCdcHandler : ICdcEventHandler
{
private readonly ElasticClient _elastic;
public async Task HandleAsync(CdcEvent evt)
{
if (evt.Source.Table == "orders")
{
switch (evt.Op)
{
case "c":
case "u":
await _elastic.IndexDocumentAsync(
MapToSearchModel(evt.After));
break;
case "d":
await _elastic.DeleteAsync<OrderSearch>(
evt.Before.Id);
break;
}
}
}
}
10. CDC Tools in 2026
| Tool | Type | Supported databases | Sinks | Pricing |
|---|---|---|---|---|
| Debezium | Open source | PostgreSQL, MySQL, SQL Server, MongoDB, Oracle, Cassandra, DB2 | Kafka, Redis, HTTP, Kinesis, Pub/Sub, Event Hubs | Free |
| AWS DMS | Managed | Most RDBMSes + MongoDB + S3 | RDS, Redshift, S3, Kinesis, OpenSearch | Pay-per-use (instance hours) |
| Fivetran | SaaS | 200+ connectors | Snowflake, BigQuery, Redshift, Databricks | Credits-based ($$) |
| Airbyte | Open source + Cloud | 300+ connectors | Warehouses, lakehouses | Free (self-hosted) / Credits (cloud) |
| RisingWave | Open source | PostgreSQL, MySQL via CDC | Stream processing + materialized views | Free (self-hosted) |
💡 Which one should you pick?
Debezium — when you need full control, self-hosting, event-driven architecture, or already run Kafka. AWS DMS — for migration or replication between AWS services. Fivetran/Airbyte — when you focus on data warehouse/lakehouse with many SaaS connectors. RisingWave — when you need stream processing directly on the CDC stream.
11. Common Pitfalls and Fixes
1. A "stuck" Replication Slot
Symptom: Disk usage climbs continuously because PostgreSQL retains WAL for an inactive slot.
Cause: Connector crash or massive consumer lag.
Fix: Monitor pg_replication_slots, set max_slot_wal_keep_size, and alert when pg_wal_lsn_diff() exceeds thresholds.
2. Snapshots too slow on huge tables
Symptom: Initial snapshot takes hours on tables with hundreds of millions of rows.
Fix: Use snapshot.mode=no_data and backfill separately with a batch job. Or tune snapshot.fetch.size and snapshot.max.threads.
3. Wrong event ordering
Symptom: Consumers receive UPDATE before INSERT.
Cause: Multiple Kafka partitions cause events for the same entity to land in different partitions.
Fix: Ensure the message key is the primary key — Kafka guarantees ordering within a partition for a given key.
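The fix works because Kafka's partitioner hashes the message key, so every event for a given primary key lands in the same partition and is consumed in order. A simplified Python model of the idea (Kafka actually uses murmur2; crc32 stands in here so the sketch is deterministic, and the partition count is an example):

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping. Kafka uses murmur2 on the
    serialized key; crc32 is a stand-in with the same key property."""
    return zlib.crc32(key.encode()) % num_partitions

# Every change event for order ord-001 -- INSERT, then UPDATEs -- maps to
# the same partition, so one consumer sees them in commit order.
events = [("ord-001", "c"), ("ord-001", "u"), ("ord-001", "u")]
partitions = {partition_for(key) for key, _ in events}
print(len(partitions))  # 1 -- a single partition, ordering preserved
```

If the key were, say, a random event ID instead of the primary key, the same entity's events would scatter across partitions and interleave at the consumer.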
4. Schema changes crash the connector
Symptom: The connector crashes after an ALTER TABLE.
Fix: Configure the schema history topic (schema.history.internal.kafka.topic); set schema.history.internal.store.only.captured.tables.ddl=true to reduce noise. Test schema changes on staging first.
12. Conclusion
Change Data Capture with Debezium has become an indispensable building block in modern microservices architecture. Instead of forcing applications to "know about" every downstream system, CDC lets the database become the single event source — simple, reliable, and easy to scale.
Key takeaways:
- Log-based CDC is the only choice for production — zero overhead, captures every change, preserves order
- Outbox Pattern + Debezium solves the atomic event-publishing problem that dual write can't
- Debezium Server opens up Kafka-less CDC for small teams or cloud-native workloads
- Monitoring is mandatory — CDC lag = stale downstream data; alert on MilliSecondsBehindSource and WAL growth
- Combining CDC with CQRS, cache invalidation, and search sync produces a complete event-driven architecture without modifying application code
CDC is not a solution for every data integration problem. But when you need real-time data synchronization with strong consistency guarantees, Debezium + Kafka (or Debezium Server) is the most worthwhile stack to invest in as of 2026.
References
- Debezium Official Documentation — Features
- Conduktor — Implementing CDC with Debezium
- Factor House — Complete Guide to Kafka CDC (2026)
- Redpanda — CDC Approaches, Architectures, and Best Practices
- Hevo Data — Change Data Capture in 2026
- RisingWave — Data Integration for Streaming: Tools, Patterns (2026)
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.