Zero-Downtime Database Migration: Expand-Contract, EF Core and Batch Backfill for Production

Posted on: 4/22/2026 11:16:02 AM

Most developers are accustomed to running dotnet ef database update and then deploying new code. But in production with millions of requests per hour, a single ALTER TABLE ADD COLUMN ... NOT NULL can lock a table for minutes — and every minute of downtime can cost thousands of dollars. This article dives deep into strategies for changing database schemas without downtime, from core patterns to concrete implementations on .NET 10 and SQL Server.

  • 99.99%: uptime requirement = only ~52 minutes of downtime per year
  • ~$5,600: average cost per minute of downtime (SMB)
  • 3 phases: Expand → Migrate → Contract
  • 0 locks: goal — no blocking locks on production

1. Why Database Migration Is the Achilles' Heel of Zero-Downtime Deployment

You've set up a perfect Blue-Green deployment, your CI/CD pipeline runs smoothly, and containers auto-scale. But when deploying a new version that needs to add a column to a 50-million-row table — everything collapses. Here's why:

  • Schema lock: Many DDL statements require a Schema Modification Lock (Sch-M) on SQL Server, blocking all queries running against that table.
  • Version mismatch: New code expects the new column, old code doesn't know it exists. During rolling deployment, two versions run simultaneously → crash or data inconsistency.
  • Rollback complexity: Unlike code that can be rolled back with git revert, once a schema change is applied, the data has been transformed — rollback may lose data.
  • Tight coupling: When migration runs with application startup (auto-migrate), one pod runs migration while 9 other pods serve traffic with the old schema.

Common anti-pattern

Never let EF Core auto-run Database.Migrate() inside Program.cs on production. This means the first pod to start will run the migration while traffic is incoming — creating a race condition between schema changes and request handling.

2. The Expand-Contract Pattern — Foundation for All Zero-Downtime Migrations

Expand-Contract (also known as Parallel Change) is the core pattern: instead of making direct destructive changes, you split the work into 3 separate phases, each backward-compatible.

graph LR
    A["Initial State<br/>Schema V1 + Code V1"] --> B["EXPAND<br/>Schema V2 (additions only)<br/>Code V1 still works"]
    B --> C["MIGRATE<br/>Schema V2<br/>Code V2 (uses old+new)<br/>Backfill data"]
    C --> D["CONTRACT<br/>Schema V3 (remove old)<br/>Code V3 (new only)"]

Three phases of the Expand-Contract Pattern

2.1 Expand Phase — Add Without Breaking

Add new elements (columns, tables, indexes) without removing or modifying anything existing. Old code (V1) continues to work because it doesn't know about the new column — new columns must be NULLABLE or have a DEFAULT.

-- Expand: add new column, nullable, doesn't break old code
ALTER TABLE dbo.Users ADD Email NVARCHAR(256) NULL;

-- Add index ONLINE to avoid blocking reads/writes
CREATE NONCLUSTERED INDEX IX_Users_Email
ON dbo.Users (Email)
WITH (ONLINE = ON);

2.2 Migrate Phase — Transition Gradually

Deploy new code that writes to both old and new columns (dual-write). Simultaneously run a background job to backfill data from old to new for existing rows. When 100% of data has been migrated, new code starts reading from the new column.

// Dual-write in application code
public async Task UpdateUser(int userId, string newEmail)
{
    // Write to both columns during transition
    await _db.ExecuteAsync(@"
        UPDATE Users
        SET Email = @Email,           -- new column
            ContactInfo = @Email      -- old column (backward compat)
        WHERE Id = @UserId",
        new { Email = newEmail, UserId = userId });
}

2.3 Contract Phase — Clean Up

After confirming no code reads/writes the old column (monitor long enough, typically 1–2 sprints), drop the old column and unnecessary constraints.

-- Contract: only run after confirming 100% traffic uses new column
ALTER TABLE dbo.Users DROP COLUMN ContactInfo;

Golden rule

Each deployment should execute only one phase. Deploy 1: Expand (add column). Deploy 2: Migrate (new code + backfill). Deploy 3: Contract (remove old). Never combine Expand + Contract in a single release.

3. Common Migration Scenarios and Solutions

3.1 Adding a New Column

The simplest migration, yet often done wrong. Common mistake: adding a NOT NULL column without a DEFAULT.

| Approach | Downtime? | Explanation |
|---|---|---|
| ADD col INT NOT NULL | Yes — full table lock | SQL Server must scan all rows to verify the constraint |
| ADD col INT NULL | No — metadata-only | Only changes metadata, no data scan |
| ADD col INT NOT NULL DEFAULT 0 | No (SQL Server 2012+) | Default stored in metadata, no immediate backfill |
| ADD col INT NULL → backfill → ALTER NOT NULL | Depends — final step needs Sch-M | Safest for large tables, batch backfill |
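The third approach is worth a concrete sketch (table and column names here are illustrative, and the metadata-only fast path for NOT NULL + DEFAULT depends on your SQL Server version and edition, so verify it on yours first). When the fast path applies, the default value is recorded in metadata and no data pages are touched at ALTER time:

-- Illustrative: add a NOT NULL column with a default.
-- On versions with the metadata-only path, this returns almost
-- instantly even on a very large table.
ALTER TABLE dbo.Users
ADD IsActive BIT NOT NULL
CONSTRAINT DF_Users_IsActive DEFAULT (1);

-- Existing rows report IsActive = 1 immediately; the value is
-- materialized lazily as rows are subsequently written.
SELECT TOP (5) Id, IsActive FROM dbo.Users;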

3.2 Renaming a Column

Never use sp_rename directly on production. Instead, use Expand-Contract:

  1. Expand: Add new column with desired name, trigger to sync data from old to new.
  2. Migrate: New code reads/writes new column. Backfill all existing data.
  3. Contract: Drop trigger and old column.
-- Expand: add new column + sync trigger
ALTER TABLE dbo.Orders ADD CustomerEmail NVARCHAR(256) NULL;

CREATE TRIGGER trg_SyncEmail ON dbo.Orders
AFTER INSERT, UPDATE AS
BEGIN
    SET NOCOUNT ON;
    UPDATE o SET o.CustomerEmail = i.CustEmail
    FROM dbo.Orders o INNER JOIN inserted i ON o.Id = i.Id
    WHERE i.CustEmail IS NOT NULL AND o.CustomerEmail IS NULL;
END;

-- Backfill in batches of 10,000 rows
WHILE 1=1
BEGIN
    UPDATE TOP (10000) dbo.Orders
    SET CustomerEmail = CustEmail
    WHERE CustomerEmail IS NULL AND CustEmail IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;
    WAITFOR DELAY '00:00:01'; -- throttle to reduce load
END;

3.3 Changing Data Type

Example: changing VARCHAR(50) to NVARCHAR(256). This is a dangerous migration because ALTER COLUMN on a large table will rebuild everything.

graph TD
    A["Create new column<br/>NVARCHAR(256) NULL"] --> B["Deploy dual-write code<br/>Write old + new columns"]
    B --> C["Batch backfill<br/>Copy old → new"]
    C --> D{"100% data migrated?"}
    D -- "No" --> C
    D -- "Yes" --> E["Deploy code reading new col"]
    E --> F["Monitor 1-2 weeks"]
    F --> G["Drop old column"]

Flow chart for data type change migration
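In SQL, the expand and backfill boxes of the flow above might look like this sketch (the DisplayName/DisplayNameV2 names are illustrative, standing in for the VARCHAR(50) → NVARCHAR(256) change):

-- Expand: add a parallel column with the target type
ALTER TABLE dbo.Users ADD DisplayNameV2 NVARCHAR(256) NULL;

-- Migrate: batch backfill with an explicit conversion
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.Users
    SET DisplayNameV2 = CONVERT(NVARCHAR(256), DisplayName)
    WHERE DisplayNameV2 IS NULL AND DisplayName IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;
    WAITFOR DELAY '00:00:01'; -- throttle
END;

-- Contract (only after dual-write + monitoring):
-- ALTER TABLE dbo.Users DROP COLUMN DisplayName;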

3.4 Dropping a Column

Sounds simple, but if running code still references that column → crash. The process:

  1. Deploy 1: New code no longer reads/writes that column (but column still exists).
  2. Monitor: Wait 1–2 weeks, verify no queries touch that column via sys.dm_exec_query_stats.
  3. Deploy 2: ALTER TABLE DROP COLUMN.
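The verification in step 2 can be approximated by searching the plan cache for statements that still reference the column. This is an imperfect heuristic — the cache is evicted over time — so combine it with application-side metrics (the ContactInfo filter below is illustrative):

-- Find cached statements that still mention the column being dropped
SELECT TOP (20)
    qs.last_execution_time,
    qs.execution_count,
    t.text AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
WHERE t.text LIKE '%ContactInfo%'
ORDER BY qs.last_execution_time DESC;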

3.5 Adding NOT NULL Constraint

Adding a NOT NULL constraint to an existing column requires SQL Server to scan the entire table to verify no NULLs exist. For large tables, this causes prolonged locking.

-- Step 1: Add CHECK constraint WITH NOCHECK (no data scan)
ALTER TABLE dbo.Users WITH NOCHECK
ADD CONSTRAINT CK_Users_Email_NotNull CHECK (Email IS NOT NULL);

-- Step 2: Backfill NULL rows in batches until none remain
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.Users SET Email = '' WHERE Email IS NULL;
    IF @@ROWCOUNT = 0 BREAK;
    WAITFOR DELAY '00:00:01'; -- throttle
END;

-- Step 3: When data is 100% clean, enable constraint verification
ALTER TABLE dbo.Users WITH CHECK
CHECK CONSTRAINT CK_Users_Email_NotNull;

-- Step 4 (optional): Convert to actual NOT NULL if needed
ALTER TABLE dbo.Users ALTER COLUMN Email NVARCHAR(256) NOT NULL;

4. EF Core Migration Workflow for Production

Entity Framework Core is the most popular ORM on .NET, but its default migration workflow isn't suitable for zero-downtime. The approach needs to change.

4.1 Decouple Migration from Application Startup

// Program.cs — DO NOT do this in production
// app.Services.GetRequiredService<AppDbContext>().Database.Migrate();

// Instead: create a separate migration bundle
// Terminal: dotnet ef migrations bundle --self-contained -o migrate.exe
// CI/CD: run migrate.exe BEFORE deploying new code

Best practice is to create a dedicated Kubernetes Job or Azure Container Instance solely for running migrations:

# k8s-migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration-v42
spec:
  template:
    spec:
      containers:
      - name: migrator
        image: myapp:v42-migrator
        command: ["./migrate"]
        env:
        - name: ConnectionStrings__Default
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: connection-string
      restartPolicy: Never
  backoffLimit: 1

4.2 Writing Safe Migrations with EF Core

EF Core generates migrations based on the diff between model snapshots. But the generated code is often not zero-downtime safe. It needs review and correction:

// Auto-generated migration — NOT safe
public partial class AddUserEmail : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // EF auto-generates NOT NULL — will lock table
        migrationBuilder.AddColumn<string>(
            name: "Email",
            table: "Users",
            type: "nvarchar(256)",
            nullable: false,      // <-- problem here
            defaultValue: "");
    }
}

// Fixed safe version
public partial class AddUserEmail : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Step 1: Add as nullable first
        migrationBuilder.AddColumn<string>(
            name: "Email",
            table: "Users",
            type: "nvarchar(256)",
            nullable: true);       // nullable = safe

        // Step 2: Backfill will run in a separate migration later
    }
}

4.3 Tooling: EF Core Migration Analyzers

Use tooling such as EF Core Power Tools (by ErikEJ) or custom Roslyn analyzers to automatically flag unsafe migrations:

// Roslyn analyzer checks whether a migration is safe
// Detects: AddColumn with nullable: false without defaultValue
// Detects: DropColumn, DropTable in same migration as AddColumn
// Detects: AlterColumn changing data type

Migration Review Strategy

Every PR with a migration file must answer 3 questions: (1) Will old code crash when the new schema is applied? (2) Will new code crash if the old schema is still running? (3) Will rolling back the schema lose data? If any answer is "yes" → split into multiple deployments.

5. SQL Server — Online Operations and Lock Management

SQL Server has many features supporting online schema changes, but you need to understand which version supports what.

5.1 Online Index Operations

| Operation | Online Support | Required Edition |
|---|---|---|
| CREATE INDEX | WITH (ONLINE = ON) | Enterprise / Developer |
| ALTER INDEX REBUILD | WITH (ONLINE = ON) | Enterprise / Developer |
| ALTER COLUMN (data type) | WITH (ONLINE = ON) | SQL Server 2016+ Enterprise |
| ADD COLUMN (nullable) | Automatically online | All editions |
| ADD COLUMN (NOT NULL + DEFAULT) | Metadata-only (2012+) | All editions from 2012 |
| DROP COLUMN | Automatically online | All editions |
| ADD FOREIGN KEY | WITH NOCHECK to avoid scan | All editions |
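The foreign-key row deserves a concrete sketch: WITH NOCHECK skips validating existing rows, so the statement needs only a brief schema lock. The trade-off is that the constraint stays marked untrusted until you validate it, which can hurt query plans (constraint and column names below are illustrative):

-- Add the FK without scanning existing rows
ALTER TABLE dbo.Orders WITH NOCHECK
ADD CONSTRAINT FK_Orders_Users FOREIGN KEY (UserId)
REFERENCES dbo.Users (Id);

-- Later, during a quiet window, validate so the optimizer can trust it
ALTER TABLE dbo.Orders WITH CHECK
CHECK CONSTRAINT FK_Orders_Users;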

5.2 Lock Escalation and Control

SQL Server automatically escalates from row lock → page lock → table lock when the lock count exceeds thresholds. With large tables, a batch update can trigger a table lock.

-- Control lock escalation during backfill
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = DISABLE);

-- Backfill in small batches, each in its own transaction
DECLARE @BatchSize INT = 5000;
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (@BatchSize) dbo.Orders
    SET CustomerEmail = CustEmail
    WHERE CustomerEmail IS NULL AND CustEmail IS NOT NULL;

    SET @RowsAffected = @@ROWCOUNT;

    COMMIT TRANSACTION;

    -- Pause between batches to ease I/O pressure
    IF @RowsAffected > 0
        WAITFOR DELAY '00:00:02';
END;

-- Re-enable lock escalation
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = TABLE);

5.3 Monitoring Locks During Migration

-- Check active locks
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    t.text AS query_text,
    r.status
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0;

-- Check schema locks
SELECT
    resource_type,
    request_mode,
    request_status,
    request_session_id
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
AND request_mode IN ('Sch-M', 'Sch-S');

6. Online Schema Change Tools — When Native DDL Isn't Enough

For MySQL, the two most popular tools are gh-ost (GitHub) and pt-online-schema-change (Percona). They apply the Expand-Contract principle at the tooling level.

graph TB
    subgraph "gh-ost (GitHub)"
        G1["1. Create ghost table<br/>copy original schema"] --> G2["2. Apply ALTER<br/>on ghost table"]
        G2 --> G3["3. Stream binlog<br/>sync data realtime"]
        G3 --> G4["4. Cut-over<br/>rename tables (atomic)"]
    end
    subgraph "pt-online-schema-change (Percona)"
        P1["1. Create new table<br/>with new schema"] --> P2["2. Attach triggers<br/>sync INSERT/UPDATE/DELETE"]
        P2 --> P3["3. Copy data in batches<br/>from original table"]
        P3 --> P4["4. Swap tables<br/>atomic rename"]
    end

Comparing gh-ost vs pt-online-schema-change mechanisms

| Criteria | gh-ost | pt-online-schema-change |
|---|---|---|
| Sync mechanism | Binary log streaming | Triggers on original table |
| Production impact | Low — no trigger overhead | Medium — triggers add write latency |
| Foreign keys | Not supported | Supported (with limitations) |
| Throttling | Automatic based on replication lag | Manual chunk size configuration |
| Rollback | Drop ghost table | Drop new table + remove triggers |
| Best for | High-traffic, modern infra | Legacy, with FK constraints |

What about SQL Server?

SQL Server doesn't have direct equivalents to gh-ost/pt-osc. Instead, use a combination of: WITH (ONLINE = ON) for index operations, metadata-only ADD COLUMN, and batch updates for backfill. For major changes, consider the Blue-Green Database pattern: maintain two databases, sync via CDC or replication.

7. Blue-Green Database Pattern

When schema changes are too complex for standard Expand-Contract (e.g., complete table restructuring, merging/splitting tables), Blue-Green Database is the last resort.

graph LR
    LB["Load Balancer"] --> Blue["BLUE (Production)<br/>Schema V1<br/>Serving traffic"]
    LB -.-> Green["GREEN (Staging)<br/>Schema V2<br/>Preparing"]
    Blue -- "CDC / Replication" --> Green
    Green --> Switch{"Ready?"}
    Switch -- "Yes" --> LB2["Load Balancer<br/>Switch traffic → Green"]
    Switch -- "No" --> Green

Blue-Green Database: schema transition via two parallel databases

Process:

  1. Create Green database from Blue backup.
  2. Apply schema migration to Green (can lock freely since Green isn't serving traffic).
  3. Set up CDC (Change Data Capture) or replication from Blue → Green to sync data in realtime.
  4. Test Green database with smoke tests and load tests.
  5. Cut-over: switch connection string from Blue to Green. Downtime is measured in milliseconds.
  6. Blue becomes the rollback target for a few hours after cut-over.
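Step 3's CDC setup on the Blue database can be sketched as follows (schema and table names are illustrative; a downstream process then reads the captured changes and applies them to Green, transforming them to the new schema where needed):

-- Enable CDC at the database level (requires SQL Server Agent)
EXEC sys.sp_cdc_enable_db;

-- Track changes on the source table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;   -- no gating role

-- Consumers read changes via the generated function, e.g.
-- cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all')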

Blue-Green Database limitations

Double the cost (maintaining 2 databases). CDC/Replication is complex, especially when schema has changed. Not suitable for small migrations — only use when Expand-Contract isn't feasible.

8. Safe Data Backfill — The Art of Batch Processing

Backfill is the most dangerous step in migration because it touches real data. A single UPDATE on 50 million rows can:

  • Blow up the transaction log, since the entire statement runs as one transaction — the log file can grow by tens of GB.
  • Trigger lock escalation → block all queries on the table.
  • Cause replication lag to spike if replicas exist.

8.1 Pattern: Batched Backfill with Throttling

public class BackfillService : BackgroundService
{
    private readonly IDbConnectionFactory _db;
    private readonly ILogger<BackfillService> _logger;

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        const int batchSize = 5_000;
        const int delayMs = 2_000;
        long totalUpdated = 0;

        while (!ct.IsCancellationRequested)
        {
            var affected = await _db.ExecuteAsync(@"
                UPDATE TOP (@Batch) dbo.Users
                SET EmailNormalized = UPPER(Email)
                WHERE EmailNormalized IS NULL AND Email IS NOT NULL",
                new { Batch = batchSize });

            totalUpdated += affected;
            _logger.LogInformation(
                "Backfill progress: {Total} rows updated", totalUpdated);

            if (affected == 0)
            {
                _logger.LogInformation("Backfill complete!");
                break;
            }

            // Throttle: pause between batches to let the DB breathe
            await Task.Delay(delayMs, ct);
        }
    }
}

8.2 Monitor Backfill Progress

-- Check backfill progress
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN EmailNormalized IS NOT NULL THEN 1 ELSE 0 END) AS Migrated,
    SUM(CASE WHEN EmailNormalized IS NULL AND Email IS NOT NULL THEN 1 ELSE 0 END) AS Pending,
    CAST(
        SUM(CASE WHEN EmailNormalized IS NOT NULL THEN 1.0 ELSE 0 END)
        / COUNT(*) * 100 AS DECIMAL(5,2)
    ) AS PercentComplete
FROM dbo.Users;

9. Rollback Strategy — Preparing for When Things Go Wrong

Unlike code deployment where you can rollback by deploying the previous version, database migration rollback is far more complex because data has already changed.

9.1 Forward-Only Migration

Philosophy: don't rollback migrations, only roll forward. If a migration causes issues, create a new migration to fix it rather than reverting. Reasons:

  • If you ADD COLUMN and data has been written to it, DROP COLUMN = data loss.
  • If you backfill data then rollback the schema, the data transformation has already occurred.
  • Forward-only is simpler and less risky than reverse migrations.

9.2 When You Need a Rollback Plan

| Migration type | Rollback possible? | Strategy |
|---|---|---|
| Add nullable column | Yes — drop column | Safe since existing data isn't affected |
| Rename column | Yes — rename back | But deployed code needs rollback too |
| Change data type (widen) | Complex | Data already written in new format, needs reverse transform |
| Drop column | No | Data is gone. Need to restore from backup |
| Merge tables | Very complex | Use Blue-Green Database for instant rollback |

Checkpoint before every migration

Always create a database snapshot (SQL Server) or point-in-time backup before running migrations. Not for regular rollbacks, but as insurance when everything truly breaks.
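A database snapshot is cheap to create (a sparse file updated copy-on-write). A sketch, noting that the logical name and file path below are illustrative and must match your database's actual logical data file name:

-- Create a pre-migration snapshot
CREATE DATABASE AppDb_PreMigration
ON (NAME = AppDb_Data, FILENAME = 'D:\Snapshots\AppDb_PreMigration.ss')
AS SNAPSHOT OF AppDb;

-- Worst case: revert the whole database to the snapshot
-- RESTORE DATABASE AppDb FROM DATABASE_SNAPSHOT = 'AppDb_PreMigration';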

10. CI/CD Pipeline for Database Migration

Migration must be a separate step in the pipeline, running before application deployment.

graph LR
    A["Code Push"] --> B["Build + Test"]
    B --> C["Generate<br/>Migration Bundle"]
    C --> D["Review<br/>Migration SQL"]
    D --> E["Apply Migration<br/>(K8s Job)"]
    E --> F["Verify Schema"]
    F --> G["Deploy App<br/>(Rolling Update)"]
    G --> H["Smoke Test"]
    H --> I["Monitor<br/>30 minutes"]

CI/CD Pipeline with separate Database Migration step

# GitHub Actions workflow
name: Deploy with Migration

on:
  push:
    branches: [main]

jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET 10
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Build Migration Bundle
        run: |
          dotnet tool restore
          dotnet ef migrations bundle \
            --project src/MyApp.Data \
            --startup-project src/MyApp.Api \
            --self-contained \
            -o ./migrate

      - name: Generate Migration SQL (for review)
        run: |
          dotnet ef migrations script \
            --project src/MyApp.Data \
            --startup-project src/MyApp.Api \
            --idempotent \
            -o migration.sql

      - name: Apply Migration
        run: ./migrate --connection "${{ secrets.DB_CONNECTION }}"

      - name: Verify Applied Migrations
        run: |
          dotnet ef migrations list \
            --project src/MyApp.Data \
            --startup-project src/MyApp.Api \
            --connection "${{ secrets.DB_CONNECTION }}"

  deploy:
    needs: migrate
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: kubectl rollout restart deployment/myapp

11. Safe Migration Checklist

Before every production migration, go through this checklist:

| # | Check | Action if failed |
|---|---|---|
| 1 | Can old code (N-1) run with the new schema? | Split migration, apply Expand-Contract |
| 2 | Can new code (N) run with the old schema? | Deploy code first, migration after |
| 3 | Does migration require a table lock? | Use ONLINE option or batch approach |
| 4 | How many rows in the table? | >1M rows: mandatory batch backfill |
| 5 | Is there a backup/snapshot before migration? | Create snapshot before running |
| 6 | Is migration idempotent? | Add IF EXISTS/IF NOT EXISTS checks |
| 7 | Is rollback plan documented? | Write rollback script before applying |
| 8 | Tested on staging with equivalent data volume? | Clone production data (anonymized) to staging |
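The idempotency check in practice: guard every DDL statement so the script can be rerun safely after a partial failure. For example:

-- Safe to run any number of times
IF NOT EXISTS (
    SELECT 1 FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.Users') AND name = N'Email'
)
BEGIN
    ALTER TABLE dbo.Users ADD Email NVARCHAR(256) NULL;
END;

EF Core's dotnet ef migrations script --idempotent produces the same effect by wrapping each migration in a check against the __EFMigrationsHistory table.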

12. Conclusion

Zero-downtime database migration isn't magic — it's engineering discipline. Core principles:

  • Expand-Contract is the foundational pattern: add first, transition gradually, remove later.
  • Decouple migration from deployment: Migration runs separately, code deploys separately.
  • Backward compatibility: Each step must be backward-compatible with the previous version.
  • Batch everything: Never update millions of rows in a single transaction.
  • Monitor before, during, and after: Lock wait time, replication lag, error rate.
  • Forward-only mindset: Prefer rolling forward over rolling back.

Where to start?

If your team has never done zero-downtime migration: start by decoupling migration from Program.cs. This single step alone eliminates 80% of risk. Then gradually adopt Expand-Contract for complex changes.
