# Distributed Locking: Solving Race Conditions in Distributed Systems with Redis and .NET 10

Posted on: 4/25/2026 2:12:07 AM

### Table of contents

- [1. Race conditions in distributed systems](#race-condition)
- [2. What is a distributed lock, and why do you need one?](#distributed-lock-overview)
- [3. Redis distributed locks: from SETNX to Redlock](#redis-distributed-lock)
- [4. The Redlock algorithm in detail](#redlock-algorithm)
- [5. Implementing a distributed lock with .NET 10](#dotnet10-implementation)
- [6. PostgreSQL advisory locks: an alternative](#postgresql-advisory-lock)
- [7. Fencing tokens: solving the split-brain problem](#fencing-token)
- [8. Performance comparison](#performance-comparison)
- [9. Anti-patterns and common mistakes](#anti-patterns)
- [10. Production-ready patterns](#production-patterns)

67% of distributed systems hit race conditions when they run without locks

<1ms average latency of a Redis `SET NX`

5 nodes recommended for Redlock (majority quorum of 3)

99.99% reliability of a distributed lock when implemented correctly

## 1. Race conditions in distributed systems

Imagine your e-commerce system runs 3 instances concurrently. A product has only 1 unit left in stock, but 2 order requests arrive at the same moment on 2 different instances. Both read `stock = 1`, both write `stock = 0`, and you end up selling 2 units when you only had 1.

This is a **race condition**, and it happens more often than you might think in a distributed environment.

```mermaid
sequenceDiagram
    participant I1 as Instance 1
    participant DB as Database
    participant I2 as Instance 2

    I1->>DB: SELECT stock WHERE id=1
    I2->>DB: SELECT stock WHERE id=1
    DB-->>I1: stock = 1
    DB-->>I2: stock = 1
    I1->>DB: UPDATE stock = 0
    I2->>DB: UPDATE stock = 0
    Note over DB: ⚠️ 2 units sold, only 1 in stock!
```

Race condition: 2 instances read and write concurrently without a lock

In a monolith running as a single process, you can use the `lock` statement in C# or `synchronized` in Java. But once the system is spread over many processes on many servers, you need a lock mechanism that lives **outside** those processes and that every instance can see: a **distributed lock**.
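
The interleaving in the diagram can be replayed deterministically in memory. This is only an illustrative sketch: `InventoryRow` and `LostUpdateDemo` are hypothetical names, and a plain object stands in for the database row.

```csharp
// Hypothetical in-memory stand-in for one inventory row.
public sealed class InventoryRow
{
    public int Stock { get; set; }
}

public static class LostUpdateDemo
{
    // Replays the read-read-write-write interleaving from the diagram
    // and returns how many orders were accepted.
    public static int Run(InventoryRow row)
    {
        int accepted = 0;

        // Both instances read before either one writes.
        int seenByInstance1 = row.Stock;
        int seenByInstance2 = row.Stock;

        // Each instance decides based on its own stale read.
        if (seenByInstance1 > 0) { row.Stock = seenByInstance1 - 1; accepted++; }
        if (seenByInstance2 > 0) { row.Stock = seenByInstance2 - 1; accepted++; }

        return accepted;
    }
}
```

Calling `LostUpdateDemo.Run(new InventoryRow { Stock = 1 })` accepts 2 orders while the stock only drops from 1 to 0, which is the overselling described above.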

## 2. What is a distributed lock, and why do you need one?

A distributed lock guarantees that, at any given moment, **only one process** may execute a given piece of code or access a given resource, no matter which server that process runs on.

#### Three core properties of a distributed lock

**Safety (mutual exclusion):** At any moment, at most one client holds the lock.  
**Liveness (deadlock-free):** Even if the client holding the lock crashes, the lock must eventually be released.  
**Fault tolerance:** The lock keeps working when part of the system fails.
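
To make safety and liveness concrete, here is a minimal single-process model of a lock table with expiry. It is a sketch under assumptions, not a real lock service: `InMemoryLockTable` is a hypothetical name, and the current time is passed in explicitly so the behaviour is deterministic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-process model of a lock table with expiry.
public sealed class InMemoryLockTable
{
    private readonly Dictionary<string, (string Owner, DateTime ExpiresAt)> _locks = new();

    // Mirrors "set if absent, with TTL": succeeds only when no live lock exists.
    public bool TryAcquire(string key, string owner, TimeSpan ttl, DateTime now)
    {
        if (_locks.TryGetValue(key, out var held) && held.ExpiresAt > now)
            return false; // safety: a live lock has exactly one owner

        // Liveness: an absent or expired lock can always be granted again,
        // even if the previous owner crashed without releasing it.
        _locks[key] = (owner, now + ttl);
        return true;
    }
}
```

With a 30-second TTL, a second client is refused 10 seconds after the first acquire, but succeeds at 31 seconds even though the first client never released the lock.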

### Common use cases

| Use case | Description | Consequence without a lock |
| --- | --- | --- |
| **Inventory deduction** | Deduct stock when an order is placed | Overselling |
| **Scheduled job** | A cron job runs only once even with many instances | Duplicate emails, incorrect calculations |
| **Rate limiting** | Cap the number of requests in a sliding window | Quota exceeded, API abuse |
| **Leader election** | Pick 1 node as the leader that processes tasks | Split-brain, inconsistent data |
| **Payment processing** | Ensure idempotency for transactions | Double charges, lost money |
| **Cache stampede prevention** | Only 1 request rebuilds the cache when it expires | Database overload (thundering herd) |

## 3. Redis distributed locks: from SETNX to Redlock

### 3.1 The simple approach: SET NX EX

Redis provides the atomic command `SET key value NX EX seconds`: it sets the key only if it does not already exist, and attaches an expiry so the lock is released automatically:

```bash
# Acquire the lock
SET order:lock:12345 "instance-1-uuid" NX EX 30

# Release the lock (only if the caller is still the owner)
# A Lua script keeps the ownership check and the delete atomic
EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 order:lock:12345 "instance-1-uuid"
```

```mermaid
graph TD
    A[Client wants to acquire the lock] --> B{SET key NX EX 30}
    B -->|OK - lock acquired| C[Run the critical section]
    B -->|nil - lock already held| D[Retry after a delay]
    C --> E[Release the lock with the Lua script]
    E --> F{Value matches owner?}
    F -->|Yes| G[DEL key - lock released]
    F -->|No| H[Skip - lock belongs to another client]
    D --> B

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#e94560,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#ff9800,stroke:#fff,color:#fff
```

Acquire and release flow with Redis SET NX

#### ⚠️ Why release with a Lua script?

If you issue 2 separate commands (`GET` then `DEL`), the following can happen between them: your lock expires, another client acquires a new lock, and you delete that client's lock by mistake. A Lua script runs atomically on Redis, so the ownership check and the delete can never be separated.
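
The difference between the two release strategies can be shown with an in-memory stand-in for Redis. This is a sketch only: `FakeKeyValueStore` and `ReleaseRaceDemo` are hypothetical, and the expiry of client A's lock is simulated by client B overwriting the key.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical in-memory key/value store standing in for Redis.
public sealed class FakeKeyValueStore
{
    private readonly Dictionary<string, string> _data = new();

    public void Set(string key, string value) => _data[key] = value;
    public string? Get(string key) => _data.TryGetValue(key, out var v) ? v : null;
    public bool Delete(string key) => _data.Remove(key);

    // Single-step equivalent of the Lua script: delete only if the value matches.
    public bool DeleteIfOwner(string key, string owner) =>
        Get(key) == owner && _data.Remove(key);
}

public static class ReleaseRaceDemo
{
    // Returns the lock value left after client A releases with GET then DEL.
    public static string? UnsafeRelease(FakeKeyValueStore store)
    {
        store.Set("lock", "A");
        bool ownerCheckPassed = store.Get("lock") == "A"; // A checks ownership
        store.Set("lock", "B");                           // A's TTL lapses, B acquires
        if (ownerCheckPassed) store.Delete("lock");       // A deletes B's lock
        return store.Get("lock");
    }

    // Same timeline, but the check and the delete happen as one step.
    public static string? SafeRelease(FakeKeyValueStore store)
    {
        store.Set("lock", "A");
        store.Set("lock", "B");           // A's TTL lapses, B acquires
        store.DeleteIfOwner("lock", "A"); // value is "B": nothing is deleted
        return store.Get("lock");
    }
}
```

`UnsafeRelease` leaves no lock behind (client B's lock was removed), while `SafeRelease` leaves `"B"` in place.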

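### 3.2 Limitations of a single-node Redis lock

The approach above has a serious problem when the Redis master fails:

1. Client A acquires the lock on the Redis master
2. The master crashes **before replicating** the lock to its replica
3. The replica is promoted to be the new master
4. Client B acquires the same lock on the new master: **mutual exclusion is violated!**

This failover gap is why Salvatore Sanfilippo (the creator of Redis) proposed the Redlock algorithm. Martin Kleppmann (author of "Designing Data-Intensive Applications") later criticized the safety assumptions of Redlock itself.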
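
The four failover steps can be replayed with two in-memory sets. This is a simplified model, not how Redis replication is implemented: `ReplicatedNode` and `FailoverDemo` are hypothetical names, and replication is modelled as a copy that simply never happens.

```csharp
using System.Collections.Generic;

// Hypothetical model of a Redis node that only tracks which keys exist.
public sealed class ReplicatedNode
{
    public HashSet<string> Keys { get; } = new();

    // SET NX semantics: true only if the key was not present.
    public bool SetIfAbsent(string key) => Keys.Add(key);
}

public static class FailoverDemo
{
    // Returns true when both clients end up believing they hold the lock.
    public static bool BothClientsHoldLock()
    {
        var master = new ReplicatedNode();
        var replica = new ReplicatedNode();

        bool clientA = master.SetIfAbsent("lock:order");    // step 1
        // Step 2: the master crashes here, before the key reaches the replica.
        var newMaster = replica;                            // step 3: promotion
        bool clientB = newMaster.SetIfAbsent("lock:order"); // step 4

        return clientA && clientB;
    }
}
```

`FailoverDemo.BothClientsHoldLock()` returns `true`: the promoted replica never saw client A's key, so it grants the same lock to client B.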

## 4. The Redlock algorithm in detail

Redlock uses **N independent Redis nodes** (N=5 is recommended) to preserve safety even when some nodes fail.

```mermaid
sequenceDiagram
    participant C as Client
    participant R1 as Redis Node 1
    participant R2 as Redis Node 2
    participant R3 as Redis Node 3
    participant R4 as Redis Node 4
    participant R5 as Redis Node 5

    Note over C: Step 1: record timestamp T1
    C->>R1: SET lock NX EX 30
    R1-->>C: OK ✓
    C->>R2: SET lock NX EX 30
    R2-->>C: OK ✓
    C->>R3: SET lock NX EX 30
    R3-->>C: FAIL ✗
    C->>R4: SET lock NX EX 30
    R4-->>C: OK ✓
    C->>R5: SET lock NX EX 30
    R5-->>C: OK ✓

    Note over C: Step 2: record timestamp T2
    Note over C: Acquired 4/5 nodes ≥ quorum (3)
    Note over C: Lock validity = 30s - (T2-T1)
    Note over C: If validity > 0 → lock acquired!
```

The Redlock algorithm: acquire the lock on a majority of nodes (quorum)

### Redlock step by step

1. **Record the start time** T1
2. **Acquire the lock sequentially** on all N Redis nodes, with a small per-node timeout (a few ms) so a dead node cannot block the client
3. **Compute the elapsed time**: record T2 once every node has been tried, then elapsed = T2 - T1
4. **The lock is acquired** if it was set on at least ⌊N/2⌋ + 1 nodes AND elapsed < lock TTL
5. **The effective validity time** = TTL - elapsed (the lock expires sooner than the TTL because acquiring it took time)
6. **On failure**, release the lock on **all** nodes (including those where the acquire appeared to fail) to clean up
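
The decision in steps 3 to 5 is plain arithmetic, sketched below. `RedlockMath` is a hypothetical helper, not part of any Redis client. The `clockDriftFactor` parameter is an added assumption: implementations commonly subtract a small clock-drift margin from the validity time, and it defaults to 0 here so the numbers match the steps above.

```csharp
using System;

public static class RedlockMath
{
    // Majority needed out of nodeCount independent nodes (integer division).
    public static int Quorum(int nodeCount) => nodeCount / 2 + 1;

    // Remaining validity after acquisition; zero or negative means the attempt failed.
    public static TimeSpan Validity(TimeSpan ttl, TimeSpan elapsed, double clockDriftFactor = 0.0)
    {
        var drift = TimeSpan.FromMilliseconds(ttl.TotalMilliseconds * clockDriftFactor);
        return ttl - elapsed - drift;
    }

    // Step 4: enough nodes locked AND some validity time left.
    public static bool IsAcquired(int nodeCount, int nodesLocked, TimeSpan ttl, TimeSpan elapsed) =>
        nodesLocked >= Quorum(nodeCount) && Validity(ttl, elapsed) > TimeSpan.Zero;
}
```

For the diagram's case (5 nodes, 4 locked, 30-second TTL), an acquisition that took 100 ms succeeds with 29.9 seconds of validity left; locking only 2 nodes, or taking longer than the TTL, fails.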

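#### The Kleppmann vs Sanfilippo debate

Martin Kleppmann argues that Redlock is unsafe because a GC pause or clock skew can leave a client believing it still holds a lock that has already expired. Salvatore Sanfilippo replied that with reasonably synchronized clocks (NTP), Redlock is safe enough for most use cases. A middle ground is to combine Redlock with a fencing token (section 7).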

## 5. Implementing a distributed lock with .NET 10

### 5.1 Using StackExchange.Redis

```csharp
using StackExchange.Redis;

public class RedisDistributedLock : IAsyncDisposable
{
    private readonly IDatabase _db;
    private readonly string _lockKey;
    private readonly string _lockValue;
    private readonly TimeSpan _expiry;
    private bool _acquired;

    public RedisDistributedLock(IDatabase db, string resource, TimeSpan expiry)
    {
        _db = db;
        _lockKey = $"lock:{resource}";
        _lockValue = Guid.NewGuid().ToString("N"); // unique owner id for this lock instance
        _expiry = expiry;
    }

    // Polls every 50 ms until the lock is acquired or the timeout elapses.
    public async Task<bool> AcquireAsync(TimeSpan timeout, CancellationToken ct = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            // Equivalent to SET key value NX with an expiry.
            _acquired = await _db.StringSetAsync(
                _lockKey, _lockValue, _expiry, When.NotExists);

            if (_acquired) return true;
            await Task.Delay(50, ct);
        }
        return false;
    }

    // Releases the lock only if this instance still owns it.
    public async ValueTask DisposeAsync()
    {
        if (!_acquired) return;

        const string script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end
            """;

        await _db.ScriptEvaluateAsync(script,
            new RedisKey[] { _lockKey },
            new RedisValue[] { _lockValue });
        _acquired = false;
    }
}
```

### 5.2 Using the lock in an ASP.NET Core Minimal API (.NET 10)

```csharp
app.MapPost("/api/orders", async (
    OrderRequest req,
    IConnectionMultiplexer redis,
    OrderService orderService) =>
{
    var db = redis.GetDatabase();
    // One lock per product; released when the handler exits.
    await using var lockObj = new RedisDistributedLock(
        db, $"order:product:{req.ProductId}", TimeSpan.FromSeconds(30));

    // Wait up to 5 seconds for the lock, then give up with 409 Conflict.
    if (!await lockObj.AcquireAsync(TimeSpan.FromSeconds(5)))
        return Results.Conflict("This product is being processed by another request");

    var result = await orderService.PlaceOrderAsync(req);
    return Results.Ok(result);
});
```

### 5.3 Redlock with RedLock.net

```csharp
using System.Net;
using RedLockNet;
using RedLockNet.SERedis;
using RedLockNet.SERedis.Configuration;

// Register in the DI container
builder.Services.AddSingleton<IDistributedLockFactory>(sp =>
{
    var endpoints = new List<RedLockEndPoint>
    {
        new DnsEndPoint("redis-1.internal", 6379),
        new DnsEndPoint("redis-2.internal", 6379),
        new DnsEndPoint("redis-3.internal", 6379),
        new DnsEndPoint("redis-4.internal", 6379),
        new DnsEndPoint("redis-5.internal", 6379),
    };
    return RedLockFactory.Create(endpoints);
});

// Usage
app.MapPost("/api/payments/{id}/process", async (
    string id,
    IDistributedLockFactory lockFactory,
    PaymentService paymentService) =>
{
    var resource = $"payment:{id}";
    var expiry = TimeSpan.FromSeconds(30);        // lock TTL
    var wait = TimeSpan.FromSeconds(10);          // how long to keep trying
    var retry = TimeSpan.FromMilliseconds(200);   // pause between attempts

    await using var redLock = await lockFactory.CreateLockAsync(
        resource, expiry, wait, retry);

    if (!redLock.IsAcquired)
        return Results.Conflict("Payment is already being processed");

    var result = await paymentService.ProcessAsync(id);
    return Results.Ok(result);
});
```

## 6. PostgreSQL advisory locks: an alternative

If the system already uses PostgreSQL, you can use an **advisory lock** without adding Redis:

```csharp
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;
using Npgsql;

public class PostgresAdvisoryLock
{
    private readonly NpgsqlConnection _conn;
    private readonly long _lockId;

    public PostgresAdvisoryLock(NpgsqlConnection conn, string resource)
    {
        _conn = conn;
        // string.GetHashCode() is randomized per process in .NET, so two
        // instances would compute different ids for the same resource.
        // Derive a stable 64-bit id from a SHA-256 hash instead.
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(resource));
        _lockId = BinaryPrimitives.ReadInt64BigEndian(hash);
    }

    // Non-blocking: returns false immediately if another session holds the lock.
    public async Task<bool> TryAcquireAsync(CancellationToken ct = default)
    {
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_try_advisory_lock(@id)", _conn);
        cmd.Parameters.AddWithValue("id", _lockId);
        return (bool)(await cmd.ExecuteScalarAsync(ct))!;
    }

    // Session-level locks must be released on the same connection that took them.
    public async Task ReleaseAsync()
    {
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_advisory_unlock(@id)", _conn);
        cmd.Parameters.AddWithValue("id", _lockId);
        await cmd.ExecuteNonQueryAsync();
    }
}
```

#### 💡 When to use an advisory lock instead of Redis?

**Use it when:** the system is simple, already runs PostgreSQL, and you do not want to add Redis as a dependency.  
**Avoid it when:** you need locks across databases, very low latency (<1ms), or thousands of concurrent locks (advisory locks consume PostgreSQL shared memory).

## 7. Fencing Token — Giải quyết vấn đề Split-brain

Ngay cả với Redlock, vẫn có khả năng 2 client tin mình giữ lock (do GC pause, network delay). **Fencing token** là lớp bảo vệ cuối cùng.

```
sequenceDiagram
    participant C1 as Client 1
    participant LS as Lock Service
    participant DB as Database
    participant C2 as Client 2

C1->>LS: Acquire lock
    LS-->>C1: Lock granted, token=33
    Note over C1: GC pause 30 giây...
    C2->>LS: Acquire lock (lock đã hết hạn)
    LS-->>C2: Lock granted, token=34
    C2->>DB: WRITE (fencing_token=34) ✓
    Note over C1: GC pause kết thúc
    C1->>DB: WRITE (fencing_token=33)
    DB-->>C1: REJECTED! token 33 < 34
    Note over DB: Database từ chối write có token cũ

```

Fencing token bảo vệ data integrity ngay cả khi lock bị vi phạm

```csharp
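public class FencedDistributedLock
{
    // SET NX and INCR run atomically in one script, so token order matches
    // the order in which locks are granted across all instances. A static
    // in-process counter would not be comparable between machines.
    // In Redis Cluster both keys must hash to the same slot.
    private const string AcquireScript = """
        if redis.call('set', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2]) then
            return redis.call('incr', KEYS[2])
        end
        return 0
        """;

    public long FencingToken { get; private set; }

    public async Task<bool> AcquireAsync(IDatabase db, string resource)
    {
        var owner = $"{Environment.MachineName}:{Guid.NewGuid():N}";

        // Returns the new fencing token, or 0 if the lock is already held.
        var token = (long)await db.ScriptEvaluateAsync(AcquireScript,
            [new RedisKey($"lock:{resource}"), new RedisKey($"fence:{resource}")],
            [new RedisValue(owner), 30_000]);

        if (token > 0)
        {
            FencingToken = token;
            return true;
        }
        return false;
    }
}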

// In the repository layer
public async Task UpdateInventoryAsync(
    int productId, int quantity, long fencingToken)
{
    // <= lets the current holder write more than once with its own token;
    // only strictly older tokens are rejected.
    var rows = await _db.ExecuteAsync("""
        UPDATE Inventory
        SET Stock = Stock - @quantity, FencingToken = @token
        WHERE ProductId = @productId
          AND FencingToken <= @token
        """,
        new { quantity, token = fencingToken, productId });

    if (rows == 0)
        throw new StaleTokenException("Fencing token rejected");
}
```
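To look at the rejection rule from the diagram in isolation, here is a minimal in-memory sketch. `FencedStore`, `TryWrite`, and `Read` are hypothetical names made up for illustration and are not part of any library; a real system would enforce the same comparison inside the database. The sketch assumes that a holder may write several times with the same token, so only strictly older tokens are refused.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a storage layer that enforces fencing.
public sealed class FencedStore
{
    private readonly object _gate = new();
    private readonly Dictionary<string, (long Token, int Value)> _rows = new();

    // Accepts the write only if the token is at least as new as the
    // highest token already recorded for this key.
    public bool TryWrite(string key, int value, long fencingToken)
    {
        lock (_gate)
        {
            if (_rows.TryGetValue(key, out var row) && fencingToken < row.Token)
                return false; // stale holder: reject the write

            _rows[key] = (fencingToken, value);
            return true;
        }
    }

    // Returns the stored value, or null if the key was never written.
    public int? Read(string key)
    {
        lock (_gate)
        {
            return _rows.TryGetValue(key, out var row) ? row.Value : null;
        }
    }
}
```

Replaying the diagram against this sketch, a write with token 34 is accepted and a later write with token 33 is refused, which leaves the value written by the newer holder untouched.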

## 8. Performance Comparison of the Solutions

| Criterion | Redis SET NX | Redlock (5 nodes) | PostgreSQL Advisory | ZooKeeper |
| --- | --- | --- | --- | --- |
| **Acquire latency** | ~0.5ms | ~3-5ms | ~1-2ms | ~5-10ms |
| **Throughput** | ~100K ops/s | ~20K ops/s | ~50K ops/s | ~10K ops/s |
| **Safety level** | Medium | High | High | Very high |
| **Fault tolerance** | Low (single point of failure) | High (needs N/2+1 nodes) | Depends on DB replication | High (quorum) |
| **Added dependency** | Redis | 5 Redis instances | None (uses the existing DB) | ZooKeeper cluster |
| **Complexity** | Simple | Medium | Simple | Complex |
| **Auto-release on crash** | Yes (TTL) | Yes (TTL) | Yes (session end) | Yes (ephemeral node) |
| **Best suited for** | Cache stampede, rate limiting | Payment, inventory | Cron jobs, batch | Leader election |
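
The `N/2+1` in the fault-tolerance row is a majority quorum computed with integer division. As a small worked example, the helper below (a hypothetical `Quorum` class, not taken from any library) shows how many nodes must agree and how many can be down for a given cluster size.

```csharp
using System;

public static class Quorum
{
    // Smallest majority of n nodes: floor(n / 2) + 1.
    public static int Size(int nodes)
    {
        if (nodes <= 0) throw new ArgumentOutOfRangeException(nameof(nodes));
        return nodes / 2 + 1;
    }

    // How many nodes may be unreachable while a majority can still be formed.
    public static int ToleratedFailures(int nodes) => nodes - Size(nodes);
}
```

For 5 nodes this gives a quorum of 3 and tolerates 2 failures. Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only 1 failure, which is why odd cluster sizes are the usual choice.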

## 9. Anti-patterns and Common Mistakes

### ❌ Anti-pattern 1: Lock without a TTL

```csharp
// WRONG: if the process crashes, the lock is never released
await db.StringSetAsync("lock:order", "1", when: When.NotExists);
// ... process crashes here → permanent deadlock

// RIGHT: always set an expiry
await db.StringSetAsync("lock:order", "1",
    TimeSpan.FromSeconds(30), When.NotExists);
```

### ❌ Anti-pattern 2: Releasing a lock without checking the owner

```csharp
// WRONG: this can delete another client's lock
await db.KeyDeleteAsync("lock:order");

// RIGHT: use a Lua script that checks the owner before deleting
const string script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    end
    return 0
    """;
await db.ScriptEvaluateAsync(script, ...);
```

### ❌ Anti-pattern 3: Lock TTL too short

```csharp
// WRONG: the TTL is 2s but the operation can take 5s
await using var lockObj = new RedisDistributedLock(
    db, "report", TimeSpan.FromSeconds(2));
await GenerateReportAsync(); // takes 5s → the lock expires midway!

// RIGHT: the TTL must exceed the worst-case execution time
// Combine it with lock extension (renewal) if needed
await using var lockObj = new RedisDistributedLock(
    db, "report", TimeSpan.FromSeconds(60));
```

### ❌ Anti-pattern 4: Retrying without backoff

```csharp
// WRONG: a tight loop causes CPU spikes and overloads Redis
while (!acquired) { acquired = await TryAcquire(); }

// RIGHT: exponential backoff + jitter
var delay = 50;
while (!acquired && DateTime.UtcNow < deadline)
{
    acquired = await TryAcquire();
    if (!acquired)
    {
        var jitter = Random.Shared.Next(0, delay / 2);
        await Task.Delay(delay + jitter, ct);
        delay = Math.Min(delay * 2, 1000);
    }
}
```
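
The backoff loop above can also be packaged as a reusable helper. The following is a sketch under assumptions: `LockRetry`, `NextDelay`, and `TryAcquireWithBackoffAsync` are hypothetical names, and the 50 ms base, 1000 ms cap, and up-to-50% jitter simply mirror the numbers in the snippet above. Separating the delay calculation from the loop makes the schedule easy to unit test without a Redis server.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockRetry
{
    // Delay before the next attempt: baseMs * 2^attempt, capped at capMs,
    // plus up to 50% of that value as jitter (jitterFraction in [0, 1]).
    public static TimeSpan NextDelay(int attempt, double jitterFraction,
        int baseMs = 50, int capMs = 1000)
    {
        var backoff = Math.Min(baseMs * Math.Pow(2, attempt), capMs);
        var jitter = backoff * 0.5 * Math.Clamp(jitterFraction, 0.0, 1.0);
        return TimeSpan.FromMilliseconds(backoff + jitter);
    }

    // Calls tryAcquire until it succeeds or the next wait would pass the timeout.
    public static async Task<bool> TryAcquireWithBackoffAsync(
        Func<Task<bool>> tryAcquire, TimeSpan timeout,
        CancellationToken ct = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        for (var attempt = 0; ; attempt++)
        {
            if (await tryAcquire()) return true;

            var delay = NextDelay(attempt, Random.Shared.NextDouble());
            if (DateTime.UtcNow + delay >= deadline) return false;

            await Task.Delay(delay, ct);
        }
    }
}
```

With the jitter fraction fixed at zero the schedule is 50, 100, 200, 400, 800, then 1000 ms for every later attempt. The random part spreads out clients that failed at the same moment so they do not all retry together.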

## 10. Production-ready Patterns

### 10.1 Lock Extension (Auto-renewal)

When an operation may run longer than the TTL, you need to renew the lock automatically:

```csharp
public class AutoRenewingLock : IAsyncDisposable
{
    private readonly IDatabase _db;
    private readonly string _key;
    private readonly string _value;
    private readonly CancellationTokenSource _renewCts = new();
    private Task? _renewTask;

    public AutoRenewingLock(IDatabase db, string resource)
    {
        _db = db;
        _key = $"lock:{resource}";
        _value = Guid.NewGuid().ToString("N"); // unique owner ID for this holder
    }

    public async Task<bool> AcquireAsync(TimeSpan ttl)
    {
        var acquired = await _db.StringSetAsync(
            _key, _value, ttl, When.NotExists);

        if (acquired)
        {
            _renewTask = RenewLoopAsync(ttl, _renewCts.Token);
        }
        return acquired;
    }

    private async Task RenewLoopAsync(TimeSpan ttl, CancellationToken ct)
    {
        // Extend the TTL only while this instance still owns the key.
        const string script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('pexpire', KEYS[1], ARGV[2])
            end
            return 0
            """;

        var renewInterval = ttl / 3;
        while (!ct.IsCancellationRequested)
        {
            await Task.Delay(renewInterval, ct);

            var renewed = (long)await _db.ScriptEvaluateAsync(script,
                [new RedisKey(_key)],
                [new RedisValue(_value), (long)ttl.TotalMilliseconds]);

            if (renewed == 0) return; // lock was lost, stop renewing
        }
    }

    public async ValueTask DisposeAsync()
    {
        await _renewCts.CancelAsync();
        if (_renewTask != null)
        {
            // Task.Delay throws when cancelled; that is the normal shutdown path.
            try { await _renewTask; }
            catch (OperationCanceledException) { }
        }
        _renewCts.Dispose();
        // Release lock...
    }
}
```
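
The rule inside the renewal script (extend the expiry only while the caller still owns the key) can be modeled without Redis, which helps when reasoning about timelines or writing unit tests. The class below is a hypothetical in-memory model, not a replacement for the Redis lock: `InMemoryLease` and its methods are invented for illustration, and the caller passes the current time explicitly so the behavior is deterministic.

```csharp
using System;

// Hypothetical single-key lease that mimics SET NX PX plus owner-checked PEXPIRE.
public sealed class InMemoryLease
{
    private string? _owner;
    private DateTimeOffset _expiresAt;

    // Succeeds only if nobody holds an unexpired lease.
    public bool TryAcquire(string owner, TimeSpan ttl, DateTimeOffset now)
    {
        if (_owner is not null && now < _expiresAt) return false; // still held

        _owner = owner;
        _expiresAt = now + ttl;
        return true;
    }

    // Mirrors the renewal script: extend only if the caller is still the
    // owner and the lease has not expired yet.
    public bool TryRenew(string owner, TimeSpan ttl, DateTimeOffset now)
    {
        if (_owner != owner || now >= _expiresAt) return false;

        _expiresAt = now + ttl;
        return true;
    }
}
```

With a 30 second TTL and a renewal every `ttl / 3` (10 seconds), a holder gets two further renewal attempts before the lease runs out, so a single failed renewal does not immediately cost it the lock. Once another client has taken over, any late renewal from the old holder returns `false`.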

### 10.2 Lock with Observability

```csharp
public class ObservableDistributedLock
{
    private static readonly Meter Meter = new("DistributedLock");
    private static readonly Counter<long> AcquiredCounter =
        Meter.CreateCounter<long>("lock.acquired");
    private static readonly Counter<long> FailedCounter =
        Meter.CreateCounter<long>("lock.failed");
    private static readonly Histogram<double> AcquireLatency =
        Meter.CreateHistogram<double>("lock.acquire.duration.ms");

    public async Task<bool> AcquireAsync(string resource, TimeSpan expiry)
    {
        var sw = Stopwatch.StartNew();
        var acquired = await InternalAcquireAsync(resource, expiry);
        sw.Stop();

        // One tag instance, reused for all three instruments.
        var tag = new KeyValuePair<string, object?>("resource", resource);
        AcquireLatency.Record(sw.Elapsed.TotalMilliseconds, tag);

        if (acquired)
            AcquiredCounter.Add(1, tag);
        else
            FailedCounter.Add(1, tag);

        return acquired;
    }
}
```
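
To check in a test that these instruments actually emit values, one option is the in-process `MeterListener` from `System.Diagnostics.Metrics`. The helper below is a sketch: `LockMetricsProbe` and `SumWhile` are hypothetical names, and it only handles `long` measurements such as the two counters above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class LockMetricsProbe
{
    // Runs `action` and returns the sum of all long measurements recorded
    // on the given instrument of the given meter while the action runs.
    public static long SumWhile(string meterName, string instrumentName, Action action)
    {
        long total = 0;
        using var listener = new MeterListener();

        // Subscribe only to the instrument we are interested in.
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == meterName && instrument.Name == instrumentName)
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<long>(
            (instrument, measurement, tags, state) => total += measurement);

        listener.Start();
        action();
        return total;
    }
}
```

A test can wrap a few lock acquisitions in `SumWhile("DistributedLock", "lock.acquired", ...)` and compare the returned sum with the number of successful acquisitions. In production the same meter would normally be exported through whatever metrics pipeline the service already uses.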

```
graph LR
    A[Incoming request] --> B{Acquire Lock}
    B -->|Success| C[Execute Critical Section]
    B -->|Failed after retries| D[Return 409 Conflict]
    C --> E{Operation OK?}
    E -->|Yes| F[Release Lock]
    E -->|Error| G[Release Lock + Rollback]
    F --> H[Return 200 OK]
    G --> I[Return 500 Error]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#e94560,stroke:#fff,color:#fff
    style D fill:#ff9800,stroke:#fff,color:#fff
    style H fill:#4CAF50,stroke:#fff,color:#fff
    style I fill:#f44336,stroke:#fff,color:#fff

```

Production flow for handling a request with a distributed lock
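
The flow in the diagram maps naturally onto a `try`/`catch`/`finally` block. The following is a framework-agnostic sketch: `LockedRequestHandler` and its delegate parameters are hypothetical, and the integer return value stands in for whatever response type the web framework uses.

```csharp
using System;
using System.Threading.Tasks;

public static class LockedRequestHandler
{
    // Returns an HTTP-style status code following the flow in the diagram.
    public static async Task<int> HandleAsync(
        Func<Task<bool>> acquire,        // true once the lock is held (retries included)
        Func<Task> criticalSection,
        Func<Task> rollback,
        Func<Task> release)
    {
        if (!await acquire()) return 409; // Conflict: lock not obtained

        try
        {
            await criticalSection();
            return 200;
        }
        catch (Exception)
        {
            await rollback();
            return 500;
        }
        finally
        {
            await release(); // runs on both the success and the error path
        }
    }
}
```

Putting the release in `finally` keeps the two "Release Lock" boxes of the diagram in one place, and the 409 branch returns before the `try` so a lock that was never acquired is never released.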

#### 💡 Checklist before deploying a Distributed Lock

✅ The lock has a TTL: avoids deadlock when a process crashes  
✅ Release checks the owner: avoids deleting another client's lock  
✅ Retry uses exponential backoff + jitter: avoids a thundering herd  
✅ Fencing token for critical data: protects data integrity  
✅ Metrics & alerting: track lock contention and latency  
✅ TTL > worst-case execution time + buffer, or use auto-renewal  
✅ Graceful degradation: the system keeps working (possibly slower) when the lock service is down

### References

- [Redis Distributed Locks Documentation](https://redis.io/docs/latest/develop/use/patterns/distributed-locks/)
- [Martin Kleppmann — How to do distributed locking](https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)
- [StackExchange.Redis Documentation](https://learn.microsoft.com/en-us/dotnet/api/stackexchange.redis)
- [PostgreSQL Advisory Locks](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS)
- [RedLock.net — Distributed lock with Redis](https://github.com/samcook/RedLock.net)

