Distributed Locking — Solving Race Conditions in Distributed Systems with Redis and .NET 10
Posted on: 4/25/2026 2:12:07 AM
Table of contents
- 1. Race Conditions in Distributed Systems
- 2. What is a Distributed Lock and Why Do You Need One?
- 3. Redis Distributed Lock — From SETNX to Redlock
- 4. The Redlock Algorithm in Detail
- 5. Implementing Distributed Locks with .NET 10
- 6. PostgreSQL Advisory Lock — An Alternative Approach
- 7. Fencing Tokens — Solving the Split-brain Problem
- 8. Performance Comparison Across Solutions
- 9. Anti-patterns and Common Mistakes
- 10. Production-ready Patterns
1. Race Conditions in Distributed Systems
Imagine your e-commerce system has 3 instances running concurrently. A product has only 1 unit left in stock, but 2 purchase requests arrive simultaneously on 2 different instances. Both read stock = 1, both decrement to stock = 0, and you end up selling 2 units when you only had 1.
This is a race condition — and it happens far more often than you'd think in distributed environments.
sequenceDiagram
participant I1 as Instance 1
participant DB as Database
participant I2 as Instance 2
I1->>DB: SELECT stock WHERE id=1
I2->>DB: SELECT stock WHERE id=1
DB-->>I1: stock = 1
DB-->>I2: stock = 1
I1->>DB: UPDATE stock = 0
I2->>DB: UPDATE stock = 0
Note over DB: ⚠️ Sold 2 items, only 1 in stock!
In a monolith, you can use C#'s lock statement or Java's synchronized. But when your system is distributed across multiple processes on different servers, you need an external locking mechanism visible to all instances — that's a Distributed Lock.
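The window in this diagram can be reproduced in-process. The sketch below (plain C#, no database; an illustrative stand-in for the SELECT-then-UPDATE pair) closes the window with a compare-and-swap, the in-memory analogue of an atomic conditional UPDATE. A distributed lock generalizes the same idea across machines.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative stand-in for the inventory row in the diagram: multiple
// "instances" race to sell the last unit of stock.
public class Inventory
{
    private int _stock;
    public Inventory(int stock) => _stock = stock;
    public int Stock => Volatile.Read(ref _stock);

    // Atomic decrement-if-positive: the in-process analogue of
    // UPDATE Inventory SET Stock = Stock - 1 WHERE Stock > 0.
    // Of all concurrent callers racing on stock = 1, exactly one wins.
    public bool TrySell()
    {
        while (true)
        {
            int current = Volatile.Read(ref _stock);
            if (current <= 0) return false;   // sold out, reject the sale
            if (Interlocked.CompareExchange(ref _stock, current - 1, current) == current)
                return true;                  // our CAS won the race
        }
    }
}
```

With stock = 1 and any number of concurrent TrySell() calls, exactly one returns true, unlike the read-check-write sequence in the diagram where both callers can pass the check before either writes.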
2. What is a Distributed Lock and Why Do You Need One?
A Distributed Lock ensures that at any given time, only one process can execute a piece of code or access a specific resource, regardless of which server that process is running on.
Three Core Properties of a Distributed Lock
Safety (Mutual Exclusion): At most one client holds the lock at any time.
Liveness (Deadlock-free): Even if the lock holder crashes, the lock must eventually be released.
Fault Tolerance: The lock continues to function when parts of the system fail.
Common Use Cases
| Use Case | Description | Consequence Without Lock |
|---|---|---|
| Inventory deduction | Decrement stock on purchase | Overselling beyond available stock |
| Scheduled jobs | Run cron job once across multiple instances | Duplicate emails, incorrect calculations |
| Rate limiting | Enforce request limits in sliding window | Quota bypass, API abuse |
| Leader election | Select one node as leader for task processing | Split-brain, inconsistent data |
| Payment processing | Ensure transaction idempotency | Double charges, financial loss |
| Cache stampede prevention | Single request rebuilds cache on expiry | Database overload (thundering herd) |
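The last row is the easiest to demonstrate without infrastructure. Below is an in-process sketch of stampede protection (illustrative names, not a library API): the first caller takes a per-key gate and rebuilds; later callers re-check the cache after waiting instead of hitting the database. A distributed version would replace the SemaphoreSlim with the Redis lock described in section 3.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class SingleFlightCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();
    public int RebuildCount;   // exposed so the single-flight effect is observable

    public async Task<string> GetOrRebuildAsync(string key, Func<Task<string>> rebuild)
    {
        if (_cache.TryGetValue(key, out var hit)) return hit;

        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Re-check: another caller may have rebuilt while we waited
            if (_cache.TryGetValue(key, out hit)) return hit;
            Interlocked.Increment(ref RebuildCount);
            var value = await rebuild();   // the expensive DB hit happens once
            _cache[key] = value;
            return value;
        }
        finally { gate.Release(); }
    }
}
```

Ten concurrent callers on a cold key produce exactly one rebuild; the rest are served from the freshly populated cache.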
3. Redis Distributed Lock — From SETNX to Redlock
3.1 The Simple Approach: SET NX EX
Redis provides the atomic command SET key value NX EX seconds, which sets the key only if it doesn't already exist, with automatic expiration:
# Acquire lock
SET order:lock:12345 "instance-1-uuid" NX EX 30
# Release lock (only release if correct owner)
# Use Lua script to ensure atomicity
EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 order:lock:12345 "instance-1-uuid"
graph TD
A[Client wants to acquire lock] --> B{SET key NX EX 30}
B -->|OK - Lock acquired| C[Execute critical section]
B -->|nil - Lock exists| D[Retry after delay]
C --> E[Release lock via Lua script]
E --> F{Value matches owner?}
F -->|Yes| G[DEL key - Lock released]
F -->|No| H[Skip - Lock belongs to another client]
D --> B
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#e94560,stroke:#fff,color:#fff
style G fill:#4CAF50,stroke:#fff,color:#fff
style H fill:#ff9800,stroke:#fff,color:#fff
⚠️ Why Use a Lua Script to Release?
If you use two separate commands (GET then DEL), between them the lock might expire, another client acquires a new lock, and you accidentally delete their lock. Lua scripts run atomically on Redis, completely eliminating this risk.
3.2 Limitations of Single-node Redis Lock
This approach has a critical flaw when the Redis master fails:
- Client A acquires lock on Redis master
- Redis master crashes before replicating the lock to the replica
- Replica gets promoted to new master
- Client B acquires the same lock on the new master — mutual exclusion violated!
This single-node failure mode is why Salvatore Sanfilippo (the creator of Redis) proposed the Redlock algorithm, and why Martin Kleppmann (author of "Designing Data-Intensive Applications") later critiqued Redlock itself; the debate is covered in the next section.
4. The Redlock Algorithm in Detail
Redlock uses N independent Redis nodes (recommended N=5) to ensure safety even when some nodes fail.
sequenceDiagram
participant C as Client
participant R1 as Redis Node 1
participant R2 as Redis Node 2
participant R3 as Redis Node 3
participant R4 as Redis Node 4
participant R5 as Redis Node 5
Note over C: Step 1: Record timestamp T1
C->>R1: SET lock NX EX 30
R1-->>C: OK ✓
C->>R2: SET lock NX EX 30
R2-->>C: OK ✓
C->>R3: SET lock NX EX 30
R3-->>C: FAIL ✗
C->>R4: SET lock NX EX 30
R4-->>C: OK ✓
C->>R5: SET lock NX EX 30
R5-->>C: OK ✓
Note over C: Step 2: Record timestamp T2
Note over C: Acquired 4/5 nodes ≥ quorum (3)
Note over C: Lock validity = 30s - (T2-T1)
Note over C: If validity > 0 → Lock success!
Detailed Redlock Steps
- Record start time T1
- Sequentially acquire lock on all N Redis nodes, each with a small timeout (a few ms) to avoid blocking if a node is down
- Calculate elapsed time elapsed = T2 - T1
- Lock succeeds if: acquired on ≥ N/2 + 1 nodes AND elapsed < lock TTL
- Actual lock validity = TTL - elapsed (lock expires sooner due to acquisition time)
- If failed, release lock on all nodes (including failed ones) for cleanup
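The quorum and validity arithmetic in steps 3-5 is worth isolating. The helper below is illustrative, not part of any library; it also exposes the small clock-drift allowance that the original algorithm subtracts, as a parameter:

```csharp
using System;

public static class RedlockMath
{
    // Decides whether an acquisition round succeeded and how long the
    // resulting lock is actually valid; mirrors steps 3-5 above.
    public static (bool Success, TimeSpan Validity) Evaluate(
        int totalNodes, int acquiredNodes,
        TimeSpan ttl, TimeSpan elapsed, TimeSpan clockDrift = default)
    {
        int quorum = totalNodes / 2 + 1;                // e.g. 3 of 5 nodes
        var validity = ttl - elapsed - clockDrift;      // lock shrinks by acquisition time
        bool success = acquiredNodes >= quorum && validity > TimeSpan.Zero;
        return (success, success ? validity : TimeSpan.Zero);
    }
}
```

For the sequence above (4 of 5 nodes in 50 ms against a 30 s TTL) this yields success with roughly 29.95 s of validity; 2 of 5 nodes, or an acquisition that takes longer than the TTL, fails.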
The Kleppmann vs Sanfilippo Debate
Martin Kleppmann argued that Redlock isn't safe because GC pauses or clock skew can lead a client to believe it still holds the lock after expiration. Salvatore countered that with reasonable clock synchronization (NTP), Redlock is safe enough for most use cases. The pragmatic solution: combine Redlock with fencing tokens (section 7).
5. Implementing Distributed Locks with .NET 10
5.1 Using StackExchange.Redis
using StackExchange.Redis;
public class RedisDistributedLock : IAsyncDisposable
{
private readonly IDatabase _db;
private readonly string _lockKey;
private readonly string _lockValue;
private readonly TimeSpan _expiry;
private bool _acquired;
public RedisDistributedLock(IDatabase db, string resource, TimeSpan expiry)
{
_db = db;
_lockKey = $"lock:{resource}";
_lockValue = Guid.NewGuid().ToString("N");
_expiry = expiry;
}
public async Task<bool> AcquireAsync(TimeSpan timeout, CancellationToken ct = default)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
_acquired = await _db.StringSetAsync(
_lockKey, _lockValue, _expiry, When.NotExists);
if (_acquired) return true;
await Task.Delay(50, ct);
}
return false;
}
public async ValueTask DisposeAsync()
{
if (!_acquired) return;
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('del', KEYS[1])
else
return 0
end
""";
await _db.ScriptEvaluateAsync(script,
[new RedisKey(_lockKey)],
[new RedisValue(_lockValue)]);
}
}
5.2 Usage in ASP.NET 10 Minimal API
app.MapPost("/api/orders", async (
OrderRequest req,
IConnectionMultiplexer redis,
OrderService orderService) =>
{
var db = redis.GetDatabase();
await using var lockObj = new RedisDistributedLock(
db, $"order:product:{req.ProductId}", TimeSpan.FromSeconds(30));
if (!await lockObj.AcquireAsync(TimeSpan.FromSeconds(5)))
return Results.Conflict("Product is being processed by another request");
var result = await orderService.PlaceOrderAsync(req);
return Results.Ok(result);
});
5.3 Redlock with RedLock.net
// Register in DI container
builder.Services.AddSingleton<IDistributedLockFactory>(sp =>
{
var endpoints = new List<RedLockEndPoint>
{
new DnsEndPoint("redis-1.internal", 6379),
new DnsEndPoint("redis-2.internal", 6379),
new DnsEndPoint("redis-3.internal", 6379),
new DnsEndPoint("redis-4.internal", 6379),
new DnsEndPoint("redis-5.internal", 6379),
};
return RedLockFactory.Create(endpoints);
});
// Usage
app.MapPost("/api/payments/{id}/process", async (
string id,
IDistributedLockFactory lockFactory,
PaymentService paymentService) =>
{
var resource = $"payment:{id}";
var expiry = TimeSpan.FromSeconds(30);
var wait = TimeSpan.FromSeconds(10);
var retry = TimeSpan.FromMilliseconds(200);
await using var redLock = await lockFactory.CreateLockAsync(
resource, expiry, wait, retry);
if (!redLock.IsAcquired)
return Results.Conflict("Payment is already being processed");
var result = await paymentService.ProcessAsync(id);
return Results.Ok(result);
});
6. PostgreSQL Advisory Lock — An Alternative Approach
If your system already uses PostgreSQL, you can leverage Advisory Locks without adding Redis as a dependency:
public class PostgresAdvisoryLock
{
// Note: session-level advisory locks are tied to the connection;
// keep _conn open for as long as the lock is held
private readonly NpgsqlConnection _conn;
private readonly long _lockId;
public PostgresAdvisoryLock(NpgsqlConnection conn, string resource)
{
_conn = conn;
// string.GetHashCode() is randomized per process in .NET, so two
// instances would derive different ids for the same resource;
// use a deterministic hash (FNV-1a here) instead
_lockId = StableHash(resource);
}
private static long StableHash(string resource)
{
ulong hash = 14695981039346656037UL;            // FNV-1a offset basis
foreach (var b in System.Text.Encoding.UTF8.GetBytes(resource))
{
hash = (hash ^ b) * 1099511628211UL;            // FNV-1a prime
}
return unchecked((long)hash);
}
public async Task<bool> TryAcquireAsync(CancellationToken ct = default)
{
await using var cmd = new NpgsqlCommand(
"SELECT pg_try_advisory_lock(@id)", _conn);
cmd.Parameters.AddWithValue("id", _lockId);
return (bool)(await cmd.ExecuteScalarAsync(ct))!;
}
public async Task ReleaseAsync()
{
await using var cmd = new NpgsqlCommand(
"SELECT pg_advisory_unlock(@id)", _conn);
cmd.Parameters.AddWithValue("id", _lockId);
await cmd.ExecuteNonQueryAsync();
}
}
When to Use Advisory Locks Instead of Redis?
Use when: Simple system, already running PostgreSQL, don't want to add Redis as a dependency.
Avoid when: Cross-database locking needed, ultra-low latency required (<1ms), or system has thousands of concurrent locks (Advisory Locks consume PostgreSQL shared memory).
7. Fencing Tokens — Solving the Split-brain Problem
Even with Redlock, two clients can end up simultaneously believing they hold the lock (GC pauses, network delays). Fencing tokens provide the last line of defense.
sequenceDiagram
participant C1 as Client 1
participant LS as Lock Service
participant DB as Database
participant C2 as Client 2
C1->>LS: Acquire lock
LS-->>C1: Lock granted, token=33
Note over C1: GC pause for 30 seconds...
C2->>LS: Acquire lock (lock expired)
LS-->>C2: Lock granted, token=34
C2->>DB: WRITE (fencing_token=34) ✓
Note over C1: GC pause ends
C1->>DB: WRITE (fencing_token=33)
DB-->>C1: REJECTED! token 33 < 34
Note over DB: Database rejects write with stale token
public class FencedDistributedLock
{
public long FencingToken { get; private set; }
public async Task<bool> AcquireAsync(IDatabase db, string resource)
{
// The token must be monotonic across ALL instances, so it comes from
// a shared Redis counter; a process-local static counter could not
// order tokens issued by different machines
var token = await db.StringIncrementAsync("fencing:counter");
var value = $"{Environment.MachineName}:{token}";
var acquired = await db.StringSetAsync(
$"lock:{resource}", value,
TimeSpan.FromSeconds(30), When.NotExists);
if (acquired)
{
FencingToken = token;
return true;
}
return false;
}
}
// In the repository layer
public async Task UpdateInventoryAsync(
int productId, int quantity, long fencingToken)
{
var rows = await _db.ExecuteAsync("""
UPDATE Inventory
SET Stock = Stock - @quantity, FencingToken = @token
WHERE ProductId = @productId
AND FencingToken < @token
""",
new { quantity, token = fencingToken, productId });
if (rows == 0)
throw new StaleTokenException("Fencing token rejected");
}
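The rule the UPDATE enforces (apply a write only when its token exceeds the last one recorded) can be stated as a pure function, with hypothetical names for illustration, which makes the diagram's outcome easy to replay:

```csharp
public class FencedResource
{
    private long _lastToken;

    // Mirrors WHERE FencingToken < @token: accept strictly increasing
    // tokens; reject stale ones even if they arrive later in wall-clock
    // time (e.g. Client 1 waking from its GC pause).
    public bool TryWrite(long token)
    {
        if (token <= _lastToken) return false;   // stale holder rejected
        _lastToken = token;
        return true;
    }
}
```

Replaying the sequence diagram: Client 2's write with token 34 is accepted, then Client 1's delayed write with token 33 is rejected.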
8. Performance Comparison Across Solutions
| Criteria | Redis SET NX | Redlock (5 nodes) | PostgreSQL Advisory | ZooKeeper |
|---|---|---|---|---|
| Acquire latency | ~0.5ms | ~3-5ms | ~1-2ms | ~5-10ms |
| Throughput | ~100K ops/s | ~20K ops/s | ~50K ops/s | ~10K ops/s |
| Safety level | Medium | High | High | Very High |
| Fault tolerance | Low (single point) | High (N/2+1) | Depends on DB replication | High (quorum) |
| Additional dependency | Redis | 5 Redis instances | None (uses existing DB) | ZooKeeper cluster |
| Complexity | Simple | Medium | Simple | Complex |
| Auto-release on crash | Yes (TTL) | Yes (TTL) | Yes (session end) | Yes (ephemeral node) |
| Best for | Cache stampede, rate limit | Payment, inventory | Cron jobs, batch | Leader election |
9. Anti-patterns and Common Mistakes
Anti-pattern 1: Lock Without TTL
// WRONG: If process crashes, lock is never released
await db.StringSetAsync("lock:order", "1", when: When.NotExists);
// ... process crashes here → permanent deadlock
// CORRECT: Always set expiry
await db.StringSetAsync("lock:order", "1",
TimeSpan.FromSeconds(30), When.NotExists);
Anti-pattern 2: Releasing Without Owner Check
// WRONG: May delete another client's lock
await db.KeyDeleteAsync("lock:order");
// CORRECT: Use Lua script to verify owner before delete
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('del', KEYS[1])
end
return 0
""";
await db.ScriptEvaluateAsync(script, ...);
Anti-pattern 3: TTL Too Short
// WRONG: TTL 2s but operation may take 5s
await using var lockObj = new RedisDistributedLock(
db, "report", TimeSpan.FromSeconds(2));
await GenerateReportAsync(); // takes 5s → lock expires mid-operation!
// CORRECT: TTL must exceed worst-case execution time
// Combine with lock extension (renewal) if needed
await using var lockObj = new RedisDistributedLock(
db, "report", TimeSpan.FromSeconds(60));
Anti-pattern 4: Retry Without Backoff
// WRONG: Tight loop causes CPU spike and Redis overload
while (!acquired) { acquired = await TryAcquire(); }
// CORRECT: Exponential backoff + jitter
var delay = 50;
while (!acquired && DateTime.UtcNow < deadline)
{
acquired = await TryAcquire();
if (!acquired)
{
var jitter = Random.Shared.Next(0, delay / 2);
await Task.Delay(delay + jitter, ct);
delay = Math.Min(delay * 2, 1000);
}
}
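For completeness, the delay schedule the corrected loop produces before jitter can be extracted into a small testable helper (illustrative, using the same constants as above):

```csharp
using System;
using System.Collections.Generic;

public static class Backoff
{
    // Capped exponential schedule: 50, 100, 200, 400, 800, 1000, 1000, ...
    // Jitter (a random 0..delay/2 addition) is applied by the caller.
    public static IEnumerable<int> DelaysMs(int initialMs = 50, int capMs = 1000)
    {
        int delay = initialMs;
        while (true)
        {
            yield return delay;
            delay = Math.Min(delay * 2, capMs);
        }
    }
}
```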
10. Production-ready Patterns
10.1 Lock Extension (Auto-renewal)
When an operation may run longer than the TTL, you need automatic lock renewal:
public class AutoRenewingLock : IAsyncDisposable
{
private readonly IDatabase _db;
private readonly string _key;
private readonly string _value;
private readonly CancellationTokenSource _renewCts = new();
private Task? _renewTask;
public AutoRenewingLock(IDatabase db, string resource)
{
_db = db;
_key = $"lock:{resource}";
_value = Guid.NewGuid().ToString("N");
}
public async Task<bool> AcquireAsync(TimeSpan ttl)
{
var acquired = await _db.StringSetAsync(
_key, _value, ttl, When.NotExists);
if (acquired)
{
_renewTask = RenewLoopAsync(ttl, _renewCts.Token);
}
return acquired;
}
private async Task RenewLoopAsync(TimeSpan ttl, CancellationToken ct)
{
var renewInterval = ttl / 3;
while (!ct.IsCancellationRequested)
{
await Task.Delay(renewInterval, ct);
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('pexpire', KEYS[1], ARGV[2])
end
return 0
""";
await _db.ScriptEvaluateAsync(script,
[new RedisKey(_key)],
[new RedisValue(_value),
new RedisValue(((int)ttl.TotalMilliseconds).ToString())]);
}
}
public async ValueTask DisposeAsync()
{
await _renewCts.CancelAsync();
try
{
if (_renewTask != null) await _renewTask;
}
catch (OperationCanceledException)
{
// Expected: the renew loop exits via Task.Delay cancellation
}
// Release lock...
}
}
10.2 Lock with Observability
public class ObservableDistributedLock
{
private static readonly Meter Meter = new("DistributedLock");
private static readonly Counter<long> AcquiredCounter =
Meter.CreateCounter<long>("lock.acquired");
private static readonly Counter<long> FailedCounter =
Meter.CreateCounter<long>("lock.failed");
private static readonly Histogram<double> AcquireLatency =
Meter.CreateHistogram<double>("lock.acquire.duration.ms");
public async Task<bool> AcquireAsync(string resource, TimeSpan expiry)
{
var sw = Stopwatch.StartNew();
var acquired = await InternalAcquireAsync(resource, expiry);
sw.Stop();
AcquireLatency.Record(sw.Elapsed.TotalMilliseconds,
new("resource", resource));
if (acquired)
AcquiredCounter.Add(1, new("resource", resource));
else
FailedCounter.Add(1, new("resource", resource));
return acquired;
}
}
graph LR
A[Request arrives] --> B{Acquire Lock}
B -->|Success| C[Execute Critical Section]
B -->|Failed after retry| D[Return 409 Conflict]
C --> E{Operation OK?}
E -->|Yes| F[Release Lock]
E -->|Error| G[Release Lock + Rollback]
F --> H[Return 200 OK]
G --> I[Return 500 Error]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#e94560,stroke:#fff,color:#fff
style D fill:#ff9800,stroke:#fff,color:#fff
style H fill:#4CAF50,stroke:#fff,color:#fff
style I fill:#f44336,stroke:#fff,color:#fff
Pre-deployment Checklist for Distributed Locks
✅ Lock has TTL — prevents deadlock on process crash
✅ Release checks owner — prevents deleting another client's lock
✅ Retry uses exponential backoff + jitter — prevents thundering herd
✅ Fencing tokens for critical data — protects data integrity
✅ Metrics & alerting — monitors lock contention and latency
✅ TTL > worst-case execution time + buffer — or use auto-renewal
✅ Graceful degradation — system still works (possibly slower) when lock service is down
Distributed Locking is one of the most important building blocks in distributed systems. Understanding the mechanisms, trade-offs between solutions, and applying the right patterns will help you build reliable systems that avoid subtle race conditions — the kind that only surface under heavy production load.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.