Distributed Locking — Solving Race Conditions in Distributed Systems with Redis and .NET 10

Posted on: 4/25/2026 2:12:07 AM


1. Race Conditions in Distributed Systems

Imagine your e-commerce system has 3 instances running concurrently. A product has only 1 unit left in stock, but 2 purchase requests arrive simultaneously on 2 different instances. Both read stock = 1, both decrement to stock = 0, and you end up selling 2 units when you only had 1.

This is a race condition — and it happens far more often than you'd think in distributed environments.

sequenceDiagram
    participant I1 as Instance 1
    participant DB as Database
    participant I2 as Instance 2

    I1->>DB: SELECT stock WHERE id=1
    I2->>DB: SELECT stock WHERE id=1
    DB-->>I1: stock = 1
    DB-->>I2: stock = 1
    I1->>DB: UPDATE stock = 0
    I2->>DB: UPDATE stock = 0
    Note over DB: ⚠️ Sold 2 items, only 1 in stock!
Race condition when 2 instances read and write concurrently without locking
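The interleaving in the diagram can be reproduced deterministically in a few lines of C# — this sketch replays the two reads happening before either write (names like readByInstance1 are purely illustrative):

```csharp
using System;

// Replays the diagram: both "instances" read stock = 1 before either writes.
int stock = 1;
int unitsSold = 0;

int readByInstance1 = stock; // Instance 1 reads 1
int readByInstance2 = stock; // Instance 2 also reads 1

// Both checks pass, so both write back a decremented value.
if (readByInstance1 > 0) { stock = readByInstance1 - 1; unitsSold++; }
if (readByInstance2 > 0) { stock = readByInstance2 - 1; unitsSold++; }

Console.WriteLine($"stock={stock}, sold={unitsSold}"); // stock=0, sold=2
```

The check-then-act sequence is correct in isolation; it is the interleaving of two such sequences that produces the oversell.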

In a monolith, you can use C#'s lock statement or Java's synchronized. But when your system is distributed across multiple processes on different servers, you need an external locking mechanism visible to all instances — that's a Distributed Lock.
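For contrast, a minimal sketch of the in-process case: C#'s lock statement serializes the check-then-act, so a single process cannot oversell — but the guarantee evaporates the moment a second process is involved:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int stock = 1;
int unitsSold = 0;
object gate = new();

// 100 concurrent "purchase requests" inside ONE process.
var tasks = Enumerable.Range(0, 100)
    .Select(_ => Task.Run(() =>
    {
        lock (gate) // only one task at a time may check and decrement
        {
            if (stock > 0)
            {
                stock--;
                unitsSold++;
            }
        }
    }))
    .ToArray();
await Task.WhenAll(tasks);

Console.WriteLine($"stock={stock}, sold={unitsSold}"); // stock=0, sold=1
```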

2. What is a Distributed Lock and Why Do You Need One?

A Distributed Lock ensures that at any given time, only one process can execute a piece of code or access a specific resource, regardless of which server that process is running on.

Three Core Properties of a Distributed Lock

Safety (Mutual Exclusion): At most one client holds the lock at any time.
Liveness (Deadlock-free): Even if the lock holder crashes, the lock must eventually be released.
Fault Tolerance: The lock continues to function when parts of the system fail.

Common Use Cases

| Use Case | Description | Consequence Without Lock |
| --- | --- | --- |
| Inventory deduction | Decrement stock on purchase | Overselling beyond available stock |
| Scheduled jobs | Run cron job once across multiple instances | Duplicate emails, incorrect calculations |
| Rate limiting | Enforce request limits in sliding window | Quota bypass, API abuse |
| Leader election | Select one node as leader for task processing | Split-brain, inconsistent data |
| Payment processing | Ensure transaction idempotency | Double charges, financial loss |
| Cache stampede prevention | Single request rebuilds cache on expiry | Database overload (thundering herd) |

3. Redis Distributed Lock — From SETNX to Redlock

3.1 The Simple Approach: SET NX EX

Redis provides the atomic command SET key value NX EX seconds — it sets the key only if it doesn't exist and attaches an automatic expiration:

# Acquire lock
SET order:lock:12345 "instance-1-uuid" NX EX 30

# Release lock (only release if correct owner)
# Use Lua script to ensure atomicity
EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 order:lock:12345 "instance-1-uuid"
graph TD
    A[Client wants to acquire lock] --> B{SET key NX EX 30}
    B -->|OK - Lock acquired| C[Execute critical section]
    B -->|nil - Lock exists| D[Retry after delay]
    C --> E[Release lock via Lua script]
    E --> F{Value matches owner?}
    F -->|Yes| G[DEL key - Lock released]
    F -->|No| H[Skip - Lock belongs to another client]
    D --> B

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#e94560,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#ff9800,stroke:#fff,color:#fff
Acquire and release flow with Redis SET NX

⚠️ Why Use a Lua Script to Release?

If you use two separate commands (GET then DEL), between them the lock might expire, another client acquires a new lock, and you accidentally delete their lock. Lua scripts run atomically on Redis, completely eliminating this risk.

3.2 Limitations of Single-node Redis Lock

This approach has a critical flaw when the Redis master fails:

  1. Client A acquires lock on Redis master
  2. Redis master crashes before replicating the lock to the replica
  3. Replica gets promoted to new master
  4. Client B acquires the same lock on the new master — mutual exclusion violated!

This failover gap is why Salvatore Sanfilippo (the creator of Redis) proposed the Redlock algorithm — an algorithm that Martin Kleppmann (author of "Designing Data-Intensive Applications") in turn criticized, a debate covered in section 4.

4. The Redlock Algorithm in Detail

Redlock uses N independent Redis nodes (recommended N=5) to ensure safety even when some nodes fail.

sequenceDiagram
    participant C as Client
    participant R1 as Redis Node 1
    participant R2 as Redis Node 2
    participant R3 as Redis Node 3
    participant R4 as Redis Node 4
    participant R5 as Redis Node 5

    Note over C: Step 1: Record timestamp T1
    C->>R1: SET lock NX EX 30
    R1-->>C: OK ✓
    C->>R2: SET lock NX EX 30
    R2-->>C: OK ✓
    C->>R3: SET lock NX EX 30
    R3-->>C: FAIL ✗
    C->>R4: SET lock NX EX 30
    R4-->>C: OK ✓
    C->>R5: SET lock NX EX 30
    R5-->>C: OK ✓

    Note over C: Step 2: Record timestamp T2
    Note over C: Acquired 4/5 nodes ≥ quorum (3)
    Note over C: Lock validity = 30s - (T2-T1)
    Note over C: If validity > 0 → Lock success!
Redlock algorithm: acquire lock on majority of nodes (quorum)

Detailed Redlock Steps

  1. Record start time T1
  2. Try to acquire the lock on all N Redis nodes in sequence, using a short per-node timeout (tens of milliseconds, small relative to the TTL) so a downed node doesn't block the client
  3. Record time T2 and compute elapsed = T2 - T1
  4. The lock is acquired if: it succeeded on ≥ N/2 + 1 nodes AND elapsed < lock TTL
  5. Actual lock validity = TTL - elapsed (minus a small clock-drift allowance in the original algorithm) — the lock is worth less than its nominal TTL
  6. If acquisition failed, release the lock on all nodes (including ones that reported failure) to clean up
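The decision in steps 3–5 is plain arithmetic. This sketch (a hypothetical helper, not part of any Redlock library; it omits the small clock-drift allowance the original algorithm subtracts) evaluates it for the 4-of-5 scenario in the diagram:

```csharp
using System;

// Redlock success test: quorum of nodes AND time left on the TTL after
// subtracting the time spent acquiring. (The original algorithm also
// subtracts a clock-drift allowance, omitted here for brevity.)
static bool RedlockAcquired(int nodesAcquired, int totalNodes,
    TimeSpan ttl, TimeSpan elapsed, out TimeSpan validity)
{
    int quorum = totalNodes / 2 + 1;  // 5 nodes -> quorum of 3
    validity = ttl - elapsed;         // the lock is worth less than its TTL
    return nodesAcquired >= quorum && validity > TimeSpan.Zero;
}

// The diagram's scenario: 4/5 nodes acquired, 30s TTL, 50ms spent acquiring.
bool ok = RedlockAcquired(4, 5,
    TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(50), out var validity);

Console.WriteLine($"acquired={ok}, validity={validity.TotalMilliseconds}ms");
// acquired=True, validity=29950ms
```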

The Kleppmann vs Sanfilippo Debate

Martin Kleppmann argued that Redlock isn't safe because GC pauses or clock skew can lead a client to believe it still holds the lock after expiration. Salvatore countered that with reasonable clock synchronization (NTP), Redlock is safe enough for most use cases. The pragmatic solution: combine Redlock with fencing tokens (section 7).

5. Implementing Distributed Locks with .NET 10

5.1 Using StackExchange.Redis

using StackExchange.Redis;

public class RedisDistributedLock : IAsyncDisposable
{
    private readonly IDatabase _db;
    private readonly string _lockKey;
    private readonly string _lockValue;
    private readonly TimeSpan _expiry;
    private bool _acquired;

    public RedisDistributedLock(IDatabase db, string resource, TimeSpan expiry)
    {
        _db = db;
        _lockKey = $"lock:{resource}";
        _lockValue = Guid.NewGuid().ToString("N");
        _expiry = expiry;
    }

    public async Task<bool> AcquireAsync(TimeSpan timeout, CancellationToken ct = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            _acquired = await _db.StringSetAsync(
                _lockKey, _lockValue, _expiry, When.NotExists);

            if (_acquired) return true;
            await Task.Delay(50, ct);
        }
        return false;
    }

    public async ValueTask DisposeAsync()
    {
        if (!_acquired) return;

        const string script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end
            """;

        await _db.ScriptEvaluateAsync(script,
            [(RedisKey)_lockKey],
            [(RedisValue)_lockValue]);
    }
}

5.2 Usage in ASP.NET 10 Minimal API

app.MapPost("/api/orders", async (
    OrderRequest req,
    IConnectionMultiplexer redis,
    OrderService orderService) =>
{
    var db = redis.GetDatabase();
    await using var lockObj = new RedisDistributedLock(
        db, $"order:product:{req.ProductId}", TimeSpan.FromSeconds(30));

    if (!await lockObj.AcquireAsync(TimeSpan.FromSeconds(5)))
        return Results.Conflict("Product is being processed by another request");

    var result = await orderService.PlaceOrderAsync(req);
    return Results.Ok(result);
});

5.3 Redlock with RedLock.net

// Namespaces assumed: RedLockNet, RedLockNet.SERedis,
// RedLockNet.SERedis.Configuration, System.Net

// Register in DI container
builder.Services.AddSingleton<IDistributedLockFactory>(sp =>
{
    var endpoints = new List<RedLockEndPoint>
    {
        new DnsEndPoint("redis-1.internal", 6379),
        new DnsEndPoint("redis-2.internal", 6379),
        new DnsEndPoint("redis-3.internal", 6379),
        new DnsEndPoint("redis-4.internal", 6379),
        new DnsEndPoint("redis-5.internal", 6379),
    };
    return RedLockFactory.Create(endpoints);
});

// Usage
app.MapPost("/api/payments/{id}/process", async (
    string id,
    IDistributedLockFactory lockFactory,
    PaymentService paymentService) =>
{
    var resource = $"payment:{id}";
    var expiry = TimeSpan.FromSeconds(30);
    var wait = TimeSpan.FromSeconds(10);
    var retry = TimeSpan.FromMilliseconds(200);

    await using var redLock = await lockFactory.CreateLockAsync(
        resource, expiry, wait, retry);

    if (!redLock.IsAcquired)
        return Results.Conflict("Payment is already being processed");

    var result = await paymentService.ProcessAsync(id);
    return Results.Ok(result);
});

6. PostgreSQL Advisory Lock — An Alternative Approach

If your system already uses PostgreSQL, you can leverage Advisory Locks without adding Redis as a dependency:

using Npgsql;

public class PostgresAdvisoryLock
{
    private readonly NpgsqlConnection _conn;
    private readonly long _lockId;

    public PostgresAdvisoryLock(NpgsqlConnection conn, string resource)
    {
        _conn = conn;
        // string.GetHashCode() is randomized per process in modern .NET,
        // so two instances would derive DIFFERENT ids for the same resource.
        // Derive a stable 64-bit id instead (FNV-1a).
        _lockId = StableHash(resource);
    }

    private static long StableHash(string input)
    {
        unchecked
        {
            ulong hash = 14695981039346656037UL;  // FNV-1a offset basis
            foreach (char c in input)
            {
                hash ^= c;
                hash *= 1099511628211UL;          // FNV-1a prime
            }
            return (long)hash;
        }
    }

    public async Task<bool> TryAcquireAsync(CancellationToken ct = default)
    {
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_try_advisory_lock(@id)", _conn);
        cmd.Parameters.AddWithValue("id", _lockId);
        return (bool)(await cmd.ExecuteScalarAsync(ct))!;
    }

    public async Task ReleaseAsync()
    {
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_advisory_unlock(@id)", _conn);
        cmd.Parameters.AddWithValue("id", _lockId);
        await cmd.ExecuteNonQueryAsync();
    }
}

When to Use Advisory Locks Instead of Redis?

Use when: Simple system, already running PostgreSQL, don't want to add Redis as a dependency.
Avoid when: Cross-database locking needed, ultra-low latency required (<1ms), or system has thousands of concurrent locks (Advisory Locks consume PostgreSQL shared memory).

7. Fencing Tokens — Solving the Split-brain Problem

Even with Redlock, there's a possibility that 2 clients believe they hold the lock (due to GC pauses, network delays). Fencing tokens provide the last line of defense.

sequenceDiagram
    participant C1 as Client 1
    participant LS as Lock Service
    participant DB as Database
    participant C2 as Client 2

    C1->>LS: Acquire lock
    LS-->>C1: Lock granted, token=33
    Note over C1: GC pause for 30 seconds...
    C2->>LS: Acquire lock (lock expired)
    LS-->>C2: Lock granted, token=34
    C2->>DB: WRITE (fencing_token=34) ✓
    Note over C1: GC pause ends
    C1->>DB: WRITE (fencing_token=33)
    DB-->>C1: REJECTED! token 33 < 34
    Note over DB: Database rejects write with stale token
Fencing tokens protect data integrity even when lock safety is violated

public class FencedDistributedLock
{
    public long FencingToken { get; private set; }

    public async Task<bool> AcquireAsync(IDatabase db, string resource)
    {
        // The token must be monotonic ACROSS processes, so it is issued by
        // Redis itself (INCR is atomic) — a per-process static counter would
        // let two machines hand out the same token.
        var token = await db.StringIncrementAsync("lock:fencing:counter");
        var value = $"{Environment.MachineName}:{token}";

        var acquired = await db.StringSetAsync(
            $"lock:{resource}", value,
            TimeSpan.FromSeconds(30), When.NotExists);

        if (acquired)
        {
            FencingToken = token;
            return true;
        }
        return false;
    }
}

// In the repository layer (_db is assumed to be an IDbConnection used with Dapper)
public async Task UpdateInventoryAsync(
    int productId, int quantity, long fencingToken)
{
    var rows = await _db.ExecuteAsync("""
        UPDATE Inventory
        SET Stock = Stock - @quantity, FencingToken = @token
        WHERE ProductId = @productId
          AND FencingToken < @token
        """,
        new { quantity, token = fencingToken, productId });

    if (rows == 0)
        throw new StaleTokenException("Fencing token rejected");
}
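The WHERE FencingToken < @token guard can be modeled in memory to replay the diagram above — each resource remembers the highest token that has written it, and anything older is refused (an illustrative sketch, not the repository code):

```csharp
using System;
using System.Collections.Generic;

// Per-resource high-water mark, playing the role of the FencingToken column.
var highestToken = new Dictionary<int, long>();

bool TryWrite(int productId, long fencingToken)
{
    long current = highestToken.GetValueOrDefault(productId);
    if (fencingToken <= current) return false; // stale token -> write rejected
    highestToken[productId] = fencingToken;
    return true;
}

// Replay of the diagram: client 2 (token 34) writes first, then client 1
// wakes from its GC pause and tries to write with the stale token 33.
bool write34 = TryWrite(productId: 1, fencingToken: 34); // accepted
bool write33 = TryWrite(productId: 1, fencingToken: 33); // rejected

Console.WriteLine($"token 34 accepted={write34}, token 33 accepted={write33}");
// token 34 accepted=True, token 33 accepted=False
```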

8. Performance Comparison Across Solutions

| Criteria | Redis SET NX | Redlock (5 nodes) | PostgreSQL Advisory | ZooKeeper |
| --- | --- | --- | --- | --- |
| Acquire latency | ~0.5ms | ~3-5ms | ~1-2ms | ~5-10ms |
| Throughput | ~100K ops/s | ~20K ops/s | ~50K ops/s | ~10K ops/s |
| Safety level | Medium | High | High | Very High |
| Fault tolerance | Low (single point) | High (N/2+1) | Depends on DB replication | High (quorum) |
| Additional dependency | Redis | 5 Redis instances | None (uses existing DB) | ZooKeeper cluster |
| Complexity | Simple | Medium | Simple | Complex |
| Auto-release on crash | Yes (TTL) | Yes (TTL) | Yes (session end) | Yes (ephemeral node) |
| Best for | Cache stampede, rate limit | Payment, inventory | Cron jobs, batch | Leader election |

9. Anti-patterns and Common Mistakes

Anti-pattern 1: Lock Without TTL

// WRONG: If process crashes, lock is never released
await db.StringSetAsync("lock:order", "1", when: When.NotExists);
// ... process crashes here → permanent deadlock

// CORRECT: Always set expiry
await db.StringSetAsync("lock:order", "1",
    TimeSpan.FromSeconds(30), When.NotExists);

Anti-pattern 2: Releasing Without Owner Check

// WRONG: May delete another client's lock
await db.KeyDeleteAsync("lock:order");

// CORRECT: Use Lua script to verify owner before delete
const string script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    end
    return 0
    """;
await db.ScriptEvaluateAsync(script, ...);

Anti-pattern 3: TTL Too Short

// WRONG: TTL 2s but operation may take 5s
await using var lockObj = new RedisDistributedLock(
    db, "report", TimeSpan.FromSeconds(2));
await GenerateReportAsync(); // takes 5s → lock expires mid-operation!

// CORRECT: TTL must exceed worst-case execution time
// Combine with lock extension (renewal) if needed
await using var lockObj = new RedisDistributedLock(
    db, "report", TimeSpan.FromSeconds(60));

Anti-pattern 4: Retry Without Backoff

// WRONG: Tight loop causes CPU spike and Redis overload
while (!acquired) { acquired = await TryAcquire(); }

// CORRECT: Exponential backoff + jitter
var delay = 50;
while (!acquired && DateTime.UtcNow < deadline)
{
    acquired = await TryAcquire();
    if (!acquired)
    {
        var jitter = Random.Shared.Next(0, delay / 2);
        await Task.Delay(delay + jitter, ct);
        delay = Math.Min(delay * 2, 1000);
    }
}
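Stripped of the random jitter, the delay schedule of the corrected loop is deterministic, which makes the doubling and the cap easy to inspect (the 50 ms start and 1000 ms cap match the snippet above):

```csharp
using System;
using System.Collections.Generic;

// The deterministic part of the schedule: double each attempt, cap at 1s.
static IEnumerable<int> BackoffDelays(int initialMs, int capMs, int attempts)
{
    int delay = initialMs;
    for (int i = 0; i < attempts; i++)
    {
        yield return delay;
        delay = Math.Min(delay * 2, capMs);
    }
}

Console.WriteLine(string.Join(", ", BackoffDelays(50, 1000, 7)));
// 50, 100, 200, 400, 800, 1000, 1000
```

The jitter added on top of each value spreads concurrent retries apart, so clients that failed at the same instant don't all hit Redis again at the same instant.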

10. Production-ready Patterns

10.1 Lock Extension (Auto-renewal)

When an operation may run longer than the TTL, you need automatic lock renewal:

public class AutoRenewingLock : IAsyncDisposable
{
    private readonly IDatabase _db;
    private readonly string _key;
    private readonly string _value;
    private readonly CancellationTokenSource _renewCts = new();
    private Task? _renewTask;

    public AutoRenewingLock(IDatabase db, string resource)
    {
        _db = db;
        _key = $"lock:{resource}";
        _value = Guid.NewGuid().ToString("N");
    }

    public async Task<bool> AcquireAsync(TimeSpan ttl)
    {
        var acquired = await _db.StringSetAsync(
            _key, _value, ttl, When.NotExists);

        if (acquired)
        {
            _renewTask = RenewLoopAsync(ttl, _renewCts.Token);
        }
        return acquired;
    }

    private async Task RenewLoopAsync(TimeSpan ttl, CancellationToken ct)
    {
        // Renew well before expiry; ttl/3 leaves two chances per TTL window.
        var renewInterval = ttl / 3;
        while (!ct.IsCancellationRequested)
        {
            try
            {
                await Task.Delay(renewInterval, ct);
            }
            catch (OperationCanceledException)
            {
                break; // disposed — stop renewing
            }

            const string script = """
                if redis.call('get', KEYS[1]) == ARGV[1] then
                    return redis.call('pexpire', KEYS[1], ARGV[2])
                end
                return 0
                """;
            await _db.ScriptEvaluateAsync(script,
                [(RedisKey)_key],
                [(RedisValue)_value,
                 (RedisValue)((long)ttl.TotalMilliseconds)]);
        }
    }

    public async ValueTask DisposeAsync()
    {
        await _renewCts.CancelAsync();
        if (_renewTask != null) await _renewTask;
        // Release the lock with the owner-checked Lua delete (section 5.1).
    }
}

10.2 Lock with Observability

using System.Diagnostics;
using System.Diagnostics.Metrics;

public class ObservableDistributedLock
{
    private static readonly Meter Meter = new("DistributedLock");
    private static readonly Counter<long> AcquiredCounter =
        Meter.CreateCounter<long>("lock.acquired");
    private static readonly Counter<long> FailedCounter =
        Meter.CreateCounter<long>("lock.failed");
    private static readonly Histogram<double> AcquireLatency =
        Meter.CreateHistogram<double>("lock.acquire.duration.ms");

    public async Task<bool> AcquireAsync(string resource, TimeSpan expiry)
    {
        var sw = Stopwatch.StartNew();
        var acquired = await InternalAcquireAsync(resource, expiry); // SET NX EX acquire, as in section 5.1
        sw.Stop();

        AcquireLatency.Record(sw.Elapsed.TotalMilliseconds,
            new("resource", resource));

        if (acquired)
            AcquiredCounter.Add(1, new("resource", resource));
        else
            FailedCounter.Add(1, new("resource", resource));

        return acquired;
    }
}
graph LR
    A[Request arrives] --> B{Acquire Lock}
    B -->|Success| C[Execute Critical Section]
    B -->|Failed after retry| D[Return 409 Conflict]
    C --> E{Operation OK?}
    E -->|Yes| F[Release Lock]
    E -->|Error| G[Release Lock + Rollback]
    F --> H[Return 200 OK]
    G --> I[Return 500 Error]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#e94560,stroke:#fff,color:#fff
    style D fill:#ff9800,stroke:#fff,color:#fff
    style H fill:#4CAF50,stroke:#fff,color:#fff
    style I fill:#f44336,stroke:#fff,color:#fff
Production flow for handling requests with distributed locks

Pre-deployment Checklist for Distributed Locks

✅ Lock has TTL — prevents deadlock on process crash
✅ Release checks owner — prevents deleting another client's lock
✅ Retry uses exponential backoff + jitter — prevents thundering herd
✅ Fencing tokens for critical data — protects data integrity
✅ Metrics & alerting — monitors lock contention and latency
✅ TTL > worst-case execution time + buffer — or use auto-renewal
✅ Graceful degradation — system still works (possibly slower) when lock service is down

Distributed Locking is one of the most important building blocks in distributed systems. Understanding the mechanisms, trade-offs between solutions, and applying the right patterns will help you build reliable systems that avoid subtle race conditions — the kind that only surface under heavy production load.
