Distributed Locking — Solving Race Conditions in Distributed Systems with Redis and .NET 10
Posted on: 4/25/2026 2:12:07 AM
Table of contents
- 1. Race Conditions in Distributed Systems
- 2. What is a Distributed Lock and Why Do You Need One?
- 3. Redis Distributed Lock — From SETNX to Redlock
- 4. The Redlock Algorithm in Detail
- 5. Implementing Distributed Locks with .NET 10
- 6. PostgreSQL Advisory Lock — An Alternative Approach
- 7. Fencing Tokens — Solving the Split-brain Problem
- 8. Performance Comparison Across Solutions
- 9. Anti-patterns and Common Mistakes
- 10. Production-ready Patterns
1. Race Conditions in Distributed Systems
Imagine your e-commerce system has 3 instances running concurrently. A product has only 1 unit left in stock, but 2 purchase requests arrive simultaneously on 2 different instances. Both read stock = 1, both decrement to stock = 0, and you end up selling 2 units when you only had 1.
This is a race condition — and it happens far more often than you'd think in distributed environments.
sequenceDiagram
participant I1 as Instance 1
participant DB as Database
participant I2 as Instance 2
I1->>DB: SELECT stock WHERE id=1
I2->>DB: SELECT stock WHERE id=1
DB-->>I1: stock = 1
DB-->>I2: stock = 1
I1->>DB: UPDATE stock = 0
I2->>DB: UPDATE stock = 0
Note over DB: ⚠️ Sold 2 items, only 1 in stock!
In a monolith, you can use C#'s lock statement or Java's synchronized. But when your system is distributed across multiple processes on different servers, you need an external locking mechanism visible to all instances — that's a Distributed Lock.
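The window in this diagram can be reproduced in-process. The sketch below (plain C#, no database; an illustrative stand-in for the SELECT-then-UPDATE pair) closes the window with a compare-and-swap, the in-memory analogue of an atomic conditional UPDATE. A distributed lock generalizes the same idea across machines.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative stand-in for the inventory row in the diagram: multiple
// "instances" race to sell the last unit of stock.
public class Inventory
{
    private int _stock;
    public Inventory(int stock) => _stock = stock;
    public int Stock => Volatile.Read(ref _stock);

    // Atomic decrement-if-positive: the in-process analogue of
    // UPDATE Inventory SET Stock = Stock - 1 WHERE Stock > 0.
    // Of all concurrent callers racing on stock = 1, exactly one wins.
    public bool TrySell()
    {
        while (true)
        {
            int current = Volatile.Read(ref _stock);
            if (current <= 0) return false;   // sold out, reject the sale
            if (Interlocked.CompareExchange(ref _stock, current - 1, current) == current)
                return true;                  // our CAS won the race
        }
    }
}
```

With stock = 1 and any number of concurrent TrySell() calls, exactly one returns true, unlike the read-check-write sequence in the diagram where both callers can pass the check before either writes.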
2. What is a Distributed Lock and Why Do You Need One?
A Distributed Lock ensures that at any given time, only one process can execute a piece of code or access a specific resource, regardless of which server that process is running on.
Three Core Properties of a Distributed Lock
Safety (Mutual Exclusion): At most one client holds the lock at any time.
Liveness (Deadlock-free): Even if the lock holder crashes, the lock must eventually be released.
Fault Tolerance: The lock continues to function when parts of the system fail.
Common Use Cases
| Use Case | Description | Consequence Without Lock |
|---|---|---|
| Inventory deduction | Decrement stock on purchase | Overselling beyond available stock |
| Scheduled jobs | Run cron job once across multiple instances | Duplicate emails, incorrect calculations |
| Rate limiting | Enforce request limits in sliding window | Quota bypass, API abuse |
| Leader election | Select one node as leader for task processing | Split-brain, inconsistent data |
| Payment processing | Ensure transaction idempotency | Double charges, financial loss |
| Cache stampede prevention | Single request rebuilds cache on expiry | Database overload (thundering herd) |
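The last row is the easiest to demonstrate without infrastructure. Below is an in-process sketch of stampede protection (illustrative names, not a library API): the first caller takes a per-key gate and rebuilds; later callers re-check the cache after waiting instead of hitting the database. A distributed version would replace the SemaphoreSlim with the Redis lock described in section 3.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class SingleFlightCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();
    public int RebuildCount;   // exposed so the single-flight effect is observable

    public async Task<string> GetOrRebuildAsync(string key, Func<Task<string>> rebuild)
    {
        if (_cache.TryGetValue(key, out var hit)) return hit;

        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Re-check: another caller may have rebuilt while we waited
            if (_cache.TryGetValue(key, out hit)) return hit;
            Interlocked.Increment(ref RebuildCount);
            var value = await rebuild();   // the expensive DB hit happens once
            _cache[key] = value;
            return value;
        }
        finally { gate.Release(); }
    }
}
```

Ten concurrent callers on a cold key produce exactly one rebuild; the rest are served from the freshly populated cache.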
3. Redis Distributed Lock — From SETNX to Redlock
3.1 The Simple Approach: SET NX EX
Redis provides the atomic command SET key value NX EX seconds, which sets the key only if it doesn't already exist, with automatic expiration:
# Acquire lock
SET order:lock:12345 "instance-1-uuid" NX EX 30
# Release lock (only release if correct owner)
# Use Lua script to ensure atomicity
EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 order:lock:12345 "instance-1-uuid"
graph TD
A[Client wants to acquire lock] --> B{SET key NX EX 30}
B -->|OK - Lock acquired| C[Execute critical section]
B -->|nil - Lock exists| D[Retry after delay]
C --> E[Release lock via Lua script]
E --> F{Value matches owner?}
F -->|Yes| G[DEL key - Lock released]
F -->|No| H[Skip - Lock belongs to another client]
D --> B
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#e94560,stroke:#fff,color:#fff
style G fill:#4CAF50,stroke:#fff,color:#fff
style H fill:#ff9800,stroke:#fff,color:#fff
⚠️ Why Use a Lua Script to Release?
If you use two separate commands (GET then DEL), between them the lock might expire, another client acquires a new lock, and you accidentally delete their lock. Lua scripts run atomically on Redis, completely eliminating this risk.
3.2 Limitations of Single-node Redis Lock
This approach has a critical flaw when the Redis master fails:
- Client A acquires lock on Redis master
- Redis master crashes before replicating the lock to the replica
- Replica gets promoted to new master
- Client B acquires the same lock on the new master — mutual exclusion violated!
This single-node failure mode is why Salvatore Sanfilippo (the creator of Redis) proposed the Redlock algorithm, and why Martin Kleppmann (author of "Designing Data-Intensive Applications") later critiqued Redlock itself; the debate is covered in the next section.
4. The Redlock Algorithm in Detail
Redlock uses N independent Redis nodes (recommended N=5) to ensure safety even when some nodes fail.
sequenceDiagram
participant C as Client
participant R1 as Redis Node 1
participant R2 as Redis Node 2
participant R3 as Redis Node 3
participant R4 as Redis Node 4
participant R5 as Redis Node 5
Note over C: Step 1: Record timestamp T1
C->>R1: SET lock NX EX 30
R1-->>C: OK ✓
C->>R2: SET lock NX EX 30
R2-->>C: OK ✓
C->>R3: SET lock NX EX 30
R3-->>C: FAIL ✗
C->>R4: SET lock NX EX 30
R4-->>C: OK ✓
C->>R5: SET lock NX EX 30
R5-->>C: OK ✓
Note over C: Step 2: Record timestamp T2
Note over C: Acquired 4/5 nodes ≥ quorum (3)
Note over C: Lock validity = 30s - (T2-T1)
Note over C: If validity > 0 → Lock success!
Detailed Redlock Steps
- Record start time T1
- Sequentially acquire lock on all N Redis nodes, each with a small timeout (a few ms) to avoid blocking if a node is down
- Calculate elapsed time elapsed = T2 - T1
- Lock succeeds if: acquired on ≥ N/2 + 1 nodes AND elapsed < lock TTL
- Actual lock validity = TTL - elapsed (lock expires sooner due to acquisition time)
- If failed, release lock on all nodes (including failed ones) for cleanup
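The quorum and validity arithmetic in steps 3-5 is worth isolating. The helper below is illustrative, not part of any library; it also exposes the small clock-drift allowance that the original algorithm subtracts, as a parameter:

```csharp
using System;

public static class RedlockMath
{
    // Decides whether an acquisition round succeeded and how long the
    // resulting lock is actually valid; mirrors steps 3-5 above.
    public static (bool Success, TimeSpan Validity) Evaluate(
        int totalNodes, int acquiredNodes,
        TimeSpan ttl, TimeSpan elapsed, TimeSpan clockDrift = default)
    {
        int quorum = totalNodes / 2 + 1;                // e.g. 3 of 5 nodes
        var validity = ttl - elapsed - clockDrift;      // lock shrinks by acquisition time
        bool success = acquiredNodes >= quorum && validity > TimeSpan.Zero;
        return (success, success ? validity : TimeSpan.Zero);
    }
}
```

For the sequence above (4 of 5 nodes in 50 ms against a 30 s TTL) this yields success with roughly 29.95 s of validity; 2 of 5 nodes, or an acquisition that takes longer than the TTL, fails.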
The Kleppmann vs Sanfilippo Debate
Martin Kleppmann argued that Redlock isn't safe because GC pauses or clock skew can lead a client to believe it still holds the lock after expiration. Salvatore countered that with reasonable clock synchronization (NTP), Redlock is safe enough for most use cases. The pragmatic solution: combine Redlock with fencing tokens (section 7).
5. Implementing Distributed Locks with .NET 10
5.1 Using StackExchange.Redis
using StackExchange.Redis;
public class RedisDistributedLock : IAsyncDisposable
{
private readonly IDatabase _db;
private readonly string _lockKey;
private readonly string _lockValue;
private readonly TimeSpan _expiry;
private bool _acquired;
public RedisDistributedLock(IDatabase db, string resource, TimeSpan expiry)
{
_db = db;
_lockKey = $"lock:{resource}";
_lockValue = Guid.NewGuid().ToString("N");
_expiry = expiry;
}
public async Task<bool> AcquireAsync(TimeSpan timeout, CancellationToken ct = default)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
_acquired = await _db.StringSetAsync(
_lockKey, _lockValue, _expiry, When.NotExists);
if (_acquired) return true;
await Task.Delay(50, ct);
}
return false;
}
public async ValueTask DisposeAsync()
{
if (!_acquired) return;
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('del', KEYS[1])
else
return 0
end
""";
await _db.ScriptEvaluateAsync(script,
[new RedisKey(_lockKey)],
[new RedisValue(_lockValue)]);
}
}
5.2 Usage in ASP.NET 10 Minimal API
app.MapPost("/api/orders", async (
OrderRequest req,
IConnectionMultiplexer redis,
OrderService orderService) =>
{
var db = redis.GetDatabase();
await using var lockObj = new RedisDistributedLock(
db, $"order:product:{req.ProductId}", TimeSpan.FromSeconds(30));
if (!await lockObj.AcquireAsync(TimeSpan.FromSeconds(5)))
return Results.Conflict("Product is being processed by another request");
var result = await orderService.PlaceOrderAsync(req);
return Results.Ok(result);
});
5.3 Redlock with RedLock.net
// Register in DI container
builder.Services.AddSingleton<IDistributedLockFactory>(sp =>
{
var endpoints = new List<RedLockEndPoint>
{
new DnsEndPoint("redis-1.internal", 6379),
new DnsEndPoint("redis-2.internal", 6379),
new DnsEndPoint("redis-3.internal", 6379),
new DnsEndPoint("redis-4.internal", 6379),
new DnsEndPoint("redis-5.internal", 6379),
};
return RedLockFactory.Create(endpoints);
});
// Usage
app.MapPost("/api/payments/{id}/process", async (
string id,
IDistributedLockFactory lockFactory,
PaymentService paymentService) =>
{
var resource = $"payment:{id}";
var expiry = TimeSpan.FromSeconds(30);
var wait = TimeSpan.FromSeconds(10);
var retry = TimeSpan.FromMilliseconds(200);
await using var redLock = await lockFactory.CreateLockAsync(
resource, expiry, wait, retry);
if (!redLock.IsAcquired)
return Results.Conflict("Payment is already being processed");
var result = await paymentService.ProcessAsync(id);
return Results.Ok(result);
});
6. PostgreSQL Advisory Lock — An Alternative Approach
If your system already uses PostgreSQL, you can leverage Advisory Locks without adding Redis as a dependency:
public class PostgresAdvisoryLock
{
// Note: session-level advisory locks are tied to the connection;
// keep _conn open for as long as the lock is held
private readonly NpgsqlConnection _conn;
private readonly long _lockId;
public PostgresAdvisoryLock(NpgsqlConnection conn, string resource)
{
_conn = conn;
// string.GetHashCode() is randomized per process in .NET, so two
// instances would derive different ids for the same resource;
// use a deterministic hash (FNV-1a here) instead
_lockId = StableHash(resource);
}
private static long StableHash(string resource)
{
ulong hash = 14695981039346656037UL;            // FNV-1a offset basis
foreach (var b in System.Text.Encoding.UTF8.GetBytes(resource))
{
hash = (hash ^ b) * 1099511628211UL;            // FNV-1a prime
}
return unchecked((long)hash);
}
public async Task<bool> TryAcquireAsync(CancellationToken ct = default)
{
await using var cmd = new NpgsqlCommand(
"SELECT pg_try_advisory_lock(@id)", _conn);
cmd.Parameters.AddWithValue("id", _lockId);
return (bool)(await cmd.ExecuteScalarAsync(ct))!;
}
public async Task ReleaseAsync()
{
await using var cmd = new NpgsqlCommand(
"SELECT pg_advisory_unlock(@id)", _conn);
cmd.Parameters.AddWithValue("id", _lockId);
await cmd.ExecuteNonQueryAsync();
}
}
When to Use Advisory Locks Instead of Redis?
Use when: Simple system, already running PostgreSQL, don't want to add Redis as a dependency.
Avoid when: Cross-database locking needed, ultra-low latency required (<1ms), or system has thousands of concurrent locks (Advisory Locks consume PostgreSQL shared memory).
7. Fencing Tokens — Solving the Split-brain Problem
Even with Redlock, two clients can end up simultaneously believing they hold the lock (GC pauses, network delays). Fencing tokens provide the last line of defense.
sequenceDiagram
participant C1 as Client 1
participant LS as Lock Service
participant DB as Database
participant C2 as Client 2
C1->>LS: Acquire lock
LS-->>C1: Lock granted, token=33
Note over C1: GC pause for 30 seconds...
C2->>LS: Acquire lock (lock expired)
LS-->>C2: Lock granted, token=34
C2->>DB: WRITE (fencing_token=34) ✓
Note over C1: GC pause ends
C1->>DB: WRITE (fencing_token=33)
DB-->>C1: REJECTED! token 33 < 34
Note over DB: Database rejects write with stale token
public class FencedDistributedLock
{
public long FencingToken { get; private set; }
public async Task<bool> AcquireAsync(IDatabase db, string resource)
{
// The token must be monotonic across ALL instances, so it comes from
// a shared Redis counter; a process-local static counter could not
// order tokens issued by different machines
var token = await db.StringIncrementAsync("fencing:counter");
var value = $"{Environment.MachineName}:{token}";
var acquired = await db.StringSetAsync(
$"lock:{resource}", value,
TimeSpan.FromSeconds(30), When.NotExists);
if (acquired)
{
FencingToken = token;
return true;
}
return false;
}
}
// In the repository layer
public async Task UpdateInventoryAsync(
int productId, int quantity, long fencingToken)
{
var rows = await _db.ExecuteAsync("""
UPDATE Inventory
SET Stock = Stock - @quantity, FencingToken = @token
WHERE ProductId = @productId
AND FencingToken < @token
""",
new { quantity, token = fencingToken, productId });
if (rows == 0)
throw new StaleTokenException("Fencing token rejected");
}
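The rule the UPDATE enforces (apply a write only when its token exceeds the last one recorded) can be stated as a pure function, with hypothetical names for illustration, which makes the diagram's outcome easy to replay:

```csharp
public class FencedResource
{
    private long _lastToken;

    // Mirrors WHERE FencingToken < @token: accept strictly increasing
    // tokens; reject stale ones even if they arrive later in wall-clock
    // time (e.g. Client 1 waking from its GC pause).
    public bool TryWrite(long token)
    {
        if (token <= _lastToken) return false;   // stale holder rejected
        _lastToken = token;
        return true;
    }
}
```

Replaying the sequence diagram: Client 2's write with token 34 is accepted, then Client 1's delayed write with token 33 is rejected.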
8. Performance Comparison Across Solutions
| Criteria | Redis SET NX | Redlock (5 nodes) | PostgreSQL Advisory | ZooKeeper |
|---|---|---|---|---|
| Acquire latency | ~0.5ms | ~3-5ms | ~1-2ms | ~5-10ms |
| Throughput | ~100K ops/s | ~20K ops/s | ~50K ops/s | ~10K ops/s |
| Safety level | Medium | High | High | Very High |
| Fault tolerance | Low (single point) | High (N/2+1) | Depends on DB replication | High (quorum) |
| Additional dependency | Redis | 5 Redis instances | None (uses existing DB) | ZooKeeper cluster |
| Complexity | Simple | Medium | Simple | Complex |
| Auto-release on crash | Yes (TTL) | Yes (TTL) | Yes (session end) | Yes (ephemeral node) |
| Best for | Cache stampede, rate limit | Payment, inventory | Cron jobs, batch | Leader election |
9. Anti-patterns and Common Mistakes
Anti-pattern 1: Lock Without TTL
// WRONG: If process crashes, lock is never released
await db.StringSetAsync("lock:order", "1", when: When.NotExists);
// ... process crashes here → permanent deadlock
// CORRECT: Always set expiry
await db.StringSetAsync("lock:order", "1",
TimeSpan.FromSeconds(30), When.NotExists);
Anti-pattern 2: Releasing Without Owner Check
// WRONG: May delete another client's lock
await db.KeyDeleteAsync("lock:order");
// CORRECT: Use Lua script to verify owner before delete
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('del', KEYS[1])
end
return 0
""";
await db.ScriptEvaluateAsync(script, ...);
Anti-pattern 3: TTL Too Short
// WRONG: TTL 2s but operation may take 5s
await using var lockObj = new RedisDistributedLock(
db, "report", TimeSpan.FromSeconds(2));
await GenerateReportAsync(); // takes 5s → lock expires mid-operation!
// CORRECT: TTL must exceed worst-case execution time
// Combine with lock extension (renewal) if needed
await using var lockObj = new RedisDistributedLock(
db, "report", TimeSpan.FromSeconds(60));
Anti-pattern 4: Retry Without Backoff
// WRONG: Tight loop causes CPU spike and Redis overload
while (!acquired) { acquired = await TryAcquire(); }
// CORRECT: Exponential backoff + jitter
var delay = 50;
while (!acquired && DateTime.UtcNow < deadline)
{
acquired = await TryAcquire();
if (!acquired)
{
var jitter = Random.Shared.Next(0, delay / 2);
await Task.Delay(delay + jitter, ct);
delay = Math.Min(delay * 2, 1000);
}
}
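For completeness, the delay schedule the corrected loop produces before jitter can be extracted into a small testable helper (illustrative, using the same constants as above):

```csharp
using System;
using System.Collections.Generic;

public static class Backoff
{
    // Capped exponential schedule: 50, 100, 200, 400, 800, 1000, 1000, ...
    // Jitter (a random 0..delay/2 addition) is applied by the caller.
    public static IEnumerable<int> DelaysMs(int initialMs = 50, int capMs = 1000)
    {
        int delay = initialMs;
        while (true)
        {
            yield return delay;
            delay = Math.Min(delay * 2, capMs);
        }
    }
}
```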
10. Production-ready Patterns
10.1 Lock Extension (Auto-renewal)
When an operation may run longer than the TTL, you need automatic lock renewal:
public class AutoRenewingLock : IAsyncDisposable
{
private readonly IDatabase _db;
private readonly string _key;
private readonly string _value;
private readonly CancellationTokenSource _renewCts = new();
private Task? _renewTask;
public AutoRenewingLock(IDatabase db, string resource)
{
_db = db;
_key = $"lock:{resource}";
_value = Guid.NewGuid().ToString("N");
}
public async Task<bool> AcquireAsync(TimeSpan ttl)
{
var acquired = await _db.StringSetAsync(
_key, _value, ttl, When.NotExists);
if (acquired)
{
_renewTask = RenewLoopAsync(ttl, _renewCts.Token);
}
return acquired;
}
private async Task RenewLoopAsync(TimeSpan ttl, CancellationToken ct)
{
var renewInterval = ttl / 3;
while (!ct.IsCancellationRequested)
{
await Task.Delay(renewInterval, ct);
const string script = """
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('pexpire', KEYS[1], ARGV[2])
end
return 0
""";
await _db.ScriptEvaluateAsync(script,
[new RedisKey(_key)],
[new RedisValue(_value),
new RedisValue(((int)ttl.TotalMilliseconds).ToString())]);
}
}
public async ValueTask DisposeAsync()
{
await _renewCts.CancelAsync();
try
{
if (_renewTask != null) await _renewTask;
}
catch (OperationCanceledException)
{
// Expected: the renew loop exits via Task.Delay cancellation
}
// Release lock...
}
}
10.2 Lock with Observability
public class ObservableDistributedLock
{
private static readonly Meter Meter = new("DistributedLock");
private static readonly Counter<long> AcquiredCounter =
Meter.CreateCounter<long>("lock.acquired");
private static readonly Counter<long> FailedCounter =
Meter.CreateCounter<long>("lock.failed");
private static readonly Histogram<double> AcquireLatency =
Meter.CreateHistogram<double>("lock.acquire.duration.ms");
public async Task<bool> AcquireAsync(string resource, TimeSpan expiry)
{
var sw = Stopwatch.StartNew();
var acquired = await InternalAcquireAsync(resource, expiry);
sw.Stop();
AcquireLatency.Record(sw.Elapsed.TotalMilliseconds,
new("resource", resource));
if (acquired)
AcquiredCounter.Add(1, new("resource", resource));
else
FailedCounter.Add(1, new("resource", resource));
return acquired;
}
}
graph LR
A[Request arrives] --> B{Acquire Lock}
B -->|Success| C[Execute Critical Section]
B -->|Failed after retry| D[Return 409 Conflict]
C --> E{Operation OK?}
E -->|Yes| F[Release Lock]
E -->|Error| G[Release Lock + Rollback]
F --> H[Return 200 OK]
G --> I[Return 500 Error]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#e94560,stroke:#fff,color:#fff
style D fill:#ff9800,stroke:#fff,color:#fff
style H fill:#4CAF50,stroke:#fff,color:#fff
style I fill:#f44336,stroke:#fff,color:#fff
Pre-deployment Checklist for Distributed Locks
✅ Lock has TTL — prevents deadlock on process crash
✅ Release checks owner — prevents deleting another client's lock
✅ Retry uses exponential backoff + jitter — prevents thundering herd
✅ Fencing tokens for critical data — protects data integrity
✅ Metrics & alerting — monitors lock contention and latency
✅ TTL > worst-case execution time + buffer — or use auto-renewal
✅ Graceful degradation — system still works (possibly slower) when lock service is down
Distributed Locking is one of the most important building blocks in distributed systems. Understanding the mechanisms, trade-offs between solutions, and applying the right patterns will help you build reliable systems that avoid subtle race conditions — the kind that only surface under heavy production load.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.