Design a Distributed Rate Limiter (Token Bucket on Redis)
End-to-end design of a distributed rate limiter: token bucket vs sliding window vs fixed window, Lua atomic scripts on Redis, and the .NET wrapper used by every service.
Table of contents
- When does a distributed limiter become necessary?
- What numbers should I budget for?
- What does the architecture look like?
- What is the .NET 10 implementation?
- How does this compose with the other building blocks?
- What failure modes does this introduce?
- When is the in-memory limiter still the right answer?
- Where should you go from here?
The built-in ASP.NET Core rate limiter from chapter 14 is in-memory; it counts only what one instance sees. The case study here is the distributed version - the one that gives honest enforcement across a fleet of replicas. The interview asks for it; production needs it. The answer is one Redis Lua script and a thin .NET wrapper.
When does a distributed limiter become necessary?
Three signals.
- More than one replica. Three boxes allowing 100 RPS each collectively allow 300 RPS. If your real intent was 100 RPS, the in-memory limiter has lied to you.
- Cross-process state matters. A login endpoint allows 5 attempts per IP per 15 minutes. If the attacker's requests bounce between replicas, each replica sees only a fraction of them; the limit is not enforced.
- Multi-service consistency. The same user must not exceed 1000 calls/min total across web, mobile, and partner integrations. Each entry point is a separate service; they need a shared counter.
What numbers should I budget for?
| Algorithm | Memory per key | Accuracy | CPU per check |
| --- | --- | --- | --- |
| Fixed window counter | 16 bytes | edge bursts | O(1) Lua |
| Sliding window log | ~24 bytes per request | exact | O(N), N = stored requests |
| Sliding window counter | 32 bytes | ~95% | O(1) Lua |
| Token bucket | 24 bytes | smooth | O(1) Lua |
Token bucket and sliding-window counter are the practical defaults. Both fit one Lua script under 30 lines. Sliding-window log gives the most accurate enforcement but at the cost of growing memory under load.
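The sliding-window counter's ~95% accuracy comes from a simple interpolation: weight the previous fixed window's count by how much of it still overlaps the sliding window. A minimal sketch of that arithmetic (Python, independent of Redis; the function name is illustrative):

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed_in_window: float, window: float) -> float:
    """Estimate requests in the last `window` seconds from two fixed-window
    counters: the previous window's total and the current running count."""
    overlap = 1.0 - (elapsed_in_window / window)  # fraction of previous window still in scope
    return prev_count * overlap + curr_count

# 24 s into a 60 s window: 60% of the previous window still counts.
estimate = sliding_window_estimate(prev_count=100, curr_count=30,
                                   elapsed_in_window=24, window=60)
# 100 * 0.6 + 30 = 90
```

The estimate assumes the previous window's traffic was evenly spread, which is where the small inaccuracy comes from.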
What does the architecture look like?
flowchart LR
App1[ASP.NET Core 1] -->|EVAL Lua| Redis[(Redis)]
App2[ASP.NET Core 2] -->|EVAL Lua| Redis
App3[ASP.NET Core 3] -->|EVAL Lua| Redis
Redis -->|allowed/denied| App1
Redis -->|allowed/denied| App2
Redis -->|allowed/denied| App3
Every replica runs the same Lua script against the same Redis. The script is the source of truth for the counter; the .NET code is a typed wrapper.
What is the .NET 10 implementation?
Token bucket as the default. The Lua script:
-- KEYS[1] = bucket key
-- ARGV[1] = capacity
-- ARGV[2] = refill_rate per second
-- ARGV[3] = current unix time in seconds
-- ARGV[4] = cost (usually 1)
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local tokens = tonumber(bucket[1]) or capacity
local last_ts = tonumber(bucket[2]) or now

-- lazy refill: credit tokens for the time elapsed since the last check
local elapsed = math.max(0, now - last_ts)
tokens = math.min(capacity, tokens + elapsed * refill)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

-- persist state either way; expire idle keys after two full refill periods
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill) * 2)
return allowed
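The refill arithmetic is easy to sanity-check outside Redis. A minimal in-memory model of the same logic in Python (the class name is illustrative; the Lua script remains the source of truth):

```python
class TokenBucket:
    """In-memory model of the Lua script's math: lazy refill on each check."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = capacity  # a fresh bucket starts full, like the script
        self.last_ts = 0.0

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        self.last_ts = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

b = TokenBucket(capacity=10, refill_per_second=1)
burst = [b.try_acquire(now=0.0) for _ in range(11)]  # 10 allowed, 11th denied
later = b.try_acquire(now=5.0)                       # 5 s later: 5 tokens refilled
```

Note how the bucket absorbs a burst of `capacity` requests instantly, then throttles to the refill rate; that is the "smooth" behavior from the comparison table.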
The .NET wrapper:
public class DistributedRateLimiter(IConnectionMultiplexer redis)
{
    private const string TokenBucketScript = "..."; // the Lua above

    public async Task<bool> TryAcquireAsync(
        string key, int capacity, double refillPerSecond, int cost = 1)
    {
        var db = redis.GetDatabase();
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        // StackExchange.Redis hashes the script and prefers EVALSHA,
        // falling back to EVAL when the server replies NOSCRIPT.
        var result = (long)await db.ScriptEvaluateAsync(
            TokenBucketScript,
            new RedisKey[] { $"rl:{key}" },
            new RedisValue[] { capacity, refillPerSecond, now, cost });
        return result == 1;
    }
}
// Usage in a middleware or endpoint filter:
public class DistributedLimiterMiddleware(DistributedRateLimiter limiter) : IMiddleware
{
public async Task InvokeAsync(HttpContext ctx, RequestDelegate next)
{
var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon";
var allowed = await limiter.TryAcquireAsync($"user:{userId}", capacity: 100, refillPerSecond: 100.0 / 60);
if (!allowed)
{
ctx.Response.StatusCode = 429;
ctx.Response.Headers.RetryAfter = "1";
return;
}
await next(ctx);
}
}
Three details. Redis caches the script under its SHA-1 hash, so steady-state calls are EVALSHA, not a full EVAL that resends the source. The TTL prevents a memory leak from idle keys. The wrapper exposes one method; everything else is configuration.
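The script-cache handle is nothing magic: Redis identifies a loaded script by the SHA-1 of its source bytes, so every replica that loads the same script gets the same handle. A quick illustration of the identifier (Python, standard library only; `"return 1"` is a stand-in for the real script):

```python
import hashlib

script = "return 1"  # any Lua source; Redis caches it under its SHA-1
sha = hashlib.sha1(script.encode()).hexdigest()
# EVALSHA <sha> ... then runs the cached script without resending the source;
# SCRIPT LOAD returns exactly this digest.
```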
How does this compose with the other building blocks?
flowchart LR
Client --> CDN
CDN --> LB[Load Balancer<br/>L4 limit per IP]
LB --> Edge[ASP.NET Core]
Edge --> Mid[Limiter middleware<br/>per user/tenant]
Mid --> Endpoint[Handler]
Mid -.calls.-> Redis[(Redis)]
Endpoint --> Cache[(Redis cache)]
Endpoint --> DB[(Postgres)]
The limiter shares the same Redis instance as the cache; the two live in different key namespaces (rl: vs cache:). The middleware sits before authentication for unauthenticated paths and after it for authenticated ones, exactly like the built-in version.
What failure modes does this introduce?
- Redis outage - if Redis is down, the limiter cannot decide. Mitigation: fail open (allow all requests but page on-call) or fail closed with a circuit breaker that falls back to the in-memory limiter.
- Clock drift - the script uses the .NET process's clock. If one replica is 30 seconds ahead, it computes more refill than expected. Mitigation: take server-side time from redis.call('TIME') inside the script instead of passing ARGV[3].
- Hot key - one tenant generates so many limiter calls that a single Redis CPU saturates. Mitigation: shard the key (tenant:{id}:{shard % 4}) and check all shards.
- Script cache flush - a Redis restart or SCRIPT FLUSH empties the script cache; the first EVALSHA after that gets NOSCRIPT. Mitigation: catch the error and fall back to EVAL once (StackExchange.Redis does this for you).
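The hot-key mitigation works by spreading one logical bucket over several Redis keys so no single key (or cluster slot) absorbs all the traffic. A sketch of the key scheme (Python; the shard count of 4, the routing rule, and the key format are assumptions for illustration - one common variant gives each shard capacity/SHARDS and routes each request to a single shard instead of checking all of them):

```python
SHARDS = 4

def shard_key(tenant_id: str, request_id: int) -> str:
    """Spread one tenant's limiter traffic across SHARDS keys.
    Routing by request attribute keeps the choice deterministic,
    so retries of the same request hit the same shard."""
    shard = request_id % SHARDS  # or hash(some stable request attribute) % SHARDS
    return f"rl:tenant:{tenant_id}:{shard}"

keys = {shard_key("42", i) for i in range(100)}
# 100 requests land on exactly SHARDS distinct keys
```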
When is the in-memory limiter still the right answer?
When you have one replica or your tolerance for a 2-3x effective limit is high. A side-project deployed as a single Azure App Service instance does not need Redis. The custom limiter is for production fleets where the gap between intent and reality matters.
Where should you go from here?
Next case study: news feed design - the canonical fan-out problem, where the same hot-key trick from the URL shortener gets pushed to its limit. After that the chat realtime chapter brings WebSockets and SignalR into the picture.
Frequently asked questions
Why a custom rate limiter when ASP.NET Core has one?
The built-in limiter is in-memory: each replica counts only its own traffic, so N replicas collectively enforce N times the intended limit.
Token bucket or sliding window?
Both fit one O(1) Lua script. Token bucket admits a controlled burst up to capacity and is the default here; the sliding-window counter approximates the true rate to within about 5%.
Why must the check-and-decrement be atomic?
With separate read and write commands, two replicas can read the same token count and both decrement it, admitting more requests than the limit allows. Redis runs a Lua script as a single atomic unit, which closes that race.
How do I size the bucket?
bucket_size = peak_rps * 1 second (one second of burst tolerance). Tune up if legitimate users hit the limit - look at the rejection ratio in observability. Tune down if abuse gets through. The right number is whichever you can defend with a graph.
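The sizing rule is plain arithmetic over the two script parameters. A sketch (Python; the function name and the example numbers are illustrative):

```python
def bucket_params(limit_per_minute: float, peak_rps: float,
                  burst_seconds: float = 1.0) -> tuple[float, float]:
    """Translate a human-readable limit into the script's two knobs:
    capacity = one burst_seconds worth of peak traffic,
    refill sustains the long-run limit."""
    capacity = peak_rps * burst_seconds
    refill_per_second = limit_per_minute / 60.0
    return capacity, refill_per_second

# 100 calls/min sustained, legitimate bursts up to 20 RPS:
capacity, refill = bucket_params(limit_per_minute=100, peak_rps=20)
# capacity 20 (one second of burst), refill ~1.67 tokens/s
```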