
Design a Distributed Rate Limiter (Token Bucket on Redis)

End-to-end design of a distributed rate limiter: token bucket vs sliding window vs fixed window, atomic Lua scripts on Redis, and the .NET wrapper used by every service.

Table of contents
  1. When does a distributed limiter become necessary?
  2. What numbers should I budget for?
  3. What does the architecture look like?
  4. What is the .NET 10 implementation?
  5. How does this compose with the other building blocks?
  6. What failure modes does this introduce?
  7. When is the in-memory limiter still the right answer?
  8. Where should you go from here?

The built-in ASP.NET Core rate limiter from chapter 14 is in-memory; it counts only what one instance sees. The case study here is the distributed version - the one that gives honest enforcement across a fleet of replicas. The interview asks for it; production needs it. The answer is one Redis Lua script and a thin .NET wrapper.

When does a distributed limiter become necessary?

Three signals.

More than one replica. Three boxes with 100 RPS each collectively allow 300 RPS. If your real intent was 100 RPS, the in-memory limiter has lied to you.

Cross-process state matters. Consider a login endpoint that allows 5 attempts per IP per 15 minutes. If the attacker's requests bounce between replicas, each replica sees only a fraction of them; the limit is not enforced.

Multi-service consistency. The same user must not exceed 1000 calls/min total across web, mobile, and partner integrations. Each entry point is a separate service; they need a shared counter.

What numbers should I budget for?

Algorithm                 Memory per key       Accuracy                  CPU per check
Fixed window counter      16 bytes             bursts at window edges    O(1) Lua
Sliding window log        ~24 bytes/request    exact                     O(N), N = stored requests
Sliding window counter    32 bytes             ~95% (approximate)        O(1) Lua
Token bucket              24 bytes             smooth, burst-friendly    O(1) Lua

Token bucket and sliding-window counter are the practical defaults; each fits in one Lua script under 30 lines. The sliding-window log gives the most accurate enforcement, but its memory grows with request volume.
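For intuition, here is the sliding-window counter's estimate as a single-process C# sketch (the class name and in-memory fields are illustrative; the distributed version lives in a Lua script like the one below). It weights the previous fixed window by how much of it still overlaps the rolling window:

// Approximate a rolling window with two fixed buckets:
// estimated = previous * (1 - elapsedFraction) + current
// (single-threaded sketch for intuition; not thread-safe)
public class SlidingWindowCounter(int limit, TimeSpan window)
{
    private long _windowStart;   // tick-aligned start of the current fixed window
    private int _previousCount;  // requests in the last completed window
    private int _currentCount;   // requests so far in this window

    public bool TryAcquire(DateTimeOffset now)
    {
        long w = window.Ticks;
        long start = now.Ticks / w * w;

        if (start != _windowStart)
        {
            // Rolled over: the current window becomes the previous one,
            // unless a whole window was skipped (then it is empty).
            _previousCount = (start - _windowStart == w) ? _currentCount : 0;
            _currentCount = 0;
            _windowStart = start;
        }

        double elapsedFraction = (now.Ticks - start) / (double)w;
        double estimated = _previousCount * (1 - elapsedFraction) + _currentCount;

        if (estimated >= limit) return false;
        _currentCount++;
        return true;
    }
}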

What does the architecture look like?

flowchart LR
    App1[ASP.NET Core 1] -->|EVAL Lua| Redis[(Redis)]
    App2[ASP.NET Core 2] -->|EVAL Lua| Redis
    App3[ASP.NET Core 3] -->|EVAL Lua| Redis
    Redis -->|allowed/denied| App1
    Redis -->|allowed/denied| App2
    Redis -->|allowed/denied| App3

Every replica runs the same Lua script against the same Redis. The script is the source of truth for the counter; the .NET code is a typed wrapper.

What is the .NET 10 implementation?

Token bucket is the default. The Lua script:

-- KEYS[1] = bucket key
-- ARGV[1] = capacity
-- ARGV[2] = refill_rate per second
-- ARGV[3] = current unix time in seconds
-- ARGV[4] = cost (usually 1)

local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(bucket[1]) or tonumber(ARGV[1])
local last_ts = tonumber(bucket[2]) or tonumber(ARGV[3])

local now = tonumber(ARGV[3])
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local cost = tonumber(ARGV[4])

-- credit tokens for the time elapsed since the last call, capped at capacity
local elapsed = math.max(0, now - last_ts)
tokens = math.min(capacity, tokens + elapsed * refill)

local allowed = 0
if tokens >= cost then
    tokens = tokens - cost
    allowed = 1
end

-- persist the refreshed state even on deny so the refill credit is not lost;
-- HSET supersedes the deprecated HMSET, and the TTL reaps idle buckets
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill) * 2)
return allowed

The .NET wrapper:

using System.Security.Claims;
using StackExchange.Redis;

public class DistributedRateLimiter(IConnectionMultiplexer redis)
{
    private const string TokenBucketScript = "..."; // the Lua above

    public async Task<bool> TryAcquireAsync(
        string key, int capacity, double refillPerSecond, int cost = 1)
    {
        var db = redis.GetDatabase();
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        // StackExchange.Redis hashes the script and prefers EVALSHA once the
        // server knows it, re-sending the body only on NOSCRIPT.
        var result = (long)await db.ScriptEvaluateAsync(
            TokenBucketScript,
            new RedisKey[] { $"rl:{key}" },
            new RedisValue[] { capacity, refillPerSecond, now, cost });
        return result == 1;
    }
}

// Usage in a middleware or endpoint filter:
public class DistributedLimiterMiddleware(DistributedRateLimiter limiter) : IMiddleware
{
    public async Task InvokeAsync(HttpContext ctx, RequestDelegate next)
    {
        var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon";
        var allowed = await limiter.TryAcquireAsync($"user:{userId}", capacity: 100, refillPerSecond: 100.0 / 60);
        if (!allowed)
        {
            ctx.Response.StatusCode = 429;
            ctx.Response.Headers.RetryAfter = "1";
            return;
        }
        await next(ctx);
    }
}
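
Wiring it up is ordinary DI. A sketch of the Program.cs side, assuming StackExchange.Redis and a placeholder connection string (note that IMiddleware implementations must be registered in the container):

var builder = WebApplication.CreateBuilder(args);

// One multiplexer per process; it is designed to be shared.
builder.Services.AddSingleton<IConnectionMultiplexer>(
    _ => ConnectionMultiplexer.Connect("redis:6379"));
builder.Services.AddSingleton<DistributedRateLimiter>();
builder.Services.AddSingleton<DistributedLimiterMiddleware>();

var app = builder.Build();
app.UseMiddleware<DistributedLimiterMiddleware>();
app.MapGet("/", () => "ok");
app.Run();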

Three details. The script body travels to Redis once; after that the client sends its SHA1 (EVALSHA), so steady-state checks are a single small round trip. The TTL prevents a memory leak from idle keys. The wrapper exposes one method - everything else is configuration.

How does this compose with the other building blocks?

flowchart LR
    Client --> CDN
    CDN --> LB[Load Balancer<br/>L4 limit per IP]
    LB --> Edge[ASP.NET Core]
    Edge --> Mid[Limiter middleware<br/>per user/tenant]
    Mid --> Endpoint[Handler]
    Mid -.calls.-> Redis[(Redis)]
    Endpoint --> Cache[(Redis cache)]
    Endpoint --> DB[(Postgres)]

The limiter shares the same Redis instance as the cache - the keys simply live in different namespaces (rl: vs cache:). The middleware sits before authentication for unauthenticated paths and after it for authenticated ones, exactly like the built-in version.
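
The same limiter also works at a finer grain than the global middleware. A sketch of a per-endpoint filter for the login example from earlier - the route, key shape, and handler are illustrative:

app.MapPost("/login", () => Results.Ok()) // placeholder handler
   .AddEndpointFilter(async (ctx, next) =>
   {
       var limiter = ctx.HttpContext.RequestServices
           .GetRequiredService<DistributedRateLimiter>();
       var ip = ctx.HttpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

       // 5 attempts per IP per 15 minutes: capacity 5, refill 5/900 tokens/sec
       var allowed = await limiter.TryAcquireAsync(
           $"login:{ip}", capacity: 5, refillPerSecond: 5.0 / 900);

       return allowed ? await next(ctx) : Results.StatusCode(429);
   });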

What failure modes does this introduce?

Two, both worth naming unprompted. First, Redis is now on the hot path: every request pays an extra round trip, and if Redis is unreachable the limiter cannot answer. You must pick a policy - fail open (admit traffic unlimited) or fail closed (reject everything). For rate limiting, fail open is usually the lesser evil: the limiter protects capacity, not correctness. Second, the script trusts ARGV[3], the calling replica's clock; skew between replicas shows up as slightly over- or under-refilled buckets.
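
A minimal fail-open sketch, added as a second method on DistributedRateLimiter (the one-second timeout and blanket catch are illustrative policy choices, not the only options):

public async Task<bool> TryAcquireFailOpenAsync(
    string key, int capacity, double refillPerSecond, int cost = 1)
{
    try
    {
        // Bound the time spent waiting on Redis before giving up.
        return await TryAcquireAsync(key, capacity, refillPerSecond, cost)
            .WaitAsync(TimeSpan.FromSeconds(1));
    }
    catch (Exception) // e.g. RedisConnectionException, TimeoutException
    {
        // Redis unreachable: admit the request rather than turning a
        // limiter outage into a full outage.
        return true;
    }
}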

When is the in-memory limiter still the right answer?

When you run a single replica, or when an effective limit 2-3x your stated one is tolerable. A side project deployed as a single Azure App Service instance does not need Redis. The custom limiter is for production fleets where the gap between intent and reality matters.

Where should you go from here?

Next case study: news feed design - the canonical fan-out problem, where the same hot-key trick from the URL shortener gets pushed to its limit. After that, the realtime chat chapter brings WebSockets and SignalR into the picture.

Frequently asked questions

Why a custom rate limiter when ASP.NET Core has one?
Because the built-in limiter is in-memory (chapter 14). For a fleet of replicas you need shared state. The custom limiter is a thin wrapper around a Redis Lua script - 50 lines of code, deployed once, used by every service.
Token bucket or sliding window?
Token bucket for most cases - bursty allowances are user-friendly, and the algorithm tolerates small clock drift between the replicas that supply timestamps. Sliding window log is stricter but uses memory proportional to allowed RPS (one timestamp per request). Sliding window counter is the middle ground - an approximate sliding window built from two fixed buckets.
Why must the check-and-decrement be atomic?
Because two concurrent requests doing 'GET, check, DECR' will both pass when there is one token left and both decrement, going below zero. A Lua script in Redis runs atomically - the entire check-and-decrement is one operation from Redis's perspective. Without it, the limiter leaks under concurrency.
How do I size the bucket?
Start with bucket_size = peak_rps * 1 second (one second of burst tolerance) - an API that peaks at 200 RPS gets capacity 200 and a refill of 200 tokens/second. Tune up if legitimate users hit the limit - watch the rejection ratio in observability. Tune down if abuse gets through. The right number is whichever you can defend with a graph.