Observability & Ops · Intermediate · 5 min read

Rate Limiting in ASP.NET Core: Token, Sliding, Concurrency

How to use the built-in ASP.NET Core rate limiter: fixed window, sliding window, token bucket, concurrency. Per IP vs per user, distributed via Redis.

Table of contents
  1. When does rate limiting move from "nice to have" to "shipping it"?
  2. What numbers should I budget for the limiter tier?
  3. What does the limiter pipeline look like?
  4. What is the .NET 10 wiring with the built-in limiter?
  5. How do I make the limiter distributed across replicas?
  6. What failure modes does rate limiting introduce?
  7. When should you skip rate limiting?
  8. Where should you go from here?

The day a single misconfigured client floods your service with 10K requests per second is the day rate limiting stops being optional. This chapter wires the ASP.NET Core rate limiter, picks the right algorithm per use case, and shows how to make it distributed across replicas with Redis.

When does rate limiting move from "nice to have" to "shipping it"?

Three signals.

The service has external clients. A public API, a webhook endpoint, a public web form. Anything an attacker, a bot, or a buggy integration can hit must have a limit. Defaults: permissive on the happy path, strict on /auth/* and write endpoints.

Traffic is bursty. A flash sale spikes 100x normal load. Without a limiter, your downstream (database, payment provider, queue) absorbs the burst and may collapse. The limiter is the back-pressure valve.

Cost scales with requests. Cloud egress, third-party APIs, expensive computations. A misbehaving client can run up the bill overnight. Per-tenant limits cap the blast radius.

If the service is internal, traffic is steady, and cost is fixed, limiting is overhead. Most public-facing .NET services are none of those.

What numbers should I budget for the limiter tier?

Algorithm           Memory per key     CPU per check        Burst behaviour
Fixed window        ~16 bytes          O(1)                 allows up to 2x at window edges
Sliding window      ~64 bytes          O(1)                 smooth
Token bucket        ~24 bytes          O(1)                 allows bursts up to bucket size
Concurrency         O(N) in-flight     O(1)                 caps simultaneous requests
Redis-backed        state in Redis     ~0.5 ms network RTT  smooth, distributed

Per-key memory matters when you have many keys (per-user limits on a million-user service). Sliding window with high precision can reach 1 KB per key. Tune precision down before you tune storage up.
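
A back-of-envelope check using the figures above (the 64-byte figure is from the table, the 1 KB high-precision figure from the paragraph):

```csharp
// Per-key memory for per-user limits on a million-user service
const long users = 1_000_000;
const long slidingBytes = 64;          // default-precision sliding window
const long highPrecisionBytes = 1024;  // high-precision sliding window (~1 KB)

Console.WriteLine(users * slidingBytes / (1024 * 1024));        // ~61 MB total
Console.WriteLine(users * highPrecisionBytes / (1024 * 1024));  // ~976 MB total
```

Roughly 61 MB versus 976 MB for the same million keys, which is why tuning precision (e.g. SegmentsPerWindow) down beats provisioning more memory.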

What does the limiter pipeline look like?

flowchart LR
    Req[Request] --> Edge[Reverse proxy<br/>per-IP cap]
    Edge --> App[ASP.NET Core]
    App --> RL[Rate limiter middleware]
    RL -->|under limit| Handler[Endpoint handler]
    RL -->|over limit| R429[429 Too Many Requests]
    Handler --> Down[Downstream services]

Two-tier defence. The reverse proxy (Nginx, CloudFront, Azure Front Door) handles the coarse per-IP DDoS cap; the application enforces business-level limits per user or tenant. Because per-user policies read the authenticated principal, UseRateLimiter must run after UseAuthentication (and after UseRouting when endpoint-specific policies are used) - a rate-limit policy combines cleanly with RequireAuthorization on the same endpoint.
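
A minimal ordering sketch for the application tier (assumption: per-user policies read ctx.User, so authentication must run before the limiter):

```csharp
var app = builder.Build();

app.UseRouting();          // endpoint-specific limit policies need routing first
app.UseAuthentication();   // populates ctx.User before per-user partitioning
app.UseRateLimiter();
app.UseAuthorization();

app.MapControllers();      // or minimal-API MapGet/MapPost endpoints
```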

What is the .NET 10 wiring with the built-in limiter?

builder.Services.AddRateLimiter(opt =>
{
    // Default: 100 req/min per IP, returns 429
    opt.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                AutoReplenishment = true,
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0
            }));

    // Stricter named policy for /auth/login
    opt.AddPolicy("auth", ctx =>
        RateLimitPartition.GetSlidingWindowLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1),
                SegmentsPerWindow = 6
            }));

    // Per-user policy for authenticated paths
    opt.AddPolicy("per-user", ctx =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 1000,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 1000,
                AutoReplenishment = true
            }));

    opt.OnRejected = async (ctx, ct) =>
    {
        ctx.HttpContext.Response.StatusCode = 429;
        if (ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry))
            ctx.HttpContext.Response.Headers.RetryAfter = ((int)retry.TotalSeconds).ToString();
        await ctx.HttpContext.Response.WriteAsync("Too many requests.", ct);
    };
});

app.UseRateLimiter();

app.MapPost("/auth/login", LoginHandler).RequireRateLimiting("auth");
app.MapGet("/me", MeHandler).RequireAuthorization().RequireRateLimiting("per-user");

Three details. PartitionedRateLimiter lets one policy split per IP (or per user) without writing a custom data structure. GlobalLimiter applies to every request, in addition to any endpoint-specific policy - good for blanket DDoS protection. Named policies attach to specific endpoints via RequireRateLimiting.
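
Limits can also be layered: PartitionedRateLimiter.CreateChained combines several partitioned limiters so a request must pass every one. A sketch with illustrative numbers (a coarse per-IP window wrapped around the per-user bucket):

```csharp
// Chained global limiter: a request must pass the per-IP cap AND the per-user bucket.
// Requires: using System.Threading.RateLimiting; using System.Security.Claims;
opt.GlobalLimiter = PartitionedRateLimiter.CreateChained(
    PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 600,                  // illustrative per-IP cap
                Window = TimeSpan.FromMinutes(1)
            })),
    PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,                   // illustrative per-user bucket
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                AutoReplenishment = true
            })));
```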

How do I make the limiter distributed across replicas?

The built-in limiter is in-memory; three replicas with a 100/min limit collectively allow 300/min. For honest enforcement you need shared state - usually Redis.

// No Redis-backed limiter ships with ASP.NET Core; use a community
// package or a custom limiter along these lines:
public class RedisTokenBucket(IConnectionMultiplexer redis)
{
    // KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = cost, ARGV[3] = TTL seconds.
    // Simplified: the counter starts at the limit and refills when the key expires -
    // closer to a fixed window than a true token bucket, but atomic and distributed.
    public async Task<bool> TryAcquireAsync(string key, int cost = 1)
    {
        var script = """
            local current = redis.call('GET', KEYS[1])
            if not current then
                current = tonumber(ARGV[1])
            else
                current = tonumber(current)
            end
            if current >= tonumber(ARGV[2]) then
                redis.call('SET', KEYS[1], current - tonumber(ARGV[2]), 'EX', ARGV[3])
                return 1
            else
                return 0
            end
        """;
        var result = (long)await redis.GetDatabase().ScriptEvaluateAsync(
            script,
            new RedisKey[] { $"rl:{key}" },
            new RedisValue[] { 100, cost, 60 });
        return result == 1;
    }
}

The Lua script makes the check-and-decrement atomic on Redis. The case-study chapter covers the algorithm choices (token bucket, sliding-window log, sliding-window counter) in detail.
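
Wired into an endpoint, the bucket above could gate requests like this (RedisTokenBucket is assumed to be registered in DI; the path and response are illustrative):

```csharp
app.MapGet("/api/widgets", async (HttpContext ctx, RedisTokenBucket bucket) =>
{
    var key = ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown";
    if (!await bucket.TryAcquireAsync(key))
    {
        ctx.Response.Headers.RetryAfter = "60";   // matches the 60 s TTL above
        return Results.StatusCode(429);
    }
    return Results.Ok("widgets");
});
```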

What failure modes does rate limiting introduce?

A few worth planning for. The limiter itself becomes a dependency: a Redis-backed check adds a network hop per request and fails when Redis does - decide up front whether to fail open (admit traffic, log loudly) or fail closed (reject everything). Per-IP keys behind NAT or corporate proxies punish innocent users who share an address. And a 429 without a Retry-After header invites immediate retries, turning one rejected burst into a retry storm. Keep rejections cheap, counted per policy, and honest about when the client may try again.
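
One failure mode worth coding for explicitly is the limiter store going down. A sketch of a fail-open wrapper around the Redis-backed bucket (the policy choice here - admitting traffic when Redis is unreachable - is an assumption; some services prefer to fail closed):

```csharp
using StackExchange.Redis;

public class FailOpenLimiter(RedisTokenBucket bucket, ILogger<FailOpenLimiter> log)
{
    public async Task<bool> TryAcquireAsync(string key)
    {
        try
        {
            return await bucket.TryAcquireAsync(key);
        }
        catch (RedisConnectionException ex)
        {
            // Fail open: briefly over-admitting beats returning 429 to everyone
            log.LogWarning(ex, "Rate limiter store unreachable; admitting request");
            return true;
        }
    }
}
```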

When should you skip rate limiting?

When the service is internal, the clients are known, and traffic is bounded by upstream limits. An internal microservice called only by a sister service that already does its own rate limiting does not need its own. Add a limiter the moment a third-party (or a user) can reach the service.

Where should you go from here?

You have completed the foundations, building blocks, reliability, and ops groups. Next: the case-study chapters, starting with the URL shortener - the simplest end-to-end design that uses cache + DB + observability + rate limit in one service. After that, eight more case studies compose the same blocks into Twitter, Uber, Stripe-style systems.

Frequently asked questions

Token bucket or sliding window - which one wins?
Token bucket allows bursts up to the bucket size, smoothing afterwards - fits APIs that have legitimate peaks (e.g. user clicks 'submit' five times). Sliding window is stricter and fairer over time - fits abuse prevention. Fixed window is the simplest but has 'window edge' burst problems. Concurrency is a different axis: it limits in-flight requests, not request rate.
Per IP, per user, or per API key?
Per the principal that maps to a person paying you. For unauthenticated public APIs that means per IP - imperfect because of NAT but the only option. For authenticated APIs, per user (or tenant). For B2B APIs, per API key. Layer them: a high per-tenant limit, a lower per-user limit inside it, a lower per-IP limit on the unauthenticated paths.
Why do I need a distributed rate limiter?
Because the in-memory limiter in ASP.NET Core counts only what one instance sees. Three replicas with a 100 RPS limit each will collectively allow 300 RPS - which violates your real intent. Redis-backed rate limiting gives one shared counter; the chapter 16 case study covers the algorithm in detail.
What HTTP status code should I return?
429 Too Many Requests, with Retry-After set to a reasonable interval. Optionally include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset so well-behaved clients can throttle themselves. Never silently drop the request - the client retries and you waste both your time and theirs.
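
Those headers can be emitted from the OnRejected hook shown earlier; the X-RateLimit-* names are a de-facto convention rather than a standard, and the limit value here is illustrative:

```csharp
opt.OnRejected = async (ctx, ct) =>
{
    ctx.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
    ctx.HttpContext.Response.Headers["X-RateLimit-Limit"] = "100";
    if (ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry))
    {
        ctx.HttpContext.Response.Headers.RetryAfter =
            ((int)retry.TotalSeconds).ToString();
        ctx.HttpContext.Response.Headers["X-RateLimit-Reset"] =
            DateTimeOffset.UtcNow.Add(retry).ToUnixTimeSeconds().ToString();
    }
    await ctx.HttpContext.Response.WriteAsync("Too many requests.", ct);
};
```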