Rate Limiting in ASP.NET Core: Token, Sliding, Concurrency
How to use the built-in ASP.NET Core rate limiter: fixed window, sliding window, token bucket, concurrency. Per IP vs per user, distributed via Redis.
Table of contents
- When does rate limiting move from "nice to have" to "shipping it"?
- What numbers should I budget for the limiter tier?
- What does the limiter pipeline look like?
- What is the .NET 10 wiring with the built-in limiter?
- How do I make the limiter distributed across replicas?
- What failure modes does rate limiting introduce?
- When should you skip rate limiting?
- Where should you go from here?
The day a single misconfigured client floods your service with 10K requests per second is the day rate limiting stops being optional. This chapter wires the ASP.NET Core rate limiter, picks the right algorithm per use case, and shows how to make it distributed across replicas with Redis.
When does rate limiting move from "nice to have" to "shipping it"?
Three signals.
The service has external clients. A public API, a webhook endpoint, a public web form. Anything an attacker, a bot, or a buggy integration can hit must have a limit. Defaults: permissive on the happy path, strict on /auth/* and write endpoints.
Traffic is bursty. A flash sale spikes 100x normal load. Without a limiter, your downstream (database, payment provider, queue) absorbs the burst and may collapse. The limiter is the back-pressure valve.
Cost scales with requests. Cloud egress, third-party APIs, expensive computations. A misbehaving client can run up the bill overnight. Per-tenant limits cap the blast radius.
If the service is internal, traffic is steady, and cost is fixed, limiting is pure overhead. Most public-facing .NET services meet none of those conditions.
What numbers should I budget for the limiter tier?
| Algorithm | Memory per key | CPU per check | Burst behaviour |
|---|---|---|---|
| Fixed window | ~16 bytes | O(1) | allows up to 2x at window edges |
| Sliding window | ~64 bytes | O(1) | smooth |
| Token bucket | ~24 bytes | O(1) | allows bursts up to bucket size |
| Concurrency | O(N) in-flight | O(1) | caps simultaneous requests |
| Redis-backed | +network | ~0.5 ms network | smooth, distributed |
Per-key memory matters when you have many keys (per-user limits on a million-user service). Sliding window with high precision can reach 1 KB per key. Tune precision down before you tune storage up.
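In the built-in limiter, that precision knob is SegmentsPerWindow: each segment holds its own counter, so more segments means smoother enforcement and more state per key. A minimal sketch of the trade-off, assuming System.Threading.RateLimiting (the numbers are illustrative):

```csharp
using System.Threading.RateLimiting;

// Coarse: 4 segments of 15 s over a 1-minute window.
// Fewer per-key counters, but enforcement is chunkier inside each segment.
var coarse = new SlidingWindowRateLimiterOptions
{
    PermitLimit = 100,
    Window = TimeSpan.FromMinutes(1),
    SegmentsPerWindow = 4
};

// Fine: 60 one-second segments - smoother enforcement, ~15x the per-key state.
var fine = new SlidingWindowRateLimiterOptions
{
    PermitLimit = 100,
    Window = TimeSpan.FromMinutes(1),
    SegmentsPerWindow = 60
};
```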
What does the limiter pipeline look like?
```mermaid
flowchart LR
    Req[Request] --> Edge[Reverse proxy<br/>per-IP cap]
    Edge --> App[ASP.NET Core]
    App --> RL[Rate limiter middleware]
    RL -->|under limit| Handler[Endpoint handler]
    RL -->|over limit| R429[429 Too Many Requests]
    Handler --> Down[Downstream services]
```
Two-tier defence. The reverse proxy (Nginx, CloudFront, Azure Front Door) handles the per-IP DDoS cap; the application enforces business-level limits per user or tenant. The middleware runs before authentication for unauthenticated paths and after it for authenticated ones - a rate-limiter policy composes with Authorize/RequireAuthorization.
What is the .NET 10 wiring with the built-in limiter?
```csharp
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(opt =>
{
    // Default: 100 req/min per IP, returns 429
    opt.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                AutoReplenishment = true,
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0
            }));

    // Stricter named policy for /auth/login
    opt.AddPolicy("auth", ctx =>
        RateLimitPartition.GetSlidingWindowLimiter(
            partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1),
                SegmentsPerWindow = 6
            }));

    // Per-user policy for authenticated paths
    opt.AddPolicy("per-user", ctx =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 1000,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 1000,
                AutoReplenishment = true
            }));

    opt.OnRejected = async (ctx, ct) =>
    {
        ctx.HttpContext.Response.StatusCode = 429;
        if (ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry))
            ctx.HttpContext.Response.Headers.RetryAfter = ((int)retry.TotalSeconds).ToString();
        await ctx.HttpContext.Response.WriteAsync("Too many requests.", ct);
    };
});

app.UseRateLimiter();

app.MapPost("/auth/login", LoginHandler).RequireRateLimiting("auth");
app.MapGet("/me", MeHandler).RequireAuthorization().RequireRateLimiting("per-user");
```
Three details. PartitionedRateLimiter lets one policy split per IP (or per user) without writing a custom data structure. The GlobalLimiter applies to every request, on top of any endpoint-specific policy - good for blanket DDoS protection. Named policies attach to specific endpoints via RequireRateLimiting.
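If the global tier needs both a per-IP and a per-user ceiling, PartitionedRateLimiter.CreateChained composes limiters so a request must pass every link. A minimal sketch as an alternative to the single GlobalLimiter above (the limits are illustrative):

```csharp
// Chain two partitioned limiters: a request is rejected if either one rejects it.
opt.GlobalLimiter = PartitionedRateLimiter.CreateChained(
    PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetFixedWindowLimiter(
            ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 600,            // coarse per-IP ceiling
                Window = TimeSpan.FromMinutes(1)
            })),
    PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
        RateLimitPartition.GetFixedWindowLimiter(
            ctx.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anon",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 300,            // finer per-user ceiling
                Window = TimeSpan.FromMinutes(1)
            })));
```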
How do I make the limiter distributed across replicas?
The built-in limiter is in-memory; three replicas with a 100/min limit collectively allow 300/min. For honest enforcement you need shared state - usually Redis.
```csharp
using StackExchange.Redis;

// A custom Redis-backed limiter on StackExchange.Redis. (Community NuGet
// packages cover this too; the sketch shows the atomic core.)
public class RedisTokenBucket(IConnectionMultiplexer redis)
{
    // KEYS[1] = counter key, ARGV[1] = bucket capacity,
    // ARGV[2] = tokens requested, ARGV[3] = TTL in seconds.
    // Simplified bucket: the key's expiry acts as the refill, so this
    // behaves like a decrementing window counter, not a continuous refill.
    private const string Script = """
        local current = redis.call('GET', KEYS[1])
        if not current then current = tonumber(ARGV[1]) else current = tonumber(current) end
        if current >= tonumber(ARGV[2]) then
          redis.call('SET', KEYS[1], current - tonumber(ARGV[2]), 'EX', ARGV[3])
          return 1
        else
          return 0
        end
        """;

    public async Task<bool> TryAcquireAsync(string key, int cost = 1)
    {
        var result = (long)await redis.GetDatabase().ScriptEvaluateAsync(
            Script,
            new RedisKey[] { $"rl:{key}" },
            new RedisValue[] { 100, cost, 60 }); // capacity 100, 60 s expiry
        return result == 1;
    }
}
```
The Lua script makes the check-and-decrement atomic on Redis; without it, two replicas could read the same counter and both decrement past zero. Note the sketch is a simplified bucket refilled by key expiry - a full token bucket also stores a last-refill timestamp. The case-study chapter covers the algorithm choices (token bucket, sliding-window log, sliding-window counter) in detail.
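One way to enforce the sketch at the edge of the app is an endpoint filter on a route group; the DI registrations and the key choice here are assumptions, not part of the class above:

```csharp
// Assumes builder.Services.AddSingleton<IConnectionMultiplexer>(...) and
// builder.Services.AddSingleton<RedisTokenBucket>() were registered earlier.
app.MapGroup("/api").AddEndpointFilter(async (ctx, next) =>
{
    var bucket = ctx.HttpContext.RequestServices.GetRequiredService<RedisTokenBucket>();

    // Prefer the authenticated user id; fall back to the caller's IP.
    var key = ctx.HttpContext.User.FindFirstValue(ClaimTypes.NameIdentifier)
              ?? ctx.HttpContext.Connection.RemoteIpAddress?.ToString()
              ?? "unknown";

    if (!await bucket.TryAcquireAsync(key))
        return Results.StatusCode(StatusCodes.Status429TooManyRequests);

    return await next(ctx);
});
```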
What failure modes does rate limiting introduce?
- Overly strict default - legitimate users hit 429 during normal use. Mitigation: instrument the limiter via OpenTelemetry and watch the rejection ratio; iterate.
- Limiter bypass via NAT - thousands of users behind one corporate NAT all share an IP. Per-IP limit blocks them all. Mitigation: prefer per-user limits when authenticated; raise per-IP limits for known proxies.
- Redis outage breaks limiter - if the limiter hard-fails on Redis errors, the whole app stops. Mitigation: fail-open on errors with a tight in-memory cap as the safety net (see the sketch after this list).
- Hot key on Redis - one heavy user generates so many limiter ops that Redis CPU saturates. Mitigation: shard the limiter key or fall back to local fixed-window for that key.
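A minimal fail-open wrapper around the RedisTokenBucket sketch above; the local FixedWindowRateLimiter cap and its numbers are illustrative:

```csharp
using System.Threading.RateLimiting;

// Fail-open: if Redis is unreachable, fall back to a local in-memory cap
// instead of rejecting (or hanging) every request.
public class SafeLimiter(RedisTokenBucket redisBucket)
{
    // Tight per-process safety net: 200 permits per 10 s,
    // shared across all keys for simplicity.
    private readonly FixedWindowRateLimiter _local = new(new FixedWindowRateLimiterOptions
    {
        PermitLimit = 200,
        Window = TimeSpan.FromSeconds(10)
    });

    public async Task<bool> TryAcquireAsync(string key)
    {
        try
        {
            return await redisBucket.TryAcquireAsync(key);
        }
        catch (Exception) // RedisConnectionException, timeouts, ...
        {
            // Redis is down: enforce only the local cap so the app stays up.
            using var lease = _local.AttemptAcquire();
            return lease.IsAcquired;
        }
    }
}
```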
When should you skip rate limiting?
When the service is internal, the clients are known, and traffic is bounded by upstream limits. An internal microservice called only by a sister service that already does its own rate limiting does not need its own. Add a limiter the moment a third-party (or a user) can reach the service.
Where should you go from here?
You have completed the foundations, building blocks, reliability, and ops groups. Next: the case-study chapters, starting with the URL shortener - the simplest end-to-end design that uses cache + DB + observability + rate limit in one service. After that, eight more case studies compose the same blocks into Twitter, Uber, Stripe-style systems.
Frequently asked questions
Token bucket or sliding window - which one wins?

Token bucket when legitimate bursts are part of the workload - a client can spend saved-up tokens at once. Sliding window when you want smooth, even enforcement, as on /auth/login above.

Per IP, per user, or per API key?

Per IP for unauthenticated traffic; per user or per API key once the caller is authenticated. Per-IP limits punish everyone behind a shared NAT, so switch keys as soon as you know who is calling.

Why do I need a distributed rate limiter?

The built-in limiter is in-memory per process: three replicas with a 100/min limit collectively allow 300/min. Shared state in Redis makes the limit honest across replicas.

What HTTP status code should I return?

429 Too Many Requests, with Retry-After set to a reasonable interval. Optionally include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset so well-behaved clients can throttle themselves. Never silently drop the request - the client retries and you waste both your time and theirs.
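A sketch of emitting those headers from the OnRejected hook shown earlier; the X-RateLimit-* names are a de-facto convention rather than a standard, and the limit value must be kept in sync with the active policy by hand:

```csharp
opt.OnRejected = async (ctx, ct) =>
{
    var headers = ctx.HttpContext.Response.Headers;
    ctx.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    if (ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry))
    {
        headers.RetryAfter = ((int)retry.TotalSeconds).ToString();
        // Convention headers; the values here mirror the 100 req/min policy.
        headers["X-RateLimit-Limit"] = "100";
        headers["X-RateLimit-Remaining"] = "0";
        headers["X-RateLimit-Reset"] = DateTimeOffset.UtcNow.Add(retry)
            .ToUnixTimeSeconds().ToString();
    }

    await ctx.HttpContext.Response.WriteAsync("Too many requests.", ct);
};
```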