Rate Limiting — Controlling API Traffic in Distributed Systems
Posted on: 4/23/2026 5:11:11 AM
Table of Contents
- Why Rate Limiting?
- 4 Core Rate Limiting Algorithms
- Implementation in ASP.NET Core 10
- Distributed Rate Limiting
- Rate Limiting at the API Gateway Layer
- Best Practices & Anti-Patterns
- Conclusion
Why Rate Limiting?
Imagine you're running a public API serving millions of requests per day. One morning, traffic spikes 50x — not because your product went viral, but because a scraping bot or a competitor launched a DDoS attack. Without traffic control, your entire system collapses, impacting every legitimate user.
Rate Limiting is the technique of controlling the number of requests a client can send to a server within a given time period. It is one of the essential components in modern distributed system architecture.
What does Rate Limiting solve?
- Resource protection: Prevents CPU, memory, and database connection exhaustion
- Fairness: Ensures every client gets a fair share of API access
- Security: Mitigates brute-force attacks, credential stuffing, and scraping
- Cost control: Prevents cloud bill shock from abnormal traffic spikes
- SLA compliance: Guarantees service quality for paying customers
flowchart LR
A[Client Request] --> B{Rate Limiter}
B -->|Allowed| C[API Server]
B -->|Rejected| D[429 Too Many Requests]
C --> E[Database / Service]
D --> F[Retry-After Header]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B fill:#e94560,stroke:#fff,color:#fff
style C fill:#4CAF50,stroke:#fff,color:#fff
style D fill:#ff9800,stroke:#fff,color:#fff
style E fill:#2c3e50,stroke:#fff,color:#fff
style F fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
4 Core Rate Limiting Algorithms
1. Fixed Window Counter
The simplest algorithm: divide time into fixed windows (e.g., every minute), count requests in each window. When the counter exceeds the threshold, reject requests until a new window begins.
gantt
title Fixed Window Counter — 100 req/min
dateFormat X
axisFormat %s
section Window 1 (00-60s)
80 requests (OK) :done, 0, 60
section Window 2 (60-120s)
120 requests (20 rejected) :crit, 60, 120
section Window 3 (120-180s)
45 requests (OK) :done, 120, 180
// Fixed Window Counter — simple illustration
public class FixedWindowLimiter
{
private int _counter;
private DateTime _windowStart;
private readonly int _maxRequests;
private readonly TimeSpan _windowSize;
private readonly object _lock = new();
public FixedWindowLimiter(int maxRequests, TimeSpan windowSize)
{
_maxRequests = maxRequests;
_windowSize = windowSize;
_windowStart = DateTime.UtcNow;
}
public bool TryAcquire()
{
lock (_lock)
{
var now = DateTime.UtcNow;
if (now - _windowStart >= _windowSize)
{
_counter = 0;
_windowStart = now;
}
if (_counter < _maxRequests)
{
_counter++;
return true;
}
return false;
}
}
}
Boundary burst problem
The biggest weakness of Fixed Window: if a client sends 100 requests at the end of window 1 and 100 requests at the start of window 2, the system receives 200 requests in a very short time — double the limit. This is called the boundary burst problem.
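The burst is easy to reproduce. The sketch below uses a clock-injected variant of a fixed-window limiter (the `Func<DateTime>` clock parameter is an addition for testability, not part of any framework API): 100 requests at second 59 and 100 more at second 61 all pass, even though 200 requests arrived within about two seconds.

```csharp
using System;

// Fixed-window limiter with an injectable clock (testability-only change).
public class ClockedFixedWindow
{
    private int _counter;
    private DateTime _windowStart;
    private readonly int _max;
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _clock;

    public ClockedFixedWindow(int max, TimeSpan window, Func<DateTime> clock)
    {
        _max = max;
        _window = window;
        _clock = clock;
        _windowStart = clock();
    }

    public bool TryAcquire()
    {
        var now = _clock();
        if (now - _windowStart >= _window)
        {
            _counter = 0;
            _windowStart = now;
        }
        if (_counter < _max) { _counter++; return true; }
        return false;
    }
}

public static class BoundaryBurstDemo
{
    // Returns how many of 200 requests straddling the window boundary pass.
    public static int Run()
    {
        var start = DateTime.UtcNow;
        var now = start;
        var limiter = new ClockedFixedWindow(100, TimeSpan.FromSeconds(60), () => now);

        int allowed = 0;
        now = start.AddSeconds(59);   // burst at the end of window 1
        for (int i = 0; i < 100; i++) if (limiter.TryAcquire()) allowed++;
        now = start.AddSeconds(61);   // burst at the start of window 2
        for (int i = 0; i < 100; i++) if (limiter.TryAcquire()) allowed++;
        return allowed;               // all 200 pass — double the limit
    }
}
```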
2. Sliding Window Log
Fixes the boundary burst by storing each request's timestamp. When a new request arrives, remove all timestamps older than the window size, then count the remaining entries.
// Sliding Window Log
public class SlidingWindowLog
{
private readonly Queue<DateTime> _timestamps = new();
private readonly int _maxRequests;
private readonly TimeSpan _windowSize;
private readonly object _lock = new();
public SlidingWindowLog(int maxRequests, TimeSpan windowSize)
{
_maxRequests = maxRequests;
_windowSize = windowSize;
}
public bool TryAcquire()
{
lock (_lock)
{
var now = DateTime.UtcNow;
var windowStart = now - _windowSize;
while (_timestamps.Count > 0 && _timestamps.Peek() < windowStart)
_timestamps.Dequeue();
if (_timestamps.Count < _maxRequests)
{
_timestamps.Enqueue(now);
return true;
}
return false;
}
}
}
Pros: perfectly accurate, no boundary burst. Cons: memory-intensive, since every timestamp must be stored — a client sustaining 10,000 req/s against a 60-second window means roughly 600,000 stored entries for that one key.
3. Sliding Window Counter
Combines the best of both worlds: uses counters (memory-efficient) but slides in real-time (prevents boundary burst). Calculates the allowed request count using a weighted average between the previous window and the current window.
// Sliding Window Counter — hybrid approach
public class SlidingWindowCounter
{
private int _prevCount;
private int _currCount;
private DateTime _windowStart;
private readonly int _maxRequests;
private readonly TimeSpan _windowSize;
private readonly object _lock = new();
public SlidingWindowCounter(int maxRequests, TimeSpan windowSize)
{
_maxRequests = maxRequests;
_windowSize = windowSize;
_windowStart = DateTime.UtcNow;
}
public bool TryAcquire()
{
lock (_lock)
{
var now = DateTime.UtcNow;
var elapsed = now - _windowStart;
if (elapsed >= _windowSize * 2)
{
_prevCount = 0;
_currCount = 0;
_windowStart = now;
}
else if (elapsed >= _windowSize)
{
_prevCount = _currCount;
_currCount = 0;
_windowStart += _windowSize;
elapsed = now - _windowStart;
}
// Weighted count: remaining portion of previous window + current window
double weight = 1.0 - (elapsed.TotalMilliseconds / _windowSize.TotalMilliseconds);
double estimatedCount = (_prevCount * weight) + _currCount;
if (estimatedCount < _maxRequests)
{
_currCount++;
return true;
}
return false;
}
}
}
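To make the weighted estimate concrete: with a 60-second window, 80 requests in the previous window, 30 so far in the current one, and 15 seconds elapsed (25% into the current window), the limiter counts 75% of the previous window plus all of the current one. A minimal extraction of the same formula:

```csharp
using System;

public static class WeightedEstimate
{
    // Same arithmetic as TryAcquire above: weight the previous window's
    // count by the fraction of it still inside the sliding window.
    public static double Estimate(int prevCount, int currCount,
                                  double elapsedMs, double windowMs)
    {
        double weight = 1.0 - (elapsedMs / windowMs);
        return prevCount * weight + currCount;
    }
}

// Estimate(80, 30, 15_000, 60_000) → 80 × 0.75 + 30 = 90,
// so a limit of 100 would still admit this request.
```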
4. Token Bucket
The most intuitive model: a "bucket" holds tokens that are refilled at a constant rate. Each request consumes one token. When the bucket is empty, requests are rejected. The bucket has a maximum capacity, allowing controlled bursts.
flowchart TB
subgraph TB["Token Bucket"]
direction TB
R[Refill: 10 tokens/sec] --> B[Bucket
capacity: 100 tokens]
B --> C{Request arrives}
C -->|Token available| D[Consume token, allow]
C -->|No tokens| E[Reject 429]
end
style R fill:#4CAF50,stroke:#fff,color:#fff
style B fill:#e94560,stroke:#fff,color:#fff
style D fill:#4CAF50,stroke:#fff,color:#fff
style E fill:#ff9800,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
// Token Bucket
public class TokenBucket
{
private double _tokens;
private DateTime _lastRefill;
private readonly double _maxTokens;
private readonly double _refillRate; // tokens per second
private readonly object _lock = new();
public TokenBucket(double maxTokens, double refillRate)
{
_maxTokens = maxTokens;
_refillRate = refillRate;
_tokens = maxTokens;
_lastRefill = DateTime.UtcNow;
}
public bool TryAcquire(int tokens = 1)
{
lock (_lock)
{
Refill();
if (_tokens >= tokens)
{
_tokens -= tokens;
return true;
}
return false;
}
}
private void Refill()
{
var now = DateTime.UtcNow;
var elapsed = (now - _lastRefill).TotalSeconds;
_tokens = Math.Min(_maxTokens, _tokens + elapsed * _refillRate);
_lastRefill = now;
}
}
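A usage sketch for the class above. Refill is set to zero here purely to keep the demo deterministic — in production the refill rate is what shapes sustained throughput. The class body is repeated (condensed) so the snippet compiles on its own:

```csharp
using System;

// Condensed copy of the TokenBucket above, included so this snippet
// stands alone.
public class TokenBucket
{
    private double _tokens;
    private DateTime _lastRefill;
    private readonly double _maxTokens;
    private readonly double _refillRate; // tokens per second
    private readonly object _lock = new();

    public TokenBucket(double maxTokens, double refillRate)
    {
        _maxTokens = maxTokens;
        _refillRate = refillRate;
        _tokens = maxTokens;
        _lastRefill = DateTime.UtcNow;
    }

    public bool TryAcquire(int tokens = 1)
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            _tokens = Math.Min(_maxTokens,
                _tokens + (now - _lastRefill).TotalSeconds * _refillRate);
            _lastRefill = now;
            if (_tokens >= tokens) { _tokens -= tokens; return true; }
            return false;
        }
    }
}

public static class BucketDemo
{
    // A full bucket of 10 absorbs a 10-request burst; requests 11 and 12
    // are rejected because refill is disabled (rate = 0) for determinism.
    public static int Run()
    {
        var bucket = new TokenBucket(maxTokens: 10, refillRate: 0);
        int allowed = 0;
        for (int i = 0; i < 12; i++)
            if (bucket.TryAcquire()) allowed++;
        return allowed;   // 10
    }
}
```

This burst-then-reject shape is exactly what distinguishes Token Bucket from Fixed Window: the burst allowance is the bucket capacity, not an accident of window boundaries.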
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Control | Complexity | Use Case |
|---|---|---|---|---|---|
| Fixed Window | O(1) | Moderate | Poor (boundary burst) | Low | Internal APIs, MVP |
| Sliding Window Log | O(n) | Highest | Good | Medium | Billing, audit-critical |
| Sliding Window Counter | O(1) | High | Good | Medium | Public APIs (recommended) |
| Token Bucket | O(1) | High | Best (configurable) | Low | API Gateway, microservices |
Implementation in ASP.NET Core 10
Since .NET 7, ASP.NET Core has shipped built-in rate-limiting middleware, built on the abstractions in the System.Threading.RateLimiting namespace. By .NET 10, this middleware has matured with multi-tenant partitioning, chained limiters, and deeper integration with minimal APIs.
Basic Configuration with Fixed Window
// Program.cs — ASP.NET Core 10
using System.Threading.RateLimiting;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
options.AddFixedWindowLimiter("fixed", opt =>
{
opt.PermitLimit = 100;
opt.Window = TimeSpan.FromMinutes(1);
opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
opt.QueueLimit = 10;
});
});
var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/api/products", () => Results.Ok(new { products = new[] { "A", "B" } }))
.RequireRateLimiting("fixed");
app.Run();
Token Bucket for Public APIs
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = 429;
options.AddTokenBucketLimiter("api-public", opt =>
{
opt.TokenLimit = 100; // Burst capacity
opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
opt.TokensPerPeriod = 20; // 20 tokens every 10s = 2 req/s sustained
opt.AutoReplenishment = true;
opt.QueueLimit = 5;
});
});
Sliding Window with Per-User Partition
builder.Services.AddRateLimiter(options =>
{
options.AddPolicy("per-user", httpContext =>
{
var userId = httpContext.User.FindFirst("sub")?.Value
?? httpContext.Connection.RemoteIpAddress?.ToString()
?? "anonymous";
return RateLimitPartition.GetSlidingWindowLimiter(userId, _ => new SlidingWindowRateLimiterOptions
{
PermitLimit = 60,
Window = TimeSpan.FromMinutes(1),
SegmentsPerWindow = 6, // 6 segments × 10 seconds each
QueueLimit = 0
});
});
options.OnRejected = async (context, ct) =>
{
context.HttpContext.Response.Headers["Retry-After"] = "60";
await context.HttpContext.Response.WriteAsJsonAsync(new
{
error = "rate_limit_exceeded",
message = "Too many requests. Please try again later.",
retryAfter = 60
}, ct);
};
});
Multi-tenant Rate Limiting (.NET 10)
.NET 10 improves RateLimitPartition to support combining multiple partition keys simultaneously — for example, limiting by user + endpoint + plan tier. This is particularly useful for SaaS APIs where each pricing plan has different rate limits.
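A sketch of what plan-aware partitioning can look like. The `plan` claim name and the per-tier limits below are illustrative assumptions, not a built-in convention:

```csharp
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-plan", httpContext =>
    {
        // Composite partition key: plan tier + user. The "plan" claim
        // and the tier limits are assumptions for illustration.
        var user = httpContext.User.FindFirst("sub")?.Value ?? "anon";
        var plan = httpContext.User.FindFirst("plan")?.Value ?? "free";
        int limit = plan switch
        {
            "enterprise" => 1000,
            "pro" => 300,
            _ => 60
        };
        return RateLimitPartition.GetFixedWindowLimiter($"{plan}:{user}", _ =>
            new FixedWindowRateLimiterOptions
            {
                PermitLimit = limit,
                Window = TimeSpan.FromMinutes(1)
            });
    });
});
```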
Chained Rate Limiters — multi-layer protection
builder.Services.AddRateLimiter(options =>
{
// Layer 1: Global — protects the entire server
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
return RateLimitPartition.GetFixedWindowLimiter("global", _ =>
new FixedWindowRateLimiterOptions
{
PermitLimit = 10_000,
Window = TimeSpan.FromMinutes(1)
});
});
// Layer 2: Per-IP — prevents abuse from a single source
options.AddPolicy("per-ip", context =>
{
var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
return RateLimitPartition.GetTokenBucketLimiter(ip, _ =>
new TokenBucketRateLimiterOptions
{
TokenLimit = 50,
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 10,
AutoReplenishment = true
});
});
// Layer 3: Per-User-Per-Endpoint — fine-grained control
options.AddPolicy("user-endpoint", context =>
{
var user = context.User.FindFirst("sub")?.Value ?? "anon";
var endpoint = context.GetEndpoint()?.DisplayName ?? "default";
var key = $"{user}:{endpoint}";
return RateLimitPartition.GetSlidingWindowLimiter(key, _ =>
new SlidingWindowRateLimiterOptions
{
PermitLimit = 30,
Window = TimeSpan.FromMinutes(1),
SegmentsPerWindow = 6
});
});
});
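With the layers above registered, note how they attach differently: the GlobalLimiter guards every request automatically, while named policies are opt-in per endpoint — for example:

```csharp
app.UseRateLimiter();

// GlobalLimiter applies to all requests; named policies must be
// attached explicitly via RequireRateLimiting.
app.MapGet("/api/search", () => Results.Ok())
   .RequireRateLimiting("per-ip");

app.MapPost("/api/orders", () => Results.Ok())
   .RequireRateLimiting("user-endpoint");
```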
Distributed Rate Limiting
The algorithms above work well on a single instance. But in a microservices architecture with multiple replicas behind a load balancer, each instance maintains its own counter — a client's requests are spread across the replicas, so the effective limit silently becomes roughly N times the configured one.
flowchart TB
C[Client
Limit: 100 req/min] --> LB[Load Balancer]
LB --> S1[Server 1
Counter: 40]
LB --> S2[Server 2
Counter: 35]
LB --> S3[Server 3
Counter: 38]
S1 -.->|Actual total: 113 req| NOTE[Over limit!
But no instance knows]
S2 -.-> NOTE
S3 -.-> NOTE
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style LB fill:#2c3e50,stroke:#fff,color:#fff
style S1 fill:#4CAF50,stroke:#fff,color:#fff
style S2 fill:#4CAF50,stroke:#fff,color:#fff
style S3 fill:#4CAF50,stroke:#fff,color:#fff
style NOTE fill:#ff9800,stroke:#fff,color:#fff
Solution: Centralized Counter with Redis
Use Redis as a shared store for counters — all instances read/write to the same counter. Redis Lua scripting ensures atomicity for increment + check operations.
// Distributed Sliding Window Log with Redis (requires StackExchange.Redis).
// Each request is stored as a sorted-set member, so this is the Log
// variant of the sliding window, not the counter approximation.
public class RedisRateLimiter
{
private readonly IConnectionMultiplexer _redis;
private readonly string _luaScript = @"
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
-- Remove entries older than window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count entries within window
local count = redis.call('ZCARD', key)
if count < limit then
-- Add new request with score = timestamp
redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
redis.call('EXPIRE', key, math.ceil(window / 1000))
return 1
end
return 0
";
public RedisRateLimiter(IConnectionMultiplexer redis)
{
_redis = redis;
}
public async Task<bool> TryAcquireAsync(string clientId, int limit, TimeSpan window)
{
var db = _redis.GetDatabase();
var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
var result = await db.ScriptEvaluateAsync(
_luaScript,
new RedisKey[] { $"rate:{clientId}" },
new RedisValue[] { now, (long)window.TotalMilliseconds, limit });
return (int)result == 1;
}
}
Why use Lua scripts?
Redis executes Lua scripts atomically — there's no race condition between ZREMRANGEBYSCORE, ZCARD, and ZADD. If separated into individual Redis commands, two concurrent requests might both read count = 99 (under the limit of 100) and both get allowed, exceeding the limit.
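The race is easy to see with the check and the increment separated into two steps and interleaved by hand — a deliberately naive in-memory stand-in for separate Redis commands:

```csharp
using System;

// Naive check-then-increment split into two steps, standing in for
// separate ZCARD and ZADD calls with no atomicity between them.
public class NaiveCounter
{
    public int Count;
    public bool UnderLimit(int limit) => Count < limit;  // step 1: read
    public void Record() => Count++;                     // step 2: write
}

public static class RaceDemo
{
    // Interleave two clients arriving at count 99, limit 100.
    public static int Run()
    {
        var counter = new NaiveCounter { Count = 99 };
        bool clientA = counter.UnderLimit(100);  // A reads 99 → allowed
        bool clientB = counter.UnderLimit(100);  // B also reads 99 → allowed
        if (clientA) counter.Record();
        if (clientB) counter.Record();
        return counter.Count;  // 101 — one over the limit
    }
}
```

The Lua script collapses the read and the write into one atomic step, so the second client would observe 100 and be rejected.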
Distributed Token Bucket with Redis
// Distributed Token Bucket — Redis Lua
public class RedisTokenBucket
{
private readonly IConnectionMultiplexer _redis;
private readonly string _luaScript = @"
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or max_tokens
local last_refill = tonumber(bucket[2]) or now
-- Calculate refilled tokens
local elapsed = (now - last_refill) / 1000.0
tokens = math.min(max_tokens, tokens + elapsed * refill_rate)
if tokens >= requested then
tokens = tokens - requested
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) + 10)
return 1
end
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) + 10)
return 0
";
public RedisTokenBucket(IConnectionMultiplexer redis)
{
_redis = redis;
}
public async Task<bool> TryAcquireAsync(
string clientId, int maxTokens, double refillRate, int tokens = 1)
{
var db = _redis.GetDatabase();
var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
var result = await db.ScriptEvaluateAsync(
_luaScript,
new RedisKey[] { $"bucket:{clientId}" },
new RedisValue[] { maxTokens, refillRate, now, tokens });
return (int)result == 1;
}
}
Integrating Redis Rate Limiter into ASP.NET Core
// Register services
builder.Services.AddSingleton<IConnectionMultiplexer>(
ConnectionMultiplexer.Connect("localhost:6379"));
builder.Services.AddSingleton<RedisRateLimiter>();
// Custom Rate Limiting Policy
builder.Services.AddRateLimiter(options =>
{
options.AddPolicy("redis-distributed", context =>
{
var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
return RateLimitPartition.Get(ip, key =>
{
var redis = context.RequestServices.GetRequiredService<RedisRateLimiter>();
// RedisPartitionedLimiter is a custom RateLimiter subclass (not shown)
// that delegates its acquire logic to RedisRateLimiter
return new RedisPartitionedLimiter(redis, key, limit: 100,
window: TimeSpan.FromMinutes(1));
});
});
});
Rate Limiting at the API Gateway Layer
In practice, rate limiting is typically implemented at multiple layers (defense in depth). The outermost layer — API Gateway or Edge — blocks traffic as early as possible, conserving resources for the application layer.
flowchart LR
C[Client] --> CF[Cloudflare
Edge Rate Limit]
CF --> GW[API Gateway
YARP / Kong]
GW --> SVC[Service
App-level Limit]
SVC --> DB[(Database)]
CF -.->|"Layer 1: IP-based
1000 req/min"| N1[ ]
GW -.->|"Layer 2: API Key
200 req/min"| N2[ ]
SVC -.->|"Layer 3: User+Endpoint
30 req/min"| N3[ ]
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style CF fill:#e94560,stroke:#fff,color:#fff
style GW fill:#2c3e50,stroke:#fff,color:#fff
style SVC fill:#4CAF50,stroke:#fff,color:#fff
style DB fill:#16213e,stroke:#fff,color:#fff
style N1 fill:transparent,stroke:transparent
style N2 fill:transparent,stroke:transparent
style N3 fill:transparent,stroke:transparent
Cloudflare Rate Limiting (Free Tier)
Cloudflare provides Rate Limiting rules even on the Free plan — allowing you to block abnormal traffic before it reaches your origin server. With the Enterprise plan, you get Advanced Rate Limiting based on request characteristics (headers, cookies, body fields).
// Cloudflare Rate Limiting Rule — example configuration
{
"description": "API rate limit — 100 req/min per IP",
"expression": "(http.request.uri.path matches \"^/api/\")",
"action": "block",
"ratelimit": {
"characteristics": ["ip.src"],
"period": 60,
"requests_per_period": 100,
"mitigation_timeout": 120,
"counting_expression": ""
}
}
YARP Reverse Proxy with Rate Limiting
// YARP + Rate Limiting — ASP.NET Core 10
builder.Services.AddReverseProxy()
.LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
builder.Services.AddRateLimiter(options =>
{
options.AddPolicy("gateway-limit", context =>
{
// Rate limit by API Key from header
var apiKey = context.Request.Headers["X-API-Key"].FirstOrDefault() ?? "no-key";
return RateLimitPartition.GetTokenBucketLimiter(apiKey, _ =>
new TokenBucketRateLimiterOptions
{
TokenLimit = 200,
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 40,
AutoReplenishment = true
});
});
});
app.UseRateLimiter();
app.MapReverseProxy();
Best Practices & Anti-Patterns
Standard Response Headers
Always return rate-limit headers so clients know their quota status. The 429 status code is defined in RFC 6585 and the Retry-After header in RFC 7231; the X-RateLimit-* headers themselves are a de facto convention popularized by large public APIs (an IETF draft aims to standardize equivalent RateLimit fields):
// Middleware to add Rate Limit Headers
app.Use(async (context, next) =>
{
await next();
// NOTE: IRateLimiterStatisticsFeature is illustrative — ASP.NET Core does not
// expose limiter statistics as a request feature out of the box, so this
// assumes a custom feature populated elsewhere (e.g., from the lease).
var limiterFeature = context.Features.Get<IRateLimiterStatisticsFeature>();
if (limiterFeature is not null)
{
var stats = limiterFeature.GetStatistics();
context.Response.Headers["X-RateLimit-Limit"] = "100";
context.Response.Headers["X-RateLimit-Remaining"] =
stats?.CurrentAvailablePermits.ToString() ?? "0";
context.Response.Headers["X-RateLimit-Reset"] =
DateTimeOffset.UtcNow.AddMinutes(1).ToUnixTimeSeconds().ToString();
}
});
Anti-patterns to Avoid
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Rate limit only by IP | Shared IPs (NAT, corporate proxy) affect multiple users | Combine IP + API Key + User ID |
| No Retry-After header | Clients retry immediately → thundering herd | Always return Retry-After + exponential backoff |
| Hard limit with no bypass | Health checks, admin, webhooks get blocked | Whitelist trusted IPs/service accounts |
| Limit too low on deploy | False positives → user frustration | Start permissive, monitor, then tighten |
| Not logging rejections | No visibility into who's getting rejected and why | Log clientId, endpoint, reject reason → dashboard |
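On the client side, the retry schedule that pairs with Retry-After is exponential backoff with a cap. A minimal helper (the base delay and cap are illustrative defaults; production code should also add random jitter):

```csharp
using System;

public static class Backoff
{
    // Exponential backoff: 1s, 2s, 4s, 8s, ... capped at capSeconds.
    // Honor the server's Retry-After header when present; this schedule
    // is the fallback. Add jitter in production to avoid synchronized
    // retries (the thundering herd from the table above).
    public static TimeSpan Delay(int attempt, double capSeconds = 60) =>
        TimeSpan.FromSeconds(Math.Min(capSeconds, Math.Pow(2, attempt)));
}

// Delay(0) = 1s, Delay(3) = 8s, Delay(10) = 60s (capped)
```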
Graceful degradation instead of hard rejection
Instead of immediately returning 429, consider softer strategies: return cached responses, reduce quality (smaller images, less data), or queue requests for later processing. This maintains a better user experience during peak traffic.
Monitoring & Alerting
// Integration with OpenTelemetry Metrics
builder.Services.AddRateLimiter(options =>
{
options.OnRejected = async (context, ct) =>
{
// Emit metric
var meter = context.HttpContext.RequestServices
.GetRequiredService<IMeterFactory>()
.Create("RateLimiting");
var counter = meter.CreateCounter<long>("rate_limit.rejections");
counter.Add(1, new KeyValuePair<string, object?>("endpoint",
context.HttpContext.GetEndpoint()?.DisplayName));
context.HttpContext.Response.StatusCode = 429;
context.HttpContext.Response.Headers["Retry-After"] = "60";
await context.HttpContext.Response.WriteAsJsonAsync(new
{
error = "rate_limit_exceeded",
retryAfter = 60
}, ct);
};
});
Conclusion
Rate Limiting is not just about "blocking bots" — it's a critical architectural component that determines your system's load capacity, security posture, and user experience. Choosing the right algorithm and deployment layer depends on your specific requirements:
- MVP / Internal API: Fixed Window on ASP.NET Core is sufficient
- Public API: Sliding Window Counter or Token Bucket, partitioned per user
- Microservices: Distributed Token Bucket with Redis + Edge rate limiting (Cloudflare)
- SaaS multi-tenant: Chained limiters — Global → Per-Tenant → Per-User → Per-Endpoint
The key takeaway: start simple with .NET 10's built-in middleware, monitor carefully, then scale to distributed solutions when your architecture demands it. Don't over-engineer from the start, but never forget that every public API without rate limiting is a ticking time bomb.