Rate Limiting — Controlling API Traffic in Distributed Systems

Posted on: 4/23/2026 5:11:11 AM

  • ~90% of API attacks are brute-force or DDoS
  • 4 most common rate limiting algorithms
  • <1 ms overhead for an optimized rate limiter
  • Built-in rate limiting middleware in .NET 10

Why Rate Limiting?

Imagine you're running a public API serving millions of requests per day. One morning, traffic spikes 50x — not because your product went viral, but because a scraping bot or a competitor launched a DDoS attack. Without traffic control, your entire system collapses, impacting every legitimate user.

Rate Limiting is the technique of controlling the number of requests a client can send to a server within a given time period. It is one of the essential components in modern distributed system architecture.

What does Rate Limiting solve?

  • Resource protection: Prevents CPU, memory, and database connection exhaustion
  • Fairness: Ensures every client gets a fair share of API access
  • Security: Mitigates brute-force attacks, credential stuffing, and scraping
  • Cost control: Prevents cloud bill shock from abnormal traffic spikes
  • SLA compliance: Guarantees service quality for paying customers
flowchart LR
    A[Client Request] --> B{Rate Limiter}
    B -->|Allowed| C[API Server]
    B -->|Rejected| D[429 Too Many Requests]
    C --> E[Database / Service]
    D --> F[Retry-After Header]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style C fill:#4CAF50,stroke:#fff,color:#fff
    style D fill:#ff9800,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
  
Basic Rate Limiter request flow

4 Core Rate Limiting Algorithms

1. Fixed Window Counter

The simplest algorithm: divide time into fixed windows (e.g., every minute), count requests in each window. When the counter exceeds the threshold, reject requests until a new window begins.

gantt
    title Fixed Window Counter — 100 req/min
    dateFormat X
    axisFormat %s

    section Window 1 (00-60s)
    80 requests (OK)        :done, 0, 60
    section Window 2 (60-120s)
    120 requests (20 rejected) :crit, 60, 120
    section Window 3 (120-180s)
    45 requests (OK)        :done, 120, 180
  
Fixed Window divides time into fixed intervals
// Fixed Window Counter — simple illustration
public class FixedWindowLimiter
{
    private int _counter;
    private DateTime _windowStart;
    private readonly int _maxRequests;
    private readonly TimeSpan _windowSize;
    private readonly object _lock = new();

    public FixedWindowLimiter(int maxRequests, TimeSpan windowSize)
    {
        _maxRequests = maxRequests;
        _windowSize = windowSize;
        _windowStart = DateTime.UtcNow;
    }

    public bool TryAcquire()
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            if (now - _windowStart >= _windowSize)
            {
                _counter = 0;
                _windowStart = now;
            }
            if (_counter < _maxRequests)
            {
                _counter++;
                return true;
            }
            return false;
        }
    }
}

Boundary burst problem

The biggest weakness of Fixed Window: if a client sends 100 requests at the end of window 1 and 100 requests at the start of window 2, the system receives 200 requests in a very short time — double the limit. This is called the boundary burst problem.
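A quick simulation makes the burst concrete (a language-agnostic sketch in Python for brevity; the counter logic mirrors the C# class above): under a 100 req/min fixed window, 100 requests at second 59 and 100 more at second 61 are all admitted.

```python
# Fixed-window admission: one counter per window index, reset at each boundary.
def allowed_in_fixed_window(events, limit=100, window=60):
    """events: request timestamps in seconds. Returns how many are admitted."""
    counts = {}  # window index -> requests counted in that window
    admitted = 0
    for t in sorted(events):
        w = int(t // window)
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            admitted += 1
    return admitted

# 100 requests at t=59s (end of window 1) + 100 at t=61s (start of window 2):
# all 200 pass within ~2 seconds, double the nominal limit.
print(allowed_in_fixed_window([59.0] * 100 + [61.0] * 100))  # 200
```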

2. Sliding Window Log

Fixes the boundary burst by storing each request's timestamp. When a new request arrives, remove all timestamps older than the window size, then count the remaining entries.

// Sliding Window Log
public class SlidingWindowLog
{
    private readonly Queue<DateTime> _timestamps = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _windowSize;
    private readonly object _lock = new();

    public SlidingWindowLog(int maxRequests, TimeSpan windowSize)
    {
        _maxRequests = maxRequests;
        _windowSize = windowSize;
    }

    public bool TryAcquire()
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            var windowStart = now - _windowSize;

            while (_timestamps.Count > 0 && _timestamps.Peek() < windowStart)
                _timestamps.Dequeue();

            if (_timestamps.Count < _maxRequests)
            {
                _timestamps.Enqueue(now);
                return true;
            }
            return false;
        }
    }
}

Pros: perfectly accurate, no boundary burst. Cons: memory-intensive, since every timestamp inside the window must be stored — at 10,000 req/s with a one-minute window, that is 600,000 timestamps per client.

3. Sliding Window Counter

Combines the best of both worlds: it keeps only two counters (memory-efficient) while approximating a true sliding window (no boundary bursts). The current request count is estimated as a weighted sum: the previous window's counter, scaled by how much of it still overlaps the sliding window, plus the current window's counter.

// Sliding Window Counter — hybrid approach
public class SlidingWindowCounter
{
    private int _prevCount;
    private int _currCount;
    private DateTime _windowStart;
    private readonly int _maxRequests;
    private readonly TimeSpan _windowSize;
    private readonly object _lock = new();

    public SlidingWindowCounter(int maxRequests, TimeSpan windowSize)
    {
        _maxRequests = maxRequests;
        _windowSize = windowSize;
        _windowStart = DateTime.UtcNow;
    }

    public bool TryAcquire()
    {
        lock (_lock)
        {
            var now = DateTime.UtcNow;
            var elapsed = now - _windowStart;

            if (elapsed >= _windowSize * 2)
            {
                _prevCount = 0;
                _currCount = 0;
                _windowStart = now;
            }
            else if (elapsed >= _windowSize)
            {
                _prevCount = _currCount;
                _currCount = 0;
                _windowStart += _windowSize;
                elapsed = now - _windowStart;
            }

            // Weighted count: remaining portion of previous window + current window
            double weight = 1.0 - (elapsed.TotalMilliseconds / _windowSize.TotalMilliseconds);
            double estimatedCount = (_prevCount * weight) + _currCount;

            if (estimatedCount < _maxRequests)
            {
                _currCount++;
                return true;
            }
            return false;
        }
    }
}
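A worked example of the weighted estimate (plain arithmetic, shown in Python for brevity): limit 100/min, 80 requests in the previous window, 30 in the current one, and 15 s (25%) into the current window.

```python
# 75% of the previous window still overlaps the sliding 60s window,
# so 75% of its count is carried over into the estimate.
prev_count, curr_count = 80, 30
elapsed, window = 15.0, 60.0

weight = 1.0 - elapsed / window        # 0.75
estimate = prev_count * weight + curr_count
print(estimate)  # 90.0 — under the limit of 100, so the request is allowed
```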

4. Token Bucket

The most intuitive model: a "bucket" holds tokens that are refilled at a constant rate. Each request consumes one token. When the bucket is empty, requests are rejected. The bucket has a maximum capacity, allowing controlled bursts.

flowchart TB
    subgraph TB["Token Bucket"]
        direction TB
        R[Refill: 10 tokens/sec] --> B[Bucket
capacity: 100 tokens]
        B --> C{Request arrives}
        C -->|Token available| D[Consume token, allow]
        C -->|No tokens| E[Reject 429]
    end

    style R fill:#4CAF50,stroke:#fff,color:#fff
    style B fill:#e94560,stroke:#fff,color:#fff
    style D fill:#4CAF50,stroke:#fff,color:#fff
    style E fill:#ff9800,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Token Bucket allows bursts while controlling average rate
// Token Bucket
public class TokenBucket
{
    private double _tokens;
    private DateTime _lastRefill;
    private readonly double _maxTokens;
    private readonly double _refillRate; // tokens per second
    private readonly object _lock = new();

    public TokenBucket(double maxTokens, double refillRate)
    {
        _maxTokens = maxTokens;
        _refillRate = refillRate;
        _tokens = maxTokens;
        _lastRefill = DateTime.UtcNow;
    }

    public bool TryAcquire(int tokens = 1)
    {
        lock (_lock)
        {
            Refill();
            if (_tokens >= tokens)
            {
                _tokens -= tokens;
                return true;
            }
            return false;
        }
    }

    private void Refill()
    {
        var now = DateTime.UtcNow;
        var elapsed = (now - _lastRefill).TotalSeconds;
        _tokens = Math.Min(_maxTokens, _tokens + elapsed * _refillRate);
        _lastRefill = now;
    }
}
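To see the burst-then-throttle behaviour, here is a small simulation (Python for brevity; same refill logic as the C# Refill method above, assuming a bucket of 100 tokens refilled at 10 tokens/s):

```python
# Token bucket: a full bucket absorbs a burst up to its capacity,
# after which admission is capped at the refill rate.
def simulate(requests, bucket_size=100, refill_rate=10.0):
    """requests: list of (timestamp_seconds, tokens_needed). Returns allow/deny list."""
    tokens, last = float(bucket_size), 0.0
    results = []
    for t, n in requests:
        tokens = min(bucket_size, tokens + (t - last) * refill_rate)  # refill
        last = t
        if tokens >= n:
            tokens -= n
            results.append(True)
        else:
            results.append(False)
    return results

# 101 requests at t=0: the first 100 drain the bucket, the 101st is rejected.
# One second later 10 tokens have been refilled, so the next request passes.
r = simulate([(0.0, 1)] * 101 + [(1.0, 1)])
print(r[:100].count(True), r[100], r[101])  # 100 False True
```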

Algorithm Comparison

| Algorithm | Memory | Accuracy | Burst Control | Complexity | Use Case |
| --- | --- | --- | --- | --- | --- |
| Fixed Window | O(1) | Moderate | Poor (boundary burst) | Low | Internal APIs, MVP |
| Sliding Window Log | O(n) | Highest | Good | Medium | Billing, audit-critical |
| Sliding Window Counter | O(1) | High | Good | Medium | Public APIs (recommended) |
| Token Bucket | O(1) | High | Best (configurable) | Low | API Gateway, microservices |

Implementation in ASP.NET Core 10

Since .NET 7, the framework has shipped rate-limiting primitives in the System.Threading.RateLimiting namespace, with ASP.NET Core middleware built on top of them. By .NET 10, this middleware has matured with multi-tenant partitioning, chained limiters, and deeper integration with minimal APIs.

Basic Configuration with Fixed Window

// Program.cs — ASP.NET Core 10
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        opt.QueueLimit = 10;
    });
});

var app = builder.Build();
app.UseRateLimiter();

app.MapGet("/api/products", () => Results.Ok(new { products = new[] { "A", "B" } }))
   .RequireRateLimiting("fixed");

app.Run();

Token Bucket for Public APIs

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = 429;

    options.AddTokenBucketLimiter("api-public", opt =>
    {
        opt.TokenLimit = 100;           // Burst capacity
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 20;       // 20 tokens every 10s = 2 req/s sustained
        opt.AutoReplenishment = true;
        opt.QueueLimit = 5;
    });
});

Sliding Window with Per-User Partition

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-user", httpContext =>
    {
        var userId = httpContext.User.FindFirst("sub")?.Value
                     ?? httpContext.Connection.RemoteIpAddress?.ToString()
                     ?? "anonymous";

        return RateLimitPartition.GetSlidingWindowLimiter(userId, _ => new SlidingWindowRateLimiterOptions
        {
            PermitLimit = 60,
            Window = TimeSpan.FromMinutes(1),
            SegmentsPerWindow = 6,      // 6 segments × 10 seconds each
            QueueLimit = 0
        });
    });

    options.OnRejected = async (context, ct) =>
    {
        context.HttpContext.Response.Headers["Retry-After"] = "60";
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            error = "rate_limit_exceeded",
            message = "Too many requests. Please try again later.",
            retryAfter = 60
        }, ct);
    };
});

Multi-tenant Rate Limiting (.NET 10)

.NET 10 improves RateLimitPartition to support combining multiple partition keys simultaneously — for example, limiting by user + endpoint + plan tier. This is particularly useful for SaaS APIs where each pricing plan has different rate limits.

Chained Rate Limiters — multi-layer protection

builder.Services.AddRateLimiter(options =>
{
    // Layer 1: Global — protects the entire server
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
    {
        return RateLimitPartition.GetFixedWindowLimiter("global", _ =>
            new FixedWindowRateLimiterOptions
            {
                PermitLimit = 10_000,
                Window = TimeSpan.FromMinutes(1)
            });
    });

    // Layer 2: Per-IP — prevents abuse from a single source
    options.AddPolicy("per-ip", context =>
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetTokenBucketLimiter(ip, _ =>
            new TokenBucketRateLimiterOptions
            {
                TokenLimit = 50,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 10,
                AutoReplenishment = true
            });
    });

    // Layer 3: Per-User-Per-Endpoint — fine-grained control
    options.AddPolicy("user-endpoint", context =>
    {
        var user = context.User.FindFirst("sub")?.Value ?? "anon";
        var endpoint = context.GetEndpoint()?.DisplayName ?? "default";
        var key = $"{user}:{endpoint}";

        return RateLimitPartition.GetSlidingWindowLimiter(key, _ =>
            new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 30,
                Window = TimeSpan.FromMinutes(1),
                SegmentsPerWindow = 6
            });
    });
});

Distributed Rate Limiting

The algorithms above work well on a single instance. But in microservices architecture with multiple replicas behind a load balancer, each instance maintains its own counter — a client can bypass the limit by sending requests to different instances.

flowchart TB
    C[Client
Limit: 100 req/min] --> LB[Load Balancer]
    LB --> S1[Server 1
Counter: 40]
    LB --> S2[Server 2
Counter: 35]
    LB --> S3[Server 3
Counter: 38]
    S1 -.->|Actual total: 113 req| NOTE[Over limit!
But no instance knows]
    S2 -.-> NOTE
    S3 -.-> NOTE

    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LB fill:#2c3e50,stroke:#fff,color:#fff
    style S1 fill:#4CAF50,stroke:#fff,color:#fff
    style S2 fill:#4CAF50,stroke:#fff,color:#fff
    style S3 fill:#4CAF50,stroke:#fff,color:#fff
    style NOTE fill:#ff9800,stroke:#fff,color:#fff
Rate limiting problem with multiple server instances

Solution: Centralized Counter with Redis

Use Redis as a shared store for counters — all instances read/write to the same counter. Redis Lua scripting ensures atomicity for increment + check operations.

// Distributed Sliding Window Counter with Redis
public class RedisRateLimiter
{
    private readonly IConnectionMultiplexer _redis;
    private readonly string _luaScript = @"
        local key = KEYS[1]
        local now = tonumber(ARGV[1])
        local window = tonumber(ARGV[2])
        local limit = tonumber(ARGV[3])

        -- Remove entries older than window
        redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

        -- Count entries within window
        local count = redis.call('ZCARD', key)

        if count < limit then
            -- Add new request with score = timestamp
            redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
            redis.call('EXPIRE', key, math.ceil(window / 1000))
            return 1
        end
        return 0
    ";

    public RedisRateLimiter(IConnectionMultiplexer redis)
    {
        _redis = redis;
    }

    public async Task<bool> TryAcquireAsync(string clientId, int limit, TimeSpan window)
    {
        var db = _redis.GetDatabase();
        var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

        var result = await db.ScriptEvaluateAsync(
            _luaScript,
            new RedisKey[] { $"rate:{clientId}" },
            new RedisValue[] { now, (long)window.TotalMilliseconds, limit });

        return (int)result == 1;
    }
}

Why use Lua scripts?

Redis executes Lua scripts atomically — there's no race condition between ZREMRANGEBYSCORE, ZCARD, and ZADD. If separated into individual Redis commands, two concurrent requests might both read count = 99 (under the limit of 100) and both get allowed, exceeding the limit.
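The lost-update race is easy to see in miniature (Python, illustrative): two clients interleave a read-check-write sequence against the same counter.

```python
# Non-atomic check-then-increment: both clients observe count=99 (< 100),
# both decide "allowed", and the shared counter ends up over the limit.
count, limit = 99, 100

read_a = count                  # client A reads 99
read_b = count                  # client B reads 99, before A writes back
if read_a < limit:
    count += 1                  # A admitted -> 100
if read_b < limit:
    count += 1                  # B admitted too -> 101, limit breached
print(count)  # 101
```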

Distributed Token Bucket with Redis

// Distributed Token Bucket — Redis Lua
public class RedisTokenBucket
{
    private readonly IConnectionMultiplexer _redis;
    private readonly string _luaScript = @"
        local key = KEYS[1]
        local max_tokens = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local requested = tonumber(ARGV[4])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1]) or max_tokens
        local last_refill = tonumber(bucket[2]) or now

        -- Calculate refilled tokens
        local elapsed = (now - last_refill) / 1000.0
        tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

        if tokens >= requested then
            tokens = tokens - requested
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) + 10)
            return 1
        end

        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(max_tokens / refill_rate) + 10)
        return 0
    ";

    public async Task<bool> TryAcquireAsync(
        string clientId, int maxTokens, double refillRate, int tokens = 1)
    {
        var db = _redis.GetDatabase();
        var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

        var result = await db.ScriptEvaluateAsync(
            _luaScript,
            new RedisKey[] { $"bucket:{clientId}" },
            new RedisValue[] { maxTokens, refillRate, now, tokens });

        return (int)result == 1;
    }
}

Integrating Redis Rate Limiter into ASP.NET Core

// Register services
builder.Services.AddSingleton<IConnectionMultiplexer>(
    ConnectionMultiplexer.Connect("localhost:6379"));
builder.Services.AddSingleton<RedisRateLimiter>();

// Custom Rate Limiting Policy
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("redis-distributed", context =>
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";

        return RateLimitPartition.Get(ip, key =>
        {
            var redis = context.RequestServices.GetRequiredService<RedisRateLimiter>();
            // RedisPartitionedLimiter is a user-defined RateLimiter subclass
            // (not shown) that delegates acquisition to RedisRateLimiter
            return new RedisPartitionedLimiter(redis, key, limit: 100,
                window: TimeSpan.FromMinutes(1));
        });
    });
});

Rate Limiting at the API Gateway Layer

In practice, rate limiting is typically implemented at multiple layers (defense in depth). The outermost layer — API Gateway or Edge — blocks traffic as early as possible, conserving resources for the application layer.

flowchart LR
    C[Client] --> CF[Cloudflare
Edge Rate Limit]
    CF --> GW[API Gateway
YARP / Kong]
    GW --> SVC[Service
App-level Limit]
    SVC --> DB[(Database)]
    CF -.->|"Layer 1: IP-based
1000 req/min"| N1[ ]
    GW -.->|"Layer 2: API Key
200 req/min"| N2[ ]
    SVC -.->|"Layer 3: User+Endpoint
30 req/min"| N3[ ]

    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style CF fill:#e94560,stroke:#fff,color:#fff
    style GW fill:#2c3e50,stroke:#fff,color:#fff
    style SVC fill:#4CAF50,stroke:#fff,color:#fff
    style DB fill:#16213e,stroke:#fff,color:#fff
    style N1 fill:transparent,stroke:transparent
    style N2 fill:transparent,stroke:transparent
    style N3 fill:transparent,stroke:transparent
Defense in depth: multi-layer rate limiting

Cloudflare Rate Limiting (Free Tier)

Cloudflare provides Rate Limiting rules even on the Free plan — allowing you to block abnormal traffic before it reaches your origin server. With the Enterprise plan, you get Advanced Rate Limiting based on request characteristics (headers, cookies, body fields).

// Cloudflare Rate Limiting Rule — example configuration
{
  "description": "API rate limit — 100 req/min per IP",
  "expression": "(http.request.uri.path matches \"^/api/\")",
  "action": "block",
  "ratelimit": {
    "characteristics": ["ip.src"],
    "period": 60,
    "requests_per_period": 100,
    "mitigation_timeout": 120,
    "counting_expression": ""
  }
}

YARP Reverse Proxy with Rate Limiting

// YARP + Rate Limiting — ASP.NET Core 10
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("gateway-limit", context =>
    {
        // Rate limit by API Key from header
        var apiKey = context.Request.Headers["X-API-Key"].FirstOrDefault() ?? "no-key";

        return RateLimitPartition.GetTokenBucketLimiter(apiKey, _ =>
            new TokenBucketRateLimiterOptions
            {
                TokenLimit = 200,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 40,
                AutoReplenishment = true
            });
    });
});

app.UseRateLimiter();
app.MapReverseProxy();

Best Practices & Anti-Patterns

Standard Response Headers

Always return rate limit headers so clients know their quota status. The 429 status code comes from RFC 6585 and the Retry-After header from RFC 7231; the X-RateLimit-* headers themselves are a widely adopted de facto convention (an IETF draft standardizes equivalent RateLimit header fields):

// Middleware to add Rate Limit Headers
app.Use(async (context, next) =>
{
    await next();

    // NOTE: IRateLimiterStatisticsFeature is illustrative — ASP.NET Core does
    // not ship a feature with this name; expose your limiter's GetStatistics()
    // behind a similar interface of your own.
    var limiterFeature = context.Features.Get<IRateLimiterStatisticsFeature>();
    if (limiterFeature is not null)
    {
        var stats = limiterFeature.GetStatistics();
        context.Response.Headers["X-RateLimit-Limit"] = "100";
        context.Response.Headers["X-RateLimit-Remaining"] =
            stats?.CurrentAvailablePermits.ToString() ?? "0";
        context.Response.Headers["X-RateLimit-Reset"] =
            DateTimeOffset.UtcNow.AddMinutes(1).ToUnixTimeSeconds().ToString();
    }
});

Anti-patterns to Avoid

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Rate limit only by IP | Shared IPs (NAT, corporate proxy) affect multiple users | Combine IP + API Key + User ID |
| No Retry-After header | Clients retry immediately → thundering herd | Always return Retry-After + exponential backoff |
| Hard limit with no bypass | Health checks, admin, webhooks get blocked | Whitelist trusted IPs/service accounts |
| Limit too low on deploy | False positives → user frustration | Start permissive, monitor, then tighten |
| Not logging rejections | No visibility into who's getting rejected and why | Log clientId, endpoint, reject reason → dashboard |
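The Retry-After plus exponential backoff advice can be sketched from the client side (Python, illustrative; the function and parameter names are assumptions, not a real SDK): honour the server's hint when present, otherwise double the delay with jitter.

```python
import random

def backoff_delay(attempt, retry_after=None, base=1.0, max_delay=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)             # server's hint wins
    delay = min(max_delay, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)   # jitter avoids thundering herd

print(backoff_delay(3, retry_after=15))  # 15.0 — Retry-After takes precedence
```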

Graceful degradation instead of hard rejection

Instead of immediately returning 429, consider softer strategies: return cached responses, reduce quality (smaller images, less data), or queue requests for later processing. This maintains a better user experience during peak traffic.
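One way to sketch the cached-response fallback (Python, illustrative; the handler shape and the X-Served-From header are assumptions): on rejection, serve the last known payload with a marker header instead of a bare 429.

```python
# Serve stale-but-usable data when the limiter rejects; return 429 only
# when nothing cached is available.
cache = {"/api/products": {"products": ["A", "B"]}}

def handle(path, limiter_allows, fetch_fresh):
    if limiter_allows:
        body = cache[path] = fetch_fresh(path)        # fresh response, refresh cache
        return 200, {}, body
    if path in cache:                                 # degraded: stale data
        return 200, {"X-Served-From": "cache"}, cache[path]
    return 429, {"Retry-After": "60"}, {"error": "rate_limit_exceeded"}

status, headers, _ = handle("/api/products", False, lambda p: None)
print(status, headers)  # 200 {'X-Served-From': 'cache'}
```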

Monitoring & Alerting

// Integration with OpenTelemetry Metrics
builder.Services.AddRateLimiter(options =>
{
    options.OnRejected = async (context, ct) =>
    {
        // Emit metric
        var meter = context.HttpContext.RequestServices
            .GetRequiredService<IMeterFactory>()
            .Create("RateLimiting");

        var counter = meter.CreateCounter<long>("rate_limit.rejections");
        counter.Add(1, new KeyValuePair<string, object?>("endpoint",
            context.HttpContext.GetEndpoint()?.DisplayName));

        context.HttpContext.Response.StatusCode = 429;
        context.HttpContext.Response.Headers["Retry-After"] = "60";

        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            error = "rate_limit_exceeded",
            retryAfter = 60
        }, ct);
    };
});

Conclusion

Rate Limiting is not just about "blocking bots" — it's a critical architectural component that determines your system's load capacity, security posture, and user experience. Choosing the right algorithm and deployment layer depends on your specific requirements:

  • MVP / Internal API: Fixed Window on ASP.NET Core is sufficient
  • Public API: Sliding Window Counter or Token Bucket, partitioned per user
  • Microservices: Distributed Token Bucket with Redis + Edge rate limiting (Cloudflare)
  • SaaS multi-tenant: Chained limiters — Global → Per-Tenant → Per-User → Per-Endpoint

The key takeaway: start simple with .NET 10's built-in middleware, monitor carefully, then scale to distributed solutions when your architecture demands it. Don't over-engineer from the start, but never forget that every public API without rate limiting is a ticking time bomb.