API Gateway 2026 — Central Gateway Architecture for Microservices with YARP, Kong, and the BFF Pattern

Posted on: 4/18/2026 10:12:46 AM

In microservices architecture, one of the biggest challenges is: how many services should a client connect to? When a system has 10, 50, or 200 services, letting clients call each service directly is both complex and a security/operations nightmare. The API Gateway is the answer — a single central gateway sitting between the client and the entire backend, handling routing, authentication, rate limiting, load balancing, and a long list of cross-cutting concerns.

This article dives deep into API Gateway architecture for 2026, from core design patterns and a real-world YARP implementation on .NET 10, to comparisons with Kong and AWS API Gateway, and the Backend-for-Frontend (BFF) pattern for production systems.

1. What Is an API Gateway and Why Do Microservices Need One?

An API Gateway is a reverse proxy that sits between clients (web, mobile, third-party) and backend services. Instead of the client knowing each service's address and protocol, every request goes through a single point — the gateway — and is then routed to the correct service behind it.

graph TB
    subgraph Clients
        WEB[Web App]
        MOB[Mobile App]
        EXT[3rd-party API]
    end

    GW[API Gateway]

    subgraph Backend Services
        US[User Service]
        OS[Order Service]
        PS[Product Service]
        NS[Notification Service]
    end

    WEB --> GW
    MOB --> GW
    EXT --> GW
    GW --> US
    GW --> OS
    GW --> PS
    GW --> NS

    style GW fill:#e94560,stroke:#fff,color:#fff
    style WEB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style MOB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style EXT fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style US fill:#2c3e50,stroke:#fff,color:#fff
    style OS fill:#2c3e50,stroke:#fff,color:#fff
    style PS fill:#2c3e50,stroke:#fff,color:#fff
    style NS fill:#2c3e50,stroke:#fff,color:#fff
Figure 1: API Gateway — the central point connecting clients to microservices

Why not call each service directly?

When clients call directly: (1) they must manage N different endpoints, (2) every service has to handle auth, rate limiting, and CORS itself, (3) internal changes (splitting/merging services) directly affect clients, (4) there's no central place to monitor traffic. The API Gateway solves all of these by centralizing cross-cutting concerns into a single layer.

1.1. What an API Gateway Handles

Routing & Path Rewriting

Map public URLs to internal service endpoints. For example: /api/orders/* → Orders Service, /api/users/* → User Service. Supports path prefix stripping and query string forwarding.

Authentication & Authorization

Validate the JWT/OAuth2 token once at the gateway and forward claims to downstream services. Eliminates the need for every service to validate tokens.

Rate Limiting & Throttling

Protect the backend from abuse using fixed window, sliding window, or token bucket algorithms. Apply per IP, user, API key, or a specific route.

Load Balancing

Distribute traffic across multiple instances: Round Robin, Least Connections, Weighted, or Consistent Hashing. Combine with health checks to remove failing instances.

Response Caching

Cache responses at the gateway layer for read-heavy endpoints, reducing backend load. Supports cache invalidation via headers or TTL.

Observability & Logging

Trace every request from client to backend with a correlation ID. Export metrics (latency, error rate, throughput) to Prometheus/OpenTelemetry.

2. API Gateway Architecture — Design Patterns

2.1. Single Gateway vs Gateway per Client

There are two main approaches to designing an API Gateway:

graph LR
    subgraph "Pattern 1: Single Gateway"
        C1[All Clients] --> SG[Shared Gateway]
        SG --> S1[Service A]
        SG --> S2[Service B]
    end

    subgraph "Pattern 2: BFF — Gateway per Client"
        WA[Web App] --> WG[Web Gateway]
        MA[Mobile App] --> MG[Mobile Gateway]
        PA[Partner API] --> PG[Partner Gateway]
        WG --> S3[Service A]
        MG --> S3
        PG --> S3
        WG --> S4[Service B]
        MG --> S4
        PG --> S4
    end

    style SG fill:#e94560,stroke:#fff,color:#fff
    style WG fill:#e94560,stroke:#fff,color:#fff
    style MG fill:#e94560,stroke:#fff,color:#fff
    style PG fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style S4 fill:#2c3e50,stroke:#fff,color:#fff
Figure 2: Single Gateway vs Backend-for-Frontend (BFF) Pattern

Single Gateway is simple and easy to manage, suitable for small to mid-sized systems. But when Web needs a different response than Mobile (fewer fields, different format), the single gateway becomes a bottleneck — every mobile change affects web and vice versa.

The BFF Pattern (Backend-for-Frontend) solves this: each client type gets its own gateway, customized to its response shape, aggregation logic, and caching strategy. Netflix, Spotify, and Shopify all use BFFs in production.

2.2. Request Pipeline in an API Gateway

A request passes through multiple middleware layers in a fixed order. That order is critical — a wrong arrangement can introduce security holes or unwanted behavior.

graph TB
    REQ[Incoming Request] --> CORS[CORS Middleware]
    CORS --> RL[Rate Limiting]
    RL --> AUTH[Authentication]
    AUTH --> AUTHZ[Authorization]
    AUTHZ --> CACHE[Response Cache Check]
    CACHE --> TRANSFORM[Request Transform]
    TRANSFORM --> ROUTE[Route Matching]
    ROUTE --> LB[Load Balancer]
    LB --> HEALTH[Health Check Filter]
    HEALTH --> PROXY[Proxy to Backend]
    PROXY --> RES_TRANSFORM[Response Transform]
    RES_TRANSFORM --> LOG[Logging & Metrics]
    LOG --> RES[Response to Client]

    style REQ fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style CORS fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style AUTH fill:#e94560,stroke:#fff,color:#fff
    style AUTHZ fill:#e94560,stroke:#fff,color:#fff
    style CACHE fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style ROUTE fill:#2c3e50,stroke:#fff,color:#fff
    style LB fill:#2c3e50,stroke:#fff,color:#fff
    style HEALTH fill:#2c3e50,stroke:#fff,color:#fff
    style PROXY fill:#2c3e50,stroke:#fff,color:#fff
    style RES_TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LOG fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RES fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
Figure 3: Request Pipeline — middleware order inside the API Gateway

⚠ Middleware order matters

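Rate Limiting MUST come before Authentication. If the order is reversed, attackers can spam requests with invalid tokens, and the gateway burns CPU validating JWTs before rate limiting kicks in. Placing rate limiting first blocks abuse as early as possible without wasting cycles on crypto operations. One caveat: a limiter that runs before authentication cannot see token claims, so it has to partition by IP address or API key. Per-user limits keyed on claims only work in a limiter that runs after authentication.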

2.3. Gateway Aggregation Pattern

One of the most powerful API Gateway patterns is request aggregation — combining multiple backend calls into a single response for the client. Instead of a mobile app making 3 separate API calls (user profile, orders, recommendations), the gateway fans them out in parallel, merges the results, and returns a single response.

// Aggregation example: client calls GET /api/dashboard
// Internally, the gateway calls in parallel:
//   GET /users/123/profile
//   GET /orders?userId=123&limit=5
//   GET /recommendations?userId=123
// Merged into 1 response { profile, recentOrders, recommendations }

This pattern is especially useful for mobile apps (fewer HTTP round-trips on slow networks) and dashboard pages (reduces waterfall loading).

3. YARP — An API Gateway on .NET 10

YARP (Yet Another Reverse Proxy) is Microsoft's open-source library designed as a fully programmable reverse proxy on ASP.NET Core. Unlike Kong or AWS API Gateway — which are feature-rich products — YARP is a toolkit you use to build a gateway matching your exact requirements.

200M+ NuGet Downloads
<1ms Proxy Overhead
100% ASP.NET Core Pipeline
.NET 10 LTS Support

3.1. Basic YARP Configuration

YARP can be driven entirely from configuration, so basic routing needs no hand-written logic. Each route maps a URL pattern to a cluster (a group of backend instances).

// Program.cs: Minimal API with YARP
using Microsoft.AspNetCore.RateLimiting; // provides the AddFixedWindowLimiter extension

var builder = WebApplication.CreateBuilder(args);

// Add YARP services
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Add Rate Limiting
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("api-limit", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 10;
    });
});

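// Add Authentication
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer(options =>
    {
        options.Authority = "https://auth.example.com";
        options.TokenValidationParameters = new()
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidAudience = "my-api"
        };
    });

// UseAuthorization() and route-level AuthorizationPolicy values need the authorization services
builder.Services.AddAuthorization();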

var app = builder.Build();

app.UseRateLimiter();
app.UseAuthentication();
app.UseAuthorization();
app.MapReverseProxy();

app.Run();

// appsettings.json: YARP routing configuration
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "AuthorizationPolicy": "default",
        "RateLimiterPolicy": "api-limit",
        "Match": {
          "Path": "/api/orders/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/orders" }
        ]
      },
      "users-route": {
        "ClusterId": "users-cluster",
        "Match": {
          "Path": "/api/users/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/users" },
          { "RequestHeader": "X-Forwarded-Prefix", "Set": "/api/users" }
        ]
      }
    },
    "Clusters": {
      "orders-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:30",
            "Timeout": "00:00:10",
            "Path": "/health"
          }
        },
        "Destinations": {
          "orders-1": { "Address": "https://orders-1:5001" },
          "orders-2": { "Address": "https://orders-2:5002" }
        }
      },
      "users-cluster": {
        "Destinations": {
          "users-1": { "Address": "https://users:5003" }
        }
      }
    }
  }
}

3.2. Custom Middleware in the YARP Pipeline

YARP's biggest advantage over other gateways: full control over the ASP.NET Core middleware pipeline. You can write custom middleware and insert it anywhere in the pipeline.

// Custom middleware: API Key validation for partner routes
app.MapReverseProxy(proxyPipeline =>
{
    proxyPipeline.Use(async (context, next) =>
    {
        var route = context.GetReverseProxyFeature().Route;

        if (route.Config.RouteId.StartsWith("partner-"))
        {
            // ValidateApiKey is an application-defined helper (not part of YARP)
            if (!context.Request.Headers
                .TryGetValue("X-API-Key", out var apiKey)
                || !await ValidateApiKey(apiKey!))
            {
                context.Response.StatusCode = 401;
                await context.Response.WriteAsJsonAsync(
                    new { error = "Invalid API key" });
                return;
            }
        }

        await next();
    });

    // A custom pipeline replaces the default one, so the built-in
    // steps have to be added back explicitly, in this order.
    proxyPipeline.UseSessionAffinity();
    proxyPipeline.UseLoadBalancing();
    proxyPipeline.UsePassiveHealthChecks();
});

3.3. Health Checks — Active and Passive

YARP supports two kinds of health checks so traffic flows only to healthy instances:

Active Health Check

The gateway proactively calls each destination's /health endpoint on a fixed interval (every 30 seconds in the configuration above; YARP's default is 15 seconds). If a destination doesn't respond or returns an error status code, it's marked unhealthy and removed from the load balancer rotation. When it recovers, it's automatically added back.

Passive Health Check

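Detects failures at request time, with no extra health endpoint calls. YARP's built-in TransportFailureRate policy tracks the share of proxied requests to each destination that fail (connection errors, timeouts) within a detection window. If the failure rate crosses the configured limit, the destination is marked unhealthy and taken out of the pool for a reactivation period, much like a circuit breaker. Passive checks must be enabled per cluster under HealthCheck.Passive, and counting 5xx responses as failures requires a custom policy.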
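
The failure-rate idea behind passive health checks can be sketched without YARP. This is a minimal illustration, not YARP's implementation: the names Record and IsUnhealthy, the 60-second window, and the 0.3 limit are assumptions chosen for the example, and timestamps are passed in so the behavior is repeatable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: sliding-window failure rate for a single destination.
var window = TimeSpan.FromSeconds(60);   // assumed detection window
const double failureRateLimit = 0.3;     // assumed threshold
var samples = new Queue<(DateTime At, bool Failed)>();

// Record the outcome of one proxied request and drop samples older than the window.
void Record(DateTime now, bool failed)
{
    samples.Enqueue((now, failed));
    while (samples.Count > 0 && now - samples.Peek().At > window)
        samples.Dequeue();
}

// The destination is unhealthy when the share of failed requests exceeds the limit.
bool IsUnhealthy() =>
    samples.Count > 0 &&
    (double)samples.Count(s => s.Failed) / samples.Count > failureRateLimit;

var t0 = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 7; i++) Record(t0.AddSeconds(i), failed: false);
for (var i = 7; i < 10; i++) Record(t0.AddSeconds(i), failed: true);
Console.WriteLine(IsUnhealthy()); // 3 of 10 failed: the rate does not exceed the limit

Record(t0.AddSeconds(10), failed: true);
Console.WriteLine(IsUnhealthy()); // 4 of 11 failed: the rate exceeds the limit
```

A real gateway would also need a reactivation step that puts the destination back into rotation after a cool-down, which this sketch leaves out.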

4. Comparing API Gateways: YARP, Kong, AWS API Gateway, and Envoy

| Criterion | YARP (.NET) | Kong (OSS) | AWS API Gateway | Envoy |
| Deployment model | Library (in-process) | Standalone / K8s | Managed service | Sidecar / Standalone |
| Language | C# / .NET | Lua on Nginx (OpenResty) | N/A (managed) | C++ / WASM filters |
| Performance overhead | <1ms (in-process) | 1-5ms (proxy hop) | 5-29ms (managed) | ~1ms (sidecar) |
| Customization | Full middleware pipeline | Plugin system (Lua/Go) | Lambda authorizers | WASM / Lua filters |
| Rate Limiting | Native .NET (since .NET 7) | Built-in plugin | Built-in (throttling) | Local/Global filters |
| Service Discovery | Config / Custom provider | DNS / Consul / K8s | VPC Link / ALB | xDS API (Istio) |
| Health Check | Active + Passive | Active + Passive (upstream) | Managed | Active + Passive + EDS |
| Cost | Free (OSS) | Free (OSS) / Enterprise | $3.50/million requests | Free (OSS) |
| Best fit | .NET team, full control | Multi-language, plugins | AWS-native, serverless | K8s / Service Mesh |

💡 Which gateway to choose?

.NET team, want full control: YARP — runs in-process, zero network hop, customize freely with C# middleware. Multi-language team with plugin needs: Kong — rich plugin ecosystem, strong admin API. Full AWS, want zero ops: AWS API Gateway — managed, auto-scale, Lambda integration. Kubernetes/Service Mesh: Envoy — the standard data plane for Istio with WASM extensibility.

5. Authentication at the Gateway Layer

One of the biggest benefits of an API Gateway is centralizing authentication. Instead of every service validating JWTs itself, the gateway validates once and forwards claims (user ID, roles) to downstream services via headers.

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant IDP as Identity Provider
    participant SVC as Backend Service

    C->>GW: Request + Bearer Token
    GW->>IDP: Validate JWT (cached JWKS)
    IDP-->>GW: Token Valid + Claims
    GW->>GW: Check Authorization Policy
    GW->>SVC: Forward Request + X-User-Id + X-Roles
    SVC-->>GW: Response
    GW-->>C: Response

    Note over GW: Gateway caches JWKS keys<br/>to reduce round-trips to the IDP
Figure 4: Authentication flow — validate the token at the gateway, forward claims downstream
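// appsettings.json: secured route; the incoming Authorization header is stripped
{
  "Routes": {
    "secured-route": {
      "ClusterId": "backend",
      "AuthorizationPolicy": "authenticated-users",
      "Match": { "Path": "/api/secure/{**catch-all}" },
      "Transforms": [ { "RequestHeaderRemove": "Authorization" } ]
    }
  }
}

// Program.cs (using Yarp.ReverseProxy.Transforms): config-file transforms only set literal
// values, so claims are copied in a code transform (names assume MapInboundClaims = false).
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .AddTransforms(ctx =>
    {
        if (ctx.Route.RouteId != "secured-route") return;
        ctx.AddRequestTransform(transform =>
        {
            var user = transform.HttpContext.User;
            foreach (var (header, claim) in new[] { ("X-User-Id", "sub"), ("X-User-Email", "email"), ("X-User-Roles", "role") })
            {
                // Remove any client-supplied value first so identity headers cannot be spoofed
                transform.ProxyRequest.Headers.Remove(header);
                var values = string.Join(",", user.FindAll(claim).Select(c => c.Value));
                if (values.Length > 0) transform.ProxyRequest.Headers.TryAddWithoutValidation(header, values);
            }
            return ValueTask.CompletedTask;
        });
    });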

⚠ Always strip the Authorization header when forwarding

After the gateway validates the token, strip the Authorization header before forwarding to the backend. Reasons: (1) the backend doesn't need to validate again, (2) you avoid token leakage if the backend logs request headers, (3) it reduces attack surface — if the backend is compromised, attackers can't extract bearer tokens from requests.

6. Rate Limiting — Protecting the Backend

6.1. Rate Limiting Algorithms

| Algorithm | How it works | Pros | Cons |
| Fixed Window | Count requests in a fixed window (e.g., 100 req/min) | Simple, low memory | Bursts at window boundaries (up to 200 req within a couple of seconds where two windows meet) |
| Sliding Window | Continuously sliding window, weighted count | Smoother than fixed window | More complex, needs timestamp storage |
| Token Bucket | Tokens refill steadily; each request consumes one token | Allows short bursts, smooth overall | Requires tuning bucket size + refill rate |
| Concurrency Limiter | Limits concurrent (parallel) requests | Protects against slow-request attacks | Doesn't cap overall throughput |
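
The fixed-window boundary burst is easy to reproduce with a hand-rolled counter. The sketch below is an illustration only, not the .NET FixedWindowRateLimiter: TryAcquire and the dictionary of per-window counts are hypothetical, and time is passed in as whole seconds so the result is deterministic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-window counter: 100 requests per 60-second window.
const int permitLimit = 100;
const int windowSeconds = 60;
var counts = new Dictionary<long, int>(); // window index -> requests accepted so far

// Returns true when the request arriving at the given second still fits in its window.
bool TryAcquire(long second)
{
    var windowIndex = second / windowSeconds;
    var used = counts.GetValueOrDefault(windowIndex);
    if (used >= permitLimit) return false;
    counts[windowIndex] = used + 1;
    return true;
}

var accepted = 0;
for (var i = 0; i < 100; i++) if (TryAcquire(59)) accepted++; // last second of window 0
for (var i = 0; i < 100; i++) if (TryAcquire(60)) accepted++; // first second of window 1
Console.WriteLine(accepted); // 200: every request on both sides of the boundary is accepted
```

Each window on its own respects the 100-request limit, yet the backend sees double that within two seconds. Sliding windows and token buckets exist to smooth out exactly this case.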
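
// .NET 10: Multi-tier rate limiting in YARP
using System.Threading.RateLimiting;      // PartitionedRateLimiter, RateLimitPartition, MetadataName
using Microsoft.AspNetCore.RateLimiting;  // AddPolicy / AddFixedWindowLimiter extensions
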
builder.Services.AddRateLimiter(options =>
{
    // Global: 1,000 req/min for the whole gateway
    options.GlobalLimiter = PartitionedRateLimiter
        .Create<HttpContext, string>(context =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: "global",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 1000,
                    Window = TimeSpan.FromMinutes(1)
                }));

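    // Per-user: 100 req/min by user ID. context.User is only populated when the limiter
    // runs after UseAuthentication(); otherwise the IP address fallback below applies.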
    options.AddPolicy("per-user", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: context.User?.FindFirst("sub")?.Value
                          ?? context.Connection.RemoteIpAddress?.ToString()
                          ?? "anonymous",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                QueueLimit = 10
            }));

    // Strict: for sensitive endpoints (login, password reset)
    options.AddFixedWindowLimiter("strict", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(15);
    });

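    options.OnRejected = async (context, ct) =>
    {
        // Prefer the limiter's own retry hint (MetadataName.RetryAfter); fall back to 60 seconds
        var retryAfterSeconds = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter)
            ? (int)Math.Ceiling(retryAfter.TotalSeconds)
            : 60;

        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.Headers
            .RetryAfter = retryAfterSeconds.ToString();
        await context.HttpContext.Response.WriteAsJsonAsync(
            new { error = $"Too many requests. Retry after {retryAfterSeconds} seconds." }, ct);
    };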
});
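
To make the token bucket parameters above more concrete, here is a small hand-written sketch of the algorithm. It is an assumption-laden illustration, not the .NET TokenBucketRateLimiter: this version refills continuously, whereas the built-in limiter adds TokensPerPeriod tokens once per ReplenishmentPeriod, and the TryConsume helper and caller-supplied clock are hypothetical.

```csharp
using System;

// Hypothetical token bucket: capacity 100, refilled at 100 tokens per minute.
const double capacity = 100;
const double refillPerSecond = 100.0 / 60.0;
var tokens = capacity;   // the bucket starts full
var lastRefill = 0.0;    // seconds; supplied by the caller to keep the example deterministic

bool TryConsume(double nowSeconds)
{
    // Add the tokens earned since the last call, capped at the bucket capacity.
    tokens = Math.Min(capacity, tokens + (nowSeconds - lastRefill) * refillPerSecond);
    lastRefill = nowSeconds;
    if (tokens < 1) return false;
    tokens -= 1;
    return true;
}

var burst = 0;
for (var i = 0; i < 150; i++) if (TryConsume(0)) burst++;
Console.WriteLine(burst);            // the full bucket absorbs a burst of 100, the rest is rejected
Console.WriteLine(TryConsume(0.3));  // too early: only about half a token has been refilled
Console.WriteLine(TryConsume(1.2));  // enough time has passed for a whole token
```

The bucket size controls how large a burst is tolerated, while the refill rate sets the sustained throughput. Those are the two knobs the table refers to as needing tuning.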

7. Load Balancing and Circuit Breaker

7.1. Load Balancing Strategies in YARP

YARP supports multiple load balancing algorithms configurable at the cluster level:

| Policy | Description | Use case |
| RoundRobin | Rotate through each destination | Instances with uniform capacity |
| Random | Pick a destination at random | Simple, stateless |
| LeastRequests | Pick the destination with the fewest active requests | Uneven instances, varying request durations |
| PowerOfTwoChoices | Randomly pick 2, choose the one with fewer requests | Good balance between randomness and load awareness |
| FirstAlphabetical | Always pick the alphabetically first destination | Primary-secondary failover |
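
The "power of two choices" idea from the table can be sketched in a few lines. This is a standalone illustration rather than YARP's implementation: the destination names, the request counts, and the PickPowerOfTwo helper are made up for the example, and the fixed random seed is there only to make runs repeatable.

```csharp
using System;
using System.Linq;

// Hypothetical destinations with their current number of in-flight requests.
var destinations = new (string Name, int ActiveRequests)[]
{
    ("orders-1", 12), ("orders-2", 3), ("orders-3", 7)
};
var random = new Random(42); // fixed seed only to keep the example repeatable

// Pick two distinct destinations at random and keep the less loaded one.
// Requires at least two destinations.
(string Name, int ActiveRequests) PickPowerOfTwo()
{
    var first = random.Next(destinations.Length);
    var second = random.Next(destinations.Length - 1);
    if (second >= first) second++; // shift the index to skip the first pick
    var a = destinations[first];
    var b = destinations[second];
    return a.ActiveRequests <= b.ActiveRequests ? a : b;
}

var picks = Enumerable.Range(0, 1000).Select(_ => PickPowerOfTwo().Name).ToList();
foreach (var group in picks.GroupBy(n => n).OrderBy(g => g.Key))
    Console.WriteLine($"{group.Key}: {group.Count()}");
```

Every pair of candidates contains a destination less loaded than orders-1, so the busiest destination is never selected here, and the selector only ever inspects two destinations per request instead of scanning all of them.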

7.2. Circuit Breaker at the Gateway

When a backend service fails continuously, the gateway shouldn't keep sending it requests — that causes cascading failure. The Circuit Breaker pattern solves this:

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure threshold reached
    Open --> HalfOpen : Timeout expires
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails

    state Closed {
        [*] --> Monitoring
        Monitoring --> Monitoring : Request OK (reset counter)
        Monitoring --> CountFailure : Request Failed
        CountFailure --> Monitoring : Below threshold
    }
Figure 5: Circuit Breaker state machine — Closed → Open → Half-Open
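// Polly-based resilience pipeline (packages: Microsoft.Extensions.Http.Resilience, Polly)
// using Microsoft.Extensions.Http.Resilience;
// using Polly;
//
// Note: YARP creates its own HttpMessageInvoker through IForwarderHttpClientFactory and does
// not pick up named HttpClients automatically. Use this client for calls the gateway makes
// itself (e.g., BFF aggregation), or attach the handler via a custom IForwarderHttpClientFactory.
builder.Services.AddHttpClient("yarp-forwarder")
    .AddResilienceHandler("gateway-resilience", pipeline =>
    {
        // Strategies execute in the order they are added; the first one is the outermost.
        // Retry wraps the circuit breaker, which wraps the per-attempt timeout.
        // By default the Http* options handle HttpRequestException, timeouts, and 5xx/408/429.
        // Retrying non-idempotent requests (POST) is only safe with an idempotency key.
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            Delay = TimeSpan.FromMilliseconds(500),
            BackoffType = DelayBackoffType.Exponential
        });

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            SamplingDuration = TimeSpan.FromSeconds(30),
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(15)
        });

        pipeline.AddTimeout(TimeSpan.FromSeconds(10));
    });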
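
The state machine in Figure 5 can also be written out by hand, which helps when reasoning about what a library does for you. The sketch below is a simplified illustration under stated assumptions: it counts consecutive failures instead of a failure ratio, it lets every request through while half-open rather than a single probe, and AllowRequest, OnResult, and the thresholds are hypothetical names and values.

```csharp
using System;

// Hypothetical circuit breaker for one backend; thresholds are illustrative.
const int failureThreshold = 3;
var breakDuration = TimeSpan.FromSeconds(15);
var state = "Closed";              // one of: Closed, Open, HalfOpen
var consecutiveFailures = 0;
var openedAt = DateTime.MinValue;

// Decide whether a request may be sent at the given time.
bool AllowRequest(DateTime now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
        state = "HalfOpen";        // break elapsed: let a probe request through
    return state != "Open";
}

// Feed the outcome of a request back into the state machine.
void OnResult(DateTime now, bool success)
{
    if (success)
    {
        state = "Closed";          // a success (including a successful probe) closes the circuit
        consecutiveFailures = 0;
        return;
    }
    consecutiveFailures++;
    if (state == "HalfOpen" || consecutiveFailures >= failureThreshold)
    {
        state = "Open";            // failed probe, or failure threshold reached
        openedAt = now;
    }
}

var start = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 3; i++) OnResult(start, success: false);
Console.WriteLine(state);                               // Open after three consecutive failures
Console.WriteLine(AllowRequest(start.AddSeconds(5)));   // still open: the request is rejected
Console.WriteLine(AllowRequest(start.AddSeconds(15)));  // break elapsed: a probe is allowed
OnResult(start.AddSeconds(15), success: true);
Console.WriteLine(state);                               // Closed again after the successful probe
```

In production a library such as Polly is the better choice, since it adds sampling windows, thread safety, and telemetry that this sketch deliberately omits.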

8. Backend-for-Frontend (BFF) Pattern — Implementation

The BFF pattern provides a dedicated gateway per client type, allowing response format, aggregation logic, and caching strategy to be tailored to each platform's needs.

graph TB
    subgraph "Client Layer"
        WEB[Vue.js SPA]
        MOB[React Native App]
        IOT[IoT Dashboard]
    end

    subgraph "BFF Layer"
        WBFF[Web BFF<br/>Full response, SSR support]
        MBFF[Mobile BFF<br/>Compact response, pagination]
        IBFF[IoT BFF<br/>Minimal payload, batch writes]
    end

    subgraph "Domain Services"
        USER[User Service]
        ORDER[Order Service]
        PRODUCT[Product Service]
        ANALYTICS[Analytics Service]
    end

    WEB --> WBFF
    MOB --> MBFF
    IOT --> IBFF
    WBFF --> USER
    WBFF --> ORDER
    WBFF --> PRODUCT
    WBFF --> ANALYTICS
    MBFF --> USER
    MBFF --> ORDER
    MBFF --> PRODUCT
    IBFF --> ANALYTICS

    style WBFF fill:#e94560,stroke:#fff,color:#fff
    style MBFF fill:#e94560,stroke:#fff,color:#fff
    style IBFF fill:#e94560,stroke:#fff,color:#fff
    style USER fill:#2c3e50,stroke:#fff,color:#fff
    style ORDER fill:#2c3e50,stroke:#fff,color:#fff
    style PRODUCT fill:#2c3e50,stroke:#fff,color:#fff
    style ANALYTICS fill:#2c3e50,stroke:#fff,color:#fff
Figure 6: BFF Pattern — each client type gets its own gateway with a customized response shape
// Web BFF: Aggregation endpoint for the Dashboard page
app.MapGet("/bff/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    IProductService productService,
    HttpContext context) =>
{
    var userId = context.User.FindFirst("sub")!.Value;

    // Call 3 services in parallel
    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 10);
    var recommendationsTask = productService
        .GetRecommendationsAsync(userId, limit: 8);

    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return Results.Ok(new
    {
        Profile = profileTask.Result,
        RecentOrders = ordersTask.Result,
        Recommendations = recommendationsTask.Result,
        ServerTime = DateTime.UtcNow
    });
}).RequireAuthorization();

// Mobile BFF: Compact response, only essential fields
app.MapGet("/bff/mobile/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    HttpContext context) =>
{
    var userId = context.User.FindFirst("sub")!.Value;

    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 5);

    await Task.WhenAll(profileTask, ordersTask);

    var profile = profileTask.Result;
    return Results.Ok(new
    {
        DisplayName = profile.DisplayName,
        AvatarUrl = profile.AvatarUrl,
        OrderCount = ordersTask.Result.Count,
        LastOrderStatus = ordersTask.Result.FirstOrDefault()?.Status
    });
}).RequireAuthorization();

9. Observability — Monitoring the API Gateway

The API Gateway is the best place for observability since all traffic passes through it. Track the 4 key metrics (RED + Saturation):

Rate: requests per second (throughput)
Errors: error rate (4xx/5xx %)
Duration: latency percentiles (p50, p95, p99)
Saturation: connection pool usage, queue depth

// OpenTelemetry integration for YARP
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddMeter("Yarp.ReverseProxy")
               .AddOtlpExporter(opt =>
                   opt.Endpoint = new Uri("http://otel-collector:4317"));
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .SetResourceBuilder(ResourceBuilder.CreateDefault()
                   .AddService("api-gateway"))
               .AddOtlpExporter();
    });

// Custom middleware: add a correlation ID
app.Use(async (context, next) =>
{
    if (!context.Request.Headers.ContainsKey("X-Correlation-Id"))
    {
        context.Request.Headers["X-Correlation-Id"] =
            Guid.NewGuid().ToString("N");
    }
    context.Response.Headers["X-Correlation-Id"] =
        context.Request.Headers["X-Correlation-Id"];

    await next();
});

10. Best Practices for Production

💡 10 API Gateway principles for production

  1. Stateless gateway: Don't store session state at the gateway — use JWTs or an external session store. Makes horizontal scaling easy.
  2. Timeout cascade: Gateway timeout must exceed backend timeout. Example: backend 5s → gateway 8s → client 15s. Avoid the gateway timing out before the backend responds.
  3. Only retry idempotent operations: Retry GET, PUT, DELETE only. DO NOT retry POST (can create duplicates). If you need POST retries, require the client to send an Idempotency-Key header.
  4. Rate limit before auth: Put rate limiting at the front of the pipeline to block abuse before spending CPU on JWT validation.
  5. Separate gateway health checks: Split /health/live (gateway alive) and /health/ready (gateway + backends ready). Kubernetes uses different liveness and readiness probes.
  6. Request/response size limits: Set a max body size (e.g., 10MB) to block payload abuse. Use streaming for file uploads instead of buffering the whole thing.
  7. Graceful shutdown: When restarting the gateway, drain active connections before shutting down. ASP.NET Core exposes the app.Lifetime.ApplicationStopping event for this, and it applies to a YARP-based gateway as well.
  8. Configuration hot-reload: Routing changes shouldn't require a gateway restart. YARP supports IProxyConfigProvider to load config from DB, Consul, or etcd.
  9. Avoid the gateway monolith: Keep business logic out of the gateway. The gateway only handles cross-cutting concerns — routing, auth, rate limiting, transforms. Business logic belongs in domain services.
  10. Canary routing: Use header-based or weight-based routing to roll out new versions gradually. YARP can match routes on a custom header through the Headers list in a route's Match section.
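
Principle 3 (only retry idempotent operations) can be expressed as a small predicate that a retry policy consults before re-sending a request. This is a sketch under assumptions: the CanRetry helper and the sample key value are hypothetical, the method list is a subset of the idempotent methods in RFC 9110, and it only makes sense if the backend actually deduplicates on the Idempotency-Key header.

```csharp
using System;
using System.Collections.Generic;

// A subset of the methods that HTTP semantics (RFC 9110) define as idempotent.
var idempotentMethods = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "GET", "HEAD", "OPTIONS", "PUT", "DELETE"
};

// A request may be retried when its method is idempotent, or when the client
// supplied an Idempotency-Key header so the backend can deduplicate it.
bool CanRetry(string method, IReadOnlyDictionary<string, string> headers) =>
    idempotentMethods.Contains(method) ||
    (headers.TryGetValue("Idempotency-Key", out var key) && !string.IsNullOrWhiteSpace(key));

var noHeaders = new Dictionary<string, string>();
var withKey = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Idempotency-Key"] = "7f1c2a" // hypothetical client-generated key
};

Console.WriteLine(CanRetry("GET", noHeaders));   // True
Console.WriteLine(CanRetry("POST", noHeaders));  // False
Console.WriteLine(CanRetry("POST", withKey));    // True
```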

11. Full Architecture — The Gateway in a Production Stack

graph TB
    CDN[CDN / Cloudflare] --> LB[External Load Balancer]
    LB --> GW1[API Gateway Instance 1]
    LB --> GW2[API Gateway Instance 2]

    subgraph "Gateway Responsibilities"
        GW1 --> |"Auth, Rate Limit,<br/>Route, Transform"| GW1
    end

    GW1 --> SD[Service Discovery<br/>Consul / K8s DNS]
    GW2 --> SD
    SD --> US1[User Service x3]
    SD --> OS1[Order Service x2]
    SD --> PS1[Product Service x4]
    GW1 --> OTEL[OpenTelemetry<br/>Collector]
    GW2 --> OTEL
    OTEL --> GRAF[Grafana / Dashboard]

    style CDN fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GW1 fill:#e94560,stroke:#fff,color:#fff
    style GW2 fill:#e94560,stroke:#fff,color:#fff
    style SD fill:#2c3e50,stroke:#fff,color:#fff
    style US1 fill:#2c3e50,stroke:#fff,color:#fff
    style OS1 fill:#2c3e50,stroke:#fff,color:#fff
    style PS1 fill:#2c3e50,stroke:#fff,color:#fff
    style OTEL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GRAF fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 7: Production stack — CDN → Load Balancer → API Gateway → Service Discovery → Backend Services

Conclusion

The API Gateway is not an optional component in microservices — it's an essential infrastructure layer once a system exceeds 3-5 services. By centralizing routing, authentication, rate limiting, and observability in one place, the gateway lets teams focus on business logic instead of repeating cross-cutting concerns in every service.

For .NET teams, YARP stands out because it runs in-process on the ASP.NET Core pipeline — near-zero overhead and unlimited customization via C# middleware. For multi-language teams or those needing a ready-made plugin ecosystem, Kong remains the industry standard. If you're all-in on AWS and want zero ops, AWS API Gateway is a worthwhile managed choice.

Whatever you choose, remember the golden rule: the gateway handles only cross-cutting concerns. Business logic belongs to domain services.
