API Gateway 2026 — Central Gateway Architecture for Microservices with YARP, Kong, and the BFF Pattern

Posted on: 4/18/2026 10:12:46 AM

Table of contents

1. What Is an API Gateway and Why Do Microservices Need One?
    Why not call each service directly?
    1.1. What an API Gateway Handles
2. API Gateway Architecture — Design Patterns
3. YARP — An API Gateway on .NET 10
4. Popular API Gateways Compared — 2026
    💡 Which gateway to choose?
5. Authentication at the Gateway Layer
    ⚠ Always strip the Authorization header when forwarding
6. Rate Limiting — Protecting the Backend
    6.1. Rate Limiting Algorithms
7. Load Balancing and Circuit Breaker
    7.1. Load Balancing Strategies in YARP
    7.2. Circuit Breaker at the Gateway
8. Backend-for-Frontend (BFF) Pattern — Implementation
9. Observability — Monitoring the API Gateway
10. Best Practices for Production
    💡 10 API Gateway principles for production
11. Full Architecture — The Gateway in a Production Stack
Conclusion

In microservices architecture, one of the biggest challenges is: how many services should a client connect to? When a system has 10, 50, or 200 services, letting clients call each service directly is both complex and a security/operations nightmare. The API Gateway is the answer — a single central gateway sitting between the client and the entire backend, handling routing, authentication, rate limiting, load balancing, and a long list of cross-cutting concerns.

This article takes a deep look at API Gateway architecture for 2026: core design patterns, a working YARP implementation on .NET 10, a comparison with Kong, AWS API Gateway, and Envoy, and the Backend-for-Frontend (BFF) pattern for production systems.

1. What Is an API Gateway and Why Do Microservices Need One?

An API Gateway is a reverse proxy that sits between clients (web, mobile, third-party) and backend services. Instead of the client knowing each service's address and protocol, every request goes through a single point — the gateway — and is then routed to the correct service behind it.

graph TB
    subgraph Clients
        WEB[Web App]
        MOB[Mobile App]
        EXT[3rd-party API]
    end

    GW[API Gateway]

    subgraph Backend Services
        US[User Service]
        OS[Order Service]
        PS[Product Service]
        NS[Notification Service]
    end

    WEB --> GW
    MOB --> GW
    EXT --> GW
    GW --> US
    GW --> OS
    GW --> PS
    GW --> NS

    style GW fill:#e94560,stroke:#fff,color:#fff
    style WEB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style MOB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style EXT fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style US fill:#2c3e50,stroke:#fff,color:#fff
    style OS fill:#2c3e50,stroke:#fff,color:#fff
    style PS fill:#2c3e50,stroke:#fff,color:#fff
    style NS fill:#2c3e50,stroke:#fff,color:#fff

Figure 1: API Gateway — the central point connecting clients to microservices

Why not call each service directly?

When clients call directly: (1) they must manage N different endpoints, (2) every service has to handle auth, rate limiting, and CORS itself, (3) internal changes (splitting/merging services) directly affect clients, (4) there's no central place to monitor traffic. The API Gateway solves all of these by centralizing cross-cutting concerns into a single layer.

1.1. What an API Gateway Handles

Routing & Path Rewriting

Map public URLs to internal service endpoints. For example: /api/orders/* → Orders Service, /api/users/* → User Service. Supports path prefix stripping and query string forwarding.
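A minimal sketch of prefix routing with prefix stripping, in plain C# with no framework. It is a simplified model of the idea, not how YARP matches routes (YARP uses ASP.NET Core endpoint routing and route templates); the route table and the `Route` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical route table: public path prefix -> internal cluster.
// The first matching entry wins, so list more specific prefixes first.
var routes = new List<(string Prefix, string Cluster)>
{
    ("/api/orders", "orders-cluster"),
    ("/api/users", "users-cluster"),
};

// Matches a request path against the table and strips the matched prefix,
// like a PathRemovePrefix transform. The query string is forwarded unchanged.
(string Cluster, string Path)? Route(string pathAndQuery)
{
    int q = pathAndQuery.IndexOf('?');
    string path = q >= 0 ? pathAndQuery[..q] : pathAndQuery;
    string query = q >= 0 ? pathAndQuery[q..] : "";

    foreach (var (prefix, cluster) in routes)
    {
        // "/api/orders" and "/api/orders/..." match; "/api/ordersX" does not
        bool matches = path.Equals(prefix, StringComparison.OrdinalIgnoreCase)
            || path.StartsWith(prefix + "/", StringComparison.OrdinalIgnoreCase);
        if (!matches) continue;

        string rest = path[prefix.Length..];
        return (cluster, (rest.Length == 0 ? "/" : rest) + query);
    }
    return null; // no route: the gateway would answer 404
}

Console.WriteLine(Route("/api/orders/42?include=items"));
```

Checking for `prefix + "/"` rather than a bare `StartsWith(prefix)` keeps a path like `/api/ordersX` from being captured by the orders route.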

Authentication & Authorization

Validate the JWT/OAuth2 token once at the gateway and forward claims to downstream services. Eliminates the need for every service to validate tokens.

Rate Limiting & Throttling

Protect the backend from abuse using fixed window, sliding window, or token bucket algorithms. Apply per IP, user, API key, or a specific route.

Load Balancing

Distribute traffic across multiple instances: Round Robin, Least Connections, Weighted, or Consistent Hashing. Combine with health checks to remove failing instances.

Response Caching

Cache responses at the gateway layer for read-heavy endpoints, reducing backend load. Supports cache invalidation via headers or TTL.
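The core of gateway response caching is a keyed store with an expiry per entry. The sketch below assumes GET-only caching keyed on path and query, a fixed TTL, and an explicit `now` parameter instead of the system clock so the behavior is easy to follow; real gateways also honor `Cache-Control` and `Vary`, which this sketch ignores. `CallBackend` is a hypothetical stand-in for the proxied request.

```csharp
using System;
using System.Collections.Generic;

// Cached GET responses keyed by path + query, each with an absolute expiry time.
var cache = new Dictionary<string, (string Body, DateTimeOffset Expires)>();
int backendCalls = 0;

// Hypothetical stand-in for proxying the request to the backend.
string CallBackend(string pathAndQuery)
{
    backendCalls++;
    return $"response for {pathAndQuery}";
}

// Serves from cache while the entry is fresh; otherwise calls the backend and stores the result.
string CachedGet(string pathAndQuery, DateTimeOffset now, TimeSpan ttl)
{
    if (cache.TryGetValue(pathAndQuery, out var entry) && entry.Expires > now)
        return entry.Body;

    string body = CallBackend(pathAndQuery);
    cache[pathAndQuery] = (body, now + ttl);
    return body;
}

// Explicit invalidation, e.g. after a PUT or DELETE on the same resource.
void Invalidate(string pathAndQuery) => cache.Remove(pathAndQuery);

var t0 = DateTimeOffset.UnixEpoch;
var ttl = TimeSpan.FromSeconds(30);
CachedGet("/api/products/1", t0, ttl);                // miss: backend called
CachedGet("/api/products/1", t0.AddSeconds(10), ttl); // fresh hit: backend not called
CachedGet("/api/products/1", t0.AddSeconds(31), ttl); // expired: backend called again
Console.WriteLine($"backend calls: {backendCalls}");
```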

Observability & Logging

Trace every request from client to backend with a correlation ID. Export metrics (latency, error rate, throughput) to Prometheus/OpenTelemetry.

2. API Gateway Architecture — Design Patterns

2.1. Single Gateway vs Gateway per Client

There are two main approaches to designing an API Gateway:

graph LR
    subgraph "Pattern 1: Single Gateway"
        C1[All Clients] --> SG[Shared Gateway]
        SG --> S1[Service A]
        SG --> S2[Service B]
    end

    subgraph "Pattern 2: BFF — Gateway per Client"
        WA[Web App] --> WG[Web Gateway]
        MA[Mobile App] --> MG[Mobile Gateway]
        PA[Partner API] --> PG[Partner Gateway]
        WG --> S3[Service A]
        MG --> S3
        PG --> S3
        WG --> S4[Service B]
        MG --> S4
        PG --> S4
    end

    style SG fill:#e94560,stroke:#fff,color:#fff
    style WG fill:#e94560,stroke:#fff,color:#fff
    style MG fill:#e94560,stroke:#fff,color:#fff
    style PG fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style S4 fill:#2c3e50,stroke:#fff,color:#fff

Figure 2: Single Gateway vs Backend-for-Frontend (BFF) Pattern

Single Gateway is simple and easy to manage, suitable for small to mid-sized systems. But when Web needs a different response than Mobile (fewer fields, different format), the single gateway becomes a bottleneck — every mobile change affects web and vice versa.

The BFF Pattern (Backend-for-Frontend) solves this: each client type gets its own gateway, customized to its response shape, aggregation logic, and caching strategy. Netflix, Spotify, and Shopify all use BFFs in production.

2.2. Request Pipeline in an API Gateway

A request passes through multiple middleware layers in a fixed order. That order is critical: a wrong arrangement can open security holes or cause unwanted behavior. Route matching comes first because per-route policies (which rate limit and authorization policy apply) can only be selected once the route is known, and unhealthy destinations are filtered out before the load balancer picks one.

graph TB
    REQ[Incoming Request] --> ROUTE[Route Matching]
    ROUTE --> CORS[CORS Middleware]
    CORS --> RL[Rate Limiting]
    RL --> AUTH[Authentication]
    AUTH --> AUTHZ[Authorization]
    AUTHZ --> CACHE[Response Cache Check]
    CACHE --> TRANSFORM[Request Transform]
    TRANSFORM --> HEALTH[Health Check Filter]
    HEALTH --> LB[Load Balancer]
    LB --> PROXY[Proxy to Backend]
    PROXY --> RES_TRANSFORM[Response Transform]
    RES_TRANSFORM --> LOG[Logging & Metrics]
    LOG --> RES[Response to Client]

    style REQ fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style CORS fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style AUTH fill:#e94560,stroke:#fff,color:#fff
    style AUTHZ fill:#e94560,stroke:#fff,color:#fff
    style CACHE fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style ROUTE fill:#2c3e50,stroke:#fff,color:#fff
    style LB fill:#2c3e50,stroke:#fff,color:#fff
    style HEALTH fill:#2c3e50,stroke:#fff,color:#fff
    style PROXY fill:#2c3e50,stroke:#fff,color:#fff
    style RES_TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LOG fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RES fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50

Figure 3: Request Pipeline — middleware order inside the API Gateway

⚠ Middleware order matters

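Rate limiting MUST come before authentication. If the order is reversed, attackers can flood the gateway with forged tokens, and the gateway pays for a signature check on every one of them before any limit applies. Placing rate limiting first rejects abuse as early as possible, before any CPU is spent on cryptographic work. The trade-off is that a limiter placed before authentication cannot see the user's identity, so it has to partition by IP address, API key, or similar; per-user limits need an authenticated principal and therefore run after authentication, typically as a second tier.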
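The cost difference is easy to see in a toy model. The sketch composes two hypothetical middleware delegates in both orders and counts how many token validations a flood of forged-token requests triggers; the `validations` counter stands in for the signature check, and the limiter is a plain permit counter rather than a real rate limiter.

```csharp
using System;

// Runs `requests` requests carrying a forged token through a two-step pipeline and
// returns how many (simulated) token validations were performed.
int CountValidations(bool rateLimitFirst, int requests, int permitLimit)
{
    int validations = 0;
    int permitsUsed = 0;

    // A handler maps a token to an HTTP status; a middleware wraps the next handler.
    Func<Func<string, int>, Func<string, int>> rateLimit = next => token =>
        permitsUsed++ < permitLimit ? next(token) : 429;

    Func<Func<string, int>, Func<string, int>> authenticate = next => token =>
    {
        validations++;                           // stands in for JWT signature verification
        return token == "valid-token" ? next(token) : 401;
    };

    Func<string, int> backend = _ => 200;

    // The outermost middleware runs first.
    Func<string, int> pipeline = rateLimitFirst
        ? rateLimit(authenticate(backend))
        : authenticate(rateLimit(backend));

    for (int i = 0; i < requests; i++) pipeline("forged-token");
    return validations;
}

Console.WriteLine($"rate limit first: {CountValidations(true, 1000, 100)} validations");
Console.WriteLine($"auth first:       {CountValidations(false, 1000, 100)} validations");
```

With the limiter in front, only the 100 permitted requests reach the expensive step; with authentication in front, all 1,000 do.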

2.3. Gateway Aggregation Pattern

One of the most powerful API Gateway patterns is request aggregation — combining multiple backend calls into a single response for the client. Instead of a mobile app making 3 separate API calls (user profile, orders, recommendations), the gateway fans them out in parallel, merges the results, and returns a single response.

// Aggregation example: client calls GET /api/dashboard
// Internally, the gateway calls in parallel:
//   GET /users/123/profile
//   GET /orders?userId=123&limit=5
//   GET /recommendations?userId=123
// Merged into 1 response { profile, recentOrders, recommendations }

This pattern is especially useful for mobile apps (fewer HTTP round-trips on slow networks) and dashboard pages (reduces waterfall loading).
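The fan-out and merge can be sketched with `Task.WhenAll`. The three service calls are hypothetical stubs that simulate latency with `Task.Delay`; in a gateway they would be `HttpClient` calls to the User, Order, and Product services.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical downstream calls; Task.Delay stands in for network latency.
async Task<string> GetProfileAsync(int userId)
{
    await Task.Delay(100);
    return $"user-{userId}";
}

async Task<int[]> GetRecentOrdersAsync(int userId, int limit)
{
    await Task.Delay(150);
    return Enumerable.Range(1, limit).ToArray();   // fake order IDs
}

async Task<string[]> GetRecommendationsAsync(int userId)
{
    await Task.Delay(120);
    return new[] { "p-7", "p-9" };
}

// GET /api/dashboard: fan out in parallel, then merge into one payload.
async Task<Dictionary<string, object>> GetDashboardAsync(int userId)
{
    // Start all calls before awaiting any of them so they overlap.
    var profileTask = GetProfileAsync(userId);
    var ordersTask = GetRecentOrdersAsync(userId, limit: 5);
    var recommendationsTask = GetRecommendationsAsync(userId);

    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return new Dictionary<string, object>
    {
        ["profile"] = await profileTask,
        ["recentOrders"] = await ordersTask,
        ["recommendations"] = await recommendationsTask,
    };
}

var stopwatch = Stopwatch.StartNew();
var dashboard = await GetDashboardAsync(123);
Console.WriteLine($"{dashboard.Count} sections in {stopwatch.ElapsedMilliseconds} ms");
```

Because the calls overlap, the total time is roughly that of the slowest call (about 150 ms here) rather than the sum (about 370 ms). A production version would also need a per-call timeout and a policy for partial failures, such as returning the dashboard without recommendations.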

3. YARP — An API Gateway on .NET 10

YARP (Yet Another Reverse Proxy) is Microsoft's open-source library designed as a fully programmable reverse proxy on ASP.NET Core. Unlike Kong or AWS API Gateway — which are feature-rich products — YARP is a toolkit you use to build a gateway matching your exact requirements.

At a glance: 200M+ NuGet downloads · <1ms proxy overhead · 100% ASP.NET Core pipeline · .NET 10 LTS support

3.1. Basic YARP Configuration

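YARP can be driven entirely by configuration: routes and clusters are declared in appsettings.json (or supplied from code), so basic routing needs no hand-written logic. Each route maps a URL pattern to a cluster, a group of backend instances (destinations).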

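// Program.cs: minimal API with YARP
// Packages: Yarp.ReverseProxy, Microsoft.AspNetCore.Authentication.JwtBearer
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Add YARP services; routes and clusters are read from the "ReverseProxy" section
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Add Rate Limiting: routes opt in with "RateLimiterPolicy": "api-limit"
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("api-limit", opt =>
    {
        opt.PermitLimit = 100;                   // 100 requests...
        opt.Window = TimeSpan.FromMinutes(1);    // ...per one-minute window
        opt.QueueLimit = 10;                     // up to 10 excess requests wait for the next window
    });
});

// Add Authentication
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer(options =>
    {
        options.Authority = "https://auth.example.com"; // signing keys come from its discovery document
        options.MapInboundClaims = false;               // keep raw JWT claim names such as "sub"
        options.TokenValidationParameters = new()
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidAudience = "my-api"
        };
    });

// "AuthorizationPolicy": "default" on a route means the default policy (authenticated user)
builder.Services.AddAuthorization();

var app = builder.Build();

// Order: rate limiting, then authentication/authorization, then proxying
app.UseRateLimiter();
app.UseAuthentication();
app.UseAuthorization();
app.MapReverseProxy();

app.Run();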

// appsettings.json — YARP routing configuration
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "AuthorizationPolicy": "default",
        "RateLimiterPolicy": "api-limit",
        "Match": {
          "Path": "/api/orders/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/orders" }
        ]
      },
      "users-route": {
        "ClusterId": "users-cluster",
        "Match": {
          "Path": "/api/users/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/users" },
          { "RequestHeader": "X-Forwarded-Prefix", "Set": "/api/users" }
        ]
      }
    },
    "Clusters": {
      "orders-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:30",
            "Timeout": "00:00:10",
            "Path": "/health"
          },
          "Passive": {
            "Enabled": true
          }
        },
        "Destinations": {
          "orders-1": { "Address": "https://orders-1:5001" },
          "orders-2": { "Address": "https://orders-2:5002" }
        }
      },
      "users-cluster": {
        "Destinations": {
          "users-1": { "Address": "https://users:5003" }
        }
      }
    }
  }
}

3.2. Custom Middleware in the YARP Pipeline

YARP's biggest advantage over other gateways: full control over the ASP.NET Core middleware pipeline. You can write custom middleware and insert it anywhere in the pipeline.

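// Custom proxy pipeline: API key validation for partner routes.
// This call replaces the plain app.MapReverseProxy() above. Once you supply a
// custom pipeline, YARP's built-in steps must be added back explicitly.
app.MapReverseProxy(proxyPipeline =>
{
    proxyPipeline.Use(async (context, next) =>
    {
        var route = context.GetReverseProxyFeature().Route;

        if (route.Config.RouteId.StartsWith("partner-"))
        {
            // ValidateApiKey is an application-defined helper, not part of YARP
            if (!context.Request.Headers.TryGetValue("X-API-Key", out var apiKey)
                || !await ValidateApiKey(apiKey.ToString()))
            {
                context.Response.StatusCode = StatusCodes.Status401Unauthorized;
                await context.Response.WriteAsJsonAsync(
                    new { error = "Invalid API key" });
                return;   // short-circuit: the request is never forwarded
            }
        }

        await next();
    });

    // Built-in YARP steps that the default pipeline would otherwise include
    proxyPipeline.UseSessionAffinity();
    proxyPipeline.UseLoadBalancing();
    proxyPipeline.UsePassiveHealthChecks();
});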

3.3. Health Checks — Active and Passive

YARP supports two kinds of health checks so traffic flows only to healthy instances:

Active Health Check

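The gateway proactively calls each destination's health endpoint (/health in the config above) on a fixed interval (15 seconds by default in YARP; the example sets 30 seconds). When probes fail, because the destination doesn't answer within the timeout or returns a non-success status code, it's marked unhealthy and removed from the load balancer rotation; with YARP's default ConsecutiveFailures policy this happens after a configurable number of consecutive failed probes. Once probes succeed again, the destination is automatically added back.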
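The active check boils down to a per-destination counter of consecutive failed probes. This is a simplified sketch of that idea, not YARP's code: probe results are passed in directly instead of coming from real HTTP calls, and the threshold of 2 is an illustrative value.

```csharp
using System;
using System.Collections.Generic;

const int Threshold = 2;                         // consecutive failed probes before ejection
var consecutiveFailures = new Dictionary<string, int>();
var healthy = new Dictionary<string, bool>();

// Records one probe result (success = /health answered 2xx within the timeout).
void RecordProbe(string destination, bool success)
{
    if (success)
    {
        consecutiveFailures[destination] = 0;
        healthy[destination] = true;             // a recovered destination rejoins the rotation
        return;
    }
    consecutiveFailures[destination] = consecutiveFailures.GetValueOrDefault(destination) + 1;
    if (consecutiveFailures[destination] >= Threshold)
        healthy[destination] = false;            // taken out of load balancer rotation
}

// Destinations the load balancer may pick from (never-probed ones count as healthy).
List<string> Available(IEnumerable<string> all)
{
    var result = new List<string>();
    foreach (var d in all)
        if (healthy.GetValueOrDefault(d, true)) result.Add(d);
    return result;
}

var destinations = new[] { "orders-1", "orders-2" };
// Three probe rounds: orders-2 fails twice in a row, then recovers.
foreach (var orders2Ok in new[] { false, false, true })
{
    RecordProbe("orders-1", true);
    RecordProbe("orders-2", orders2Ok);
    Console.WriteLine(string.Join(", ", Available(destinations)));
}
```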

Passive Health Check

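Detects failures at request time, from real traffic. YARP's default passive policy (TransportFailureRate) tracks, per destination, the share of proxied requests that fail at the transport level (connection errors, timeouts) over a sliding time window. When that share crosses a threshold, the destination is temporarily removed from the pool and reactivated after a configurable period. This works like a circuit breaker and needs no extra health endpoint calls; passive checks must be enabled per cluster ("Passive": { "Enabled": true }) for UsePassiveHealthChecks() to have any effect.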
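A failure-rate check over a sliding time window captures the passive approach. The window, minimum sample count, threshold, and reactivation period below are illustrative assumptions rather than YARP's defaults, and time is passed in explicitly so the behavior is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var window = TimeSpan.FromSeconds(60);           // only outcomes within this window count
const int MinSamples = 10;                       // don't judge a destination on too little data
const double MaxFailureRate = 0.3;               // above this, eject the destination
var reactivation = TimeSpan.FromSeconds(30);     // how long an ejected destination sits out

var outcomes = new Queue<(DateTimeOffset At, bool Failed)>();
DateTimeOffset? ejectedUntil = null;

bool IsAvailable(DateTimeOffset now) => ejectedUntil is null || now >= ejectedUntil.Value;

// Called after every proxied request to this destination.
void Record(DateTimeOffset now, bool failed)
{
    outcomes.Enqueue((now, failed));
    // Drop outcomes that have slid out of the window.
    while (outcomes.Count > 0 && now - outcomes.Peek().At > window) outcomes.Dequeue();

    int total = outcomes.Count;
    int failedCount = outcomes.Count(o => o.Failed);
    if (total >= MinSamples && (double)failedCount / total > MaxFailureRate)
    {
        ejectedUntil = now + reactivation;
        outcomes.Clear();                        // start with a clean slate after reactivation
    }
}

var t = DateTimeOffset.UnixEpoch;
// 10 requests in 10 seconds; requests 0, 3, 6 and 9 fail (40% > 30%).
for (int i = 0; i < 10; i++) Record(t.AddSeconds(i), failed: i % 3 == 0);
Console.WriteLine($"available at t+10s: {IsAvailable(t.AddSeconds(10))}");
Console.WriteLine($"available at t+40s: {IsAvailable(t.AddSeconds(40))}");
```

The minimum sample count matters: without it, one failed request out of two would already count as a 50% failure rate and eject the destination.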

4. Popular API Gateways Compared — 2026

Criterion	YARP (.NET)	Kong (OSS)	AWS API Gateway	Envoy
Deployment model	Library (in-process)	Standalone / K8s	Managed service	Sidecar / Standalone
Language	C# / .NET	Lua on Nginx (OpenResty)	N/A (managed)	C++ / WASM filters
Performance overhead	<1ms (in-process)	1-5ms (proxy hop)	5-29ms (managed)	~1ms (sidecar)
Customization	Full middleware pipeline	Plugin system (Lua/Go)	Lambda authorizers	WASM / Lua filters
Rate Limiting	Native .NET (since .NET 7)	Built-in plugin	Built-in (throttling)	Local/Global filters
Service Discovery	Config / Custom provider	DNS / Consul / K8s	VPC Link / ALB	xDS API (Istio)
Health Check	Active + Passive	Active + Passive (circuit breaker)	Managed	Active + Passive + EDS
Cost	Free (OSS)	Free (OSS) / Enterprise	Per request (REST APIs from $3.50/million)	Free (OSS)
Best fit	.NET team, full control	Multi-language, plugins	AWS-native, serverless	K8s / Service Mesh

💡 Which gateway to choose?

.NET team wanting full control: YARP. It is a library inside your ASP.NET Core app, so there is no separate proxy product to operate, and you can customize it freely with C# middleware.
Multi-language team with plugin needs: Kong, with a rich plugin ecosystem and a strong Admin API.
All-in on AWS and wanting zero ops: AWS API Gateway, which is managed, auto-scales, and integrates with Lambda.
Kubernetes / service mesh: Envoy, the standard data plane for Istio, extensible with WASM.

5. Authentication at the Gateway Layer

One of the biggest benefits of an API Gateway is centralizing authentication. Instead of every service validating JWTs itself, the gateway validates once and forwards claims (user ID, roles) to downstream services via headers.

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant IDP as Identity Provider
    participant SVC as Backend Service

    GW->>IDP: Fetch signing keys (JWKS) on first use and after key rotation
    IDP-->>GW: Public keys (cached by the gateway)
    C->>GW: Request + Bearer Token
    GW->>GW: Verify signature, expiry, issuer, audience locally
    GW->>GW: Check Authorization Policy
    GW->>SVC: Forward Request + X-User-Id + X-User-Roles
    SVC-->>GW: Response
    GW-->>C: Response

    Note over GW: Cached JWKS keys mean the IDP<br/>is not called on every request

Figure 4: Authentication flow — validate the token at the gateway, forward claims downstream

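// YARP: forward validated claims to the backend as headers.
// Config transforms such as { "RequestHeader": ..., "Set": ... } only set literal
// values, so claim values need a code-based transform. Extends the
// AddReverseProxy() registration in Program.cs (using Yarp.ReverseProxy.Transforms;).
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .AddTransforms(builderContext =>
    {
        // Only routes with an authorization policy carry identity headers
        if (string.IsNullOrEmpty(builderContext.Route.AuthorizationPolicy)) return;

        builderContext.AddRequestTransform(transformContext =>
        {
            var user = transformContext.HttpContext.User;
            var headers = transformContext.ProxyRequest.Headers;

            // Never forward the bearer token or client-supplied identity headers
            foreach (var name in new[] { "Authorization", "X-User-Id", "X-User-Email", "X-User-Roles" })
                headers.Remove(name);

            // Raw JWT claim names (assumes JwtBearerOptions.MapInboundClaims = false)
            if (user.FindFirst("sub")?.Value is string sub)
                headers.TryAddWithoutValidation("X-User-Id", sub);
            if (user.FindFirst("email")?.Value is string email)
                headers.TryAddWithoutValidation("X-User-Email", email);
            var roles = user.FindAll("role").Select(c => c.Value).ToArray();
            if (roles.Length > 0)
                headers.TryAddWithoutValidation("X-User-Roles", string.Join(",", roles));

            return ValueTask.CompletedTask;
        });
    });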

⚠ Always strip the Authorization header when forwarding

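After the gateway validates the token, strip the Authorization header before forwarding the request. Reasons: (1) the backend doesn't need to validate the token again, (2) you avoid token leakage if the backend logs request headers, and (3) it reduces the attack surface: if a backend is compromised, attackers can't harvest bearer tokens from incoming requests. The flip side is that backends now trust the X-User-* headers, so they must be reachable only through the gateway (private network, network policies, or mTLS), and the gateway must overwrite any X-User-* headers sent by clients.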
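The forwarding rules reduce to a small header transform. This sketch works on plain dictionaries rather than YARP's transform context and uses the `X-User-*` header names from this section; the key detail is that any client-supplied `X-User-*` header is dropped before trusted values are set, so a caller cannot forge an identity.

```csharp
using System;
using System.Collections.Generic;

// Builds the headers sent to the backend from the incoming request headers and the
// claims the gateway has already validated (claim name -> values).
Dictionary<string, string> ForwardHeaders(
    IReadOnlyDictionary<string, string> incoming,
    IReadOnlyDictionary<string, string[]> claims)
{
    var outgoing = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var (name, value) in incoming)
    {
        // Drop the bearer token and anything that looks like a trusted identity header.
        if (name.Equals("Authorization", StringComparison.OrdinalIgnoreCase)) continue;
        if (name.StartsWith("X-User-", StringComparison.OrdinalIgnoreCase)) continue;
        outgoing[name] = value;
    }

    // Identity headers come only from validated claims.
    if (claims.TryGetValue("sub", out var sub) && sub.Length > 0)
        outgoing["X-User-Id"] = sub[0];
    if (claims.TryGetValue("email", out var email) && email.Length > 0)
        outgoing["X-User-Email"] = email[0];
    if (claims.TryGetValue("role", out var roles) && roles.Length > 0)
        outgoing["X-User-Roles"] = string.Join(",", roles);
    return outgoing;
}

var forwarded = ForwardHeaders(
    new Dictionary<string, string>
    {
        ["Authorization"] = "Bearer abc.def.ghi",
        ["X-User-Id"] = "admin",                 // forged by the client
        ["Accept"] = "application/json",
    },
    new Dictionary<string, string[]>
    {
        ["sub"] = new[] { "user-42" },
        ["role"] = new[] { "customer", "beta" },
    });

foreach (var (k, v) in forwarded) Console.WriteLine($"{k}: {v}");
```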

6. Rate Limiting — Protecting the Backend

6.1. Rate Limiting Algorithms

Algorithm	How it works	Pros	Cons
Fixed Window	Count requests in a fixed window (e.g., 100 req/min)	Simple, low memory	Bursts at window boundaries: up to 2× the limit in a short span (e.g., 200 requests in 2 seconds with 100 req/min)
Sliding Window	Continuously sliding window, weighted count	Smoother than fixed window	More complex, needs timestamp storage
Token Bucket	Tokens refill steadily; each request consumes one token	Allows short bursts, smooth overall	Requires tuning bucket size + refill rate
Concurrency Limiter	Limits concurrent (parallel) requests	Protects against slow-request attacks	Doesn't cap overall throughput
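The boundary problem of the fixed window, and how a sliding window dampens it, shows up clearly with simulated timestamps. Both limiters are simplified single-key sketches (no locking, time in seconds); the sliding variant uses the weighted-previous-window approximation, which is one common way to implement a sliding window, not the only one.

```csharp
using System;

// Fixed window: counts requests per aligned window [k*w, (k+1)*w).
Func<double, bool> FixedWindow(int limit, double windowSec)
{
    long currentWindow = -1;
    int count = 0;
    return now =>
    {
        long w = (long)Math.Floor(now / windowSec);
        if (w != currentWindow) { currentWindow = w; count = 0; }   // new window: reset
        if (count >= limit) return false;
        count++;
        return true;
    };
}

// Sliding window counter: the previous window's count is weighted by how much
// of it still overlaps the last `windowSec` seconds.
Func<double, bool> SlidingWindow(int limit, double windowSec)
{
    long currentWindow = -1;
    int current = 0, previous = 0;
    return now =>
    {
        long w = (long)Math.Floor(now / windowSec);
        if (w != currentWindow)
        {
            // Carry the count over only if the old window is the one right before this one.
            previous = w == currentWindow + 1 ? current : 0;
            current = 0;
            currentWindow = w;
        }
        double elapsed = (now - w * windowSec) / windowSec;      // fraction of current window elapsed
        double estimate = previous * (1 - elapsed) + current;    // approx. requests in last windowSec
        if (estimate >= limit) return false;
        current++;
        return true;
    };
}

// 100 requests at t = 59 s and 100 more at t = 61 s against a 100 req/min limit.
int Accepted(Func<double, bool> limiter)
{
    int ok = 0;
    for (int i = 0; i < 100; i++) if (limiter(59.0)) ok++;
    for (int i = 0; i < 100; i++) if (limiter(61.0)) ok++;
    return ok;
}

Console.WriteLine($"fixed window:   {Accepted(FixedWindow(100, 60))} accepted");
Console.WriteLine($"sliding window: {Accepted(SlidingWindow(100, 60))} accepted");
```

The fixed window accepts all 200 requests within two seconds, while the sliding window admits only two of the second batch because it still counts most of the previous window.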

// .NET 10: multi-tier rate limiting in YARP
// Requires: using System.Threading.RateLimiting; using Microsoft.AspNetCore.RateLimiting;
builder.Services.AddRateLimiter(options =>
{
    // Global: 1,000 req/min for the whole gateway
    options.GlobalLimiter = PartitionedRateLimiter
        .Create<HttpContext, string>(context =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: "global",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 1000,
                    Window = TimeSpan.FromMinutes(1)
                }));

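    // Per-user: 100 req/min by user ID. context.User is only populated if
    // UseAuthentication() runs before UseRateLimiter(); otherwise every caller
    // falls back to the IP-based partition below.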
    options.AddPolicy("per-user", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: context.User?.FindFirst("sub")?.Value
                          ?? context.Connection.RemoteIpAddress?.ToString()
                          ?? "anonymous",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                QueueLimit = 10
            }));

    // Strict: for sensitive endpoints (login, password reset)
    options.AddFixedWindowLimiter("strict", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(15);
    });

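    options.OnRejected = async (context, ct) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Use the limiter's own Retry-After hint when it provides one
        // (e.g. up to 15 minutes for "strict") instead of a hard-coded value.
        var retryAfter = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var delay)
            ? (int)Math.Ceiling(delay.TotalSeconds)
            : 60;
        context.HttpContext.Response.Headers.RetryAfter = retryAfter.ToString();

        await context.HttpContext.Response.WriteAsJsonAsync(
            new { error = $"Too many requests. Retry after {retryAfter} seconds." }, ct);
    };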
});
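For intuition about `TokenLimit`, `TokensPerPeriod`, and `ReplenishmentPeriod`, here is a minimal token bucket with periodic replenishment. It is a simplified model, not `TokenBucketRateLimiter` itself (which adds queuing and timer-driven replenishment). To make the refill visible, it keeps the same 100 requests/minute average as the per-user policy but refills 10 tokens every 6 seconds instead of 100 every minute.

```csharp
using System;

// Token bucket: starts full, adds `tokensPerPeriod` at each `periodSec` boundary,
// never holds more than `capacity`; each request spends one token.
Func<double, bool> TokenBucket(int capacity, int tokensPerPeriod, double periodSec)
{
    double tokens = capacity;
    long lastPeriod = 0;
    return now =>
    {
        long period = (long)Math.Floor(now / periodSec);
        if (period > lastPeriod)
        {
            tokens = Math.Min(capacity, tokens + (period - lastPeriod) * tokensPerPeriod);
            lastPeriod = period;
        }
        if (tokens < 1) return false;
        tokens -= 1;
        return true;
    };
}

// Same 100 requests/minute average as the per-user policy, refilled in smaller steps.
var bucket = TokenBucket(capacity: 100, tokensPerPeriod: 10, periodSec: 6);

int burst = 0;
for (int i = 0; i < 150; i++) if (bucket(1.0)) burst++;        // 150 requests at t = 1 s
int nextStep = 0;
for (int i = 0; i < 150; i++) if (bucket(7.0)) nextStep++;     // 150 more at t = 7 s
Console.WriteLine($"accepted at 1 s: {burst}, at 7 s: {nextStep}");
```

The full bucket absorbs a burst of 100; after that, throughput is capped by the refill rate, so only 10 more requests get through at the next 6-second boundary.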

7. Load Balancing and Circuit Breaker

7.1. Load Balancing Strategies in YARP

YARP supports multiple load balancing algorithms, configured per cluster with LoadBalancingPolicy. If none is set, YARP uses PowerOfTwoChoices.

Policy	Description	Use case
RoundRobin	Rotate through each destination	Instances with uniform capacity
Random	Pick a destination at random	Simple, stateless
LeastRequests	Pick the destination with the fewest active requests	Uneven instances, varying request durations
PowerOfTwoChoices	Randomly pick 2, choose the one with fewer requests	Good balance between randomness and load awareness
FirstAlphabetical	Always pick the alphabetically first destination	Primary-secondary failover
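Round robin and power-of-two-choices are short enough to sketch directly. This is illustrative code, not YARP's implementation: the in-flight request counts are a plain array maintained by the caller, and the `Random` is seeded so runs are repeatable.

```csharp
using System;

// Round robin: cycle through destinations in order.
// (Not thread-safe; a real implementation would use Interlocked.)
int rrIndex = -1;
int NextRoundRobin(int destinationCount)
{
    rrIndex = (rrIndex + 1) % destinationCount;
    return rrIndex;
}

// Power of two choices: sample two different destinations at random and pick
// the one with fewer in-flight requests.
var rng = new Random(42);
int PowerOfTwoChoices(int[] activeRequests)
{
    int n = activeRequests.Length;
    if (n == 1) return 0;
    int a = rng.Next(n);
    int b = rng.Next(n - 1);
    if (b >= a) b++;                             // guarantees b != a
    return activeRequests[a] <= activeRequests[b] ? a : b;
}

for (int i = 0; i < 4; i++) Console.Write($"{NextRoundRobin(3)} ");
Console.WriteLine();

// Destination 1 is overloaded; count how often each destination is picked.
int[] load = { 2, 50, 3 };
var picks = new int[3];
for (int i = 0; i < 1000; i++) picks[PowerOfTwoChoices(load)]++;
Console.WriteLine(string.Join(", ", picks));
```

Here the overloaded destination is never picked, because it loses every comparison it appears in, yet power-of-two-choices only looks at two destinations per request instead of scanning all of them as LeastRequests does.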

7.2. Circuit Breaker at the Gateway

When a backend service fails continuously, the gateway shouldn't keep sending it requests — that causes cascading failure. The Circuit Breaker pattern solves this:

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure threshold reached
    Open --> HalfOpen : Timeout expires
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails

    state Closed {
        [*] --> Monitoring
        Monitoring --> Monitoring : Request OK (reset counter)
        Monitoring --> CountFailure : Request Failed
        CountFailure --> Monitoring : Below threshold
    }

Figure 5: Circuit Breaker state machine — Closed → Open → Half-Open
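The state machine in Figure 5 can be sketched in a few lines. For simplicity it trips after a fixed number of consecutive failures and stays open for a fixed break duration, whereas Polly's breaker samples a failure ratio over a time window; Half-Open admits exactly one probe. States are plain strings, time is passed in explicitly, and all names are hypothetical.

```csharp
using System;

const int FailureThreshold = 3;                  // consecutive failures that trip the breaker
var breakDuration = TimeSpan.FromSeconds(15);    // how long the circuit stays open

string state = "Closed";
int consecutiveFailures = 0;
DateTimeOffset openedAt = default;
bool probeInFlight = false;

// Called before forwarding: may this request go to the backend?
bool AllowRequest(DateTimeOffset now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
    {
        state = "HalfOpen";                      // break duration expired: allow one probe
        probeInFlight = false;
    }
    if (state == "Closed") return true;
    if (state == "HalfOpen" && !probeInFlight)
    {
        probeInFlight = true;
        return true;
    }
    return false;                                // Open, or probe already sent: fail fast
}

// Called with the outcome of a request that AllowRequest let through.
void RecordResult(DateTimeOffset now, bool success)
{
    if (success)
    {
        state = "Closed";                        // normal success or successful probe
        consecutiveFailures = 0;
        return;
    }
    consecutiveFailures++;
    if (state == "HalfOpen" || consecutiveFailures >= FailureThreshold)
    {
        state = "Open";                          // trip, or re-open after a failed probe
        openedAt = now;
        consecutiveFailures = 0;
    }
}

var t = DateTimeOffset.UnixEpoch;
for (int i = 0; i < 3; i++)
    if (AllowRequest(t)) RecordResult(t, success: false);
Console.WriteLine($"{state}; request at t+5s allowed: {AllowRequest(t.AddSeconds(5))}");
Console.WriteLine($"probe at t+16s allowed: {AllowRequest(t.AddSeconds(16))}");
RecordResult(t.AddSeconds(16), success: true);
Console.WriteLine(state);
```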

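// Resilience for outbound HttpClient calls via Microsoft.Extensions.Http.Resilience (Polly v8).
// Requires: using Polly; using Microsoft.Extensions.Http.Resilience;
// Note: YARP forwards proxied requests through its own HttpMessageInvoker
// (IForwarderHttpClientFactory), not IHttpClientFactory, so this named client
// protects calls the gateway makes itself, such as BFF aggregation.
builder.Services.AddHttpClient("gateway-downstream")
    .AddResilienceHandler("gateway-resilience", pipeline =>
    {
        // Strategies added first wrap those added later.
        // 1. Total timeout: caps the whole call, retries included
        pipeline.AddTimeout(TimeSpan.FromSeconds(10));

        // 2. Retry transient failures (HttpRequestException, timeouts, 5xx/408/429)
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            Delay = TimeSpan.FromMilliseconds(500),
            BackoffType = DelayBackoffType.Exponential
        });

        // 3. Circuit breaker: opens when half or more of the calls in 30s fail (min. 10 calls)
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            SamplingDuration = TimeSpan.FromSeconds(30),
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(15)
        });

        // 4. Per-attempt timeout, so a hung attempt can be retried
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });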

8. Backend-for-Frontend (BFF) Pattern — Implementation

The BFF pattern provides a dedicated gateway per client type, allowing response format, aggregation logic, and caching strategy to be tailored to each platform's needs.

graph TB
    subgraph "Client Layer"
        WEB[Vue.js SPA]
        MOB[React Native App]
        IOT[IoT Dashboard]
    end

    subgraph "BFF Layer"
        WBFF[Web BFF<br/>Full response, SSR support]
        MBFF[Mobile BFF<br/>Compact response, pagination]
        IBFF[IoT BFF<br/>Minimal payload, batch writes]
    end

    subgraph "Domain Services"
        USER[User Service]
        ORDER[Order Service]
        PRODUCT[Product Service]
        ANALYTICS[Analytics Service]
    end

    WEB --> WBFF
    MOB --> MBFF
    IOT --> IBFF

    WBFF --> USER
    WBFF --> ORDER
    WBFF --> PRODUCT
    WBFF --> ANALYTICS

    MBFF --> USER
    MBFF --> ORDER
    MBFF --> PRODUCT

    IBFF --> ANALYTICS

    style WBFF fill:#e94560,stroke:#fff,color:#fff
    style MBFF fill:#e94560,stroke:#fff,color:#fff
    style IBFF fill:#e94560,stroke:#fff,color:#fff
    style USER fill:#2c3e50,stroke:#fff,color:#fff
    style ORDER fill:#2c3e50,stroke:#fff,color:#fff
    style PRODUCT fill:#2c3e50,stroke:#fff,color:#fff
    style ANALYTICS fill:#2c3e50,stroke:#fff,color:#fff

Figure 6: BFF Pattern — each client type gets its own gateway with a customized response shape

// Web BFF: Aggregation endpoint for the Dashboard page
app.MapGet("/bff/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    IProductService productService,
    HttpContext context) =>
{
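    // "sub" is remapped to ClaimTypes.NameIdentifier unless MapInboundClaims = false
    var userId = context.User.FindFirst("sub")?.Value
        ?? context.User.FindFirst(System.Security.Claims.ClaimTypes.NameIdentifier)?.Value;
    if (userId is null) return Results.Unauthorized();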

    // Call 3 services in parallel
    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 10);
    var recommendationsTask = productService
        .GetRecommendationsAsync(userId, limit: 8);

    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return Results.Ok(new
    {
        Profile = profileTask.Result,
        RecentOrders = ordersTask.Result,
        Recommendations = recommendationsTask.Result,
        ServerTime = DateTime.UtcNow
    });
}).RequireAuthorization();

// Mobile BFF: Compact response, only essential fields
app.MapGet("/bff/mobile/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    HttpContext context) =>
{
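    // "sub" is remapped to ClaimTypes.NameIdentifier unless MapInboundClaims = false
    var userId = context.User.FindFirst("sub")?.Value
        ?? context.User.FindFirst(System.Security.Claims.ClaimTypes.NameIdentifier)?.Value;
    if (userId is null) return Results.Unauthorized();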

    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 5);

    await Task.WhenAll(profileTask, ordersTask);

    var profile = profileTask.Result;
    return Results.Ok(new
    {
        DisplayName = profile.DisplayName,
        AvatarUrl = profile.AvatarUrl,
        OrderCount = ordersTask.Result.Count,
        LastOrderStatus = ordersTask.Result.FirstOrDefault()?.Status
    });
}).RequireAuthorization();

9. Observability — Monitoring the API Gateway

The API Gateway is the best place for observability since all traffic passes through it. Track the 4 key metrics (RED + Saturation):

Rate: requests per second (throughput)
Errors: error rate (4xx/5xx %)
Duration: latency percentiles (p50, p95, p99)
Saturation: connection pool, queue depth
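Percentiles come from the distribution of observed latencies rather than from an average. A minimal nearest-rank implementation is sketched below; the latency values are made up, and production systems usually derive percentiles from histograms (as OpenTelemetry and Prometheus do) instead of keeping every sample.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("no samples", nameof(samples));
    double[] sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);   // 1-based rank
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

// 100 made-up latencies in ms: 95 fast requests (20-29 ms) and a slow tail (496-500 ms).
double[] latencies = Enumerable.Range(1, 100)
    .Select(i => i <= 95 ? 20.0 + i % 10 : 400.0 + i)
    .ToArray();

foreach (var pct in new[] { 50.0, 95.0, 99.0 })
    Console.WriteLine($"p{pct}: {Percentile(latencies, pct)} ms");
Console.WriteLine($"mean: {latencies.Average():F1} ms");
```

In this sample the mean is about 48 ms, which hides the fact that some requests take close to half a second; p99 exposes it.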

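// OpenTelemetry integration for YARP
// Packages: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore,
// OpenTelemetry.Instrumentation.Http, OpenTelemetry.Exporter.OpenTelemetryProtocol
var otlpEndpoint = new Uri("http://otel-collector:4317");

builder.Services.AddOpenTelemetry()
    // Service name applies to both metrics and traces
    .ConfigureResource(resource => resource.AddService("api-gateway"))
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               // Only collects data if the YARP version in use publishes a Meter with this name
               .AddMeter("Yarp.ReverseProxy")
               .AddOtlpExporter(opt => opt.Endpoint = otlpEndpoint);
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddOtlpExporter(opt => opt.Endpoint = otlpEndpoint);   // same collector as metrics
    });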

// Custom middleware: add a correlation ID
app.Use(async (context, next) =>
{
    if (!context.Request.Headers.ContainsKey("X-Correlation-Id"))
    {
        context.Request.Headers["X-Correlation-Id"] =
            Guid.NewGuid().ToString("N");
    }
    context.Response.Headers["X-Correlation-Id"] =
        context.Request.Headers["X-Correlation-Id"];

    await next();
});

10. Best Practices for Production

💡 10 API Gateway principles for production

Stateless gateway: Don't store session state at the gateway — use JWTs or an external session store. Makes horizontal scaling easy.
Timeout cascade: Gateway timeout must exceed backend timeout. Example: backend 5s → gateway 8s → client 15s. Avoid the gateway timing out before the backend responds.
Only retry idempotent operations: Retry only idempotent methods (GET, HEAD, OPTIONS, PUT, DELETE). DO NOT retry POST or PATCH, since a retry can create duplicates. If you need POST retries, require the client to send an Idempotency-Key header so the backend can de-duplicate.
Rate limit before auth: Put rate limiting at the front of the pipeline to block abuse before spending CPU on JWT validation.
Separate gateway health checks: Split /health/live (gateway alive) and /health/ready (gateway + backends ready). Kubernetes uses different liveness and readiness probes.
Request/response size limits: Set a max body size (e.g., 10MB) to block payload abuse. Use streaming for file uploads instead of buffering the whole thing.
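Graceful shutdown: When restarting the gateway, stop taking new requests and drain active ones before exiting. The ASP.NET Core host waits for in-flight requests up to HostOptions.ShutdownTimeout, and app.Lifetime.ApplicationStopping lets you hook in for cleanup such as flushing telemetry.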
Configuration hot-reload: Routing changes shouldn't require a gateway restart. YARP supports IProxyConfigProvider to load config from DB, Consul, or etcd.
Avoid the gateway monolith: Keep business logic out of the gateway. The gateway only handles cross-cutting concerns — routing, auth, rate limiting, transforms. Business logic belongs in domain services.
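Canary routing: Use header-based or weight-based routing to roll out new versions gradually. YARP routes can match on request headers (Match.Headers in the route config), so a header such as X-Canary can send selected traffic to a separate cluster; weight-based splitting is not built in and needs custom code, such as a custom load balancing policy.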
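Two of these rules, the timeout cascade and idempotent-only retries, are mechanical enough to check in code, for example during configuration validation at startup. The sketch below is hypothetical; the idempotent method set follows the HTTP semantics specification (RFC 9110), and `Idempotency-Key` is the header named in the list.

```csharp
using System;
using System.Collections.Generic;

// Timeouts ordered from the innermost layer (backend) to the outermost (client).
// Each outer layer must wait strictly longer than the layer it calls.
bool TimeoutsCascade(params TimeSpan[] innerToOuter)
{
    for (int i = 1; i < innerToOuter.Length; i++)
        if (innerToOuter[i] <= innerToOuter[i - 1]) return false;
    return true;
}

// Idempotent methods per RFC 9110: sending them twice has the same effect as once.
var idempotentMethods = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"
};

// A failed request may be retried if its method is idempotent, or if the client sent
// an Idempotency-Key the backend can use to de-duplicate (the header lookup uses
// the dictionary's own key comparer).
bool CanRetry(string method, IReadOnlyDictionary<string, string> headers) =>
    idempotentMethods.Contains(method) || headers.ContainsKey("Idempotency-Key");

var noHeaders = new Dictionary<string, string>();
Console.WriteLine(TimeoutsCascade(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(8), TimeSpan.FromSeconds(15)));
Console.WriteLine(CanRetry("POST", noHeaders));
```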

11. Full Architecture — The Gateway in a Production Stack

graph TB
    CDN[CDN / Cloudflare] --> LB[External Load Balancer]
    LB --> GW1[API Gateway Instance 1]
    LB --> GW2[API Gateway Instance 2]

    subgraph "Gateway Responsibilities"
        GW1 --> |"Auth, Rate Limit,<br/>Route, Transform"| GW1
    end

    GW1 --> SD[Service Discovery<br/>Consul / K8s DNS]
    GW2 --> SD

    SD --> US1[User Service x3]
    SD --> OS1[Order Service x2]
    SD --> PS1[Product Service x4]

    GW1 --> OTEL[OpenTelemetry<br/>Collector]
    GW2 --> OTEL
    OTEL --> GRAF[Grafana / Dashboard]

    style CDN fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GW1 fill:#e94560,stroke:#fff,color:#fff
    style GW2 fill:#e94560,stroke:#fff,color:#fff
    style SD fill:#2c3e50,stroke:#fff,color:#fff
    style US1 fill:#2c3e50,stroke:#fff,color:#fff
    style OS1 fill:#2c3e50,stroke:#fff,color:#fff
    style PS1 fill:#2c3e50,stroke:#fff,color:#fff
    style OTEL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GRAF fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Figure 7: Production stack — CDN → Load Balancer → API Gateway → Service Discovery → Backend Services

Conclusion

The API Gateway is not an optional component in microservices — it's an essential infrastructure layer once a system exceeds 3-5 services. By centralizing routing, authentication, rate limiting, and observability in one place, the gateway lets teams focus on business logic instead of repeating cross-cutting concerns in every service.

For .NET teams, YARP stands out because it runs in-process on the ASP.NET Core pipeline — near-zero overhead and unlimited customization via C# middleware. For multi-language teams or those needing a ready-made plugin ecosystem, Kong remains the industry standard. If you're all-in on AWS and want zero ops, AWS API Gateway is a worthwhile managed choice.

Whatever you choose, remember the golden rule: the gateway handles only cross-cutting concerns. Business logic belongs to domain services.


# API Gateway 2026 — Central Gateway Architecture for Microservices with YARP, Kong, and the BFF Pattern

In microservices architecture, one of the biggest challenges is: **how many services should a client connect to?** When a system has 10, 50, or 200 services, letting clients call each service directly is both complex and a security/operations nightmare. The API Gateway is the answer — a single central gateway sitting between the client and the entire backend, handling routing, authentication, rate limiting, load balancing, and a long list of cross-cutting concerns.

## 1. What Is an API Gateway and Why Do Microservices Need One?

```
graph TB
    subgraph Clients
        WEB[Web App]
        MOB[Mobile App]
        EXT[3rd-party API]
    end

GW[API Gateway]

subgraph Backend Services
        US[User Service]
        OS[Order Service]
        PS[Product Service]
        NS[Notification Service]
    end

WEB --> GW
    MOB --> GW
    EXT --> GW
    GW --> US
    GW --> OS
    GW --> PS
    GW --> NS

style GW fill:#e94560,stroke:#fff,color:#fff
    style WEB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style MOB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style EXT fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style US fill:#2c3e50,stroke:#fff,color:#fff
    style OS fill:#2c3e50,stroke:#fff,color:#fff
    style PS fill:#2c3e50,stroke:#fff,color:#fff
    style NS fill:#2c3e50,stroke:#fff,color:#fff

```

Figure 1: API Gateway — the central point connecting clients to microservices

#### Why not call each service directly?

### 1.1. What an API Gateway Handles

#### Routing & Path Rewriting

Map public URLs to internal service endpoints. For example: `/api/orders/*` → Orders Service, `/api/users/*` → User Service. Supports path prefix stripping and query string forwarding.

#### Authentication & Authorization

Validate the JWT/OAuth2 token once at the gateway and forward claims to downstream services. Eliminates the need for every service to validate tokens.

#### Rate Limiting & Throttling

Protect the backend from abuse using fixed window, sliding window, or token bucket algorithms. Apply per IP, user, API key, or a specific route.

#### Load Balancing

Distribute traffic across multiple instances: Round Robin, Least Connections, Weighted, or Consistent Hashing. Combine with health checks to remove failing instances.

#### Response Caching

Cache responses at the gateway layer for read-heavy endpoints, reducing backend load. Supports cache invalidation via headers or TTL.

#### Observability & Logging

Trace every request from client to backend with a correlation ID. Export metrics (latency, error rate, throughput) to Prometheus/OpenTelemetry.

## 2. API Gateway Architecture — Design Patterns

### 2.1. Single Gateway vs Gateway per Client

There are two main approaches to designing an API Gateway:

```
graph LR
    subgraph "Pattern 1: Single Gateway"
        C1[All Clients] --> SG[Shared Gateway]
        SG --> S1[Service A]
        SG --> S2[Service B]
    end

    subgraph "Pattern 2: BFF — Gateway per Client"
        WA[Web App] --> WG[Web Gateway]
        MA[Mobile App] --> MG[Mobile Gateway]
        PA[Partner API] --> PG[Partner Gateway]
        WG --> S3[Service A]
        MG --> S3
        PG --> S3
        WG --> S4[Service B]
        MG --> S4
        PG --> S4
    end

    style SG fill:#e94560,stroke:#fff,color:#fff
    style WG fill:#e94560,stroke:#fff,color:#fff
    style MG fill:#e94560,stroke:#fff,color:#fff
    style PG fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style S4 fill:#2c3e50,stroke:#fff,color:#fff

```

Figure 2: Single Gateway vs Backend-for-Frontend (BFF) Pattern

**Single Gateway** is simple and easy to manage, suitable for small to mid-sized systems. But when Web needs a different response than Mobile (fewer fields, different format), the single gateway becomes a bottleneck — every mobile change affects web and vice versa.

The **BFF Pattern (Backend-for-Frontend)** solves this: each client type gets its own gateway, customized to its response shape, aggregation logic, and caching strategy. Netflix, Spotify, and Shopify all use BFFs in production.

### 2.2. Request Pipeline in an API Gateway

A request passes through multiple middleware layers in a fixed order. That order is critical — a wrong arrangement can introduce security holes or unwanted behavior.

```
graph TB
    REQ[Incoming Request] --> CORS[CORS Middleware]
    CORS --> RL[Rate Limiting]
    RL --> AUTH[Authentication]
    AUTH --> AUTHZ[Authorization]
    AUTHZ --> CACHE[Response Cache Check]
    CACHE --> TRANSFORM[Request Transform]
    TRANSFORM --> ROUTE[Route Matching]
    ROUTE --> LB[Load Balancer]
    LB --> HEALTH[Health Check Filter]
    HEALTH --> PROXY[Proxy to Backend]
    PROXY --> RES_TRANSFORM[Response Transform]
    RES_TRANSFORM --> LOG[Logging & Metrics]
    LOG --> RES[Response to Client]

    style REQ fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style CORS fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style AUTH fill:#e94560,stroke:#fff,color:#fff
    style AUTHZ fill:#e94560,stroke:#fff,color:#fff
    style CACHE fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style ROUTE fill:#2c3e50,stroke:#fff,color:#fff
    style LB fill:#2c3e50,stroke:#fff,color:#fff
    style HEALTH fill:#2c3e50,stroke:#fff,color:#fff
    style PROXY fill:#2c3e50,stroke:#fff,color:#fff
    style RES_TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LOG fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RES fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50

```

Figure 3: Request Pipeline — middleware order inside the API Gateway

#### ⚠ Middleware order matters

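**Rate Limiting MUST come before Authentication.** If reversed, attackers can spam requests with invalid tokens, and the gateway still burns resources validating JWTs before rate limiting kicks in. Placing rate limiting first blocks abuse as early as possible without wasting CPU on crypto operations. The trade-off: a limiter that runs before authentication cannot see the user's identity, so it can only partition by client IP or API key; per-user limits need to run after authentication.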

### 2.3. Gateway Aggregation Pattern

One of the most powerful API Gateway patterns is **request aggregation** — combining multiple backend calls into a single response for the client. Instead of a mobile app making 3 separate API calls (user profile, orders, recommendations), the gateway fans them out in parallel, merges the results, and returns a single response.

```
// Aggregation example: client calls GET /api/dashboard
// Internally, the gateway calls in parallel:
//   GET /users/123/profile
//   GET /orders?userId=123&limit=5
//   GET /recommendations?userId=123
// Merged into 1 response { profile, recentOrders, recommendations }
```
This pattern is especially useful for mobile apps (fewer HTTP round-trips on slow networks) and dashboard pages (reduces waterfall loading).
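The fan-out itself can be sketched in plain C# with `Task.WhenAll`. The three backend calls below are simulated with `Task.Delay` and return made-up data; the function names are hypothetical stand-ins for HTTP calls to the User, Order, and Recommendation services.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical backend calls, simulated with delays instead of real HTTP requests.
async Task<string> GetProfileAsync(int userId)
{
    await Task.Delay(100);
    return $"user-{userId}";
}

async Task<List<int>> GetRecentOrdersAsync(int userId, int limit)
{
    await Task.Delay(150);
    var orders = new List<int>();
    for (var i = 1; i <= limit; i++) orders.Add(userId * 1000 + i);
    return orders;
}

async Task<List<string>> GetRecommendationsAsync(int userId)
{
    await Task.Delay(120);
    return new List<string> { "sku-1", "sku-2" };
}

// Start all three calls at once, wait for all of them, then merge the results.
// Total latency is roughly the slowest call (~150 ms), not the sum (~370 ms).
async Task<(string Profile, List<int> RecentOrders, List<string> Recommendations)> GetDashboardAsync(int userId)
{
    var profileTask = GetProfileAsync(userId);
    var ordersTask = GetRecentOrdersAsync(userId, limit: 5);
    var recommendationsTask = GetRecommendationsAsync(userId);

    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return (await profileTask, await ordersTask, await recommendationsTask);
}

var dashboard = await GetDashboardAsync(123);
Console.WriteLine($"{dashboard.Profile}, {dashboard.RecentOrders.Count} orders, " +
                  $"{dashboard.Recommendations.Count} recommendations");
```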

## 3. YARP — An API Gateway on .NET 10

YARP (Yet Another Reverse Proxy) is Microsoft's open-source library designed as a fully programmable reverse proxy on ASP.NET Core. Unlike Kong or AWS API Gateway — which are feature-rich products — YARP is a *toolkit* you use to build a gateway matching your exact requirements.

- **200M+** NuGet downloads
- **<1ms** proxy overhead
- **100%** ASP.NET Core pipeline
- **.NET 10** LTS support

### 3.1. Basic YARP Configuration

YARP's routing can be defined entirely in configuration, so there is no routing logic to write by hand. Each route maps a URL pattern to a cluster (a group of backend instances).

```
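// Program.cs: Minimal API with YARP
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Add YARP services
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Add Rate Limiting (routes opt in via "RateLimiterPolicy": "api-limit")
builder.Services.AddRateLimiter(options =>
{
    // The middleware rejects with 503 by default; 429 is "Too Many Requests"
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.AddFixedWindowLimiter("api-limit", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 10;
    });
});

// Add Authentication
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer(options =>
    {
        options.Authority = "https://auth.example.com";
        // Keep JWT claim names such as "sub" instead of mapping them to URI claim types
        options.MapInboundClaims = false;
        options.TokenValidationParameters = new()
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidAudience = "my-api"
        };
    });

// Required by UseAuthorization() and the routes' "AuthorizationPolicy" setting
builder.Services.AddAuthorization();

var app = builder.Build();

app.UseRateLimiter();
app.UseAuthentication();
app.UseAuthorization();
app.MapReverseProxy();

app.Run();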
```

```
// appsettings.json — YARP routing configuration
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "AuthorizationPolicy": "default",
        "RateLimiterPolicy": "api-limit",
        "Match": {
          "Path": "/api/orders/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/orders" }
        ]
      },
      "users-route": {
        "ClusterId": "users-cluster",
        "Match": {
          "Path": "/api/users/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/users" },
          { "RequestHeader": "X-Forwarded-Prefix", "Set": "/api/users" }
        ]
      }
    },
    "Clusters": {
      "orders-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:30",
            "Timeout": "00:00:10",
            "Path": "/health"
          }
        },
        "Destinations": {
          "orders-1": { "Address": "https://orders-1:5001" },
          "orders-2": { "Address": "https://orders-2:5002" }
        }
      },
      "users-cluster": {
        "Destinations": {
          "users-1": { "Address": "https://users:5003" }
        }
      }
    }
  }
}
```

### 3.2. Custom Middleware in the YARP Pipeline

YARP's biggest advantage over other gateways: full control over the ASP.NET Core middleware pipeline. You can write custom middleware and insert it anywhere in the pipeline.

```
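// Custom middleware: API Key validation for partner routes
app.MapReverseProxy(proxyPipeline =>
{
    proxyPipeline.Use(async (context, next) =>
    {
        var route = context.GetReverseProxyFeature().Route;

        if (route.Config.RouteId.StartsWith("partner-", StringComparison.Ordinal))
        {
            // ValidateApiKey is your own lookup, assumed here as Task<bool> ValidateApiKey(string)
            if (!context.Request.Headers.TryGetValue("X-API-Key", out var apiKey)
                || !await ValidateApiKey(apiKey.ToString()))
            {
                context.Response.StatusCode = StatusCodes.Status401Unauthorized;
                await context.Response.WriteAsJsonAsync(
                    new { error = "Invalid API key" });
                return;
            }
        }

        await next();
    });

    // A custom pipeline replaces YARP's default one, so re-add the built-in steps
    proxyPipeline.UseSessionAffinity();
    proxyPipeline.UseLoadBalancing();
    // Passive health checks also need HealthCheck:Passive:Enabled on the cluster
    proxyPipeline.UsePassiveHealthChecks();
});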
```

### 3.3. Health Checks — Active and Passive

YARP supports two kinds of health checks so traffic flows only to healthy instances:

#### Active Health Check

The gateway proactively calls each destination's `/health` endpoint on a fixed interval (every 30 seconds in the configuration above). If a destination times out or returns a non-success status code, it's marked *unhealthy* (with YARP's default `ConsecutiveFailures` policy, after several failed probes in a row) and removed from the load balancer rotation. When probes succeed again, it's automatically added back.

#### Passive Health Check

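Detects failures *at request time*, from real traffic. With the default `TransportFailureRate` policy, YARP tracks the share of proxied requests that fail at the transport level (connection errors, timeouts) within a sliding time window; 5xx responses are not counted by default. If that failure rate crosses a threshold, the destination is removed from the pool until a reactivation period expires. It works like a circuit breaker, with no extra health endpoint calls required.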
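The rate-based idea can be sketched as follows. This is a simplified illustration, not YARP's implementation: the window, failure-rate limit, minimum request count, and reactivation period are assumed values, and outcomes are fed in manually with explicit timestamps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed policy values, for illustration only.
var detectionWindow = TimeSpan.FromSeconds(60);
const double FailureRateLimit = 0.3;   // eject above 30% failures...
const int MinRequests = 10;            // ...but only once there is enough traffic
var reactivationPeriod = TimeSpan.FromMinutes(2);

var samples = new Queue<(DateTime At, bool Failed)>();
DateTime? unhealthyUntil = null;

// The load balancer skips the destination while this returns false.
bool IsHealthy(DateTime now) => unhealthyUntil is null || now >= unhealthyUntil.Value;

// Record the outcome of each proxied request (e.g. failed = transport error).
void Record(bool failed, DateTime now)
{
    samples.Enqueue((now, failed));

    // Keep only outcomes inside the sliding detection window
    while (now - samples.Peek().At > detectionWindow)
        samples.Dequeue();

    var failureRate = samples.Count(s => s.Failed) / (double)samples.Count;
    if (samples.Count >= MinRequests && failureRate > FailureRateLimit)
    {
        unhealthyUntil = now + reactivationPeriod; // eject until the period expires
        samples.Clear();                           // start fresh when it comes back
    }
}

var t0 = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 7; i++) Record(failed: false, t0.AddSeconds(i));
for (var i = 7; i < 10; i++) Record(failed: true, t0.AddSeconds(i));
Console.WriteLine(IsHealthy(t0.AddSeconds(10)));  // True: 3 of 10 failed (30%, not above the limit)
Record(failed: true, t0.AddSeconds(11));
Console.WriteLine(IsHealthy(t0.AddSeconds(12)));  // False: 4 of 11 failed (about 36%)
Console.WriteLine(IsHealthy(t0.AddSeconds(131))); // True: reactivation period has passed
```

The minimum request count matters: without it, a single failed request on a quiet destination would already be a 100% failure rate.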

## 4. Popular API Gateways Compared — 2026

| Criterion | YARP (.NET) | Kong (OSS) | AWS API Gateway | Envoy |
| --- | --- | --- | --- | --- |
| **Deployment model** | Library (in-process) | Standalone / K8s | Managed service | Sidecar / Standalone |
| **Language** | C# / .NET | Lua + Nginx/Kong | N/A (managed) | C++ / WASM filters |
| **Performance overhead** | <1ms (in-process) | 1-5ms (proxy hop) | 5-29ms (managed) | ~1ms (sidecar) |
| **Customization** | Full middleware pipeline | Plugin system (Lua/Go) | Lambda authorizers | WASM / Lua filters |
| **Rate Limiting** | Native .NET (since .NET 7) | Built-in plugin | Built-in (throttling) | Local/Global filters |
| **Service Discovery** | Config / Custom provider | DNS / Consul / K8s | VPC Link / ALB | xDS API (Istio) |
| **Health Check** | Active + Passive | Active (upstream) | Managed | Active + Passive + EDS |
| **Cost** | Free (OSS) | Free (OSS) / Enterprise | $3.50/million requests | Free (OSS) |
| **Best fit** | .NET team, full control | Multi-language, plugins | AWS-native, serverless | K8s / Service Mesh |

#### 💡 Which gateway to choose?

**.NET team, want full control:** YARP — runs in-process, zero network hop, customize freely with C# middleware. **Multi-language team with plugin needs:** Kong — rich plugin ecosystem, strong admin API. **Full AWS, want zero ops:** AWS API Gateway — managed, auto-scale, Lambda integration. **Kubernetes/Service Mesh:** Envoy — the standard data plane for Istio with WASM extensibility.

## 5. Authentication at the Gateway Layer

```
sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant IDP as Identity Provider
    participant SVC as Backend Service

    GW->>IDP: Fetch JWKS signing keys (first request / key rotation)
    IDP-->>GW: Public keys
    C->>GW: Request + Bearer Token
    GW->>GW: Validate JWT signature + claims locally
    GW->>GW: Check Authorization Policy
    GW->>SVC: Forward Request + X-User-Id + X-User-Roles
    SVC-->>GW: Response
    GW-->>C: Response

    Note over GW: Gateway caches JWKS keys<br/>to reduce round-trips to the IDP

```

Figure 4: Authentication flow — validate the token at the gateway, forward claims downstream

```
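// YARP: forward validated user claims to the backend as request headers.
// The JSON "RequestHeader" transform only sets literal values, so claims are
// copied with a code-based transform. Claim names ("sub", "email", "role")
// depend on your identity provider and assume MapInboundClaims = false.
using Yarp.ReverseProxy.Transforms;

builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .AddTransforms(builderContext =>
    {
        // Only routes protected by an authorization policy get identity headers
        if (string.IsNullOrEmpty(builderContext.Route.AuthorizationPolicy))
            return;

        builderContext.AddRequestTransform(transformContext =>
        {
            var user = transformContext.HttpContext.User;
            var headers = transformContext.ProxyRequest.Headers;

            // Drop client-supplied identity headers (spoofing) and the bearer token
            foreach (var name in new[] { "X-User-Id", "X-User-Email", "X-User-Roles", "Authorization" })
                headers.Remove(name);

            headers.TryAddWithoutValidation("X-User-Id", user.FindFirst("sub")?.Value ?? "");
            headers.TryAddWithoutValidation("X-User-Email", user.FindFirst("email")?.Value ?? "");
            headers.TryAddWithoutValidation("X-User-Roles",
                string.Join(",", user.FindAll("role").Select(c => c.Value)));
            return ValueTask.CompletedTask;
        });
    });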
```

#### ⚠ Always strip the Authorization header when forwarding

After the gateway validates the token, strip the `Authorization` header before forwarding to the backend. Reasons: (1) the backend doesn't need to validate again, (2) you avoid token leakage if the backend logs request headers, (3) it reduces attack surface — if the backend is compromised, attackers can't extract bearer tokens from requests.

## 6. Rate Limiting — Protecting the Backend

### 6.1. Rate Limiting Algorithms

| Algorithm | How it works | Pros | Cons |
| --- | --- | --- | --- |
| **Fixed Window** | Count requests in a fixed window (e.g., 100 req/min) | Simple, low memory | Bursts at window boundaries: up to 2× the limit (e.g., 200 requests within ~2 seconds around a boundary) |
| **Sliding Window** | Continuously sliding window, weighted count | Smoother than fixed window | More complex, needs timestamp storage |
| **Token Bucket** | Tokens refill steadily; each request consumes one token | Allows short bursts, smooth overall | Requires tuning bucket size + refill rate |
| **Concurrency Limiter** | Limits concurrent (parallel) requests | Protects against slow-request attacks | Doesn't cap overall throughput |
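The boundary-burst problem of fixed windows is easy to reproduce. This sketch compares a fixed-window counter with a sliding-window *log* (which stores one timestamp per request and is exact, unlike the weighted-count approximation in the table), both limited to 100 requests per 60 seconds; timestamps are simulated in milliseconds.

```csharp
using System;
using System.Collections.Generic;

const int Limit = 100;
const long WindowMs = 60_000;

// Fixed window: one counter per aligned 60 s window.
long fixedWindowStart = -1;
var fixedCount = 0;
bool FixedWindowAllow(long nowMs)
{
    var windowStart = nowMs / WindowMs * WindowMs; // align to the window boundary
    if (windowStart != fixedWindowStart)
    {
        fixedWindowStart = windowStart; // new window: reset the counter
        fixedCount = 0;
    }
    if (fixedCount >= Limit) return false;
    fixedCount++;
    return true;
}

// Sliding window log: remember when each accepted request happened.
var log = new Queue<long>();
bool SlidingWindowAllow(long nowMs)
{
    // Forget requests that are a full window or more in the past
    while (log.Count > 0 && log.Peek() <= nowMs - WindowMs) log.Dequeue();
    if (log.Count >= Limit) return false;
    log.Enqueue(nowMs);
    return true;
}

// Burst: 100 requests during 59.00-59.99 s, then 100 more during 60.00-60.99 s
var fixedAllowed = 0;
var slidingAllowed = 0;
for (var i = 0; i < 200; i++)
{
    var t = i < 100 ? 59_000 + i * 10 : 60_000 + (i - 100) * 10;
    if (FixedWindowAllow(t)) fixedAllowed++;
    if (SlidingWindowAllow(t)) slidingAllowed++;
}
Console.WriteLine($"fixed: {fixedAllowed}, sliding: {slidingAllowed}"); // fixed: 200, sliding: 100
```

The fixed-window counter resets at 60 s and lets the second burst through, while the sliding log still sees the first 100 requests as recent.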

```
// .NET 10: Multi-tier rate limiting in YARP
using System.Globalization;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // Global: 1,000 req/min for the whole gateway
    options.GlobalLimiter = PartitionedRateLimiter
        .Create<HttpContext, string>(context =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: "global",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 1000,
                    Window = TimeSpan.FromMinutes(1)
                }));

    // Per-user: 100 req/min by user ID ("sub", with MapInboundClaims = false),
    // falling back to the client IP. context.User is only populated when
    // UseAuthentication() runs before UseRateLimiter(); otherwise this is per-IP.
    options.AddPolicy("per-user", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: context.User.FindFirst("sub")?.Value
                          ?? context.Connection.RemoteIpAddress?.ToString()
                          ?? "anonymous",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                QueueLimit = 10
            }));

    // Strict: for sensitive endpoints (login, password reset)
    options.AddFixedWindowLimiter("strict", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(15);
    });

    options.OnRejected = async (context, ct) =>
    {
        // Prefer the limiter's own retry hint; fall back to 60 seconds
        var retryAfter = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var hint)
            ? (int)Math.Ceiling(hint.TotalSeconds)
            : 60;
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.Headers.RetryAfter =
            retryAfter.ToString(CultureInfo.InvariantCulture);
        await context.HttpContext.Response.WriteAsJsonAsync(
            new { error = $"Too many requests. Retry after {retryAfter} seconds." }, ct);
    };
});
```

## 7. Load Balancing and Circuit Breaker

### 7.1. Load Balancing Strategies in YARP

YARP supports multiple load balancing algorithms configurable at the cluster level:

| Policy | Description | Use case |
| --- | --- | --- |
| **RoundRobin** | Rotate through each destination | Instances with uniform capacity |
| **Random** | Pick a destination at random | Simple, stateless |
| **LeastRequests** | Pick the destination with the fewest active requests | Uneven instances, varying request durations |
| **PowerOfTwoChoices** | Randomly pick 2, choose the one with fewer requests | Good balance between randomness and load awareness |
| **FirstAlphabetical** | Always pick the alphabetically first destination | Primary-secondary failover |
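To show how two of these policies differ, here is a plain C# sketch of RoundRobin and PowerOfTwoChoices over three hypothetical destinations with fixed in-flight request counts. It illustrates the selection rules only; YARP tracks in-flight requests per destination itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cluster: three destinations and their current in-flight request counts.
string[] destinations = { "orders-1", "orders-2", "orders-3" };
var inFlight = new Dictionary<string, int> { ["orders-1"] = 2, ["orders-2"] = 9, ["orders-3"] = 4 };

// RoundRobin: hand out destinations in a fixed rotation.
var nextIndex = 0;
string RoundRobin() => destinations[nextIndex++ % destinations.Length];

// PowerOfTwoChoices: sample two distinct destinations at random,
// then keep the one with fewer in-flight requests.
var random = new Random(42);
string PowerOfTwoChoices()
{
    var first = random.Next(destinations.Length);
    var second = random.Next(destinations.Length - 1);
    if (second >= first) second++; // shift so second != first
    var a = destinations[first];
    var b = destinations[second];
    return inFlight[a] <= inFlight[b] ? a : b;
}

Console.WriteLine(string.Join(", ", Enumerable.Range(0, 4).Select(_ => RoundRobin())));
// orders-1, orders-2, orders-3, orders-1
```

Because PowerOfTwoChoices always compares two distinct destinations, the single most-loaded one (`orders-2` here) can never win, while the random sampling keeps the gateway from piling every request onto the same "least loaded" instance.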

### 7.2. Circuit Breaker at the Gateway

When a backend service fails continuously, the gateway shouldn't keep sending it requests — that causes cascading failure. The Circuit Breaker pattern solves this:

```
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure threshold reached
    Open --> HalfOpen : Timeout expires
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails

    state Closed {
        [*] --> Monitoring
        Monitoring --> Monitoring : Request OK (reset counter)
        Monitoring --> CountFailure : Request Failed
        CountFailure --> Monitoring : Below threshold
    }

```

Figure 5: Circuit Breaker state machine — Closed → Open → Half-Open
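The state machine in Figure 5 fits in a few dozen lines. This sketch follows the diagram's consecutive-failure counter (Polly, used below, instead opens on a failure *ratio* over a sampling window). The threshold, break duration, and string state names are illustrative assumptions, and time is passed in explicitly.

```csharp
using System;

// Illustrative settings: open after 3 consecutive failures, stay open for 15 s.
const int FailureThreshold = 3;
var breakDuration = TimeSpan.FromSeconds(15);

var state = "Closed";          // "Closed", "Open" or "HalfOpen"
var consecutiveFailures = 0;
var openedAt = DateTime.MinValue;
var probeInFlight = false;

// Ask before forwarding; false means fail fast without touching the backend.
bool AllowRequest(DateTime now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
        state = "HalfOpen"; // timeout expired
    if (state == "Closed") return true;
    if (state == "HalfOpen" && !probeInFlight)
    {
        probeInFlight = true; // let exactly one probe through
        return true;
    }
    return false;
}

// Report the outcome of every request that AllowRequest let through.
void OnResult(bool success, DateTime now)
{
    if (state == "HalfOpen")
    {
        // The probe decides: success closes the circuit, failure reopens it
        probeInFlight = false;
        state = success ? "Closed" : "Open";
        if (!success) openedAt = now;
        consecutiveFailures = 0;
        return;
    }

    if (success)
    {
        consecutiveFailures = 0; // request OK: reset counter
        return;
    }

    if (++consecutiveFailures >= FailureThreshold)
    {
        state = "Open"; // failure threshold reached
        openedAt = now;
    }
}

var t0 = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 3; i++)
{
    AllowRequest(t0);
    OnResult(false, t0);
}
Console.WriteLine(state);                           // Open
Console.WriteLine(AllowRequest(t0.AddSeconds(5)));  // False: still inside the break duration
Console.WriteLine(AllowRequest(t0.AddSeconds(16))); // True: the single half-open probe
Console.WriteLine(AllowRequest(t0.AddSeconds(16))); // False: probe already in flight
OnResult(true, t0.AddSeconds(17));
Console.WriteLine(state);                           // Closed
```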

```
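// Combine YARP with Polly (Microsoft.Extensions.Http.Resilience) for a circuit breaker.
// YARP does not send proxied requests through IHttpClientFactory, so a handler
// registered with AddHttpClient(...) never sees them; wrap YARP's handler instead.
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Yarp.ReverseProxy.Forwarder;

builder.Services.AddSingleton<IForwarderHttpClientFactory, ResilientForwarderHttpClientFactory>();

public sealed class ResilientForwarderHttpClientFactory : ForwarderHttpClientFactory
{
    // YARP builds one client per cluster, so each cluster gets its own circuit
    protected override HttpMessageHandler WrapHandler(
        ForwarderHttpClientContext context, HttpMessageHandler handler)
    {
        var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
            .AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
            {
                SamplingDuration = TimeSpan.FromSeconds(30),
                FailureRatio = 0.5,
                MinimumThroughput = 10,
                BreakDuration = TimeSpan.FromSeconds(15)
            })
            // Per-attempt timeout inside the breaker, so timeouts count as failures
            .AddTimeout(TimeSpan.FromSeconds(10))
            .Build();

        // No retry step: YARP streams request bodies to the destination, so a
        // request whose body was already sent generally cannot be replayed.
        return new ResilienceHandler(pipeline) { InnerHandler = handler };
    }
}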
```

## 8. Backend-for-Frontend (BFF) Pattern — Implementation

The BFF pattern provides a dedicated gateway per client type, allowing response format, aggregation logic, and caching strategy to be tailored to each platform's needs.

```
graph TB
    subgraph "Client Layer"
        WEB[Vue.js SPA]
        MOB[React Native App]
        IOT[IoT Dashboard]
    end

    subgraph "BFF Layer"
        WBFF["Web BFF<br/>Full response, SSR support"]
        MBFF["Mobile BFF<br/>Compact response, pagination"]
        IBFF["IoT BFF<br/>Minimal payload, batch writes"]
    end

    subgraph "Domain Services"
        USER[User Service]
        ORDER[Order Service]
        PRODUCT[Product Service]
        ANALYTICS[Analytics Service]
    end

    WEB --> WBFF
    MOB --> MBFF
    IOT --> IBFF

    WBFF --> USER
    WBFF --> ORDER
    WBFF --> PRODUCT
    WBFF --> ANALYTICS

    MBFF --> USER
    MBFF --> ORDER
    MBFF --> PRODUCT

    IBFF --> ANALYTICS

    style WBFF fill:#e94560,stroke:#fff,color:#fff
    style MBFF fill:#e94560,stroke:#fff,color:#fff
    style IBFF fill:#e94560,stroke:#fff,color:#fff
    style USER fill:#2c3e50,stroke:#fff,color:#fff
    style ORDER fill:#2c3e50,stroke:#fff,color:#fff
    style PRODUCT fill:#2c3e50,stroke:#fff,color:#fff
    style ANALYTICS fill:#2c3e50,stroke:#fff,color:#fff

```

Figure 6: BFF Pattern — each client type gets its own gateway with a customized response shape

```
// Web BFF: Aggregation endpoint for the Dashboard page
// IUserService, IOrderService and IProductService are the app's own typed clients
app.MapGet("/bff/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    IProductService productService,
    HttpContext context) =>
{
    // Reading "sub" directly assumes JwtBearer's MapInboundClaims = false
    var userId = context.User.FindFirst("sub")!.Value;

    // Call 3 services in parallel
    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 10);
    var recommendationsTask = productService
        .GetRecommendationsAsync(userId, limit: 8);

    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return Results.Ok(new
    {
        Profile = await profileTask,
        RecentOrders = await ordersTask,
        Recommendations = await recommendationsTask,
        ServerTime = DateTime.UtcNow
    });
}).RequireAuthorization();

// Mobile BFF: Compact response, only essential fields
app.MapGet("/bff/mobile/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    HttpContext context) =>
{
    var userId = context.User.FindFirst("sub")!.Value;

    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 5);

    await Task.WhenAll(profileTask, ordersTask);

    var profile = await profileTask;
    var recentOrders = await ordersTask;
    return Results.Ok(new
    {
        DisplayName = profile.DisplayName,
        AvatarUrl = profile.AvatarUrl,
        // Count of the (at most 5) recent orders fetched above, not the lifetime total
        RecentOrderCount = recentOrders.Count,
        LastOrderStatus = recentOrders.FirstOrDefault()?.Status
    });
}).RequireAuthorization();
```

## 9. Observability — Monitoring the API Gateway

The API Gateway is the best place for observability since all traffic passes through it. Track the 4 key metrics (RED + Saturation):

- **Rate:** requests per second (throughput)
- **Errors:** error rate (4xx/5xx %)
- **Duration:** latency percentiles (p50, p95, p99)
- **Saturation:** connection pool usage, queue depth
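As a quick illustration of how these numbers are derived, the sketch below computes rate, error rate, and nearest-rank percentiles from a hypothetical one-minute sample of 100 requests. Real gateways record latencies into histograms (for example via OpenTelemetry) rather than keeping raw samples, so treat this as the arithmetic only.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over an ascending array: the smallest value such that
// at least p% of the samples are less than or equal to it.
double Percentile(double[] sortedAscending, int p)
{
    var rank = (int)Math.Ceiling(p * sortedAscending.Length / 100.0);
    return sortedAscending[Math.Max(rank, 1) - 1];
}

// Hypothetical one-minute sample: 100 requests with latencies of 1..100 ms,
// where every 20th request failed with a 503.
var latenciesMs = Enumerable.Range(1, 100).Select(i => (double)i).ToArray();
var statusCodes = Enumerable.Range(1, 100).Select(i => i % 20 == 0 ? 503 : 200).ToArray();

var sorted = latenciesMs.OrderBy(ms => ms).ToArray();
var requestsPerSecond = latenciesMs.Length / 60.0;                              // Rate
var errorRate = statusCodes.Count(c => c >= 500) / (double)statusCodes.Length;  // Errors (5xx here)

Console.WriteLine($"rate={requestsPerSecond:0.00}/s errors={errorRate:0.00} " +
                  $"p50={Percentile(sorted, 50)} p95={Percentile(sorted, 95)} p99={Percentile(sorted, 99)}");
```

Percentiles are used instead of the average because a small share of very slow requests (the p99 tail) is exactly what an average hides.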

```
// OpenTelemetry integration for YARP
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddMeter("Yarp.ReverseProxy")
               .AddOtlpExporter(opt =>
                   opt.Endpoint = new Uri("http://otel-collector:4317"));
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .SetResourceBuilder(ResourceBuilder.CreateDefault()
                   .AddService("api-gateway"))
               .AddOtlpExporter();
    });

// Custom middleware: add a correlation ID
app.Use(async (context, next) =>
{
    if (!context.Request.Headers.ContainsKey("X-Correlation-Id"))
    {
        context.Request.Headers["X-Correlation-Id"] =
            Guid.NewGuid().ToString("N");
    }
    context.Response.Headers["X-Correlation-Id"] =
        context.Request.Headers["X-Correlation-Id"];

    await next();
});
```

## 10. Best Practices for Production

#### 💡 10 API Gateway principles for production

1. **Stateless gateway:** Don't store session state at the gateway — use JWTs or an external session store. Makes horizontal scaling easy.
2. **Timeout cascade:** Gateway timeout must exceed backend timeout. Example: backend 5s → gateway 8s → client 15s. Avoid the gateway timing out before the backend responds.
3. **Only retry idempotent operations:** Retry GET, PUT, DELETE only. DO NOT retry POST (can create duplicates). If you need POST retries, require the client to send an Idempotency-Key header.
4. **Rate limit before auth:** Put rate limiting at the front of the pipeline to block abuse before spending CPU on JWT validation.
5. **Separate gateway health checks:** Split `/health/live` (gateway alive) and `/health/ready` (gateway + backends ready). Kubernetes uses different liveness and readiness probes.
6. **Request/response size limits:** Set a max body size (e.g., 10MB) to block payload abuse. Use streaming for file uploads instead of buffering the whole thing.
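7. **Graceful shutdown:** When restarting the gateway, drain active connections before shutting down. ASP.NET Core signals the start of shutdown through `app.Lifetime.ApplicationStopping`, and the host's shutdown timeout gives in-flight proxied requests time to finish.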
8. **Configuration hot-reload:** Routing changes shouldn't require a gateway restart. YARP supports `IProxyConfigProvider` to load config from DB, Consul, or etcd.
9. **Avoid the gateway monolith:** Keep business logic out of the gateway. The gateway only handles cross-cutting concerns — routing, auth, rate limiting, transforms. Business logic belongs in domain services.
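10. **Canary routing:** Use header-based or weight-based routing to roll out new versions gradually. YARP routes can match on request headers (`Match.Headers` in configuration, the `RouteHeader` type in code), so a custom header can steer selected traffic to a canary cluster.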
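Principle 3 can be expressed as a small predicate. This sketch treats the common idempotent methods defined by HTTP semantics (GET, HEAD, OPTIONS, PUT, DELETE) as retryable and allows other methods only when an `Idempotency-Key` header is present. The `CanRetry` helper and its attempt limit are hypothetical, and the header only helps if the backend actually deduplicates by that key.

```csharp
using System;
using System.Collections.Generic;

// Common idempotent methods: sending them twice has the same effect as sending once.
var idempotentMethods = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "GET", "HEAD", "OPTIONS", "PUT", "DELETE"
};

// Hypothetical helper: may the gateway retry this failed request?
bool CanRetry(string method, IReadOnlyDictionary<string, string> headers, int attempt, int maxAttempts)
{
    if (attempt >= maxAttempts) return false;            // retry budget exhausted
    if (idempotentMethods.Contains(method)) return true;  // safe to repeat
    // POST/PATCH only with a client-supplied key the backend uses to deduplicate
    return headers.ContainsKey("Idempotency-Key");
}

// HTTP header names are case-insensitive, so compare them that way
var noHeaders = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var withKey = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Idempotency-Key"] = "order-7f3a"
};

Console.WriteLine(CanRetry("GET", noHeaders, attempt: 1, maxAttempts: 3));  // True
Console.WriteLine(CanRetry("POST", noHeaders, attempt: 1, maxAttempts: 3)); // False
Console.WriteLine(CanRetry("POST", withKey, attempt: 1, maxAttempts: 3));   // True
Console.WriteLine(CanRetry("PUT", noHeaders, attempt: 3, maxAttempts: 3));  // False
```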

## 11. Full Architecture — The Gateway in a Production Stack

```
graph TB
    CDN[CDN / Cloudflare] --> LB[External Load Balancer]
    LB --> GW1[API Gateway Instance 1]
    LB --> GW2[API Gateway Instance 2]

    subgraph "Gateway Responsibilities"
        GW1 -->|"Auth, Rate Limit,<br/>Route, Transform"| GW1
    end

    GW1 --> SD["Service Discovery<br/>Consul / K8s DNS"]
    GW2 --> SD

    SD --> US1[User Service x3]
    SD --> OS1[Order Service x2]
    SD --> PS1[Product Service x4]

    GW1 --> OTEL["OpenTelemetry<br/>Collector"]
    GW2 --> OTEL
    OTEL --> GRAF[Grafana / Dashboard]

    style CDN fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GW1 fill:#e94560,stroke:#fff,color:#fff
    style GW2 fill:#e94560,stroke:#fff,color:#fff
    style SD fill:#2c3e50,stroke:#fff,color:#fff
    style US1 fill:#2c3e50,stroke:#fff,color:#fff
    style OS1 fill:#2c3e50,stroke:#fff,color:#fff
    style PS1 fill:#2c3e50,stroke:#fff,color:#fff
    style OTEL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GRAF fill:#f8f9fa,stroke:#e94560,color:#2c3e50

```

Figure 7: Production stack — CDN → Load Balancer → API Gateway → Service Discovery → Backend Services

## Conclusion

The API Gateway is not an optional component in microservices — it's an **essential infrastructure layer** once a system exceeds 3-5 services. By centralizing routing, authentication, rate limiting, and observability in one place, the gateway lets teams focus on business logic instead of repeating cross-cutting concerns in every service.

Whatever you choose, remember the golden rule: **the gateway handles only cross-cutting concerns. Business logic belongs to domain services.**

## References

- [Microsoft Learn — Getting Started with YARP on .NET 10](https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/yarp/getting-started?view=aspnetcore-10.0)
- [Milan Jovanović — Implementing an API Gateway with YARP](https://www.milanjovanovic.tech/blog/implementing-an-api-gateway-for-microservices-with-yarp)
- [NashTech — Build Full Sample with YARP API Gateway on .NET 10](https://blog.nashtechglobal.com/build-full-sample-with-yarp-api-gateway-on-net-10/)
- [Calmops — API Gateways: Kong, Envoy and Modern API Management 2026](https://calmops.com/software-engineering/api-gateways-kong-envoy-modern-api-management/)
- [Elysiate — API Gateway Authentication Patterns 2026: JWT, OAuth2, API Keys, mTLS](https://www.elysiate.com/blog/api-gateway-authentication-patterns-jwt-oauth)
- [GitHub — dotnet/yarp: A toolkit for developing high-performance HTTP reverse proxy applications](https://github.com/dotnet/yarp)
- [AntonDevTips — YARP as API Gateway in .NET: 7 Real-World Scenarios](https://antondevtips.com/blog/yarp-as-api-gateway-in-dotnet)


Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.