API Gateway 2026 — Central Gateway Architecture for Microservices with YARP, Kong, and the BFF Pattern
Posted on: 4/18/2026 10:12:46 AM
Table of contents
- 1. What Is an API Gateway and Why Do Microservices Need One?
- 2. API Gateway Architecture — Design Patterns
- 3. YARP — An API Gateway on .NET 10
- 4. Popular API Gateways Compared — 2026
- 5. Authentication at the Gateway Layer
- 6. Rate Limiting — Protecting the Backend
- 7. Load Balancing and Circuit Breaker
- 8. Backend-for-Frontend (BFF) Pattern — Implementation
- 9. Observability — Monitoring the API Gateway
- 10. Best Practices for Production
- 11. Full Architecture — The Gateway in a Production Stack
- Conclusion
- References
In microservices architecture, one of the biggest challenges is: how many services should a client connect to? When a system has 10, 50, or 200 services, letting clients call each service directly is both complex and a security/operations nightmare. The API Gateway is the answer — a single central gateway sitting between the client and the entire backend, handling routing, authentication, rate limiting, load balancing, and a long list of cross-cutting concerns.
This article dives deep into API Gateway architecture for 2026, from core design patterns and a real-world YARP implementation on .NET 10, to comparisons with Kong and AWS API Gateway, and the Backend-for-Frontend (BFF) pattern for production systems.
1. What Is an API Gateway and Why Do Microservices Need One?
An API Gateway is a reverse proxy that sits between clients (web, mobile, third-party) and backend services. Instead of the client knowing each service's address and protocol, every request goes through a single point — the gateway — and is then routed to the correct service behind it.
graph TB
subgraph Clients
WEB[Web App]
MOB[Mobile App]
EXT[3rd-party API]
end
GW[API Gateway]
subgraph Backend Services
US[User Service]
OS[Order Service]
PS[Product Service]
NS[Notification Service]
end
WEB --> GW
MOB --> GW
EXT --> GW
GW --> US
GW --> OS
GW --> PS
GW --> NS
style GW fill:#e94560,stroke:#fff,color:#fff
style WEB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style MOB fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style EXT fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style US fill:#2c3e50,stroke:#fff,color:#fff
style OS fill:#2c3e50,stroke:#fff,color:#fff
style PS fill:#2c3e50,stroke:#fff,color:#fff
style NS fill:#2c3e50,stroke:#fff,color:#fff
Why not call each service directly?
When clients call directly: (1) they must manage N different endpoints, (2) every service has to handle auth, rate limiting, and CORS itself, (3) internal changes (splitting/merging services) directly affect clients, (4) there's no central place to monitor traffic. The API Gateway solves all of these by centralizing cross-cutting concerns into a single layer.
1.1. What an API Gateway Handles
Routing & Path Rewriting
Map public URLs to internal service endpoints. For example: /api/orders/* → Orders Service, /api/users/* → User Service. Supports path prefix stripping and query string forwarding.
Authentication & Authorization
Validate the JWT/OAuth2 token once at the gateway and forward claims to downstream services. Eliminates the need for every service to validate tokens.
Rate Limiting & Throttling
Protect the backend from abuse using fixed window, sliding window, or token bucket algorithms. Apply per IP, user, API key, or a specific route.
Load Balancing
Distribute traffic across multiple instances: Round Robin, Least Connections, Weighted, or Consistent Hashing. Combine with health checks to remove failing instances.
Response Caching
Cache responses at the gateway layer for read-heavy endpoints, reducing backend load. Supports cache invalidation via headers or TTL.
Observability & Logging
Trace every request from client to backend with a correlation ID. Export metrics (latency, error rate, throughput) to Prometheus/OpenTelemetry.
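The routing and prefix-stripping behavior described under Routing & Path Rewriting can be sketched in a few lines of plain C#. This is only an illustration of the idea, not how YARP matches routes internally: the `Router` class and its hard-coded route table are assumptions, with prefixes and cluster names borrowed from the examples in this article.

```csharp
using System;

Console.WriteLine(Router.Match("/api/orders/42"));        // (orders-cluster, /42)
Console.WriteLine(Router.Match("/api/users"));            // (users-cluster, /)
Console.WriteLine(Router.Match("/api/ordersX") is null);  // True: a prefix must end on a path segment

static class Router
{
    // Hypothetical route table: public prefix -> internal cluster name.
    private static readonly (string Prefix, string Cluster)[] Routes =
    {
        ("/api/orders", "orders-cluster"),
        ("/api/users", "users-cluster"),
    };

    // Returns the target cluster and the path with the public prefix removed,
    // or null when no route matches.
    public static (string Cluster, string Path)? Match(string path)
    {
        foreach (var (prefix, cluster) in Routes)
        {
            if (path == prefix || path.StartsWith(prefix + "/", StringComparison.Ordinal))
            {
                var rest = path.Substring(prefix.Length);
                return (cluster, rest.Length == 0 ? "/" : rest);
            }
        }
        return null;
    }
}
```

Matching on `prefix + "/"` rather than on the bare prefix keeps `/api/ordersX` from being routed to the orders cluster by accident.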
2. API Gateway Architecture — Design Patterns
2.1. Single Gateway vs Gateway per Client
There are two main approaches to designing an API Gateway:
graph LR
subgraph "Pattern 1: Single Gateway"
C1[All Clients] --> SG[Shared Gateway]
SG --> S1[Service A]
SG --> S2[Service B]
end
subgraph "Pattern 2: BFF — Gateway per Client"
WA[Web App] --> WG[Web Gateway]
MA[Mobile App] --> MG[Mobile Gateway]
PA[Partner API] --> PG[Partner Gateway]
WG --> S3[Service A]
MG --> S3
PG --> S3
WG --> S4[Service B]
MG --> S4
PG --> S4
end
style SG fill:#e94560,stroke:#fff,color:#fff
style WG fill:#e94560,stroke:#fff,color:#fff
style MG fill:#e94560,stroke:#fff,color:#fff
style PG fill:#e94560,stroke:#fff,color:#fff
style S1 fill:#2c3e50,stroke:#fff,color:#fff
style S2 fill:#2c3e50,stroke:#fff,color:#fff
style S3 fill:#2c3e50,stroke:#fff,color:#fff
style S4 fill:#2c3e50,stroke:#fff,color:#fff
Single Gateway is simple and easy to manage, suitable for small to mid-sized systems. But when Web needs a different response than Mobile (fewer fields, different format), the single gateway becomes a bottleneck — every mobile change affects web and vice versa.
The BFF Pattern (Backend-for-Frontend) solves this: each client type gets its own gateway, customized to its response shape, aggregation logic, and caching strategy. Netflix, Spotify, and Shopify all use BFFs in production.
2.2. Request Pipeline in an API Gateway
A request passes through multiple middleware layers in a fixed order. That order is critical — a wrong arrangement can introduce security holes or unwanted behavior.
graph TB
REQ[Incoming Request] --> CORS[CORS Middleware]
CORS --> RL[Rate Limiting]
RL --> AUTH[Authentication]
AUTH --> AUTHZ[Authorization]
AUTHZ --> CACHE[Response Cache Check]
CACHE --> TRANSFORM[Request Transform]
TRANSFORM --> ROUTE[Route Matching]
ROUTE --> LB[Load Balancer]
LB --> HEALTH[Health Check Filter]
HEALTH --> PROXY[Proxy to Backend]
PROXY --> RES_TRANSFORM[Response Transform]
RES_TRANSFORM --> LOG[Logging & Metrics]
LOG --> RES[Response to Client]
style REQ fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style CORS fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style RL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style AUTH fill:#e94560,stroke:#fff,color:#fff
style AUTHZ fill:#e94560,stroke:#fff,color:#fff
style CACHE fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style ROUTE fill:#2c3e50,stroke:#fff,color:#fff
style LB fill:#2c3e50,stroke:#fff,color:#fff
style HEALTH fill:#2c3e50,stroke:#fff,color:#fff
style PROXY fill:#2c3e50,stroke:#fff,color:#fff
style RES_TRANSFORM fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style LOG fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style RES fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
⚠ Middleware order matters
Rate Limiting MUST come before Authentication. If reversed, attackers can spam requests with invalid tokens — the gateway still burns resources validating JWTs before rate limiting kicks in. Placing rate limiting first blocks abuse as early as possible without wasting CPU on crypto operations.
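A minimal sketch of why the order matters, using plain delegates instead of the real ASP.NET Core pipeline. The `Pipeline` class, the limit of 3 requests, and the counters are illustrative assumptions; the "token validation" is a stand-in that always fails, as it would for a flood of invalid tokens.

```csharp
using System;

// Simulate 10 requests carrying an invalid token against both orderings.
var (validationsRateFirst, _) = Pipeline.Run(rateLimitFirst: true, requests: 10, limit: 3);
var (validationsAuthFirst, _) = Pipeline.Run(rateLimitFirst: false, requests: 10, limit: 3);
Console.WriteLine($"rate limit first: {validationsRateFirst} token validations"); // 3
Console.WriteLine($"auth first:       {validationsAuthFirst} token validations"); // 10

static class Pipeline
{
    // Returns how many (expensive) token validations ran and how many requests got a 429.
    public static (int Validations, int RateLimited) Run(bool rateLimitFirst, int requests, int limit)
    {
        int validations = 0, rateLimited = 0, seen = 0;

        // Stand-in for JWT validation: always fails, but costs CPU every time it runs.
        Func<bool> authenticate = () => { validations++; return false; };
        // Stand-in for a fixed-window limiter: admits the first `limit` requests.
        Func<bool> admit = () => ++seen <= limit;

        for (int i = 0; i < requests; i++)
        {
            if (rateLimitFirst)
            {
                if (!admit()) { rateLimited++; continue; } // rejected before any crypto work
                authenticate();
            }
            else
            {
                if (!authenticate()) continue;              // 401: the limiter never sees the request
                if (!admit()) rateLimited++;
            }
        }
        return (validations, rateLimited);
    }
}
```

With authentication first, every one of the 10 bogus requests pays for a token validation and the limiter never counts them; with rate limiting first, only the 3 admitted requests do.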
2.3. Gateway Aggregation Pattern
One of the most powerful API Gateway patterns is request aggregation — combining multiple backend calls into a single response for the client. Instead of a mobile app making 3 separate API calls (user profile, orders, recommendations), the gateway fans them out in parallel, merges the results, and returns a single response.
// Aggregation example: client calls GET /api/dashboard
// Internally, the gateway calls in parallel:
// GET /users/123/profile
// GET /orders?userId=123&limit=5
// GET /recommendations?userId=123
// Merged into 1 response { profile, recentOrders, recommendations }
This pattern is especially useful for mobile apps (fewer HTTP round-trips on slow networks) and dashboard pages (reduces waterfall loading).
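As a rough sketch of the fan-out and merge step, the example below starts three calls concurrently and lets an optional section degrade to null instead of failing the whole response. The backend calls are simulated with `Task.Delay`; the `Aggregator` and `Dashboard` names, the delays, and the failing recommendations call are all assumptions made for illustration.

```csharp
using System;
using System.Threading.Tasks;

var dashboard = await Aggregator.GetDashboardAsync("123");
Console.WriteLine(dashboard);

record Dashboard(string Profile, string RecentOrders, string? Recommendations);

static class Aggregator
{
    // Hypothetical backend calls; a real gateway would issue HTTP requests here.
    static async Task<string> GetProfileAsync(string userId) { await Task.Delay(50); return $"profile:{userId}"; }
    static async Task<string> GetOrdersAsync(string userId) { await Task.Delay(80); return $"orders:{userId}"; }
    static async Task<string> GetRecommendationsAsync(string userId)
    {
        await Task.Delay(30);
        throw new TimeoutException("recommendations backend is slow");
    }

    public static async Task<Dashboard> GetDashboardAsync(string userId)
    {
        // Start all calls before awaiting any of them so they run concurrently.
        var profileTask = GetProfileAsync(userId);
        var ordersTask = GetOrdersAsync(userId);
        var recommendationsTask = OrDefault(GetRecommendationsAsync(userId));

        await Task.WhenAll(profileTask, ordersTask, recommendationsTask);
        return new Dashboard(await profileTask, await ordersTask, await recommendationsTask);
    }

    // Optional section: degrade to null instead of failing the whole response.
    static async Task<string?> OrDefault(Task<string> task)
    {
        try { return await task; }
        catch (Exception) { return null; }
    }
}
```

Which sections are allowed to degrade is a product decision: here the profile and orders are treated as required, so a failure in either would still fail the request.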
3. YARP — An API Gateway on .NET 10
YARP (Yet Another Reverse Proxy) is Microsoft's open-source library designed as a fully programmable reverse proxy on ASP.NET Core. Unlike Kong or AWS API Gateway — which are feature-rich products — YARP is a toolkit you use to build a gateway matching your exact requirements.
3.1. Basic YARP Configuration
YARP works entirely via configuration — no need to write routing logic by hand. Each route maps a URL pattern to a cluster (a group of backend instances).
// Program.cs: minimal hosting with YARP
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Add YARP services and load routes/clusters from configuration
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Add rate limiting: one shared fixed window of 100 requests per minute
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("api-limit", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 10;
    });
});

// Add authentication (JWT bearer tokens issued by the identity provider)
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer(options =>
    {
        options.Authority = "https://auth.example.com";
        options.TokenValidationParameters = new()
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidAudience = "my-api"
        };
    });

// Required by UseAuthorization() and by routes that set "AuthorizationPolicy"
builder.Services.AddAuthorization();

var app = builder.Build();

app.UseRateLimiter();      // first, so abusive traffic is rejected before token validation
app.UseAuthentication();
app.UseAuthorization();
app.MapReverseProxy();
app.Run();
// appsettings.json: YARP routing configuration
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "AuthorizationPolicy": "default",
        "RateLimiterPolicy": "api-limit",
        "Match": {
          "Path": "/api/orders/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/orders" }
        ]
      },
      "users-route": {
        "ClusterId": "users-cluster",
        "Match": {
          "Path": "/api/users/{**catch-all}"
        },
        "Transforms": [
          { "PathRemovePrefix": "/api/users" },
          { "RequestHeader": "X-Forwarded-Prefix", "Set": "/api/users" }
        ]
      }
    },
    "Clusters": {
      "orders-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:30",
            "Timeout": "00:00:10",
            "Path": "/health"
          }
        },
        "Destinations": {
          "orders-1": { "Address": "https://orders-1:5001" },
          "orders-2": { "Address": "https://orders-2:5002" }
        }
      },
      "users-cluster": {
        "Destinations": {
          "users-1": { "Address": "https://users:5003" }
        }
      }
    }
  }
}
3.2. Custom Middleware in the YARP Pipeline
YARP's biggest advantage over other gateways: full control over the ASP.NET Core middleware pipeline. You can write custom middleware and insert it anywhere in the pipeline.
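// Custom middleware: API key validation for partner routes
app.MapReverseProxy(proxyPipeline =>
{
    proxyPipeline.Use(async (context, next) =>
    {
        var route = context.GetReverseProxyFeature().Route;
        if (route.Config.RouteId.StartsWith("partner-", StringComparison.Ordinal))
        {
            // ValidateApiKey is an application-defined helper (not part of YARP)
            if (!context.Request.Headers.TryGetValue("X-API-Key", out var apiKey)
                || !await ValidateApiKey(apiKey.ToString()))
            {
                context.Response.StatusCode = StatusCodes.Status401Unauthorized;
                await context.Response.WriteAsJsonAsync(
                    new { error = "Invalid API key" });
                return;
            }
        }
        await next();
    });

    // A custom pipeline replaces the default one, so re-add the built-in steps.
    // Without UseLoadBalancing() the cluster's LoadBalancingPolicy is not applied.
    proxyPipeline.UseSessionAffinity();
    proxyPipeline.UseLoadBalancing();
    proxyPipeline.UsePassiveHealthChecks();
});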
3.3. Health Checks — Active and Passive
YARP supports two kinds of health checks so traffic flows only to healthy instances:
Active Health Check
The gateway proactively calls each destination's /health endpoint on a fixed interval (30 seconds in the configuration above; YARP's default is 15 seconds). If a destination doesn't respond within the timeout or returns a non-success status code, it's marked unhealthy and removed from the load balancer rotation. When it recovers, it's automatically added back.
Passive Health Check
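Detects failures at request time, with no extra health endpoint calls. YARP's built-in TransportFailureRate policy tracks the share of proxied requests to each destination that fail at the proxy level (for example connection errors and timeouts) over a sliding detection window. If that rate crosses a threshold (e.g., more than 30% of requests failing within 60 seconds), the destination is temporarily removed from the pool and reinstated after a reactivation period. This works like a circuit breaker. Passive checks have to be enabled per cluster under HealthCheck.Passive, in addition to the pipeline step shown above.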
4. Popular API Gateways Compared — 2026
| Criterion | YARP (.NET) | Kong (OSS) | AWS API Gateway | Envoy |
|---|---|---|---|---|
| Deployment model | Library (in-process) | Standalone / K8s | Managed service | Sidecar / Standalone |
| Language | C# / .NET | Lua on Nginx/OpenResty | N/A (managed) | C++ / WASM filters |
| Performance overhead | <1ms (in-process) | 1-5ms (proxy hop) | 5-29ms (managed) | ~1ms (sidecar) |
| Customization | Full middleware pipeline | Plugin system (Lua/Go) | Lambda authorizers | WASM / Lua filters |
| Rate Limiting | Native .NET (since .NET 7) | Built-in plugin | Built-in (throttling) | Local/Global filters |
| Service Discovery | Config / Custom provider | DNS / Consul / K8s | VPC Link / ALB | xDS API (Istio) |
| Health Check | Active + Passive | Active + Passive (upstream) | Managed | Active + Passive + EDS |
| Cost | Free (OSS) | Free (OSS) / Enterprise | $3.50/million requests | Free (OSS) |
| Best fit | .NET team, full control | Multi-language, plugins | AWS-native, serverless | K8s / Service Mesh |
💡 Which gateway to choose?
- .NET team, want full control: YARP. It runs inside your own ASP.NET Core process, so custom logic needs no separate plugin runtime or sidecar hop, and you can customize freely with C# middleware.
- Multi-language team with plugin needs: Kong. Rich plugin ecosystem, strong admin API.
- Full AWS, want zero ops: AWS API Gateway. Managed, auto-scaling, Lambda integration.
- Kubernetes/Service Mesh: Envoy. The standard data plane for Istio, with WASM extensibility.
5. Authentication at the Gateway Layer
One of the biggest benefits of an API Gateway is centralizing authentication. Instead of every service validating JWTs itself, the gateway validates once and forwards claims (user ID, roles) to downstream services via headers.
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant IDP as Identity Provider
participant SVC as Backend Service
C->>GW: Request + Bearer Token
GW->>IDP: Validate JWT (cached JWKS)
IDP-->>GW: Token Valid + Claims
GW->>GW: Check Authorization Policy
GW->>SVC: Forward Request + X-User-Id + X-Roles
SVC-->>GW: Response
GW-->>C: Response
Note over GW: Gateway caches JWKS keys<br/>to reduce round-trips to the IDP
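// YARP: forward user claims to the backend via a code-based request transform.
// Config transforms set static header values only, so claims are copied in code.
// Assumes MapInboundClaims = false so claims keep short names ("sub", "email", "role").
using Yarp.ReverseProxy.Transforms;

builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .AddTransforms(builderContext =>
    {
        // Only routes with an authorization policy, e.g. "authenticated-users"
        if (string.IsNullOrEmpty(builderContext.Route.AuthorizationPolicy)) return;

        builderContext.AddRequestTransform(ctx =>
        {
            var user = ctx.HttpContext.User;
            var claims = new Dictionary<string, string?>
            {
                ["X-User-Id"] = user.FindFirst("sub")?.Value,
                ["X-User-Email"] = user.FindFirst("email")?.Value,
                ["X-User-Roles"] = string.Join(",", user.FindAll("role").Select(c => c.Value))
            };
            foreach (var (name, value) in claims)
            {
                ctx.ProxyRequest.Headers.Remove(name);   // drop client-supplied values
                if (!string.IsNullOrEmpty(value))
                    ctx.ProxyRequest.Headers.TryAddWithoutValidation(name, value);
            }
            ctx.ProxyRequest.Headers.Remove("Authorization"); // keep the token at the gateway
            return ValueTask.CompletedTask;
        });
    });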
⚠ Always strip the Authorization header when forwarding
After the gateway validates the token, strip the Authorization header before forwarding to the backend. Reasons: (1) the backend doesn't need to validate again, (2) you avoid token leakage if the backend logs request headers, (3) it reduces attack surface — if the backend is compromised, attackers can't extract bearer tokens from requests.
6. Rate Limiting — Protecting the Backend
6.1. Rate Limiting Algorithms
| Algorithm | How it works | Pros | Cons |
|---|---|---|---|
| Fixed Window | Count requests in a fixed window (e.g., 100 req/min) | Simple, low memory | Burst at window boundaries (up to 200 req within a few seconds: 100 at the end of one window, 100 at the start of the next) |
| Sliding Window | Continuously sliding window, weighted count | Smoother than fixed window | More complex, needs timestamp storage |
| Token Bucket | Tokens refill steadily; each request consumes one token | Allows short bursts, smooth overall | Requires tuning bucket size + refill rate |
| Concurrency Limiter | Limits concurrent (parallel) requests | Protects against slow-request attacks | Doesn't cap overall throughput |
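To make the Token Bucket row concrete, here is a small self-contained sketch written in plain C# rather than with the .NET rate limiting types. The `TokenBucket` class is hypothetical, and time is passed in explicitly (in seconds) so the behavior is deterministic; it is not thread-safe and is meant only to show the refill arithmetic.

```csharp
using System;

var bucket = new TokenBucket(capacity: 5, refillPerSecond: 1.0, now: 0.0);

int burst = 0;
for (int i = 0; i < 8; i++)
    if (bucket.TryAcquire(now: 0.0)) burst++;
Console.WriteLine($"admitted in burst: {burst}");   // 5: the bucket starts full

Console.WriteLine(bucket.TryAcquire(now: 2.0));      // True: 2 tokens were refilled after 2 s

sealed class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private double _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, double now)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity;      // start full so an initial burst is allowed
        _lastRefill = now;
    }

    // `now` is a timestamp in seconds supplied by the caller.
    public bool TryAcquire(double now)
    {
        // Refill for the elapsed time, never above capacity.
        _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill) * _refillPerSecond);
        _lastRefill = now;

        if (_tokens < 1.0) return false;
        _tokens -= 1.0;
        return true;
    }
}
```

The two tuning knobs from the table map directly onto the constructor: `capacity` bounds the burst size and `refillPerSecond` sets the sustained rate.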
// .NET 10: multi-tier rate limiting in YARP
// Requires: using System.Globalization; using System.Threading.RateLimiting;
builder.Services.AddRateLimiter(options =>
{
    // Global: 1,000 req/min for the whole gateway
    options.GlobalLimiter = PartitionedRateLimiter
        .Create<HttpContext, string>(context =>
            RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: "global",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 1000,
                    Window = TimeSpan.FromMinutes(1)
                }));

    // Per-user: 100 req/min by user ID. The "sub" claim is only populated if
    // authentication has already run; otherwise the key falls back to the client IP.
    options.AddPolicy("per-user", context =>
        RateLimitPartition.GetTokenBucketLimiter(
            partitionKey: context.User?.FindFirst("sub")?.Value
                ?? context.Connection.RemoteIpAddress?.ToString()
                ?? "anonymous",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 100,
                QueueLimit = 10
            }));

    // Strict: for sensitive endpoints (login, password reset)
    options.AddFixedWindowLimiter("strict", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(15);
    });

    options.OnRejected = async (context, ct) =>
    {
        // Prefer the limiter's own retry hint; fall back to 60 seconds
        var retryAfterSeconds =
            context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter)
                ? (int)Math.Ceiling(retryAfter.TotalSeconds)
                : 60;
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.Headers.RetryAfter =
            retryAfterSeconds.ToString(CultureInfo.InvariantCulture);
        await context.HttpContext.Response.WriteAsJsonAsync(
            new { error = $"Too many requests. Retry after {retryAfterSeconds} seconds." }, ct);
    };
});
7. Load Balancing and Circuit Breaker
7.1. Load Balancing Strategies in YARP
YARP supports multiple load balancing algorithms configurable at the cluster level:
| Policy | Description | Use case |
|---|---|---|
| RoundRobin | Rotate through each destination | Instances with uniform capacity |
| Random | Pick a destination at random | Simple, stateless |
| LeastRequests | Pick the destination with the fewest active requests | Uneven instances, varying request durations |
| PowerOfTwoChoices | Randomly pick 2, choose the one with fewer requests | Good balance between randomness and load awareness |
| FirstAlphabetical | Always pick the alphabetically first destination | Primary-secondary failover |
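The PowerOfTwoChoices row is easy to misread, so here is a rough sketch of the idea in plain C#. It is not YARP's implementation: the `Destination` and `PowerOfTwo` types are hypothetical, and this variant deliberately samples two distinct candidates.

```csharp
using System;
using System.Collections.Generic;

var destinations = new List<Destination>
{
    new Destination("orders-1") { ActiveRequests = 12 },
    new Destination("orders-2") { ActiveRequests = 3 },
    new Destination("orders-3") { ActiveRequests = 7 },
};
var picked = PowerOfTwo.Pick(destinations, new Random());
Console.WriteLine($"picked {picked.Name} ({picked.ActiveRequests} active requests)");

sealed class Destination
{
    public Destination(string name) => Name = name;
    public string Name { get; }
    public int ActiveRequests { get; set; }
}

static class PowerOfTwo
{
    public static Destination Pick(IReadOnlyList<Destination> available, Random random)
    {
        if (available.Count == 0) throw new InvalidOperationException("no healthy destinations");
        if (available.Count == 1) return available[0];

        // Sample two distinct candidates, then keep the less loaded one.
        int first = random.Next(available.Count);
        int second = random.Next(available.Count - 1);
        if (second >= first) second++;

        var a = available[first];
        var b = available[second];
        return a.ActiveRequests <= b.ActiveRequests ? a : b;
    }
}
```

Compared with LeastRequests, this looks at only two destinations per request instead of scanning all of them, yet in this variant the single busiest destination (`orders-1` above) is never selected, because it always loses the comparison.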
7.2. Circuit Breaker at the Gateway
When a backend service fails continuously, the gateway shouldn't keep sending it requests — that causes cascading failure. The Circuit Breaker pattern solves this:
stateDiagram-v2
[*] --> Closed
Closed --> Open : Failure threshold reached
Open --> HalfOpen : Timeout expires
HalfOpen --> Closed : Probe request succeeds
HalfOpen --> Open : Probe request fails
state Closed {
[*] --> Monitoring
Monitoring --> Monitoring : Request OK (reset counter)
Monitoring --> CountFailure : Request Failed
CountFailure --> Monitoring : Below threshold
}
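// Combine the gateway with Polly (Microsoft.Extensions.Http.Resilience) for a circuit breaker.
// Caveat: YARP builds its outbound HttpMessageInvoker through IForwarderHttpClientFactory,
// so a named HttpClient is not used by the proxy automatically. Use this client for calls
// the gateway makes itself (e.g. BFF aggregation), or wire the handler in through a custom
// IForwarderHttpClientFactory.
builder.Services.AddHttpClient("yarp-forwarder")
    .AddResilienceHandler("gateway-resilience", pipeline =>
    {
        // Strategies run in the order they are added, outermost first:
        // retry wraps the circuit breaker, which wraps the per-attempt timeout.
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            Delay = TimeSpan.FromMilliseconds(500),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<HttpRequestException>()
                .Handle<TimeoutRejectedException>()
        });
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            SamplingDuration = TimeSpan.FromSeconds(30),
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(15)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(10));
    });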
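The state diagram in this section can also be sketched as a small state machine. This is a simplified illustration, not Polly's implementation: the hypothetical `CircuitBreaker` class below trips on consecutive failures (as in the diagram) rather than on a failure ratio over a sampling window, and it takes the current time as a parameter to stay deterministic.

```csharp
using System;

var breaker = new CircuitBreaker(failureThreshold: 3, breakDuration: TimeSpan.FromSeconds(15));
var t0 = DateTimeOffset.UnixEpoch;

for (int i = 0; i < 3; i++) breaker.RecordFailure(t0);
Console.WriteLine(breaker.State);                             // Open
Console.WriteLine(breaker.AllowRequest(t0.AddSeconds(5)));    // False: still inside the break
Console.WriteLine(breaker.AllowRequest(t0.AddSeconds(15)));   // True: break elapsed, probing allowed
breaker.RecordSuccess();
Console.WriteLine(breaker.State);                             // Closed

enum CircuitState { Closed, Open, HalfOpen }

sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public bool AllowRequest(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
            State = CircuitState.HalfOpen;      // timeout expired: let probes through
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;               // "Request OK (reset counter)"
        State = CircuitState.Closed;            // a successful probe closes the circuit
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;          // failed probe or threshold reached
            _openedAt = now;
        }
    }
}
```

A production breaker would also limit how many probes pass while half-open and would need synchronization for concurrent requests; both are left out here to keep the transitions visible.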
8. Backend-for-Frontend (BFF) Pattern — Implementation
The BFF pattern provides a dedicated gateway per client type, allowing response format, aggregation logic, and caching strategy to be tailored to each platform's needs.
graph TB
subgraph "Client Layer"
WEB[Vue.js SPA]
MOB[React Native App]
IOT[IoT Dashboard]
end
subgraph "BFF Layer"
WBFF[Web BFF<br/>Full response, SSR support]
MBFF[Mobile BFF<br/>Compact response, pagination]
IBFF[IoT BFF<br/>Minimal payload, batch writes]
end
subgraph "Domain Services"
USER[User Service]
ORDER[Order Service]
PRODUCT[Product Service]
ANALYTICS[Analytics Service]
end
WEB --> WBFF
MOB --> MBFF
IOT --> IBFF
WBFF --> USER
WBFF --> ORDER
WBFF --> PRODUCT
WBFF --> ANALYTICS
MBFF --> USER
MBFF --> ORDER
MBFF --> PRODUCT
IBFF --> ANALYTICS
style WBFF fill:#e94560,stroke:#fff,color:#fff
style MBFF fill:#e94560,stroke:#fff,color:#fff
style IBFF fill:#e94560,stroke:#fff,color:#fff
style USER fill:#2c3e50,stroke:#fff,color:#fff
style ORDER fill:#2c3e50,stroke:#fff,color:#fff
style PRODUCT fill:#2c3e50,stroke:#fff,color:#fff
style ANALYTICS fill:#2c3e50,stroke:#fff,color:#fff
// Web BFF: aggregation endpoint for the Dashboard page
// IUserService, IOrderService, and IProductService are application-defined clients
app.MapGet("/bff/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    IProductService productService,
    HttpContext context) =>
{
    var userId = context.User.FindFirst("sub")!.Value;

    // Call 3 services in parallel
    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 10);
    var recommendationsTask = productService
        .GetRecommendationsAsync(userId, limit: 8);
    await Task.WhenAll(profileTask, ordersTask, recommendationsTask);

    return Results.Ok(new
    {
        Profile = await profileTask,
        RecentOrders = await ordersTask,
        Recommendations = await recommendationsTask,
        ServerTime = DateTime.UtcNow
    });
}).RequireAuthorization();

// Mobile BFF: compact response, only essential fields
app.MapGet("/bff/mobile/dashboard", async (
    IUserService userService,
    IOrderService orderService,
    HttpContext context) =>
{
    var userId = context.User.FindFirst("sub")!.Value;

    var profileTask = userService.GetProfileAsync(userId);
    var ordersTask = orderService.GetRecentAsync(userId, limit: 5);
    await Task.WhenAll(profileTask, ordersTask);

    var profile = await profileTask;
    var orders = await ordersTask;
    return Results.Ok(new
    {
        DisplayName = profile.DisplayName,
        AvatarUrl = profile.AvatarUrl,
        OrderCount = orders.Count,
        LastOrderStatus = orders.FirstOrDefault()?.Status
    });
}).RequireAuthorization();
9. Observability — Monitoring the API Gateway
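The API Gateway is a natural place for observability since all traffic passes through it. Track four key metrics, RED plus Saturation: Rate (requests per second), Errors (share of failed requests), Duration (latency distribution), and Saturation (how close the gateway is to its resource limits).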
// OpenTelemetry integration for YARP
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddMeter("Yarp.ReverseProxy")
            .AddOtlpExporter(opt =>
                opt.Endpoint = new Uri("http://otel-collector:4317"));
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .SetResourceBuilder(ResourceBuilder.CreateDefault()
                .AddService("api-gateway"))
            .AddOtlpExporter();
    });

// Custom middleware: add a correlation ID
app.Use(async (context, next) =>
{
    if (!context.Request.Headers.ContainsKey("X-Correlation-Id"))
    {
        context.Request.Headers["X-Correlation-Id"] =
            Guid.NewGuid().ToString("N");
    }
    context.Response.Headers["X-Correlation-Id"] =
        context.Request.Headers["X-Correlation-Id"];
    await next();
});
10. Best Practices for Production
💡 10 API Gateway principles for production
- Stateless gateway: Don't store session state at the gateway; use JWTs or an external session store. This makes horizontal scaling easy.
- Timeout cascade: The gateway timeout must exceed the backend timeout, and the client timeout must exceed the gateway's. Example: backend 5s → gateway 8s → client 15s. Otherwise the gateway gives up before the backend responds.
- Only retry idempotent operations: Retry GET, PUT, and DELETE only. Do not retry POST (it can create duplicates). If you need POST retries, require the client to send an Idempotency-Key header.
- Rate limit before auth: Put rate limiting at the front of the pipeline to block abuse before spending CPU on JWT validation.
- Separate gateway health checks: Split /health/live (gateway alive) and /health/ready (gateway + backends ready). Kubernetes uses separate liveness and readiness probes.
- Request/response size limits: Set a max body size (e.g., 10MB) to block payload abuse. Use streaming for file uploads instead of buffering the whole body.
- Graceful shutdown: When restarting the gateway, drain active connections before shutting down. ASP.NET Core exposes the app.Lifetime.ApplicationStopping event for this.
- Configuration hot-reload: Routing changes shouldn't require a gateway restart. YARP lets you implement IProxyConfigProvider to load config from a database, Consul, or etcd.
- Avoid the gateway monolith: Keep business logic out of the gateway. The gateway only handles cross-cutting concerns: routing, auth, rate limiting, transforms. Business logic belongs in domain services.
- Canary routing: Use header-based or weight-based routing to roll out new versions gradually. YARP routes can match on a custom header through the Headers list in a route's Match section.
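Two of the principles above, retrying only idempotent operations and the timeout cascade, reduce to simple checks. The sketch below is a hypothetical helper (the `RetryPolicy` class and its method names are assumptions), useful mainly as a starting point for a gateway's own validation logic.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(RetryPolicy.CanRetry("GET", hasIdempotencyKey: false));    // True
Console.WriteLine(RetryPolicy.CanRetry("POST", hasIdempotencyKey: false));   // False
Console.WriteLine(RetryPolicy.CanRetry("POST", hasIdempotencyKey: true));    // True
Console.WriteLine(RetryPolicy.IsValidCascade(backendSeconds: 5, gatewaySeconds: 8, clientSeconds: 15)); // True

static class RetryPolicy
{
    // Methods the list above treats as safe to retry.
    private static readonly HashSet<string> Idempotent =
        new(StringComparer.OrdinalIgnoreCase) { "GET", "PUT", "DELETE" };

    // POST is retried only when the client supplied an Idempotency-Key header.
    public static bool CanRetry(string method, bool hasIdempotencyKey) =>
        Idempotent.Contains(method)
        || (hasIdempotencyKey && string.Equals(method, "POST", StringComparison.OrdinalIgnoreCase));

    // Each outer layer must wait longer than the layer it calls.
    public static bool IsValidCascade(double backendSeconds, double gatewaySeconds, double clientSeconds) =>
        backendSeconds < gatewaySeconds && gatewaySeconds < clientSeconds;
}
```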
11. Full Architecture — The Gateway in a Production Stack
graph TB
CDN[CDN / Cloudflare] --> LB[External Load Balancer]
LB --> GW1[API Gateway Instance 1]
LB --> GW2[API Gateway Instance 2]
subgraph "Gateway Responsibilities"
GW1 --> |"Auth, Rate Limit,<br/>Route, Transform"| GW1
end
GW1 --> SD[Service Discovery<br/>Consul / K8s DNS]
GW2 --> SD
SD --> US1[User Service x3]
SD --> OS1[Order Service x2]
SD --> PS1[Product Service x4]
GW1 --> OTEL[OpenTelemetry<br/>Collector]
GW2 --> OTEL
OTEL --> GRAF[Grafana / Dashboard]
style CDN fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style GW1 fill:#e94560,stroke:#fff,color:#fff
style GW2 fill:#e94560,stroke:#fff,color:#fff
style SD fill:#2c3e50,stroke:#fff,color:#fff
style US1 fill:#2c3e50,stroke:#fff,color:#fff
style OS1 fill:#2c3e50,stroke:#fff,color:#fff
style PS1 fill:#2c3e50,stroke:#fff,color:#fff
style OTEL fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style GRAF fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Conclusion
The API Gateway is not an optional component in microservices — it's an essential infrastructure layer once a system exceeds 3-5 services. By centralizing routing, authentication, rate limiting, and observability in one place, the gateway lets teams focus on business logic instead of repeating cross-cutting concerns in every service.
For .NET teams, YARP stands out because it runs in-process on the ASP.NET Core pipeline — near-zero overhead and unlimited customization via C# middleware. For multi-language teams or those needing a ready-made plugin ecosystem, Kong remains the industry standard. If you're all-in on AWS and want zero ops, AWS API Gateway is a worthwhile managed choice.
Whatever you choose, remember the golden rule: the gateway handles only cross-cutting concerns. Business logic belongs to domain services.
References
- Microsoft Learn — Getting Started with YARP on .NET 10
- Milan Jovanović — Implementing an API Gateway with YARP
- NashTech — Build Full Sample with YARP API Gateway on .NET 10
- Calmops — API Gateways: Kong, Envoy and Modern API Management 2026
- Elysiate — API Gateway Authentication Patterns 2026: JWT, OAuth2, API Keys, mTLS
- GitHub — dotnet/yarp: A toolkit for developing high-performance HTTP reverse proxy applications
- AntonDevTips — YARP as API Gateway in .NET: 7 Real-World Scenarios