GraphQL Federation — Building a Unified API Gateway for Microservices
Posted on: 4/25/2026 6:12:31 AM
Table of contents
- 1. The Problem — When REST API Gateway Falls Short
- 2. Core Concepts
- 3. Federation v2 — Key Improvements
- 4. Choosing a Router — Apollo vs Cosmo vs HotChocolate
- 5. Implementation on .NET with HotChocolate
- 6. Performance — Is Federation Slower Than REST?
- 7. Caching — GraphQL's Biggest Challenge
- 8. Apollo Connectors — Integrating REST into Federation
- 9. When Should You Use Federation?
- 10. Conclusion
As microservice systems grow to tens or hundreds of services, the biggest question is no longer "REST or gRPC?" but rather: how can the frontend make a single request and access data from all services? REST API Gateways handle routing, but they don't solve data composition — when a page needs data from 5-10 different services, the frontend must make multiple API calls and stitch the data together. GraphQL Federation was built to solve exactly this problem: each team owns a subgraph, and the router automatically composes them into a single unified API. This article provides a deep analysis of the architecture, implementation on .NET with HotChocolate, and real-world production lessons.
1. The Problem — When REST API Gateway Falls Short
Imagine a product detail page on an e-commerce platform: you need product information (Product Service), pricing and inventory (Inventory Service), reviews (Review Service), recommendations (Recommendation Service), and seller information (Seller Service). With REST, the frontend must make 5 separate round-trips to the gateway. Any call that depends on an earlier response (for example, seller details that need the sellerId returned with the product) cannot start until that response arrives, which produces a waterfall pattern; in the worst case all 5 calls run sequentially.
With GraphQL Federation, the frontend sends 1 single query. The Router automatically fans out to relevant subgraphs in parallel and composes the response.
2. Core Concepts
GraphQL Federation revolves around 3 main components (subgraph, supergraph, and router) plus a few schema directives:
| Concept | Description | Example |
|---|---|---|
| Subgraph | A GraphQL service owned by one team. Defines the types and fields that team is responsible for. | Product Subgraph defines type Product { id, name, price } |
| Supergraph | The unified schema composed from all subgraphs. Clients only see the supergraph — they don't know how many subgraphs exist behind it. | type Product { id, name, price, stock, reviews } |
| Router | The runtime that receives client queries, creates a query plan, fans out to relevant subgraphs, and composes the response. | Apollo Router (Rust), Cosmo Router (Go), HotChocolate Fusion (.NET) |
| @key | Directive marking an entity — allows other subgraphs to reference and extend this type. | type Product @key(fields: "id") |
| @external | Field defined in another subgraph, only referenced here. | extend type Product { id: ID! @external } |
| @requires | Field that needs data from another subgraph to resolve. | shippingCost: Float @requires(fields: "weight") |
3. Federation v2 — Key Improvements
Federation v2 (current version: v2.11) brings significant improvements over v1:
| Feature | Federation v1 | Federation v2 |
|---|---|---|
| Shared types | Each type owned by only 1 subgraph | @shareable — multiple subgraphs can define the same field |
| Schema evolution | Breaking changes hard to manage | @override — safely migrate fields between subgraphs |
| Field visibility | No built-in way to hide a field | @inaccessible: hide fields from the supergraph API without deleting them |
| Composition | Runtime composition at gateway | Build-time composition — errors caught before deployment |
| Error messages | Vague, hard to debug | Detailed composition hints pointing to exact error locations |
| Progressive migration | Not supported | @override(from: "<subgraph>", label: "percent(50)"): shift a percentage of traffic to the new subgraph (canary migration) |
# Product Subgraph (owned by the Product Team)
extend schema @link(
  url: "https://specs.apollo.dev/federation/v2.11"
  import: ["@key", "@shareable", "@override"]
)

# Custom scalar used by createdAt; it must be declared in the subgraph schema
scalar DateTime

type Product @key(fields: "id") {
  id: ID!
  name: String!
  description: String
  price: Float!
  weight: Float
  sellerId: ID!
  createdAt: DateTime!
}

type Query {
  product(id: ID!): Product
  # ProductConnection (a pagination type) is assumed to be defined elsewhere in this subgraph
  products(first: Int = 10, after: String): ProductConnection!
}
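To make the idea of build-time composition more concrete, here is a toy model in Python. It is not Apollo's composition library: the `compose` function, the `CompositionError` class, and the dictionary layout are all hypothetical, and the subgraph names are assumptions. The sketch merges the fields each subgraph declares for a type and rejects a field that two subgraphs define unless every definition is marked shareable (key fields are marked shareable here, which mirrors how Federation v2 treats them).

```python
from collections import defaultdict


class CompositionError(Exception):
    """Raised when subgraph schemas cannot be merged (hypothetical error type)."""


def compose(subgraphs):
    """Merge per-subgraph field declarations into one supergraph view.

    `subgraphs` maps subgraph name -> {type name -> {field name -> shareable flag}}.
    Returns {type name -> {field name -> sorted list of subgraphs defining it}}.
    """
    owners = defaultdict(lambda: defaultdict(list))
    flags = defaultdict(lambda: defaultdict(list))
    for subgraph, types in subgraphs.items():
        for type_name, fields in types.items():
            for field, is_shareable in fields.items():
                owners[type_name][field].append(subgraph)
                flags[type_name][field].append(is_shareable)

    # A field defined in several subgraphs must be shareable in all of them.
    for type_name, fields in owners.items():
        for field, names in fields.items():
            if len(names) > 1 and not all(flags[type_name][field]):
                raise CompositionError(
                    f"{type_name}.{field} is defined in {sorted(names)} "
                    "but is not shareable in all of them"
                )
    return {t: {f: sorted(n) for f, n in fs.items()} for t, fs in owners.items()}


# Hypothetical subgraphs; `id` is the key field, so it is flagged shareable.
subgraphs = {
    "products": {"Product": {"id": True, "name": False, "price": False}},
    "inventory": {"Product": {"id": True, "stock": False}},
    "reviews": {"Product": {"id": True, "reviews": False}},
}
supergraph = compose(subgraphs)
print(sorted(supergraph["Product"]))
```

Because the check runs before anything is deployed, a conflicting field definition fails the build instead of surfacing as a runtime error at the gateway.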
Entity Resolution — How the Router stitches data together
When a client queries product(id: "123") { name, stock, reviews }, the Router:
Step 1: Sends query to Product Subgraph for name and id.
Step 2: Uses id from step 1, sends requests in parallel to Inventory Subgraph (for stock) and Review Subgraph (for reviews).
Step 3: Merges results into a single JSON response.
This entire process is transparent to the client — it only sees one unified schema with no knowledge of where data originates.
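The three steps can be sketched as a tiny query plan. This is an illustrative Python model, not router code: the in-memory dictionaries stand in for real subgraphs, and every function name here is hypothetical. It shows the shape of the work: one root fetch, then concurrent entity fetches keyed by `{__typename, id}` representations, then a merge.

```python
import asyncio

# Hypothetical in-memory stand-ins for three subgraphs.
PRODUCTS = {"123": {"id": "123", "name": "Mechanical keyboard"}}
STOCK = {"123": 42}
REVIEWS = {"123": [{"rating": 5, "body": "Great keys"}]}


async def product_subgraph(product_id):
    """Step 1: the owning subgraph returns its own fields plus the key."""
    return PRODUCTS[product_id]


async def inventory_entities(representations):
    """Resolve `stock` for each {__typename, id} representation."""
    return [{"stock": STOCK[r["id"]]} for r in representations]


async def review_entities(representations):
    """Resolve `reviews` for each {__typename, id} representation."""
    return [{"reviews": REVIEWS[r["id"]]} for r in representations]


async def execute_plan(product_id):
    product = await product_subgraph(product_id)
    representations = [{"__typename": "Product", "id": product["id"]}]
    # Step 2: both entity fetches depend only on the key, so they run concurrently.
    inventory, reviews = await asyncio.gather(
        inventory_entities(representations),
        review_entities(representations),
    )
    # Step 3: merge the partial results and keep only the requested fields.
    merged = {**product, **inventory[0], **reviews[0]}
    return {"product": {k: merged[k] for k in ("name", "stock", "reviews")}}


result = asyncio.run(execute_plan("123"))
print(result)
```

Note that step 2 cannot start before step 1 finishes, because the entity fetches need the key. The fan-out is parallel within a step, not across dependent steps.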
4. Choosing a Router — Apollo vs Cosmo vs HotChocolate
| Criteria | Apollo Router | Cosmo Router | HotChocolate Fusion |
|---|---|---|---|
| Language | Rust | Go | C# (.NET) |
| License | Elastic v2 (source-available) | Apache 2.0 (open-source) | MIT (open-source) |
| Performance | Very high (Rust runtime) | High (Go runtime) | High (Kestrel + .NET 10) |
| Federation spec | v2.11 (spec authors) | v2.x compatible | v2.x + custom Fusion |
| REST Connectors | Yes (GraphOS) | No | Schema Stitching legacy |
| Managed platform | Apollo GraphOS (paid) | Cosmo Cloud (free tier) | Self-hosted |
| Best for | Polyglot teams, enterprise | Teams needing true open-source | .NET teams, ASP.NET Core integration |
5. Implementation on .NET with HotChocolate
For .NET teams, HotChocolate v14 is a mature and widely used option (graphql-dotnet is the main alternative). It supports both the Apollo Federation v2 subgraph spec and ChilliCream's own Fusion ecosystem:
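// ProductSubgraph/Program.cs
// Assumes the HotChocolate.AspNetCore, HotChocolate.ApolloFederation,
// HotChocolate.Data and HotChocolate.Data.EntityFramework packages are referenced.
var builder = WebApplication.CreateBuilder(args);

// ProductDbContext must also be registered with DI (e.g. via AddDbContextFactory)
// so that the DataLoader can obtain an IDbContextFactory<ProductDbContext>.
builder.Services
    .AddGraphQLServer()
    .AddApolloFederation()          // Enable Federation v2
    .AddQueryType<ProductQuery>()   // ProductQuery is defined elsewhere
    .AddType<Product>()             // the entity class
    .AddFiltering()
    .AddSorting()
    .RegisterDbContext<ProductDbContext>();

var app = builder.Build();
app.MapGraphQL();
app.Run();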
// Types/ProductType.cs
[Key("id")] // Federation @key directive
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = default!;
    public decimal Price { get; set; }
    public float? Weight { get; set; }
    public int SellerId { get; set; }
}
// DataLoader: batches lookups to avoid N+1 queries
public class ProductByIdDataLoader : BatchDataLoader<int, Product>
{
    private readonly IDbContextFactory<ProductDbContext> _factory;

    // HotChocolate 14 expects DataLoaderOptions to be passed to the base constructor
    public ProductByIdDataLoader(
        IDbContextFactory<ProductDbContext> factory,
        IBatchScheduler scheduler,
        DataLoaderOptions options) : base(scheduler, options)
    {
        _factory = factory;
    }

    protected override async Task<IReadOnlyDictionary<int, Product>>
        LoadBatchAsync(IReadOnlyList<int> keys, CancellationToken ct)
    {
        await using var ctx = await _factory.CreateDbContextAsync(ct);
        // Translates to a single SELECT ... WHERE Id IN (...)
        return await ctx.Products
            .Where(p => keys.Contains(p.Id))
            .ToDictionaryAsync(p => p.Id, ct);
    }
}
DataLoader — the solution for N+1 in Federation
When the Router fans out queries to multiple subgraphs, each subgraph may receive hundreds of entity resolution requests simultaneously (e.g., resolving stock for 50 products in a list query). Without DataLoader, each entity = 1 database query — the classic N+1 problem.
HotChocolate has a built-in DataLoader that automatically batches requests within the same execution cycle. 50 entity resolutions become 1 SQL query with WHERE id IN (...). This is often what separates a proof of concept from production-grade Federation.
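The batching idea itself is independent of HotChocolate. The following Python sketch is a simplified model of it, not HotChocolate's implementation: `BatchLoader`, `fetch_stock`, and `resolve_stock` are hypothetical names. Keys requested during the same event-loop tick are collected and handed to one batch function, so 50 resolver calls trigger a single lookup.

```python
import asyncio


class BatchLoader:
    """Collects keys requested in the same event-loop tick and loads them in one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # async: list of keys -> dict of key -> value
        self._pending = {}          # key -> Future shared by all callers of that key
        self._task = None           # the dispatch task for the current batch, if any

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._task is None:
            # Runs after the resolvers that are already queued have added their keys.
            self._task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self):
        pending, self._pending, self._task = self._pending, {}, None
        try:
            values = await self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                if not future.done():
                    future.set_exception(exc)
            return
        for key, future in pending.items():
            if not future.done():
                future.set_result(values.get(key))


calls = []


async def fetch_stock(keys):
    calls.append(sorted(keys))  # stands in for: SELECT ... WHERE id IN (...)
    return {k: k * 10 for k in keys}


async def resolve_stock(loader, product_id):
    """One entity resolver call; 50 of these run for a 50-item list."""
    return await loader.load(product_id)


async def main():
    loader = BatchLoader(fetch_stock)
    return await asyncio.gather(*(resolve_stock(loader, i) for i in range(1, 51)))


stocks = asyncio.run(main())
print(len(calls), stocks[:3])
```

The loader instance should live for one request only. Sharing it across requests would turn the pending map into an unbounded, unauthorized cache.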
6. Performance — Is Federation Slower Than REST?
A common concern: "doesn't adding a Router layer slow things down?" Short answer: the Router adds a few milliseconds of latency, but total end-to-end time typically decreases thanks to parallel fan-out and elimination of frontend waterfall patterns.
| Scenario | REST (5 API calls) | GraphQL Federation | Difference |
|---|---|---|---|
| Product detail page | ~450ms (5 sequential calls) | ~180ms (1 query, parallel fan-out) | -60% |
| Product list (20 items) | ~200ms (1 call, over-fetching) | ~220ms (1 query, exact fields) | +10% (but less data transfer) |
| Mobile — weak network | ~2.5s (5 round-trips x high latency) | ~800ms (1 round-trip) | -68% |
| Router overhead | N/A | ~3-8ms (query planning) | Negligible |
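The reasoning behind these rows can be written as simple arithmetic. The Python sketch below is a rough model under stated assumptions, not a benchmark: all latencies are hypothetical, the function names are made up for illustration, and it does not reproduce the measurements in the table. Sequential REST calls pay the client network round trip every time, while a federated query pays it once and then waits for the root fetch plus the slowest dependent subgraph.

```python
def rest_waterfall_ms(service_ms, network_rtt_ms):
    """Client calls services one after another: every call pays a network round trip."""
    return sum(ms + network_rtt_ms for ms in service_ms)


def federation_ms(root_ms, dependent_ms, network_rtt_ms, router_overhead_ms, hop_ms):
    """One client round trip; root fetch first, then dependents in parallel."""
    root_phase = root_ms + hop_ms
    fan_out_phase = max(ms + hop_ms for ms in dependent_ms)
    return network_rtt_ms + router_overhead_ms + root_phase + fan_out_phase


# Hypothetical processing times: product first, then four dependent services.
root_ms = 40
dependent_ms = [30, 60, 50, 20]

rest = rest_waterfall_ms([root_ms, *dependent_ms], network_rtt_ms=50)
fed = federation_ms(root_ms, dependent_ms, network_rtt_ms=50,
                    router_overhead_ms=5, hop_ms=2)
print(rest, fed, round(100 * (fed - rest) / rest))
```

With these inputs the sequential path costs 450 ms and the federated path 159 ms. Raising `network_rtt_ms` widens the gap, which is why the benefit is largest on weak mobile networks.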
Real benchmarks from The Guild (2026)
The Guild (creators of GraphQL Hive) published benchmarks comparing Federation routers:
- Apollo Router (Rust): ~8,500 req/s, p99 latency ~12ms
- Cosmo Router (Go): ~7,200 req/s, p99 latency ~15ms
- Hive Gateway (TypeScript): ~3,800 req/s, p99 latency ~28ms
These numbers show that Router overhead is not the bottleneck — subgraph response times and database queries are the deciding factors.
7. Caching — GraphQL's Biggest Challenge
GraphQL's well-known weakness: CDN caching is hard. REST uses GET + Cache-Control headers for easy caching. GraphQL clients usually send POST requests whose bodies differ from query to query, and CDNs do not cache POST responses by default.
| Strategy | Description | Effectiveness |
|---|---|---|
| Persisted Queries | Hash queries into IDs (e.g., abc123), send GET /graphql?id=abc123. CDN-cacheable like REST. | High — production recommended |
| Response Caching | Cache full responses at Router level by query hash + variables. | Medium — low hit rate with diverse queries |
| Entity Caching | Cache individual entities (e.g., Product:123) at Router. When another query needs Product:123, use cache instead of calling subgraph. | High — Apollo Router supports with Redis |
| Subgraph-level Caching | Each subgraph manages its own caching at data layer (Redis, in-memory). | High — full control, Router-independent |
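As an illustration of the persisted-queries row, the Python sketch below builds a GET URL in the style of Apollo's Automatic Persisted Queries convention, where the query is identified by its SHA-256 hash inside an `extensions` parameter. The endpoint, the query text, and the `persisted_query_url` helper are assumptions for the example, and the server must already know the hash (registered ahead of time or learned through an earlier request) for the lookup to succeed.

```python
import hashlib
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def persisted_query_url(endpoint, query, variables=None):
    """Build a cacheable GET URL that identifies the query by its SHA-256 hash."""
    sha256 = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params = {
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": sha256}},
            separators=(",", ":"),
        )
    }
    if variables:
        # sort_keys keeps the URL stable so identical requests share one cache entry
        params["variables"] = json.dumps(
            variables, sort_keys=True, separators=(",", ":")
        )
    return f"{endpoint}?{urlencode(params)}"


query = "query Product($id: ID!) { product(id: $id) { name price } }"
url = persisted_query_url("https://example.com/graphql", query, {"id": "123"})
print(url)
```

Because the URL is deterministic for a given query and variable set, a CDN can treat it like any other GET resource and honor Cache-Control headers on the response.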
8. Apollo Connectors — Integrating REST into Federation
A notable feature in Apollo Router v2: Connectors let you expose REST APIs through the supergraph without writing resolver code. You declare the mapping in the schema:
# Turn a REST endpoint into a GraphQL subgraph
# No separate server needed: the Router calls the REST API directly
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.11", import: ["@key"])
  @link(url: "https://specs.apollo.dev/connect/v0.4", import: ["@source", "@connect"])
  @source(name: "legacy", http: { baseURL: "https://api.legacy.com/v1" })

type Product @key(fields: "id") {
  id: ID!
  name: String!
  # LegacyProductData (with sku, category, and tags fields) must also be declared in this schema
  legacyData: LegacyProductData
    @connect(
      source: "legacy"
      http: { GET: "/products/{$this.id}" }
      selection: """
      sku
      category: categoryName
      tags: labels
      """
    )
}
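Conceptually, the connector above does two things: it fills the URL template from the parent object, then projects the REST payload onto the GraphQL field names given in `selection`. The Python sketch below models only that idea and is not how Apollo Router implements it; `build_url`, `apply_selection`, and the sample REST payload are hypothetical.

```python
from urllib.parse import quote


def build_url(base_url, template, this):
    """Fill the {$this.id} placeholder with the parent object's id."""
    return base_url + template.replace("{$this.id}", quote(str(this["id"]), safe=""))


def apply_selection(rest_json, selection):
    """Project a REST payload onto GraphQL field names.

    `selection` maps GraphQL field name -> REST property name,
    mirroring `category: categoryName` in the schema.
    """
    return {field: rest_json.get(source) for field, source in selection.items()}


selection = {"sku": "sku", "category": "categoryName", "tags": "labels"}

# Hypothetical response from GET /products/123 on the legacy API.
rest_response = {
    "sku": "KB-001",
    "categoryName": "Keyboards",
    "labels": ["usb", "rgb"],
    "internalCost": 12.5,  # not selected, so never exposed through GraphQL
}

url = build_url("https://api.legacy.com/v1", "/products/{$this.id}", {"id": "123"})
legacy_data = apply_selection(rest_response, selection)
print(url, legacy_data)
```

Properties that are not listed in the selection never reach the client, so the mapping doubles as an allow-list for the legacy payload.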
Connectors enable gradual migration
Instead of rewriting all REST APIs to GraphQL at once (high risk), you can:
1. Keep existing REST services running.
2. Use Connectors to expose them through Federation.
3. Migrate frontend to GraphQL queries.
4. Gradually replace REST services with native GraphQL subgraphs.
This is the "strangler fig pattern" applied to API migration.
9. When Should You Use Federation?
| Scenario | Use Federation | Don't Use Federation |
|---|---|---|
| Service count | 5+ microservices, multiple teams | Monolith or 2-3 services |
| Frontend | Multiple platforms (web, mobile, partner API) | Single frontend, few complex pages |
| Data composition | One page needs data from 3+ services | Each page only needs data from 1 service |
| Team structure | Each team owns their service independently | One small team manages everything |
| Caching | Frequently changing, personalized data | Mostly static content, need strong CDN caching |
Federation is not a silver bullet
1. Increased complexity: Adding Router, schema composition, entity resolution requires deep GraphQL understanding.
2. Harder debugging: When queries are slow, you need to trace through Router, subgraph, and database layers. Without proper observability, debugging is extremely difficult.
3. Schema governance: With 10+ subgraphs from multiple teams, you need schema change review processes (schema registry, CI checks) to prevent breaking changes.
4. Learning curve: Developers must learn GraphQL spec, Federation directives, DataLoader patterns, and caching strategies.
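5. Doesn't fully replace API Gateway: The Router's main job is GraphQL query planning and routing. Some routers cover part of the edge role (Apollo Router, for example, can validate JWTs and apply rate limits), but many deployments still keep an API Gateway or edge proxy for authentication, rate limiting, and IP filtering.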
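The schema governance point (item 3) is the easiest to automate. As a minimal sketch of a CI check, assuming schemas have already been reduced to plain `{type -> {field -> type}}` snapshots, the Python below flags removed types, removed fields, and changed field types. The `breaking_changes` function and the snapshots are hypothetical, and the check is deliberately conservative: it reports every type change, although some (such as a nullable output field becoming non-null) are safe for clients. A real setup would normally rely on a schema registry's check command instead.

```python
def breaking_changes(old_schema, new_schema):
    """Return removals and type changes between two {type -> {field -> type}} snapshots."""
    problems = []
    for type_name, old_fields in old_schema.items():
        new_fields = new_schema.get(type_name)
        if new_fields is None:
            problems.append(f"type {type_name} was removed")
            continue
        for field, old_type in old_fields.items():
            if field not in new_fields:
                problems.append(f"{type_name}.{field} was removed")
            elif new_fields[field] != old_type:
                problems.append(
                    f"{type_name}.{field} changed from {old_type} to {new_fields[field]}"
                )
    return problems


# Hypothetical snapshots of the published and the proposed supergraph.
old = {
    "Product": {"id": "ID!", "name": "String!", "price": "Float!"},
    "Review": {"id": "ID!", "body": "String!"},
}
new = {
    "Product": {"id": "ID!", "name": "String!", "price": "Float", "weight": "Float"},
    "Review": {"id": "ID!"},
}

problems = breaking_changes(old, new)
for problem in problems:
    print(problem)
```

Added fields such as `Product.weight` pass silently, which matches the usual rule that additive changes are safe while removals and type changes need review.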
10. Conclusion
GraphQL Federation solves a problem that REST API Gateways cannot: automatically composing data from multiple microservices into a single response, while maintaining each team's autonomy with their own subgraph. With Apollo Router (written in Rust and reportedly up to 10x faster than the legacy Node.js gateway), HotChocolate Fusion (deep .NET 10 integration), and features like Connectors (integrate REST without resolver code), Federation is production-ready at scale.
However, Federation isn't for every project. If your system only has 2-3 services and one frontend — REST + BFF is much simpler. Federation truly shines when you have 5+ services, multiple teams, and frontends needing data from multiple sources. Start with 1-2 subgraphs, validate the value, then expand gradually — don't migrate your entire system at once.
References
- Introduction to Apollo Federation — Apollo GraphQL Docs
- What's New in GraphOS Router v2.x — Apollo GraphQL Docs
- GraphQL Federation Gateways Performance Benchmark — The Guild
- Hot Chocolate vs graphql-dotnet in .NET 2026 — Coding Droplets
- GraphQL vs REST: 18 Claims Fact-Checked — WunderGraph
- GraphQL vs REST 2026: Performance Tested — Tech Insider
- GraphQL vs REST vs gRPC: 2026 Decision Framework — Fordel Studios
- Federating .NET GraphQL APIs with Fusion — Medium