GraphQL Federation — Unifying Microservices APIs into a Single Supergraph

Posted on: 4/22/2026 2:12:26 AM

In the world of microservices, each team owns a separate service with its own API schema. But on the client side — web apps, mobile apps — all they want is a single endpoint to query all the data they need. This is exactly the problem GraphQL Federation solves: it allows multiple subgraphs (each subgraph is a microservice) to be composed into a single supergraph, with a gateway in the middle orchestrating everything.

  • 330+ edge locations (Apollo Router Cloud)
  • <0.5 ms query planning overhead
  • Production adopters: Netflix, Expedia
  • 10K+ QPS at ~120 MB RAM (Router)

1. The Problem: Traditional API Gateways Don't Scale with Organizations

As microservices systems grow to dozens or hundreds of services, the traditional REST API Gateway model faces serious issues:

  • Over-fetching / Under-fetching: Clients must call multiple separate endpoints, receive too much or too little data, then stitch results together manually.
  • Gateway monolith: A central team must maintain all routing logic, becoming an organizational bottleneck.
  • Schema coupling: When service A adds a new field, the gateway must redeploy — even though services B and C are unrelated.
  • API versioning hell: REST forces maintaining v1, v2, v3… in parallel, increasing complexity over time.

Why not just use regular GraphQL?

Monolithic GraphQL solves over-fetching but creates a single schema monolith — one massive schema file that every team must commit to the same repo. Federation removes that constraint by letting each team own their own subgraph.

2. What Is GraphQL Federation?

GraphQL Federation is an architecture that lets you split your GraphQL schema into multiple independent subgraphs, each owned by a separate team/service. A router (gateway) automatically composes all subgraphs into a single supergraph that clients interact with.

graph TD
    Client["🖥️ Client App"] --> Router["Apollo Router<br/>(Supergraph Gateway)"]
    Router --> S1["Subgraph: Users<br/>Team A"]
    Router --> S2["Subgraph: Products<br/>Team B"]
    Router --> S3["Subgraph: Orders<br/>Team C"]
    Router --> S4["Subgraph: Reviews<br/>Team D"]
    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style Router fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style S4 fill:#2c3e50,stroke:#fff,color:#fff

GraphQL Federation architecture: Client → Router → Subgraphs

2.1. Three Core Concepts

| Concept | Description | Example |
| --- | --- | --- |
| Subgraph | An independent GraphQL service that defines part of the schema. Each team owns ≥1 subgraph. | Users subgraph defines type User |
| Supergraph | The composed schema, automatically built from all subgraphs. Clients only see the supergraph. | query { user { orders { items } } } spans 3 subgraphs |
| Router | Gateway that receives client queries, creates a query plan, calls the right subgraphs, and merges results. | Apollo Router, Cosmo Router, Grafbase Gateway |

3. How Federation Composes Schemas: Key Directives

Federation uses special directives that allow subgraphs to "communicate" with each other through the router, without needing to know about each other directly.

3.1. @key — Entity Identification

The @key directive marks a type as an entity — something that can be resolved by multiple subgraphs based on a primary key.

# Subgraph: Users (owned by Team A)
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
  avatar: String
}

type Query {
  user(id: ID!): User
  users: [User!]!
}

# Subgraph: Reviews (owned by Team D)
# Team D "extends" the User type WITHOUT modifying the Users subgraph
type User @key(fields: "id") {
  id: ID!
  reviews: [Review!]!    # Team D adds this field
}

type Review @key(fields: "id") {
  id: ID!
  rating: Int!
  comment: String!
  author: User!
}

Entity Resolution

When the router receives the query { user(id: "1") { name reviews { rating } } }, it will: (1) call the Users subgraph to get name; (2) send the Reviews subgraph an _entities query with the representation { __typename: "User", id: "1" }, which that subgraph handles in its __resolveReference resolver to return reviews; and (3) merge the results. The client never knows the data came from two different services.
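
The three steps above can be sketched in TypeScript with both subgraphs replaced by in-memory stand-ins. The names usersSubgraph, reviewsSubgraph, and resolveUserWithReviews, as well as the sample data, are assumptions made for illustration; a real router sends GraphQL requests over HTTP and plans the fetches itself.

```typescript
// In-memory sketch of entity resolution. Both "subgraphs" are plain objects
// (hypothetical stand-ins); a real router talks to them over HTTP.

type UserRepresentation = { __typename: "User"; id: string };

interface Review {
  rating: number;
}

// Hypothetical Users subgraph: resolves the root field and the `name` field.
const usersSubgraph = {
  user(id: string): { id: string; name: string } {
    return { id, name: "Ada" };
  },
};

// Hypothetical Reviews subgraph data, keyed by user id.
const reviewsByUserId = new Map<string, Review[]>([
  ["1", [{ rating: 5 }, { rating: 4 }]],
]);

const reviewsSubgraph = {
  // Stand-in for the _entities field: returns one result per representation,
  // looked up by the @key field the way a reference resolver would.
  entities(representations: UserRepresentation[]): { reviews: Review[] }[] {
    return representations.map((rep) => ({
      reviews: reviewsByUserId.get(rep.id) ?? [],
    }));
  },
};

function resolveUserWithReviews(id: string): { name: string; reviews: Review[] } {
  // (1) Fetch the fields the Users subgraph can resolve.
  const user = usersSubgraph.user(id);
  // (2) Ask the Reviews subgraph to resolve the same entity by its key.
  const extra = reviewsSubgraph.entities([{ __typename: "User", id: user.id }])[0];
  // (3) Merge both partial results into one response object.
  return { name: user.name, reviews: extra?.reviews ?? [] };
}

const merged = resolveUserWithReviews("1");
console.log(JSON.stringify(merged));
```

The Reviews stand-in never sees the name field and the Users stand-in never sees reviews; only the key (id) crosses the boundary, which is what keeps the two teams decoupled.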

3.2. @external, @requires, @provides — Cross-Subgraph Dependencies

# Subgraph: Products
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Float!
  weight: Float!
}

# Subgraph: Shipping
type Product @key(fields: "id") {
  id: ID!
  weight: Float! @external        # Declares a field from another subgraph
  shippingCost: Float! @requires(fields: "weight")  # Needs weight to compute
}

When the client queries shippingCost, the router automatically fetches weight from the Products subgraph first, then passes it to the Shipping subgraph to calculate the cost.
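
A minimal sketch of that two-step flow, assuming an in-memory weight table in place of the Products subgraph and a made-up pricing rule (BASE_FEE and RATE_PER_KG are hypothetical, as are all function names). The point it illustrates is that the Shipping side receives weight inside the entity representation and never has to look it up itself.

```typescript
// What the Shipping subgraph receives: the @key field plus the @requires field.
interface ProductRepresentation {
  __typename: "Product";
  id: string;
  weight: number;
}

// Hypothetical data held by the Products subgraph.
const productWeights = new Map<string, number>([["p1", 2.5]]);

// Hypothetical pricing rule: flat fee plus a per-kilogram rate.
const BASE_FEE = 3;
const RATE_PER_KG = 2;

// Shipping subgraph: computes the cost from the representation alone.
function resolveShippingCost(rep: ProductRepresentation): number {
  return BASE_FEE + RATE_PER_KG * rep.weight;
}

// Router-side sequence: fetch the required field first, then pass it along.
function fetchShippingCost(productId: string): number {
  const weight = productWeights.get(productId); // step 1: Products subgraph
  if (weight === undefined) {
    throw new Error(`Unknown product ${productId}`);
  }
  // step 2: Shipping subgraph, with weight included in the representation
  return resolveShippingCost({ __typename: "Product", id: productId, weight });
}

const shippingCost = fetchShippingCost("p1");
console.log(shippingCost);
```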

3.3. @shareable — Multiple Subgraphs Resolving the Same Field

# Both Users and Auth subgraphs can resolve email
type User @key(fields: "id") {
  id: ID!
  email: String! @shareable
}
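
In Federation 2, a non-key field that more than one subgraph resolves has to be marked @shareable in each of them, otherwise composition fails. The sketch below imitates that check on a simplified model; the FieldDefinition shape and findSharingErrors are assumptions for illustration, not the composition library's API.

```typescript
// Simplified model of one field definition inside one subgraph.
interface FieldDefinition {
  subgraph: string;
  field: string;       // e.g. "User.email"
  shareable: boolean;  // true if the field is marked @shareable
  isKeyField: boolean; // true if the field is part of the entity's @key
}

function findSharingErrors(definitions: FieldDefinition[]): string[] {
  // Group definitions by field name.
  const byField = new Map<string, FieldDefinition[]>();
  for (const def of definitions) {
    const group = byField.get(def.field) ?? [];
    group.push(def);
    byField.set(def.field, group);
  }

  const errors: string[] = [];
  byField.forEach((group, field) => {
    if (group.length < 2) return; // resolved by a single subgraph: nothing to check
    // Key fields may be repeated freely; everything else needs @shareable.
    const offenders = group
      .filter((def) => !def.shareable && !def.isKeyField)
      .map((def) => def.subgraph);
    if (offenders.length > 0) {
      errors.push(`${field}: missing @shareable in ${offenders.join(", ")}`);
    }
  });
  return errors;
}

const sharingErrors = findSharingErrors([
  { subgraph: "users", field: "User.id", shareable: false, isKeyField: true },
  { subgraph: "auth", field: "User.id", shareable: false, isKeyField: true },
  { subgraph: "users", field: "User.email", shareable: true, isKeyField: false },
  { subgraph: "auth", field: "User.email", shareable: false, isKeyField: false },
]);
console.log(sharingErrors);
```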

4. Apollo Router — The Brain of the Supergraph

Apollo Router (written in Rust) is the central component in Federation architecture. It replaces the legacy Apollo Gateway (Node.js) with vastly superior performance.

graph LR
    Q["Client Query"] --> QP["Query Planner<br/>(determines which subgraphs)"]
    QP --> EE["Execution Engine<br/>(calls subgraphs in parallel)"]
    EE --> S1["Subgraph A"]
    EE --> S2["Subgraph B"]
    EE --> S3["Subgraph C"]
    S1 --> MR["Response Merger<br/>(combines results)"]
    S2 --> MR
    S3 --> MR
    MR --> R["Response → Client"]
    style Q fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style QP fill:#e94560,stroke:#fff,color:#fff
    style EE fill:#e94560,stroke:#fff,color:#fff
    style MR fill:#e94560,stroke:#fff,color:#fff
    style R fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff

Query processing flow inside Apollo Router

| Criteria | Apollo Gateway (Node.js) | Apollo Router (Rust) |
| --- | --- | --- |
| RAM @ 10K QPS | ~800 MB | ~120 MB |
| Query planning overhead | 3–5 ms | ~0.5 ms |
| Cold start | Several seconds (Node.js boot) | <100 ms |
| Plugin system | JavaScript middleware | Rhai scripting + Rust native |
| Language | JavaScript/TypeScript | Rust |
| Status | Legacy (not recommended) | Production-ready, actively developed |

4.1. Basic Router Configuration

# router.yaml
supergraph:
  listen: 0.0.0.0:4000
  introspection: true

cors:
  origins:
    - https://app.example.com

headers:
  all:
    request:
      - propagate:
          named: "Authorization"
      - propagate:
          named: "X-Request-Id"

telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317

limits:
  max_depth: 15
  max_height: 200

coprocessor:
  url: http://auth-service:8080
  router:
    request:
      headers: true

5. Designing Subgraphs the Right Way

5.1. Separation of Concerns Principles

Each subgraph should represent a bounded context in the domain — similar to Domain-Driven Design.

graph TD
    subgraph "E-commerce Supergraph"
        UG["Users Subgraph<br/>Profiles, Auth, Preferences"]
        PG["Products Subgraph<br/>Catalog, Inventory, Pricing"]
        OG["Orders Subgraph<br/>Cart, Checkout, Fulfillment"]
        RG["Reviews Subgraph<br/>Ratings, Comments, Moderation"]
        SG["Search Subgraph<br/>Full-text, Filters, Suggestions"]
    end
    UG -.->|"@key: User.id"| RG
    PG -.->|"@key: Product.id"| OG
    PG -.->|"@key: Product.id"| RG
    UG -.->|"@key: User.id"| OG
    PG -.->|"@key: Product.id"| SG
    style UG fill:#e94560,stroke:#fff,color:#fff
    style PG fill:#2c3e50,stroke:#fff,color:#fff
    style OG fill:#4CAF50,stroke:#fff,color:#fff
    style RG fill:#ff9800,stroke:#fff,color:#fff
    style SG fill:#9C27B0,stroke:#fff,color:#fff

Subgraph division by bounded context in an e-commerce system

5.2. Entity Ownership vs. Contribution

The Golden Rule

Treat exactly one subgraph as the owner of each entity: it defines the core fields and the primary resolver. Other subgraphs only contribute additional fields, referencing the entity through the same @key. For example, the Users subgraph owns User.name and User.email, while the Reviews subgraph contributes User.reviews.

5.3. Avoiding Circular Dependencies

# ❌ AVOID: Deep circular references
# Users subgraph
type User @key(fields: "id") {
  id: ID!
  orders: [Order!]!      # User → Order
}

# Orders subgraph
type Order @key(fields: "id") {
  id: ID!
  buyer: User!            # Order → User
  items: [OrderItem!]!
}

type OrderItem {
  product: Product!       # OrderItem → Product
}

# Products subgraph
type Product @key(fields: "id") {
  id: ID!
  reviews: [Review!]!    # Product → Review
}

# Reviews subgraph
type Review {
  author: User!           # Review → User → cycle!
}

# ✅ BETTER: Keep references unidirectional, use IDs when needed
# Reviews subgraph
type Review @key(fields: "id") {
  id: ID!
  rating: Int!
  comment: String!
  authorId: ID!           # Return only the ID, let client resolve if needed
  author: User!           # Or still resolve but shallow (only @key fields)
}
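
One way to spot such cycles before they reach production is to treat type references as a directed graph and search it depth-first. The sketch below does that for the references in the ❌ example; the ReferenceGraph shape and findCycle are hypothetical helpers, not part of any Federation tooling, and the oneWayReferences graph is just one possible acyclic arrangement.

```typescript
// Type name -> names of the types its fields point to.
type ReferenceGraph = Record<string, string[]>;

// Depth-first search that returns the first cycle found, or null if none exists.
function findCycle(graph: ReferenceGraph): string[] | null {
  const path: string[] = [];            // types on the current DFS path, in order
  const onPath = new Set<string>();     // same types, for O(1) membership checks
  const finished = new Set<string>();   // types already proven cycle-free

  const visit = (node: string): string[] | null => {
    if (onPath.has(node)) {
      // Back edge: the cycle is the tail of the path starting at `node`.
      return [...path.slice(path.indexOf(node)), node];
    }
    if (finished.has(node)) return null;
    onPath.add(node);
    path.push(node);
    for (const next of graph[node] ?? []) {
      const cycle = visit(next);
      if (cycle !== null) return cycle;
    }
    path.pop();
    onPath.delete(node);
    finished.add(node);
    return null;
  };

  for (const start of Object.keys(graph)) {
    const cycle = visit(start);
    if (cycle !== null) return cycle;
  }
  return null;
}

// References from the ❌ example above.
const circularReferences: ReferenceGraph = {
  User: ["Order"],
  Order: ["User", "OrderItem"],
  OrderItem: ["Product"],
  Product: ["Review"],
  Review: ["User"],
};

// A one-directional arrangement: nothing points back from User or Product.
const oneWayReferences: ReferenceGraph = {
  Order: ["User", "OrderItem"],
  OrderItem: ["Product"],
  Review: ["User"],
  User: [],
  Product: [],
};

console.log(findCycle(circularReferences)?.join(" -> "));
console.log(findCycle(oneWayReferences));
```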

6. Schema Composition and CI/CD

In production, schema composition happens in the CI/CD pipeline — not at runtime. This ensures every breaking change is detected before deployment.

graph LR
    Dev["Developer pushes<br/>subgraph change"] --> CI["CI Pipeline"]
    CI --> Check["rover subgraph check<br/>(composition + breaking change detection)"]
    Check -->|Pass| Pub["rover subgraph publish<br/>(update schema registry)"]
    Check -->|Fail| Fix["Fix schema<br/>incompatibility"]
    Pub --> Router["Router hot-reloads<br/>new supergraph schema"]
    Fix --> Dev
    style Dev fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style CI fill:#2c3e50,stroke:#fff,color:#fff
    style Check fill:#e94560,stroke:#fff,color:#fff
    style Pub fill:#4CAF50,stroke:#fff,color:#fff
    style Fix fill:#ff9800,stroke:#fff,color:#fff
    style Router fill:#2c3e50,stroke:#fff,color:#fff

CI/CD workflow for schema composition with Apollo GraphOS

6.1. Rover CLI — Schema Management Tool

# Check if a subgraph change is compatible
rover subgraph check my-graph@production \
  --name users \
  --schema ./users/schema.graphql

# Publish subgraph to the registry
rover subgraph publish my-graph@production \
  --name users \
  --schema ./users/schema.graphql \
  --routing-url http://users-service:4001/graphql

# Fetch the current supergraph schema
rover supergraph fetch my-graph@production

6.2. Breaking Change Detection

| Change Type | Severity | Example |
| --- | --- | --- |
| Remove a field in active use | 🔴 Breaking | Remove User.email while 50 operations use it |
| Change field type | 🔴 Breaking | price: Float! → price: String! |
| Add required argument | 🟡 Potentially breaking | users(limit: Int!), which old clients call without the argument |
| Add new field | 🟢 Safe | Add User.lastLoginAt |
| Deprecate field | 🟢 Safe | email: String! @deprecated(reason: "Use contactEmail") |
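
The classification in the table can be approximated with a small diff over field types. This is a sketch under stated assumptions: schemas are reduced to a map from "Type.field" to its type string, usage data is a plain set, and a removed field with no recorded usage is labeled "potentially breaking" (a choice made here, not one the table specifies). Argument changes and deprecations are not modeled, and classifyChanges is a hypothetical helper rather than what rover runs internally.

```typescript
type Severity = "breaking" | "potentially breaking" | "safe";

interface SchemaChange {
  field: string;
  severity: Severity;
  reason: string;
}

// Schemas are simplified to: "Type.field" -> GraphQL type string.
function classifyChanges(
  before: Map<string, string>,
  after: Map<string, string>,
  fieldsInUse: Set<string>,
): SchemaChange[] {
  const changes: SchemaChange[] = [];

  before.forEach((oldType, field) => {
    const newType = after.get(field);
    if (newType === undefined) {
      // Removal: breaking only if recorded operations still select the field.
      const inUse = fieldsInUse.has(field);
      changes.push({
        field,
        severity: inUse ? "breaking" : "potentially breaking",
        reason: inUse ? "removed while still in use" : "removed, no recorded usage",
      });
    } else if (newType !== oldType) {
      changes.push({
        field,
        severity: "breaking",
        reason: `type changed from ${oldType} to ${newType}`,
      });
    }
  });

  after.forEach((_newType, field) => {
    if (!before.has(field)) {
      changes.push({ field, severity: "safe", reason: "new field" });
    }
  });

  return changes;
}

const detectedChanges = classifyChanges(
  new Map([["User.email", "String!"], ["Product.price", "Float!"]]),
  new Map([["Product.price", "String!"], ["User.lastLoginAt", "String"]]),
  new Set(["User.email"]),
);
for (const change of detectedChanges) {
  console.log(`${change.severity}: ${change.field} (${change.reason})`);
}
```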

7. Query Planning and Execution — Deep Dive into Router Internals

When the router receives a query, it performs query planning: it turns the query into an execution plan that calls the right subgraphs in the correct dependency order.

# Client sends this query:
query GetUserDashboard($userId: ID!) {
  user(id: $userId) {
    name                    # → Users subgraph
    email                   # → Users subgraph
    orders(last: 5) {       # → Orders subgraph
      id
      total
      items {
        product {
          name              # → Products subgraph
          price             # → Products subgraph
        }
      }
    }
    reviews {               # → Reviews subgraph
      rating
      comment
    }
  }
}

Router's Query Plan

The router creates an execution plan: Fetch User → Fetch Orders + Reviews (in parallel) → Fetch Products. Subgraph calls that don't depend on each other run in parallel. The router also caches query plans, so planning a repeated query takes about 0.01 ms instead of about 0.5 ms.
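
To make the staging concrete, here is a minimal sketch of a plan executor with a plan cache. The plan is hard-coded to the stages described above and each fetch is a resolved promise standing in for an HTTP call; buildPlan, getPlan, and executePlan are hypothetical names, and a real planner derives the stages from the query and the supergraph schema.

```typescript
interface Fetch {
  subgraph: string;
  run: () => Promise<string>;
}

// Stages run one after another; fetches inside a stage run in parallel.
type QueryPlan = Fetch[][];

const fetchFrom = (subgraph: string): Fetch => ({
  subgraph,
  run: () => Promise.resolve(subgraph), // stand-in for a network request
});

let plansBuilt = 0;

// Hard-coded plan for the dashboard query (an assumption for this sketch).
function buildPlan(_query: string): QueryPlan {
  plansBuilt += 1;
  return [
    [fetchFrom("users")],
    [fetchFrom("orders"), fetchFrom("reviews")],
    [fetchFrom("products")],
  ];
}

// Plan cache keyed by query text: planning happens once per distinct query.
const planCache = new Map<string, QueryPlan>();

function getPlan(query: string): QueryPlan {
  let plan = planCache.get(query);
  if (plan === undefined) {
    plan = buildPlan(query);
    planCache.set(query, plan);
  }
  return plan;
}

async function executePlan(plan: QueryPlan): Promise<string[]> {
  const completedStages: string[] = [];
  for (const stage of plan) {
    // Independent fetches in the same stage are awaited together.
    const results = await Promise.all(stage.map((step) => step.run()));
    completedStages.push(results.join("+"));
  }
  return completedStages;
}

executePlan(getPlan("GetUserDashboard")).then((stages) => {
  console.log(stages.join(" -> "));
});
```

Keying the cache on raw query text is the simplest possible choice; it treats two queries that differ only in whitespace as different plans, which a production cache would normalize away.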

8. Security and Authorization in Federation

Authorization in a federated graph is more complex than in a monolith because data comes from multiple services. There are 2 common patterns:

8.1. Router-level Authentication + Subgraph-level Authorization

# router.yaml — JWT authentication at the router
authentication:
  router:
    jwt:
      jwks:
        - url: https://auth.example.com/.well-known/jwks.json
      header_name: Authorization
      header_value_prefix: Bearer

# Subgraph: Orders. Authorization rules are declared in the subgraph schema and enforced by the router.
type Order @key(fields: "id") @authenticated {
  id: ID!
  total: Float!
  buyerId: ID!
  items: [OrderItem!]! @requiresScopes(scopes: [["order:read"]])
  internalNotes: String @requiresScopes(scopes: [["admin"]])
}
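
The nested list in @requiresScopes reads as "OR of ANDs": each inner list must be satisfied in full, and any one inner list is enough. The sketch below shows that evaluation and a simplified field filter for the Order type above; satisfiesScopes, orderFieldScopes, and visibleOrderFields are hypothetical names, and the real router also reports an error for each field it removes rather than dropping it silently.

```typescript
// scopes: [["a", "b"], ["c"]] means (a AND b) OR c.
function satisfiesScopes(required: string[][], granted: string[]): boolean {
  const grantedSet = new Set(granted);
  return required.some((group) => group.every((scope) => grantedSet.has(scope)));
}

// Scope requirements copied from the Order type above.
const orderFieldScopes = new Map<string, string[][]>([
  ["items", [["order:read"]]],
  ["internalNotes", [["admin"]]],
]);

// Keeps only the requested fields the caller is allowed to see.
function visibleOrderFields(requested: string[], granted: string[]): string[] {
  return requested.filter((field) => {
    const required = orderFieldScopes.get(field);
    // Fields without a scope requirement are always visible.
    return required === undefined || satisfiesScopes(required, granted);
  });
}

const visibleFields = visibleOrderFields(
  ["id", "total", "items", "internalNotes"],
  ["order:read"],
);
console.log(visibleFields.join(", "));
```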

8.2. @policy Directive for Business Rules

type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String! @policy(policies: [["self_or_admin"]])
  salary: Float! @policy(policies: [["hr_department"]])
}

9. Performance Optimization

9.1. DataLoader Pattern — Solving N+1 Queries

When querying a list of orders with buyer info, without DataLoader the Users subgraph would be called N times (once per order). DataLoader batches these requests into a single call.

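// Users subgraph: reference resolver with DataLoader
// `db` and `User` stand for the service's own data layer and model type.
import DataLoader from 'dataloader';

// In production, create one loader per request (for example in the GraphQL
// context) so cached users are not shared between callers.
const userLoader = new DataLoader<string, User>(async (ids) => {
  // One database round trip for the whole batch of IDs
  const users = await db.users.findMany({
    where: { id: { in: [...ids] } }
  });
  const userMap = new Map<string, User>(
    users.map((u: User): [string, User] => [u.id, u])
  );
  // DataLoader expects one result per key, in the same order as `ids`;
  // return an Error for IDs that were not found instead of undefined
  return ids.map(id => userMap.get(id) ?? new Error(`User ${id} not found`));
});

const resolvers = {
  User: {
    // Called once per entity representation; the loader batches the calls
    __resolveReference(ref: { id: string }) {
      return userLoader.load(ref.id);
    }
  }
};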

9.2. @defer — Streaming Responses

# Client receives name/email immediately, reviews load later (streaming)
query {
  user(id: "1") {
    name
    email
    ... @defer(label: "reviews") {
      reviews {
        rating
        comment
      }
    }
  }
}
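
On the client, a deferred response arrives as an initial payload followed by patches that name the path they belong to. The sketch below merges one such patch into the initial data. The IncrementalPatch shape (label, path, data) is modeled on the incremental delivery format and should be treated as an assumption, since exact field names vary between client and server versions; applyPatch is a hypothetical helper.

```typescript
type JsonObject = { [key: string]: unknown };

// Assumed shape of one deferred chunk.
interface IncrementalPatch {
  label?: string;
  path: (string | number)[]; // where in the response the data belongs
  data: JsonObject;          // the deferred fields themselves
}

// Walks to the object at `path` and merges the deferred fields into it.
function applyPatch(response: JsonObject, patch: IncrementalPatch): void {
  let target: JsonObject = response;
  for (const segment of patch.path) {
    target = target[segment] as JsonObject;
  }
  Object.assign(target, patch.data);
}

// Initial payload: the fields outside the @defer fragment.
const deferredResponse: JsonObject = {
  user: { name: "Ada", email: "ada@example.com" },
};

// Later payload: the fragment labeled "reviews".
applyPatch(deferredResponse, {
  label: "reviews",
  path: ["user"],
  data: { reviews: [{ rating: 5, comment: "Great" }] },
});

console.log(JSON.stringify(deferredResponse));
```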

9.3. Persisted Queries

# router.yaml
persisted_queries:
  enabled: true
  safelist:
    enabled: true        # Only allow registered queries
    require_id: true     # Client sends hash instead of full query text

Performance Note

Avoid designing subgraphs with too many cross-entity references: each hop between subgraphs adds roughly 2-5 ms of network latency. If a query must pass through 4-5 subgraphs sequentially, that is roughly 8-25 ms of overhead alone. Design so that most queries need only 1-2 hops.

10. Federation in .NET with Hot Chocolate

If you're using .NET, Hot Chocolate (ChilliCream) has full Federation 2 support:

// Program.cs — .NET 10 subgraph
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddGraphQLServer()
    .AddApolloFederationV2()
    .AddQueryType<Query>()
    .AddType<UserType>()
    .AddType<ReviewType>()
    .RegisterService<IReviewRepository>();

var app = builder.Build();
app.MapGraphQL();
app.Run();

// UserType.cs: entity type with @key
[Key("id")]
public class UserType : ObjectType<User>
{
    protected override void Configure(IObjectTypeDescriptor<User> descriptor)
    {
        descriptor.Field(u => u.Id).Type<NonNullType<IdType>>();
        descriptor.Field("reviews")
            .ResolveWith<UserResolvers>(r => r.GetReviews(default!, default!));
    }
}

// Reference resolver — called by the router to resolve User entity
[ReferenceResolver]
public static async Task<User?> GetUserById(
    [ID] int id,
    IReviewRepository repo)
{
    return await repo.GetUserReferenceAsync(id);
}

11. Monitoring and Observability

In a federated architecture, observability is especially important because a single query can traverse multiple subgraphs:

11.1. Distributed Tracing

# router.yaml — OpenTelemetry tracing
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
        protocol: grpc
  instrumentation:
    spans:
      supergraph:
        attributes:
          graphql.document: true
      subgraph:
        attributes:
          subgraph.name: true
          subgraph.graphql.operation.name: true

11.2. Key Metrics to Monitor

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| apollo.router.query_planning.duration | Time to create query plan | >5ms (may need plan cache warming) |
| apollo.router.http.request.duration | Total end-to-end request time | p99 >500ms |
| subgraph.request.duration (per subgraph) | Latency per subgraph | Slowest subgraph = bottleneck |
| apollo.router.cache.hit_rate | Query plan cache hit rate | <90% is concerning |
| graphql.error.count | GraphQL error count (partial responses) | Sudden spike = subgraph down |
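
The thresholds in the table can be expressed as a small check over a metrics snapshot. This is an illustrative sketch only: the RouterSnapshot shape, percentile, and evaluateAlerts are hypothetical, the percentile uses the simple nearest-rank method, and in practice such rules would usually live in the monitoring system rather than in application code.

```typescript
// Nearest-rank percentile over raw samples (p between 0 and 100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1] as number;
}

// Hypothetical snapshot of the values behind the metrics in the table.
interface RouterSnapshot {
  requestDurationsMs: number[];
  planningDurationMs: number;
  planCacheHits: number;
  planCacheLookups: number;
}

function evaluateAlerts(snapshot: RouterSnapshot): string[] {
  const alerts: string[] = [];
  if (snapshot.planningDurationMs > 5) {
    alerts.push("query planning > 5 ms");
  }
  if (percentile(snapshot.requestDurationsMs, 99) > 500) {
    alerts.push("request p99 > 500 ms");
  }
  if (
    snapshot.planCacheLookups > 0 &&
    snapshot.planCacheHits / snapshot.planCacheLookups < 0.9
  ) {
    alerts.push("plan cache hit rate < 90%");
  }
  return alerts;
}

const activeAlerts = evaluateAlerts({
  requestDurationsMs: [120, 180, 950],
  planningDurationMs: 0.4,
  planCacheHits: 80,
  planCacheLookups: 100,
});
console.log(activeAlerts);
```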

12. When NOT to Use Federation

Federation Is Not a Silver Bullet

  • Small teams (<3 backend devs): The overhead of managing multiple subgraphs, a router, and a schema registry outweighs the benefits. Monolithic GraphQL is sufficient.
  • Rarely changing schema: If the API schema is stable, REST + OpenAPI is much simpler.
  • Latency-critical (sub-ms): Each hop through the router adds overhead — high-frequency trading or game servers should use gRPC directly.
  • No clear domain boundaries: Federation forces you to divide subgraphs — if the domain model isn't mature, you'll be refactoring constantly.

13. Alternatives Comparison

| Solution | Strengths | Weaknesses | Best For |
| --- | --- | --- | --- |
| GraphQL Federation | Unified schema, team autonomy, type-safe | Complexity, learning curve, router overhead | Large orgs, many teams, complex domains |
| REST + API Gateway | Simple, mature tooling, easy caching | Over-fetching, many endpoints, versioning | CRUD apps, public APIs, small teams |
| gRPC + Envoy | High performance, strong typing, streaming | Not frontend-friendly, binary protocol | Service-to-service, latency-critical |
| Schema Stitching | Simpler schema merging | Doesn't scale organizationally, fragile | 2-3 services, quick prototypes |

2019
Apollo introduces Federation v1 — the first time microservices could compose distributed GraphQL schemas.
2021
Federation v2 launches: progressive migration, @shareable, @override, @inaccessible directives. Solves the biggest pain points of v1.
2023
Apollo Router (Rust) replaces Apollo Gateway (Node.js). Performance increases 6-8x, RAM decreases 85%.
2024
The GraphQL Foundation's Composite Schemas Working Group begins work on a vendor-neutral specification for federated schemas, so the approach is no longer "Apollo-only".
2025–2026
@defer/@stream reach production support. Cosmo Router, Grafbase Gateway — the ecosystem diversifies. Netflix, Expedia, Volvo deploy supergraphs at scale.

Conclusion

GraphQL Federation is a powerful architectural solution for the problem of "how to let multiple teams build a unified API without creating bottlenecks". With Apollo Router (Rust), schema composition in CI/CD, and an increasingly diverse ecosystem (Hot Chocolate for .NET, federation-jvm for Java/Kotlin), Federation is no longer an experiment — it's a production-proven pattern for large-scale organizations.

However, remember: Federation solves organizational scaling problems before technical ones. If your team is small and your domain is simple, a monolithic GraphQL server is still the better choice.
