AI Agent Orchestration — 6 Patterns for Production Agent Coordination in 2026

Posted on: 4/21/2026 1:10:44 AM

Table of contents

1. Why orchestration is make-or-break
   Golden rule
2. Pattern 1: Sequential Pipeline
   2.1. Characteristics
   2.2. When to use it
3. Pattern 2: Supervisor (Orchestrator-Worker)
   3.1. Trade-offs
   3.2. Immutable state pattern
   Common mistake
4. Pattern 3: Parallel Fan-Out / Fan-In
   4.1. Cost and latency
   4.2. Real-world example
5. Pattern 4: Router (Intelligent Dispatch)
   5.1. Why is Router the most cost-efficient?
   Tip: which model should the Router use?
6. Pattern 5: Hierarchical (Multi-Level)
   6.1. When do you need Hierarchical?
   Warning: compounding overhead
7. Pattern 6: Evaluator-Optimizer Loop
   7.1. Implementation key points
   7.2. Cap your iterations
8. Summary comparison of the 6 patterns
9. Agent connection protocols: MCP vs A2A
   MCP and A2A complement each other, not replace
10. SDKs & Frameworks for Production 2026
11. Picking the right pattern
12. Production best practices

When AI systems move from "one model, one prompt" into multi-agent architecture, the biggest question is no longer which model is the strongest but how to orchestrate the agents effectively. Based on Anthropic's analysis of 200+ enterprise deployments, 57% of failed projects have a root cause in orchestration design — the individual agents are strong enough, but coordination is weak.

This post goes deep on the 6 core orchestration patterns used in production, analyzing real-world trade-offs in cost, latency, and complexity so you can pick the right one for your problem.

57%: failed AI projects whose root cause is orchestration design

40%: multi-agent pilots that fail within 6 months

30-60%: cost reduction with the Router pattern

6: core patterns for production

1. Why orchestration is make-or-break

A single agent is enough for simple tasks: writing an email, summarizing a document, generating a code snippet. But as the problem grows — analyzing data from multiple sources, automating multi-step workflows with dependencies, or handling diverse input types — you need multiple specialized agents working together.

The catch is: adding agents doesn't automatically add value. Without a clear orchestration pattern, you'll hit:

Race conditions: agents overwriting each other's state
Infinite loops: agent A calls agent B, which calls A again
Cost explosion: every agent call consumes tokens, and without controls costs can spike to 10-50× the expected budget
Quality degradation: downstream agents receive low-quality input from upstream ones
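
The infinite-loop failure mode is cheap to guard against. A minimal sketch, assuming every agent-to-agent call passes through one shared guard per request; `CallGuard`, `AgentLoopError`, `max_depth`, and the agent names are hypothetical and not part of any SDK:

```python
from contextlib import contextmanager


class AgentLoopError(RuntimeError):
    """Raised when a delegation chain re-enters an agent or grows too deep."""


class CallGuard:
    """Tracks the active delegation chain for one request."""

    def __init__(self, max_depth: int = 5) -> None:
        self.max_depth = max_depth
        self._stack: list[str] = []

    @contextmanager
    def enter(self, agent_name: str):
        # Reject A -> B -> A style cycles before any tokens are spent.
        if agent_name in self._stack:
            chain = " -> ".join(self._stack + [agent_name])
            raise AgentLoopError(f"cycle detected: {chain}")
        if len(self._stack) >= self.max_depth:
            raise AgentLoopError(f"delegation depth exceeds {self.max_depth}")
        self._stack.append(agent_name)
        try:
            yield
        finally:
            self._stack.pop()  # always unwind, even when the agent fails


def demo() -> str:
    guard = CallGuard(max_depth=3)
    try:
        with guard.enter("agent_a"):
            with guard.enter("agent_b"):
                with guard.enter("agent_a"):  # B calls A again
                    return "unreachable"
    except AgentLoopError as exc:
        return str(exc)
    return "no loop"
```

Calling `demo()` returns the message carried by the raised error instead of recursing. A real system would probably also log the chain for tracing.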

Golden rule

Always start with the simplest pattern that can solve the problem. Upgrade to a more complex pattern only when you have concrete evidence (metrics) that the current one doesn't meet requirements. Premature orchestration complexity is the most common anti-pattern in production.

2. Pattern 1: Sequential Pipeline

This is the simplest pattern and should be your first choice. Agents run sequentially in a fixed chain, and each agent's output becomes the next agent's input.

graph LR
    A["Input"] --> B["Agent 1<br/>Extract"]
    B --> C["Agent 2<br/>Transform"]
    C --> D["Agent 3<br/>Validate"]
    D --> E["Agent 4<br/>Output"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#0f3460,stroke:#fff,color:#fff
    style C fill:#0f3460,stroke:#fff,color:#fff
    style D fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#4CAF50,stroke:#fff,color:#fff

Figure 1: Sequential Pipeline — a linear, deterministic flow

2.1. Characteristics

Fixed order: defined at design time, not changed at runtime
State flows through a shared object: each agent reads/writes the same state container
Easy to debug: you know exactly which step failed
Linear latency: total latency = sum of every agent's latency
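
These properties can be seen in a few lines of plain Python. A minimal sketch in which stub coroutines stand in for model calls; `run_pipeline`, `StepFailed`, and the step names are hypothetical and not part of any SDK:

```python
import asyncio
import time
from typing import Awaitable, Callable

State = dict[str, object]
Step = Callable[[State], Awaitable[State]]


class StepFailed(RuntimeError):
    """Carries the name of the step that failed, for easy debugging."""

    def __init__(self, step_name: str, cause: Exception) -> None:
        super().__init__(f"step '{step_name}' failed: {cause}")
        self.step_name = step_name


async def run_pipeline(steps: list[tuple[str, Step]], state: State) -> State:
    timings: dict[str, float] = {}
    for name, step in steps:  # fixed order, decided at design time
        started = time.perf_counter()
        try:
            state = await step(state)  # output of one step feeds the next
        except Exception as exc:
            raise StepFailed(name, exc) from exc
        timings[name] = time.perf_counter() - started
    # Total latency is the sum of the per-step latencies.
    return {**state, "timings": timings, "total_s": sum(timings.values())}


# Stub steps standing in for real agents.
async def extract(state: State) -> State:
    return {**state, "fields": str(state["raw"]).split(",")}


async def validate(state: State) -> State:
    if not all(state["fields"]):
        raise ValueError("empty field")
    return {**state, "valid": True}


STEPS = [("extract", extract), ("validate", validate)]
```

Running `asyncio.run(run_pipeline(STEPS, {"raw": "a,b"}))` returns the final state plus per-step timings. An input such as `"a,,b"` raises `StepFailed` naming the `validate` step.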

2.2. When to use it

Pipeline fits when:

Each step depends on the previous one's output
The workflow has 3–5 clear steps
No parallelism is needed
Reliability and debuggability matter more than speed

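# Example: Code-review analysis pipeline
# `Agent` and `Pipeline` are illustrative wrapper names; check the classes
# your installed SDK version actually exports before copying this.
import asyncio
import sys

from claude_agent_sdk import Agent, Pipeline

pipeline = Pipeline([
    Agent("code_reader", model="haiku",
          system="Read code and list the main changes"),
    Agent("security_scanner", model="sonnet",
          system="Analyze security vulnerabilities from the diff"),
    Agent("style_checker", model="haiku",
          system="Check coding style and conventions"),
    Agent("summarizer", model="sonnet",
          system="Summarize findings into a review comment")
])


async def review(code_diff: str):
    # Each agent's output becomes the next agent's input.
    return await pipeline.run(code_diff)


if __name__ == "__main__":
    # Reads the diff from stdin, e.g. piped from `git diff`.
    print(asyncio.run(review(sys.stdin.read())))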

3. Pattern 2: Supervisor (Orchestrator-Worker)

This is the most common pattern in enterprise deployments. A "smart" supervisor agent takes the task, decomposes it into sub-tasks, delegates them to specialized workers, and aggregates the results.

graph TD
    A["Task Input"] --> S["Supervisor Agent<br/>(Opus/Sonnet)"]
    S --> W1["Worker 1<br/>Research<br/>(Haiku)"]
    S --> W2["Worker 2<br/>Analysis<br/>(Sonnet)"]
    S --> W3["Worker 3<br/>Code Gen<br/>(Sonnet)"]
    W1 --> S
    W2 --> S
    W3 --> S
    S --> R["Final Result"]
    style S fill:#e94560,stroke:#fff,color:#fff
    style W1 fill:#0f3460,stroke:#fff,color:#fff
    style W2 fill:#0f3460,stroke:#fff,color:#fff
    style W3 fill:#0f3460,stroke:#fff,color:#fff
    style R fill:#4CAF50,stroke:#fff,color:#fff

Figure 2: Supervisor Pattern — orchestrator decomposes and delegates to workers

3.1. Trade-offs

| Aspect | Pros | Cons |
| --- | --- | --- |
| Cost | Workers use cheaper models (Haiku), saves 40-60% | Supervisor reasoning adds 20-40% overhead |
| Quality | A strong supervisor model ensures good synthesis | Bottleneck if the supervisor misreads the request |
| Scalability | Easy to add new workers | Supervisor is a single point of failure |
| Debugging | Each worker has clear scope | Harder to trace interaction between supervisor and workers |

3.2. Immutable state pattern

The most important best practice when implementing Supervisor: use immutable state snapshots. Each agent takes state version N, processes it, and returns state version N+1. No agent mutates state directly.

// Immutable state pattern: each update returns a new snapshot
interface AgentAction {
  readonly key: string; // field of `data` this action writes (minimal shape assumed here)
}

interface AgentState {
  readonly version: number;
  readonly data: Record<string, unknown>;
  readonly history: ReadonlyArray<AgentAction>;
}

// Builds version N+1 from version N without mutating the input.
function createNextState(
  current: AgentState,
  action: AgentAction,
  result: unknown
): AgentState {
  return {
    version: current.version + 1,
    data: { ...current.data, [action.key]: result },
    history: [...current.history, action]
  };
}

Common mistake

Don't let the supervisor decide how many workers to use — this leads to cost explosion. Instead, predefine the list of workers and let the supervisor pick which ones to activate for a specific task.
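
One way to enforce this is to validate the supervisor's plan against a fixed registry before any worker runs. A minimal sketch, assuming the supervisor replies with JSON of the form `{"workers": [...]}`; the registry contents, the cap, and `select_workers` are hypothetical:

```python
import json

# Fixed at design time: the supervisor chooses among these, never adds to them.
WORKER_REGISTRY = {"research", "analysis", "code_gen"}
MAX_WORKERS_PER_TASK = 2


def select_workers(supervisor_reply: str) -> list[str]:
    """Validate the supervisor's plan against the predefined worker list."""
    requested = json.loads(supervisor_reply)["workers"]
    selected: list[str] = []
    for name in requested:
        if name not in WORKER_REGISTRY:
            raise ValueError(f"unknown worker requested: {name}")
        if name not in selected:  # ignore duplicate activations
            selected.append(name)
    if len(selected) > MAX_WORKERS_PER_TASK:
        raise ValueError("plan exceeds the per-task worker cap")
    return selected
```

The supervisor still decides which workers fit the task, but the upper bound on spend is set in code rather than by the model.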

4. Pattern 3: Parallel Fan-Out / Fan-In

When a problem can be split into independent parts that can run in parallel, Fan-Out drops latency significantly. A dispatcher shards the task to N agents concurrently, and an aggregator collects and synthesizes the results.

graph TD
    I["Input"] --> D["Dispatcher"]
    D -->|"Chunk 1"| A1["Agent A"]
    D -->|"Chunk 2"| A2["Agent B"]
    D -->|"Chunk 3"| A3["Agent C"]
    D -->|"Chunk 4"| A4["Agent D"]
    A1 --> AG["Aggregator"]
    A2 --> AG
    A3 --> AG
    A4 --> AG
    AG --> O["Output"]
    style D fill:#e94560,stroke:#fff,color:#fff
    style AG fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#0f3460,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#0f3460,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff

Figure 3: Fan-Out/Fan-In — parallel processing, final aggregation

4.1. Cost and latency

Fan-Out multiplies cost by the number of agents (N agents = ~N× cost) but latency equals the slowest agent + aggregation overhead. This pattern is only worth it when:

Latency is the top priority (user-facing realtime)
The chunks are truly independent (no shared dependency)
The budget supports N× cost
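
The trade-off can be written down as a two-line model. A rough sketch with made-up numbers; `fan_out`, `Estimate`, and all latency and price values are illustrative assumptions, not measurements:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    latency_s: float
    cost_usd: float


def fan_out(latencies: list[float], cost_per_call: float,
            agg_latency: float, agg_cost: float) -> Estimate:
    """Estimate one fan-out/fan-in round over len(latencies) parallel agents."""
    return Estimate(
        # Parallel branches: wait for the slowest one, then aggregate.
        latency_s=max(latencies) + agg_latency,
        # Every branch is billed, plus the aggregation call.
        cost_usd=cost_per_call * len(latencies) + agg_cost,
    )


estimate = fan_out([2.0, 3.5, 1.0, 2.5], cost_per_call=0.25,
                   agg_latency=1.0, agg_cost=0.5)
```

With these inputs the estimate is 4.5 seconds and $1.50: latency is set by the 3.5-second branch, while cost scales with all four branches.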

4.2. Real-world example

# Fan-Out: analyze many documents concurrently
# `Agent` is an illustrative wrapper name; adapt it to your SDK's interface.
import asyncio
from claude_agent_sdk import Agent

async def analyze_documents(docs: list[str]):
    analyzer = Agent("doc_analyzer", model="haiku",
                     system="Extract key insights from the document")

    # Fan-Out: start one analysis per document and wait for all of them.
    # gather() returns results in the same order as `docs`.
    tasks = [analyzer.run(doc) for doc in docs]
    results = await asyncio.gather(*tasks)

    # Fan-In: aggregate with a stronger model
    synthesizer = Agent("synthesizer", model="sonnet",
                        system="Synthesize the insights into a report")

    combined_input = "\n---\n".join(
        f"Document {i+1}: {r}" for i, r in enumerate(results)
    )
    # The synthesized report is the only thing returned to the caller.
    return await synthesizer.run(combined_input)

5. Pattern 4: Router (Intelligent Dispatch)

The Router pattern classifies input and routes it to the most suitable specialized agent. It's the most cost-efficient pattern because only one specialized agent handles each request; the only added cost is the router's own classification call.

graph TD
    I["User Request"] --> R["Router Agent<br/>(Haiku - fast classify)"]
    R -->|"Simple Q&A"| A1["FAQ Agent<br/>(Haiku)"]
    R -->|"Code task"| A2["Code Agent<br/>(Sonnet)"]
    R -->|"Complex reasoning"| A3["Reasoning Agent<br/>(Opus)"]
    R -->|"Data analysis"| A4["Analytics Agent<br/>(Sonnet + Tools)"]
    style R fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#4CAF50,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#9c27b0,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff

Figure 4: Router Pattern — intelligent dispatch to the best-fit agent

5.1. Why is Router the most cost-efficient?

Suppose a system receives 1000 requests/day with this distribution:

60% simple (FAQ) → Haiku: $0.001/request
25% medium (code/analysis) → Sonnet: $0.015/request
15% complex (reasoning) → Opus: $0.075/request

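Without Router, Sonnet for everything: 1000 × $0.015 = $15/day

Without Router, Opus for everything: 1000 × $0.075 = $75/day

With Router: (600 × $0.001) + (250 × $0.015) + (150 × $0.075) = $15.60, plus about $0.50 of Router overhead = $16.1/day. That is roughly 7% more than all-Sonnet, but complex tasks get Opus-quality output, and it is about 79% cheaper than sending everything to Opus.

The saving therefore depends on the baseline and on the traffic mix. In practice, Router drops cost by 30-60% when most traffic is simple requests.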
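
The arithmetic above can be reproduced with exact decimals. A sketch only: the Router overhead of $0.0005 per request is an assumption back-solved from the $16.1/day total, and the all-Opus baseline is included for comparison:

```python
from decimal import Decimal

REQUESTS_PER_DAY = 1000
# (share of traffic, cost per request in USD), taken from the example above.
TIERS = {
    "haiku": (Decimal("0.60"), Decimal("0.001")),
    "sonnet": (Decimal("0.25"), Decimal("0.015")),
    "opus": (Decimal("0.15"), Decimal("0.075")),
}
# Assumption: back-solved from the $16.1/day total, not a published price.
ROUTER_COST_PER_REQUEST = Decimal("0.0005")


def routed_cost() -> Decimal:
    """Daily cost when each request goes to the tier the router picks."""
    model_cost = sum(REQUESTS_PER_DAY * share * price
                     for share, price in TIERS.values())
    return model_cost + REQUESTS_PER_DAY * ROUTER_COST_PER_REQUEST


def single_model_cost(model: str) -> Decimal:
    """Daily cost when every request goes to one model."""
    return REQUESTS_PER_DAY * TIERS[model][1]


def saving_vs(model: str) -> Decimal:
    """Fractional saving of routing compared with sending everything to `model`."""
    baseline = single_model_cost(model)
    return (baseline - routed_cost()) / baseline
```

`saving_vs("sonnet")` is negative for this traffic mix while `saving_vs("opus")` is strongly positive, which is why the baseline has to be stated whenever a percentage saving is quoted.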

Tip: which model should the Router use?

The Router only needs to classify intent — use Haiku with structured output (JSON schema). Adds ~200ms of latency but saves significantly downstream. If you need higher accuracy, use Sonnet for the Router — still cheaper than "all-Opus".
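
Structured output only helps if the reply is validated before dispatch. A minimal sketch, assuming the router model returns JSON such as `{"route": "faq"}`; the route names, `DEFAULT_ROUTE`, and the `classify` and handler callables are hypothetical stand-ins for real model calls:

```python
import json
from typing import Callable

ROUTES = {"faq", "code", "reasoning", "analytics"}
DEFAULT_ROUTE = "reasoning"  # assumption: fail towards the most capable agent


def parse_route(raw: str) -> str:
    """Return a known route from the router's JSON reply, or the default."""
    try:
        route = json.loads(raw).get("route")
    except (json.JSONDecodeError, AttributeError):
        return DEFAULT_ROUTE  # not JSON, or JSON that is not an object
    if isinstance(route, str) and route in ROUTES:
        return route
    return DEFAULT_ROUTE


def dispatch(request: str, classify: Callable[[str], str],
             handlers: dict[str, Callable[[str], str]]) -> str:
    # One classification call, then exactly one specialized agent runs.
    return handlers[parse_route(classify(request))](request)
```

Falling back to the strongest agent trades cost for safety. Falling back to the cheapest one is the opposite choice and may suit low-stakes traffic better.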

6. Pattern 5: Hierarchical (Multi-Level)

An extension of the Supervisor pattern with multiple tiers. It fits organization-scale workflows where a single supervisor cannot hold enough context to manage every worker.

graph TD
    CEO["Strategic Agent<br/>(Opus)"]
    CEO --> M1["Manager: Backend<br/>(Sonnet)"]
    CEO --> M2["Manager: Frontend<br/>(Sonnet)"]
    CEO --> M3["Manager: QA<br/>(Sonnet)"]
    M1 --> W1["API Worker<br/>(Haiku)"]
    M1 --> W2["DB Worker<br/>(Haiku)"]
    M2 --> W3["UI Worker<br/>(Haiku)"]
    M2 --> W4["Style Worker<br/>(Haiku)"]
    M3 --> W5["Test Writer<br/>(Sonnet)"]
    M3 --> W6["Test Runner<br/>(Haiku)"]
    style CEO fill:#e94560,stroke:#fff,color:#fff
    style M1 fill:#0f3460,stroke:#fff,color:#fff
    style M2 fill:#0f3460,stroke:#fff,color:#fff
    style M3 fill:#0f3460,stroke:#fff,color:#fff
    style W1 fill:#4CAF50,stroke:#fff,color:#fff
    style W2 fill:#4CAF50,stroke:#fff,color:#fff
    style W3 fill:#4CAF50,stroke:#fff,color:#fff
    style W4 fill:#4CAF50,stroke:#fff,color:#fff
    style W5 fill:#4CAF50,stroke:#fff,color:#fff
    style W6 fill:#4CAF50,stroke:#fff,color:#fff

Figure 5: Hierarchical Pattern — a management hierarchy with multiple agent tiers

6.1. When do you need Hierarchical?

Only when:

You have 10+ specialized workers
Workers can be grouped into clear domains
Each domain needs its own coordination logic
A single supervisor would be overwhelmed by context length

Warning: compounding overhead

Every hierarchy tier adds latency (supervisor reasoning) and cost (reasoning tokens). A 3-tier hierarchy can cost 3-5× more than a flat supervisor, so it is only justified when task complexity truly demands domain separation.
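
One way to see where the overhead comes from is to count coordinating agents and tiers. A small sketch using the tree from Figure 5; the node names are shortened, hypothetical identifiers, and the code counts structure only, without modelling token prices:

```python
# Coordinator -> children. Agents that never appear as keys are leaf workers.
HIERARCHY = {
    "strategic": ["backend_mgr", "frontend_mgr", "qa_mgr"],
    "backend_mgr": ["api_worker", "db_worker"],
    "frontend_mgr": ["ui_worker", "style_worker"],
    "qa_mgr": ["test_writer", "test_runner"],
}
FLAT = {
    "supervisor": ["api_worker", "db_worker", "ui_worker",
                   "style_worker", "test_writer", "test_runner"],
}


def coordinators(tree: dict[str, list[str]]) -> int:
    """Agents that only plan and delegate; each one adds reasoning tokens."""
    return len(tree)


def depth(tree: dict[str, list[str]], node: str) -> int:
    """Number of tiers on the longest path from `node` down to a worker."""
    children = tree.get(node, [])
    return 1 + max((depth(tree, child) for child in children), default=0)
```

For the same six workers, the hierarchy has four coordinators across three tiers, while the flat supervisor has one coordinator across two tiers. Each extra tier sits on the critical path of every delegated task.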

7. Pattern 6: Evaluator-Optimizer Loop

This pattern runs iterative refinement: one agent generates an output, a second agent scores it, and if the score is below the threshold the feedback goes back to the generator for another attempt. It is especially effective for tasks with a high quality bar, such as code generation, content writing, or data transformation.

graph TD
    I["Input + Requirements"] --> G["Generator Agent"]
    G --> E["Evaluator Agent"]
    E -->|"Score < threshold"| F["Feedback"]
    F --> G
    E -->|"Score >= threshold"| O["Final Output"]
    style G fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#e94560,stroke:#fff,color:#fff
    style F fill:#ff9800,stroke:#fff,color:#fff
    style O fill:#4CAF50,stroke:#fff,color:#fff

Figure 6: Evaluator-Optimizer Loop — iterative refinement for high-quality output

7.1. Implementation key points

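# Evaluator-Optimizer Loop
# `Agent` is an illustrative wrapper name; adapt it to your SDK's interface.
import re

from claude_agent_sdk import Agent

SCORE_THRESHOLD = 8


def extract_score(evaluation: str) -> int:
    # Expects a "Score: N" line; an unparseable reply counts as 0 (keep iterating).
    match = re.search(r"Score:\s*(10|[1-9])\b", str(evaluation))
    return int(match.group(1)) if match else 0


async def eval_optimize_loop(task: str, max_iterations: int = 3):
    generator = Agent("generator", model="sonnet",
                      system="Generate solution based on requirements and feedback")
    evaluator = Agent("evaluator", model="opus",
                      system="Reply with 'Score: N' (1-10) on the first line, "
                             "then specific feedback for improvement")

    output = await generator.run(task)

    for _ in range(max_iterations):
        evaluation = await evaluator.run(
            f"Task: {task}\nOutput: {output}\nScore (1-10) and specific feedback:"
        )
        score = extract_score(evaluation)

        if score >= SCORE_THRESHOLD:
            return output  # Hit threshold

        # Feed evaluation back to generator
        output = await generator.run(
            f"Original task: {task}\nPrevious attempt: {output}\n"
            f"Feedback: {evaluation}\nImprove the output:"
        )

    return output  # Max iterations reached; the last revision is returned unscored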

7.2. Cap your iterations

Always set a max iteration count (typically 2–4). Reasons:

Diminishing returns: iteration 3+ rarely improves much
Linear cost: each iteration = full generator + evaluator cost
Infinite-loop risk: the evaluator may never be satisfied
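
Beyond a hard cap, the loop can also stop when an iteration no longer buys a meaningful improvement. A minimal synchronous sketch with stub callables in place of model calls; `refine`, `min_gain`, and the default values are hypothetical:

```python
from typing import Callable


def refine(task: str,
           generate: Callable[[str, str], str],
           evaluate: Callable[[str], float],
           threshold: float = 8.0,
           max_iterations: int = 3,
           min_gain: float = 0.5) -> tuple[str, float, int]:
    """Return (best_output, best_score, evaluations_used)."""
    output = generate(task, "")
    best_output, best_score = output, evaluate(output)
    used = 1
    while best_score < threshold and used < max_iterations:
        output = generate(task, best_output)  # revise the best attempt so far
        score = evaluate(output)
        used += 1
        gain = score - best_score
        if score > best_score:
            best_output, best_score = output, score
        if gain < min_gain:  # diminishing returns: stop early
            break
    return best_output, best_score, used
```

Keeping the best-scoring attempt also protects against a late revision that scores lower than an earlier one.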

8. Summary comparison of the 6 patterns

| Pattern | Latency | Cost | Complexity | Best for |
| --- | --- | --- | --- | --- |
| Sequential Pipeline | High (sum of all steps) | Low-Med | Low | ETL, data processing, multi-step validation |
| Supervisor | Med-High | Med | Med | Task decomposition, project workflows |
| Fan-Out/Fan-In | Low (slowest branch) | High (N×) | Med | Batch processing, realtime multi-source |
| Router | Low | Lowest | Low-Med | API gateway, customer support, mixed workload |
| Hierarchical | High | High | High | Large-scale org workflows, 10+ agents |
| Evaluator-Optimizer | Very high (grows with each iteration) | High | Med | High-quality generation, code review, content |

9. Agent connection protocols: MCP vs A2A

In the 2026 multi-agent ecosystem, two protocols are often framed as competitors, but they address different layers:

| Criterion | MCP (Model Context Protocol) | A2A (Agent-to-Agent) |
| --- | --- | --- |
| Purpose | Connect models to tools/data sources | Agents communicate directly with other agents |
| Architecture | Client-Server (host → MCP server) | Peer-to-peer (agent ↔ agent) |
| Discovery | Config file, registry | Agent Cards + REST endpoints |
| Origin | Anthropic → Linux Foundation | Google → Linux Foundation |
| Primary use cases | Tool calling, data access, context injection | Cross-org agent delegation, marketplace |
| Used in patterns | All (agent ↔ tool) | Hierarchical, Supervisor (agent ↔ agent) |

MCP and A2A complement each other, not replace

MCP handles "vertical integration" (agent connecting to tools, databases, APIs). A2A handles "horizontal integration" (agents discovering and delegating to each other). A production system typically uses both: MCP so agents can access tools, A2A so agents can talk across boundaries.
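
The difference shows up in the messages each protocol exchanges. A rough sketch of the two shapes: the MCP request follows the JSON-RPC 2.0 `tools/call` method, while the Agent Card below is a simplified, illustrative subset rather than the full A2A schema. The tool name, skill names, and URL are made up, so check both specifications before relying on field names:

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 request an MCP client sends to invoke a server-side tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def agent_card(name: str, url: str, skills: list[str]) -> dict:
    """Simplified subset of an A2A Agent Card (illustrative, not the full schema)."""
    return {
        "name": name,
        "url": url,
        "skills": [{"id": skill, "name": skill} for skill in skills],
    }


# Vertical: an agent asks a tool server to run a (hypothetical) tool.
tool_request = json.dumps(mcp_tool_call(1, "search_tickets", {"query": "refund"}))

# Horizontal: an agent advertises what it can do so other agents can delegate to it.
card = agent_card("billing-agent", "https://agents.example.com/billing",
                  ["refunds", "invoices"])
```

The first message targets a tool, the second describes an agent. That is the vertical versus horizontal split described above.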

10. SDKs & Frameworks for Production 2026

Three frameworks lead the orchestration market:

Claude Agent SDK (Anthropic)

Python v0.1.48, TypeScript v0.2.71. Deepest MCP integration, optimized for coding agents. Ships Claude Managed Agents for serverless deployment — Anthropic hosts and scales the agent for you.

OpenAI Agents SDK

Built-in handoff mechanism, tracing, guardrails. Tight integration with GPT models. Swarm-inspired architecture for multi-agent.

Google ADK (Agent Development Kit)

4 language SDKs (Python, TypeScript, Java, Go). Native A2A support. Visual Agent Designer inside Google Cloud Console. ADK agents can invoke LangGraph/CrewAI agents over A2A.

11. Picking the right pattern

Decision framework based on problem characteristics:

graph TD
    Q1{"Do steps<br/>depend on each other?"}
    Q1 -->|"Yes"| Q2{"Need iterative<br/>quality?"}
    Q1 -->|"No"| Q3{"Need all<br/>results?"}
    Q2 -->|"Yes"| P6["Evaluator-Optimizer"]
    Q2 -->|"No"| Q4{"More than<br/>10 agents?"}
    Q4 -->|"Yes"| P5["Hierarchical"]
    Q4 -->|"No"| Q5{"Need dynamic<br/>delegation?"}
    Q5 -->|"Yes"| P2["Supervisor"]
    Q5 -->|"No"| P1["Sequential Pipeline"]
    Q3 -->|"Yes"| P3["Fan-Out/Fan-In"]
    Q3 -->|"No"| P4["Router"]
    style P1 fill:#4CAF50,stroke:#fff,color:#fff
    style P2 fill:#4CAF50,stroke:#fff,color:#fff
    style P3 fill:#4CAF50,stroke:#fff,color:#fff
    style P4 fill:#4CAF50,stroke:#fff,color:#fff
    style P5 fill:#4CAF50,stroke:#fff,color:#fff
    style P6 fill:#4CAF50,stroke:#fff,color:#fff
    style Q1 fill:#e94560,stroke:#fff,color:#fff
    style Q2 fill:#0f3460,stroke:#fff,color:#fff
    style Q3 fill:#0f3460,stroke:#fff,color:#fff
    style Q4 fill:#0f3460,stroke:#fff,color:#fff
    style Q5 fill:#0f3460,stroke:#fff,color:#fff

Figure 7: Decision tree for picking an orchestration pattern
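
The same tree can be written as a small function, which is handy for design reviews or architecture tests. A direct transcription of Figure 7; the function and parameter names are hypothetical:

```python
def choose_pattern(*, steps_depend: bool,
                   iterative_quality: bool = False,
                   agent_count: int = 1,
                   dynamic_delegation: bool = False,
                   need_all_results: bool = False) -> str:
    """Walk the decision tree from Figure 7 and return a pattern name."""
    if not steps_depend:
        # Independent work: run everything in parallel, or pick one branch.
        return "Fan-Out/Fan-In" if need_all_results else "Router"
    if iterative_quality:
        return "Evaluator-Optimizer"
    if agent_count > 10:
        return "Hierarchical"
    return "Supervisor" if dynamic_delegation else "Sequential Pipeline"
```

For example, `choose_pattern(steps_depend=True, agent_count=4)` returns "Sequential Pipeline", the default the article recommends starting from.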

12. Production best practices

12.1. Observability is mandatory

Multi-agent systems are many times harder to debug than single agents. Minimum requirements:

Distributed tracing: each agent call is a span, trace the entire chain
Per-agent token counting: know which agent is spending the most
Latency breakdown: P50, P95, P99 per stage
Per-agent error rate: isolate faulty agents
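
These four requirements fit in one small record per agent call. A minimal in-memory sketch, assuming spans are collected by your own orchestration code; `Span`, `summarize`, and the nearest-rank `percentile` helper are hypothetical, and a production system would more likely export spans to a tracing backend:

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str      # one trace per user request
    agent: str         # one span per agent call
    latency_ms: float
    tokens: int
    ok: bool = True


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(spans: list[Span]) -> dict[str, dict[str, float]]:
    """Aggregate tokens, latency percentiles, and error rate per agent."""
    by_agent: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        by_agent[span.agent].append(span)
    report: dict[str, dict[str, float]] = {}
    for agent, items in by_agent.items():
        latencies = [s.latency_ms for s in items]
        report[agent] = {
            "tokens": sum(s.tokens for s in items),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "error_rate": sum(not s.ok for s in items) / len(items),
        }
    return report
```

Grouping by `trace_id` instead of `agent` gives the per-request view needed to follow one chain end to end.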

12.2. Cost controls

// Configure budget limits for a multi-agent system
const orchestratorConfig = {
  maxTokensPerRequest: 100_000,
  maxAgentCalls: 10,
  timeoutMs: 30_000,
  costLimitPerRequest: 0.50, // USD
  fallbackBehavior: 'return_partial' // or 'error'
};
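
A config like this only helps if something enforces it on every agent call. A minimal Python sketch of such an enforcer, assuming usage is estimated before each call; `RequestBudget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, and the defaults mirror the limits above:

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the next agent call would break a per-request limit."""


@dataclass
class RequestBudget:
    max_tokens: int = 100_000
    max_agent_calls: int = 10
    cost_limit_usd: float = 0.50
    tokens_used: int = 0
    agent_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Record one agent call; raise if it would break any limit."""
        if self.agent_calls + 1 > self.max_agent_calls:
            raise BudgetExceeded("agent call limit reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        if self.cost_usd + cost_usd > self.cost_limit_usd:
            raise BudgetExceeded("cost limit reached")
        self.agent_calls += 1
        self.tokens_used += tokens
        self.cost_usd += cost_usd


def run_with_budget(calls: list[tuple[str, int, float]],
                    budget: RequestBudget) -> list[str]:
    """Process (agent, tokens, cost) calls; stop with partial results when the budget runs out."""
    completed: list[str] = []
    for agent, tokens, cost in calls:
        try:
            budget.charge(tokens, cost)
        except BudgetExceeded:
            break  # mirrors fallbackBehavior: 'return_partial'
        completed.append(agent)
    return completed
```

Re-raising instead of breaking would correspond to the `'error'` fallback.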

12.3. Graceful degradation

When a single agent in the chain fails, the whole system shouldn't crash. Strategies:

Timeout + fallback: agent doesn't respond in N seconds → use cached result or simpler model
Circuit breaker: agent fails 3 times in a row → bypass and log
Partial results: return completed agents' output instead of failing the whole run
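
The first two strategies compose naturally. A minimal asyncio sketch, assuming each agent is exposed as an async callable; `CircuitBreaker`, `call_with_fallback`, and the default thresholds are hypothetical:

```python
import asyncio
from typing import Awaitable, Callable


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


async def call_with_fallback(agent: Callable[[str], Awaitable[str]],
                             prompt: str,
                             breaker: CircuitBreaker,
                             fallback: str,
                             timeout_s: float = 5.0) -> str:
    if breaker.open:
        return fallback  # bypass the faulty agent entirely
    try:
        result = await asyncio.wait_for(agent(prompt), timeout=timeout_s)
    except Exception:  # includes the timeout raised by wait_for
        breaker.record(success=False)
        return fallback
    breaker.record(success=True)
    return result
```

This sketch never closes the breaker again once it has opened. A real implementation would usually retry after a cool-down period and log each bypass.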

Where to start?

If you're new to multi-agent: implement Sequential Pipeline for your clearest workflow (3–4 stages). Validate output quality, measure latency and cost. Only then consider upgrading to Supervisor or Router when you have concrete data on the bottleneck. Don't start with Hierarchical — 90% of the time you don't need it.


# AI Agent Orchestration — 6 Patterns for Production Agent Coordination in 2026

When AI systems move from "one model, one prompt" into multi-agent architecture, the biggest question is no longer *which model is the strongest* but **how to orchestrate the agents effectively**. Based on Anthropic's analysis of 200+ enterprise deployments, 57% of failed projects have a root cause in orchestration design — the individual agents are strong enough, but coordination is weak.

This post goes deep on the 6 core orchestration patterns used in production, analyzing real-world trade-offs in cost, latency, and complexity so you can pick the right one for your problem.

57%AI projects failing due to orchestration

40%Multi-agent pilots that fail within 6 months

30-60%Cost reduction with the Router pattern

6Core patterns for production

## 1. Why orchestration is make-or-break

The catch is: **adding agents doesn't automatically add value**. Without a clear orchestration pattern, you'll hit:

- **Race conditions**: agents overwriting each other's state
- **Infinite loops**: agent A calls agent B, which calls A again
- **Cost explosion**: every agent call consumes tokens — without control, costs spike 10-50x expected
- **Quality degradation**: downstream agents receive low-quality input from upstream ones

#### Golden rule

## 2. Pattern 1: Sequential Pipeline

This is the simplest pattern and should be your first choice. Agents run sequentially in a fixed chain, and each agent's output becomes the next agent's input.

```
graph LR
    A["Input"] --> B["Agent 1  
Extract"]
    B --> C["Agent 2  
Transform"]
    C --> D["Agent 3  
Validate"]
    D --> E["Agent 4  
Output"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#0f3460,stroke:#fff,color:#fff
    style C fill:#0f3460,stroke:#fff,color:#fff
    style D fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#4CAF50,stroke:#fff,color:#fff

```

Figure 1: Sequential Pipeline — a linear, deterministic flow

### 2.1. Characteristics

- **Fixed order**: defined at design time, not changed at runtime
- **State flows through a shared object**: each agent reads/writes the same state container
- **Easy to debug**: you know exactly which step failed
- **Linear latency**: total latency = sum of all agents

### 2.2. When to use it

Pipeline fits when:

- Each step depends on the previous one's output
- The workflow has 3–5 clear steps
- No parallelism is needed
- Reliability and debuggability matter more than speed

```python
# Example: Code-review analysis pipeline
from claude_agent_sdk import Agent, Pipeline

pipeline = Pipeline([
    Agent("code_reader", model="haiku",
          system="Read code and list the main changes"),
    Agent("security_scanner", model="sonnet",
          system="Analyze security vulnerabilities from the diff"),
    Agent("style_checker", model="haiku",
          system="Check coding style and conventions"),
    Agent("summarizer", model="sonnet",
          system="Summarize findings into a review comment")
])

result = await pipeline.run(code_diff)

```

## 3. Pattern 2: Supervisor (Orchestrator-Worker)

The most common pattern in enterprise. A "smart" supervisor agent takes the task, decomposes it into sub-tasks, delegates to specialized workers, and aggregates the results.

```
graph TD
    A["Task Input"] --> S["Supervisor Agent  
(Opus/Sonnet)"]
    S --> W1["Worker 1  
Research  
(Haiku)"]
    S --> W2["Worker 2  
Analysis  
(Sonnet)"]
    S --> W3["Worker 3  
Code Gen  
(Sonnet)"]
    W1 --> S
    W2 --> S
    W3 --> S
    S --> R["Final Result"]
    style S fill:#e94560,stroke:#fff,color:#fff
    style W1 fill:#0f3460,stroke:#fff,color:#fff
    style W2 fill:#0f3460,stroke:#fff,color:#fff
    style W3 fill:#0f3460,stroke:#fff,color:#fff
    style R fill:#4CAF50,stroke:#fff,color:#fff

```

Figure 2: Supervisor Pattern — orchestrator decomposes and delegates to workers

### 3.1. Trade-offs

| Aspect | Pros | Cons |
| --- | --- | --- |
| **Cost** | Workers use cheaper models (Haiku), saves 40-60% | Supervisor reasoning adds 20-40% overhead |
| **Quality** | A strong supervisor model ensures good synthesis | Bottleneck if the supervisor misreads the request |
| **Scalability** | Easy to add new workers | Supervisor is a single point of failure |
| **Debugging** | Each worker has clear scope | Harder to trace interaction between supervisor and workers |

### 3.2. Immutable state pattern

The most important best practice when implementing Supervisor: use **immutable state snapshots**. Each agent takes state version N, processes it, and returns state version N+1. No agent mutates state directly.

```typescript
// Immutable state pattern with Claude Agent SDK
interface AgentState {
  readonly version: number;
  readonly data: Record<string, unknown>;
  readonly history: ReadonlyArray<AgentAction>;
}

function createNextState(
  current: AgentState,
  action: AgentAction,
  result: unknown
): AgentState {
  return {
    version: current.version + 1,
    data: { ...current.data, [action.key]: result },
    history: [...current.history, action]
  };
}

```

#### Common mistake

Don't let the supervisor decide *how many* workers to use — this leads to cost explosion. Instead, predefine the list of workers and let the supervisor pick *which ones* to activate for a specific task.

## 4. Pattern 3: Parallel Fan-Out / Fan-In

```
graph TD
    I["Input"] --> D["Dispatcher"]
    D -->|"Chunk 1"| A1["Agent A"]
    D -->|"Chunk 2"| A2["Agent B"]
    D -->|"Chunk 3"| A3["Agent C"]
    D -->|"Chunk 4"| A4["Agent D"]
    A1 --> AG["Aggregator"]
    A2 --> AG
    A3 --> AG
    A4 --> AG
    AG --> O["Output"]
    style D fill:#e94560,stroke:#fff,color:#fff
    style AG fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#0f3460,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#0f3460,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff

```

Figure 3: Fan-Out/Fan-In — parallel processing, final aggregation

### 4.1. Cost and latency

Fan-Out **multiplies cost** by the number of agents (N agents = ~N× cost) but **latency equals the slowest agent** + aggregation overhead. This pattern is only worth it when:

- Latency is the top priority (user-facing realtime)
- The chunks are truly independent (no shared dependency)
- The budget supports N× cost

### 4.2. Real-world example

```python
# Fan-Out: analyze many documents concurrently
import asyncio
from claude_agent_sdk import Agent

async def analyze_documents(docs: list[str]) -> dict:
    analyzer = Agent("doc_analyzer", model="haiku",
                     system="Extract key insights from the document")

# Fan-Out: send all documents in parallel
    tasks = [analyzer.run(doc) for doc in docs]
    results = await asyncio.gather(*tasks)

# Fan-In: aggregate with a stronger model
    synthesizer = Agent("synthesizer", model="sonnet",
                        system="Synthesize the insights into a report")

combined_input = "\n---\n".join(
        f"Document {i+1}: {r}" for i, r in enumerate(results)
    )
    return await synthesizer.run(combined_input)

```

## 5. Pattern 4: Router (Intelligent Dispatch)

The Router pattern classifies input and routes it to the most suitable specialized agent. It's the most cost-efficient pattern because **only one agent handles each request** — the router only pays the classification cost.

```
graph TD
    I["User Request"] --> R["Router Agent  
(Haiku - fast classify)"]
    R -->|"Simple Q&A"| A1["FAQ Agent  
(Haiku)"]
    R -->|"Code task"| A2["Code Agent  
(Sonnet)"]
    R -->|"Complex reasoning"| A3["Reasoning Agent  
(Opus)"]
    R -->|"Data analysis"| A4["Analytics Agent  
(Sonnet + Tools)"]
    style R fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#4CAF50,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#9c27b0,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff

```

Figure 4: Router Pattern — intelligent dispatch to the best-fit agent

### 5.1. Why is Router the most cost-efficient?

Suppose a system receives 1000 requests/day with this distribution:

- 60% simple (FAQ) → Haiku: $0.001/request
- 25% medium (code/analysis) → Sonnet: $0.015/request
- 15% complex (reasoning) → Opus: $0.075/request

**Without Router**: Sonnet for everything = 1000 × $0.015 = **$15/day**

**With Router**: (600 × $0.001) + (250 × $0.015) + (150 × $0.075) + Router overhead = **$16.1/day**, but the output quality for complex tasks is far better with Opus.

In reality, Router drops cost by 30-60% when most traffic is simple requests.

#### Tip: which model should the Router use?

## 6. Pattern 5: Hierarchical (Multi-Level)

An extension of the Supervisor pattern with multiple tiers. Fits organization-scale workflows where one supervisor isn't enough context to manage every worker.

```
graph TD
    CEO["Strategic Agent  
(Opus)"]
    CEO --> M1["Manager: Backend  
(Sonnet)"]
    CEO --> M2["Manager: Frontend  
(Sonnet)"]
    CEO --> M3["Manager: QA  
(Sonnet)"]
    M1 --> W1["API Worker  
(Haiku)"]
    M1 --> W2["DB Worker  
(Haiku)"]
    M2 --> W3["UI Worker  
(Haiku)"]
    M2 --> W4["Style Worker  
(Haiku)"]
    M3 --> W5["Test Writer  
(Sonnet)"]
    M3 --> W6["Test Runner  
(Haiku)"]
    style CEO fill:#e94560,stroke:#fff,color:#fff
    style M1 fill:#0f3460,stroke:#fff,color:#fff
    style M2 fill:#0f3460,stroke:#fff,color:#fff
    style M3 fill:#0f3460,stroke:#fff,color:#fff
    style W1 fill:#4CAF50,stroke:#fff,color:#fff
    style W2 fill:#4CAF50,stroke:#fff,color:#fff
    style W3 fill:#4CAF50,stroke:#fff,color:#fff
    style W4 fill:#4CAF50,stroke:#fff,color:#fff
    style W5 fill:#4CAF50,stroke:#fff,color:#fff
    style W6 fill:#4CAF50,stroke:#fff,color:#fff

```

Figure 5: Hierarchical Pattern — a management hierarchy with multiple agent tiers

### 6.1. When do you need Hierarchical?

Only when:

- You have 10+ specialized workers
- Workers can be grouped into clear domains
- Each domain needs its own coordination logic
- A single supervisor would be overwhelmed by context length

#### Warning: compounding overhead

## 7. Pattern 6: Evaluator-Optimizer Loop

```
graph TD
    I["Input + Requirements"] --> G["Generator Agent"]
    G --> E["Evaluator Agent"]
    E -->|"Score < threshold"| F["Feedback"]
    F --> G
    E -->|"Score >= threshold"| O["Final Output"]
    style G fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#e94560,stroke:#fff,color:#fff
    style F fill:#ff9800,stroke:#fff,color:#fff
    style O fill:#4CAF50,stroke:#fff,color:#fff

```

Figure 6: Evaluator-Optimizer Loop — iterative refinement for high-quality output

### 7.1. Implementation key points

```python
# Evaluator-Optimizer Loop
from claude_agent_sdk import Agent

async def eval_optimize_loop(task: str, max_iterations: int = 3):
    generator = Agent("generator", model="sonnet",
                      system="Generate solution based on requirements and feedback")
    evaluator = Agent("evaluator", model="opus",
                      system="Score output 1-10, provide specific feedback for improvement")

output = await generator.run(task)

for i in range(max_iterations):
        evaluation = await evaluator.run(
            f"Task: {task}\nOutput: {output}\nScore (1-10) and specific feedback:"
        )
        score = extract_score(evaluation)

if score >= 8:
            return output  # Hit threshold

# Feed evaluation back to generator
        output = await generator.run(
            f"Original task: {task}\nPrevious attempt: {output}\n"
            f"Feedback: {evaluation}\nImprove the output:"
        )

return output  # Max iterations reached

```

### 7.2. Cap your iterations

**Always set a max iteration count** (typically 2–4). Reasons:

- Diminishing returns: iteration 3+ rarely improves much
- Linear cost: each iteration = full generator + evaluator cost
- Infinite-loop risk: the evaluator may never be satisfied

## 8. Summary comparison of the 6 patterns

| Pattern | Latency | Cost | Complexity | Best for |
| --- | --- | --- | --- | --- |
| **Sequential Pipeline** | High (sum all) | Low-Med | Low | ETL, data processing, multi-step validation |
| **Supervisor** | Med-High | Med | Med | Task decomposition, project workflows |
| **Fan-Out/Fan-In** | Low (max one) | High (N×) | Med | Batch processing, realtime multi-source |
| **Router** | Low | Lowest | Low-Med | API gateway, customer support, mixed workload |
| **Hierarchical** | High | High | High | Large-scale org workflows, 10+ agents |
| **Evaluator-Optimizer** | Very high (N × iter) | High | Med | High-quality generation, code review, content |

## 9. Agent connection protocols: MCP vs A2A

In the 2026 multi-agent ecosystem, two protocols are competing and complementing each other:

| Criterion | MCP (Model Context Protocol) | A2A (Agent-to-Agent) |
| --- | --- | --- |
| **Purpose** | Connect models to tools/data sources | Agents communicate directly with other agents |
| **Architecture** | Client-Server (host → MCP server) | Peer-to-peer (agent ↔ agent) |
| **Discovery** | Config file, registry | Agent Cards + REST endpoints |
| **Origin** | Anthropic → Linux Foundation | Google → Linux Foundation |
| **Primary use cases** | Tool calling, data access, context injection | Cross-org agent delegation, marketplace |
| **Used in patterns** | All (agent ↔ tool) | Hierarchical, Supervisor (agent ↔ agent) |

#### MCP and A2A complement each other; neither replaces the other

## 10. SDKs & Frameworks for Production 2026

Three frameworks lead the orchestration market:

**Claude Agent SDK (Anthropic)**

Python v0.1.48, TypeScript v0.2.71. Deepest MCP integration, optimized for coding agents. Ships **Claude Managed Agents** for serverless deployment: Anthropic hosts and scales the agent for you.

**OpenAI Agents SDK**

Built-in handoff mechanism, tracing, and guardrails. Tight integration with GPT models. Swarm-inspired architecture for multi-agent workflows.

**Google ADK (Agent Development Kit)**

SDKs in four languages (Python, TypeScript, Java, Go). Native A2A support. Visual Agent Designer inside Google Cloud Console. ADK agents can invoke LangGraph/CrewAI agents over A2A.

## 11. Picking the right pattern

Decision framework based on problem characteristics:

```mermaid
graph TD
    Q1{"Do steps<br/>depend on each other?"}
    Q1 -->|"Yes"| Q2{"Need iterative<br/>quality?"}
    Q1 -->|"No"| Q3{"Need all<br/>results?"}
    Q2 -->|"Yes"| P6["Evaluator-Optimizer"]
    Q2 -->|"No"| Q4{"More than<br/>10 agents?"}
    Q4 -->|"Yes"| P5["Hierarchical"]
    Q4 -->|"No"| Q5{"Need dynamic<br/>delegation?"}
    Q5 -->|"Yes"| P2["Supervisor"]
    Q5 -->|"No"| P1["Sequential Pipeline"]
    Q3 -->|"Yes"| P3["Fan-Out/Fan-In"]
    Q3 -->|"No"| P4["Router"]
    style P1 fill:#4CAF50,stroke:#fff,color:#fff
    style P2 fill:#4CAF50,stroke:#fff,color:#fff
    style P3 fill:#4CAF50,stroke:#fff,color:#fff
    style P4 fill:#4CAF50,stroke:#fff,color:#fff
    style P5 fill:#4CAF50,stroke:#fff,color:#fff
    style P6 fill:#4CAF50,stroke:#fff,color:#fff
    style Q1 fill:#e94560,stroke:#fff,color:#fff
    style Q2 fill:#0f3460,stroke:#fff,color:#fff
    style Q3 fill:#0f3460,stroke:#fff,color:#fff
    style Q4 fill:#0f3460,stroke:#fff,color:#fff
    style Q5 fill:#0f3460,stroke:#fff,color:#fff

```

Figure 7: Decision tree for picking an orchestration pattern
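
The same decision tree can be written as a small function, which is handy for documenting the choice in code or in a design review. This is only a sketch of the figure: `pick_pattern` and its parameter names are made up for illustration, and the inputs are judgment calls you supply yourself.

```python
def pick_pattern(
    steps_depend_on_each_other: bool,
    needs_iterative_quality: bool = False,
    agent_count: int = 1,
    needs_dynamic_delegation: bool = False,
    needs_all_results: bool = False,
) -> str:
    """Walk the decision tree from Figure 7 and return the pattern name."""
    if steps_depend_on_each_other:
        if needs_iterative_quality:
            return "Evaluator-Optimizer"
        if agent_count > 10:
            return "Hierarchical"
        if needs_dynamic_delegation:
            return "Supervisor"
        return "Sequential Pipeline"
    # Independent steps: either every branch's result is needed, or only one.
    return "Fan-Out/Fan-In" if needs_all_results else "Router"


if __name__ == "__main__":
    # A fixed 4-stage workflow with no quality loop and no dynamic delegation.
    print(pick_pattern(steps_depend_on_each_other=True, agent_count=4))
    # Independent requests where only one specialist needs to answer.
    print(pick_pattern(steps_depend_on_each_other=False))
```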

## 12. Production best practices

### 12.1. Observability is mandatory

Multi-agent systems are many times harder to debug than single agents. Minimum requirements:

- **Distributed tracing**: each agent call is a span, trace the entire chain
- **Per-agent token counting**: know which agent is spending the most
- **Latency breakdown**: P50, P95, P99 per stage
- **Per-agent error rate**: isolate faulty agents
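
As a minimal illustration of these four requirements, the sketch below records one span per agent call using only the Python standard library. `Tracer`, `AgentStats`, and the `tokens` field are hypothetical names; a real deployment would more likely export spans to a dedicated tracing backend instead of keeping them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from statistics import quantiles


@dataclass
class AgentStats:
    latencies_ms: list[float] = field(default_factory=list)
    tokens: int = 0
    calls: int = 0
    errors: int = 0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0

    def percentile(self, p: int) -> float:
        """Approximate p-th percentile latency in ms, for p between 1 and 99."""
        if len(self.latencies_ms) < 2:
            return self.latencies_ms[0] if self.latencies_ms else 0.0
        return quantiles(self.latencies_ms, n=100, method="inclusive")[p - 1]


class Tracer:
    def __init__(self) -> None:
        self.stats: dict[str, AgentStats] = defaultdict(AgentStats)

    @contextmanager
    def span(self, agent: str):
        """Time one agent call; the caller fills in record['tokens']."""
        record = {"tokens": 0}
        stats = self.stats[agent]
        stats.calls += 1
        start = time.perf_counter()
        try:
            yield record
        except Exception:
            stats.errors += 1
            raise
        finally:
            stats.latencies_ms.append((time.perf_counter() - start) * 1000)
            stats.tokens += record["tokens"]


if __name__ == "__main__":
    tracer = Tracer()
    for tokens in (900, 1200, 1500):
        with tracer.span("researcher") as span:
            span["tokens"] = tokens  # would come from the model response's usage data
    stats = tracer.stats["researcher"]
    print(stats.calls, stats.tokens, stats.error_rate, round(stats.percentile(95), 3))
```

Keying everything by agent name is what makes the per-agent breakdown possible: the agent that spends the most tokens or fails most often shows up directly in `tracer.stats`.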

### 12.2. Cost controls

```typescript
// Configure budget limits for a multi-agent system
const orchestratorConfig = {
  maxTokensPerRequest: 100_000,
  maxAgentCalls: 10,
  timeoutMs: 30_000,
  costLimitPerRequest: 0.50, // USD
  fallbackBehavior: 'return_partial' // or 'error'
};

```
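
A config like this only helps if something enforces it. The Python sketch below shows one possible way to enforce the token, call-count, and cost limits, with the same defaults as the config above. `RequestBudget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, and each step is assumed to report its own token count and cost.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request has gone over one of its limits."""


@dataclass
class RequestBudget:
    max_tokens: int = 100_000
    max_agent_calls: int = 10
    cost_limit_usd: float = 0.50
    tokens_used: int = 0
    agent_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Record one finished agent call, then fail if any limit is now exceeded."""
        self.tokens_used += tokens
        self.agent_calls += 1
        self.cost_usd += cost_usd
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token limit exceeded")
        if self.agent_calls > self.max_agent_calls:
            raise BudgetExceeded("agent call limit exceeded")
        if self.cost_usd > self.cost_limit_usd:
            raise BudgetExceeded("cost limit exceeded")


def run_with_budget(steps, budget: RequestBudget, fallback: str = "return_partial"):
    """Run (name, fn) steps in order; each fn returns (output, tokens, cost_usd)."""
    results = {}
    for name, fn in steps:
        output, tokens, cost = fn()
        try:
            # Note: the over-budget call has already been paid for at this point.
            budget.charge(tokens, cost)
        except BudgetExceeded:
            if fallback == "return_partial":
                return results  # outputs from the steps that stayed within budget
            raise
        results[name] = output
    return results


if __name__ == "__main__":
    steps = [(f"agent_{i}", lambda i=i: (f"out_{i}", 30_000, 0.10)) for i in range(5)]
    print(run_with_budget(steps, RequestBudget()))
```

This version checks limits after each call, which keeps it simple but lets the final call overshoot. A stricter variant would estimate tokens and cost before calling the agent.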

### 12.3. Graceful degradation

When a single agent in the chain fails, the whole system shouldn't crash. Strategies:

- **Timeout + fallback**: agent doesn't respond in N seconds → use cached result or simpler model
- **Circuit breaker**: agent fails 3 times in a row → bypass and log
- **Partial results**: return completed agents' output instead of failing the whole run
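
The first two strategies can be combined in a few lines of `asyncio`. This is a sketch under simplifying assumptions: `CircuitBreaker` and `call_with_fallback` are hypothetical names, agents are plain async callables, and the breaker never resets on its own.

```python
import asyncio


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; an open breaker skips the agent."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.threshold

    def record(self, success: bool) -> None:
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1


async def call_with_fallback(agent, fallback, breaker: CircuitBreaker, timeout_s: float):
    """Try `agent()`; on timeout, error, or an open breaker, return `fallback()`."""
    if breaker.is_open:
        return await fallback()  # bypass the failing agent entirely
    try:
        result = await asyncio.wait_for(agent(), timeout=timeout_s)
    except Exception:  # includes the timeout raised by asyncio.wait_for
        breaker.record(success=False)
        return await fallback()
    breaker.record(success=True)
    return result


async def main() -> None:
    async def slow_agent() -> str:
        await asyncio.sleep(1)  # simulates an agent that never answers in time
        return "fresh answer"

    async def cached_answer() -> str:
        return "cached answer"

    breaker = CircuitBreaker(threshold=3)
    for _ in range(4):
        print(await call_with_fallback(slow_agent, cached_answer, breaker, timeout_s=0.01))
    print("breaker open:", breaker.is_open)


if __name__ == "__main__":
    asyncio.run(main())
```

A production breaker would usually add a cool-down after which the agent is retried, plus logging on every bypass.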

#### Where to start?

If you're new to multi-agent systems, implement a **Sequential Pipeline** for your clearest workflow (3–4 stages). Validate output quality, then measure latency and cost. Consider moving to Supervisor or Router only once you have concrete data on the bottleneck. Don't start with Hierarchical: 90% of the time you don't need it.

**References:**

- [AI Agent Orchestration Patterns — Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns)
- [6 Multi-Agent Orchestration Patterns for Production — Beam AI](https://beam.ai/agentic-insights/multi-agent-orchestration-patterns-production)
- [Claude Agents SDK vs OpenAI Agents SDK vs Google ADK — Composio](https://composio.dev/content/claude-agents-sdk-vs-openai-agents-sdk-vs-google-adk)
- [MCP vs A2A: Complete Guide to AI Agent Protocols 2026 — DEV Community](https://dev.to/pockit_tools/mcp-vs-a2a-the-complete-guide-to-ai-agent-protocols-in-2026-30li)
- [Multi Agent Architecture: Patterns, Use Cases & Production Reality — TrueFoundry](https://www.truefoundry.com/blog/multi-agent-architecture)

