# AI Agent Orchestration: 6 Agent Coordination Patterns in Production 2026

Posted on: 4/21/2026 1:10:44 AM

Table of contents

1. Why orchestration is make-or-break
   - The golden rule
2. Pattern 1: Sequential Pipeline
   - 2.1. Characteristics
   - 2.2. When to use it
3. Pattern 2: Supervisor (Orchestrator-Worker)
   - 3.1. Trade-offs
   - 3.2. Immutable State Pattern
   - A common mistake
4. Pattern 3: Parallel Fan-Out / Fan-In
   - 4.1. Cost and latency
   - 4.2. A practical example
5. Pattern 4: Router (Intelligent Dispatch)
   - 5.1. Why is the Router the cheapest?
   - Tip: which model should the router use?
6. Pattern 5: Hierarchical (Multi-Level)
   - 6.1. When do you need Hierarchical?
   - Warning: cumulative overhead
7. Pattern 6: Evaluator-Optimizer Loop
   - 7.1. Implementation key points
   - 7.2. Limiting iterations
8. Side-by-side comparison of the 6 patterns
9. Protocols for connecting agents: MCP vs A2A
   - MCP and A2A complement each other, they are not substitutes
10. SDKs & frameworks for production in 2026
11. How to choose the right pattern
12. Best practices for production

When an AI system moves from "one model, one prompt" to a multi-agent architecture, the biggest question is no longer *which model is strongest* but **how to coordinate the agents effectively**. According to an Anthropic analysis of 200+ enterprise deployments, 57% of failed projects trace their root cause to orchestration design: the individual agents were capable enough, but they coordinated poorly.

This article digs into the 6 core orchestration patterns used in production and analyzes their real trade-offs in cost, latency, and complexity so you can pick the right pattern for your problem.

- **57%**: failed AI projects attributed to orchestration
- **40%**: multi-agent pilots that fail within 6 months
- **30-60%**: cost reduction with the Router pattern
- **6**: core patterns for production

## 1. Why orchestration is make-or-break

A single agent is good enough for simple tasks: writing an email, summarizing a document, generating a code snippet. But once the problem gets harder (analyzing data from multiple sources, automating a process with many dependent steps, or handling varied input), you need several specialized agents working together.

The catch: **adding agents does not automatically add value**. Without a clear orchestration pattern you will run into:

- **Race conditions**: agents overwrite each other's state
- **Infinite loops**: agent A calls agent B, and B calls A back
- **Cost explosion**: every agent call consumes tokens; left unchecked, spend can reach 10-50x the expected cost
- **Quality degradation**: downstream agents receive poor-quality input from upstream agents

#### The golden rule

Always start with the simplest pattern that can solve the problem. Move to a more complex pattern only when you have concrete evidence (metrics) that the current one falls short. Premature orchestration complexity is the most common anti-pattern in production.

## 2. Pattern 1: Sequential Pipeline

This is the simplest pattern and should be your first choice. Agents run one after another in a fixed chain, and each agent's output becomes the next agent's input.

```mermaid
graph LR
    A["Input"] --> B["Agent 1<br/>Extract"]
    B --> C["Agent 2<br/>Transform"]
    C --> D["Agent 3<br/>Validate"]
    D --> E["Agent 4<br/>Output"]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#0f3460,stroke:#fff,color:#fff
    style C fill:#0f3460,stroke:#fff,color:#fff
    style D fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#4CAF50,stroke:#fff,color:#fff
```

Figure 1: Sequential Pipeline, a linear and deterministic processing flow

### 2.1. Characteristics

- **Fixed order**: defined at design time and unchanged at runtime
- **State passed through a shared object**: every agent reads from and writes to the same state container
- **Easy to debug**: you know exactly which step failed
- **Linear latency**: total latency = the sum of all agents' latencies

### 2.2. When to use it

A pipeline fits when:

- Each step depends on the output of the previous step
- The workflow has 3-5 clearly defined steps
- Parallel processing is not needed
- Reliability and debuggability matter more than speed

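```python
# Example: a code-review analysis pipeline.
# NOTE: `Agent` and `Pipeline` are illustrative abstractions; check the SDK
# reference for the names the installed version actually exports.
from claude_agent_sdk import Agent, Pipeline

pipeline = Pipeline([
    Agent("code_reader", model="haiku",
          system="Read the code and list the main changes"),
    Agent("security_scanner", model="sonnet",
          system="Analyze security vulnerabilities in the diff"),
    Agent("style_checker", model="haiku",
          system="Check coding style and conventions"),
    Agent("summarizer", model="sonnet",
          system="Combine the findings into a review comment")
])


async def review(code_diff: str):
    # `await` must run inside a coroutine, e.g. asyncio.run(review(diff))
    return await pipeline.run(code_diff)
```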
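To make the "easy to debug" property concrete, here is a minimal, framework-free sketch of a sequential pipeline. It is not the SDK's implementation: `run_pipeline`, `StepFailed`, and the three stub steps are hypothetical names, and each stub stands in for what would be a model call in a real system.

```python
import asyncio
from typing import Awaitable, Callable

# A "step" is any async function from text to text. In a real system each
# step would wrap a model call; here they are local stubs (assumption).
Step = Callable[[str], Awaitable[str]]


class StepFailed(RuntimeError):
    """Carries the name of the step that failed, for easier debugging."""

    def __init__(self, step_name: str, cause: Exception):
        super().__init__(f"step '{step_name}' failed: {cause}")
        self.step_name = step_name


async def run_pipeline(steps: list[tuple[str, Step]], payload: str) -> str:
    """Run steps in order; each step's output is the next step's input."""
    for name, step in steps:
        try:
            payload = await step(payload)
        except Exception as exc:
            raise StepFailed(name, exc) from exc
    return payload


async def extract(text: str) -> str:
    return text.strip()


async def transform(text: str) -> str:
    return text.upper()


async def validate(text: str) -> str:
    if not text:
        raise ValueError("empty payload")
    return text


STEPS = [("extract", extract), ("transform", transform), ("validate", validate)]
```

Calling `asyncio.run(run_pipeline(STEPS, "  fix login bug "))` walks the three steps in order. A blank input fails in `validate`, and the raised `StepFailed` names that step.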

## 3. Pattern 2: Supervisor (Orchestrator-Worker)

The most common pattern in enterprise settings. A "smart" agent (the supervisor) receives the task, splits it into sub-tasks, delegates them to specialized worker agents, and synthesizes the results.

```mermaid
graph TD
    A["Task Input"] --> S["Supervisor Agent<br/>(Opus/Sonnet)"]
    S --> W1["Worker 1<br/>Research<br/>(Haiku)"]
    S --> W2["Worker 2<br/>Analysis<br/>(Sonnet)"]
    S --> W3["Worker 3<br/>Code Gen<br/>(Sonnet)"]
    W1 --> S
    W2 --> S
    W3 --> S
    S --> R["Final Result"]
    style S fill:#e94560,stroke:#fff,color:#fff
    style W1 fill:#0f3460,stroke:#fff,color:#fff
    style W2 fill:#0f3460,stroke:#fff,color:#fff
    style W3 fill:#0f3460,stroke:#fff,color:#fff
    style R fill:#4CAF50,stroke:#fff,color:#fff
```

Figure 2: Supervisor Pattern, where the orchestrator decomposes the task and delegates to workers

### 3.1. Trade-offs

| Criterion | Pros | Cons |
| --- | --- | --- |
| **Cost** | Workers run on a cheap model (Haiku), saving 40-60% | Supervisor reasoning adds 20-40% overhead |
| **Quality** | A strong supervisor model ensures good synthesis | Becomes a bottleneck if the supervisor misreads the request |
| **Scalability** | Easy to add new workers | The supervisor is a single point of failure |
| **Debugging** | Each worker has a clear scope | Supervisor-worker interactions are hard to trace |

### 3.2. Immutable State Pattern

The most important best practice when implementing a Supervisor is to use **immutable state snapshots**. Each agent receives state version N, does its work, and returns state version N+1. No agent mutates state directly.

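```typescript
// Immutable state pattern (plain TypeScript; no SDK calls are needed)

// Minimal action shape assumed by this example: `key` names the slot in
// `data` that receives the agent's result.
interface AgentAction {
  readonly key: string;
}

interface AgentState {
  readonly version: number;
  readonly data: Record<string, unknown>;
  readonly history: ReadonlyArray<AgentAction>;
}

// Builds version N+1 from version N without touching the original object.
function createNextState(
  current: AgentState,
  action: AgentAction,
  result: unknown
): AgentState {
  return {
    version: current.version + 1,
    data: { ...current.data, [action.key]: result },
    history: [...current.history, action]
  };
}
```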

#### A common mistake

Do not let the supervisor agent decide *how many* workers to use; that leads to cost explosion. Instead, define the list of workers up front and let the supervisor choose only *which* of them to activate for a given task.
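One way to enforce this is to treat the supervisor's choice as a request that gets filtered against a fixed registry. The sketch below assumes a hypothetical `WORKERS` table with stub workers; `select_workers` and `run_supervised` are illustrative names, not part of any SDK.

```python
# Hypothetical worker registry: the set of workers is fixed at design time.
WORKERS = {
    "research": lambda task: f"research notes for: {task}",
    "analysis": lambda task: f"analysis of: {task}",
    "codegen": lambda task: f"code for: {task}",
}


def select_workers(requested: list[str], max_workers: int = 3) -> list[str]:
    """Keep only known workers, drop duplicates, and cap the count."""
    selected: list[str] = []
    for name in requested:
        if name in WORKERS and name not in selected:
            selected.append(name)
    return selected[:max_workers]


def run_supervised(task: str, supervisor_choice: list[str]) -> dict[str, str]:
    """Run the approved workers and return their results keyed by worker name."""
    return {name: WORKERS[name](task) for name in select_workers(supervisor_choice)}
```

Because unknown names are dropped and the count is capped, a supervisor that asks for a non-existent worker or repeats one cannot inflate the number of calls.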

## 4. Pattern 3: Parallel Fan-Out / Fan-In

When a problem can be split into independent parts that run in parallel, Fan-Out cuts latency significantly. A dispatcher sends work to N agents at once, and an aggregator collects and combines the results.

```mermaid
graph TD
    I["Input"] --> D["Dispatcher"]
    D -->|"Chunk 1"| A1["Agent A"]
    D -->|"Chunk 2"| A2["Agent B"]
    D -->|"Chunk 3"| A3["Agent C"]
    D -->|"Chunk 4"| A4["Agent D"]
    A1 --> AG["Aggregator"]
    A2 --> AG
    A3 --> AG
    A4 --> AG
    AG --> O["Output"]
    style D fill:#e94560,stroke:#fff,color:#fff
    style AG fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#0f3460,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#0f3460,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff
```

Figure 3: Fan-Out/Fan-In, with parallel processing and a final aggregation step

### 4.1. Cost and latency

Fan-Out **multiplies cost** by the number of agents (N agents = ~N× cost), but **latency is only that of the slowest agent** plus aggregation overhead. The pattern is worth using only when:

- Latency is the top priority (user-facing, real-time)
- The chunks are truly independent (no shared dependencies)
- The budget can absorb roughly N times the cost
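The trade-off above can be put into numbers with a small estimator. This is a back-of-the-envelope sketch: `fan_out_estimate` and its parameters are hypothetical, and it assumes a flat per-agent cost, which real token-based pricing only approximates.

```python
def fan_out_estimate(
    agent_latencies_s: list[float],
    cost_per_agent_usd: float,
    aggregation_latency_s: float = 0.0,
    aggregation_cost_usd: float = 0.0,
) -> dict[str, float]:
    """Estimate latency (sequential vs. parallel) and total cost for N agents."""
    n = len(agent_latencies_s)
    return {
        # Running the agents one by one: latencies add up.
        "sequential_latency_s": sum(agent_latencies_s) + aggregation_latency_s,
        # Running them at once: only the slowest agent matters.
        "parallel_latency_s": max(agent_latencies_s, default=0.0) + aggregation_latency_s,
        # Cost scales with the number of agents, plus the aggregation call.
        "total_cost_usd": n * cost_per_agent_usd + aggregation_cost_usd,
    }
```

With four agents taking 2.0, 3.5, 1.0, and 2.5 seconds plus 0.5 seconds of aggregation, the parallel estimate is 4.0 seconds against 9.5 seconds sequentially, while the cost term still grows with N.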

### 4.2. A practical example

```python
# Fan-out: analyze many documents at once.
# NOTE: `Agent` is an illustrative abstraction; check the SDK reference for
# the names the installed version actually exports.
import asyncio
from claude_agent_sdk import Agent

async def analyze_documents(docs: list[str]):
    analyzer = Agent("doc_analyzer", model="haiku",
                     system="Extract the key insights from the document")

    # Fan-out: send every document in parallel
    tasks = [analyzer.run(doc) for doc in docs]
    results = await asyncio.gather(*tasks)

    # Fan-in: synthesize with a stronger model
    synthesizer = Agent("synthesizer", model="sonnet",
                        system="Combine the insights into a report")

    combined_input = "\n---\n".join(
        f"Document {i+1}: {r}" for i, r in enumerate(results)
    )
    # Returns whatever the synthesizer produces (the report), not a dict
    return await synthesizer.run(combined_input)
```

## 5. Pattern 4: Router (Intelligent Dispatch)

The Router pattern classifies the input and forwards it to the best-suited specialized agent. It is the most cost-efficient pattern because **only one agent handles each request**; the router itself only pays for classification.

```mermaid
graph TD
    I["User Request"] --> R["Router Agent<br/>(Haiku - fast classify)"]
    R -->|"Simple Q&A"| A1["FAQ Agent<br/>(Haiku)"]
    R -->|"Code task"| A2["Code Agent<br/>(Sonnet)"]
    R -->|"Complex reasoning"| A3["Reasoning Agent<br/>(Opus)"]
    R -->|"Data analysis"| A4["Analytics Agent<br/>(Sonnet + Tools)"]
    style R fill:#e94560,stroke:#fff,color:#fff
    style A1 fill:#4CAF50,stroke:#fff,color:#fff
    style A2 fill:#0f3460,stroke:#fff,color:#fff
    style A3 fill:#9c27b0,stroke:#fff,color:#fff
    style A4 fill:#0f3460,stroke:#fff,color:#fff
```

Figure 4: Router Pattern, with intelligent dispatch to the best-suited agent

### 5.1. Why is the Router the cheapest?

Suppose the system receives 1000 requests/day with this distribution:

- 60% simple (FAQ) → Haiku: $0.001/request
- 25% medium (code/analysis) → Sonnet: $0.015/request
- 15% complex (reasoning) → Opus: $0.075/request

**Without a router**: Sonnet for everything = 1000 × $0.015 = **$15/day**

**With a router**: (600 × $0.001) + (250 × $0.015) + (150 × $0.075) = $0.60 + $3.75 + $11.25 = $15.60, plus roughly $0.50 of router overhead = **$16.10/day**. That is slightly more than all-Sonnet, but the complex 15% of requests get much better output from Opus. Against an all-Opus baseline, which gives the same quality on complex tasks (1000 × $0.075 = $75/day), routing is far cheaper.

In practice, a router cuts cost by 30-60% when most traffic is simple requests.
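The arithmetic can be reproduced with a few lines of code, which also makes it easy to try other traffic mixes. The prices and shares are the article's illustrative figures. The 0.0005 USD per-request router cost is an assumption chosen because it reproduces the $16.10 total. The function names are hypothetical.

```python
# Per-request prices and traffic mix are the article's illustrative figures.
PRICE_USD = {"haiku": 0.001, "sonnet": 0.015, "opus": 0.075}
TRAFFIC_MIX = {"haiku": 0.60, "sonnet": 0.25, "opus": 0.15}


def daily_cost_single_model(requests: int, model: str) -> float:
    """Cost of sending every request to one model."""
    return requests * PRICE_USD[model]


def daily_cost_routed(requests: int, router_cost_per_request: float) -> float:
    """Cost when each request goes to the tier chosen by the router."""
    routed = sum(requests * share * PRICE_USD[m] for m, share in TRAFFIC_MIX.items())
    return routed + requests * router_cost_per_request


if __name__ == "__main__":
    n = 1000
    # 0.0005 USD per classification is an assumption, not a published price.
    print(f"all-Sonnet: ${daily_cost_single_model(n, 'sonnet'):.2f}")
    print(f"all-Opus:   ${daily_cost_single_model(n, 'opus'):.2f}")
    print(f"routed:     ${daily_cost_routed(n, 0.0005):.2f}")
```

Changing `TRAFFIC_MIX` shows how sensitive the result is to the share of complex requests: the Opus tier dominates the routed total even at 15% of traffic.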

#### Tip: which model should the router use?

The router only needs to classify intent, so use Haiku with structured output (a JSON schema). This adds about 200 ms of latency but saves a lot downstream. If you need higher accuracy, use Sonnet for the router; that is still cheaper than running everything on Opus.
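On the receiving side, the router's structured reply still has to be validated before dispatch. The sketch below assumes a hypothetical reply shape of `{"intent": ..., "confidence": ...}`. The `ROUTES` table, the 0.7 confidence threshold, and the choice to fall back to the strongest tier are all illustrative assumptions.

```python
import json

# Hypothetical route table: intent label -> handler name.
ROUTES = {
    "faq": "faq_agent",
    "code": "code_agent",
    "reasoning": "reasoning_agent",
    "analytics": "analytics_agent",
}
DEFAULT_ROUTE = "reasoning_agent"  # assumption: fall back to the strongest tier


def pick_route(classifier_output: str, min_confidence: float = 0.7) -> str:
    """Map the router model's JSON reply to a handler, with safe fallbacks."""
    try:
        parsed = json.loads(classifier_output)
    except json.JSONDecodeError:
        return DEFAULT_ROUTE  # malformed reply
    if not isinstance(parsed, dict):
        return DEFAULT_ROUTE
    intent = parsed.get("intent")
    confidence = parsed.get("confidence", 0.0)
    if not isinstance(intent, str) or intent not in ROUTES:
        return DEFAULT_ROUTE  # unknown or missing intent
    if not isinstance(confidence, (int, float)) or confidence < min_confidence:
        return DEFAULT_ROUTE  # low-confidence classification
    return ROUTES[intent]
```

Falling back to the strongest tier trades cost for safety. A deployment that cares more about spend could fall back to the cheapest tier instead.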

## 6. Pattern 5: Hierarchical (Multi-Level)

Extends the Supervisor pattern to multiple levels. It suits organization-scale workflows where a single supervisor does not have enough context to manage every worker.

```mermaid
graph TD
    CEO["Strategic Agent<br/>(Opus)"]
    CEO --> M1["Manager: Backend<br/>(Sonnet)"]
    CEO --> M2["Manager: Frontend<br/>(Sonnet)"]
    CEO --> M3["Manager: QA<br/>(Sonnet)"]
    M1 --> W1["API Worker<br/>(Haiku)"]
    M1 --> W2["DB Worker<br/>(Haiku)"]
    M2 --> W3["UI Worker<br/>(Haiku)"]
    M2 --> W4["Style Worker<br/>(Haiku)"]
    M3 --> W5["Test Writer<br/>(Sonnet)"]
    M3 --> W6["Test Runner<br/>(Haiku)"]
    style CEO fill:#e94560,stroke:#fff,color:#fff
    style M1 fill:#0f3460,stroke:#fff,color:#fff
    style M2 fill:#0f3460,stroke:#fff,color:#fff
    style M3 fill:#0f3460,stroke:#fff,color:#fff
    style W1 fill:#4CAF50,stroke:#fff,color:#fff
    style W2 fill:#4CAF50,stroke:#fff,color:#fff
    style W3 fill:#4CAF50,stroke:#fff,color:#fff
    style W4 fill:#4CAF50,stroke:#fff,color:#fff
    style W5 fill:#4CAF50,stroke:#fff,color:#fff
    style W6 fill:#4CAF50,stroke:#fff,color:#fff
```

Figure 5: Hierarchical Pattern, a management hierarchy with several levels of agents

### 6.1. When do you need Hierarchical?

Use this pattern only when:

- There are 10+ specialized workers
- Workers can be grouped into clear domains
- Each domain needs its own coordination logic
- A single supervisor would be overwhelmed by context length

#### Warning: cumulative overhead

Every level of the hierarchy adds latency (supervisor reasoning) and cost (reasoning tokens). A 3-level hierarchy can cost 3-5× as much as a flat supervisor. It is justified only when task complexity truly demands domain separation.

## 7. Pattern 6: Evaluator-Optimizer Loop

This pattern runs iterative refinement: one agent generates output, another evaluates its quality, and if the score is below the threshold the loop repeats. It is especially effective for tasks that need high quality, such as code generation, content writing, or data transformation.

```mermaid
graph TD
    I["Input + Requirements"] --> G["Generator Agent"]
    G --> E["Evaluator Agent"]
    E -->|"Score < threshold"| F["Feedback"]
    F --> G
    E -->|"Score >= threshold"| O["Final Output"]
    style G fill:#0f3460,stroke:#fff,color:#fff
    style E fill:#e94560,stroke:#fff,color:#fff
    style F fill:#ff9800,stroke:#fff,color:#fff
    style O fill:#4CAF50,stroke:#fff,color:#fff
```

Figure 6: Evaluator-Optimizer Loop, iterative refinement for high-quality output

### 7.1. Implementation key points

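```python
# Evaluator-Optimizer loop.
# NOTE: `Agent` is an illustrative abstraction; check the SDK reference for
# the names the installed version actually exports.
import re

from claude_agent_sdk import Agent

SCORE_THRESHOLD = 8


def extract_score(evaluation) -> int:
    """Return the first integer from 1 to 10 in the evaluator's reply, or 0 if none."""
    match = re.search(r"\b(10|[1-9])\b", str(evaluation))
    return int(match.group(1)) if match else 0


async def eval_optimize_loop(task: str, max_iterations: int = 3):
    generator = Agent("generator", model="sonnet",
                      system="Generate solution based on requirements and feedback")
    evaluator = Agent("evaluator", model="opus",
                      system="Score output 1-10, provide specific feedback for improvement")

    output = await generator.run(task)

    for _ in range(max_iterations):
        evaluation = await evaluator.run(
            f"Task: {task}\nOutput: {output}\nScore (1-10) and specific feedback:"
        )
        score = extract_score(evaluation)

        if score >= SCORE_THRESHOLD:
            return output  # Threshold reached

        # Feed the evaluation back to the generator
        output = await generator.run(
            f"Original task: {task}\nPrevious attempt: {output}\n"
            f"Feedback: {evaluation}\nImprove the output:"
        )

    return output  # Max iterations reached; the last revision is not re-scored
```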

### 7.2. Limiting iterations

**Always set a maximum number of iterations** (typically 2-4). Reasons:

- Diminishing returns: iteration 3 and beyond rarely improves the output much
- Linear cost: every iteration costs a full generator call plus a full evaluator call
- Infinite-loop risk: the evaluator may never be satisfied

## 8. Side-by-side comparison of the 6 patterns

| Pattern | Latency | Cost | Complexity | Best for |
| --- | --- | --- | --- | --- |
| **Sequential Pipeline** | High (sum of all) | Low-Medium | Low | ETL, data processing, multi-step validation |
| **Supervisor** | Medium-High | Medium | Medium | Task decomposition, project workflows |
| **Fan-Out/Fan-In** | Low (slowest agent) | High (N×) | Medium | Batch processing, real-time multi-source |
| **Router** | Low | Lowest | Low-Medium | API gateway, customer support, mixed workload |
| **Hierarchical** | High | High | High | Large-scale org workflows, 10+ agents |
| **Evaluator-Optimizer** | Very high (N × iterations) | High | Medium | High-quality generation, code review, content |

## 9. Protocols for connecting agents: MCP vs A2A

In the 2026 multi-agent ecosystem, two main protocols both compete with and complement each other:

| Criterion | MCP (Model Context Protocol) | A2A (Agent-to-Agent) |
| --- | --- | --- |
| **Purpose** | Connects a model to tools and data sources | Lets agents talk directly to other agents |
| **Architecture** | Client-server (host → MCP server) | Peer-to-peer (agent ↔ agent) |
| **Discovery** | Config file, registry | Agent Cards + REST endpoints |
| **Originator** | Anthropic → Linux Foundation | Google → Linux Foundation |
| **Main use cases** | Tool calling, data access, context injection | Cross-org agent delegation, marketplace |
| **Used in patterns** | All (agent ↔ tool) | Hierarchical, Supervisor (agent ↔ agent) |

#### MCP and A2A complement each other, they are not substitutes

MCP handles "vertical integration" (agents connecting to tools, databases, and APIs). A2A handles "horizontal integration" (agents discovering and delegating to one another). A production system typically uses both: MCP so agents can access tools, A2A so agents can communicate across boundaries.

## 10. SDKs & frameworks for production in 2026

Three main frameworks lead the orchestration market:

**Claude Agent SDK (Anthropic)**

Python v0.1.48, TypeScript v0.2.71. Deepest MCP integration, optimized for coding agents. Offers **Claude Managed Agents** for serverless deployment: Anthropic hosts and scales the agents for you.

**OpenAI Agents SDK**

Built-in handoff mechanism, tracing, and guardrails. Tight integration with GPT models. Swarm-inspired architecture for multi-agent setups.

**Google ADK (Agent Development Kit)**

SDKs in 4 languages (Python, TypeScript, Java, Go). Native A2A support. Visual Agent Designer in the Google Cloud Console. ADK agents can invoke LangGraph/CrewAI agents over A2A.

## 11. How to choose the right pattern

A decision framework based on the characteristics of the problem:

```mermaid
graph TD
    Q1{"Do the steps<br/>depend on each other?"}
    Q1 -->|"Yes"| Q2{"Need iterative<br/>quality refinement?"}
    Q1 -->|"No"| Q3{"Need every<br/>result?"}
    Q2 -->|"Yes"| P6["Evaluator-Optimizer"]
    Q2 -->|"No"| Q4{"More than<br/>10 agents?"}
    Q4 -->|"Yes"| P5["Hierarchical"]
    Q4 -->|"No"| Q5{"Need dynamic<br/>delegation?"}
    Q5 -->|"Yes"| P2["Supervisor"]
    Q5 -->|"No"| P1["Sequential Pipeline"]
    Q3 -->|"Yes"| P3["Fan-Out/Fan-In"]
    Q3 -->|"No"| P4["Router"]
    style P1 fill:#4CAF50,stroke:#fff,color:#fff
    style P2 fill:#4CAF50,stroke:#fff,color:#fff
    style P3 fill:#4CAF50,stroke:#fff,color:#fff
    style P4 fill:#4CAF50,stroke:#fff,color:#fff
    style P5 fill:#4CAF50,stroke:#fff,color:#fff
    style P6 fill:#4CAF50,stroke:#fff,color:#fff
    style Q1 fill:#e94560,stroke:#fff,color:#fff
    style Q2 fill:#0f3460,stroke:#fff,color:#fff
    style Q3 fill:#0f3460,stroke:#fff,color:#fff
    style Q4 fill:#0f3460,stroke:#fff,color:#fff
    style Q5 fill:#0f3460,stroke:#fff,color:#fff
```

Figure 7: Decision tree for choosing an orchestration pattern
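The same decision tree can be written as a plain function, which is handy for documenting why a given workflow ended up with a given pattern. The function and parameter names are hypothetical. The branches follow the diagram in the same order.

```python
def choose_pattern(
    steps_depend_on_each_other: bool,
    needs_iterative_quality: bool = False,
    agent_count: int = 1,
    needs_dynamic_delegation: bool = False,
    needs_all_results: bool = False,
) -> str:
    """Walk the decision tree from Figure 7 and return the pattern name."""
    if not steps_depend_on_each_other:
        # Independent work: run everything in parallel, or route to one agent.
        return "Fan-Out/Fan-In" if needs_all_results else "Router"
    if needs_iterative_quality:
        return "Evaluator-Optimizer"
    if agent_count > 10:
        return "Hierarchical"
    if needs_dynamic_delegation:
        return "Supervisor"
    return "Sequential Pipeline"
```

For example, `choose_pattern(True, agent_count=4)` falls through to `"Sequential Pipeline"`, matching the advice to start with the simplest pattern.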

## 12. Best practices for production

### 12.1. Observability is mandatory

A multi-agent system is many times harder to debug than a single agent. At a minimum you need:

- **Distributed tracing**: every agent call is a span, and the whole chain is traced
- **Token counting per agent**: know which agent consumes the most
- **Latency breakdown**: P50, P95, and P99 for each stage
- **Error rate per agent**: isolate the agent that causes failures
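As a starting point before wiring in a tracing backend, the per-agent numbers above can be collected in memory. This is a minimal sketch with a hypothetical `AgentMetrics` class. It uses the nearest-rank method for percentiles and does not cover distributed tracing.

```python
import math
from collections import defaultdict


class AgentMetrics:
    """In-memory per-agent counters; a real system would export these to a tracing backend."""

    def __init__(self) -> None:
        self.latencies_ms: dict[str, list[float]] = defaultdict(list)
        self.tokens: dict[str, int] = defaultdict(int)
        self.calls: dict[str, int] = defaultdict(int)
        self.errors: dict[str, int] = defaultdict(int)

    def record(self, agent: str, latency_ms: float, tokens: int, ok: bool = True) -> None:
        """Record one agent call."""
        self.calls[agent] += 1
        self.latencies_ms[agent].append(latency_ms)
        self.tokens[agent] += tokens
        if not ok:
            self.errors[agent] += 1

    def percentile(self, agent: str, pct: float) -> float:
        """Nearest-rank percentile (pct between 0 and 100) of recorded latencies."""
        values = sorted(self.latencies_ms[agent])
        if not values:
            raise ValueError(f"no samples for {agent}")
        rank = max(1, math.ceil(pct / 100 * len(values)))
        return values[rank - 1]

    def error_rate(self, agent: str) -> float:
        """Share of recorded calls that failed; 0.0 if the agent was never called."""
        return self.errors[agent] / self.calls[agent] if self.calls[agent] else 0.0
```

Typical use is to call `record(...)` around every agent invocation and read `percentile(name, 95)` and `error_rate(name)` per stage.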

### 12.2. Cost Controls

```typescript
// Budget limits for a multi-agent system
const orchestratorConfig = {
  maxTokensPerRequest: 100_000,
  maxAgentCalls: 10,
  timeoutMs: 30_000,
  costLimitPerRequest: 0.50, // USD
  fallbackBehavior: 'return_partial' // or 'error'
};
```
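A config like this only helps if something checks it on every call. The sketch below shows one possible enforcement loop in Python. The field names mirror the config above, while `RequestBudget`, `BudgetExceeded`, and the check-then-charge protocol are assumptions rather than an existing API.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a request crosses one of its configured limits."""


@dataclass
class RequestBudget:
    # Limits mirror the config above; the enforcement logic is an assumption.
    max_tokens: int = 100_000
    max_agent_calls: int = 10
    timeout_s: float = 30.0
    cost_limit_usd: float = 0.50
    tokens_used: int = 0
    agent_calls: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def check_before_call(self) -> None:
        """Call before each agent invocation."""
        if self.agent_calls >= self.max_agent_calls:
            raise BudgetExceeded("agent call limit reached")
        if time.monotonic() - self.started_at > self.timeout_s:
            raise BudgetExceeded("request timed out")

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Call after each agent invocation with its actual usage."""
        self.agent_calls += 1
        self.tokens_used += tokens
        self.cost_usd += cost_usd
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token limit exceeded")
        if self.cost_usd > self.cost_limit_usd:
            raise BudgetExceeded("cost limit exceeded")
```

The orchestrator would catch `BudgetExceeded` and then either return partial results or surface an error, matching the `fallbackBehavior` setting.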

### 12.3. Graceful Degradation

When one agent in the chain fails, the whole system should not crash. Strategies:

- **Timeout + fallback**: if an agent does not respond within N seconds, use a cached result or a simpler model
- **Circuit breaker**: if an agent fails 3 times in a row, bypass it and log the event
- **Partial results**: return the results of the agents that did complete instead of failing the whole request
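The three strategies can be combined in a few dozen lines of `asyncio`. This is a sketch under stated assumptions: `CircuitBreaker`, `call_with_fallback`, and `gather_partial` are hypothetical helpers, agents are modeled as plain async functions, and logging is omitted.

```python
import asyncio
from typing import Awaitable, Callable


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


async def call_with_fallback(
    agent: Callable[[str], Awaitable[str]],
    payload: str,
    breaker: CircuitBreaker,
    timeout_s: float,
    fallback: str,
) -> str:
    """Return the agent's answer, or `fallback` on timeout, error, or open circuit."""
    if breaker.is_open:
        return fallback  # bypass the failing agent entirely
    try:
        result = await asyncio.wait_for(agent(payload), timeout=timeout_s)
    except Exception:  # includes the timeout raised by wait_for
        breaker.record(ok=False)
        return fallback
    breaker.record(ok=True)
    return result


async def gather_partial(calls: dict[str, Awaitable[str]]) -> dict[str, str]:
    """Run all calls concurrently and keep only the ones that succeeded."""
    results = await asyncio.gather(*calls.values(), return_exceptions=True)
    return {name: r for name, r in zip(calls, results) if not isinstance(r, BaseException)}
```

This breaker never closes again on its own once open. A production version would usually add a cool-down period after which one trial call is allowed through.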

#### Where to start?

If you are new to multi-agent systems, implement a **Sequential Pipeline** for your most clearly defined workflow (3-4 stages). Validate output quality, and measure latency and cost. Only then consider moving to Supervisor or Router, once you have concrete data about the bottleneck. Do not start with Hierarchical; in 90% of cases you do not need it.

**References:**


- [AI Agent Orchestration Patterns — Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns)
- [6 Multi-Agent Orchestration Patterns for Production — Beam AI](https://beam.ai/agentic-insights/multi-agent-orchestration-patterns-production)
- [Claude Agents SDK vs OpenAI Agents SDK vs Google ADK — Composio](https://composio.dev/content/claude-agents-sdk-vs-openai-agents-sdk-vs-google-adk)
- [MCP vs A2A: Complete Guide to AI Agent Protocols 2026 — DEV Community](https://dev.to/pockit_tools/mcp-vs-a2a-the-complete-guide-to-ai-agent-protocols-in-2026-30li)
- [Multi Agent Architecture: Patterns, Use Cases & Production Reality — TrueFoundry](https://www.truefoundry.com/blog/multi-agent-architecture)


Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.