LangGraph — Orchestrating Complex AI Agents with Graph Architecture
Posted on: 5/8/2026 10:15:02 AM
As AI Agents grow beyond simple chatbots into complex, multi-step, multi-tool automation systems, the critical question shifts from "which LLM to use" to "how to orchestrate agent workflows reliably in production." LangGraph, LangChain's graph-based framework, has emerged as the leading answer: it models entire agent workflows as stateful directed graphs with built-in persistence, human-in-the-loop controls, and multi-agent orchestration from the ground up.
This article takes a deep dive into LangGraph's core architecture, essential patterns for building production-ready AI Agents, and a practical comparison with competing frameworks like CrewAI and AutoGen.
1. What is LangGraph?
LangGraph is a low-level orchestration framework for building, managing, and deploying stateful, long-running AI Agents. Instead of linear pipelines like traditional LangChain, LangGraph models workflows as directed graphs that may contain cycles, enabling loops, conditional branching, and checkpointing at every node.
This solves a fundamental problem: real-world agents don't run sequentially from A to Z. They need to retry when results are unsatisfactory, branch based on feedback, pause for human approval, and recover from failures. LangGraph is designed precisely for these requirements.
LangGraph ≠ LangChain
LangGraph is an independent library that can be used without LangChain. It focuses on the orchestration layer — managing execution flow — while LangChain provides abstractions for LLM calls, prompt templates, and tool integrations. Many production deployments use only LangGraph + direct LLM SDK, skipping LangChain entirely.
2. Core Architecture — State, Node, Edge
Everything in LangGraph revolves around three concepts: State (shared data), Node (processing unit), and Edge (conditional flow). These three components form a StateGraph — a graph with typed state that gets incrementally updated through each node.
graph TD
    START(["__start__"]) --> A["Agent Node (LLM reasoning)"]
    A -->|"tool_calls detected"| B["Tool Node (execute tools)"]
    A -->|"no tool_calls"| END(["__end__"])
    B --> A
    style START fill:#4CAF50,stroke:#fff,color:#fff
    style END fill:#e94560,stroke:#fff,color:#fff
    style A fill:#2c3e50,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 1: Basic ReAct Agent loop — Agent reasons, calls tools, receives results, repeats until done
2.1. State — Shared Data
State in LangGraph is a typed dictionary representing the current snapshot of the entire workflow. Each node receives state as input, processes it, and returns a partial update — LangGraph automatically merges it into the shared state instead of overwriting everything.
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    next_action: str
    iteration_count: int
The reducer mechanism (like add_messages) defines how state gets merged: append to lists, accumulate numbers, or apply custom logic. This is the foundation for parallel node execution without conflicts — each node updates its own fields, and reducers merge the results.
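To make the merge semantics concrete, here is a tiny pure-Python sketch of how reducer-based merging behaves. This is an illustration of the concept only, not LangGraph's internals; the `REDUCERS` table and `merge_update` helper are invented for this example.

```python
import operator

# Toy illustration of reducer-based merging, NOT LangGraph internals:
# each state key may declare a reducer; node updates are merged through it
# instead of overwriting the whole state.
REDUCERS = {
    "messages": operator.add,                       # append to the list
    "iteration_count": lambda old, new: old + new,  # accumulate
    # "next_action" has no reducer -> last write wins
}

def merge_update(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged

state = {"messages": ["hi"], "next_action": "", "iteration_count": 0}
state = merge_update(state, {"messages": ["hello!"], "iteration_count": 1, "next_action": "tools"})
# state["messages"] is now ["hi", "hello!"]; iteration_count is 1
```

Because each node returns only the keys it changed, two nodes running in parallel can both append to `messages` without clobbering each other's writes.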
2.2. Node — Processing Unit
A Node is a Python function (or TypeScript function) that receives state and returns a partial state update. Nodes can be:
- LLM call: send messages to a model, receive response
- Tool execution: run functions/APIs based on tool_calls from the LLM
- Pure logic: transform data, validate, filter
- Subgraph: a nested StateGraph that runs as a single node
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
def agent_node(state: AgentState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}

def tool_node(state: AgentState):
    last_message = state["messages"][-1]
    # execute_tools is a placeholder for your own tool-dispatch helper
    results = execute_tools(last_message.tool_calls)
    return {"messages": results}
2.3. Edge — Conditional Flow
Edges connect nodes and determine which node runs next. LangGraph supports two types:
- Normal edge: always goes from node A to node B
- Conditional edge: a function that receives state and returns the next node name
from langgraph.graph import StateGraph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges(
    "agent",
    should_continue,  # function returns "tools" or "__end__"
    {"tools": "tools", "__end__": "__end__"}
)
graph.add_edge("tools", "agent") # after tool execution, return to agent
Conditional edges are the heart of LangGraph — they let agents self-determine the execution flow based on current state, enabling complex workflows that linear pipelines simply cannot express.
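The `should_continue` router referenced in the comment above can be sketched as a plain function. The `Message` dataclass below is a stand-in for a LangChain `AIMessage`, invented here so the sketch is self-contained:

```python
from dataclasses import dataclass, field

# Minimal stand-in for a LangChain AIMessage; only the tool_calls
# attribute matters for routing.
@dataclass
class Message:
    content: str
    tool_calls: list = field(default_factory=list)

def should_continue(state: dict) -> str:
    last = state["messages"][-1]
    # Keep looping through the tool node while the LLM requests tools,
    # otherwise end the run (the ReAct loop from Figure 1).
    return "tools" if last.tool_calls else "__end__"
```

The routing function sees the full state, so the same pattern extends to guards like "stop after N iterations" by also checking `state["iteration_count"]`.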
3. Persistence & Checkpointer
3.1. Why Persistence Matters
Production agents don't finish in a single request. They may need to wait hours for approval, get interrupted by deployments, or crash mid-execution. Without persistence, all progress is lost. LangGraph solves this with Checkpointers — automatically saving state after every node execution.
graph LR
    A["Node A executes"] -->|"save state"| CP[("Checkpointer PostgreSQL / Redis")]
    CP -->|"load state"| B["Node B executes"]
    B -->|"save state"| CP
    CP -->|"crash recovery"| B
    style A fill:#2c3e50,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style CP fill:#e94560,stroke:#fff,color:#fff
Figure 2: Checkpointer saves state after each node — enabling resume after crash or restart
3.2. Checkpointer Types
| Checkpointer | Use When | Characteristics |
|---|---|---|
| MemorySaver | Development, testing | In-memory, lost on restart |
| SqliteSaver | Single-process, prototype | File-based, simple |
| PostgresSaver | Production | Multi-process, durable, scales well |
| RedisSaver | High-throughput production | In-memory + persistence, TTL support |
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://user:pass@localhost:5432/langgraph_db"
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create tables if needed
    app = graph.compile(checkpointer=checkpointer)

    # Each thread_id is a separate conversation/workflow instance
    config = {"configurable": {"thread_id": "order-processing-42"}}
    result = app.invoke(initial_state, config)
Each thread_id represents a workflow instance. You can resume any thread by invoking with the same thread_id — state will be loaded from the last checkpoint.
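The thread-scoped resume behavior can be illustrated with a toy, dictionary-backed checkpointer. This is a conceptual stand-in, not the real `langgraph.checkpoint` API:

```python
# Toy stand-in for a checkpointer: one saved snapshot per thread_id, so a
# later invoke with the same thread_id resumes from the last snapshot.
# The real API also versions checkpoints and stores pending writes.
class ToyCheckpointer:
    def __init__(self):
        self._snapshots = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._snapshots[thread_id] = dict(state)

    def load(self, thread_id: str, default=None) -> dict:
        return self._snapshots.get(thread_id, dict(default or {}))

cp = ToyCheckpointer()
cp.save("order-processing-42", {"iteration_count": 3})
# After a crash or restart, the same thread picks up where it left off:
resumed = cp.load("order-processing-42")
# resumed == {"iteration_count": 3}
```

Keying everything on `thread_id` is also what isolates concurrent conversations from each other: two users invoking the same graph with different thread ids never share state.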
4. Human-in-the-Loop
4.1. The Interrupt Mechanism
One of LangGraph's most powerful features is interrupt — pausing a workflow at any node, waiting for human input (potentially hours or days later), then resuming exactly where it stopped.
Real-world use case
A refund processing agent: it automatically analyzes the request, checks order history, calculates the amount — but pauses for manager approval before actually transferring funds. Without interrupt, you'd have to build state persistence, queuing, and polling yourself — LangGraph handles it all.
from langgraph.types import interrupt, Command
def approval_node(state: AgentState):
    # Pause workflow, send info to human
    decision = interrupt({
        "question": "Approve refund of $150 for order #42?",
        "options": ["approve", "reject", "escalate"]
    })
    if decision == "approve":
        return Command(goto="process_refund")
    elif decision == "reject":
        return Command(goto="notify_customer_rejected")
    else:
        return Command(goto="escalate_to_senior")

# Resume workflow after human decision
app.invoke(
    Command(resume="approve"),
    config={"configurable": {"thread_id": "refund-request-42"}}
)
When interrupt() is called, LangGraph saves the entire state to the checkpointer, marks the thread as interrupted, and returns control to the caller. When the human sends their decision via Command(resume=...), the workflow continues exactly from the line after interrupt().
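This pause-and-resume control flow behaves much like a Python generator: `yield` plays the role of `interrupt()`, and `send()` plays the role of `Command(resume=...)`. The sketch below is an analogy, not LangGraph's implementation:

```python
# Toy model of interrupt()/Command(resume=...) using a generator:
# yield pauses the workflow and hands the payload to the caller;
# send() resumes execution on the line after the yield.
def approval_workflow():
    decision = yield {"question": "Approve refund of $150 for order #42?"}
    if decision == "approve":
        return "process_refund"
    return "notify_customer_rejected"

wf = approval_workflow()
payload = next(wf)         # run until the "interrupt"; get the payload
try:
    wf.send("approve")     # the human's decision resumes the workflow
except StopIteration as done:
    next_node = done.value  # "process_refund"
```

The key difference in LangGraph is durability: because state is checkpointed, the "generator" survives process restarts, so the resume can arrive hours or days later from a different process.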
5. Multi-Agent Patterns
LangGraph supports three primary patterns for building multi-agent systems:
5.1. Supervisor Pattern
A central agent (supervisor) orchestrates specialized agents (workers). The supervisor decides which worker handles the next task based on current state and previous results.
graph TD
    S["Supervisor Agent (orchestrator)"] -->|"research task"| R["Research Agent"]
    S -->|"code task"| C["Coding Agent"]
    S -->|"review task"| V["Review Agent"]
    R -->|"result"| S
    C -->|"result"| S
    V -->|"result"| S
    S -->|"complete"| END(["__end__"])
    style S fill:#e94560,stroke:#fff,color:#fff
    style R fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style V fill:#2c3e50,stroke:#fff,color:#fff
    style END fill:#4CAF50,stroke:#fff,color:#fff
Figure 3: Supervisor Pattern — a central agent delegates to specialized workers
from langgraph_supervisor import create_supervisor
supervisor = create_supervisor(
    model=ChatOpenAI(model="gpt-4o"),
    agents=[research_agent, coding_agent, review_agent],
    prompt="You are a tech lead. Delegate tasks to the right team member."
)
app = supervisor.compile()
5.2. Subgraph & Hierarchical Teams
For more complex systems, you can nest subgraphs within the main graph — each team becomes a subgraph with its own supervisor. The top-level graph only sees team-level nodes without knowing internal details.
# Research Team: 3 specialized agents
research_team = StateGraph(ResearchState)
research_team.add_node("web_searcher", web_search_agent)
research_team.add_node("analyst", data_analyst_agent)
research_team.add_node("team_lead", research_supervisor)
research_subgraph = research_team.compile()
# Main graph: compose teams
main_graph = StateGraph(MainState)
main_graph.add_node("research_team", research_subgraph)
main_graph.add_node("dev_team", dev_subgraph)
main_graph.add_node("orchestrator", orchestrator_node)
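Because a compiled subgraph is invoked like any other node (state in, partial update out), hierarchical composition can be sketched as plain function composition. The node names and string formatting below are purely illustrative:

```python
# Toy sketch of hierarchical composition: a compiled subgraph behaves like
# a single node, so the outer graph never sees its internal steps.
def research_team_node(state: dict) -> dict:
    # Internally this could be a full StateGraph with its own supervisor;
    # collapsed into one callable for illustration.
    return {"research": f"findings on {state['topic']}"}

def dev_team_node(state: dict) -> dict:
    return {"build": f"prototype using {state['research']}"}

# The "main graph" reduced to sequential composition in this toy version:
state = {"topic": "checkpointing"}
for node in (research_team_node, dev_team_node):
    state = {**state, **node(state)}
# state["build"] == "prototype using findings on checkpointing"
```

In real LangGraph code the parent and child graphs can even use different state schemas, with the shared keys mapped at the subgraph boundary.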
5.3. Handoff Pattern
Instead of routing through a central supervisor, agents can directly hand off control to another agent with a payload. This pattern works well when the processing flow has a clear sequence.
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command
def transfer_to_billing(state):
    """Transfer to billing agent for payment processing."""
    return Command(
        goto="billing_agent",
        update={"context": "Customer needs billing help"}
    )

support_agent = create_react_agent(
    model=model,
    tools=[transfer_to_billing, search_knowledge_base]
)
6. Comparison with CrewAI and AutoGen
| Criteria | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Architecture | Graph-based (nodes & edges) | Role-based (crew & tasks) | Conversational (chat-based) |
| State management | Typed state + reducers, incremental update | Basic shared memory | Chat history as state |
| Persistence | Built-in checkpointer (Postgres, Redis) | No native support | No native support |
| Human-in-the-loop | interrupt() API — pause/resume any node | Manual via callback | Chat-based input |
| Benchmark (medium tasks) | 76% | 71% | 68% |
| Learning curve | High — requires graph theory understanding | Low — role/task is intuitive | Medium |
| Production readiness | Highest — deterministic execution | Good for prototyping | Maintenance mode (Microsoft shifted to Agent Framework) |
| Enterprise adoption | Uber, JP Morgan, Klarna | Startups, SMBs | Azure ecosystem |
| Languages | Python, TypeScript | Python | Python, .NET |
AutoGen is in maintenance mode
Microsoft has shifted focus to its broader Agent Framework, and major feature development for AutoGen has stopped. If you're building a new system, consider LangGraph or CrewAI instead of AutoGen.
7. Production Deployment
7.1. LangSmith Deployment
LangGraph Platform (now renamed to LangSmith Deployment) provides purpose-built infrastructure for deploying agents:
| Option | Description | Best For |
|---|---|---|
| Cloud SaaS | Hosted by LangChain, zero-ops | Startups, rapid prototyping |
| BYOC (AWS) | Runs in your VPC, LangChain manages provisioning | Enterprise needing data sovereignty |
| Self-hosted | Full control on your Kubernetes cluster | Regulated industries (finance, healthcare) |
| Standalone | Lightweight — Agent Server + Postgres + Redis only | Small teams, single-service deployment |
7.2. Self-hosted Architecture
The self-hosted architecture consists of: Control Plane (manages deployment, routing) and Data Plane (Agent Servers running graphs). The Data Plane requires PostgreSQL (state + checkpoints) and Redis (task queue + pub/sub). Kubernetes is mandatory for both planes.
graph TB
    subgraph CP["Control Plane"]
        API["LangSmith API"]
        UI["Dashboard UI"]
    end
    subgraph DP["Data Plane"]
        AS1["Agent Server 1"]
        AS2["Agent Server 2"]
        AS3["Agent Server N"]
    end
    PG[("PostgreSQL State & Checkpoints")]
    RD[("Redis Task Queue")]
    UI --> API
    API --> AS1
    API --> AS2
    API --> AS3
    AS1 --> PG
    AS2 --> PG
    AS3 --> PG
    AS1 --> RD
    AS2 --> RD
    AS3 --> RD
    style CP fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DP fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style PG fill:#e94560,stroke:#fff,color:#fff
    style RD fill:#2c3e50,stroke:#fff,color:#fff
    style API fill:#4CAF50,stroke:#fff,color:#fff
    style UI fill:#4CAF50,stroke:#fff,color:#fff
    style AS1 fill:#2c3e50,stroke:#fff,color:#fff
    style AS2 fill:#2c3e50,stroke:#fff,color:#fff
    style AS3 fill:#2c3e50,stroke:#fff,color:#fff
Figure 4: Self-hosted LangGraph Architecture — Control Plane manages, Data Plane runs agents
8. When to Use LangGraph
Use LangGraph when
- Complex workflows: multiple steps, branching, and loops, like order processing systems, data analysis pipelines, or multi-tool AI assistants
- Persistence required: workflows running for hours that need to survive restarts/crashes
- Human-in-the-loop: human approval needed at critical steps
- Multi-agent: multiple specialized agents need coordination
Skip LangGraph when
- Simple chatbot: if you just need an LLM + a few tools, use create_react_agent or the LLM SDK directly
- Quick prototyping: CrewAI has a much lower learning curve
- Conversation-heavy: if agents mainly chat back and forth between roles, AutoGen fits better
9. Best Practices
- Start small: Build a single-agent ReAct loop first, add complexity as needed. Don't jump straight into multi-agent supervisor.
- Strict state typing: Use TypedDict with full type hints. Untyped state becomes impossible to debug as graphs grow complex.
- Checkpointer from day one: Use MemorySaver for dev, switch to PostgresSaver for staging/production. Don't add persistence later — refactoring will be painful.
- Keep nodes small: Each node should do one thing. A "god node" that calls LLM, parses, and validates is extremely hard to test and debug.
- Observability: Integrate LangSmith tracing to visualize graph execution. When an agent makes wrong decisions, traces show you exactly which node went wrong.
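The "checkpointer from day one" advice can be as simple as selecting the backend from the environment, so the graph code itself never changes between dev and production. The helper below is a hypothetical sketch: it only picks a backend name, which real code would map onto MemorySaver or PostgresSaver.

```python
import os

# Hypothetical helper: choose the checkpointer backend by environment.
# "memory" maps to MemorySaver, "postgres" to PostgresSaver in real code.
def checkpointer_backend(env: str = "") -> str:
    env = env or os.environ.get("APP_ENV", "dev")
    return {"dev": "memory", "staging": "postgres", "prod": "postgres"}.get(env, "memory")
```

Centralizing this choice keeps the rest of the codebase identical across environments, which is what makes the later switch to PostgresSaver painless.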
10. Conclusion
LangGraph has proven its position as the leading framework for building production-ready AI Agents. Its stateful graph architecture solves problems that linear pipelines cannot: loops, branching, persistence, and human-in-the-loop. With adoption by Uber, JP Morgan, and Klarna, and a reported 34% enterprise market share, LangGraph isn't just a framework: it's shaping how the industry builds AI Agents.
If you're transitioning from prototype to production, LangGraph is worth the investment to master. Start with a simple ReAct agent, add persistence, then expand to multi-agent — each step has the right abstraction waiting for you.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.