Cloudflare Agent Cloud 2026 — Building AI Agents at the Edge with Workers, Durable Objects, and Project Think
Posted on: 4/18/2026 2:09:47 AM
Table of contents
- 1. Cloudflare — From CDN to Agent Cloud
- 2. Workers & Dynamic Workers — V8 Isolates at the Edge
- 3. Durable Objects & Facets — State for AI Agents
- 4. Project Think — the Next-Generation Agents SDK
- 5. Workers AI & AI Gateway — Unified Model Access
- 6. MCP Server at the Edge
- 7. The supporting service ecosystem
- 8. Reference architecture: an AI Agent on Cloudflare
- 9. Comparing with other platforms
- 10. Getting started with Cloudflare Agent Cloud
- 11. Cloudflare Developer Platform timeline
- 12. Conclusion
1. Cloudflare — From CDN to Agent Cloud
Cloudflare has been on a remarkable journey: from a pure CDN and web-security provider to a full-fledged platform for AI Agents. With Agents Week 2026 (April 2026), Cloudflare officially introduced the Agent Cloud concept — a vision that turns its global edge network into infrastructure for distributed, stateful, serverless AI agents.
The core differentiator: instead of renting a 24/7 VM to host each agent, Cloudflare lets agents hibernate when idle and only consume resources during actual work — pushing running costs for idle agents close to zero.
2. Workers & Dynamic Workers — V8 Isolates at the Edge
2.1. Cloudflare Workers: the serverless edge platform
Cloudflare Workers runs JavaScript/TypeScript on V8 isolates — the same engine powering Chrome — at more than 330 edge locations worldwide. Not containers, not VMs — each request is handled inside a lightweight isolate with near-zero cold start.
Key technical characteristics:
- Free tier: 100,000 requests/day, 10 ms CPU time per invocation
- Paid ($5/month): 10 million requests, 30 s CPU time, unlimited static assets
- Supported languages: JavaScript, TypeScript, Python, Rust (via WASM)
- Bindings: direct connections to KV, R2, D1, Queues, and Durable Objects with no network hop
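A minimal Worker using bindings might look like the sketch below. The binding names (CONFIG_KV, DB) and the trimmed interfaces are illustrative stand-ins, not Cloudflare's generated types; in a real project they would map to [[kv_namespaces]] and [[d1_databases]] entries in wrangler.toml:

```typescript
// Illustrative binding interfaces: trimmed stand-ins for the real
// KVNamespace and D1Database types generated by wrangler.
export interface Env {
  CONFIG_KV: { get(key: string): Promise<string | null> };
  DB: { prepare(sql: string): { first(): Promise<unknown> } };
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // KV read: served from the same edge location, no network hop
    const greeting = (await env.CONFIG_KV.get("greeting")) ?? "hello";
    // D1 query: SQLite reached directly through the binding
    const row = await env.DB.prepare("SELECT 1 AS ok").first();
    return Response.json({ greeting, row });
  },
};

export default worker;
```

Because bindings arrive as plain properties on env, the handler stays trivially testable with mock objects.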
2.2. Dynamic Workers: code-at-runtime
Dynamic Workers (open beta in March 2026) is a major step forward: it allows code to be injected and executed at runtime via an API, with no prior deployment. It's the foundation for AI-generated code — the agent writes code, pushes it, and the code runs instantly.
Dynamic Workers vs containers
Dynamic Workers start up 100× faster and use 1/10 the memory of containers. With single-digit-millisecond startup and single-digit-MB memory, they're light enough to be used once and thrown away — ideal for running AI-generated code.
Dynamic Workers follow the zero-ambient-authority security principle:
// A Dynamic Worker starts with no permissions at all
const worker = await createDynamicWorker({
  code: agentGeneratedCode,
  bindings: {
    globalOutbound: null, // No network access
    // Only grant what's needed via bindings
    DB: env.MY_D1_DATABASE,
    STORAGE: env.MY_R2_BUCKET,
  }
});
3. Durable Objects & Facets — State for AI Agents
3.1. Durable Objects: single-threaded actor model
Durable Objects solve serverless's biggest challenge: state. Each Durable Object is a single-threaded actor with:
- Its own SQLite database on local disk — near-zero latency
- Transactional storage for consistency
- Hibernation: it sleeps when idle and wakes on request — zero cost while inactive
- WebSocket support: maintains real-time connections
The economic implications are clear: if you have 10,000 AI agents but only 1% are active at once, a traditional VM setup needs 10,000 instances running continuously. With Durable Objects + hibernation you only need ~100 active instances at any given moment.
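As a sketch of the actor model in practice, here is a counter agent whose state survives hibernation. The Storage interface is a trimmed stand-in for the real DurableObjectStorage API (a real class would extend DurableObject from "cloudflare:workers"), kept minimal so the example is self-contained:

```typescript
// Trimmed stand-in for the DurableObjectStorage key-value API.
interface Storage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

export class CounterAgent {
  private storage: Storage;

  constructor(storage: Storage) {
    this.storage = storage;
  }

  // Requests to one object run single-threaded, so this
  // read-modify-write needs no locking.
  async fetch(_request: Request): Promise<Response> {
    const n = ((await this.storage.get<number>("n")) ?? 0) + 1;
    await this.storage.put("n", n); // persisted: survives hibernation
    return Response.json({ count: n });
  }
}
```

When the object hibernates, only the stored value remains; the next request wakes it and continues from the persisted count.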
3.2. Durable Object Facets: isolation for dynamic code
Facets (Agents Week 2026) extend Durable Objects with a parent–child model:
graph TD
A["🏗️ Parent Durable Object
(Platform code)"] --> B["📊 Parent SQLite
Metadata, billing, logs"]
A --> C["🔒 Facet (Child)
AI-generated dynamic code"]
C --> D["💾 Child SQLite
Application data"]
B -.->|"❌ Isolated"| D
A --> E["🔒 Facet (Child 2)
Another application"]
E --> F["💾 Child SQLite 2
Separate data"]
D -.->|"❌ Isolated"| F
style A fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style E fill:#2c3e50,stroke:#fff,color:#fff
style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 1: the parent–child architecture of Durable Object Facets
Each Facet has its own SQLite database, fully isolated from the parent and from other Facets. The parent controls rate limiting, quotas, and billing — while the child focuses purely on application logic. This is the foundation for platforms that let AI create applications with persistent state.
export class AppRunner extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    // Load dynamic code from R2 or an API
    const appCode = await this.env.R2.get("apps/user-123/code.js");
    if (!appCode) {
      return new Response("App code not found", { status: 404 });
    }
    // Create a facet — each app gets its own SQLite
    const facet = this.ctx.facets.get("user-app-123", {
      className: "UserApp",
      code: await appCode.text(),
    });
    return facet.fetch(request);
  }
}
4. Project Think — the Next-Generation Agents SDK
Project Think is Cloudflare's official framework for AI agents, built on top of Durable Objects. Instead of gluing primitives together yourself, Think provides a base class that handles the full lifecycle of an agent.
4.1. Core architecture
The Think base class — a minimal agent
import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";

export class MyAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.5"
    );
  }
}
Only getModel() needs to be overridden — Think manages conversation, memory, tool execution, and persistence.
4.2. Key primitives
Durable Execution with Fibers:
Fibers allow agent loops to run for many minutes (or longer) without losing progress. Each fiber is recorded in SQLite before it runs, can checkpoint at any time, and recovers automatically if the platform restarts.
await this.runFiber("research-task", async (fiber) => {
  const results = await this.searchWeb(query);
  await fiber.stash(); // Checkpoint — safe against crashes
  const analysis = await this.analyzeResults(results);
  await fiber.stash(); // Second checkpoint
  return this.generateReport(analysis);
});
Sub-agents via Facets:
Each sub-agent is a child Durable Object with its own SQLite, communicating over typed RPC. The parent agent delegates work to sub-agents — each running isolated and able to hibernate independently.
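The delegation pattern can be sketched as follows. Project Think's actual facet and RPC APIs are not reproduced here; SubAgent is an assumed minimal RPC surface so the fan-out logic is visible on its own:

```typescript
// Assumed minimal RPC surface of a sub-agent facet.
interface SubAgent {
  run(task: string): Promise<string>;
}

export class Orchestrator {
  private getSubAgent: (name: string) => SubAgent;

  constructor(getSubAgent: (name: string) => SubAgent) {
    this.getSubAgent = getSubAgent;
  }

  // Delegate independent tasks in parallel; each sub-agent can
  // hibernate on its own once its task completes.
  async delegate(tasks: Record<string, string>): Promise<Record<string, string>> {
    const entries = await Promise.all(
      Object.entries(tasks).map(async ([name, task]) => {
        const result = await this.getSubAgent(name).run(task);
        return [name, result] as const;
      })
    );
    return Object.fromEntries(entries);
  }
}
```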
Persistent Sessions:
Conversations are stored as a tree (parent-message relationships), with non-destructive compaction (summarize rather than delete) and full-text search via SQLite FTS5. Sessions can be forked to explore multiple directions without losing the original context.
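The tree-shaped session model can be sketched with plain data structures. This is an illustration of the idea, not Project Think's actual storage schema: every message points at its parent, and forking is simply appending a new child to any existing message, leaving the original branch untouched.

```typescript
// One message in the session tree; parent === null marks the root.
interface Message { id: number; parent: number | null; text: string }

export class SessionTree {
  private messages: Message[] = [];
  private nextId = 1;

  // Appending under any existing message creates (or extends) a branch.
  append(parent: number | null, text: string): number {
    const id = this.nextId++;
    this.messages.push({ id, parent, text });
    return id;
  }

  // Walk parent pointers to reconstruct one branch's full context.
  branch(leaf: number): string[] {
    const byId = new Map(this.messages.map((m) => [m.id, m]));
    const out: string[] = [];
    for (let m = byId.get(leaf); m; m = m.parent ? byId.get(m.parent) : undefined) {
      out.unshift(m.text);
    }
    return out;
  }
}
```

Two forks from the same parent share their prefix but never see each other's messages, which is exactly what lets an agent explore multiple directions safely.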
4.3. The Execution Ladder — 5 tiers
| Tier | Name | Capability | Use case |
|---|---|---|---|
| 0 | Workspace | Durable filesystem (SQLite + R2) | Store files, config, data |
| 1 | Dynamic Workers | V8 isolate, zero ambient authority | Safely run code from AI |
| 2 | NPM Resolution | Bundler + npm packages | Complex code needing dependencies |
| 3 | Browser | Headless browser automation | Scraping, testing, screenshots |
| 4 | Sandbox | Full toolchain + git access | Building, compiling, deploying projects |
5. Workers AI & AI Gateway — Unified Model Access
5.1. Workers AI: 50+ models at the edge
Workers AI provides inference for 50+ open-source models directly on Cloudflare's GPU network. No infrastructure to manage — call a model via a binding just like calling a function:
const response = await env.AI.run(
  "@cf/meta/llama-4-scout-17b-16e-instruct",
  {
    messages: [
      { role: "user", content: "Analyze microservices architecture" }
    ]
  }
);
Notable recent models (April 2026):
- Google Gemma 4 26B A4B — MoE with 26B total / 4B active, 256K context, supports vision + thinking + function calling
- GLM-4.7-Flash — 131K-token context, optimized for summarization
- Qwen3-30B-A3B — MoE activating only 3B parameters per forward pass
- EmbeddingGemma-300M — 768-dim vectors, optimized for low-latency embedding
5.2. AI Gateway: a unified proxy for every AI provider
AI Gateway acts as a unified inference layer, supporting 14+ providers (OpenAI, Anthropic, Google, Mistral, …) through a single interface. The new breakthrough: the same AI.run() binding works for both Workers AI models and third-party models.
graph LR
A["🤖 AI Agent"] --> B["🌐 AI Gateway"]
B --> C["Workers AI
Llama, Gemma, Qwen"]
B --> D["OpenAI
GPT-4.1, o4"]
B --> E["Anthropic
Claude Opus, Sonnet"]
B --> F["Google
Gemini 2.5"]
B --> G["Caching Layer"]
B --> H["Rate Limiting"]
B --> I["AI Firewall"]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style H fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style I fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 2: AI Gateway — a unified inference layer for multiple providers
AI Gateway provides:
- Caching: cache responses for identical prompts, reducing cost and latency
- Rate limiting: control quotas per user/key/endpoint
- AI Firewall: detect prompt injection and data exfiltration before requests reach the model
- Analytics: dashboards for token usage, latency, and error rate per model/provider
- Fallback: automatically switch to another provider when one has issues
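In practice, routing a provider call through AI Gateway amounts to swapping the base URL. The helpers below sketch the publicly documented gateway URL pattern; the account ID, gateway name, model, and API key are placeholders, and the exact path should be verified against current docs:

```typescript
// Build a gateway endpoint of the documented form:
// https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}/{path}
export function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: string,
  path: string
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}/${path}`;
}

// An OpenAI-style chat request aimed at the gateway instead of the
// provider directly; caching, rate limiting, and fallback then apply.
export function chatRequest(url: string, apiKey: string, prompt: string): Request {
  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4.1", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  });
}
```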
6. MCP Server at the Edge
Cloudflare has partnered closely with Anthropic to build infrastructure for remote MCP servers — bringing the Model Context Protocol to the edge with scale and hibernation.
MCP + Durable Objects = Stateful Tool Servers
Each MCP server instance runs as an McpAgent (extending a Durable Object), automatically benefiting from hibernation — sleeping when idle and waking with state intact when the agent needs it. This solves the problem that MCP servers need to be always available yet can't afford to run 24/7 for every user.
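A minimal tool looks roughly like this. The handler is written as a plain, testable function; the surrounding McpAgent registration (shown in comments) follows the shape of Cloudflare's agents SDK examples and should be treated as an assumption:

```typescript
// An MCP tool handler as a plain function: takes typed arguments,
// returns MCP-style content blocks.
export async function addTool({ a, b }: { a: number; b: number }) {
  return { content: [{ type: "text", text: String(a + b) }] };
}

// Registration inside an McpAgent would look roughly like this
// (assumed API, following Cloudflare's agents SDK examples):
//
//   export class MyMCP extends McpAgent {
//     server = new McpServer({ name: "demo", version: "1.0.0" });
//     async init() {
//       this.server.tool("add", { a: z.number(), b: z.number() },
//         addTool);
//     }
//   }
```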
An enterprise MCP architecture on Cloudflare:
graph TD
A["AI Agent / Claude"] -->|"MCP Request"| B["Cloudflare Access
OAuth 2.0 / RFC 9728"]
B --> C["AI Gateway
Code Mode reduces tokens"]
C --> D["MCP Server Portal
(McpAgent on DO)"]
D --> E["Tool: Database Query"]
D --> F["Tool: File Storage"]
D --> G["Tool: External API"]
D -.->|"Hibernate when idle"| H["💤 State preserved
in SQLite"]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#e94560,stroke:#fff,color:#fff
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style H fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
Figure 3: Enterprise MCP deployment on Cloudflare
Standout features:
- Code Mode: reduces token cost by compressing tool descriptions
- Managed OAuth: implements RFC 9728 for agent authentication without building your own OAuth flow
- Cloudflare Mesh: grants MCP servers access to private databases/APIs without manual tunneling
- Scoped API tokens: resource-level permissions that follow the principle of least privilege
7. The supporting service ecosystem
7.1. Storage Layer
| Service | Type | Free Tier | Use case for agents |
|---|---|---|---|
| R2 | Object storage (S3-compatible) | 10 GB storage, 1M Class A / 10M Class B ops/month | Artifacts, code, models, large files |
| D1 | SQLite database | 5 GB storage, 5M rows read/day | Metadata, user data, agent state |
| KV | Key-value store | 1 GB storage, 100K reads/day | Config, feature flags, session data |
| Queues | Message queue | 10K operations/day (new 02/2026) | Task scheduling, async workflows |
| Vectorize | Vector database | 5M vectors, 30M query dimensions/month | RAG, semantic search, embeddings |
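The RAG path through this table (embed with Workers AI, then nearest-neighbour search with Vectorize) can be sketched as below. The interfaces are trimmed to just the calls used, and bge-base-en-v1.5 is one of the standard 768-dimension embedding models:

```typescript
// Trimmed stand-ins for the Workers AI and Vectorize bindings.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
  VECTORIZE: {
    query(vector: number[], opts: { topK: number }): Promise<{
      matches: { id: string; score: number }[];
    }>;
  };
}

export async function semanticSearch(env: Env, query: string) {
  // 1. Embed the query text into a vector
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
  // 2. Nearest-neighbour search over the index
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 3 });
  return matches;
}
```

The matched IDs would then be resolved against D1 or R2 to fetch the actual documents for the prompt.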
7.2. Agents Week 2026 — New services
Sandboxes (GA)
Isolated environments with a shell, filesystem, and background processes. Agents can create sandboxes, install packages, run builds, and maintain state across sessions. This is Tier 4 of the Execution Ladder.
Browser Run — upgraded headless browser
Concurrency is 4× higher than before. New features include Live View (watch what the browser is doing in real time) and Human-in-Loop (the agent pauses and waits for user input when needed). Ideal for web scraping, testing, and automated form filling.
Artifacts — Git-compatible storage
Versioned storage supporting tens of millions of repos. Agents can create, commit, and manage code repositories directly — without external GitHub/GitLab for small projects.
Email Service (Public Beta)
Agents send, receive, and process email natively — no separate SendGrid or SES. Useful for agents that need to interact with humans over email.
Flagship — feature flags at the edge
Evaluates feature flags with sub-millisecond latency using KV + Durable Objects. Agents can check flags before executing logic with virtually no overhead.
Unweight — LLM model compression
Reduces the footprint of LLM models by 22% through lossless compression. The practical impact: the same GPU fits more models, lowering inference cost on Workers AI.
8. Reference architecture: an AI Agent on Cloudflare
Here's an end-to-end architecture for a production AI agent running entirely on Cloudflare:
graph TD
subgraph "Client Layer"
U["👤 User"] --> W["Worker
(API Gateway)"]
end
subgraph "Agent Layer"
W --> T["Think Agent
(Durable Object)"]
T --> F1["Sub-agent 1
(Facet - Research)"]
T --> F2["Sub-agent 2
(Facet - Code Gen)"]
T --> F3["Sub-agent 3
(Facet - Review)"]
end
subgraph "AI Layer"
F1 --> AI["Workers AI
Llama 4 / Gemma 4"]
F2 --> GW["AI Gateway
→ Claude / GPT"]
F3 --> AI
end
subgraph "Tool Layer"
T --> MCP["MCP Server
(McpAgent on DO)"]
MCP --> BR["Browser Run"]
MCP --> SB["Sandbox"]
MCP --> EX["External APIs"]
end
subgraph "Storage Layer"
T --> D1["D1
Agent metadata"]
T --> R2["R2
Files & artifacts"]
T --> VZ["Vectorize
RAG embeddings"]
T --> Q["Queues
Async tasks"]
end
style U fill:#e94560,stroke:#fff,color:#fff
style W fill:#2c3e50,stroke:#fff,color:#fff
style T fill:#e94560,stroke:#fff,color:#fff
style F1 fill:#2c3e50,stroke:#fff,color:#fff
style F2 fill:#2c3e50,stroke:#fff,color:#fff
style F3 fill:#2c3e50,stroke:#fff,color:#fff
style AI fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style GW fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style MCP fill:#e94560,stroke:#fff,color:#fff
style BR fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style SB fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style EX fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style D1 fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style R2 fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style VZ fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style Q fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
Figure 4: end-to-end production AI Agent architecture on Cloudflare
9. Comparing with other platforms
| Criterion | Cloudflare Agent Cloud | AWS Lambda + Bedrock | Azure Functions + OpenAI |
|---|---|---|---|
| Cold start | <5ms (V8 isolate) | 100 ms–2 s (container) | 200 ms–3 s (container) |
| Stateful agents | Durable Objects (native) | DynamoDB / Step Functions | Durable Functions |
| Agent hibernation | Built-in, automatic | None (DIY) | Yes (Durable Functions) |
| MCP support | McpAgent + native OAuth | Build your own | Build your own |
| Built-in AI models | 50+ open-source at the edge | Bedrock (managed) | Azure OpenAI (managed) |
| Free tier | Very generous (Workers, R2, D1, KV, Queues) | 1M Lambda requests, limited Bedrock | 1M Functions requests, AI credit |
| Edge locations | 330+ cities | 30+ regions | 60+ regions |
| Dynamic code execution | Dynamic Workers (native) | No native option | No native option |
Important caveats
Cloudflare Workers have a CPU-time limit (10 ms free, 30 s paid) — not suitable for long, CPU-intensive tasks. Durable Object Facets are still in beta. Dynamic Workers are limited to Workers Paid plans. For workloads needing GPU training or heavy long-running compute, AWS/Azure/GCP are still better fits.
10. Getting started with Cloudflare Agent Cloud
Here's a simple example of building an AI agent with Project Think:
// wrangler.toml
// name = "my-ai-agent"
// main = "src/index.ts"
// compatibility_date = "2026-04-01"
// [ai]
// binding = "AI"
// [[durable_objects.bindings]]
// name = "AGENT"
// class_name = "ResearchAgent"
import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";
import { tool } from "ai";
import { z } from "zod";

export class ResearchAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/google/gemma-4-26b-a4b"
    );
  }

  getTools() {
    return {
      search: tool({
        description: "Search the web for information",
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          // Use AI Search or an external API
          const results = await this.env.AI.run(
            "@cf/cloudflare/ai-search",
            { query }
          );
          return results;
        },
      }),
      saveNote: tool({
        description: "Save a note to storage",
        parameters: z.object({
          title: z.string(),
          content: z.string()
        }),
        execute: async ({ title, content }) => {
          // Durable Object SQLite — persists across hibernation
          await this.sql.exec(
            "INSERT INTO notes (title, content, created_at) VALUES (?, ?, ?)",
            title, content, new Date().toISOString()
          );
          return { saved: true };
        },
      }),
    };
  }
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const id = env.AGENT.idFromName("default");
    const agent = env.AGENT.get(id);
    return agent.fetch(request);
  },
};
Deployment is a single command:
npx wrangler deploy
11. Cloudflare Developer Platform timeline
12. Conclusion
Cloudflare Agent Cloud marks an important shift: from "a place to host websites" to "a place to run AI agents". The combination of V8 isolates (fast, lightweight), Durable Objects (stateful, with hibernation), and a rich storage/AI ecosystem creates a platform few rivals can match on developer experience and operating cost.
The generous free tier — Workers (100K req/day), R2 (10 GB), D1 (5 GB), KV (1 GB), Queues (10K ops/day), Vectorize (5M vectors) — makes it an ideal place to prototype and even run production for small-to-medium AI agent projects at almost zero cost.
The bottom line
If you're building AI agents and need: (1) extremely fast cold starts, (2) automatic state management with hibernation, (3) native MCP server support, and (4) a generous free tier — Cloudflare Agent Cloud deserves to be the first platform you evaluate. The main limits are the CPU time cap and the fact that some features are still in beta.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.