Cloudflare Agent Cloud 2026 — Building AI Agents at the Edge with Workers, Durable Objects, and Project Think
Posted on: 4/18/2026 2:09:47 AM
Table of contents
- 1. Cloudflare — From CDN to Agent Cloud
- 2. Workers & Dynamic Workers — V8 Isolates at the Edge
- 3. Durable Objects & Facets — State for AI Agents
- 4. Project Think — the Next-Generation Agents SDK
- 5. Workers AI & AI Gateway — Unified Model Access
- 6. MCP Server at the Edge
- 7. The supporting service ecosystem
- 8. Reference architecture: an AI Agent on Cloudflare
- 9. Comparing with other platforms
- 10. Getting started with Cloudflare Agent Cloud
- 11. Cloudflare Developer Platform timeline
- 12. Conclusion
1. Cloudflare — From CDN to Agent Cloud
Cloudflare has been on a remarkable journey: from a pure CDN and web-security provider to a full-fledged platform for AI Agents. With Agents Week 2026 (April 2026), Cloudflare officially introduced the Agent Cloud concept — a vision that turns its global edge network into infrastructure for distributed, stateful, serverless AI agents.
The core differentiator: instead of renting a 24/7 VM to host each agent, Cloudflare lets agents hibernate when idle and only consume resources during actual work — pushing running costs for idle agents close to zero.
2. Workers & Dynamic Workers — V8 Isolates at the Edge
2.1. Cloudflare Workers: the serverless edge platform
Cloudflare Workers runs JavaScript/TypeScript on V8 isolates — the same engine powering Chrome — at more than 330 edge locations worldwide. Not containers, not VMs — each request is handled inside a lightweight isolate with near-zero cold start.
Key technical characteristics:
- Free tier: 100,000 requests/day, 10 ms CPU time per invocation
- Paid ($5/month): 10 million requests, 30 s CPU time, unlimited static assets
- Supported languages: JavaScript, TypeScript, Python, Rust (via WASM)
- Bindings: direct connections to KV, R2, D1, Queues, and Durable Objects with no network hop
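A minimal Worker using bindings might look like the sketch below. The binding names (CONFIG_KV, DB) and the trimmed interfaces are illustrative stand-ins, not Cloudflare's generated types; in a real project they would map to [[kv_namespaces]] and [[d1_databases]] entries in wrangler.toml:

```typescript
// Illustrative binding interfaces: trimmed stand-ins for the real
// KVNamespace and D1Database types generated by wrangler.
export interface Env {
  CONFIG_KV: { get(key: string): Promise<string | null> };
  DB: { prepare(sql: string): { first(): Promise<unknown> } };
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // KV read: served from the same edge location, no network hop
    const greeting = (await env.CONFIG_KV.get("greeting")) ?? "hello";
    // D1 query: SQLite reached directly through the binding
    const row = await env.DB.prepare("SELECT 1 AS ok").first();
    return Response.json({ greeting, row });
  },
};

export default worker;
```

Because bindings arrive as plain properties on env, the handler stays trivially testable with mock objects.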
2.2. Dynamic Workers: code-at-runtime
Dynamic Workers (open beta in March 2026) is a major step forward: it allows code to be injected and executed at runtime via an API, with no prior deployment. It's the foundation for AI-generated code — the agent writes code, pushes it, and the code runs instantly.
Dynamic Workers vs containers
Dynamic Workers start up 100× faster and use 1/10 the memory of containers. With single-digit-millisecond startup and single-digit-MB memory, they're light enough to be used once and thrown away — ideal for running AI-generated code.
Dynamic Workers follow the zero-ambient-authority security principle:
// A Dynamic Worker starts with no permissions at all
const worker = await createDynamicWorker({
  code: agentGeneratedCode,
  bindings: {
    globalOutbound: null, // No network access
    // Only grant what's needed via bindings
    DB: env.MY_D1_DATABASE,
    STORAGE: env.MY_R2_BUCKET,
  }
});
3. Durable Objects & Facets — State for AI Agents
3.1. Durable Objects: single-threaded actor model
Durable Objects solve serverless's biggest challenge: state. Each Durable Object is a single-threaded actor with:
- Its own SQLite database on local disk — near-zero latency
- Transactional storage for consistency
- Hibernation: it sleeps when idle and wakes on request — zero cost while inactive
- WebSocket support: maintains real-time connections
The economic implications are clear: if you have 10,000 AI agents but only 1% are active at once, a traditional VM setup needs 10,000 instances running continuously. With Durable Objects + hibernation you only need ~100 active instances at any given moment.
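As a sketch of the actor model in practice, here is a counter agent whose state survives hibernation. The Storage interface is a trimmed stand-in for the real DurableObjectStorage API (a real class would extend DurableObject from "cloudflare:workers"), kept minimal so the example is self-contained:

```typescript
// Trimmed stand-in for the DurableObjectStorage key-value API.
interface Storage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

export class CounterAgent {
  private storage: Storage;

  constructor(storage: Storage) {
    this.storage = storage;
  }

  // Requests to one object run single-threaded, so this
  // read-modify-write needs no locking.
  async fetch(_request: Request): Promise<Response> {
    const n = ((await this.storage.get<number>("n")) ?? 0) + 1;
    await this.storage.put("n", n); // persisted: survives hibernation
    return Response.json({ count: n });
  }
}
```

When the object hibernates, only the stored value remains; the next request wakes it and continues from the persisted count.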
3.2. Durable Object Facets: isolation for dynamic code
Facets (Agents Week 2026) extend Durable Objects with a parent–child model:
graph TD
A["🏗️ Parent Durable Object
(Platform code)"] --> B["📊 Parent SQLite
Metadata, billing, logs"]
A --> C["🔒 Facet (Child)
AI-generated dynamic code"]
C --> D["💾 Child SQLite
Application data"]
B -.->|"❌ Isolated"| D
A --> E["🔒 Facet (Child 2)
Another application"]
E --> F["💾 Child SQLite 2
Separate data"]
D -.->|"❌ Isolated"| F
style A fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style E fill:#2c3e50,stroke:#fff,color:#fff
style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 1: the parent–child architecture of Durable Object Facets
Each Facet has its own SQLite database, fully isolated from the parent and from other Facets. The parent controls rate limiting, quotas, and billing — while the child focuses purely on application logic. This is the foundation for platforms that let AI create applications with persistent state.
export class AppRunner extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    // Load dynamic code from R2 or an API
    const appCode = await this.env.R2.get("apps/user-123/code.js");
    if (!appCode) {
      return new Response("App code not found", { status: 404 });
    }
    // Create a facet — each app gets its own SQLite
    const facet = this.ctx.facets.get("user-app-123", {
      className: "UserApp",
      code: await appCode.text(),
    });
    return facet.fetch(request);
  }
}
4. Project Think — the Next-Generation Agents SDK
Project Think is Cloudflare's official framework for AI agents, built on top of Durable Objects. Instead of gluing primitives together yourself, Think provides a base class that handles the full lifecycle of an agent.
4.1. Core architecture
The Think base class — a minimal agent
import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";

export class MyAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.5"
    );
  }
}
Only getModel() needs to be overridden — Think manages conversation, memory, tool execution, and persistence.
4.2. Key primitives
Durable Execution with Fibers:
Fibers allow agent loops to run for many minutes (or longer) without losing progress. Each fiber is recorded in SQLite before it runs, can checkpoint at any time, and recovers automatically if the platform restarts.
await this.runFiber("research-task", async (fiber) => {
  const results = await this.searchWeb(query);
  await fiber.stash(); // Checkpoint — safe against crashes
  const analysis = await this.analyzeResults(results);
  await fiber.stash(); // Second checkpoint
  return this.generateReport(analysis);
});
Sub-agents via Facets:
Each sub-agent is a child Durable Object with its own SQLite, communicating over typed RPC. The parent agent delegates work to sub-agents — each running isolated and able to hibernate independently.
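The delegation pattern can be sketched as follows. Project Think's actual facet and RPC APIs are not reproduced here; SubAgent is an assumed minimal RPC surface so the fan-out logic is visible on its own:

```typescript
// Assumed minimal RPC surface of a sub-agent facet.
interface SubAgent {
  run(task: string): Promise<string>;
}

export class Orchestrator {
  private getSubAgent: (name: string) => SubAgent;

  constructor(getSubAgent: (name: string) => SubAgent) {
    this.getSubAgent = getSubAgent;
  }

  // Delegate independent tasks in parallel; each sub-agent can
  // hibernate on its own once its task completes.
  async delegate(tasks: Record<string, string>): Promise<Record<string, string>> {
    const entries = await Promise.all(
      Object.entries(tasks).map(async ([name, task]) => {
        const result = await this.getSubAgent(name).run(task);
        return [name, result] as const;
      })
    );
    return Object.fromEntries(entries);
  }
}
```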
Persistent Sessions:
Conversations are stored as a tree (parent-message relationships), with non-destructive compaction (summarize rather than delete) and full-text search via SQLite FTS5. Sessions can be forked to explore multiple directions without losing the original context.
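The tree-shaped session model can be sketched with plain data structures. This is an illustration of the idea, not Project Think's actual storage schema: every message points at its parent, and forking is simply appending a new child to any existing message, leaving the original branch untouched.

```typescript
// One message in the session tree; parent === null marks the root.
interface Message { id: number; parent: number | null; text: string }

export class SessionTree {
  private messages: Message[] = [];
  private nextId = 1;

  // Appending under any existing message creates (or extends) a branch.
  append(parent: number | null, text: string): number {
    const id = this.nextId++;
    this.messages.push({ id, parent, text });
    return id;
  }

  // Walk parent pointers to reconstruct one branch's full context.
  branch(leaf: number): string[] {
    const byId = new Map(this.messages.map((m) => [m.id, m]));
    const out: string[] = [];
    for (let m = byId.get(leaf); m; m = m.parent ? byId.get(m.parent) : undefined) {
      out.unshift(m.text);
    }
    return out;
  }
}
```

Two forks from the same parent share their prefix but never see each other's messages, which is exactly what lets an agent explore multiple directions safely.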
4.3. The Execution Ladder — 5 tiers
| Tier | Name | Capability | Use case |
|---|---|---|---|
| 0 | Workspace | Durable filesystem (SQLite + R2) | Store files, config, data |
| 1 | Dynamic Workers | V8 isolate, zero ambient authority | Safely run code from AI |
| 2 | NPM Resolution | Bundler + npm packages | Complex code needing dependencies |
| 3 | Browser | Headless browser automation | Scraping, testing, screenshots |
| 4 | Sandbox | Full toolchain + git access | Building, compiling, deploying projects |
5. Workers AI & AI Gateway — Unified Model Access
5.1. Workers AI: 50+ models at the edge
Workers AI provides inference for 50+ open-source models directly on Cloudflare's GPU network. No infrastructure to manage — call a model via a binding just like calling a function:
const response = await env.AI.run(
  "@cf/meta/llama-4-scout-17b-16e-instruct",
  {
    messages: [
      { role: "user", content: "Analyze microservices architecture" }
    ]
  }
);
Notable recent models (April 2026):
- Google Gemma 4 26B A4B — MoE with 26B total / 4B active, 256K context, supports vision + thinking + function calling
- GLM-4.7-Flash — 131K-token context, optimized for summarization
- Qwen3-30B-A3B — MoE activating only 3B parameters per forward pass
- EmbeddingGemma-300M — 768-dim vectors, optimized for low-latency embedding
5.2. AI Gateway: a unified proxy for every AI provider
AI Gateway acts as a unified inference layer, supporting 14+ providers (OpenAI, Anthropic, Google, Mistral, …) through a single interface. The new breakthrough: the same AI.run() binding works for both Workers AI models and third-party models.
graph LR
A["🤖 AI Agent"] --> B["🌐 AI Gateway"]
B --> C["Workers AI
Llama, Gemma, Qwen"]
B --> D["OpenAI
GPT-4.1, o4"]
B --> E["Anthropic
Claude Opus, Sonnet"]
B --> F["Google
Gemini 2.5"]
B --> G["Caching Layer"]
B --> H["Rate Limiting"]
B --> I["AI Firewall"]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style H fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style I fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 2: AI Gateway — a unified inference layer for multiple providers
AI Gateway provides:
- Caching: cache responses for identical prompts, reducing cost and latency
- Rate limiting: control quotas per user/key/endpoint
- AI Firewall: detect prompt injection and data exfiltration before requests reach the model
- Analytics: dashboards for token usage, latency, and error rate per model/provider
- Fallback: automatically switch to another provider when one has issues
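In practice, routing a provider call through AI Gateway amounts to swapping the base URL. The helpers below sketch the publicly documented gateway URL pattern; the account ID, gateway name, model, and API key are placeholders, and the exact path should be verified against current docs:

```typescript
// Build a gateway endpoint of the documented form:
// https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}/{path}
export function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: string,
  path: string
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}/${path}`;
}

// An OpenAI-style chat request aimed at the gateway instead of the
// provider directly; caching, rate limiting, and fallback then apply.
export function chatRequest(url: string, apiKey: string, prompt: string): Request {
  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4.1", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  });
}
```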
6. MCP Server at the Edge
Cloudflare has partnered closely with Anthropic to build infrastructure for remote MCP servers — bringing the Model Context Protocol to the edge with scale and hibernation.
MCP + Durable Objects = Stateful Tool Servers
Each MCP server instance runs as an McpAgent (extending a Durable Object), automatically benefiting from hibernation — sleeping when idle and waking with state intact when the agent needs it. This solves the problem that MCP servers need to be always available yet can't afford to run 24/7 for every user.
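A minimal tool looks roughly like this. The handler is written as a plain, testable function; the surrounding McpAgent registration (shown in comments) follows the shape of Cloudflare's agents SDK examples and should be treated as an assumption:

```typescript
// An MCP tool handler as a plain function: takes typed arguments,
// returns MCP-style content blocks.
export async function addTool({ a, b }: { a: number; b: number }) {
  return { content: [{ type: "text", text: String(a + b) }] };
}

// Registration inside an McpAgent would look roughly like this
// (assumed API, following Cloudflare's agents SDK examples):
//
//   export class MyMCP extends McpAgent {
//     server = new McpServer({ name: "demo", version: "1.0.0" });
//     async init() {
//       this.server.tool("add", { a: z.number(), b: z.number() },
//         addTool);
//     }
//   }
```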
An enterprise MCP architecture on Cloudflare:
graph TD
A["AI Agent / Claude"] -->|"MCP Request"| B["Cloudflare Access
OAuth 2.0 / RFC 9728"]
B --> C["AI Gateway
Code Mode reduces tokens"]
C --> D["MCP Server Portal
(McpAgent on DO)"]
D --> E["Tool: Database Query"]
D --> F["Tool: File Storage"]
D --> G["Tool: External API"]
D -.->|"Hibernate when idle"| H["💤 State preserved
in SQLite"]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#e94560,stroke:#fff,color:#fff
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style H fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
Figure 3: Enterprise MCP deployment on Cloudflare
Standout features:
- Code Mode: reduces token cost by compressing tool descriptions
- Managed OAuth: implements RFC 9728 for agent authentication without building your own OAuth flow
- Cloudflare Mesh: grants MCP servers access to private databases/APIs without manual tunneling
- Scoped API tokens: resource-level permissions that follow the principle of least privilege
7. The supporting service ecosystem
7.1. Storage Layer
| Service | Type | Free Tier | Use case for agents |
|---|---|---|---|
| R2 | Object storage (S3-compatible) | 10 GB storage, 1M Class A / 10M Class B ops/month | Artifacts, code, models, large files |
| D1 | SQLite database | 5 GB storage, 5M rows read/day | Metadata, user data, agent state |
| KV | Key-value store | 1 GB storage, 100K reads/day | Config, feature flags, session data |
| Queues | Message queue | 10K operations/day (new 02/2026) | Task scheduling, async workflows |
| Vectorize | Vector database | 5M vectors, 30M query dimensions/month | RAG, semantic search, embeddings |
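The RAG path through this table (embed with Workers AI, then nearest-neighbour search with Vectorize) can be sketched as below. The interfaces are trimmed to just the calls used, and bge-base-en-v1.5 is one of the standard 768-dimension embedding models:

```typescript
// Trimmed stand-ins for the Workers AI and Vectorize bindings.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
  VECTORIZE: {
    query(vector: number[], opts: { topK: number }): Promise<{
      matches: { id: string; score: number }[];
    }>;
  };
}

export async function semanticSearch(env: Env, query: string) {
  // 1. Embed the query text into a vector
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
  // 2. Nearest-neighbour search over the index
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 3 });
  return matches;
}
```

The matched IDs would then be resolved against D1 or R2 to fetch the actual documents for the prompt.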
7.2. Agents Week 2026 — New services
Sandboxes (GA)
Isolated environments with a shell, filesystem, and background processes. Agents can create sandboxes, install packages, run builds, and maintain state across sessions. This is Tier 4 of the Execution Ladder.
Browser Run — upgraded headless browser
Concurrency is 4× higher than before. New features include Live View (watch what the browser is doing in real time) and Human-in-Loop (the agent pauses and waits for user input when needed). Ideal for web scraping, testing, and automated form filling.
Artifacts — Git-compatible storage
Versioned storage supporting tens of millions of repos. Agents can create, commit, and manage code repositories directly — without external GitHub/GitLab for small projects.
Email Service (Public Beta)
Agents send, receive, and process email natively — no separate SendGrid or SES. Useful for agents that need to interact with humans over email.
Flagship — feature flags at the edge
Evaluates feature flags with sub-millisecond latency using KV + Durable Objects. Agents can check flags before executing logic with virtually no overhead.
Unweight — LLM model compression
Reduces the footprint of LLM models by 22% through lossless compression. The practical impact: the same GPU fits more models, lowering inference cost on Workers AI.
8. Reference architecture: an AI Agent on Cloudflare
Here's an end-to-end architecture for a production AI agent running entirely on Cloudflare:
graph TD
subgraph "Client Layer"
U["👤 User"] --> W["Worker
(API Gateway)"]
end
subgraph "Agent Layer"
W --> T["Think Agent
(Durable Object)"]
T --> F1["Sub-agent 1
(Facet - Research)"]
T --> F2["Sub-agent 2
(Facet - Code Gen)"]
T --> F3["Sub-agent 3
(Facet - Review)"]
end
subgraph "AI Layer"
F1 --> AI["Workers AI
Llama 4 / Gemma 4"]
F2 --> GW["AI Gateway
→ Claude / GPT"]
F3 --> AI
end
subgraph "Tool Layer"
T --> MCP["MCP Server
(McpAgent on DO)"]
MCP --> BR["Browser Run"]
MCP --> SB["Sandbox"]
MCP --> EX["External APIs"]
end
subgraph "Storage Layer"
T --> D1["D1
Agent metadata"]
T --> R2["R2
Files & artifacts"]
T --> VZ["Vectorize
RAG embeddings"]
T --> Q["Queues
Async tasks"]
end
style U fill:#e94560,stroke:#fff,color:#fff
style W fill:#2c3e50,stroke:#fff,color:#fff
style T fill:#e94560,stroke:#fff,color:#fff
style F1 fill:#2c3e50,stroke:#fff,color:#fff
style F2 fill:#2c3e50,stroke:#fff,color:#fff
style F3 fill:#2c3e50,stroke:#fff,color:#fff
style AI fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style GW fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style MCP fill:#e94560,stroke:#fff,color:#fff
style BR fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style SB fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style EX fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style D1 fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style R2 fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style VZ fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
style Q fill:#f8f9fa,stroke:#ff9800,color:#2c3e50
Figure 4: end-to-end production AI Agent architecture on Cloudflare
9. Comparing with other platforms
| Criterion | Cloudflare Agent Cloud | AWS Lambda + Bedrock | Azure Functions + OpenAI |
|---|---|---|---|
| Cold start | <5ms (V8 isolate) | 100 ms–2 s (container) | 200 ms–3 s (container) |
| Stateful agents | Durable Objects (native) | DynamoDB / Step Functions | Durable Functions |
| Agent hibernation | Built-in, automatic | None (DIY) | Yes (Durable Functions) |
| MCP support | McpAgent + native OAuth | Build your own | Build your own |
| Built-in AI models | 50+ open-source at the edge | Bedrock (managed) | Azure OpenAI (managed) |
| Free tier | Very generous (Workers, R2, D1, KV, Queues) | 1M Lambda requests, limited Bedrock | 1M Functions requests, AI credit |
| Edge locations | 330+ cities | 30+ regions | 60+ regions |
| Dynamic code execution | Dynamic Workers (native) | No native option | No native option |
Important caveats
Cloudflare Workers have a CPU-time limit (10 ms free, 30 s paid) — not suitable for long, CPU-intensive tasks. Durable Object Facets are still in beta. Dynamic Workers are limited to Workers Paid plans. For workloads needing GPU training or heavy long-running compute, AWS/Azure/GCP are still better fits.
10. Getting started with Cloudflare Agent Cloud
Here's a simple example of building an AI agent with Project Think:
// wrangler.toml
// name = "my-ai-agent"
// main = "src/index.ts"
// compatibility_date = "2026-04-01"
// [ai]
// binding = "AI"
// [[durable_objects.bindings]]
// name = "AGENT"
// class_name = "ResearchAgent"
import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";
import { tool } from "ai";
import { z } from "zod";

export class ResearchAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/google/gemma-4-26b-a4b"
    );
  }

  getTools() {
    return {
      search: tool({
        description: "Search the web for information",
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          // Use AI Search or an external API
          const results = await this.env.AI.run(
            "@cf/cloudflare/ai-search",
            { query }
          );
          return results;
        },
      }),
      saveNote: tool({
        description: "Save a note to storage",
        parameters: z.object({
          title: z.string(),
          content: z.string()
        }),
        execute: async ({ title, content }) => {
          // Durable Object SQLite — persists across hibernation
          await this.sql.exec(
            "INSERT INTO notes (title, content, created_at) VALUES (?, ?, ?)",
            title, content, new Date().toISOString()
          );
          return { saved: true };
        },
      }),
    };
  }
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const id = env.AGENT.idFromName("default");
    const agent = env.AGENT.get(id);
    return agent.fetch(request);
  },
};
Deployment is a single command:
npx wrangler deploy
11. Cloudflare Developer Platform timeline
12. Conclusion
Cloudflare Agent Cloud marks an important shift: from "a place to host websites" to "a place to run AI agents". The combination of V8 isolates (fast, lightweight), Durable Objects (stateful, with hibernation), and a rich storage/AI ecosystem creates a platform few rivals can match on developer experience and operating cost.
The generous free tier — Workers (100K req/day), R2 (10 GB), D1 (5 GB), KV (1 GB), Queues (10K ops/day), Vectorize (5M vectors) — makes it an ideal place to prototype and even run production for small-to-medium AI agent projects at almost zero cost.
The bottom line
If you're building AI agents and need: (1) extremely fast cold starts, (2) automatic state management with hibernation, (3) native MCP server support, and (4) a generous free tier — Cloudflare Agent Cloud deserves to be the first platform you evaluate. The main limits are the CPU time cap and the fact that some features are still in beta.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.