Cloudflare Agent Cloud 2026 — Building AI Agents at the Edge with Workers, Durable Objects, and Project Think

Posted on: 4/18/2026 2:09:47 AM

1. Cloudflare — From CDN to Agent Cloud

Cloudflare has been on a remarkable journey: from a pure CDN and web-security provider to a full-fledged platform for AI Agents. With Agents Week 2026 (April 2026), Cloudflare officially introduced the Agent Cloud concept — a vision that turns its global edge network into infrastructure for distributed, stateful, serverless AI agents.

The core differentiator: instead of renting a 24/7 VM to host each agent, Cloudflare lets agents hibernate when idle and only consume resources during actual work — pushing running costs for idle agents close to zero.

  • 330+ cities with edge PoPs
  • 100K free requests/day (Workers)
  • 50+ open-source AI models
  • <5 ms Dynamic Workers cold start

2. Workers & Dynamic Workers — V8 Isolates at the Edge

2.1. Cloudflare Workers: the serverless edge platform

Cloudflare Workers runs JavaScript/TypeScript on V8 isolates — the same engine powering Chrome — at more than 330 edge locations worldwide. Not containers, not VMs — each request is handled inside a lightweight isolate with near-zero cold start.

Key technical characteristics:

  • Free tier: 100,000 requests/day, 10 ms CPU time per invocation
  • Paid ($5/month): 10 million requests, 30 s CPU time, unlimited static assets
  • Supported languages: JavaScript, TypeScript, Python, Rust (via WASM)
  • Bindings: direct connections to KV, R2, D1, Queues, and Durable Objects with no network hop
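As a sketch of how bindings look in practice: the Worker below reads through KV with an R2 fallback. The binding names `MY_KV` and `MY_BUCKET` are placeholders you would declare in wrangler.toml, and the interfaces are trimmed stand-ins for the real Workers types so the snippet stands alone.

```typescript
// Sketch: a Worker reading through KV with an R2 fallback, all via bindings.
// MY_KV / MY_BUCKET are hypothetical binding names; the interfaces below are
// minimal stand-ins for the real Workers runtime types.
interface KVNamespace {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}
interface R2Bucket {
  get(key: string): Promise<{ text(): Promise<string> } | null>;
}
export interface Env {
  MY_KV: KVNamespace;
  MY_BUCKET: R2Bucket;
}

// Pure helper: derive a KV cache key from a request URL (testable offline).
export function cacheKey(url: string): string {
  return `page:${new URL(url).pathname}`;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = cacheKey(request.url);
    const cached = await env.MY_KV.get(key); // KV read: binding call, no network hop
    if (cached !== null) return new Response(cached);

    const obj = await env.MY_BUCKET.get("fallback.html"); // R2 read via binding
    const body = obj ? await obj.text() : "not found";
    await env.MY_KV.put(key, body, { expirationTtl: 300 }); // cache for 5 minutes
    return new Response(body);
  },
};
```

Because bindings are injected through `env` rather than reached over the network, the same handler runs identically in every PoP.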

2.2. Dynamic Workers: code-at-runtime

Dynamic Workers (open beta in March 2026) is a major step forward: it allows code to be injected and executed at runtime via an API, with no prior deployment. It's the foundation for AI-generated code — the agent writes code, pushes it, and the code runs instantly.

Dynamic Workers vs containers

Dynamic Workers start up 100× faster and use 1/10 the memory of containers. With single-digit-millisecond startup and single-digit-MB memory, they're light enough to be used once and thrown away — ideal as a disposable execution environment for AI-generated code.

Dynamic Workers follow the zero-ambient-authority security principle:

// A Dynamic Worker starts with no permissions at all
const worker = await createDynamicWorker({
  code: agentGeneratedCode,
  bindings: {
    globalOutbound: null,  // No network access
    // Only grant what's needed via bindings
    DB: env.MY_D1_DATABASE,
    STORAGE: env.MY_R2_BUCKET,
  }
});

3. Durable Objects & Facets — State for AI Agents

3.1. Durable Objects: single-threaded actor model

Durable Objects solve serverless's biggest challenge: state. Each Durable Object is a single-threaded actor with:

  • Its own SQLite database on local disk — near-zero latency
  • Transactional storage for consistency
  • Hibernation: it sleeps when idle and wakes on request — zero cost while inactive
  • WebSocket support: maintains real-time connections

The economic implications are clear: if you have 10,000 AI agents but only 1% are active at once, a traditional VM setup needs 10,000 instances running continuously. With Durable Objects + hibernation you only need ~100 active instances at any given moment.
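The arithmetic behind that claim can be made explicit (the numbers are the article's own; the functions are just an illustration):

```typescript
// Back-of-envelope: instances you actually pay for under each model.
// With per-agent VMs, every agent needs a standing instance; with
// hibernation, only concurrently-active agents consume compute.
export function vmInstances(totalAgents: number): number {
  return totalAgents; // one always-on VM per agent
}

export function hibernatingInstances(totalAgents: number, activePercent: number): number {
  return Math.ceil((totalAgents * activePercent) / 100); // only awake agents count
}

// 10,000 agents, 1% concurrently active:
console.log(`VM model: ${vmInstances(10_000)} instances`);          // 10000
console.log(`Hibernation: ${hibernatingInstances(10_000, 1)} instances`); // 100
```

The gap between the two numbers is the whole pitch: a 100× reduction in standing compute for mostly-idle agent fleets.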

3.2. Durable Object Facets: isolation for dynamic code

Facets (Agents Week 2026) extend Durable Objects with a parent–child model:

graph TD
    A["🏗️ Parent Durable Object (platform code)"] --> B["📊 Parent SQLite: metadata, billing, logs"]
    A --> C["🔒 Facet (child): AI-generated dynamic code"]
    C --> D["💾 Child SQLite: application data"]
    B -.->|"❌ Isolated"| D
    A --> E["🔒 Facet (child 2): another application"]
    E --> F["💾 Child SQLite 2: separate data"]
    D -.->|"❌ Isolated"| F

Figure 1: the parent–child architecture of Durable Object Facets

Each Facet has its own SQLite database, fully isolated from the parent and from other Facets. The parent controls rate limiting, quotas, and billing — while the child focuses purely on application logic. This is the foundation for platforms that let AI create applications with persistent state.

export class AppRunner extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    // Load dynamic code from R2 or an API
    const appCode = await this.env.R2.get("apps/user-123/code.js");

    // Create a facet — each app gets its own SQLite
    const facet = this.ctx.facets.get("user-app-123", {
      className: "UserApp",
      code: await appCode.text(),
    });

    return facet.fetch(request);
  }
}

4. Project Think — the Next-Generation Agents SDK

Project Think is Cloudflare's official framework for AI agents, built on top of Durable Objects. Instead of gluing primitives together yourself, Think provides a base class that handles the full lifecycle of an agent.

4.1. Core architecture

The Think base class — a minimal agent

import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";

export class MyAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.5"
    );
  }
}

Only getModel() needs to be overridden — Think manages conversation, memory, tool execution, and persistence.

4.2. Key primitives

Durable Execution with Fibers:

Fibers allow agent loops to run for many minutes (or longer) without losing progress. Each fiber is recorded in SQLite before it runs, can checkpoint at any time, and recovers automatically if the platform restarts.

await this.runFiber("research-task", async (fiber) => {
  const results = await this.searchWeb(query);
  await fiber.stash(); // Checkpoint — safe against crashes

  const analysis = await this.analyzeResults(results);
  await fiber.stash(); // Second checkpoint

  return this.generateReport(analysis);
});

Sub-agents via Facets:

Each sub-agent is a child Durable Object with its own SQLite, communicating over typed RPC. The parent agent delegates work to sub-agents — each running isolated and able to hibernate independently.

Persistent Sessions:

Conversations are stored as a tree (parent-message relationships), with non-destructive compaction (summarize rather than delete) and full-text search via SQLite FTS5. Sessions can be forked to explore multiple directions without losing the original context.
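A hypothetical sketch of how such a message tree could be laid out. The schema and helper below are illustrative inventions for this article, not Think's actual internals:

```typescript
// Illustrative SQLite schema for a tree-structured session with FTS5 search.
// Table and column names are invented for this sketch.
export const SESSION_SCHEMA = `
  CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES messages(id), -- tree edge; forking = new child
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    summary_of INTEGER                         -- non-destructive compaction marker
  );
  CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
    USING fts5(content, content=messages, content_rowid=id);
`;

type Node = { parentId: number | null };

// Rebuild the context for any branch by walking leaf -> root, then reversing.
// Forked sessions share every ancestor, so the original context is never lost.
export function contextPath(nodes: Map<number, Node>, leafId: number): number[] {
  const path: number[] = [];
  let cur: number | null = leafId;
  while (cur !== null) {
    path.unshift(cur);
    cur = nodes.get(cur)?.parentId ?? null;
  }
  return path;
}
```

Forking message 2 into a second branch simply adds another child of node 2; both branches resolve their own `contextPath` without touching each other.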

4.3. The Execution Ladder — 5 tiers

| Tier | Name | Capability | Use case |
|------|------|------------|----------|
| 0 | Workspace | Durable filesystem (SQLite + R2) | Store files, config, data |
| 1 | Dynamic Workers | V8 isolate, zero ambient authority | Safely run code from AI |
| 2 | NPM Resolution | Bundler + npm packages | Complex code needing dependencies |
| 3 | Browser | Headless browser automation | Scraping, testing, screenshots |
| 4 | Sandbox | Full toolchain + git access | Building, compiling, deploying projects |

5. Workers AI & AI Gateway — Unified Model Access

5.1. Workers AI: 50+ models at the edge

Workers AI provides inference for 50+ open-source models directly on Cloudflare's GPU network. No infrastructure to manage — call a model via a binding just like calling a function:

const response = await env.AI.run(
  "@cf/meta/llama-4-scout-17b-16e-instruct",
  {
    messages: [
      { role: "user", content: "Analyze microservices architecture" }
    ]
  }
);

Notable recent models (April 2026):

  • Google Gemma 4 26B A4B — MoE with 26B total / 4B active, 256K context, supports vision + thinking + function calling
  • GLM-4.7-Flash — 131K-token context, optimized for summarization
  • Qwen3-30B-A3B — MoE activating only 3B parameters per forward pass
  • EmbeddingGemma-300M — 768-dim vectors, optimized for low-latency embedding

5.2. AI Gateway: a unified proxy for every AI provider

AI Gateway acts as a unified inference layer, supporting 14+ providers (OpenAI, Anthropic, Google, Mistral, …) through a single interface. The new breakthrough: the same AI.run() binding works for both Workers AI models and third-party models.

graph LR
    A["🤖 AI Agent"] --> B["🌐 AI Gateway"]
    B --> C["Workers AI: Llama, Gemma, Qwen"]
    B --> D["OpenAI: GPT-4.1, o4"]
    B --> E["Anthropic: Claude Opus, Sonnet"]
    B --> F["Google: Gemini 2.5"]
    B --> G["Caching Layer"]
    B --> H["Rate Limiting"]
    B --> I["AI Firewall"]

Figure 2: AI Gateway — a unified inference layer for multiple providers

AI Gateway provides:

  • Caching: cache responses for identical prompts, reducing cost and latency
  • Rate limiting: control quotas per user/key/endpoint
  • AI Firewall: detect prompt injection and data exfiltration before requests reach the model
  • Analytics: dashboards for token usage, latency, and error rate per model/provider
  • Fallback: automatically switch to another provider when one has issues
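Gateway-side fallback is configured on the Cloudflare side, but the pattern itself is easy to sketch client-side. The helper below is our own illustration; the commented usage assumes a hypothetical `callGateway` wrapper around AI Gateway's HTTP endpoints:

```typescript
// Generic fallback driver: run attempts in order, return the first success.
export async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error("no attempts given");
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next provider
    }
  }
  throw lastError;
}

// Hypothetical usage (callGateway and the model paths are placeholders):
// const answer = await firstSuccessful([
//   () => callGateway("workers-ai/@cf/meta/llama-4-scout-17b-16e-instruct", prompt),
//   () => callGateway("openai/chat/completions", prompt),
// ]);
```

The ordering doubles as a cost policy: try the cheap edge model first, escalate to a premium provider only on failure.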

6. MCP Server at the Edge

Cloudflare has partnered closely with Anthropic to build infrastructure for remote MCP servers — bringing the Model Context Protocol to the edge with scale and hibernation.

MCP + Durable Objects = Stateful Tool Servers

Each MCP server instance runs as an McpAgent (extending Durable Object), automatically benefiting from hibernation — sleeping when idle and waking with state intact when an agent needs it. This resolves the core tension that MCP servers must always be reachable yet can't justify running 24/7 for every user.

An enterprise MCP architecture on Cloudflare:

graph TD
    A["AI Agent / Claude"] -->|"MCP Request"| B["Cloudflare Access: OAuth 2.0 / RFC 9728"]
    B --> C["AI Gateway: Code Mode reduces tokens"]
    C --> D["MCP Server Portal (McpAgent on DO)"]
    D --> E["Tool: Database Query"]
    D --> F["Tool: File Storage"]
    D --> G["Tool: External API"]
    D -.->|"Hibernate when idle"| H["💤 State preserved in SQLite"]

Figure 3: Enterprise MCP deployment on Cloudflare

Standout features:

  • Code Mode: reduces token cost by compressing tool descriptions
  • Managed OAuth: implements RFC 9728 for agent authentication without building your own OAuth flow
  • Cloudflare Mesh: grants MCP servers access to private databases/APIs without manual tunneling
  • Scannable API tokens: resource-scoped permissions that honor least-privilege

7. The supporting service ecosystem

7.1. Storage Layer

| Service | Type | Free tier | Use case for agents |
|---------|------|-----------|---------------------|
| R2 | Object storage (S3-compatible) | 10 GB storage, 1M Class A / 10M Class B ops/month | Artifacts, code, models, large files |
| D1 | SQLite database | 5 GB storage, 5M rows read/day | Metadata, user data, agent state |
| KV | Key-value store | 1 GB storage, 100K reads/day | Config, feature flags, session data |
| Queues | Message queue | 10K operations/day (new 02/2026) | Task scheduling, async workflows |
| Vectorize | Vector database | 5M vectors, 30M query dimensions/month | RAG, semantic search, embeddings |
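As a sketch of the Queues producer/consumer shape for agent workloads: the handler layout below follows Workers' documented producer/consumer pattern, but the binding name `TASKS`, the `Task` shape, and the `triage` helper are inventions of this example.

```typescript
// Minimal message shape: body plus per-message ack/retry controls.
type QueueMessage<T> = { body: T; ack(): void; retry(): void };

// Pure helper (testable offline): split a batch into successes and retries.
export function triage<T>(items: T[], ok: (item: T) => boolean): { done: T[]; retry: T[] } {
  const done: T[] = [];
  const retry: T[] = [];
  for (const item of items) (ok(item) ? done : retry).push(item);
  return { done, retry };
}

type Task = { kind: string; url: string };

export default {
  // Producer: enqueue an async task instead of doing the work in-request.
  async fetch(request: Request, env: { TASKS: { send(body: Task): Promise<void> } }) {
    await env.TASKS.send({ kind: "summarize", url: request.url });
    return new Response("queued", { status: 202 });
  },
  // Consumer: ack messages that processed cleanly, retry the rest.
  async queue(batch: { messages: QueueMessage<Task>[] }) {
    const { done, retry } = triage(batch.messages, (m) => m.body.kind !== "poison");
    done.forEach((m) => m.ack());
    retry.forEach((m) => m.retry());
  },
};
```

Retried messages are redelivered in a later batch, so the consumer stays idempotent by design.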

7.2. Agents Week 2026 — New services

Sandboxes (GA)

Isolated environments with a shell, filesystem, and background processes. Agents can create sandboxes, install packages, run builds, and maintain state across sessions. This is Tier 4 of the Execution Ladder.

Browser Run — upgraded headless browser

Concurrency is 4× higher than before. New features include Live View (watch what the browser is doing in real time) and Human-in-Loop (the agent pauses and waits for user input when needed). Ideal for web scraping, testing, and automated form filling.

Artifacts — Git-compatible storage

Versioned storage supporting tens of millions of repos. Agents can create, commit, and manage code repositories directly — without external GitHub/GitLab for small projects.

Email Service (Public Beta)

Agents send, receive, and process email natively — no separate SendGrid or SES. Useful for agents that need to interact with humans over email.

Flagship — feature flags at the edge

Evaluates feature flags with sub-millisecond latency using KV + Durable Objects. Agents can check flags before executing logic with virtually no overhead.

Unweight — LLM model compression

Reduces the footprint of LLMs by 22% through lossless compression. The practical impact: the same GPU fits more models, lowering inference cost on Workers AI.

8. Reference architecture: an AI Agent on Cloudflare

Here's an end-to-end architecture for a production AI agent running entirely on Cloudflare:

graph TD
    subgraph "Client Layer"
        U["👤 User"] --> W["Worker (API Gateway)"]
    end
    subgraph "Agent Layer"
        W --> T["Think Agent (Durable Object)"]
        T --> F1["Sub-agent 1 (Facet - Research)"]
        T --> F2["Sub-agent 2 (Facet - Code Gen)"]
        T --> F3["Sub-agent 3 (Facet - Review)"]
    end
    subgraph "AI Layer"
        F1 --> AI["Workers AI: Llama 4 / Gemma 4"]
        F2 --> GW["AI Gateway → Claude / GPT"]
        F3 --> AI
    end
    subgraph "Tool Layer"
        T --> MCP["MCP Server (McpAgent on DO)"]
        MCP --> BR["Browser Run"]
        MCP --> SB["Sandbox"]
        MCP --> EX["External APIs"]
    end
    subgraph "Storage Layer"
        T --> D1["D1: Agent metadata"]
        T --> R2["R2: Files & artifacts"]
        T --> VZ["Vectorize: RAG embeddings"]
        T --> Q["Queues: Async tasks"]
    end

Figure 4: end-to-end production AI Agent architecture on Cloudflare

9. Comparing with other platforms

| Criterion | Cloudflare Agent Cloud | AWS Lambda + Bedrock | Azure Functions + OpenAI |
|-----------|------------------------|----------------------|--------------------------|
| Cold start | <5 ms (V8 isolate) | 100 ms–2 s (container) | 200 ms–3 s (container) |
| Stateful agents | Durable Objects (native) | DynamoDB / Step Functions | Durable Functions |
| Agent hibernation | Built-in, automatic | None (DIY) | Yes (Durable Functions) |
| MCP support | McpAgent + native OAuth | Build your own | Build your own |
| Built-in AI models | 50+ open-source at the edge | Bedrock (managed) | Azure OpenAI (managed) |
| Free tier | Very generous (Workers, R2, D1, KV, Queues) | 1M Lambda requests, limited Bedrock | 1M Functions requests, AI credit |
| Edge locations | 330+ cities | 30+ regions | 60+ regions |
| Dynamic code execution | Dynamic Workers (native) | No native option | No native option |

Important caveats

Cloudflare Workers have a CPU-time limit (10 ms free, 30 s paid) — not suitable for long, CPU-intensive tasks. Durable Object Facets are still in beta. Dynamic Workers are limited to Workers Paid plans. For workloads needing GPU training or heavy long-running compute, AWS/Azure/GCP are still better fits.

10. Getting started with Cloudflare Agent Cloud

Here's a simple example of building an AI agent with Project Think:

# wrangler.toml
name = "my-ai-agent"
main = "src/index.ts"
compatibility_date = "2026-04-01"

[ai]
binding = "AI"

[[durable_objects.bindings]]
name = "AGENT"
class_name = "ResearchAgent"

// src/index.ts

import { Think } from "@cloudflare/agents";
import { createWorkersAI } from "@cloudflare/agents/ai";
import { tool } from "ai";
import { z } from "zod";

export class ResearchAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/google/gemma-4-26b-a4b"
    );
  }

  getTools() {
    return {
      search: tool({
        description: "Search the web for information",
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          // Use AI Search or an external API
          const results = await this.env.AI.run(
            "@cf/cloudflare/ai-search",
            { query }
          );
          return results;
        },
      }),
      saveNote: tool({
        description: "Save a note to storage",
        parameters: z.object({
          title: z.string(),
          content: z.string()
        }),
        execute: async ({ title, content }) => {
          // Durable Object SQLite — persists across hibernation
          await this.sql.exec(
            "INSERT INTO notes (title, content, created_at) VALUES (?, ?, ?)",
            title, content, new Date().toISOString()
          );
          return { saved: true };
        },
      }),
    };
  }
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const id = env.AGENT.idFromName("default");
    const agent = env.AGENT.get(id);
    return agent.fetch(request);
  },
};

Deployment is a single command:

npx wrangler deploy

11. Cloudflare Developer Platform timeline

  • 2017: Launched Cloudflare Workers — serverless on V8 isolates at the edge. "Compute near the user" takes off.
  • 2021: Launched Durable Objects — solving serverless state. R2 Object Storage competes directly with S3, with no egress fees.
  • 2022: D1 (SQLite at the edge) and Queues enter beta. The storage ecosystem starts to feel complete.
  • 2023: Launched Workers AI and AI Gateway — Cloudflare enters the AI inference market. Vectorize for vector search.
  • 2024: Initial Agents SDK. Support for MCP servers on Workers. Browser Rendering for headless automation.
  • 02/2026: Queues free — 10K ops/day on the free plan. Dynamic Workers enters open beta.
  • 04/2026: Agents Week — Project Think, Durable Object Facets, Sandboxes GA, Browser Run 4×, Artifacts, Email Service, AI Firewall, Managed OAuth, Flagship, Unweight. Cloudflare officially positions itself as Agent Cloud.

12. Conclusion

Cloudflare Agent Cloud marks an important shift: from "a place to host websites" to "a place to run AI agents". The combination of V8 isolates (fast, lightweight), Durable Objects (stateful, with hibernation), and a rich storage/AI ecosystem creates a platform few rivals can match on developer experience and operating cost.

The generous free tier — Workers (100K req/day), R2 (10 GB), D1 (5 GB), KV (1 GB), Queues (10K ops/day), Vectorize (5M vectors) — makes it an ideal place to prototype and even run production for small-to-medium AI agent projects at almost zero cost.

The bottom line

If you're building AI agents and need: (1) extremely fast cold starts, (2) automatic state management with hibernation, (3) native MCP server support, and (4) a generous free tier — Cloudflare Agent Cloud deserves to be the first platform you evaluate. The main limits are the CPU time cap and the fact that some features are still in beta.
