Cloudflare AI Platform 2026 — Edge Infrastructure for Serverless AI Agents

Posted on: 4/17/2026 8:08:58 PM

When AI agents stop being simple chatbots and turn into distributed systems executing millions of tasks in parallel, the question is no longer "which model do we use?" but "where does the agent run, how, and at what cost?". Cloudflare has just delivered an ambitious answer with its AI Platform — a complete infrastructure layer that turns a global network of 330+ data centers into a runtime for serverless AI agents, from inference to execution, from sandboxes to persistent memory.

This article dives deep into the Cloudflare AI Platform 2026 architecture, including AI Gateway, Workers AI, Dynamic Workers, and especially Project Think — the framework for building next-generation AI agents with durable execution, sub-agent orchestration, and zero idle cost.

1. A tour of the Cloudflare AI Platform

330+ Global data centers
70+ Ready-to-use AI models
14+ Integrated providers
10,000 Free Neurons/day

Cloudflare has evolved from a CDN/security vendor into a full-stack AI infrastructure. Instead of just caching and protecting traffic, the edge network is now where inference runs, where code executes, where agent state lives, and where multi-model workflows are orchestrated — all serverless.

2. AI Gateway — A unified inference layer

AI Gateway is the unified middle layer between your application and any AI model. Instead of integrating directly with each provider (OpenAI, Anthropic, Google, …), you call everything through one API.

graph LR
    A["🖥️ Application"] --> B["AI Gateway"]
    B --> C["OpenAI"]
    B --> D["Anthropic"]
    B --> E["Google AI"]
    B --> F["Workers AI"]
    B --> G["Custom Model"]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#2c3e50,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff

AI Gateway — one API, many providers, automatic failover

2.1. Standout features

| Feature | Description | Benefit |
| --- | --- | --- |
| Automatic failover | Automatically switches to a backup provider when a model goes down | High uptime, no complex retry logic |
| Streaming resilience | Buffers the streaming response independently of the agent's lifetime, allowing reconnects | No lost responses during network interruptions |
| Cost attribution | Attach custom metadata (team, user, workflow) to every request | Segment-level cost control |
| Unified billing | Manage spend across every provider in one place | One dashboard, no need to aggregate multiple invoices |

2.2. Usage inside Workers

// Call a model through the AI binding — same API for every provider
const response = await env.AI.run(
  'anthropic/claude-sonnet-4-6',
  { input: 'Analyze microservices architecture' },
  {
    gateway: { id: "default" },
    metadata: { teamId: "backend", userId: 12345 }
  }
);

💡 The nice bit

When a model is available on multiple providers (e.g. Llama 3 on both Workers AI and Replicate), AI Gateway automatically routes to the fastest endpoint and fails over when needed — no retry logic required.
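The failover idea is easy to picture in code. Below is a minimal sketch of "try providers in order, return the first success" — an illustration of the concept, not Cloudflare's actual routing logic (which also picks the fastest endpoint based on observed latency):

```typescript
// Hypothetical sketch of provider failover — not AI Gateway's real
// implementation. Each "provider" is modeled as an async function.
type Provider = (prompt: string) => Promise<string>;

async function runWithFailover(
  providers: Provider[],
  prompt: string
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(prompt); // first healthy provider wins
    } catch (err) {
      lastError = err; // provider down — fall through to the next one
    }
  }
  throw lastError; // every provider failed
}
```

With AI Gateway, this loop (plus latency-aware routing and buffering) lives in the infrastructure layer, so the application never writes it.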

3. Workers AI — Inference at the edge

Workers AI lets you run AI models directly on Cloudflare's edge network, minimizing latency by pushing inference as close to the user as possible.

3.1. Model catalog

The model ecosystem keeps growing:

| Model type | Examples | Cost (per M tokens/units) |
| --- | --- | --- |
| Text generation | Llama 3.2-1b, DeepSeek R1-32b | $0.027 – $4.88 output |
| Embedding | BGE-small, BGE-large | $0.020 – $0.204 |
| Image generation | Flux-1-Schnell, Flux-2-Dev | ~$0.00005/tile |
| Speech-to-Text | Whisper | $0.0005/minute |
| Text-to-Speech | Deepgram Aura | $0.015/1k chars |

3.2. Bring Your Own Model (BYOM)

Cloudflare integrates Cog technology from Replicate (the Replicate team officially joined Cloudflare), letting you package custom models into containers and deploy them onto Workers AI:

# cog.yaml — packaging a fine-tuned model
build:
  python_version: "3.11"
  python_packages:
    - "torch==2.3.0"
    - "transformers==4.42.0"

predict: "predict.py:Predictor"

📊 Free tier

10,000 Neurons free every day on both Free and Paid plans. Beyond that it's just $0.011/1,000 Neurons. For small text-generation workloads (Llama 3.2-1b), 10K neurons translates to thousands of requests — plenty for prototypes and side projects.
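A quick back-of-envelope model makes the pricing concrete. This sketch uses only the two numbers quoted above (10,000 free Neurons/day, $0.011 per 1,000 Neurons beyond that):

```typescript
// Neuron cost model using the published numbers above.
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1K_NEURONS = 0.011;

// Cost for one day, given total Neurons consumed that day.
function dailyCostUSD(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1_000) * USD_PER_1K_NEURONS;
}
```

So a day that burns 110,000 Neurons bills 100,000 of them — about $1.10 — while anything at or under 10,000 Neurons costs nothing.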

4. Dynamic Workers — Sandbox for AI agents

This is the keystone of agent execution. Dynamic Workers is a V8-isolate-based runtime that starts up 100× faster than traditional containers, letting AI agents safely execute code in a sandbox.

graph TD
    A["AI Agent"] --> B["Generate Code"]
    B --> C["Dynamic Worker
V8 Isolate"]
    C --> D["Execute
Sandboxed"]
    D --> E{"Result"}
    E -->|"✅ Success"| F["Return to Agent"]
    E -->|"❌ Timeout/Error"| G["Agent retries
or switches strategy"]

    style A fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff

Dynamic Workers — agent writes code, the isolate executes it safely

4.1. Compared with traditional containers

| Criterion | Container (Docker/K8s) | Dynamic Workers |
| --- | --- | --- |
| Cold start | Seconds to minutes | Milliseconds |
| Isolation | Process-level | V8 isolate (lighter) |
| Idle cost | You pay even when unused | Zero when hibernated |
| Scale | Manual/HPA | Millions of concurrent — automatic |
| Security | Needs network policy config | Sandboxed by architecture |
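To make the "run generated code in a fresh, constrained context with a timeout" idea tangible, here is a sketch using Node's built-in `vm` module. Important caveat: `vm` is explicitly not a security boundary the way Cloudflare's isolates are — this only illustrates the shape of the pattern:

```typescript
import * as vm from "node:vm";

// Illustration only: Node's vm module is NOT equivalent isolation to
// Cloudflare's V8 isolates, but it shows the pattern — evaluate
// generated source in an empty global scope with a wall-clock timeout.
function runGeneratedCode(source: string, timeoutMs = 50): unknown {
  const context = vm.createContext({}); // no ambient globals exposed
  return vm.runInContext(source, context, { timeout: timeoutMs });
}
```

In the Dynamic Workers model, the equivalent step happens inside a real isolate on the edge, so a runaway or hostile script is contained by the runtime architecture rather than by the host process.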

5. Project Think — A framework for 3rd-generation AI agents

Project Think is Cloudflare's most ambitious vision for AI agents: not just a framework, but an infrastructure architecture that makes agents first-class citizens of the edge network.

5.1. The three generations of AI agents

Generation 1 — Chatbot
Stateless, reactive, no memory of context. Every request is a brand-new conversation.
Generation 2 — Coding agent
Stateful, uses tools, but runs on a single laptop or server. Claude Code, Cursor, GitHub Copilot belong here.
Generation 3 — Infrastructure agent
Durable, distributed, serverless, Internet-native. Survives crashes, costs nothing when idle, secured by architecture rather than behavioral constraints. Project Think targets this generation.

5.2. Core architecture

graph TB
    subgraph "Project Think Architecture"
        A["Think Base Class"] --> B["Durable Objects
Identity + State + SQLite"]
        A --> C["Dynamic Workers
Code Execution"]
        A --> D["AI Gateway
Multi-model Inference"]
        A --> E["R2 + SQLite
Persistent Filesystem"]
        B --> F["Fibers
Durable Execution"]
        B --> G["Facets
Sub-Agents"]
        B --> H["Sessions
Conversation Trees"]
        C --> I["Tier 0: Workspace"]
        C --> J["Tier 1: JS Sandbox"]
        C --> K["Tier 2: npm Runtime"]
        C --> L["Tier 3: Headless Browser"]
        C --> M["Tier 4: Full Sandbox"]
    end

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style I fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style J fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style K fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style L fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style M fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

Project Think architecture — from the base class to the execution ladder

Every agent is a Durable Object — with its own identity, persistent state in SQLite, and automatic hibernation when idle. This completely rewrites agent economics:

| Metric | VMs/Containers | Durable Objects |
| --- | --- | --- |
| Idle cost | Full compute 24/7 | Zero (hibernated) |
| 10,000 agents, 1% active | 10,000 running instances | ~100 active at once |
| Model | "1 server for N users" | "1 agent per user" |

6. Project Think's five primitives

6.1. Fibers — Durable execution

Agents survive crashes thanks to a checkpoint-and-recover mechanism. Every runFiber() call writes a checkpoint to SQLite before running; if the environment gets terminated, the agent recovers automatically:

class ResearchAgent extends Think<Env> {
  async onChat(message: string) {
    // The fiber is registered in SQLite before it runs
    await this.runFiber('research', async () => {
      const sources = await this.searchWeb(message);
      const analysis = await this.analyze(sources);
      return this.respond(analysis);
    });
  }

  // If it crashes mid-way, the fiber recovers itself
  async onFiberRecovered(fiberId: string) {
    console.log(`Recovering fiber: ${fiberId}`);
    // Resume from the last checkpoint
  }
}

6.2. Facets — Sub-agent orchestration

A sub-agent is a separate Durable Object with its own SQLite database, communicating via typed RPC. There's no implicit data sharing — every sub-agent is fully isolated:

// The parent agent delegates to sub-agents
const researcher = this.createFacet('researcher');
const writer = this.createFacet('writer');

// Typed RPC — type-safe, isolated
const findings = await researcher.chat(
  'Find 5 recent articles about edge computing',
  streamRelay
);

const draft = await writer.chat(
  `Write a summary based on: ${findings}`,
  streamRelay
);

6.3. Persistent Sessions — Conversation trees

Instead of only storing linear history, Project Think supports tree-structured conversations: fork branches, non-destructive compaction, and full-text search via SQLite FTS5.
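A minimal sketch of the tree idea (a hypothetical data model — Project Think's actual schema may differ): each message points at its parent, so forking a branch just means appending a second child to the same node, and the shared prefix is never copied.

```typescript
// Hypothetical tree-structured conversation store. Forking = adding a
// second child under the same parent; history is rebuilt by walking
// parent links from a leaf back to the root.
interface ConvNode { id: number; parent: number | null; text: string }

class ConversationTree {
  private nodes = new Map<number, ConvNode>();
  private nextId = 0;

  append(parent: number | null, text: string): number {
    const id = this.nextId++;
    this.nodes.set(id, { id, parent, text });
    return id;
  }

  // Linear history of one branch, root first.
  branchHistory(leaf: number): string[] {
    const out: string[] = [];
    let n = this.nodes.get(leaf);
    while (n) {
      out.unshift(n.text);
      n = n.parent === null ? undefined : this.nodes.get(n.parent);
    }
    return out;
  }
}
```

In Project Think this structure is persisted in the agent's SQLite database, with FTS5 providing full-text search across all branches.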

6.4. Sandboxed code execution

Instead of calling tools step by step (chat → call tool → chat → call tool), the agent writes a complete program and runs it in a sandbox. The @cloudflare/codemode package reports a 99.9% token reduction over traditional tool-calling.

6.5. Execution Ladder — Gradual capability escalation

| Tier | Capability | Use case |
| --- | --- | --- |
| Tier 0 | Workspace (filesystem) | Read/write files, manage projects |
| Tier 1 | Dynamic Workers (JS sandbox) | Computation, data transformation |
| Tier 2 | npm resolution at runtime | Using npm packages on the fly |
| Tier 3 | Headless browser | Web scraping, automation |
| Tier 4 | Full sandbox (git, compiler, tests) | Build & deploy pipelines |

💡 Progressive capability

The Execution Ladder lets an agent start small (Tier 0) and only "climb" when truly needed. An agent writing a simple script runs on Tier 1; one that needs to fetch web pages climbs to Tier 3; only a full project build takes it to Tier 4. You save resources and shrink the attack surface.
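The "climb only when needed" rule reduces to a max over required capabilities. This sketch mirrors the tier table above; the capability names are illustrative, not a real Project Think API:

```typescript
// Pick the lowest tier that covers everything the task needs.
// Tier numbers mirror the Execution Ladder table; the capability
// labels are hypothetical.
type Capability = "files" | "js" | "npm" | "browser" | "build";

const TIER_BY_CAPABILITY: Record<Capability, number> = {
  files: 0,   // Tier 0: workspace/filesystem
  js: 1,      // Tier 1: JS sandbox
  npm: 2,     // Tier 2: npm at runtime
  browser: 3, // Tier 3: headless browser
  build: 4,   // Tier 4: full sandbox
};

function minimalTier(needs: Capability[]): number {
  // Climbing the ladder = taking the max over required capabilities.
  return needs.reduce((t, c) => Math.max(t, TIER_BY_CAPABILITY[c]), 0);
}
```

A scraping task that also runs JS lands on Tier 3; a plain script stays on Tier 1; a task with no declared needs never leaves Tier 0.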

7. The Think base class — a built-in agentic loop

Project Think ships the Think base class that handles the full lifecycle: agentic loop, message persistence, streaming, tool execution, and extensions.

import { Think } from '@cloudflare/think';
import { createWorkersAI } from 'workers-ai-provider';

export class MyAgent extends Think<Env> {
  // Just declare the model
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e'
    );
  }

  // Lifecycle hooks for customization
  async beforeTurn(messages) {
    // Inject system context, load memory
  }

  async beforeToolCall(toolName, args) {
    // Validate, log, or modify tool calls
  }

  async afterToolCall(toolName, result) {
    // Post-process, cache results
  }

  async onStepFinish(step) {
    // Checkpoint progress
  }
}

7.1. Lifecycle hooks

graph LR
    A["beforeTurn()"] --> B["streamText()"]
    B --> C["beforeToolCall()"]
    C --> D["Tool Execution"]
    D --> E["afterToolCall()"]
    E --> F["onStepFinish()"]
    F -->|"Needs another tool"| C
    F -->|"Done"| G["onChatResponse()"]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff

The lifecycle of a turn inside a Think agent
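The diagram above can be reduced to a small skeleton. This is an assumed shape of the loop, not the real Think internals: hooks fire in a fixed order, and the tool-call pair repeats once per step that requests a tool:

```typescript
// Skeleton of one turn's lifecycle (hypothetical — mirrors the diagram,
// not @cloudflare/think's actual source).
type Step = { toolCall?: string };
type Hooks = { [name: string]: (info?: string) => void };

async function runTurn(steps: Step[], hooks: Hooks): Promise<void> {
  hooks.beforeTurn?.();                      // inject context, load memory
  for (const step of steps) {
    if (step.toolCall) {
      hooks.beforeToolCall?.(step.toolCall); // validate / log
      // ...tool executes here...
      hooks.afterToolCall?.(step.toolCall);  // post-process / cache
    }
    hooks.onStepFinish?.();                  // checkpoint progress
  }
  hooks.onChatResponse?.();                  // final answer streamed out
}
```

The point of the hook design is that subclasses override only the stages they care about; the loop itself stays in the base class.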

8. Persistent memory — Agents that remember everything

Project Think implements memory via Context Blocks — structured sections the model can read and update, persisted across hibernation:

MEMORY (Important facts about user) [42%, 462/1100 tokens]
- User prefers writing TypeScript
- Timezone: UTC+7
- Current project: e-commerce platform

PREFERENCES (Working style) [18%, 198/1100 tokens]
- Prefer functional over OOP
- Wants concise responses

Token usage is shown as a percentage, giving the agent the ability to self-manage its context window — it knows when to compact memory to avoid exceeding limits.
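The self-management loop is simple arithmetic. This sketch reproduces the percentage shown in the MEMORY block above; the 80% compaction threshold is an arbitrary illustration, not a documented Project Think default:

```typescript
// Context-block budget math. The 1,100-token limit matches the example
// above; the 80% threshold is a made-up illustration.
interface ContextBlock { name: string; tokens: number; limit: number }

function usagePercent(b: ContextBlock): number {
  return Math.round((b.tokens / b.limit) * 100);
}

function needsCompaction(b: ContextBlock, thresholdPct = 80): boolean {
  return usagePercent(b) >= thresholdPct;
}
```

A block at 462/1,100 tokens reads as 42% — well under threshold — so the agent leaves it alone and only compacts once a block approaches its limit.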

9. Self-authored extensions

This is the boldest feature: the agent writes its own extensions. Extensions are TypeScript code running inside Dynamic Workers, with declared permissions:

// The agent generates an extension when a new capability is needed
const extension = {
  name: 'price-checker',
  permissions: {
    network: ['api.example.com'],  // Only this domain is reachable
    workspace: ['read']             // Read-only file access
  },
  code: `
    export async function checkPrice(symbol: string) {
      const res = await fetch('https://api.example.com/price/' + symbol);
      return res.json();
    }
  `
};

// Extensions persist in storage and survive hibernation
await this.installExtension(extension);

⚠️ Security by architecture

Extensions run inside isolated V8 isolates; declared permissions are enforced at runtime. The agent can't grant itself more permissions — this is structural security, not prompt-based restrictions.
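What "enforced at runtime" means can be sketched as a gate in front of every outbound request. This is an illustration of the allowlist idea, not Cloudflare's enforcement code:

```typescript
// Hypothetical enforcement of a declared network allowlist: the runtime
// checks the target hostname before any fetch leaves the sandbox.
interface ExtensionPermissions { network: string[] }

function assertNetworkAllowed(perms: ExtensionPermissions, url: string): void {
  const host = new URL(url).hostname;
  if (!perms.network.includes(host)) {
    throw new Error(`network access to ${host} not declared`);
  }
}
```

Because the check lives in the runtime rather than in the prompt, a misbehaving extension cannot talk itself past it — that is the structural-security argument in a nutshell.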

10. Pricing and the free tier

One of the most appealing parts of the Cloudflare AI Platform is its friendly pricing model:

$0 10,000 Neurons/day (Free)
$0.011 Per 1,000 Neurons (over the free limit)
$5/month Workers Paid plan
50,000 Concurrent workflow instances

10.1. Realistic cost estimates

| Scenario | Estimate | Cost/month |
| --- | --- | --- |
| Side project / prototype | ~5,000 text-generation requests/day | $0 (in the free tier) |
| Startup (1,000 users) | 50K requests/day, mixed models | ~$15–30 |
| Enterprise (10K agents) | 1% active, Durable Objects + inference | You only pay for ~100 active agents |

💡 Cost comparison

With the traditional container model, running 10,000 agent instances 24/7 on EC2/GKE can cost thousands of dollars a month. Durable Objects hibernate when idle → near-zero cost for the 99% of agents that aren't active.

11. Hands-on: your first agent with Project Think

11.1. Setup

# Initialize the project
npm create cloudflare@latest my-agent -- --template think
cd my-agent

# Install dependencies
npm install @cloudflare/think agents ai @cloudflare/shell zod workers-ai-provider

11.2. A basic agent

// src/index.ts
import { Think } from '@cloudflare/think';
import { createWorkersAI } from 'workers-ai-provider';
import { tool } from 'ai';
import { z } from 'zod';

interface Env {
  AI: Ai;
}

export class BlogAssistant extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e'
    );
  }

  getTools() {
    return {
      searchArticles: tool({
        description: 'Search articles by keyword',
        parameters: z.object({
          query: z.string().describe('Search keyword')
        }),
        execute: async ({ query }) => {
          // Agent logic here
          return { results: [] };
        }
      })
    };
  }
}

export default {
  fetch(request: Request, env: Env) {
    // Route to agent
  }
};

11.3. Deploy

# Deploy to Cloudflare
npx wrangler deploy

# The agent is live at the edge, 330+ locations
# Zero cold start, zero idle cost

12. When should you use the Cloudflare AI Platform?

| Use it ✅ | Think twice ⚠️ |
| --- | --- |
| Agents that need low latency globally | Workloads that need dedicated GPUs (heavy fine-tuning) |
| Thousands of agents/users, mostly idle | Massive models not yet on Workers AI |
| Multi-model workflows (needing failover) | You need a full Linux environment (not just a JS sandbox) |
| Rapid prototyping on the free tier | You already have a well-tuned Kubernetes cluster |
| Agents that need persistent state + memory | Compliance demands specific data residency |

13. Conclusion

Cloudflare AI Platform 2026 isn't just "adding AI to a CDN" — it's a complete infrastructure platform for the next generation of agents. With AI Gateway for unified inference, Workers AI to run models at the edge, Dynamic Workers for sandboxed code execution, and Project Think to turn agents into durable infrastructure, Cloudflare is betting that the future of AI agents lives not on laptops or cloud VMs, but on the global edge network.

The 10,000-neurons-per-day free tier is enough to start experimenting today — no credit card required. And as you scale, the Durable Objects model guarantees you only pay for what's actually running.

📌 Quick recap

AI Gateway = unified inference layer for 14+ providers. Workers AI = 70+ models running at the edge. Dynamic Workers = V8 isolates starting 100× faster than containers. Project Think = framework for durable, distributed agents with zero idle cost. Free tier = 10K Neurons/day.

References:
Cloudflare's AI Platform: an inference layer designed for agents
Project Think: building the next generation of AI agents on Cloudflare
Workers AI Pricing — Cloudflare Docs
Workers AI Overview — Cloudflare Docs
Cloudflare expands Agent Cloud — SiliconANGLE (04/2026)