Cloudflare AI Platform 2026 — Edge Infrastructure for Serverless AI Agents
Posted on: 4/17/2026 8:08:58 PM
Table of contents
- 1. A tour of the Cloudflare AI Platform
- 2. AI Gateway — A unified inference layer
- 3. Workers AI — Inference at the edge
- 4. Dynamic Workers — Sandbox for AI agents
- 5. Project Think — A framework for 3rd-generation AI agents
- 6. Project Think's five primitives
- 7. The Think base class — a built-in agentic loop
- 8. Persistent memory — Agents that remember everything
- 9. Self-authored extensions
- 10. Pricing and the free tier
- 11. Hands-on: your first agent with Project Think
- 12. When should you use the Cloudflare AI Platform?
- 13. Conclusion
When AI agents stop being simple chatbots and turn into distributed systems executing millions of tasks in parallel, the question is no longer "which model do we use?" but "where does the agent run, how, and at what cost?". Cloudflare has just delivered an ambitious answer with its AI Platform — a complete infrastructure layer that turns a global network of 330+ data centers into a runtime for serverless AI agents, from inference to execution to persistent memory.
This article dives deep into the Cloudflare AI Platform 2026 architecture, including AI Gateway, Workers AI, Dynamic Workers, and especially Project Think — the framework for building next-generation AI agents with durable execution, sub-agent orchestration, and zero idle cost.
1. A tour of the Cloudflare AI Platform
Cloudflare has evolved from a CDN/security vendor into a full-stack AI infrastructure provider. Instead of just caching and protecting traffic, the edge network is now where inference runs, where code executes, where agent state lives, and where multi-model workflows are orchestrated — all serverless.
2. AI Gateway — A unified inference layer
AI Gateway is the unified middle layer between your application and any AI model. Instead of integrating directly with each provider (OpenAI, Anthropic, Google, …), you call everything through one API.
graph LR
A["🖥️ Application"] --> B["AI Gateway"]
B --> C["OpenAI"]
B --> D["Anthropic"]
B --> E["Google AI"]
B --> F["Workers AI"]
B --> G["Custom Model"]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#2c3e50,stroke:#fff,color:#fff
style E fill:#2c3e50,stroke:#fff,color:#fff
style F fill:#2c3e50,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
AI Gateway — one API, many providers, automatic failover
2.1. Standout features
| Feature | Description | Benefit |
|---|---|---|
| Automatic failover | Automatically switches to a backup provider when a model goes down | High uptime, no complex retry logic |
| Streaming resilience | Buffers the streaming response independently of the agent's lifetime, allowing reconnects | No lost responses during network interruptions |
| Cost attribution | Attach custom metadata (team, user, workflow) to every request | Segment-level cost control |
| Unified billing | Manage spend across every provider in one place | One dashboard, no need to aggregate multiple invoices |
2.2. Usage inside Workers
// Call a model through the AI binding — same API for every provider
const response = await env.AI.run(
  'anthropic/claude-sonnet-4-6',
  { input: 'Analyze microservices architecture' },
  {
    gateway: { id: "default" },
    metadata: { teamId: "backend", userId: 12345 }
  }
);
💡 The nice bit
When a model is available on multiple providers (e.g. Llama 3 on both Workers AI and Replicate), AI Gateway automatically routes to the fastest endpoint and fails over when needed — no retry logic required.
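Conceptually, the failover behaves like the sketch below: try each provider in order and return the first success. This is an illustrative TypeScript sketch of the pattern only, not the gateway's internals; the `Provider` type and `callWithFailover` helper are hypothetical names.

```typescript
// Conceptual sketch of gateway-style failover: first healthy provider wins.
type Provider = { name: string; call: (input: string) => Promise<string> };

async function callWithFailover(
  providers: Provider[],
  input: string
): Promise<{ provider: string; output: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      // A failed call falls through to the next provider in the list.
      return { provider: p.name, output: await p.call(input) };
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

With AI Gateway this loop lives inside Cloudflare's network, so your application code stays a single call.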
3. Workers AI — Inference at the edge
Workers AI lets you run AI models directly on Cloudflare's edge network, minimizing latency by pushing inference as close to the user as possible.
3.1. Model catalog
The model ecosystem keeps growing:
| Model type | Examples | Cost (per M tokens/units) |
|---|---|---|
| Text generation | Llama 3.2-1b, DeepSeek R1-32b | $0.027 – $4.88 output |
| Embedding | BGE-small, BGE-large | $0.020 – $0.204 |
| Image generation | Flux-1-Schnell, Flux-2-Dev | ~$0.00005/tile |
| Speech-to-Text | Whisper | $0.0005/minute |
| Text-to-Speech | Deepgram Aura | $0.015/1k chars |
3.2. Bring Your Own Model (BYOM)
Cloudflare integrates Cog technology from Replicate (the Replicate team officially joined Cloudflare), letting you package custom models into containers and deploy them onto Workers AI:
# cog.yaml — packaging a fine-tuned model
build:
  python_version: "3.11"
  python_packages:
    - "torch==2.3.0"
    - "transformers==4.42.0"
predict: "predict.py:Predictor"
📊 Free tier
10,000 Neurons free every day on both Free and Paid plans. Beyond that it's just $0.011/1,000 Neurons. For small text-generation workloads (Llama 3.2-1b), 10K neurons translates to thousands of requests — plenty for prototypes and side projects.
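Under the stated pricing, overage is easy to estimate. A minimal sketch (the helper name is mine; the figures come from the note above):

```typescript
// Rough Workers AI cost model: 10,000 free neurons/day,
// then $0.011 per 1,000 neurons beyond the free allotment.
const FREE_NEURONS_PER_DAY = 10_000;
const PRICE_PER_1K_NEURONS = 0.011;

function dailyNeuronCost(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1_000) * PRICE_PER_1K_NEURONS;
}
```

For example, 110,000 neurons in a day would cost roughly $1.10, and anything under 10,000 costs nothing.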
4. Dynamic Workers — Sandbox for AI agents
This is the keystone of agent execution. Dynamic Workers is a V8-isolate-based runtime that starts up 100× faster than traditional containers, letting AI agents safely execute code in a sandbox.
graph TD
A["AI Agent"] --> B["Generate Code"]
B --> C["Dynamic Worker
V8 Isolate"]
C --> D["Execute
Sandboxed"]
D --> E{"Result"}
E -->|"✅ Success"| F["Return to Agent"]
E -->|"❌ Timeout/Error"| G["Agent retries
or switches strategy"]
style A fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#4CAF50,stroke:#fff,color:#fff
style G fill:#ff9800,stroke:#fff,color:#fff
Dynamic Workers — agent writes code, the isolate executes it safely
4.1. Compared with traditional containers
| Criterion | Container (Docker/K8s) | Dynamic Workers |
|---|---|---|
| Cold start | Seconds to minutes | Milliseconds |
| Isolation | Process-level | V8 isolate (lighter) |
| Idle cost | You pay even when unused | Zero when hibernated |
| Scale | Manual/HPA | Millions of concurrent — automatic |
| Security | Needs network policy config | Sandboxed by architecture |
5. Project Think — A framework for 3rd-generation AI agents
Project Think is Cloudflare's most ambitious vision for AI agents: not just a framework, but an infrastructure architecture that makes agents first-class citizens of the edge network.
5.1. The three generations of AI agents
5.2. Core architecture
graph TB
subgraph "Project Think Architecture"
A["Think Base Class"] --> B["Durable Objects
Identity + State + SQLite"]
A --> C["Dynamic Workers
Code Execution"]
A --> D["AI Gateway
Multi-model Inference"]
A --> E["R2 + SQLite
Persistent Filesystem"]
B --> F["Fibers
Durable Execution"]
B --> G["Facets
Sub-Agents"]
B --> H["Sessions
Conversation Trees"]
C --> I["Tier 0: Workspace"]
C --> J["Tier 1: JS Sandbox"]
C --> K["Tier 2: npm Runtime"]
C --> L["Tier 3: Headless Browser"]
C --> M["Tier 4: Full Sandbox"]
end
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#2c3e50,stroke:#fff,color:#fff
style E fill:#2c3e50,stroke:#fff,color:#fff
style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style I fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style J fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style K fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style L fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style M fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
Project Think architecture — from the base class to the execution ladder
Every agent is a Durable Object — with its own identity, persistent state in SQLite, and automatic hibernation when idle. This completely rewrites agent economics:
| Metric | VMs/Containers | Durable Objects |
|---|---|---|
| Idle cost | Full compute 24/7 | Zero (hibernated) |
| 10,000 agents, 1% active | 10,000 running instances | ~100 active at once |
| Model | "1 server for N users" | "1 agent per user" |
6. Project Think's five primitives
6.1. Fibers — Durable execution
Agents survive crashes thanks to a checkpoint-and-recover mechanism. Every runFiber() call writes a checkpoint to SQLite before running; if the environment gets terminated, the agent recovers automatically:
class ResearchAgent extends Think<Env> {
  async onChat(message: string) {
    // The fiber is registered in SQLite before it runs
    await this.runFiber('research', async () => {
      const sources = await this.searchWeb(message);
      const analysis = await this.analyze(sources);
      return this.respond(analysis);
    });
  }

  // If it crashes mid-way, the fiber recovers itself
  async onFiberRecovered(fiberId: string) {
    console.log(`Recovering fiber: ${fiberId}`);
    // Resume from the last checkpoint
  }
}
6.2. Facets — Sub-agent orchestration
A sub-agent is a separate Durable Object with its own SQLite database, communicating via typed RPC. There's no implicit data sharing — every sub-agent is fully isolated:
// The parent agent delegates to sub-agents
const researcher = this.createFacet('researcher');
const writer = this.createFacet('writer');
// Typed RPC — type-safe, isolated
const findings = await researcher.chat(
'Find 5 recent articles about edge computing',
streamRelay
);
const draft = await writer.chat(
`Write a summary based on: ${findings}`,
streamRelay
);
6.3. Persistent Sessions — Conversation trees
Instead of only storing linear history, Project Think supports tree-structured conversations: fork branches, non-destructive compaction, and full-text search via SQLite FTS5.
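A conversation tree can be pictured as messages with parent pointers, where forking is just appending to an earlier node. The `ConversationTree` class below is a hypothetical illustration of the idea, not the Project Think API:

```typescript
// Illustrative tree-structured conversation store. Forking a conversation
// means appending under an earlier node; existing branches stay intact.
interface MessageNode {
  id: number;
  parentId: number | null;
  role: 'user' | 'assistant';
  content: string;
}

class ConversationTree {
  private nodes = new Map<number, MessageNode>();
  private nextId = 1;

  append(parentId: number | null, role: MessageNode['role'], content: string): number {
    const id = this.nextId++;
    this.nodes.set(id, { id, parentId, role, content });
    return id;
  }

  // Reconstruct one branch's linear history by walking back to the root.
  branch(fromId: number): number[] {
    const path: number[] = [];
    for (let cur: number | null = fromId; cur !== null; ) {
      const node = this.nodes.get(cur);
      if (!node) break;
      path.unshift(node.id);
      cur = node.parentId;
    }
    return path;
  }
}
```

Each branch reads as an ordinary linear chat, while the tree as a whole preserves every fork.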
6.4. Sandboxed code execution
Instead of calling tools step by step (chat → call tool → chat → call tool), the agent writes a complete program and runs it in a sandbox. The @cloudflare/codemode package reports a 99.9% token reduction over traditional tool-calling.
6.5. Execution Ladder — Gradual capability escalation
| Tier | Capability | Use case |
|---|---|---|
| Tier 0 | Workspace (filesystem) | Read/write files, manage projects |
| Tier 1 | Dynamic Workers (JS sandbox) | Computation, data transformation |
| Tier 2 | npm resolution at runtime | Using NPM packages on the fly |
| Tier 3 | Headless browser | Web scraping, automation |
| Tier 4 | Full sandbox (git, compiler, tests) | Build & deploy pipelines |
💡 Progressive capability
The Execution Ladder lets an agent start small (Tier 0) and only "climb" when truly needed. An agent writing a simple script runs on Tier 1; if it needs to fetch web pages it climbs to Tier 3; only when it needs to build a project does it escalate to Tier 4. You save resources and shrink the attack surface.
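The ladder can be modeled as a policy that grants the lowest tier covering everything a task needs. A hypothetical sketch (the capability names and `minimalTier` helper are illustrative, not framework API):

```typescript
// Execution Ladder policy sketch: grant the lowest tier that covers
// every capability the task requires.
type Capability = 'files' | 'js' | 'npm' | 'browser' | 'build';

const TIER_OF: Record<Capability, number> = {
  files: 0,   // Tier 0: workspace
  js: 1,      // Tier 1: JS sandbox
  npm: 2,     // Tier 2: npm at runtime
  browser: 3, // Tier 3: headless browser
  build: 4,   // Tier 4: full sandbox
};

function minimalTier(needs: Capability[]): number {
  // The smallest sufficient tier is the max over required capabilities.
  return needs.reduce((tier, cap) => Math.max(tier, TIER_OF[cap]), 0);
}
```

A task that only touches files stays at Tier 0; one that also scrapes the web is granted Tier 3, nothing more.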
7. The Think base class — a built-in agentic loop
Project Think ships the Think base class that handles the full lifecycle: agentic loop, message persistence, streaming, tool execution, and extensions.
import { Think } from '@cloudflare/think';
import { createWorkersAI } from 'workers-ai-provider';

export class MyAgent extends Think<Env> {
  // Just declare the model
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e'
    );
  }

  // Lifecycle hooks for customization
  async beforeTurn(messages) {
    // Inject system context, load memory
  }

  async beforeToolCall(toolName, args) {
    // Validate, log, or modify tool calls
  }

  async afterToolCall(toolName, result) {
    // Post-process, cache results
  }

  async onStepFinish(step) {
    // Checkpoint progress
  }
}
7.1. Lifecycle hooks
graph LR
A["beforeTurn()"] --> B["streamText()"]
B --> C["beforeToolCall()"]
C --> D["Tool Execution"]
D --> E["afterToolCall()"]
E --> F["onStepFinish()"]
F -->|"Needs another tool"| C
F -->|"Done"| G["onChatResponse()"]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B fill:#e94560,stroke:#fff,color:#fff
style D fill:#2c3e50,stroke:#fff,color:#fff
style G fill:#4CAF50,stroke:#fff,color:#fff
The lifecycle of a turn inside a Think agent
8. Persistent memory — Agents that remember everything
Project Think implements memory via Context Blocks — structured sections the model can read and update, persisted across hibernation:
MEMORY (Important facts about user) [42%, 462/1100 tokens]
- User prefers writing TypeScript
- Timezone: UTC+7
- Current project: e-commerce platform
PREFERENCES (Working style) [18%, 198/1100 tokens]
- Prefer functional over OOP
- Wants concise responses
Token usage is shown as a percentage, giving the agent the ability to self-manage its context window — it knows when to compact memory to avoid exceeding limits.
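That self-management can be sketched as a simple budget check. The 80% compaction threshold below is an illustrative assumption, and `shouldCompact` is a hypothetical helper, not part of the real API:

```typescript
// Context-block budget sketch: the agent tracks each block's token usage
// and compacts a block before it exhausts its budget.
interface ContextBlock {
  name: string;
  usedTokens: number;
  maxTokens: number;
}

function usagePercent(block: ContextBlock): number {
  return Math.round((block.usedTokens / block.maxTokens) * 100);
}

function shouldCompact(block: ContextBlock, thresholdPercent = 80): boolean {
  return usagePercent(block) >= thresholdPercent;
}
```

The MEMORY block shown above (462/1100 tokens) sits at 42%, so no compaction is needed yet.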
9. Self-authored extensions
This is the boldest feature: the agent writes its own extensions. Extensions are TypeScript code running inside Dynamic Workers, with declared permissions:
// The agent generates an extension when a new capability is needed
const extension = {
name: 'price-checker',
permissions: {
network: ['api.example.com'], // Only this domain is reachable
workspace: ['read'] // Read-only file access
},
code: `
export async function checkPrice(symbol: string) {
const res = await fetch('https://api.example.com/price/' + symbol);
return res.json();
}
`
};
// Extensions persist in storage and survive hibernation
await this.installExtension(extension);
⚠️ Security by architecture
Extensions run inside isolated V8 isolates; declared permissions are enforced at runtime. The agent can't grant itself more permissions — this is structural security, not prompt-based restrictions.
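Enforcement can be pictured as an allowlist check on every outbound request. The helper below is a hypothetical illustration of the idea, not Cloudflare's actual enforcement code:

```typescript
// Permission-gate sketch: an extension's fetch is allowed only when the
// target hostname appears on its declared network allowlist.
interface ExtensionPermissions {
  network: string[];   // allowed hostnames
  workspace: string[]; // e.g. ['read']
}

function isFetchAllowed(perms: ExtensionPermissions, url: string): boolean {
  const host = new URL(url).hostname;
  return perms.network.includes(host);
}
```

Because the check runs in the host runtime rather than in the extension's own code, the extension cannot bypass or widen it.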
10. Pricing and the free tier
One of the most appealing parts of the Cloudflare AI Platform is its friendly pricing model:
10.1. Realistic cost estimates
| Scenario | Estimate | Cost/month |
|---|---|---|
| Side project / prototype | ~5,000 text-generation requests/day | $0 (in the free tier) |
| Startup (1,000 users) | 50K requests/day, mixed models | ~$15–30 |
| Enterprise (10K agents) | 1% active, Durable Objects + inference | You only pay for ~100 active agents |
💡 Cost comparison
With the traditional container model, running 10,000 agent instances 24/7 on EC2/GKE can cost thousands of dollars a month. Durable Objects hibernate when idle → near-zero cost for the 99% of agents that aren't active.
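A back-of-the-envelope model makes the difference concrete. The per-instance-hour price below is an illustrative assumption, not a quoted rate:

```typescript
// Always-on containers bill every instance around the clock.
function alwaysOnMonthlyCost(instances: number, dollarsPerHour: number): number {
  return instances * dollarsPerHour * 24 * 30;
}

// Hibernating Durable Objects bill only the active slice; idle agents
// cost approximately zero.
function hibernatingMonthlyCost(
  agents: number,
  activeFraction: number,
  dollarsPerHour: number
): number {
  return agents * activeFraction * dollarsPerHour * 24 * 30;
}
```

At 10,000 agents with 1% active, the hibernating model is roughly 100× cheaper than keeping every instance warm.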
11. Hands-on: your first agent with Project Think
11.1. Setup
# Initialize the project
npm create cloudflare@latest my-agent -- --template think
cd my-agent
# Install dependencies
npm install @cloudflare/think agents ai @cloudflare/shell zod workers-ai-provider
11.2. A basic agent
// src/index.ts
import { Think } from '@cloudflare/think';
import { createWorkersAI } from 'workers-ai-provider';
import { tool } from 'ai';
import { z } from 'zod';

interface Env {
  AI: Ai;
}

export class BlogAssistant extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e'
    );
  }

  getTools() {
    return {
      searchArticles: tool({
        description: 'Search articles by keyword',
        parameters: z.object({
          query: z.string().describe('Search keyword')
        }),
        execute: async ({ query }) => {
          // Agent logic here
          return { results: [] };
        }
      })
    };
  }
}
export default {
  async fetch(request: Request, env: Env) {
    // Route to agent (framework-specific routing goes here);
    // placeholder response keeps the Worker valid until routing is wired up
    return new Response('BlogAssistant is running', { status: 200 });
  }
};
11.3. Deploy
# Deploy to Cloudflare
npx wrangler deploy
# The agent is live at the edge, 330+ locations
# Zero cold start, zero idle cost
12. When should you use the Cloudflare AI Platform?
| Use it ✅ | Think twice ⚠️ |
|---|---|
| Agents that need low latency globally | Workloads that need dedicated GPUs (heavy fine-tuning) |
| Thousands of agents/users, mostly idle | Massive models not yet on Workers AI |
| Multi-model workflows (needing failover) | You need a full Linux environment (not just a JS sandbox) |
| Rapid prototyping on the free tier | You already have a well-tuned Kubernetes cluster |
| Agents that need persistent state + memory | Compliance demands specific data residency |
13. Conclusion
Cloudflare AI Platform 2026 isn't just "adding AI to a CDN" — it's a complete infrastructure platform for the next generation of agents. With AI Gateway for unified inference, Workers AI to run models at the edge, Dynamic Workers for code execution, and Project Think to turn agents into durable infrastructure, Cloudflare is betting that the future of AI agents lives not on laptops or cloud VMs, but on the global edge network.
The 10,000-neurons-per-day free tier is enough to start experimenting today — no credit card required. And as you scale, the Durable Objects model guarantees you only pay for what's actually running.
📌 Quick recap
AI Gateway = unified inference layer for 14+ providers. Workers AI = 70+ models running at the edge. Dynamic Workers = V8 isolates that start 100× faster than containers. Project Think = framework for durable, distributed agents with zero idle cost. Free tier = 10K neurons/day.
References:
Cloudflare's AI Platform: an inference layer designed for agents
Project Think: building the next generation of AI agents on Cloudflare
Workers AI Pricing — Cloudflare Docs
Workers AI Overview — Cloudflare Docs
Cloudflare expands Agent Cloud — SiliconANGLE (04/2026)
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.