# Cloudflare AI Platform 2026: Edge Infrastructure for Serverless AI Agents

Posted on: 4/17/2026 8:08:58 PM

Table of contents

1. Overview of Cloudflare AI Platform
2. AI Gateway: Unified Inference Layer
   - 2.1. Key Features
   - 2.2. Using It in Workers
     - 💡 Highlight
3. Workers AI: Inference at the Edge
   - 3.1. Model Catalog
   - 3.2. Bring Your Own Model (BYOM)
     - 📊 Free Tier
4. Dynamic Workers: A Sandbox for AI Agents
   - 4.1. Comparison with Traditional Containers
5. Project Think: A Framework for Third-Generation AI Agents
   - 5.1. Three Generations of AI Agents
   - 5.2. Core Architecture
6. The Five Primitives of Project Think
7. The Think Base Class: A Built-in Agentic Loop
   - 7.1. Lifecycle Hooks
8. Persistent Memory: An Agent That Remembers Everything
9. Self-Authored Extensions
   - ⚠️ Security by Architecture
10. Costs and Free Tier
    - 10.1. Real-World Cost Estimates
      - 💡 Cost Comparison
11. Hands-on: Building Your First Agent with Project Think
12. When Should You Use Cloudflare AI Platform?
13. Conclusion
    - 📌 Quick Summary


Once AI agents stop being simple chatbots and become distributed systems that run millions of tasks in parallel, the question is no longer *"which model should we use?"* but *"where do agents run, how, and at what cost?"*. Cloudflare's ambitious answer is **AI Platform**. It is a complete infrastructure layer that turns Cloudflare's network of 330+ data centers worldwide into a platform for serverless AI agents. It covers everything from inference to execution and from sandboxing to persistent memory.

This article takes a deep look at the architecture of Cloudflare AI Platform 2026, including AI Gateway, Workers AI and Dynamic Workers. Above all, it covers **Project Think**, a framework for building next-generation AI agents with durable execution, sub-agent orchestration and zero idle cost.

## 1. Overview of Cloudflare AI Platform

- **330+** data centers worldwide
- **70+** AI models available
- **14+** integrated providers
- **10,000** free Neurons per day

Cloudflare has evolved from a CDN and security company into a **full-stack AI infrastructure** provider. Its edge network no longer only caches and protects traffic. It now runs inference, executes code, stores agent state and orchestrates multi-model workflows, all serverless.

## 2. AI Gateway — Unified Inference Layer

AI Gateway is a unified layer between your application and any AI model. Instead of integrating directly with each provider (OpenAI, Anthropic, Google, ...), you call a single API.

```mermaid
graph LR
    A["🖥️ Application"] --> B["AI Gateway"]
    B --> C["OpenAI"]
    B --> D["Anthropic"]
    B --> E["Google AI"]
    B --> F["Workers AI"]
    B --> G["Custom Model"]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#2c3e50,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff
```
AI Gateway: one API, many providers, automatic failover

### 2.1. Key Features

| Feature | Description | Benefit |
| --- | --- | --- |
| **Automatic Failover** | Automatically switches to a backup provider when a model has an outage | Higher uptime without complex retry logic |
| **Streaming Resilience** | Buffers streaming responses independently of the agent's lifetime, so clients can reconnect | No lost responses when the network drops |
| **Cost Attribution** | Attaches custom metadata (team, user, workflow) to each request | Track and control spend per segment |
| **Unified Billing** | Manages spend for every provider in one place | One dashboard instead of aggregating several invoices |

### 2.2. Using It in Workers

```typescript
// Call a model through the AI binding: the same API for every provider
const response = await env.AI.run(
  'anthropic/claude-sonnet-4-6',
  { input: 'Phân tích kiến trúc microservices' },
  {
    gateway: { id: "default" },
    metadata: { teamId: "backend", userId: 12345 }
  }
);

```
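The `metadata` attached above is what makes cost attribution possible. Once every request is tagged, per-team spend is a simple aggregation. The sketch below assumes you have already exported request logs into a hypothetical `GatewayLogEntry` shape. That shape is an assumption made for illustration, not the AI Gateway log schema. The code sums cost per metadata value.

```typescript
// Hypothetical log-record shape, assumed for this example (not the AI Gateway schema)
interface GatewayLogEntry {
  costUsd: number;
  metadata: Record<string, string | number>;
}

// Sum spend per value of one metadata key, e.g. "teamId"
function costBy(entries: GatewayLogEntry[], key: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    // Requests that lack the key are grouped under "untagged"
    const group = String(entry.metadata[key] ?? "untagged");
    totals.set(group, (totals.get(group) ?? 0) + entry.costUsd);
  }
  return totals;
}

// Illustrative values only
const logs: GatewayLogEntry[] = [
  { costUsd: 0.5, metadata: { teamId: "backend", userId: 12345 } },
  { costUsd: 0.25, metadata: { teamId: "backend" } },
  { costUsd: 1, metadata: { teamId: "frontend" } },
];
console.log(costBy(logs, "teamId").get("backend")); // 0.75
```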

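#### 💡 Highlight

Some models are available from several providers, for example Llama 3 on both Workers AI and Replicate. For those, AI Gateway automatically routes to the fastest endpoint and fails over when needed, so you don't have to write retry logic.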
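Here is a generic sketch of ordered failover, to make concrete what AI Gateway saves you from writing. The `Provider` type and the provider names are hypothetical. The sketch only tries providers in a fixed order and does not model the gateway's latency-based routing.

```typescript
// Generic ordered failover: try each provider until one succeeds.
// `Provider` and its `call` functions are hypothetical stand-ins.
interface Provider {
  name: string;
  call: (prompt: string) => Promise<string>;
}

async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.call(prompt) };
    } catch (err) {
      // Record the failure and fall through to the next provider
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}
```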

## 3. Workers AI: Inference at the Edge

Workers AI runs AI models directly on Cloudflare's edge network. It reduces latency by moving inference as close to users as possible.

### 3.1. Model Catalog

The model ecosystem keeps growing:

| Model type | Examples | Price (unit varies by row) |
| --- | --- | --- |
| **Text Generation** | Llama 3.2-1b, DeepSeek R1-32b | $0.027 – $4.88 per M output tokens |
| **Embedding** | BGE-small, BGE-large | $0.020 – $0.204 per M tokens |
| **Image Generation** | Flux-1-Schnell, Flux-2-Dev | ~$0.00005 per tile |
| **Speech-to-Text** | Whisper | $0.0005 per audio minute |
| **Text-to-Speech** | Deepgram Aura | $0.015 per 1k characters |

### 3.2. Bring Your Own Model (BYOM)

Cloudflare has integrated **Cog**, Replicate's model-packaging technology; the Replicate team has officially joined Cloudflare. Cog lets you package a custom model into a container and deploy it to Workers AI:

```yaml
# cog.yaml: package a fine-tuned model
build:
  python_version: "3.11"
  python_packages:
    - "torch==2.3.0"
    - "transformers==4.42.0"

predict: "predict.py:Predictor"

```

#### 📊 Free Tier

**10,000 free Neurons per day** on both the Free and Paid plans. Beyond that, usage costs only **$0.011 per 1,000 Neurons**. For small text-generation models (Llama 3.2-1b), 10K Neurons covers thousands of requests, which is enough for prototypes and side projects.
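The free-tier arithmetic can be checked with a few lines of TypeScript. This minimal sketch hard-codes the two figures quoted above: 10,000 free Neurons per day and $0.011 per 1,000 Neurons beyond that. How many Neurons a given request consumes depends on the model and is not modeled here.

```typescript
// Assumed pricing, taken from the figures quoted above
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;

// Estimated Workers AI charge for one day's Neuron usage
function dailyNeuronCost(neuronsUsed: number): number {
  if (neuronsUsed < 0) throw new RangeError("neuronsUsed must be non-negative");
  // Only usage above the daily free allocation is billed
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}

console.log(dailyNeuronCost(8_000));               // 0 (inside the free tier)
console.log(dailyNeuronCost(110_000).toFixed(2)); // "1.10" (100,000 billable Neurons)
```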

## 4. Dynamic Workers: A Sandbox for AI Agents

This is the key component for agent execution. Dynamic Workers is a runtime built on V8 isolates. According to Cloudflare, it starts **100 times faster than a traditional container**, which lets AI agents execute code safely inside a sandbox.

```mermaid
graph TD
    A["AI Agent"] --> B["Generate Code"]
    B --> C["Dynamic Worker<br/>V8 Isolate"]
    C --> D["Execute<br/>Sandboxed"]
    D --> E{"Result"}
    E -->|"✅ Success"| F["Return to Agent"]
    E -->|"❌ Timeout/Error"| G["Agent retries<br/>or tries another strategy"]

    style A fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff
```
Dynamic Workers: the agent writes code, an isolate executes it safely
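The retry branch in the diagram can be sketched as a small loop. `generate` and `execute` are hypothetical callbacks. In a real agent, the first would ask a model for code and the second would hand that code to a Dynamic Worker, but neither is a Cloudflare API. A small wrapper around the execution step enforces the timeout.

```typescript
// generate -> execute -> retry loop from the diagram (hypothetical callbacks)
type Generate = (attempt: number, lastError?: string) => Promise<string>;
type Execute = (code: string) => Promise<unknown>;

// Reject if `p` has not settled within `ms` milliseconds
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timeout after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

async function runGeneratedCode(
  generate: Generate,
  execute: Execute,
  maxAttempts = 3,
  timeoutMs = 1_000,
): Promise<unknown> {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generate(attempt, lastError);
    try {
      // Success: the result goes back to the agent
      return await withTimeout(execute(code), timeoutMs);
    } catch (err) {
      // Timeout/error: pass the error into the next generation attempt
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`all ${maxAttempts} attempts failed: ${lastError}`);
}
```

Passing `lastError` back into `generate` is a deliberate choice. It lets the model correct its own mistake instead of regenerating the same failing code.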

### 4.1. Comparison with Traditional Containers

| Criterion | Container (Docker/K8s) | Dynamic Workers |
| --- | --- | --- |
| **Cold start** | Seconds to minutes | Milliseconds |
| **Isolation** | Process-level | V8 isolate (lighter-weight) |
| **Idle cost** | You pay even when idle | Zero when hibernated |
| **Scale** | Manual/HPA | Millions of concurrent instances, automatically |
| **Security** | Requires network-policy configuration | Sandboxed by architecture |

## 5. Project Think: A Framework for Third-Generation AI Agents

Project Think is Cloudflare's most ambitious vision for AI agents. It is not merely a framework but an **infrastructure architecture** that makes agents first-class citizens on the edge network.

### 5.1. Three Generations of AI Agents

**Generation 1: Chatbot**

Stateless and reactive, with no memory of context. Every request starts a new conversation.

**Generation 2: Coding Agent**

Stateful and able to use tools, but running on a single laptop or server. Claude Code, Cursor and GitHub Copilot belong to this generation.

**Generation 3: Infrastructure Agent**

**Durable, distributed, serverless, Internet-native.** These agents survive crashes and cost nothing while idle. They are secured by their architecture rather than by behavioral constraints. This is the generation Project Think targets.

### 5.2. Core Architecture

```mermaid
graph TB
    subgraph "Project Think Architecture"
        A["Think Base Class"] --> B["Durable Objects<br/>Identity + State + SQLite"]
        A --> C["Dynamic Workers<br/>Code Execution"]
        A --> D["AI Gateway<br/>Multi-model Inference"]
        A --> E["R2 + SQLite<br/>Persistent Filesystem"]

        B --> F["Fibers<br/>Durable Execution"]
        B --> G["Facets<br/>Sub-Agents"]
        B --> H["Sessions<br/>Conversation Trees"]

        C --> I["Tier 0: Workspace"]
        C --> J["Tier 1: JS Sandbox"]
        C --> K["Tier 2: npm Runtime"]
        C --> L["Tier 3: Headless Browser"]
        C --> M["Tier 4: Full Sandbox"]
    end

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style H fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style I fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style J fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style K fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style L fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style M fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
```
Project Think architecture: from the base class to the execution ladder

Each agent is a **Durable Object** with its own identity and persistent state in SQLite, and it hibernates automatically when inactive. This fundamentally changes the economics of running agents:

| Metric | VMs/Containers | Durable Objects |
| --- | --- | --- |
| **Idle cost** | Full compute paid 24/7 | Zero (hibernated) |
| **10,000 agents, 1% active** | 10,000 instances running | ~100 active at any one time |
| **Model** | "1 server for N users" | "1 agent per user" |

## 6. The Five Primitives of Project Think

### 6.1. Fibers — Durable Execution

Agents survive crashes through a checkpoint-recovery mechanism. Each `runFiber()` call registers a checkpoint in SQLite before executing. If the environment is terminated, the agent recovers automatically:

```typescript
import { Think } from '@cloudflare/think';

interface Env {
  AI: Ai;
}

class ResearchAgent extends Think<Env> {
  async onChat(message: string) {
    // The fiber is registered in SQLite before it runs
    await this.runFiber('research', async () => {
      // searchWeb, analyze and respond are helpers assumed to exist on the agent (not shown)
      const sources = await this.searchWeb(message);
      const analysis = await this.analyze(sources);
      return this.respond(analysis);
    });
  }

  // If the environment crashes mid-run, the fiber is recovered automatically
  async onFiberRecovered(fiberId: string) {
    console.log(`Recovering fiber: ${fiberId}`);
    // Resume from the last checkpoint
  }
}

```
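Stripped of the framework, the checkpoint-recovery idea fits in a few lines. In this sketch a `Map` stands in for the agent's SQLite storage. `runDurable` and `Step` are hypothetical names, not the Think API. Each completed step is saved under a key, so re-running the same fiber after a crash replays finished steps instead of executing them again.

```typescript
// Checkpoint/replay sketch; a Map stands in for SQLite (hypothetical, not the Think API)
interface Step {
  name: string;
  run: () => Promise<unknown>;
}

async function runDurable(fiberId: string, steps: Step[], store: Map<string, unknown>): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const step of steps) {
    const key = `${fiberId}:${step.name}`;
    if (store.has(key)) {
      // Finished before the crash: replay the saved result instead of re-running
      results.push(store.get(key));
      continue;
    }
    const value = await step.run();
    store.set(key, value); // checkpoint as soon as the step completes
    results.push(value);
  }
  return results;
}
```

This only works if each step is safe to re-run. A crash between `run()` and `store.set()` re-executes that step, so side effects such as sending a message should be idempotent.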

### 6.2. Facets — Sub-Agent Orchestration

Each sub-agent is a separate Durable Object with its own SQLite database, and it communicates with the parent through typed RPC. There is no implicit data sharing; every sub-agent is fully isolated:

```typescript
// Inside a method of the parent agent: delegate work to sub-agents
const researcher = this.createFacet('researcher');
const writer = this.createFacet('writer');

// Typed RPC: type-safe and isolated
// (streamRelay is assumed to be defined elsewhere, e.g. to forward streamed output)
const findings = await researcher.chat(
  'Find the 5 most recent articles about edge computing',
  streamRelay
);

const draft = await writer.chat(
  `Write a summary based on: ${findings}`,
  streamRelay
);

```

### 6.3. Persistent Sessions — Conversation Trees

Project Think does more than store linear history. It supports **tree-structured conversations**: forking branches, non-destructive compaction, and full-text search via SQLite FTS5.
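A conversation tree can be represented with nothing more than parent pointers. The minimal in-memory sketch below uses a hypothetical `ConversationTree` class, not the Project Think API. It shows forking, which is just adding a second child to the same message, and reconstructing one branch from leaf to root. Compaction and FTS5 search are left out.

```typescript
// Minimal in-memory conversation tree (hypothetical, not the Project Think API).
// Every message points to its parent, so a fork is just a second child.
interface TreeMessage {
  id: number;
  parentId: number | null;
  role: "user" | "assistant";
  text: string;
}

class ConversationTree {
  private messages = new Map<number, TreeMessage>();
  private nextId = 1;

  // Append a message under `parentId` (null starts a new root)
  add(parentId: number | null, role: TreeMessage["role"], text: string): number {
    const id = this.nextId++;
    this.messages.set(id, { id, parentId, role, text });
    return id;
  }

  // Walk parent pointers from a leaf back to the root, returning root-first order
  path(leafId: number): TreeMessage[] {
    const branch: TreeMessage[] = [];
    let current = this.messages.get(leafId);
    while (current) {
      branch.unshift(current);
      current = current.parentId === null ? undefined : this.messages.get(current.parentId);
    }
    return branch;
  }
}

const tree = new ConversationTree();
const question = tree.add(null, "user", "Plan a trip");
const hanoi = tree.add(question, "assistant", "Go to Hanoi");
const danang = tree.add(question, "assistant", "Go to Da Nang"); // fork: a second answer
console.log(tree.path(hanoi).map((m) => m.text));  // [ 'Plan a trip', 'Go to Hanoi' ]
console.log(tree.path(danang).map((m) => m.text)); // [ 'Plan a trip', 'Go to Da Nang' ]
```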

### 6.4. Sandboxed Code Execution

Instead of calling tools one step at a time (chat → call tool → chat → call tool), the agent writes a complete program and runs it in a sandbox. According to Cloudflare, the `@cloudflare/codemode` package **cuts token usage by 99.9%** compared with traditional tool calling.

### 6.5. Execution Ladder: Escalating Capabilities Gradually

| Tier | Capability | Use case |
| --- | --- | --- |
| **Tier 0** | Workspace (filesystem) | Read/write files, manage the project |
| **Tier 1** | Dynamic Workers (JS sandbox) | Computation, data transformation |
| **Tier 2** | npm resolution at runtime | Use npm libraries on the fly |
| **Tier 3** | Headless browser | Web scraping, automation |
| **Tier 4** | Full sandbox (git, compiler, test) | Build and deploy pipelines |

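#### 💡 Progressive Capability

The Execution Ladder lets an agent start light (Tier 0) and escalate only when it truly needs to. A simple script runs at Tier 1, fetching web pages requires Tier 3, and only building a project calls for Tier 4. This saves resources and shrinks the attack surface.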
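Choosing a tier can be sketched as taking the highest tier that any required capability needs. The capability names below are hypothetical labels for the rows of the table. The sketch assumes each tier includes everything below it.

```typescript
// "Climb only as high as needed" (capability names are hypothetical labels)
type Capability = "files" | "compute" | "npm" | "browser" | "build";

const TIER_OF: Record<Capability, number> = {
  files: 0,   // Tier 0: workspace filesystem
  compute: 1, // Tier 1: JS sandbox
  npm: 2,     // Tier 2: npm resolution at runtime
  browser: 3, // Tier 3: headless browser
  build: 4,   // Tier 4: full sandbox (git, compiler, test)
};

// Lowest tier that covers every required capability (tiers assumed cumulative)
function minimalTier(needs: Capability[]): number {
  return needs.reduce((tier, cap) => Math.max(tier, TIER_OF[cap]), 0);
}

console.log(minimalTier(["compute", "files"]));   // 1
console.log(minimalTier(["compute", "browser"])); // 3
```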

## 7. The Think Base Class: A Built-in Agentic Loop

Project Think provides a `Think` base class that handles the whole lifecycle: the agentic loop, message persistence, streaming, tool execution and extensions.

```typescript
import { Think } from '@cloudflare/think';
import { createWorkersAI } from 'workers-ai-provider';

interface Env {
  AI: Ai;
}

export class MyAgent extends Think<Env> {
  // Only the model needs to be declared
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e-instruct'
    );
  }

  // Lifecycle hooks for customization
  async beforeTurn(messages) {
    // Inject system context, load memory
  }

  async beforeToolCall(toolName, args) {
    // Validate, log, or modify tool calls
  }

  async afterToolCall(toolName, result) {
    // Post-process, cache results
  }

  async onStepFinish(step) {
    // Checkpoint progress
  }
}

```

### 7.1. Lifecycle Hooks

```mermaid
graph LR
    A["beforeTurn()"] --> B["streamText()"]
    B --> C["beforeToolCall()"]
    C --> D["Tool Execution"]
    D --> E["afterToolCall()"]
    E --> F["onStepFinish()"]
    F -->|"Needs another tool"| C
    F -->|"Done"| G["onChatResponse()"]

    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
```
The lifecycle of a single turn in a Think agent
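The hook order in the diagram can be traced with a toy loop. Here a fixed queue of tool names fakes the model call (`streamText()` in the diagram). `runTurn`, `Hooks` and `Step` are hypothetical names; only the hook names mirror the diagram.

```typescript
// Toy agentic loop that fires hooks in the diagram's order (hypothetical, not the Think API)
interface Step {
  toolName: string;
  result: unknown;
}

interface Hooks {
  beforeTurn(messages: string[]): void;
  beforeToolCall(toolName: string, args: unknown): void;
  afterToolCall(toolName: string, result: unknown): void;
  onStepFinish(step: Step): void;
  onChatResponse(text: string): void;
}

function runTurn(
  messages: string[],
  plannedTools: string[],
  tools: Record<string, (args: unknown) => unknown>,
  hooks: Hooks,
): string {
  hooks.beforeTurn(messages);
  const queue = [...plannedTools]; // stands in for the model choosing tools
  for (;;) {
    const toolName = queue.shift();
    if (toolName === undefined) {
      // "Done" branch: no more tools requested
      const text = `done after ${plannedTools.length} tool call(s)`;
      hooks.onChatResponse(text);
      return text;
    }
    const tool = tools[toolName];
    if (!tool) throw new Error(`unknown tool: ${toolName}`);
    const args = { query: messages[messages.length - 1] };
    hooks.beforeToolCall(toolName, args);
    const result = tool(args);
    hooks.afterToolCall(toolName, result);
    hooks.onStepFinish({ toolName, result }); // "Needs another tool" loops back
  }
}
```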

## 8. Persistent Memory: An Agent That Remembers Everything

Project Think implements memory through **Context Blocks**. These are structured sections that the model can read and update, and they persist across hibernation:

```text
MEMORY (Important facts about user) [42%, 462/1100 tokens]
- User likes writing TypeScript code
- Timezone: UTC+7
- Current project: e-commerce platform

PREFERENCES (Working style) [18%, 198/1100 tokens]
- Prefer functional over OOP
- Wants concise responses

```
Token usage is shown as a percentage. This lets the agent manage its own context window and know when to compact memory before it exceeds the limit.
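The percentage in each header is simply the used tokens divided by the block's budget. The sketch below reproduces the header format shown above and adds a compaction check. The `ContextBlock` shape and the 80% threshold are illustrative assumptions, not Project Think defaults.

```typescript
// Context-block header and compaction check (shape and threshold are assumptions)
interface ContextBlock {
  label: string;
  description: string;
  usedTokens: number;
  maxTokens: number;
}

function header(b: ContextBlock): string {
  const pct = Math.round((b.usedTokens / b.maxTokens) * 100);
  return `${b.label} (${b.description}) [${pct}%, ${b.usedTokens}/${b.maxTokens} tokens]`;
}

// Compact once usage reaches the threshold share of the block's budget
function needsCompaction(b: ContextBlock, threshold = 0.8): boolean {
  return b.usedTokens / b.maxTokens >= threshold;
}

const memory: ContextBlock = { label: "MEMORY", description: "Important facts about user", usedTokens: 462, maxTokens: 1100 };
console.log(header(memory));          // MEMORY (Important facts about user) [42%, 462/1100 tokens]
console.log(needsCompaction(memory)); // false
```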

## 9. Self-Authored Extensions

This is the boldest feature: the agent **writes extensions for itself**. An extension is TypeScript code that runs in Dynamic Workers with declared permissions:

```typescript
// The agent creates an extension when it needs a new capability
const extension = {
  name: 'price-checker',
  permissions: {
    network: ['api.example.com'],  // May only reach this domain
    workspace: ['read']             // Read-only file access
  },
  code: `
    export async function checkPrice(symbol: string) {
      const res = await fetch('https://api.example.com/price/' + encodeURIComponent(symbol));
      return res.json();
    }
  `
};

// The extension is persisted in storage and survives hibernation
await this.installExtension(extension);

```

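#### ⚠️ Security by Architecture

Each extension runs in its own V8 isolate, and its declared permissions are enforced at runtime. The agent cannot grant itself extra permissions. This is structural security, not a prompt-based restriction.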
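Runtime enforcement of a network allowlist can be sketched as a wrapper that the host, not the extension, places around outbound requests. `makeGuardedFetch` is hypothetical. In this single-process sketch the check is only illustrative, because code running in the same context could bypass it. The point of the architecture is that the check lives outside the extension's isolate.

```typescript
// Host-side allowlist check for outbound requests (hypothetical helper)
type Fetcher = (url: string) => Promise<unknown>;

function makeGuardedFetch(allowedHosts: readonly string[], inner: Fetcher): Fetcher {
  const allowed = new Set(allowedHosts.map((h) => h.toLowerCase()));
  return async (url) => {
    // Compare parsed hostnames, not string prefixes
    const host = new URL(url).hostname.toLowerCase();
    if (!allowed.has(host)) {
      // Denied by the host; the extension cannot edit this list itself
      throw new Error(`network access to ${host} is not permitted`);
    }
    return inner(url);
  };
}
```

Because the check compares parsed hostnames, it also rejects look-alike URLs such as `https://api.example.com.evil.net/`.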

## 10. Costs and Free Tier

One of the most attractive aspects of Cloudflare AI Platform is its developer-friendly pricing:

- **$0**: 10,000 Neurons/day (free)
- **$0.011** per 1,000 Neurons beyond the free allocation
- **$5/month**: Workers Paid plan
- **50,000** concurrent Workflow instances

### 10.1. Real-World Cost Estimates

| Scenario | Estimated usage | Cost/month |
| --- | --- | --- |
| **Side project / Prototype** | ~5,000 text-generation requests/day | $0 (within the free tier) |
| **Startup (1,000 users)** | 50K requests/day, mixed models | ~$15–30 |
| **Enterprise (10K agents)** | 1% active, Durable Objects + inference | Pay only for the ~100 active agents |

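#### 💡 Cost Comparison

With a traditional container model, 10,000 agent instances running 24/7 on EC2/GKE can cost thousands of USD per month. Durable Objects hibernate when idle, so the 99% of agents that are inactive cost close to nothing.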
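The gap in this comparison comes down to billable compute time. This back-of-the-envelope sketch assumes 730 hours per month and counts only instance-hours. Real Durable Objects billing also includes request and storage charges, so treat the numbers as a ratio rather than a price.

```typescript
// Instance-hours per month: always-on vs. active-only (assumes 730 h/month)
const HOURS_PER_MONTH = 730;

function alwaysOnInstanceHours(agents: number): number {
  return agents * HOURS_PER_MONTH;
}

// Only the active fraction consumes compute; hibernated agents are not counted
function activeOnlyInstanceHours(agents: number, activeFraction: number): number {
  return agents * activeFraction * HOURS_PER_MONTH;
}

const agents = 10_000;
console.log(alwaysOnInstanceHours(agents));         // 7300000
console.log(activeOnlyInstanceHours(agents, 0.01)); // 73000 (about 100 agents active at a time)
```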

## 11. Hands-on: Building Your First Agent with Project Think

### 11.1. Setup

```bash
# Create the project
npm create cloudflare@latest my-agent -- --template think
cd my-agent

# Install dependencies
npm install @cloudflare/think agents ai @cloudflare/shell zod workers-ai-provider

```

### 11.2. A Basic Agent

```typescript
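// src/index.ts
import { Think } from '@cloudflare/think';
import { routeAgentRequest } from 'agents';
import { createWorkersAI } from 'workers-ai-provider';
import { tool } from 'ai';
import { z } from 'zod';

interface Env {
  AI: Ai;
  // Durable Object binding for the agent (assumed to be declared in the wrangler config)
  BlogAssistant: DurableObjectNamespace;
}

export class BlogAssistant extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      '@cf/meta/llama-4-scout-17b-16e-instruct'
    );
  }

  getTools() {
    return {
      searchArticles: tool({
        description: 'Search articles by keyword',
        // AI SDK 5 and later call this field `inputSchema` (older versions: `parameters`)
        inputSchema: z.object({
          query: z.string().describe('Search keyword')
        }),
        execute: async ({ query }) => {
          // Agent logic goes here; this placeholder ignores `query` and returns no results
          return { results: [] as string[] };
        }
      })
    };
  }
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // routeAgentRequest (Agents SDK) forwards /agents/... URLs to the matching agent instance
    return (
      (await routeAgentRequest(request, env)) ??
      new Response('Not found', { status: 404 })
    );
  }
};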

```

### 11.3. Deploy

```bash
# Deploy to Cloudflare
npx wrangler deploy

# The agent is now served from the edge, across 330+ locations
# Near-zero cold starts, no compute cost while idle

```

## 12. When Should You Use Cloudflare AI Platform?

| Good fit ✅ | Think twice ⚠️ |
| --- | --- |
| Agents that need low latency worldwide | Workloads that need dedicated GPUs (heavy fine-tuning) |
| Thousands of agents/users, mostly idle | Very large models not yet available on Workers AI |
| Multi-model workflows that need failover | You need a full Linux environment (not just a JS sandbox) |
| Fast prototyping on the free tier | You already run a well-tuned Kubernetes cluster |
| Agents that need persistent state + memory | Compliance requires specific data residency |

## 13. Conclusion

Cloudflare AI Platform 2026 is more than "AI bolted onto a CDN": it is a complete infrastructure platform for the next generation of agents. AI Gateway unifies inference, Workers AI runs models at the edge, Dynamic Workers sandboxes code execution, and Project Think turns agents into durable infrastructure. Together, they show Cloudflare betting that the future of AI agents lies not on laptops or cloud VMs but on a **global edge network**.

The free tier of 10,000 Neurons/day is enough to start experimenting today, without a credit card. And as you scale, the Durable Objects model means you pay only for what actually runs.

#### 📌 Quick Summary

**AI Gateway** = unified inference layer for 14+ providers. **Workers AI** = 70+ models running at the edge. **Dynamic Workers** = a V8 sandbox that, per Cloudflare, starts 100x faster than containers. **Project Think** = a framework for building durable, distributed agents with zero idle cost. Free tier = 10K Neurons/day.

**References:**  
[Cloudflare's AI Platform: an inference layer designed for agents](https://blog.cloudflare.com/ai-platform/)  
[Project Think: building the next generation of AI agents on Cloudflare](https://blog.cloudflare.com/project-think/)  
[Workers AI Pricing — Cloudflare Docs](https://developers.cloudflare.com/workers-ai/platform/pricing/)  
[Workers AI Overview — Cloudflare Docs](https://developers.cloudflare.com/workers-ai/)  
[Cloudflare expands Agent Cloud — SiliconANGLE (04/2026)](https://siliconangle.com/2026/04/13/cloudflare-expands-agent-cloud-new-tools-build-scale-ai-agents/)


Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.