# WebGPU: A New Era of GPU Computing in the Browser

Posted on: 4/25/2026 8:13:39 AM

Table of contents

1. What is WebGPU and why is it a game changer?
    - Why should developers care?
2. From WebGL to WebGPU: 15 years of evolution
    - The architectural leap
3. WebGPU's technical architecture
    - 3.1. Core components
    - 3.2. The command buffer model
        - Benefits of the command buffer model
4. WGSL: a new-generation shading language
    - 4.1. Basic syntax
    - 4.2. Compute shaders
5. Render pipeline vs. compute pipeline
    - 5.1. Workgroups: the unit of parallel computation
6. Benchmark: WebGPU vs. WebGL, real-world numbers
    - A note on the benchmarks
    - 6.1. Why is WebGPU faster?
7. AI/ML inference right in the browser
    - 7.1. Supported frameworks
    - 7.2. Example: running text inference with WebLLM
        - Zero server cost
8. Data visualization and game development
    - 8.1. Large-scale data visualization
    - 8.2. Game development
        - WebGPU + WebAssembly = near-native performance
9. Integration with popular frameworks
    - 9.1. Three.js: switching takes only a couple of lines
    - 9.2. Babylon.js: async initialization
    - 9.3. Vue.js + WebGPU: the composable pattern
10. Getting started with WebGPU
    - 10.1. Basic initialization
    - 10.2. Complete example: a compute pipeline
11. Challenges and the road ahead
    - 11.1. Current challenges
    - 11.2. WebGPU v2: in development
        - Advice for developers

## 1. What is WebGPU and why is it a game changer?

WebGPU is the next-generation graphics and compute API for the web. It is designed by the W3C GPU for the Web Working Group, with participation from Google, Mozilla, Apple, and Microsoft. WebGL is an API modeled on OpenGL ES that appeared more than 15 years ago. WebGPU, by contrast, was designed from scratch around the architecture of modern GPU APIs: Vulkan, Metal, and Direct3D 12.

The core breakthrough is that WebGPU is not just a rendering API. It also provides a complete **compute pipeline**, so the browser can run general-purpose computation (GPGPU) directly on the GPU. This makes it possible to run machine learning inference, image processing, physics simulation, and large-scale data analysis on the client, with no server involved.

**70%** global browser support (2026)

**15-30x** faster than WebGL for compute workloads

**80%** of native performance for AI inference

**65%** of new web apps already use WebGPU

#### Why should developers care?

WebGPU will not replace WebGL overnight, because about 30% of devices still need a fallback. But every major browser has shipped WebGPU as of early 2026, so now is the time to start adopting it. If your application involves visualization, client-side AI, or any heavy data processing, WebGPU offers a performance advantage that is hard to ignore.

## 2. From WebGL to WebGPU: 15 years of evolution

The road from WebGL 1.0 to WebGPU is a story of inevitable evolution: the web's computing needs outgrew what OpenGL ES could offer.

2011

**WebGL 1.0** launches, based on OpenGL ES 2.0. For the first time the web can access the GPU, but only through a single rendering pipeline.

2017

**WebGL 2.0** moves up to OpenGL ES 3.0 and adds transform feedback and instanced rendering. It still lacks compute shaders, so every GPGPU task has to be "hacked" through fragment shaders.

2017-2018

Apple, Google, and Mozilla begin designing a new GPU API. Apple proposes a Metal-based design under the name WebGPU; its earlier WebKit prototype was called WebMetal. Google prototypes NXT, modeled on Vulkan and D3D12.

2023

**Chrome 113** ships WebGPU for the first time on Windows, macOS, and ChromeOS, ending the Origin Trial period.

By 01/2026

As of **Firefox 147**, WebGPU is enabled on Windows and ARM64 macOS. Safari enables it by default on iOS 26 and macOS Tahoe 26. *Every major browser now supports it.*

Q1/2026

Work on **WebGPU v2** begins, targeting subgroup operations, bindless resources, and multi-draw indirect.

#### The architectural leap

WebGL follows OpenGL's call-by-call model. Every call goes through a global, implicit state machine that the browser and driver must validate one call at a time. Some calls, such as `readPixels` or `getError`, force the CPU to wait for the GPU.

WebGPU switches to a **command buffer model**: commands are first recorded into a buffer, then submitted as one batch. This is exactly how Vulkan, Metal, and D3D12 work.

## 3. WebGPU's technical architecture

WebGPU is designed as a low-level API with explicit state management. This gives developers fine-grained control over how the GPU processes data.

```mermaid
graph TB
    subgraph Browser["Browser"]
        JS["JavaScript / WASM<br/>Application Code"]
        API["WebGPU API<br/>(navigator.gpu)"]
    end

    subgraph Abstraction["Abstraction Layer"]
        Dawn["Dawn (Chrome)<br/>C++ implementation"]
        wgpu["wgpu (Firefox)<br/>Rust implementation"]
        WebKit["WebKit GPU Process<br/>(Safari)"]
    end

    subgraph Native["Native GPU APIs"]
        Vulkan["Vulkan<br/>(Linux, Windows, Android)"]
        Metal["Metal<br/>(macOS, iOS)"]
        D3D12["Direct3D 12<br/>(Windows)"]
    end

    subgraph Hardware["Hardware"]
        GPU["GPU Hardware<br/>NVIDIA / AMD / Intel / Apple"]
    end

    JS --> API
    API --> Dawn
    API --> wgpu
    API --> WebKit
    Dawn --> Vulkan
    Dawn --> D3D12
    Dawn --> Metal
    wgpu --> Vulkan
    wgpu --> Metal
    wgpu --> D3D12
    WebKit --> Metal
    Vulkan --> GPU
    Metal --> GPU
    D3D12 --> GPU

    style JS fill:#e94560,stroke:#fff,color:#fff
    style API fill:#e94560,stroke:#fff,color:#fff
    style Dawn fill:#2c3e50,stroke:#fff,color:#fff
    style wgpu fill:#2c3e50,stroke:#fff,color:#fff
    style WebKit fill:#2c3e50,stroke:#fff,color:#fff
    style Vulkan fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style Metal fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D3D12 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GPU fill:#4CAF50,stroke:#fff,color:#fff
```

WebGPU architecture: from JavaScript down to GPU hardware through an abstraction layer

### 3.1. Core components

The WebGPU API is organized around a few key concepts:

| Component | Role | Vulkan equivalent |
| --- | --- | --- |
| **GPUAdapter** | Represents a physical GPU and exposes its capabilities (features) and limits. | VkPhysicalDevice |
| **GPUDevice** | Logical connection to the GPU. Every resource is created from it. | VkDevice |
| **GPUBuffer** | A block of GPU memory holding vertex, uniform, or storage data. | VkBuffer |
| **GPUTexture** | An image on the GPU, used for sampling or as a render target. | VkImage |
| **GPUCommandEncoder** | Records commands into a command buffer before submission. | VkCommandBuffer (in the recording state) |
| **GPUBindGroup** | Groups resources (buffers, textures, samplers) for shaders to access. | VkDescriptorSet |
| **GPURenderPipeline** | Graphics pipeline: vertex → rasterization → fragment. | VkPipeline (from vkCreateGraphicsPipelines) |
| **GPUComputePipeline** | Pipeline for general-purpose computation, independent of rendering. | VkPipeline (from vkCreateComputePipelines) |

### 3.2. The command buffer model

Instead of issuing commands directly as WebGL does, WebGPU uses a **record-then-submit** model:

```mermaid
sequenceDiagram
    participant App as JavaScript
    participant Enc as CommandEncoder
    participant Queue as GPUQueue
    participant GPU as GPU Hardware

    App->>Enc: createCommandEncoder()
    App->>Enc: beginRenderPass() / beginComputePass()
    App->>Enc: setPipeline(), setBindGroup()
    App->>Enc: draw() / dispatchWorkgroups()
    App->>Enc: end()
    Enc->>Enc: finish() → CommandBuffer
    App->>Queue: submit([commandBuffer])
    Queue->>GPU: Execute batch
    GPU-->>App: Results (async)
```

Flow: record commands → package them → submit the batch → the GPU executes asynchronously

#### Benefits of the command buffer model

This model lets the CPU prepare the next batch of commands while the GPU is still working on the previous one.

In WebGL, every call is validated against mutable global state as it is issued. Synchronous calls such as `readPixels` or `getError` stall the CPU until the GPU catches up. In complex scenes, this becomes a serious bottleneck.

WebGPU validates against immutable pipeline and bind group objects, and only hands work to the GPU at `submit()`. WebGPU can also run inside a Web Worker, for example rendering to an `OffscreenCanvas`, which keeps command encoding off the main thread. However, recording command buffers for one `GPUDevice` from several threads in parallel is not yet part of the spec.
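To make record-then-submit concrete, here is a minimal sketch. It assumes a `device`, a compute `pipeline`, and an array of `bindGroups` already exist; the helper name `encodeAndSubmitBatch` is hypothetical. It records one compute pass per bind group into a single encoder, submits everything with one `queue.submit()` call, and returns the promise from `onSubmittedWorkDone()`.

```javascript
// Hypothetical helper: record several compute passes, submit them as one batch.
// `device`, `pipeline` and `bindGroups` are assumed to be created elsewhere.
function encodeAndSubmitBatch(device, pipeline, bindGroups, workgroupCount) {
  // 1. Record: nothing is sent to the GPU while commands are being encoded.
  const encoder = device.createCommandEncoder();
  for (const bindGroup of bindGroups) {
    const pass = encoder.beginComputePass();
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.dispatchWorkgroups(workgroupCount);
    pass.end();
  }

  // 2. Package the recorded commands into an immutable GPUCommandBuffer.
  const commandBuffer = encoder.finish();

  // 3. Submit the whole batch in one call; the GPU executes it asynchronously.
  device.queue.submit([commandBuffer]);

  // 4. Resolves once the GPU has finished this work. Until then the CPU is
  //    free to record the next batch.
  return device.queue.onSubmittedWorkDone();
}
```

However many passes are recorded, the queue sees a single submission. That single submission is where the batching benefit comes from.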

## 4. WGSL: a new-generation shading language

**WebGPU Shading Language (WGSL)** is a new shading language designed specifically for WebGPU. It replaces the GLSL ES used by WebGL. WGSL has Rust-inspired syntax. Depending on the backend, it is compiled to SPIR-V (Vulkan), MSL (Metal), or HLSL (D3D12).

### 4.1. Basic syntax

```wgsl
// Vertex shader
struct VertexInput {
    @location(0) position: vec3f,
    @location(1) color: vec3f,
}

struct VertexOutput {
    @builtin(position) position: vec4f,
    @location(0) color: vec3f,
}

// Model-view-projection matrix, bound at group 0, binding 0
@group(0) @binding(0)
var<uniform> mvp: mat4x4f;

@vertex
fn vs_main(input: VertexInput) -> VertexOutput {
    var output: VertexOutput;
    output.position = mvp * vec4f(input.position, 1.0);
    output.color = input.color;
    return output;
}

// Fragment shader
@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4f {
    return vec4f(input.color, 1.0);
}
```
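On the JavaScript side, the `@location(n)` attributes above must match the vertex buffer layout passed to `createRenderPipeline()`. Here is a minimal sketch. It assumes positions and colors are interleaved as six `float32` values per vertex (`x, y, z, r, g, b`); the constant names are illustrative.

```javascript
// Interleaved layout assumed: [x, y, z, r, g, b] per vertex, all float32.
const FLOATS_PER_VERTEX = 6;
const BYTES_PER_FLOAT = 4;

// GPUVertexBufferLayout matching `struct VertexInput` in the WGSL above.
const vertexBufferLayout = {
  arrayStride: FLOATS_PER_VERTEX * BYTES_PER_FLOAT, // 24 bytes per vertex
  stepMode: 'vertex',
  attributes: [
    { shaderLocation: 0, offset: 0, format: 'float32x3' },                   // position: vec3f
    { shaderLocation: 1, offset: 3 * BYTES_PER_FLOAT, format: 'float32x3' }, // color: vec3f
  ],
};

// One triangle: each vertex is a position followed by a color.
const triangleVertices = new Float32Array([
  //  x     y    z     r    g    b
   0.0,  0.5, 0.0,  1.0, 0.0, 0.0,
  -0.5, -0.5, 0.0,  0.0, 1.0, 0.0,
   0.5, -0.5, 0.0,  0.0, 0.0, 1.0,
]);

// Inside a render pipeline descriptor (module = your GPUShaderModule):
// vertex: { module, entryPoint: 'vs_main', buffers: [vertexBufferLayout] }
```

Interleaving keeps each vertex's data contiguous. A single `arrayStride` then describes both attributes, and `offset` selects the color within each 24-byte record.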

### 4.2. Compute shaders

```wgsl
// Matrix multiplication on the GPU.
// Row-major layout: A is M×K, B is K×N, C is M×N.
@group(0) @binding(0) var<storage, read> matA: array<f32>;
@group(0) @binding(1) var<storage, read> matB: array<f32>;
@group(0) @binding(2) var<storage, read_write> matC: array<f32>;
@group(0) @binding(3) var<uniform> dims: vec3u; // M, N, K

@compute @workgroup_size(16, 16)
fn matmul(@builtin(global_invocation_id) gid: vec3u) {
    let row = gid.x;
    let col = gid.y;
    let M = dims.x;
    let N = dims.y;
    let K = dims.z;

    // Threads outside the matrix (partial edge tiles) do nothing
    if (row >= M || col >= N) { return; }

    var sum: f32 = 0.0;
    for (var i: u32 = 0u; i < K; i = i + 1u) {
        sum = sum + matA[row * K + i] * matB[i * N + col];
    }
    matC[row * N + col] = sum;
}
```
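Before trusting GPU output, compare it with a CPU reference that uses exactly the same row-major indexing as `matmul`. Here is a minimal sketch; `matmulReference` and `matmulDispatchSize` are hypothetical helper names. The dispatch math assumes the `@workgroup_size(16, 16)` above, with `gid.x` covering rows and `gid.y` covering columns.

```javascript
// CPU reference for the WGSL `matmul` kernel, using the same row-major indexing.
function matmulReference(matA, matB, M, N, K) {
  const matC = new Float32Array(M * N);
  for (let row = 0; row < M; row++) {
    for (let col = 0; col < N; col++) {
      let sum = 0;
      for (let i = 0; i < K; i++) {
        sum += matA[row * K + i] * matB[i * N + col];
      }
      matC[row * N + col] = sum;
    }
  }
  return matC;
}

// Workgroups needed so that gid.x spans M rows and gid.y spans N columns.
// The kernel's bounds check discards the extra threads in partial tiles.
function matmulDispatchSize(M, N, tile = 16) {
  return [Math.ceil(M / tile), Math.ceil(N / tile), 1];
}

// Example: (2×3) · (3×2)
const a = new Float32Array([1, 2, 3, 4, 5, 6]);
const b = new Float32Array([7, 8, 9, 10, 11, 12]);
console.log(Array.from(matmulReference(a, b, 2, 2, 3))); // [58, 64, 139, 154]
console.log(matmulDispatchSize(2048, 2048));              // [128, 128, 1]
```

For large float inputs, compare against the GPU result with a small tolerance rather than exact equality. Summation order and fused multiply-add can differ slightly between CPU and GPU.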

| Feature | WGSL | GLSL ES (WebGL) |
| --- | --- | --- |
| Type system | Strong static typing, Rust-like syntax | C-like static typing |
| Compute shaders | Native support (`@compute`) | Not supported |
| Storage buffers | `var<storage>`, direct read/write | None (data must go through textures) |
| Workgroup memory | `var<workgroup>`, fast shared memory | None |
| Resource binding | `@group(n) @binding(m)`, explicit | Uniform locations looked up by name, implicit |
| Compilation target | SPIR-V, MSL, HLSL, DXIL | GLSL ES, compiled by the driver |

## 5. Render pipeline vs. compute pipeline

WebGPU provides two completely separate kinds of pipeline, serving two different purposes:

#### 🎨 Render Pipeline

**Purpose:** draw graphics to the screen or to a texture.

**Stages:**

- **Vertex stage**: transforms vertex positions
- **Rasterization** (fixed-function): turns primitives into fragments
- **Fragment stage**: computes the color of each pixel

**Use cases:** 3D rendering, games, graphical data visualization.

#### ⚡ Compute Pipeline

**Purpose:** general-purpose computation on the GPU, independent of rendering.

**Stages:**

- **Compute stage**: the only programmable stage
- No fixed-function stages
- Runs over a 3D grid of workgroups (x, y, z)

**Use cases:** ML inference, image processing, physics simulation, data pipelines.

```mermaid
graph LR
    subgraph Render["Render Pipeline"]
        V["Vertex<br/>Shader"] --> R["Rasterizer<br/>(fixed)"] --> F["Fragment<br/>Shader"] --> O["Output<br/>Texture"]
    end

    subgraph Compute["Compute Pipeline"]
        I["Input<br/>Buffers"] --> C["Compute<br/>Shader"] --> OB["Output<br/>Buffers"]
    end

    style V fill:#e94560,stroke:#fff,color:#fff
    style F fill:#e94560,stroke:#fff,color:#fff
    style R fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style O fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#4CAF50,stroke:#fff,color:#fff
    style I fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style OB fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
```

Render pipeline (2 programmable stages + fixed-function rasterization) vs. compute pipeline (1 stage, pure computation)

### 5.1. Workgroups: the unit of parallel computation

Compute shaders in WebGPU run over a 3D grid of **workgroups**. Each workgroup contains a fixed number of threads (invocations). Threads in the same workgroup can share fast `var<workgroup>` memory.

```wgsl
// Workgroup size 16x16 = 256 threads per workgroup
@compute @workgroup_size(16, 16, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3u,
    @builtin(local_invocation_id) local_id: vec3u,
    @builtin(workgroup_id) wg_id: vec3u
) {
    // global_id = wg_id * workgroup_size + local_id
    // Total threads = (nx * ny * nz) from dispatchWorkgroups(nx, ny, nz) * (16 * 16 * 1)
}
```

#### Best practice: choosing workgroup_size

A common recommendation is **64 threads** per workgroup, for example 64×1×1 or 8×8×1. This is a multiple of both AMD's wavefront size (64) and NVIDIA's warp size (32). By default, WebGPU allows at most 256 invocations per workgroup (`maxComputeInvocationsPerWorkgroup`). For 2D data such as images or matrices, a square layout (8×8 or 16×16) improves cache locality.
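The comments in the shader above describe simple index arithmetic. Here is a minimal sketch in plain JavaScript; all function names are hypothetical. It reproduces how `global_invocation_id` is derived, counts the total invocations for a dispatch, and checks a workgroup size against WebGPU's default limits.

```javascript
// Default WebGPU limits (a device can be requested with higher ones).
const DEFAULT_LIMITS = {
  maxComputeInvocationsPerWorkgroup: 256,
  maxComputeWorkgroupSizeX: 256,
  maxComputeWorkgroupSizeY: 256,
  maxComputeWorkgroupSizeZ: 64,
};

// global_invocation_id = workgroup_id * workgroup_size + local_invocation_id
function globalInvocationId(workgroupId, workgroupSize, localId) {
  return workgroupId.map((wg, axis) => wg * workgroupSize[axis] + localId[axis]);
}

// Total invocations launched by dispatchWorkgroups(nx, ny, nz)
function totalInvocations(dispatch, workgroupSize) {
  const product = (v) => v.reduce((acc, x) => acc * x, 1);
  return product(dispatch) * product(workgroupSize);
}

// Does @workgroup_size(x, y, z) fit within the given limits?
function fitsLimits([x, y, z], limits = DEFAULT_LIMITS) {
  return (
    x <= limits.maxComputeWorkgroupSizeX &&
    y <= limits.maxComputeWorkgroupSizeY &&
    z <= limits.maxComputeWorkgroupSizeZ &&
    x * y * z <= limits.maxComputeInvocationsPerWorkgroup
  );
}

console.log(globalInvocationId([2, 1, 0], [16, 16, 1], [3, 4, 0])); // [35, 20, 0]
console.log(totalInvocations([10, 10, 1], [16, 16, 1]));            // 25600
console.log(fitsLimits([16, 16, 1]), fitsLimits([32, 16, 1]));      // true false
```

A 32×16 workgroup fits each per-axis limit but exceeds the 256-invocation total, so it only works on a device created with a higher `maxComputeInvocationsPerWorkgroup`.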

## 6. Benchmark: WebGPU vs. WebGL, real-world numbers

Real-world benchmarks from 2026 show WebGPU ahead in almost every respect, especially when the workload uses the compute pipeline.

| Benchmark | WebGL 2.0 | WebGPU | Improvement |
| --- | --- | --- | --- |
| Matrix multiply 2048×2048 | ~340 ms | ~45 ms | **7.5x** |
| Particle system, 100K particles | ~12 FPS | ~58 FPS | **~5x** |
| ML inference (Phi-3-mini, per token) | ~300 ms | ~80 ms | **3.7x** |
| Data visualization, 1M points | Could not render | 60 FPS | n/a |
| 4K image filter (Gaussian blur) | ~28 ms | ~3 ms | **9.3x** |
| Battery life (continuous rendering) | ~2 h | ~3 h | **+50%** |

#### A note on the benchmarks

The numbers above were measured on desktop hardware (NVIDIA RTX 4070 / Apple M3). On mobile devices the gap is smaller, but it still clearly favors WebGPU thanks to lower driver-abstraction overhead. WebGPU also still trails native APIs (Vulkan or Metal used directly) by about 20%, because of browser-side validation and sandboxing overhead.
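To check numbers like these on your own hardware, one rough approach is to time submit-to-completion with `performance.now()` and `queue.onSubmittedWorkDone()`. Here is a minimal sketch. It assumes an existing `device` and a hypothetical `encodeWork(encoder)` callback that records the workload. It measures wall-clock time including submission overhead, not pure GPU time. For GPU-only timing, the optional `timestamp-query` feature is the more precise tool.

```javascript
// Median wall-clock time (ms) of a GPU workload, from submit to completion.
// `encodeWork(encoder)` is a hypothetical callback that records the commands.
async function timeGpuWork(device, encodeWork, { warmup = 3, runs = 10 } = {}) {
  const runOnce = async () => {
    const encoder = device.createCommandEncoder();
    encodeWork(encoder);
    const start = performance.now();
    device.queue.submit([encoder.finish()]);
    await device.queue.onSubmittedWorkDone();
    return performance.now() - start;
  };

  // Warm-up runs absorb one-off costs such as lazy shader compilation.
  for (let i = 0; i < warmup; i++) await runOnce();

  const samples = [];
  for (let i = 0; i < runs; i++) samples.push(await runOnce());
  samples.sort((x, y) => x - y);
  return samples[Math.floor(samples.length / 2)];
}
```

Reporting the median rather than the mean limits the influence of occasional stalls from garbage collection or other tabs. Run the same harness against the WebGL version before comparing the two.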

### 6.1. Why is WebGPU faster?

The advantage does not come from a "more powerful GPU": the tests use the same GPU and the same browser. The reasons lie in the API architecture:

- **Command buffer batching:** WebGL issues and validates calls one at a time. WebGPU records thousands of commands and submits them at once, which can cut CPU overhead by 10-50x.
- **A compute pipeline instead of fragment-shader hacks:** WebGL has to encode matrices as textures, run a fragment shader to "compute", then read the result back with `readPixels`. WebGPU uses storage buffers with direct read/write, so there is no encoding overhead.
- **Async by design:** heavy operations have asynchronous forms, such as buffer mapping (`mapAsync`) and pipeline creation (`createRenderPipelineAsync`, `createComputePipelineAsync`). The main thread does not have to stall on them.
- **Pipeline state objects:** WebGL must validate the relevant GL state at every draw call. WebGPU bakes state into an immutable pipeline object once and reuses it, which makes per-draw validation much cheaper.
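The last two points combine naturally: create each pipeline once, asynchronously, and reuse it. Here is a minimal sketch of a pipeline cache. `getComputePipeline` and the string cache key are hypothetical; `createComputePipelineAsync()` is the standard asynchronous creation method on `GPUDevice`.

```javascript
// One cache per GPUDevice; a WeakMap lets entries disappear with the device.
const pipelineCaches = new WeakMap();

// Create a compute pipeline once (asynchronously) and reuse it afterwards.
function getComputePipeline(device, key, descriptor) {
  let cache = pipelineCaches.get(device);
  if (!cache) {
    cache = new Map();
    pipelineCaches.set(device, cache);
  }
  if (!cache.has(key)) {
    // Store the promise itself so concurrent callers share a single creation.
    const pending = device.createComputePipelineAsync(descriptor);
    pending.catch(() => cache.delete(key)); // allow a retry if creation fails
    cache.set(key, pending);
  }
  return cache.get(key);
}

// Usage in a browser, with a real device and shader module:
// const pipeline = await getComputePipeline(device, 'double', {
//   layout: 'auto',
//   compute: { module: shaderModule, entryPoint: 'main' },
// });
```

Keying the cache by device matters after device loss: a fresh `GPUDevice` starts with an empty cache instead of reusing pipelines that belong to the lost device.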

## 7. AI/ML inference right in the browser

This is WebGPU's most striking use case in 2026. Previously, running an LLM on the client was nearly impossible: WebGL has no compute shaders, and WASM is too slow for large matrix operations. WebGPU changes that picture completely.

```mermaid
graph TB
    subgraph Client["Client Browser"]
        Model["ONNX / GGUF Model<br/>(cached in browser storage)"]
        RT["Runtime<br/>Transformers.js v3 / WebLLM / ONNX Runtime Web"]
        WG["WebGPU Compute Pipeline<br/>Matrix Multiply + Attention + Softmax"]
        Result["Inference result"]
    end

    subgraph Server["No server needed"]
        S["❌ No API calls<br/>❌ No GPU server cost<br/>❌ No network latency"]
    end

    Model --> RT
    RT --> WG
    WG --> Result

    style Model fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RT fill:#e94560,stroke:#fff,color:#fff
    style WG fill:#4CAF50,stroke:#fff,color:#fff
    style Result fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style S fill:#f8f9fa,stroke:#e0e0e0,color:#888
```

Client-side AI inference flow: zero server cost

### 7.1. Supported frameworks

| Framework | Description | Performance (vs. native) |
| --- | --- | --- |
| Transformers.js v3 | Hugging Face Transformers ported to the browser; supports 100+ models. | ~70-80% of native |
| WebLLM | Runs LLMs (Llama, Phi, Gemma) entirely in the browser. | ~80% of native |
| ONNX Runtime Web | Microsoft's ONNX Runtime with a WebGPU backend. | ~75% of native |
| MediaPipe | Google's ML tasks (hand tracking, pose, segmentation). | ~85% of native |

### 7.2. Example: running text inference with WebLLM

```javascript
// Runs in an ES module (top-level await).
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads (or loads from cache) and prepares the model; this can take a while.
const engine = await CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Loading: ${progress.text}`);
  },
});

// OpenAI-style chat completion API
const response = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "Explain WebGPU in 3 sentences." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);
```
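Chat UIs usually stream tokens rather than waiting for one complete response. WebLLM follows the OpenAI chat-completions shape, so passing `stream: true` returns an async iterable of chunks. Here is a minimal sketch. The helper `streamReply` and its `onToken` callback are hypothetical, and `engine` is assumed to be the object created by `CreateMLCEngine` above.

```javascript
// Stream a reply token by token; resolves to the full text at the end.
async function streamReply(engine, messages, onToken) {
  const chunks = await engine.chat.completions.create({
    messages,
    stream: true,
    temperature: 0.7,
    max_tokens: 256,
  });

  let text = '';
  for await (const chunk of chunks) {
    // Each chunk carries an incremental `delta`; some chunks have no choices.
    const delta = chunk.choices[0]?.delta?.content ?? '';
    text += delta;
    if (delta) onToken(delta);
  }
  return text;
}

// Usage (outputEl is a hypothetical DOM element):
// const full = await streamReply(engine, [{ role: 'user', content: 'Hi!' }],
//   (t) => { outputEl.textContent += t; });
```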

#### Zero server cost

With client-side WebGPU inference, you do not pay for GPU servers. After the first download, the model is cached in browser storage (Cache API or IndexedDB). That download is about 2-4 GB for a 3B-parameter, q4-quantized model. Every later request costs nothing on the server side and has no network latency. It also stays fully private, because user data never leaves the device.
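The download size follows from simple arithmetic: weight bytes ≈ parameters × bits per weight / 8. Here is a minimal sketch; the function name is hypothetical. It counts quantized weights only. Real downloads come out larger because of quantization scales, layers kept in 16-bit, and tokenizer/config files. GPU memory at runtime also includes the KV cache.

```javascript
// Rough lower bound on weight storage for a quantized model.
function estimateWeightBytes(paramCount, bitsPerWeight) {
  return (paramCount * bitsPerWeight) / 8;
}

const GIB = 1024 ** 3;
const bytes = estimateWeightBytes(3e9, 4); // 3B parameters at 4 bits
console.log(bytes);                         // 1500000000
console.log((bytes / GIB).toFixed(2));      // 1.40 (GiB, before overhead)
```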

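## 8. Data visualization and game development

### 8.1. Large-scale data visualization

Consider rendering 1 million data points at 60 FPS. Canvas2D cannot keep up with this workload, and WebGL struggles once binning and aggregation have to run on the CPU. WebGPU's compute pipeline handles binning, aggregation, and layout directly on the GPU, and a render pipeline then draws the result.

```javascript
// Assumes device, encoder, aggregatePipeline, renderPipeline, their bind groups,
// renderPassDescriptor and binCount were created earlier.
const POINT_COUNT = 1_000_000;
const WORKGROUP_SIZE = 256; // must match @workgroup_size(256) in the aggregation shader

// Compute pass: aggregate 1M points into heatmap bins
const computePass = encoder.beginComputePass();
computePass.setPipeline(aggregatePipeline);
computePass.setBindGroup(0, bindGroup);
// 1,000,000 points / 256 threads per workgroup, rounded up = 3907 workgroups
computePass.dispatchWorkgroups(Math.ceil(POINT_COUNT / WORKGROUP_SIZE));
computePass.end();

// Render pass: draw the heatmap from the aggregated bins
const renderPass = encoder.beginRenderPass(renderPassDescriptor);
renderPass.setPipeline(renderPipeline);
renderPass.setBindGroup(0, renderBindGroup);
renderPass.draw(6, binCount); // instanced: 6 vertices (one quad) per bin
renderPass.end();

device.queue.submit([encoder.finish()]);
```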
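The aggregation shader computes a 2D histogram. Each point is mapped to a bin, and that bin's counter is incremented; on the GPU this is typically done with `atomicAdd` on an `array<atomic<u32>>`. Below is a CPU reference of that mapping, useful for validating the GPU result on small inputs. The function name, the `bounds` object, and the edge-clamping rule are assumptions for illustration.

```javascript
// 2D histogram of points into a gridW × gridH heatmap (row-major bins).
// bounds = { minX, maxX, minY, maxY }; points on the max edge go into the last bin.
function binPoints(xs, ys, gridW, gridH, bounds) {
  const counts = new Uint32Array(gridW * gridH);
  const spanX = bounds.maxX - bounds.minX;
  const spanY = bounds.maxY - bounds.minY;
  for (let i = 0; i < xs.length; i++) {
    const fx = (xs[i] - bounds.minX) / spanX;
    const fy = (ys[i] - bounds.minY) / spanY;
    if (fx < 0 || fx > 1 || fy < 0 || fy > 1) continue; // outside the view
    const bx = Math.min(Math.floor(fx * gridW), gridW - 1);
    const by = Math.min(Math.floor(fy * gridH), gridH - 1);
    counts[by * gridW + bx] += 1; // the GPU version would use atomicAdd here
  }
  return counts;
}

const counts = binPoints(
  [0.1, 0.2, 0.9, 1.0, 5.0],
  [0.1, 0.1, 0.9, 1.0, 0.5],
  2, 2,
  { minX: 0, maxX: 1, minY: 0, maxY: 1 },
);
console.log(Array.from(counts)); // [2, 0, 0, 2]
```

The render pass then only has to read `counts`: one instanced quad per bin, colored by its count.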

### 8.2. Game development

With WebGPU, browser games can approach console-level quality:

- **Babylon.js 7+**: WebGPU support with PBR rendering, real-time shadows, and a post-processing pipeline.
- **Three.js r171+**: built-in WebGPU renderer with automatic fallback to WebGL 2.
- **Unreal Engine 5**: Epic provides no official web export; WebGPU builds rely on third-party ports.
- **PlayCanvas**: supports WebGPU alongside WebGL and is optimized for mobile browser games.

#### WebGPU + WebAssembly = near-native performance

Combining WASM (for CPU-side logic) with WebGPU (for GPU rendering and compute) makes it possible to port C++ game engines to the web. Performance is only about 20-25% below native. Emscripten provides WebGPU bindings for C/C++ code.

## 9. Integration with popular frameworks

### 9.1. Three.js: switching takes only a couple of lines

```javascript
// Before (WebGL)
import * as THREE from 'three';
const renderer = new THREE.WebGLRenderer({ canvas });

// After (WebGPU, three.js r171+): change the import path and the renderer class
import * as THREE from 'three/webgpu';
const renderer = new THREE.WebGPURenderer({ canvas });
await renderer.init(); // initialize the WebGPU (or fallback WebGL 2) backend before rendering
```

### 9.2. Babylon.js: async initialization

```javascript
const engine = new BABYLON.WebGPUEngine(canvas);
await engine.initAsync(); // WebGPU setup is asynchronous, unlike BABYLON.Engine

// Every other Babylon.js API works as usual
const scene = new BABYLON.Scene(engine);
const camera = new BABYLON.ArcRotateCamera("cam", 0, 0, 10, BABYLON.Vector3.Zero(), scene);
```

### 9.3. Vue.js + WebGPU: the composable pattern

```typescript
// composable: useWebGPU.ts
// WebGPU type definitions (GPUDevice, navigator.gpu, ...) come from @webgpu/types.
import { shallowRef, onMounted, onUnmounted, type Ref } from 'vue';

export function useWebGPU(canvasRef: Ref<HTMLCanvasElement | null>) {
  // shallowRef: GPU objects are opaque handles, so deep reactivity is unnecessary
  const device = shallowRef<GPUDevice | null>(null);
  const context = shallowRef<GPUCanvasContext | null>(null);

  onMounted(async () => {
    if (!navigator.gpu) {
      console.warn('WebGPU not supported, falling back to WebGL');
      return;
    }
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) return;

    device.value = await adapter.requestDevice();
    context.value = canvasRef.value!.getContext('webgpu')!;
    context.value.configure({
      device: device.value,
      format: navigator.gpu.getPreferredCanvasFormat(),
      alphaMode: 'premultiplied',
    });
  });

  onUnmounted(() => {
    context.value?.unconfigure();
    device.value?.destroy();
  });

  return { device, context };
}
```

## 10. Getting started with WebGPU

### 10.1. Basic initialization

```javascript
async function initWebGPU() {
  // 1. Check for support
  if (!navigator.gpu) {
    throw new Error('WebGPU is not supported in this browser');
  }

  // 2. Request an adapter (the physical GPU)
  const adapter = await navigator.gpu.requestAdapter({
    powerPreference: 'high-performance', // or 'low-power'
  });
  if (!adapter) throw new Error('No GPU adapter found');

  // 3. Request a device (the logical connection).
  // requestDevice() rejects if a requested feature or limit exceeds what the
  // adapter supports, so only ask for what the adapter reports.
  const requiredFeatures = adapter.features.has('timestamp-query')
    ? ['timestamp-query'] // optional feature
    : [];
  const device = await adapter.requestDevice({
    requiredFeatures,
    requiredLimits: {
      // Up to 256 MiB, clamped to the adapter's maximum
      maxStorageBufferBindingSize: Math.min(
        256 * 1024 * 1024,
        adapter.limits.maxStorageBufferBindingSize,
      ),
    },
  });

  // 4. Handle device loss
  device.lost.then((info) => {
    console.error(`GPU device lost: ${info.reason} - ${info.message}`);
    if (info.reason !== 'destroyed') {
      // Retry. The new device is a different object, so every buffer,
      // texture and pipeline created from the old one must be re-created.
      initWebGPU();
    }
  });

  // 5. Configure the canvas context
  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const format = navigator.gpu.getPreferredCanvasFormat();
  context.configure({ device, format, alphaMode: 'premultiplied' });

  return { device, context, format };
}
```

### 10.2. Complete example: a compute pipeline

```javascript
async function runCompute(device) {
  // Input data
  const data = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);

  // Create buffers
  const inputBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(inputBuffer, 0, data);

  const outputBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });

  // Staging buffer the CPU can map to read the results back
  const readBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  // Shader: double every element
  const shaderModule = device.createShaderModule({
    code: `
      @group(0) @binding(0) var<storage, read> input: array<f32>;
      @group(0) @binding(1) var<storage, read_write> output: array<f32>;

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) gid: vec3u) {
        if (gid.x < arrayLength(&input)) {
          output[gid.x] = input[gid.x] * 2.0;
        }
      }
    `,
  });

  // Create the pipeline (bind group layout inferred from the shader)
  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: { module: shaderModule, entryPoint: 'main' },
  });

  // Bind group: connect the buffers to @binding(0) and @binding(1)
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: inputBuffer } },
      { binding: 1, resource: { buffer: outputBuffer } },
    ],
  });

  // Encode and submit
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(data.length / 64)); // 8 elements → 1 workgroup
  pass.end();

  // Copy the results into the mappable buffer in the same submission
  encoder.copyBufferToBuffer(outputBuffer, 0, readBuffer, 0, data.byteLength);
  device.queue.submit([encoder.finish()]);

  // Read the results
  await readBuffer.mapAsync(GPUMapMode.READ);
  // Copy out before unmap(): unmapping detaches the mapped ArrayBuffer
  const result = Array.from(new Float32Array(readBuffer.getMappedRange()));
  readBuffer.unmap();
  console.log('Result:', result);
  // Output: Result: [2, 4, 6, 8, 10, 12, 14, 16]
  return result;
}
```
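The readback steps at the end recur in almost every compute program: copy to a `MAP_READ` buffer, call `mapAsync`, copy the data out, then `unmap`. That makes them worth wrapping. Here is a minimal sketch; `readBackFloat32` is a hypothetical helper, and it assumes the source buffer was created with `GPUBufferUsage.COPY_SRC`. The key detail is copying the data out with `slice()` before `unmap()`, because unmapping detaches the `ArrayBuffer` returned by `getMappedRange()`.

```javascript
// Copy `byteLength` bytes from a GPU buffer back to the CPU as a Float32Array.
// Assumes `srcBuffer` has GPUBufferUsage.COPY_SRC and byteLength is a multiple of 4.
async function readBackFloat32(device, srcBuffer, byteLength) {
  // Staging buffer that the CPU is allowed to map.
  const staging = device.createBuffer({
    size: byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(srcBuffer, 0, staging, 0, byteLength);
  device.queue.submit([encoder.finish()]);

  // Resolves once the copy has completed and the memory is CPU-visible.
  await staging.mapAsync(GPUMapMode.READ);
  // slice() copies the bytes; the mapped range itself is detached by unmap().
  const result = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  staging.destroy(); // one-off staging buffer; reuse one instead in hot loops
  return result;
}
```

With this helper, the last part of `runCompute` shrinks to a single `await readBackFloat32(device, outputBuffer, data.byteLength)`.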

## 11. Challenges and the road ahead

### 11.1. Current challenges

| Challenge | Details | Solution / direction |
| --- | --- | --- |
| Device compatibility | 45% of older devices lack support for storage buffers in vertex shaders. | Feature detection plus a WebGL fallback; Three.js falls back automatically. |
| Driver issues | NVIDIA 572.xx crashes, AMD Radeon HD 7700 artifacts, Intel iGPU hangs. | Browser blocklists and driver updates; Chrome maintains a specific deny list. |
| Learning curve | The API is much more complex than WebGL: explicit resource management, pipeline creation. | Use a framework (Three.js, Babylon.js) rather than the raw API for most use cases. |
| 30% of devices unsupported | Mostly older Android devices and iOS < 26. | Progressive enhancement: WebGPU when available, WebGL otherwise. |
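Progressive enhancement boils down to a feature check at startup. Here is a minimal sketch; `pickGraphicsBackend` is a hypothetical helper. It prefers WebGPU only when `navigator.gpu` exists *and* `requestAdapter()` actually returns an adapter. The call can resolve to `null`, for example on blocklisted drivers. Otherwise the helper falls back to WebGL 2. The `nav` and `createCanvas` parameters exist only so the check can be exercised outside a browser.

```javascript
// Decide which rendering backend to use: 'webgpu', 'webgl2' or 'none'.
async function pickGraphicsBackend(
  nav = globalThis.navigator,
  createCanvas = () => document.createElement('canvas'),
) {
  if (nav?.gpu) {
    try {
      // requestAdapter() may resolve to null (e.g. blocklisted GPU or driver).
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return 'webgpu';
    } catch {
      // Treat errors like "no adapter" and fall through to WebGL.
    }
  }
  // WebGL 2 fallback: getContext() returns null when unsupported.
  const gl = createCanvas().getContext('webgl2');
  return gl ? 'webgl2' : 'none';
}
```

Checking for an adapter, not just for `navigator.gpu`, matters because the API object can exist on devices where no usable GPU is exposed.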

### 11.2. WebGPU v2: in development

The W3C GPU for the Web group is designing the next version with several important features:

- **Subgroup operations**: let threads in the same subgroup (warp/wavefront) communicate directly, speeding up reduction and scan operations by 2-4x.
- **Bindless resources**: access resources by index instead of through fixed bind groups, reducing overhead in scenes with many materials and textures.
- **Multi-draw indirect**: submit thousands of draw calls in a single command, with the GPU deciding what to draw.
- **Ray tracing**: expose hardware RT cores for real-time ray tracing on the web.
- **64-bit atomics**: needed for high-precision scientific computing algorithms.

#### Advice for developers

If you already use Three.js or Babylon.js, switch to the WebGPU renderer now. The migration cost is close to zero unless you rely on custom GLSL shaders, which need porting (for example to Three.js's TSL). Performance improves noticeably.

If you do AI/ML on the web, WebGPU is a must. If you need raw GPU compute for data processing or simulation, now is the time to learn WGSL and the WebGPU API directly.

With 70% browser support and every major browser shipping it, WebGPU is no longer "experimental": it is the present.

#javascript #Web Performance #Vue.js #WebGPU #WGSL #GPU Computing #WebGL #AI Inference #Three.js

# WebGPU — Kỷ Nguyên Mới Của GPU Computing Trên Trình Duyệt

## 1. WebGPU là gì và tại sao nó thay đổi cuộc chơi?

Điểm đột phá cốt lõi: WebGPU không chỉ là API render đồ họa, mà còn cung cấp **compute pipeline** hoàn chỉnh — cho phép chạy tính toán tổng quát (GPGPU) trực tiếp trên GPU thông qua trình duyệt. Điều này mở ra khả năng chạy machine learning inference, xử lý hình ảnh, mô phỏng vật lý, và phân tích dữ liệu lớn ngay trên client mà không cần server.

70% Browser support toàn cầu (2026)

15-30x Nhanh hơn WebGL cho compute workload

80% Hiệu năng native cho AI inference

65% Ứng dụng web mới đã dùng WebGPU

#### Tại sao developer cần quan tâm?

## 2. Từ WebGL đến WebGPU — 15 năm tiến hóa

Hành trình từ WebGL 1.0 đến WebGPU là câu chuyện về sự tiến hóa tất yếu khi nhu cầu tính toán trên web vượt xa khả năng của OpenGL ES.

2011

**WebGL 1.0** ra mắt — dựa trên OpenGL ES 2.0. Lần đầu web truy cập được GPU nhưng bị giới hạn ở render pipeline duy nhất.

2017

**WebGL 2.0** — nâng lên OpenGL ES 3.0 với transform feedback, instanced rendering. Nhưng vẫn thiếu compute shader — mọi GPGPU phải "hack" qua fragment shader.

2017-2018

Apple, Google, Mozilla bắt đầu thiết kế API GPU mới. Apple đề xuất WebMetal (dựa trên Metal), Google đề xuất WebGPU dựa trên Vulkan/D3D12.

2023

**Chrome 113** ship WebGPU đầu tiên trên Windows, macOS, ChromeOS. Thời kỳ Origin Trial kết thúc.

01/2026

**Firefox 147** bật WebGPU trên Windows và ARM64 macOS. Safari ship mặc định trên iOS 26, macOS Tahoe 26. *Tất cả trình duyệt lớn đã hỗ trợ.*

Q1/2026

**WebGPU v2** bắt đầu phát triển — hướng tới subgroup operations, bindless resources, multi-draw indirect.

#### Bước nhảy kiến trúc

WebGL bắt buộc dùng mô hình **immediate-mode** — mỗi lệnh draw gửi trực tiếp đến driver, CPU phải chờ GPU xong mới tiếp tục. WebGPU chuyển sang **command buffer model** — ghi lệnh vào buffer trước, rồi submit cả batch. Đây chính xác là cách Vulkan, Metal và D3D12 hoạt động.

## 3. Kiến trúc kỹ thuật của WebGPU

WebGPU được thiết kế theo mô hình low-level với explicit state management, cho phép developer kiểm soát chi tiết cách GPU xử lý dữ liệu.

```
graph TB
    subgraph Browser["Trình duyệt"]
        JS["JavaScript / WASM  
Application Code"]
        API["WebGPU API  
(navigator.gpu)"]
    end

subgraph Abstraction["Abstraction Layer"]
        Dawn["Dawn (Chrome)  
C++ implementation"]
        wgpu["wgpu (Firefox)  
Rust implementation"]
        WebKit["WebKit GPU Process  
(Safari)"]
    end

subgraph Native["Native GPU APIs"]
        Vulkan["Vulkan  
(Linux, Windows, Android)"]
        Metal["Metal  
(macOS, iOS)"]
        D3D12["Direct3D 12  
(Windows)"]
    end

subgraph Hardware["Phần cứng"]
        GPU["GPU Hardware  
NVIDIA / AMD / Intel / Apple"]
    end

JS --> API
    API --> Dawn
    API --> wgpu
    API --> WebKit
    Dawn --> Vulkan
    Dawn --> D3D12
    Dawn --> Metal
    wgpu --> Vulkan
    wgpu --> Metal
    wgpu --> D3D12
    WebKit --> Metal
    Vulkan --> GPU
    Metal --> GPU
    D3D12 --> GPU

style JS fill:#e94560,stroke:#fff,color:#fff
    style API fill:#e94560,stroke:#fff,color:#fff
    style Dawn fill:#2c3e50,stroke:#fff,color:#fff
    style wgpu fill:#2c3e50,stroke:#fff,color:#fff
    style WebKit fill:#2c3e50,stroke:#fff,color:#fff
    style Vulkan fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style Metal fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D3D12 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style GPU fill:#4CAF50,stroke:#fff,color:#fff

```
Kiến trúc WebGPU: từ JavaScript xuống GPU hardware qua abstraction layer

### 3.1. Các thành phần cốt lõi

WebGPU API được tổ chức xung quanh một số khái niệm chính:

| Thành phần | Vai trò | Tương đương Vulkan |
| --- | --- | --- |
| **GPUAdapter** | Đại diện cho GPU vật lý. Cung cấp thông tin về capabilities và limits. | VkPhysicalDevice |
| **GPUDevice** | Kết nối logic đến GPU. Tạo tất cả resource từ đây. | VkDevice |
| **GPUBuffer** | Vùng nhớ trên GPU. Chứa vertex data, uniform, storage data. | VkBuffer |
| **GPUTexture** | Hình ảnh trên GPU dùng cho sampling, render target. | VkImage |
| **GPUCommandEncoder** | Ghi lệnh vào command buffer trước khi submit. | VkCommandBuffer |
| **GPUBindGroup** | Gom các resource (buffer, texture, sampler) để shader truy cập. | VkDescriptorSet |
| **GPURenderPipeline** | Pipeline cho đồ họa: vertex → rasterization → fragment. | VkGraphicsPipeline |
| **GPUComputePipeline** | Pipeline cho tính toán tổng quát — không liên quan đến render. | VkComputePipeline |

### 3.2. Command Buffer Model

Thay vì gửi lệnh trực tiếp như WebGL, WebGPU sử dụng mô hình **record-then-submit**:

```
sequenceDiagram
    participant App as JavaScript
    participant Enc as CommandEncoder
    participant Queue as GPUQueue
    participant GPU as GPU Hardware

App->>Enc: createCommandEncoder()
    App->>Enc: beginRenderPass() / beginComputePass()
    App->>Enc: setPipeline(), setBindGroup()
    App->>Enc: draw() / dispatch()
    App->>Enc: end()
    Enc->>Enc: finish() → CommandBuffer
    App->>Queue: submit([commandBuffer])
    Queue->>GPU: Execute batch
    GPU-->>App: Kết quả (async)

```
Luồng xử lý: ghi lệnh → đóng gói → submit batch → GPU thực thi async

#### Lợi ích của Command Buffer Model

## 4. WGSL — Ngôn ngữ Shader thế hệ mới

**WebGPU Shading Language (WGSL)** là ngôn ngữ shader mới được thiết kế riêng cho WebGPU, thay thế GLSL ES mà WebGL sử dụng. WGSL có cú pháp gần giống Rust, được biên dịch thành SPIR-V (Vulkan), MSL (Metal), hoặc HLSL (D3D12) tùy theo backend.

### 4.1. Cú pháp cơ bản

```wgsl
// Vertex Shader
struct VertexInput {
    @location(0) position: vec3f,
    @location(1) color: vec3f,
}

struct VertexOutput {
    @builtin(position) position: vec4f,
    @location(0) color: vec3f,
}

@group(0) @binding(0)
var<uniform> mvp: mat4x4f;

@vertex
fn vs_main(input: VertexInput) -> VertexOutput {
    var output: VertexOutput;
    output.position = mvp * vec4f(input.position, 1.0);
    output.color = input.color;
    return output;
}

// Fragment Shader
@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4f {
    return vec4f(input.color, 1.0);
}

```

### 4.2. Compute Shader

```wgsl
// Matrix multiplication trên GPU
@group(0) @binding(0) var<storage, read> matA: array<f32>;
@group(0) @binding(1) var<storage, read> matB: array<f32>;
@group(0) @binding(2) var<storage, read_write> matC: array<f32>;
@group(0) @binding(3) var<uniform> dims: vec3u; // M, N, K

@compute @workgroup_size(16, 16)
fn matmul(@builtin(global_invocation_id) gid: vec3u) {
    let row = gid.x;
    let col = gid.y;
    let M = dims.x;
    let N = dims.y;
    let K = dims.z;

if (row >= M || col >= N) { return; }

var sum: f32 = 0.0;
    for (var i: u32 = 0u; i < K; i = i + 1u) {
        sum = sum + matA[row * K + i] * matB[i * N + col];
    }
    matC[row * N + col] = sum;
}

```

| Đặc điểm | WGSL | GLSL ES (WebGL) |
| --- | --- | --- |
| Type system | Strong static typing, Rust-like | C-like, implicit conversions |
| Compute shader | Native support (@compute) | Không hỗ trợ |
| Storage buffers | var<storage> — read/write trực tiếp | Không có (phải dùng texture) |
| Workgroup memory | var<workgroup> — shared fast memory | Không có |
| Resource binding | @group(n) @binding(m) — explicit | Uniform location — implicit |
| Compilation target | SPIR-V, MSL, HLSL, DXIL | GLSL → driver |

## 5. Render Pipeline vs Compute Pipeline

WebGPU cung cấp hai loại pipeline hoàn toàn tách biệt, phục vụ hai mục đích khác nhau:

#### 🎨 Render Pipeline

**Mục đích:** Vẽ đồ họa lên screen hoặc texture.

**Các stage:**

- **Vertex Stage** — biến đổi tọa độ đỉnh
- **Rasterization** (fixed-function) — chuyển primitive thành fragment
- **Fragment Stage** — tính màu cho mỗi pixel

**Use case:** 3D rendering, game, data visualization có hình ảnh.

#### ⚡ Compute Pipeline

**Mục đích:** Tính toán tổng quát trên GPU — không liên quan đến render.

**Các stage:**

- **Compute Stage** — duy nhất 1 stage lập trình được
- Không có fixed-function stage
- Chạy trên workgroup 3D (x, y, z)

**Use case:** ML inference, image processing, physics simulation, data pipeline.

```mermaid
graph LR
    subgraph Render["Render Pipeline"]
        V["Vertex<br/>Shader"] --> R["Rasterizer<br/>(fixed)"] --> F["Fragment<br/>Shader"] --> O["Output<br/>Texture"]
    end

    subgraph Compute["Compute Pipeline"]
        I["Input<br/>Buffers"] --> C["Compute<br/>Shader"] --> OB["Output<br/>Buffers"]
    end

    style V fill:#e94560,stroke:#fff,color:#fff
    style F fill:#e94560,stroke:#fff,color:#fff
    style R fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style O fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#4CAF50,stroke:#fff,color:#fff
    style I fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style OB fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
```
Render Pipeline (programmable vertex and fragment stages around a fixed-function rasterizer) vs Compute Pipeline (a single programmable stage, computation only)

### 5.1. Workgroups: the unit of parallel execution

Compute shaders in WebGPU run over a 3D grid of **workgroups**. Each workgroup contains a fixed number of threads (invocations), and threads in the same workgroup can share fast memory declared with `var<workgroup>`.

```wgsl
// Workgroup size 16x16 = 256 invocations (threads) per workgroup
@compute @workgroup_size(16, 16, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3u,
    @builtin(local_invocation_id) local_id: vec3u,
    @builtin(workgroup_id) wg_id: vec3u
) {
    // global_id = wg_id * workgroup_size + local_id (per component)
    // Total invocations for dispatchWorkgroups(nx, ny, nz) = (nx * ny * nz) * (16 * 16 * 1)
}

```
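
The index arithmetic in those comments can be mirrored on the CPU side, which is where the dispatch size is chosen. A minimal sketch, assuming the `@workgroup_size(16, 16, 1)` shown above; `workgroupsFor`, `globalInvocationId` and `totalInvocations` are hypothetical helper names.

```javascript
// Matches @workgroup_size(16, 16, 1) in the shader above.
const WG_SIZE = [16, 16, 1];

// Number of workgroups needed to cover `n` items along one axis.
function workgroupsFor(n, size) {
  return Math.ceil(n / size);
}

// global_invocation_id = workgroup_id * workgroup_size + local_invocation_id
function globalInvocationId(wgId, localId, wgSize = WG_SIZE) {
  return wgId.map((w, axis) => w * wgSize[axis] + localId[axis]);
}

// Total invocations launched by dispatchWorkgroups(nx, ny, nz).
function totalInvocations([nx, ny, nz], wgSize = WG_SIZE) {
  return nx * ny * nz * wgSize[0] * wgSize[1] * wgSize[2];
}
```

For a 1920×1080 image this gives 120 × 68 workgroups, i.e. 2,088,960 invocations for 2,073,600 pixels. The surplus is why compute shaders need a bounds check before touching memory.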

#### Best practice: choosing workgroup_size

A common recommendation is **64 invocations** (for example 64×1×1 or 8×8×1), because 64 is a multiple of both the AMD wavefront size (64) and the NVIDIA warp size (32). By default, WebGPU allows at most 256 invocations per workgroup (`maxComputeInvocationsPerWorkgroup`), with per-dimension limits of 256 for x and y and 64 for z. For 2D data (images, matrices), a square layout such as 8×8 or 16×16 usually gives better cache locality.
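
These rules are easy to check before creating a pipeline. The sketch below hard-codes the default limits from the WebGPU specification; a real device may report higher values in `adapter.limits` or `device.limits`, which can be passed in instead. `checkWorkgroupSize` is a hypothetical helper, and the "multiple of 64" flag is the heuristic from the paragraph above, not a requirement of the API.

```javascript
// Default limits from the WebGPU specification (devices may allow more).
const DEFAULT_LIMITS = {
  maxComputeWorkgroupSizeX: 256,
  maxComputeWorkgroupSizeY: 256,
  maxComputeWorkgroupSizeZ: 64,
  maxComputeInvocationsPerWorkgroup: 256,
};

// Returns the invocation count, whether the size fits the limits,
// and whether it follows the "multiple of 64" heuristic.
function checkWorkgroupSize([x, y = 1, z = 1], limits = DEFAULT_LIMITS) {
  const total = x * y * z;
  const withinLimits =
    x <= limits.maxComputeWorkgroupSizeX &&
    y <= limits.maxComputeWorkgroupSizeY &&
    z <= limits.maxComputeWorkgroupSizeZ &&
    total <= limits.maxComputeInvocationsPerWorkgroup;
  // Heuristic only: multiples of 64 fill AMD wavefronts (64) and NVIDIA warps (32).
  const multipleOf64 = total % 64 === 0;
  return { total, withinLimits, multipleOf64 };
}
```

For example, 8×8 and 16×16 pass both checks, 32×32 (1024 invocations) exceeds the default limit, and 10×10 is valid but leaves part of the last warp or wavefront idle.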

## 6. Benchmark: WebGPU vs WebGL, real-world numbers

Benchmarks reported in 2026 show WebGPU ahead in most of the areas measured, especially for workloads that use the compute pipeline. Absolute numbers depend heavily on the GPU, driver and browser.

| Benchmark | WebGL 2.0 | WebGPU | Improvement |
| --- | --- | --- | --- |
| Matrix multiply 2048×2048 | ~340 ms | ~45 ms | **7.5x** |
| Particle system, 100K particles | ~12 FPS | ~58 FPS | **~5x** |
| ML inference (Phi-3-mini, per token) | ~300 ms | ~80 ms | **3.7x** |
| Data visualization, 1M points | Could not render | 60 FPS | **∞** |
| 4K image filter (Gaussian blur) | ~28 ms | ~3 ms | **9.3x** |
| Battery life (continuous rendering) | ~2 hours | ~3 hours | **+50%** |

#### Benchmark caveats

### 6.1. Why is WebGPU faster?

The gain does not come from a "more powerful GPU": the comparison uses the same GPU and the same browser. The reasons lie in the API design:

- **Command buffer batching:** WebGL issues commands one call at a time, and each call carries CPU-side validation and driver overhead. WebGPU records many commands into a command buffer and submits them at once, which substantially reduces CPU overhead (the source cites 10-50x).
- **A compute pipeline instead of fragment-shader tricks:** with WebGL, a matrix has to be encoded into a texture, a fragment shader runs to "compute" the result, and the output is read back with readPixels. WebGPU reads and writes storage buffers directly, with no texture encoding step.
- **Asynchronous by design:** buffer mapping (`mapAsync`) is asynchronous, and shader and pipeline creation either return immediately or have async variants (`createComputePipelineAsync`, `createRenderPipelineAsync`), so the main thread does not have to wait for the GPU.
- **Pipeline state objects:** WebGL has to validate the relevant GL state before every draw call. WebGPU bakes state into a pipeline object once and reuses it, leaving much less validation work per draw.

## 7. AI/ML inference directly in the browser

```mermaid
graph TB
    subgraph Client["Client Browser"]
        Model["ONNX / MLC model weights<br/>(cached in browser storage: Cache API / IndexedDB)"]
        RT["Runtime<br/>Transformers.js v3 / WebLLM / ONNX Runtime Web"]
        WG["WebGPU Compute Pipeline<br/>Matrix Multiply + Attention + Softmax"]
        Result["Inference result"]
    end

    subgraph Server["No server needed"]
        S["❌ No API calls<br/>❌ No GPU server cost<br/>❌ No network latency"]
    end

    Model --> RT
    RT --> WG
    WG --> Result

    style Model fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style RT fill:#e94560,stroke:#fff,color:#fff
    style WG fill:#4CAF50,stroke:#fff,color:#fff
    style Result fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style S fill:#f8f9fa,stroke:#e0e0e0,color:#888
```
AI inference running entirely on the client, with no server-side inference cost

### 7.1. Supported frameworks

| Framework | Description | Reported performance (vs native) |
| --- | --- | --- |
| **Transformers.js v3** | Browser port of Hugging Face Transformers; supports 100+ models. | ~70-80% of native |
| **WebLLM** | Runs LLMs (Llama, Phi, Gemma) entirely in the browser. | ~80% of native |
| **ONNX Runtime Web** | Microsoft ONNX Runtime with a WebGPU backend. | ~75% of native |
| **MediaPipe** | Google ML tasks (hand tracking, pose, segmentation). | ~85% of native |

### 7.2. Example: text inference with WebLLM

```javascript
// Runs as an ES module (top-level await) in a WebGPU-capable browser
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads (or loads from cache) the model weights and prepares the WebGPU kernels
const engine = await CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Loading: ${progress.text}`);
  }
});

// OpenAI-compatible chat completions API, executed locally on the GPU
const response = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "Explain WebGPU in 3 sentences." }
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

```
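
For a chat UI it is usually nicer to show tokens as they are generated. WebLLM's OpenAI-compatible API accepts `stream: true` and then returns an async iterable of chunks whose new text is in `choices[0].delta.content`. The helper below (a hypothetical `collectStream`) only consumes such an iterable, so it can be tested without a GPU; the comment shows how it would be wired to the `engine` from the example above.

```javascript
// Concatenate the text deltas of an OpenAI-style streaming response.
// With WebLLM the input would come from:
//   const chunks = await engine.chat.completions.create({ messages, stream: true });
//   const reply = await collectStream(chunks, (text) => output.append(text));
async function collectStream(chunks, onDelta = () => {}) {
  let reply = "";
  for await (const chunk of chunks) {
    // Some chunks (e.g. a final usage chunk) may carry no text
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta) {
      reply += delta;
      onDelta(delta); // e.g. append to the UI as tokens arrive
    }
  }
  return reply;
}
```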

#### Cost = 0

## 8. Data Visualization and Game Development

### 8.1. Large-scale data visualization

```javascript
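// Assumes device, both pipelines, both bind groups, renderPassDescriptor
// and binCount were created earlier.
const encoder = device.createCommandEncoder();

// Compute pass: aggregate 1M points into heatmap bins
const pointCount = 1_000_000;
const computePass = encoder.beginComputePass();
computePass.setPipeline(aggregatePipeline);
computePass.setBindGroup(0, bindGroup);
// One invocation per point, 256 per workgroup: ceil(1,000,000 / 256) = 3907 workgroups
computePass.dispatchWorkgroups(Math.ceil(pointCount / 256));
computePass.end();

// Render pass: draw the heatmap from the aggregated bins.
// WebGPU orders the passes, so the render pass sees the compute results.
const renderPass = encoder.beginRenderPass(renderPassDescriptor);
renderPass.setPipeline(renderPipeline);
renderPass.setBindGroup(0, renderBindGroup);
renderPass.draw(6, binCount); // Instanced: one 6-vertex quad (2 triangles) per bin
renderPass.end();

// Both passes reach the GPU in a single submit
device.queue.submit([encoder.finish()]);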

```
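
The aggregation shader itself is not shown. As an assumption, a common approach is one invocation per point doing `atomicAdd` on an `array<atomic<u32>>` of bins. The plain-JavaScript reference below computes the same counts on the CPU, which is handy for checking the GPU result on a small dataset. `binPoints` and `workgroupCount` are hypothetical names, and the bounds are treated as half-open ranges [min, max).

```javascript
// CPU reference for the aggregation pass: count points per cell of a
// width x height grid covering [minX, maxX) x [minY, maxY).
function binPoints(xs, ys, { width, height, minX, maxX, minY, maxY }) {
  const bins = new Uint32Array(width * height);
  const sx = width / (maxX - minX);
  const sy = height / (maxY - minY);
  for (let i = 0; i < xs.length; i++) {
    const bx = Math.floor((xs[i] - minX) * sx);
    const by = Math.floor((ys[i] - minY) * sy);
    if (bx < 0 || bx >= width || by < 0 || by >= height) continue; // outside bounds
    bins[by * width + bx] += 1; // row-major bin index, like the GPU buffer
  }
  return bins;
}

// One invocation per point, 256 invocations per workgroup.
const workgroupCount = (pointCount) => Math.ceil(pointCount / 256);
```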

### 8.2. Game Development

With WebGPU, browser games can approach console-level visual quality:

- **Babylon.js 7+**: WebGPU support with PBR rendering, real-time shadows and a post-processing pipeline.
- **Three.js r171+**: built-in WebGPU renderer with automatic fallback to WebGL 2.
- **Unreal Engine 5**: a WebGPU backend has been reported since 04/2024, through third-party tooling rather than an official Epic Games release.
- **PlayCanvas**: supports WebGPU alongside WebGL 2, with a focus on mobile browser games.

#### WebGPU + WebAssembly = Native-like performance

## 9. Integrating with popular frameworks

### 9.1. Three.js: switching takes about two lines

```javascript
// Before (WebGL)
import * as THREE from 'three';
const renderer = new THREE.WebGLRenderer({ canvas });

// After (WebGPU, Three.js r171+): change the import and the renderer class.
// WebGPURenderer falls back to a WebGL 2 backend when WebGPU is unavailable.
import * as THREE from 'three/webgpu';
const renderer = new THREE.WebGPURenderer({ canvas });
await renderer.init(); // initialization is asynchronous

```

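### 9.2. Babylon.js: async initialization

```javascript
let engine;
if (await BABYLON.WebGPUEngine.IsSupportedAsync) {
  engine = new BABYLON.WebGPUEngine(canvas);
  await engine.initAsync(); // the WebGPU engine must be initialized asynchronously
} else {
  engine = new BABYLON.Engine(canvas, true); // WebGL fallback
}

// The rest of the Babylon.js API works as usual
const scene = new BABYLON.Scene(engine);
const camera = new BABYLON.ArcRotateCamera("cam", 0, 0, 10, BABYLON.Vector3.Zero(), scene);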

```

### 9.3. Vue.js + WebGPU: composable pattern

```typescript
// composable: useWebGPU.ts
// GPUDevice / GPUCanvasContext types come from the @webgpu/types package
import { onMounted, onUnmounted, shallowRef, type Ref } from 'vue';

export function useWebGPU(canvasRef: Ref<HTMLCanvasElement | null>) {
  // shallowRef: GPU objects should not be made deeply reactive
  const device = shallowRef<GPUDevice | null>(null);
  const context = shallowRef<GPUCanvasContext | null>(null);

  onMounted(async () => {
    if (!navigator.gpu) {
      console.warn('WebGPU not supported; the caller should fall back to WebGL');
      return;
    }
    const adapter = await navigator.gpu.requestAdapter();
    const canvas = canvasRef.value;
    if (!adapter || !canvas) return;

    device.value = await adapter.requestDevice();
    context.value = canvas.getContext('webgpu');
    context.value?.configure({
      device: device.value,
      format: navigator.gpu.getPreferredCanvasFormat(),
      alphaMode: 'premultiplied',
    });
  });

  onUnmounted(() => {
    // Release GPU resources when the component is unmounted
    device.value?.destroy();
  });

  return { device, context };
}

```

## 10. Getting started with WebGPU

### 10.1. Basic initialization

```javascript
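async function initWebGPU(onDeviceRestored) {
  // 1. Check for support
  if (!navigator.gpu) {
    throw new Error('WebGPU is not supported in this browser');
  }

  // 2. Request an adapter (a physical GPU)
  const adapter = await navigator.gpu.requestAdapter({
    powerPreference: 'high-performance' // or 'low-power'
  });
  if (!adapter) throw new Error('No GPU adapter found');

  // 3. Request a device (the logical connection). Only ask for what the
  //    adapter supports, otherwise requestDevice() rejects.
  const requiredFeatures = adapter.features.has('timestamp-query') ? ['timestamp-query'] : [];
  const device = await adapter.requestDevice({
    requiredFeatures,
    requiredLimits: {
      // Up to 256 MB, capped at the adapter's limit (the default is 128 MB)
      maxStorageBufferBindingSize: Math.min(
        256 * 1024 * 1024,
        adapter.limits.maxStorageBufferBindingSize
      ),
    }
  });

  // 4. Handle device loss: every resource created on the old device is now invalid
  device.lost.then((info) => {
    console.error(`GPU device lost: ${info.reason} - ${info.message}`);
    if (info.reason !== 'destroyed') {
      // Re-initialize and hand the new objects to the caller so it can rebuild resources
      initWebGPU(onDeviceRestored).then(onDeviceRestored, console.error);
    }
  });

  // 5. Configure the canvas context
  const canvas = document.querySelector('canvas');
  if (!canvas) throw new Error('No <canvas> element found');
  const context = canvas.getContext('webgpu');
  const format = navigator.gpu.getPreferredCanvasFormat();
  context.configure({ device, format, alphaMode: 'premultiplied' });

  return { device, context, format };
}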

```

### 10.2. Complete example: a compute pipeline

```javascript
async function runCompute(device) {
  // Input data
  const data = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);

  // Buffers: input (written from the CPU), output (written by the shader)
  // and a mappable staging buffer for reading the result back
  const inputBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(inputBuffer, 0, data);

  const outputBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });

  const readBuffer = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  // Shader: double every element
  const shaderModule = device.createShaderModule({
    code: `
      @group(0) @binding(0) var<storage, read> input: array<f32>;
      @group(0) @binding(1) var<storage, read_write> output: array<f32>;

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) gid: vec3u) {
        // Bounds check: the last workgroup may run past the end of the array
        if (gid.x < arrayLength(&input)) {
          output[gid.x] = input[gid.x] * 2.0;
        }
      }
    `
  });

  // Create the pipeline
  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: { module: shaderModule, entryPoint: 'main' }
  });

  // Bind group: connects the buffers to @binding(0) and @binding(1)
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: inputBuffer } },
      { binding: 1, resource: { buffer: outputBuffer } },
    ]
  });

  // Encode and submit
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(data.length / 64)); // 8 elements -> 1 workgroup
  pass.end();

  // A STORAGE buffer cannot be mapped, so copy into the MAP_READ staging buffer
  encoder.copyBufferToBuffer(outputBuffer, 0, readBuffer, 0, data.byteLength);
  device.queue.submit([encoder.finish()]);

  // Read the result; slice() copies it out because the mapped range is detached by unmap()
  await readBuffer.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(readBuffer.getMappedRange()).slice();
  readBuffer.unmap();
  console.log('Result:', Array.from(result));
  // Output: [2, 4, 6, 8, 10, 12, 14, 16]
  return result;
}

```
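
The example works with 8 floats (32 bytes), so every size is already a multiple of 4 bytes. WebGPU requires the `size` passed to `writeBuffer` and `copyBufferToBuffer` to be a multiple of 4, so odd-sized data such as a 10-byte `Uint8Array` needs rounding up first. A minimal sketch; `alignTo` and `padTo4Bytes` are hypothetical helpers.

```javascript
// Round `n` up to the next multiple of `alignment`.
function alignTo(n, alignment) {
  return Math.ceil(n / alignment) * alignment;
}

// Copy bytes into a zero-padded array whose length is a multiple of 4,
// so it can be uploaded with device.queue.writeBuffer().
function padTo4Bytes(bytes) {
  const padded = new Uint8Array(alignTo(bytes.byteLength, 4));
  padded.set(bytes);
  return padded;
}
```

The same `alignTo` is useful for dynamic uniform buffer offsets, which by default must be multiples of 256 bytes (`minUniformBufferOffsetAlignment`).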

## 11. Challenges and future directions

### 11.1. Current challenges

| Challenge | Details | Solution / direction |
| --- | --- | --- |
| **Device compatibility** | 45% of older devices lack support for storage buffers in vertex shaders. | Feature detection + WebGL fallback. Three.js falls back automatically. |
| **Driver issues** | NVIDIA 572.xx crashes, AMD Radeon HD 7700 artifacts, Intel iGPU hangs. | Browser blocklists + driver updates. Chrome maintains a specific denylist. |
| **Learning curve** | The API is considerably more complex than WebGL: explicit resource management, pipeline creation. | Use a framework (Three.js, Babylon.js) instead of the raw API for most use cases. |
| **30% of devices not yet supported** | Mostly older Android devices and iOS < 26. | Progressive enhancement: WebGPU where available, WebGL otherwise. |
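
Progressive enhancement starts with a capability check. Note that `navigator.gpu` existing is not enough, because `requestAdapter()` can still resolve to `null` (for example on blocklisted drivers). A minimal sketch, assuming the app can render with either backend; `detectBackend` is a hypothetical helper, and the canvas used for the WebGL probe should be a throwaway one, since `getContext()` locks a canvas to the first context type requested.

```javascript
// Returns "webgpu", "webgl2" or "none".
async function detectBackend(nav = globalThis.navigator, probeCanvas = null) {
  if (nav?.gpu) {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Treat adapter errors like "no adapter" and try WebGL instead
    }
  }
  // Use a throwaway canvas here: getContext() locks it to one context type
  if (probeCanvas?.getContext("webgl2")) return "webgl2";
  return "none";
}
```

In a page this would be called as `await detectBackend(navigator, document.createElement("canvas"))` before choosing which renderer to construct.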

### 11.2. WebGPU v2: in development

The W3C GPU for the Web group is designing the next set of features, including:

- **Subgroup operations**: let invocations in the same subgroup (warp/wavefront) exchange data directly, speeding up reduction and scan operations (the source cites 2-4x). Chrome already exposes this as the optional `subgroups` feature.
- **Bindless resources**: access resources by index instead of through fixed bind groups, reducing overhead in scenes with many materials or textures.
- **Multi-draw indirect**: submit thousands of draw calls in a single command, letting the GPU decide what to draw.
- **Ray tracing**: expose hardware RT cores for real-time ray tracing on the web.
- **64-bit atomics**: needed by high-precision scientific computing algorithms.
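
Because subgroups already exist as an optional feature in Chrome, code can opt in only where the adapter reports it. A minimal sketch, assuming the feature names `subgroups` and `shader-f16`; `pickFeatures` is a hypothetical helper that keeps `requestDevice()` from rejecting on adapters that lack a feature.

```javascript
// Keep only the optional features this adapter actually reports.
function pickFeatures(adapter, wanted) {
  return wanted.filter((name) => adapter.features.has(name));
}

// Usage in a browser (sketch):
//   const adapter = await navigator.gpu.requestAdapter();
//   const requiredFeatures = pickFeatures(adapter, ["subgroups", "shader-f16"]);
//   const device = await adapter.requestDevice({ requiredFeatures });
//   const useSubgroups = device.features.has("subgroups"); // choose the shader variant
```

The shader code then needs two variants, one using subgroup builtins and a fallback based on `var<workgroup>` memory, selected by `useSubgroups`.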

#### Advice for developers

**References:**

- [W3C WebGPU Specification](https://www.w3.org/TR/webgpu/)
- [W3C WGSL Specification](https://www.w3.org/TR/WGSL/)
- [MDN Web Docs — WebGPU API](https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API)
- [WebGPU Fundamentals](https://webgpufundamentals.org/)
- [Google Codelabs — Your first WebGPU app](https://codelabs.developers.google.com/your-first-webgpu-app)
- [WWDC25 — Unlock GPU computing with WebGPU](https://developer.apple.com/videos/play/wwdc2025/236/)
- [WebGPU 2026: 70% Browser Support, 15x Performance Gains](https://byteiota.com/webgpu-2026-70-browser-support-15x-performance-gains/)


Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.