Load Balancing: The Art of Distributing Load in Million-Request Systems

Posted on: 4/20/2026 7:40:51 AM

Table of contents

What is load balancing and why does it matter?
    When do you need a load balancer?
Layer 4 vs Layer 7: two schools of load distribution
    In practice
The 6 most common load balancing algorithms
A comprehensive comparison of the algorithms
Popular load balancing tools
Health checks and failover: the safety buffer
Real-world deployment architectures
Common mistakes and how to fix them
Load balancing in Kubernetes
Load balancer deployment checklist
    Production readiness checklist
Summary

What is load balancing and why does it matter?

Imagine you run a restaurant with 10 tables. If every guest is seated at one table while the other 9 stay empty, the experience will be terrible. A load balancer is the restaurant manager: it spreads guests evenly across the tables so that nobody waits too long.

In software architecture, a load balancer is the component that distributes incoming client traffic across multiple backend servers to optimize performance, increase availability, and ensure that no single server is overloaded.

<1ms Average cold start of an L4 LB

99.99% Uptime SLA of cloud LBs

10M+ Requests/s handled by NGINX

330+ PoPs of Cloudflare LB

When do you need a load balancer?

As soon as a system runs 2 or more instances, you need a load balancer. Its role goes beyond "splitting requests evenly": it also performs health checks, SSL termination, and rate limiting, and it is the first line of defense against DDoS.

Layer 4 vs Layer 7: two schools of load distribution

This is the first architectural decision when choosing a load balancer. The difference lies in the OSI layer at which the LB operates, which directly affects routing capability, performance, and cost.

graph TB
    subgraph L4["Layer 4 — Transport"]
        A[Client Request] -->|TCP/UDP| B[L4 Load Balancer]
        B -->|IP + Port| C[Server A]
        B -->|IP + Port| D[Server B]
        B -->|IP + Port| E[Server C]
    end

    subgraph L7["Layer 7 — Application"]
        F[Client Request] -->|HTTP/gRPC| G[L7 Load Balancer]
        G -->|/api/*| H[API Server]
        G -->|/static/*| I[CDN/Static Server]
        G -->|/ws/*| J[WebSocket Server]
    end

    style L4 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L7 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#2c3e50,stroke:#fff,color:#fff
    style I fill:#2c3e50,stroke:#fff,color:#fff
    style J fill:#2c3e50,stroke:#fff,color:#fff

Comparison of the processing flow in Layer 4 and Layer 7 load balancers

Criterion	Layer 4 (Transport)	Layer 7 (Application)
Operates at	TCP/UDP: sees only IP and port	HTTP/HTTPS/gRPC: can read headers, URL, cookies
Routing	Based on source/destination IP and port	Based on URL path, hostname, headers, query string
Performance	Very fast: does not parse the payload	Slower: must decrypt TLS and parse HTTP
SSL termination	TLS passthrough (no decryption)	Decrypts at the LB, re-encrypts if needed
Connection pooling	No: forwards the TCP stream directly	Yes: multiplexes many clients over fewer backend connections
Use cases	Databases, game servers, IoT, streaming	Web apps, APIs, microservices, gRPC
Examples	AWS NLB, HAProxy TCP mode, IPVS	AWS ALB, NGINX, HAProxy HTTP mode, Envoy

In practice

Most production architectures use both tiers: L4 at the edge to spread traffic quickly across L7 clusters, then L7 to perform detailed content-based routing. Examples: AWS NLB (L4) → ALB (L7), or Google Maglev (L4) → Envoy (L7).
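
To make the L7 half concrete, here is a minimal TypeScript sketch of content-based routing on the URL path, mirroring the /api/*, /static/*, /ws/* split from the diagram. The pool names and the `routeByPath` helper are hypothetical and exist only for illustration; an L4 balancer could not make this decision because it never sees the path.

```typescript
// Hypothetical backend pools, named after the three routes in the diagram.
type Pool = "api-servers" | "static-servers" | "ws-servers" | "default-servers";

// Ordered prefix rules: the first matching prefix wins.
const rules: { prefix: string; pool: Pool }[] = [
    { prefix: "/api/", pool: "api-servers" },
    { prefix: "/static/", pool: "static-servers" },
    { prefix: "/ws/", pool: "ws-servers" },
];

// An L7 balancer can inspect the request path; an L4 balancer only sees IP and port.
function routeByPath(path: string): Pool {
    for (const rule of rules) {
        if (path.startsWith(rule.prefix)) return rule.pool;
    }
    return "default-servers";
}

console.log(routeByPath("/api/users/42"));   // api-servers
console.log(routeByPath("/static/app.css")); // static-servers
console.log(routeByPath("/healthz"));        // default-servers
```

Real proxies add hostname, header, and method matching on top of this, but the principle is the same: the routing decision depends on data that only exists above the transport layer.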

The 6 most common load balancing algorithms

1. Round Robin: simple but effective

Round Robin

Complexity: O(1) | Stateless | Default in NGINX and HAProxy

Distributes requests sequentially in a cycle: Server A → B → C → A → B → C... It does not need to track server state, and it is simple and effective when all servers have similar capacity.

✓ Pros

Simple, no state required
Even distribution over time
O(1) performance

✗ Cons

Ignores differences in actual load
Poor fit when request processing times vary widely
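
The rotation described above fits in a few lines. This is a minimal sketch, not how any particular proxy implements it; the `RoundRobin` class name is hypothetical. The only state is a single cursor, with no per-server tracking.

```typescript
// Minimal round-robin selector: a cursor that wraps around the server list.
class RoundRobin {
    private next = 0;
    private readonly servers: string[];

    constructor(servers: string[]) {
        if (servers.length === 0) throw new Error("at least one server is required");
        this.servers = servers;
    }

    // O(1): return the current server and advance the cursor.
    pick(): string {
        const server = this.servers[this.next];
        this.next = (this.next + 1) % this.servers.length;
        return server;
    }
}

const rr = new RoundRobin(["A", "B", "C"]);
const order = Array.from({ length: 6 }, () => rr.pick());
console.log(order.join(" -> ")); // A -> B -> C -> A -> B -> C
```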

2. Weighted Round Robin: when servers are not equal

Weighted Round Robin

Complexity: O(1) | Semi-stateless | Requires weight configuration

Assigns each server a weight based on its capacity. A strong server (weight=5) receives 5 times as many requests as a weak one (weight=1). A good fit when the fleet mixes different machine types.

# NGINX config
upstream backend {
    server app1.example.com weight=5;  # 16 CPU, 64GB RAM
    server app2.example.com weight=3;  # 8 CPU, 32GB RAM
    server app3.example.com weight=1;  # 2 CPU, 8GB RAM
}
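
To show how the weights 5, 3, and 1 can turn into an actual pick order, here is a sketch of the "smooth" weighted round robin variant, which interleaves heavy servers instead of sending them requests in bursts. The class name is hypothetical, and this sketch scans all peers on each pick, so it is O(n) rather than O(1).

```typescript
interface Peer { name: string; weight: number; current: number }

// Smooth weighted round robin: on every pick each peer earns its weight,
// the peer with the highest running score wins and pays back the total weight.
class SmoothWeightedRoundRobin {
    private readonly peers: Peer[];
    private readonly totalWeight: number;

    constructor(weights: Record<string, number>) {
        this.peers = Object.entries(weights).map(([name, weight]) => ({ name, weight, current: 0 }));
        if (this.peers.length === 0) throw new Error("at least one server is required");
        this.totalWeight = this.peers.reduce((sum, p) => sum + p.weight, 0);
    }

    pick(): string {
        let best = this.peers[0];
        for (const peer of this.peers) {
            peer.current += peer.weight;
            if (peer.current > best.current) best = peer;
        }
        best.current -= this.totalWeight;
        return best.name;
    }
}

const wrr = new SmoothWeightedRoundRobin({ app1: 5, app2: 3, app3: 1 });
const picks = Array.from({ length: 9 }, () => wrr.pick());
console.log(picks.join(" "));
// app1 app2 app1 app3 app1 app2 app1 app2 app1
```

Over one full cycle of 9 picks the servers receive 5, 3, and 1 requests, matching the configured weights, and the scores return to zero so the cycle repeats.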

3. Least Connections: adapting to real load

Least Connections

Complexity: O(n), or O(log n) with a heap | Stateful

Sends each request to the server with the fewest active connections. This is smarter than Round Robin because it reacts to actual load: a busy server naturally receives fewer new requests.

✓ Pros

Adapts to requests with varying processing times
Self-balances when a server slows down
Ideal for WebSocket and long-polling

✗ Cons

Must track the state of every connection
Higher overhead than Round Robin
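
The state this algorithm needs is one counter per server. The sketch below uses the O(n) scan mentioned above; the `LeastConnections` class and its `acquire`/`release` methods are hypothetical names, not the API of any real proxy.

```typescript
// Tracks active connections per server and always picks the least loaded one.
class LeastConnections {
    private readonly active = new Map<string, number>();

    constructor(servers: string[]) {
        for (const s of servers) this.active.set(s, 0);
    }

    // O(n) scan; a heap would bring this down to O(log n).
    acquire(): string {
        let best: string | undefined;
        let bestCount = Infinity;
        for (const [server, count] of this.active) {
            if (count < bestCount) { best = server; bestCount = count; }
        }
        if (best === undefined) throw new Error("no servers configured");
        this.active.set(best, bestCount + 1);
        return best;
    }

    // Must be called when the request or connection finishes.
    release(server: string): void {
        const count = this.active.get(server) ?? 0;
        this.active.set(server, Math.max(0, count - 1));
    }
}

const lc = new LeastConnections(["A", "B"]);
const first = lc.acquire();  // A (tie broken by insertion order)
const second = lc.acquire(); // B
lc.release(first);           // A finishes early
const third = lc.acquire();  // A again, because B is still busy
console.log(first, second, third);
```

The `release` call is where the extra overhead comes from: the balancer has to observe the end of every connection, not just the start.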

4. IP Hash: simple session affinity

IP Hash

Complexity: O(1) | Deterministic

Hashes the client IP to choose the target server. As long as the server pool does not change, the same IP is always routed to the same server, which provides session stickiness without cookies or a shared session store.

# NGINX config
upstream backend {
    ip_hash;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

Watch out for NAT

If many clients share one IP (behind NAT or a proxy), they all land on the same server, and the load becomes uneven. This is a common problem in enterprise networks.
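
The NAT effect is easy to reproduce. The sketch below hashes the whole address string with FNV-1a and takes it modulo the pool size; both choices are assumptions for illustration (NGINX's `ip_hash`, for instance, keys on the first three octets of an IPv4 address). The `pickByIp` helper is hypothetical, and the IP addresses come from the reserved documentation ranges.

```typescript
// FNV-1a: a small non-cryptographic 32-bit string hash.
function fnv1a(text: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return hash >>> 0;
}

// Same IP and same pool always yield the same server.
function pickByIp(clientIp: string, servers: string[]): string {
    return servers[fnv1a(clientIp) % servers.length];
}

const pool = ["app1", "app2", "app3"];
const officeIp = "203.0.113.7"; // stands in for a NAT gateway shared by many users
const targets = new Set(Array.from({ length: 1000 }, () => pickByIp(officeIp, pool)));
console.log(targets.size); // 1: every user behind the NAT lands on the same server
```

Note also that the modulo step means changing the pool size reassigns most clients, which is the weakness consistent hashing addresses.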

5. Consistent Hashing: the king of distributed caches

Consistent Hashing

Complexity: O(log n) lookup | Virtual nodes improve distribution

Uses a hash ring: both servers and request keys are hashed onto a circle. A request goes to the nearest server clockwise. When a server is added or removed, only about 1/n of the keys are affected instead of remapping everything.

graph TB
    subgraph Ring["Hash Ring — Consistent Hashing"]
        direction TB
        N1["Server A<br/>position: 0°"]
        N2["Server B<br/>position: 120°"]
        N3["Server C<br/>position: 240°"]
        K1["Key 'user:42'<br/>→ Server B"]
        K2["Key 'session:99'<br/>→ Server C"]
        K3["Key 'cart:17'<br/>→ Server A"]
    end

    K1 -.->|hash → 35°| N2
    K2 -.->|hash → 155°| N3
    K3 -.->|hash → 280°| N1

    style Ring fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style N1 fill:#e94560,stroke:#fff,color:#fff
    style N2 fill:#e94560,stroke:#fff,color:#fff
    style N3 fill:#e94560,stroke:#fff,color:#fff
    style K1 fill:#2c3e50,stroke:#fff,color:#fff
    style K2 fill:#2c3e50,stroke:#fff,color:#fff
    style K3 fill:#2c3e50,stroke:#fff,color:#fff

Hash ring with 3 servers: each key is routed to the nearest server clockwise

Virtual nodes are an important technique for evening out the distribution. Instead of each server holding a single position on the ring, we create 100-200 virtual positions (virtual nodes) per physical server. This helps because:

Keys are distributed far more evenly
When a server fails, its load spreads across many servers instead of piling onto the single next server
Amazon DynamoDB, Apache Cassandra, and ScyllaDB all use this technique

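// Consistent hashing with virtual nodes (TypeScript on Node.js)
import { createHash } from "node:crypto";

// Map a string to a 32-bit ring position using the first 4 bytes of its MD5 digest.
function hashToRing(value: string): number {
    return createHash("md5").update(value).digest().readUInt32BE(0);
}

class ConsistentHash {
    // Ring entries are kept sorted by hash so lookups can use binary search.
    private ring: { hash: number; server: string }[] = [];
    private readonly virtualNodes = 150;

    addServer(server: string): void {
        for (let i = 0; i < this.virtualNodes; i++) {
            this.ring.push({ hash: hashToRing(`${server}:${i}`), server });
        }
        this.ring.sort((a, b) => a.hash - b.hash);
    }

    getServer(key: string): string | undefined {
        if (this.ring.length === 0) return undefined;
        const hash = hashToRing(key);
        // Find the first virtual node clockwise from the key (lowest hash >= key hash).
        let lo = 0;
        let hi = this.ring.length;
        while (lo < hi) {
            const mid = (lo + hi) >>> 1;
            if (this.ring[mid].hash < hash) lo = mid + 1;
            else hi = mid;
        }
        // Wrap around to the first node when the key hashes past the last one.
        return this.ring[lo % this.ring.length].server;
    }

    removeServer(server: string): void {
        // Only the keys owned by this server (about 1/n of all keys) are remapped.
        this.ring = this.ring.filter((node) => node.server !== server);
    }
}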
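
The "only about 1/n of keys move" claim can be checked with a small experiment. The sketch below is self-contained and deliberately simplified: it swaps MD5 for a 32-bit FNV-1a hash with a murmur3-style finalizer (an assumption made to avoid imports) and uses a linear scan instead of binary search. All function names are hypothetical.

```typescript
// 32-bit string hash: FNV-1a followed by a murmur3-style finalizer for better spread.
function hash32(text: string): number {
    let h = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        h ^= text.charCodeAt(i);
        h = Math.imul(h, 0x01000193);
    }
    h ^= h >>> 16;
    h = Math.imul(h, 0x85ebca6b);
    h ^= h >>> 13;
    h = Math.imul(h, 0xc2b2ae35);
    h ^= h >>> 16;
    return h >>> 0;
}

type RingNode = { hash: number; server: string };

// Build a sorted ring with `vnodes` virtual nodes per server.
function buildRing(servers: string[], vnodes = 150): RingNode[] {
    const ring = servers.flatMap((server) =>
        Array.from({ length: vnodes }, (_, i) => ({ hash: hash32(`${server}:${i}`), server })));
    return ring.sort((a, b) => a.hash - b.hash);
}

// Owner = first virtual node clockwise from the key, wrapping at the end.
function owner(ring: RingNode[], key: string): string {
    const h = hash32(key);
    const node = ring.find((n) => n.hash >= h) ?? ring[0];
    return node.server;
}

const keys = Array.from({ length: 2000 }, (_, i) => `user:${i}`);
const before = buildRing(["A", "B", "C"]);
const after = buildRing(["A", "B", "C", "D"]);
const moved = keys.filter((k) => owner(before, k) !== owner(after, k));
const movedToNew = moved.filter((k) => owner(after, k) === "D");
console.log(`moved ${moved.length} of ${keys.length} keys; ${movedToNew.length} of them went to D`);
```

Two things are worth observing when running it: the number of moved keys should be in the neighborhood of a quarter of the total (1/n with n = 4), and every moved key ends up on the new server D. No key is shuffled between the servers that were already there.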

6. Random Two Choices: the "just smart enough" algorithm

Power of Two Random Choices

Complexity: O(1) | Near-optimal distribution

Picks 2 servers at random, then sends the request to the one with fewer connections. It sounds simple, but probability theory shows the result is close to optimal: compared with purely random placement, the expected maximum load per server drops from Θ(log n / log log n) to Θ(log log n).

# NGINX config (the random directive requires NGINX 1.15.1 or later)
upstream backend {
    random two least_conn;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
    server app4.example.com;
}
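
The selection step itself is tiny. Below is a sketch under the assumption that the balancer already tracks active connections per server; the `pickTwoChoices` function is a hypothetical name, and the random source is injectable only so the behavior can be tested deterministically.

```typescript
// Pick two distinct servers at random and return the one with fewer active connections.
function pickTwoChoices(
    connections: Map<string, number>,
    random: () => number = Math.random,
): string {
    const servers = [...connections.keys()];
    if (servers.length === 0) throw new Error("no servers configured");
    if (servers.length === 1) return servers[0];

    const i = Math.floor(random() * servers.length);
    // Draw the second index from the remaining n-1 slots so it never equals the first.
    let j = Math.floor(random() * (servers.length - 1));
    if (j >= i) j += 1;

    const a = servers[i];
    const b = servers[j];
    return (connections.get(a) ?? 0) <= (connections.get(b) ?? 0) ? a : b;
}

const load = new Map([["app1", 12], ["app2", 3], ["app3", 7], ["app4", 9]]);
const chosen = pickTwoChoices(load);
load.set(chosen, (load.get(chosen) ?? 0) + 1);
console.log(`routed to ${chosen}`);
```

Unlike Least Connections, this never scans the whole pool: it reads two counters per request, which is why it scales well to large clusters.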

A comprehensive comparison of the algorithms

Algorithm	Stateful?	Session sticky?	Main use case	When NOT to use it
Round Robin	No	No	Stateless APIs, homogeneous microservices	Servers with different capacity
Weighted RR	No	No	Mixed fleets (on-prem + cloud)	Highly volatile load
Least Connections	Yes	No	WebSocket, long-running requests	Very short, uniform requests
IP Hash	No	Yes	Legacy apps that need session affinity	Many clients behind NAT
Consistent Hash	No	Yes	Distributed caches, sharded databases	Simple stateless services
Random Two Choices	Yes (light)	No	Large clusters that need near-optimal balance	Small clusters (<4 servers)

Popular load balancing tools

NGINX: reverse proxy and load balancer

NGINX is among the most popular choices for L7 load balancing thanks to its high performance (it handles millions of concurrent connections), simple configuration, and rich module ecosystem.

# nginx.conf: a complete load balancing example
http {
    upstream api_servers {
        least_conn;
        server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 weight=1 backup;  # Used only when both servers above are down
    }

    server {
        listen 443 ssl http2;
        server_name api.example.com;

        # SSL Termination
        ssl_certificate     /etc/ssl/certs/api.crt;
        ssl_certificate_key /etc/ssl/private/api.key;

        # Passive health checking happens implicitly via max_fails/fail_timeout
        location /api/ {
            proxy_pass http://api_servers;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;

            # Timeout config
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
            proxy_send_timeout 10s;
        }
    }
}

HAProxy: a dedicated load balancer

HAProxy is known for handling both L4 and L7. It is purpose-built for load balancing, with active health checks, detailed metrics, and impressive performance.

# haproxy.cfg
frontend http_front
    bind *:443 ssl crt /etc/ssl/api.pem
    mode http

    # Content-based routing
    acl is_api path_beg /api
    acl is_ws  hdr(Upgrade) -i websocket

    use_backend api_servers if is_api
    use_backend ws_servers  if is_ws
    default_backend static_servers

backend api_servers
    mode http
    balance leastconn
    option httpchk GET /health
    http-check expect status 200

    server api1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server api2 10.0.1.11:8080 check inter 5s fall 3 rise 2

backend ws_servers
    mode http
    balance source   # IP hash for WebSocket stickiness
    timeout tunnel 3600s
    server ws1 10.0.2.10:8080 check
    server ws2 10.0.2.11:8080 check

Cloud load balancers: managed and auto-scaling

Service	Layer	Free tier	Strengths
AWS ALB	L7	750 h/month (first 12 months)	Path-based routing, gRPC, WebSocket
AWS NLB	L4	750 h/month (first 12 months)	Ultra-low latency, static IP, TLS passthrough
Azure Load Balancer	L4	Basic SKU is free	Zone-redundant, HA Ports
Azure App Gateway	L7	None (from ~$18/month)	Built-in WAF, SSL offloading, URL rewrite
Cloudflare LB	L7	None (from $5/month)	330+ PoPs, geo-steering, global health checks

Health checks and failover: the safety buffer

A load balancer without health checks is like traffic without signals. Health checks let the LB automatically remove dead servers and bring them back once they recover.

sequenceDiagram
    participant LB as Load Balancer
    participant S1 as Server A (healthy)
    participant S2 as Server B (failing)
    participant S3 as Server C (healthy)

    loop Health check (every 5s)
        LB->>S1: GET /health
        S1-->>LB: 200 OK ✓
        LB->>S2: GET /health
        S2-->>LB: 503 Error ✗
        LB->>S3: GET /health
        S3-->>LB: 200 OK ✓
    end

    Note over LB,S2: Server B fails 3 times in a row → marked DOWN

    LB->>S1: Route traffic (50%)
    LB->>S3: Route traffic (50%)
    Note over S2: Receives no traffic

    S2-->>LB: 200 OK ✓ (after 2 successful checks)
    Note over LB,S2: Server B recovers → returned to the pool

Health check flow: detect a failed server → remove it from the pool → recover automatically

There are 3 common types of health check:

Active health check: the LB proactively sends probe requests (HTTP GET /health, TCP connect, or a custom script). HAProxy and cloud LBs support this out of the box.
Passive health check: the LB watches responses to real traffic. If a server keeps returning errors (for example, five 5xx responses within 30s), it is automatically marked down. NGINX Open Source supports only this type.
Deep health check: also verifies dependencies (database connection, disk space, memory) and returns details through a /health/detailed endpoint.

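// ASP.NET Core: deep health check (Program.cs)
// Requires the AspNetCore.HealthChecks.SqlServer, .Redis and .UI.Client NuGet packages.
using HealthChecks.UI.Client;                        // UIResponseWriter
using Microsoft.AspNetCore.Diagnostics.HealthChecks; // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks; // HealthCheckResult

builder.Services.AddHealthChecks()
    .AddSqlServer(connectionString, name: "database")   // checks the database connection
    .AddRedis(redisConnection, name: "cache")            // checks the cache connection
    .AddCheck("disk-space", () =>
    {
        var drive = new DriveInfo("C");                  // Windows drive letter; use "/" on Linux
        return drive.AvailableFreeSpace > 1_073_741_824  // more than 1 GiB free
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Degraded("Low disk space");
    });

// Run every registered check and return a detailed JSON report.
app.MapHealthChecks("/health/detailed", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});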
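
For contrast with the active and deep checks, the passive variant described above needs no probe at all, only bookkeeping on real responses. The following is a sketch using the thresholds from the text (five 5xx responses within 30s). The `PassiveHealth` class is hypothetical, and taking the server out for one full window is an assumption loosely modeled on the max_fails/fail_timeout idea.

```typescript
// Passive health tracking: no probes, only observations of real responses.
class PassiveHealth {
    private failures: number[] = []; // timestamps (ms) of recent failures
    private downUntil = 0;           // the server is skipped until this time
    private readonly maxFails: number;
    private readonly windowMs: number;

    constructor(maxFails = 5, windowMs = 30000) {
        this.maxFails = maxFails;
        this.windowMs = windowMs;
    }

    // Call once per proxied response; statuses >= 500 count as failures.
    record(status: number, now: number): void {
        if (status < 500) return;
        this.failures = this.failures.filter((t) => now - t < this.windowMs);
        this.failures.push(now);
        if (this.failures.length >= this.maxFails) {
            this.downUntil = now + this.windowMs; // take the server out for one window
            this.failures = [];
        }
    }

    isAvailable(now: number): boolean {
        return now >= this.downUntil;
    }
}

const health = new PassiveHealth();
for (let t = 0; t < 5; t++) health.record(503, t * 1000); // five 5xx within 5 s
console.log(health.isAvailable(5000));  // false
console.log(health.isAvailable(40000)); // true
```

The trade-off is visible in the code: a passive check can only react after real users have already received errors, which is why it is usually paired with active probing where available.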

Real-world deployment architectures

Pattern 1: single-tier LB for small and medium systems

graph LR
    Client[Client] --> LB[NGINX / HAProxy<br/>L7 Load Balancer]
    LB --> S1[App Server 1]
    LB --> S2[App Server 2]
    LB --> S3[App Server 3]
    S1 --> DB[(Database)]
    S2 --> DB
    S3 --> DB

    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LB fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style DB fill:#4CAF50,stroke:#fff,color:#fff

Single-tier: simple, easy to operate, suitable for around 10K RPS

Pattern 2: two-tier LB for large systems

graph TB
    Client[Client] --> DNS[DNS / GeoDNS]
    DNS --> L4A[NLB - L4<br/>Region A]
    DNS --> L4B[NLB - L4<br/>Region B]

    L4A --> L7A1[ALB/NGINX - L7]
    L4A --> L7A2[ALB/NGINX - L7]

    L4B --> L7B1[ALB/NGINX - L7]

    L7A1 --> API1[API Pods]
    L7A2 --> WEB1[Web Pods]
    L7B1 --> API2[API Pods]

    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DNS fill:#ff9800,stroke:#fff,color:#fff
    style L4A fill:#e94560,stroke:#fff,color:#fff
    style L4B fill:#e94560,stroke:#fff,color:#fff
    style L7A1 fill:#2c3e50,stroke:#fff,color:#fff
    style L7A2 fill:#2c3e50,stroke:#fff,color:#fff
    style L7B1 fill:#2c3e50,stroke:#fff,color:#fff
    style API1 fill:#4CAF50,stroke:#fff,color:#fff
    style WEB1 fill:#4CAF50,stroke:#fff,color:#fff
    style API2 fill:#4CAF50,stroke:#fff,color:#fff

Two-tier: L4 at the edge → L7 routing, with multi-region support

Pattern 3: global load balancing for worldwide systems

When users are spread around the world, you need to balance load at the DNS layer. GeoDNS or Anycast routing sends each user to the nearest datacenter. Cloudflare, AWS Route 53, and Azure Traffic Manager all support this pattern.

3 global LB strategies

Geo-proximity: routes to the geographically nearest datacenter. Simple, and effective at reducing latency.
Latency-based: measures actual latency from the user to each region and routes to the fastest one. More accurate than geo-proximity.
Failover: active-passive, with 100% of traffic going to the primary region. When the primary goes down, all traffic shifts to the secondary.
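
The latency-based and failover strategies combine naturally: drop unhealthy regions, then take the fastest of what remains. The sketch below illustrates that decision only; the region names and latency figures are invented for the example, and `pickRegion` is a hypothetical helper, not a DNS provider API.

```typescript
interface Region { name: string; healthy: boolean; latencyMs: number }

// Latency-based steering with failover: ignore unhealthy regions, then take the fastest.
function pickRegion(regions: Region[]): Region | undefined {
    const healthy = regions.filter((r) => r.healthy);
    if (healthy.length === 0) return undefined;
    return healthy.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best));
}

const regions: Region[] = [
    { name: "ap-southeast", healthy: true, latencyMs: 35 },
    { name: "eu-west", healthy: true, latencyMs: 180 },
    { name: "us-east", healthy: true, latencyMs: 220 },
];
console.log(pickRegion(regions)?.name); // ap-southeast

regions[0].healthy = false;             // simulate a regional outage
console.log(pickRegion(regions)?.name); // eu-west
```

In a real deployment the latency values come from measurements by the DNS provider, and DNS caching (TTL) delays how quickly clients follow a failover decision.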

Common mistakes and how to fix them

1. Thundering herd when a server recovers

When a server has just been returned to the pool after passing its health checks and the algorithm is Least Connections, every new request rushes to that server (because it has 0 connections). The fix is slow start: gradually increase the weight of the newly recovered server over 30-60 seconds.

# HAProxy slow start
backend api_servers
    server api1 10.0.1.10:8080 check slowstart 60s
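
The idea behind slow start can be sketched as a weight that grows with the time since the server rejoined the pool. The linear ramp and the floor of 1 below are assumptions chosen for illustration, not a description of HAProxy's internals; `slowStartWeight` is a hypothetical helper.

```typescript
// Effective weight of a server that re-entered the pool `elapsedMs` ago.
// Linear ramp from a small floor up to the configured weight over `slowStartMs`.
function slowStartWeight(configuredWeight: number, elapsedMs: number, slowStartMs: number): number {
    if (slowStartMs <= 0 || elapsedMs >= slowStartMs) return configuredWeight;
    const fraction = Math.max(0, elapsedMs) / slowStartMs;
    // Never drop to zero, otherwise the server would receive no traffic at all.
    return Math.max(1, Math.round(configuredWeight * fraction));
}

for (const seconds of [0, 15, 30, 60]) {
    console.log(`${seconds}s -> weight ${slowStartWeight(100, seconds * 1000, 60000)}`);
}
// 0s -> weight 1
// 15s -> weight 25
// 30s -> weight 50
// 60s -> weight 100
```

Ramping the weight gives caches, connection pools, and JIT compilers on the recovered server time to warm up before it takes its full share.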

2. Session affinity causing imbalance

Sticky sessions (via cookie or IP hash) can leave one server handling most of the traffic if "heavy users" happen to concentrate on it. The fix is to move to a stateless architecture: store sessions in a shared store (a database or an in-memory cache) instead of keeping them on the server.

3. Health checks that are too sensitive or too slow

Too sensitive (interval=1s, fall=1): a server is removed because of a single request timeout, which causes constant flapping. Too slow (interval=30s, fall=5): it takes 2.5 minutes to detect a dead server. Recommended: interval=5s, fall=3, rise=2, which detects a failure within 15s and confirms recovery within 10s.
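
The fall and rise thresholds form a small state machine, and the detection times quoted above are simply interval multiplied by threshold. The sketch below shows both; `HealthState` and `detectionSeconds` are hypothetical names used for illustration.

```typescript
// Active health-check state with fall/rise thresholds (hysteresis against flapping).
class HealthState {
    up = true;
    private streak = 0; // consecutive results that contradict the current state
    private readonly fall: number;
    private readonly rise: number;

    constructor(fall = 3, rise = 2) {
        this.fall = fall;
        this.rise = rise;
    }

    // Feed one probe result; returns the state after applying it.
    observe(probeOk: boolean): boolean {
        if (probeOk === this.up) {
            this.streak = 0; // the result agrees with the current state: reset
            return this.up;
        }
        this.streak += 1;
        const needed = this.up ? this.fall : this.rise;
        if (this.streak >= needed) {
            this.up = !this.up;
            this.streak = 0;
        }
        return this.up;
    }
}

// Approximate time to notice a change: interval x threshold.
const detectionSeconds = (intervalS: number, threshold: number): number => intervalS * threshold;

const state = new HealthState(3, 2);
const results = [false, false, false, true, true].map((ok) => state.observe(ok));
console.log(results);                 // [ true, true, false, false, true ]
console.log(detectionSeconds(5, 3));  // 15
console.log(detectionSeconds(30, 5)); // 150 (2.5 minutes)
```

With fall=1 the first failed probe flips the state immediately, which is exactly the flapping behavior the "too sensitive" configuration suffers from.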

Load balancing in Kubernetes

Kubernetes has its own load balancing system built on Service and Ingress. Understanding how they work helps you avoid duplicating or conflicting with an external LB.

Component	Layer	Scope	Default algorithm
kube-proxy (iptables)	L4	Inside the cluster	Random (probability-based)
kube-proxy (IPVS)	L4	Inside the cluster	Round Robin (also supports Least Conn, Source Hash...)
Ingress Controller	L7	External → cluster	Depends on the controller (NGINX, Traefik, Envoy)
Service type LoadBalancer	L4	External → cluster	Cloud provider L4 LB (for example AWS NLB or Azure Load Balancer)
Gateway API	L4/L7	External → cluster	Depends on the implementation; more flexible than Ingress
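
The "probability-based" entry for iptables mode deserves a worked example. kube-proxy writes one rule per endpoint and the rules are evaluated in order, so the per-rule probabilities must increase for the overall split to come out even. The sketch below reproduces that arithmetic; the function names are hypothetical.

```typescript
// Rule i (0-based) matches with probability 1 / (n - i); the last rule always matches.
function ruleProbabilities(endpoints: number): number[] {
    return Array.from({ length: endpoints }, (_, i) => 1 / (endpoints - i));
}

// Overall chance that a connection lands on each endpoint:
// it must fall through every earlier rule, then match its own.
function effectiveShare(ruleProbs: number[]): number[] {
    let reachProbability = 1;
    return ruleProbs.map((p) => {
        const share = reachProbability * p;
        reachProbability *= 1 - p;
        return share;
    });
}

const probs = ruleProbabilities(4);
console.log(probs.map((p) => p.toFixed(3)));                 // 0.250, 0.333, 0.500, 1.000
console.log(effectiveShare(probs).map((s) => s.toFixed(3))); // 0.250 for every endpoint
```

Because the choice is random per connection, short-term distribution can be uneven, and long-lived connections are never rebalanced once established.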

Load balancer deployment checklist

Production readiness checklist

Choose an algorithm that fits the workload: Round Robin for stateless services, Least Connections for variable latency, Consistent Hash for caches
Configure health checks: active checks against a /health endpoint, interval 5s, failure threshold 3
SSL termination: terminate TLS at the LB to offload the backends
Logging and monitoring: track request count, latency p50/p95/p99, error rate, and active connections per backend
High availability for the LB itself: the LB needs redundancy too, via VRRP (keepalived) or a managed cloud LB
Rate limiting: protect backends from abnormal traffic spikes
Connection draining: when removing a server from the pool, wait for in-flight requests to finish (graceful shutdown)
Sensible timeouts: short connect timeout (5s), read timeout aligned with the SLA (30-60s)
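
For the rate limiting item, one common technique is a token bucket, sketched below as an illustration rather than a recommendation of a specific implementation. The `TokenBucket` class is hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```typescript
// Token bucket: allows bursts up to `capacity`, refills at `ratePerSecond`.
class TokenBucket {
    private tokens: number;
    private lastRefillMs: number;
    private readonly capacity: number;
    private readonly ratePerSecond: number;

    constructor(capacity: number, ratePerSecond: number, nowMs = 0) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;
        this.lastRefillMs = nowMs;
    }

    // Returns true if the request may pass, false if it should be rejected (e.g. HTTP 429).
    tryAcquire(nowMs: number): boolean {
        const elapsedS = Math.max(0, nowMs - this.lastRefillMs) / 1000;
        this.tokens = Math.min(this.capacity, this.tokens + elapsedS * this.ratePerSecond);
        this.lastRefillMs = nowMs;
        if (this.tokens < 1) return false;
        this.tokens -= 1;
        return true;
    }
}

const bucket = new TokenBucket(2, 1);  // burst of 2, then 1 request per second
console.log(bucket.tryAcquire(0));     // true
console.log(bucket.tryAcquire(0));     // true
console.log(bucket.tryAcquire(0));     // false: the bucket is empty
console.log(bucket.tryAcquire(1000));  // true: one token refilled after 1 s
```

In practice a load balancer keeps one bucket per client key (IP, API key, or route), which is what turns this into protection against a single noisy source.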

Summary

Load balancing is more than "splitting requests evenly": it is the art of distributing load so that a system is fast, stable, and resilient to failure. There is no "best" algorithm, only the one that best fits a given context:

Stateless API → Round Robin or Random Two Choices
WebSocket / long-running → Least Connections
Distributed cache → Consistent Hashing
Legacy session-based app → IP Hash (temporarily; migrate to stateless)
Multi-region → Global LB (GeoDNS) + regional L4/L7

Start simple with Round Robin, add health checks, and add complexity only when there is a real need. Over-engineering load balancing from day one is one of the most common mistakes in system design.

#Load Balancing #system design #NGINX #HAProxy #High Availability #Microservices #Cloud Architecture

# Load Balancing: Nghệ thuật Phân tải cho Hệ thống Triệu Request

## Load Balancing là gì và tại sao quan trọng?

<1ms Cold start trung bình của L4 LB

99.99% Uptime SLA của cloud LB

10M+ Requests/s xử lý bởi NGINX

330+ PoPs của Cloudflare LB

#### Khi nào cần Load Balancer?

## Layer 4 vs Layer 7: Hai trường phái phân tải

subgraph L7["Layer 7 — Application"]
        F[Client Request] -->|HTTP/gRPC| G[L7 Load Balancer]
        G -->|/api/*| H[API Server]
        G -->|/static/*| I[CDN/Static Server]
        G -->|/ws/*| J[WebSocket Server]
    end

style L4 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L7 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#2c3e50,stroke:#fff,color:#fff
    style I fill:#2c3e50,stroke:#fff,color:#fff
    style J fill:#2c3e50,stroke:#fff,color:#fff

```
So sánh luồng xử lý của Layer 4 và Layer 7 Load Balancer

| Tiêu chí | Layer 4 (Transport) | Layer 7 (Application) |
| --- | --- | --- |
| **Hoạt động tại** | TCP/UDP — chỉ thấy IP và Port | HTTP/HTTPS/gRPC — đọc được header, URL, cookie |
| **Routing** | Dựa trên IP nguồn/đích, port | Dựa trên URL path, hostname, header, query string |
| **Hiệu năng** | Cực nhanh — không parse payload | Chậm hơn — phải decrypt TLS, parse HTTP |
| **SSL Termination** | TLS passthrough (không decrypt) | Decrypt tại LB, re-encrypt nếu cần |
| **Connection Pooling** | Không — forward trực tiếp TCP stream | Có — multiplex nhiều client qua ít backend connection |
| **Use case** | Database, game server, IoT, streaming | Web app, API, microservices, gRPC |
| **Ví dụ** | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, NGINX, HAProxy HTTP mode, Envoy |

#### Thực tế triển khai

Hầu hết kiến trúc production dùng **cả hai tầng**: L4 ở edge để phân tải nhanh vào các cụm L7, sau đó L7 thực hiện content-based routing chi tiết. Ví dụ: AWS NLB (L4) → ALB (L7), hay Google Maglev (L4) → Envoy (L7).

## 6 thuật toán Load Balancing phổ biến nhất

### 1. Round Robin — Đơn giản nhưng hiệu quả

#### Round Robin

Độ phức tạp: O(1) | Stateless | Mặc định của NGINX và HAProxy

**✓ Ưu điểm**

- Đơn giản, không cần state
- Phân bổ đều theo thời gian
- Hiệu năng O(1)

**✗ Nhược điểm**

- Bỏ qua khác biệt tải thực tế
- Không phù hợp request có thời gian xử lý chênh lệch lớn

### 2. Weighted Round Robin — Khi server không đều

#### Weighted Round Robin

Độ phức tạp: O(1) | Semi-stateless | Cần cấu hình trọng số

```
# NGINX config
upstream backend {
    server app1.example.com weight=5;  # 16 CPU, 64GB RAM
    server app2.example.com weight=3;  # 8 CPU, 32GB RAM
    server app3.example.com weight=1;  # 2 CPU, 8GB RAM
}
```

### 3. Least Connections — Thích ứng theo tải thực

#### Least Connections

Độ phức tạp: O(n) hoặc O(log n) với heap | Stateful

Gửi request đến server có ít connection đang hoạt động nhất. Thuật toán này thông minh hơn Round Robin vì nó *phản ứng theo tải thực tế* — server đang bận sẽ tự nhiên nhận ít request hơn.

**✓ Ưu điểm**

- Thích ứng với request có thời gian xử lý khác nhau
- Tự cân bằng khi server chậm
- Lý tưởng cho WebSocket, long-polling

**✗ Nhược điểm**

- Cần theo dõi trạng thái mỗi connection
- Overhead cao hơn Round Robin

### 4. IP Hash — Session Affinity đơn giản

#### IP Hash

Độ phức tạp: O(1) | Deterministic

```
# NGINX config
upstream backend {
    ip_hash;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}
```

#### Cẩn thận với NAT

Nếu nhiều client chia sẻ cùng IP (qua NAT/proxy), tất cả sẽ đổ vào 1 server → phân tải không đều. Trong enterprise network, đây là vấn đề thường gặp.

### 5. Consistent Hashing — Vua của Distributed Cache

#### Consistent Hashing

Độ phức tạp: O(log n) lookup | Virtual Nodes cải thiện phân bổ

```
graph TB
    subgraph Ring["Hash Ring — Consistent Hashing"]
        direction TB
        N1["Server A  
position: 0°"]
        N2["Server B  
position: 120°"]
        N3["Server C  
position: 240°"]
        K1["Key 'user:42'  
→ Server A"]
        K2["Key 'session:99'  
→ Server B"]
        K3["Key 'cart:17'  
→ Server C"]
    end

K1 -.->|hash → 35°| N1
    K2 -.->|hash → 155°| N2
    K3 -.->|hash → 280°| N3

style Ring fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style N1 fill:#e94560,stroke:#fff,color:#fff
    style N2 fill:#e94560,stroke:#fff,color:#fff
    style N3 fill:#e94560,stroke:#fff,color:#fff
    style K1 fill:#2c3e50,stroke:#fff,color:#fff
    style K2 fill:#2c3e50,stroke:#fff,color:#fff
    style K3 fill:#2c3e50,stroke:#fff,color:#fff

```
Hash Ring với 3 server — key được route đến server gần nhất theo chiều kim đồng hồ

**Virtual Nodes** là kỹ thuật quan trọng để cải thiện phân bổ đều. Thay vì mỗi server chỉ có 1 vị trí trên ring, ta tạo 100-200 vị trí ảo (virtual node) cho mỗi server vật lý. Điều này giúp:

- Phân bổ key đều hơn đáng kể
- Khi 1 server fail, tải phân tán đều sang nhiều server thay vì dồn hết vào 1 server kế tiếp
- Amazon DynamoDB, Apache Cassandra, và ScyllaDB đều sử dụng kỹ thuật này

```
// Consistent Hashing với virtual nodes — pseudo code
class ConsistentHash {
    private ring: SortedMap<int, string> = new SortedMap();
    private virtualNodes: int = 150;

addServer(server: string) {
        for (let i = 0; i < this.virtualNodes; i++) {
            let hash = md5(`${server}:${i}`);
            this.ring.set(hash, server);
        }
    }

getServer(key: string): string {
        let hash = md5(key);
        // Tìm node gần nhất theo chiều kim đồng hồ
        let entry = this.ring.ceilingEntry(hash);
        return entry ? entry.value : this.ring.firstEntry().value;
    }

removeServer(server: string) {
        for (let i = 0; i < this.virtualNodes; i++) {
            this.ring.delete(md5(`${server}:${i}`));
        }
        // Chỉ ~1/n key bị remap — không ảnh hưởng toàn bộ
    }
}
```

### 6. Random Two Choices — Thuật toán "vừa đủ thông minh"

#### Power of Two Random Choices

Độ phức tạp: O(1) | Near-optimal distribution

Chọn ngẫu nhiên 2 server, rồi gửi request đến server có ít connection hơn. Nghe đơn giản nhưng theo lý thuyết xác suất, thuật toán này đạt phân bổ gần tối ưu — **từ O(log n) connection tối đa xuống O(log log n)** so với random thuần.

```
# NGINX Plus config
upstream backend {
    random two least_conn;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
    server app4.example.com;
}
```

## So sánh toàn diện các thuật toán

| Thuật toán | Stateful? | Session Sticky? | Use case chính | Khi nào KHÔNG dùng |
| --- | --- | --- | --- | --- |
| **Round Robin** | Không | Không | Stateless API, microservices đồng đều | Server cấu hình khác nhau |
| **Weighted RR** | Không | Không | Fleet hỗn hợp (on-prem + cloud) | Tải biến động mạnh |
| **Least Connections** | Có | Không | WebSocket, long-running request | Request cực ngắn và đồng đều |
| **IP Hash** | Không | Có | Legacy app cần session affinity | Nhiều client sau NAT |
| **Consistent Hash** | Không | Có | Distributed cache, sharded DB | Stateless service đơn giản |
| **Random Two Choices** | Có (nhẹ) | Không | Cluster lớn, cần near-optimal | Cluster nhỏ (<4 server) |

## Công cụ Load Balancing phổ biến

### NGINX — Reverse Proxy kiêm Load Balancer

```
# nginx.conf — Load Balancing hoàn chỉnh
http {
    upstream api_servers {
        least_conn;
        server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 weight=1 backup;  # Chỉ dùng khi 2 server trên fail
    }

server {
        listen 443 ssl http2;
        server_name api.example.com;

# SSL Termination
        ssl_certificate     /etc/ssl/certs/api.crt;
        ssl_certificate_key /etc/ssl/private/api.key;

# Health check implicit qua max_fails/fail_timeout
        location /api/ {
            proxy_pass http://api_servers;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;

# Timeout config
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
            proxy_send_timeout 10s;
        }
    }
}
```

### HAProxy — Load Balancer chuyên dụng

HAProxy nổi tiếng với khả năng xử lý L4 và L7, được thiết kế chuyên biệt cho load balancing với active health check, chi tiết metrics, và hiệu năng ấn tượng.

```
# haproxy.cfg
frontend http_front
    bind *:443 ssl crt /etc/ssl/api.pem
    mode http

# Content-based routing
    acl is_api path_beg /api
    acl is_ws  hdr(Upgrade) -i websocket

use_backend api_servers if is_api
    use_backend ws_servers  if is_ws
    default_backend static_servers

backend api_servers
    mode http
    balance leastconn
    option httpchk GET /health
    http-check expect status 200

server api1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server api2 10.0.1.11:8080 check inter 5s fall 3 rise 2

backend ws_servers
    mode http
    balance source   # IP Hash cho WebSocket sticky
    timeout tunnel 3600s
    server ws1 10.0.2.10:8080 check
    server ws2 10.0.2.11:8080 check
```

### Cloud Load Balancer — Managed và Auto-scaling

| Dịch vụ | Layer | Free Tier | Điểm mạnh |
| --- | --- | --- | --- |
| **AWS ALB** | L7 | 750h/tháng (12 tháng đầu) | Path-based routing, gRPC, WebSocket |
| **AWS NLB** | L4 | 750h/tháng (12 tháng đầu) | Ultra-low latency, static IP, TLS passthrough |
| **Azure Load Balancer** | L4 | Basic SKU miễn phí | Zone-redundant, HA Ports |
| **Azure App Gateway** | L7 | Không (từ ~$18/tháng) | WAF tích hợp, SSL offloading, URL rewrite |
| **Cloudflare LB** | L7 | Không (từ $5/tháng) | 330+ PoPs, Geo-steering, health check toàn cầu |

## Health Check và Failover — Bộ đệm an toàn

```
sequenceDiagram
    participant LB as Load Balancer
    participant S1 as Server A (healthy)
    participant S2 as Server B (failing)
    participant S3 as Server C (healthy)

loop Health Check (mỗi 5s)
        LB->>S1: GET /health
        S1-->>LB: 200 OK ✓
        LB->>S2: GET /health
        S2-->>LB: 503 Error ✗
        LB->>S3: GET /health
        S3-->>LB: 200 OK ✓
    end

Note over LB,S2: Server B fail 3 lần liên tiếp → đánh dấu DOWN

LB->>S1: Route traffic (50%)
    LB->>S3: Route traffic (50%)
    Note over S2: Không nhận traffic

S2-->>LB: 200 OK ✓ (sau 2 lần check thành công)
    Note over LB,S2: Server B phục hồi → đưa lại vào pool

```
Luồng Health Check: phát hiện server fail → loại khỏi pool → tự động phục hồi

Có 3 loại health check phổ biến:

- **Active Health Check**: LB chủ động gửi request kiểm tra (HTTP GET /health, TCP connect, hoặc custom script). HAProxy và cloud LB hỗ trợ mặc định.
- **Passive Health Check**: LB theo dõi response từ traffic thực — nếu server trả lỗi liên tục (ví dụ 5 lần 5xx trong 30s), tự động đánh dấu down. NGINX Open Source chỉ hỗ trợ loại này.
- **Deep Health Check**: Kiểm tra cả dependency (database connection, disk space, memory). Trả về chi tiết qua endpoint /health/detailed.

```
// ASP.NET — Deep Health Check
// Program.cs
builder.Services.AddHealthChecks()
    .AddSqlServer(connectionString, name: "database")
    .AddRedis(redisConnection, name: "cache")
    .AddCheck("disk-space", () =>
    {
        var drive = new DriveInfo("C");
        return drive.AvailableFreeSpace > 1_073_741_824  // > 1GB
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Degraded("Low disk space");
    });

app.MapHealthChecks("/health", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
```

## Kiến trúc triển khai thực tế

### Pattern 1: Single-tier LB — Cho hệ thống vừa và nhỏ

```
graph LR
    Client[Client] --> LB[NGINX / HAProxy  
L7 Load Balancer]
    LB --> S1[App Server 1]
    LB --> S2[App Server 2]
    LB --> S3[App Server 3]
    S1 --> DB[(Database)]
    S2 --> DB
    S3 --> DB

style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LB fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style DB fill:#4CAF50,stroke:#fff,color:#fff

```
Single-tier: đơn giản, dễ vận hành, phù hợp ~10K RPS

### Pattern 2: Two-tier LB — Cho hệ thống lớn

```
graph TB
    Client[Client] --> DNS[DNS / GeoDNS]
    DNS --> L4A["NLB - L4<br/>Region A"]
    DNS --> L4B["NLB - L4<br/>Region B"]

    L4A --> L7A1[ALB/NGINX - L7]
    L4A --> L7A2[ALB/NGINX - L7]

    L4B --> L7B1[ALB/NGINX - L7]

    L7A1 --> API1[API Pods]
    L7A2 --> WEB1[Web Pods]
    L7B1 --> API2[API Pods]

    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DNS fill:#ff9800,stroke:#fff,color:#fff
    style L4A fill:#e94560,stroke:#fff,color:#fff
    style L4B fill:#e94560,stroke:#fff,color:#fff
    style L7A1 fill:#2c3e50,stroke:#fff,color:#fff
    style L7A2 fill:#2c3e50,stroke:#fff,color:#fff
    style L7B1 fill:#2c3e50,stroke:#fff,color:#fff
    style API1 fill:#4CAF50,stroke:#fff,color:#fff
    style WEB1 fill:#4CAF50,stroke:#fff,color:#fff
    style API2 fill:#4CAF50,stroke:#fff,color:#fff
```
Two-tier: L4 at the edge → L7 routing, with multi-region support

### Pattern 3: Global Load Balancing for globally distributed systems

#### Three Global LB strategies

- **Geo-proximity**: Route to the geographically nearest datacenter. Simple, and effective at reducing latency.
- **Latency-based**: Measure the actual latency from the user to each region and route to the fastest one. More accurate than geo-proximity.
- **Failover**: Active-passive, with 100% of traffic going to the primary region. When the primary goes down, all traffic switches to the secondary.
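To make the difference between the three strategies concrete, here is a minimal sketch of each selection rule. Everything in it is an assumption for illustration: the region names, coordinates, and latency numbers are made up, and a real global LB (GeoDNS or anycast) gathers this data very differently.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geo_proximity(user_pos, regions):
    """Pick the healthy region whose datacenter is geographically closest."""
    healthy = [r for r in regions if r["healthy"]]
    return min(healthy, key=lambda r: haversine_km(user_pos, r["pos"]))["name"]


def latency_based(measured_ms, regions):
    """Pick the healthy region with the lowest measured latency for this user."""
    healthy = [r for r in regions if r["healthy"]]
    return min(healthy, key=lambda r: measured_ms[r["name"]])["name"]


def failover(regions):
    """Active-passive: the first healthy region in priority order gets all traffic."""
    for r in regions:  # list order is the priority order
        if r["healthy"]:
            return r["name"]
    raise RuntimeError("no healthy region")


# Hypothetical regions and measurements, for illustration only.
regions = [
    {"name": "singapore", "pos": (1.35, 103.82), "healthy": True},  # primary
    {"name": "tokyo", "pos": (35.68, 139.69), "healthy": True},     # secondary
]
hanoi = (21.03, 105.85)
measured_ms = {"singapore": 80, "tokyo": 45}  # made-up numbers

print(geo_proximity(hanoi, regions))        # singapore (closest on the map)
print(latency_based(measured_ms, regions))  # tokyo (lowest measured latency)
print(failover(regions))                    # singapore (primary is healthy)
regions[0]["healthy"] = False
print(failover(regions))                    # tokyo (primary is down)
```

The made-up latencies are chosen so that the closest region is not the fastest one, which is the case where latency-based routing beats geo-proximity.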

## Common pitfalls and how to fix them

#### 1. Thundering herd when a server recovers

When a server has just been put back into the pool after passing its health checks and the LB uses Least Connections, every new request is sent to that server because it has 0 connections. The fix is **Slow Start**: ramp up the recovered server's weight gradually over 30-60 seconds.

```
# HAProxy slow start
backend api_servers
    server api1 10.0.1.10:8080 check slowstart 60s
```
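The effect of a 60s slow start can be pictured as a weight ramp. The sketch below assumes a simple linear ramp from 0 to the configured weight; `effective_weight` is a hypothetical helper, and HAProxy's actual ramp differs in its details.

```python
def effective_weight(base_weight, seconds_since_up, slowstart_s=60.0):
    """Linearly ramp a recovered server's weight from 0 up to base_weight."""
    if seconds_since_up >= slowstart_s:
        return float(base_weight)
    return base_weight * max(seconds_since_up, 0.0) / slowstart_s


# A server configured with weight 100, sampled during its first minute back.
for t in (0, 15, 30, 60):
    print(t, effective_weight(100, t))
```

With weighted balancing, the recovered server receives a share of traffic proportional to this weight, so it warms up its caches and connection pools before taking a full load.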

#### 2. Session affinity skews the load

Sticky sessions (via cookie or IP hash) can leave one server handling most of the traffic if "heavy users" happen to land on the same server. The fix is to move to a **stateless architecture**: keep sessions in a shared store (a database or an in-memory cache) instead of on the individual server.
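A small simulation shows the skew. The workload is hypothetical (one heavy user and 30 light users), and `sticky_server` is an illustrative hash-based pinning function rather than any particular LB's affinity mechanism.

```python
import hashlib
from collections import Counter


def sticky_server(user_id, servers):
    """Session affinity: the same user always maps to the same server."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def load_per_server(requests_by_user, servers, sticky):
    """Count requests per server, either pinned per user or spread round-robin."""
    load = Counter({s: 0 for s in servers})
    i = 0
    for user, count in requests_by_user.items():
        for _ in range(count):
            if sticky:
                load[sticky_server(user, servers)] += 1
            else:
                load[servers[i % len(servers)]] += 1
                i += 1
    return load


servers = ["s1", "s2", "s3"]
# Hypothetical workload: one heavy user plus 30 light users (1200 requests in total).
traffic = {"heavy-user": 900, **{f"user-{n}": 10 for n in range(30)}}

print(load_per_server(traffic, servers, sticky=True))   # one server carries at least 900 requests
print(load_per_server(traffic, servers, sticky=False))  # 400 requests on each server
```

Once sessions live in a shared store, any server can handle any request, so the LB is free to use the even distribution in the second case.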

#### 3. Health checks that are too sensitive or too slow

**Too sensitive** (interval=1s, fall=1): a server is ejected after a single request timeout, which causes constant flapping. **Too slow** (interval=30s, fall=5): it takes 2.5 minutes to detect a dead server. Recommended: `interval=5s, fall=3, rise=2`, which detects a failure in about 15s and confirms recovery in about 10s.
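The numbers above follow from simple arithmetic: detection takes roughly `interval × fall` and recovery roughly `interval × rise`. The helper names below are made up for this worked example, and the model ignores the per-check timeout, which adds to the real figure.

```python
def detection_time_s(interval_s, fall):
    """Approximate time to mark a server down: `fall` consecutive failed checks."""
    return interval_s * fall


def recovery_time_s(interval_s, rise):
    """Approximate time to mark a server up again: `rise` consecutive passed checks."""
    return interval_s * rise


print(detection_time_s(1, 1))    # too sensitive: 1 s, a single timeout ejects the server
print(detection_time_s(30, 5))   # too slow: 150 s = 2.5 minutes
print(detection_time_s(5, 3))    # recommended: 15 s
print(recovery_time_s(5, 2))     # recommended: 10 s
```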

## Load Balancing in Kubernetes

Kubernetes has its own load balancing system built on Service and Ingress. Understanding how they work helps you avoid duplicating or conflicting with an external LB.

| Component | Layer | Scope | Default algorithm |
| --- | --- | --- | --- |
| **kube-proxy (iptables)** | L4 | Inside the cluster | Random (probability-based) |
| **kube-proxy (IPVS)** | L4 | Inside the cluster | Round Robin (also supports Least Conn, Source Hash, ...) |
| **Ingress Controller** | L7 | External → cluster | Depends on the controller (NGINX, Traefik, Envoy) |
| **Service type LoadBalancer** | L4 | External → cluster | Cloud provider LB (for example, NLB on AWS) |
| **Gateway API** | L4/L7 | External → cluster | Depends on the implementation; more flexible than Ingress |
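The "probability-based" entry for kube-proxy in iptables mode deserves a worked example. The rules for a Service are evaluated in order, and each rule matches with a probability chosen so that every endpoint ends up equally likely. The sketch below reproduces that arithmetic with exact fractions; the function names are illustrative and are not part of kube-proxy.

```python
from fractions import Fraction


def iptables_rule_probabilities(n_endpoints):
    """Per-rule match probability for a chain of n endpoint rules.

    Rule i (0-based) is only reached if no earlier rule matched, so it
    matches with probability 1 / (n - i); the last rule always matches.
    """
    return [Fraction(1, n_endpoints - i) for i in range(n_endpoints)]


def overall_probabilities(rule_probs):
    """Probability that a new connection ends up at each endpoint."""
    result, not_matched_yet = [], Fraction(1)
    for p in rule_probs:
        result.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return result


rules = iptables_rule_probabilities(3)
print([str(p) for p in rules])                         # ['1/3', '1/2', '1']
print([str(p) for p in overall_probabilities(rules)])  # ['1/3', '1/3', '1/3']
```

The choice is made once per new connection, so long-lived connections can still leave the backends unevenly loaded.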

## Load Balancer deployment checklist

#### Production Readiness Checklist

1. **Choose an algorithm** that fits the workload: Round Robin for stateless services, Least Connections for variable latency, Consistent Hash for caches
2. **Configure health checks**: active checks against a /health endpoint, interval 5s, fail threshold 3
3. **SSL Termination**: terminate TLS at the LB to offload the backends
4. **Logging & Monitoring**: track request count, latency p50/p95/p99, error rate, and active connections per backend
5. **High Availability for the LB**: the LB needs redundancy too, via VRRP (keepalived) or a managed cloud LB
6. **Rate Limiting**: protect backends from abnormal traffic spikes
7. **Connection Draining**: when removing a server from the pool, wait for in-flight requests to complete (graceful shutdown)
8. **Sensible timeouts**: a short connect timeout (5s) and a read timeout that matches the SLA (30-60s)
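Of the items above, connection draining is the one most often misunderstood, so here is a minimal sketch of its bookkeeping. `Backend` and `Pool` are hypothetical classes for illustration; real load balancers also enforce a drain timeout so that a stuck request cannot block removal forever.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.in_flight = 0      # requests currently being processed
        self.draining = False   # True once removal has been requested


class Pool:
    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self):
        """Route a new request to the least-loaded backend that is not draining."""
        candidates = [b for b in self.backends if not b.draining]
        if not candidates:
            raise RuntimeError("no backend available")
        chosen = min(candidates, key=lambda b: b.in_flight)
        chosen.in_flight += 1
        return chosen

    def finish(self, backend):
        """Call when a request completes; drop the backend once it has drained."""
        backend.in_flight -= 1
        if backend.draining and backend.in_flight == 0:
            self.backends.remove(backend)

    def drain(self, backend):
        """Stop sending new requests; remove immediately only if already idle."""
        backend.draining = True
        if backend.in_flight == 0:
            self.backends.remove(backend)


a, b = Backend("a"), Backend("b")
pool = Pool([a, b])
first = pool.pick()                      # goes to "a" (both idle, first one wins the tie)
pool.drain(a)                            # "a" still has one request in flight, so it stays
print([x.name for x in pool.backends])   # ['a', 'b']
print(pool.pick().name)                  # b: draining backends get no new requests
pool.finish(first)                       # the last in-flight request on "a" completes
print([x.name for x in pool.backends])   # ['b']
```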

## Summary

- **Stateless API** → Round Robin or Random Two Choices
- **WebSocket / long-running connections** → Least Connections
- **Distributed cache** → Consistent Hashing
- **Legacy session-based app** → IP Hash (as a stopgap; plan to migrate to stateless)
- **Multi-region** → Global LB (GeoDNS) + regional L4/L7

