Load Balancing: The Art of Traffic Distribution for Million-Request Systems

Posted on: 4/20/2026 7:40:51 AM

What Is Load Balancing and Why Does It Matter?

Imagine you run a restaurant with 10 tables. If every customer ended up sitting at one table while the other 9 sat empty, the experience would be terrible. A Load Balancer is the restaurant host — distributing customers evenly across tables so nobody waits too long.

In software architecture, a Load Balancer is the component that distributes client traffic across multiple backend servers to optimize performance, increase availability, and ensure no single server is overloaded.

  • <1ms — average cold start for an L4 LB
  • 99.99% — cloud LB uptime SLA
  • 10M+ — requests/s handled by NGINX
  • 330+ — PoPs for Cloudflare LB

When do you need a Load Balancer?

As soon as your system has 2 or more instances, you need a Load Balancer. But its role isn't just "splitting requests evenly" — it also performs health checks, SSL termination, rate limiting, and is the first line of defense against DDoS.

Layer 4 vs Layer 7: Two Schools of Load Balancing

This is the first architectural decision when choosing a Load Balancer. The difference lies in the OSI layer where the LB operates, and that directly affects routing capabilities, performance, and cost.

graph TB
    subgraph L4["Layer 4 — Transport"]
        A[Client Request] -->|TCP/UDP| B[L4 Load Balancer]
        B -->|IP + Port| C[Server A]
        B -->|IP + Port| D[Server B]
        B -->|IP + Port| E[Server C]
    end

    subgraph L7["Layer 7 — Application"]
        F[Client Request] -->|HTTP/gRPC| G[L7 Load Balancer]
        G -->|/api/*| H[API Server]
        G -->|/static/*| I[CDN/Static Server]
        G -->|/ws/*| J[WebSocket Server]
    end

    style L4 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style L7 fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style B fill:#e94560,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#2c3e50,stroke:#fff,color:#fff
    style H fill:#2c3e50,stroke:#fff,color:#fff
    style I fill:#2c3e50,stroke:#fff,color:#fff
    style J fill:#2c3e50,stroke:#fff,color:#fff

Comparison of Layer 4 vs Layer 7 Load Balancer request flows

| Criterion | Layer 4 (Transport) | Layer 7 (Application) |
| --- | --- | --- |
| Operates at | TCP/UDP — sees only IP and port | HTTP/HTTPS/gRPC — reads headers, URLs, cookies |
| Routing | Based on source/destination IP and port | Based on URL path, hostname, headers, query string |
| Performance | Extremely fast — doesn't parse the payload | Slower — must decrypt TLS and parse HTTP |
| SSL termination | TLS passthrough (no decryption) | Decrypts at the LB, re-encrypts if needed |
| Connection pooling | No — forwards the TCP stream directly | Yes — multiplexes many clients over few backend connections |
| Use case | Databases, game servers, IoT, streaming | Web apps, APIs, microservices, gRPC |
| Examples | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, NGINX, HAProxy HTTP mode, Envoy |

Production practice

Most production architectures use both layers: L4 at the edge to quickly distribute traffic into L7 clusters, then L7 performs detailed content-based routing. For example: AWS NLB (L4) → ALB (L7), or Google Maglev (L4) → Envoy (L7).

1. Round Robin — Simple but Effective

Round Robin

Complexity: O(1) | Stateless | Default for NGINX and HAProxy

Distribute requests sequentially in a loop: Server A → B → C → A → B → C... No need to track server state, extremely simple and effective when servers have uniform capacity.

✓ Pros

  • Simple, stateless
  • Evenly distributed over time
  • O(1) performance

✗ Cons

  • Ignores real load differences
  • Not ideal for requests with very uneven processing time
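
The whole algorithm fits in a few lines. A minimal TypeScript sketch (class and server names are placeholders, not any particular LB's API):

```typescript
// Minimal round-robin selector — a sketch, not a production LB.
class RoundRobin {
    private index = 0;

    constructor(private servers: string[]) {}

    // Return the next server in the cycle: A → B → C → A ...
    next(): string {
        const server = this.servers[this.index];
        this.index = (this.index + 1) % this.servers.length;
        return server;
    }
}
```

The only state is a single counter — which is why Round Robin is so cheap and so easy to reason about.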

2. Weighted Round Robin — When Servers Aren't Equal

Weighted Round Robin

Complexity: O(1) | Semi-stateless | Requires weight configuration

Assign weights to each server based on capacity. A powerful server (weight=5) receives 5× more requests than a weaker one (weight=1). Great for fleets with mixed machine types.

# NGINX config
upstream backend {
    server app1.example.com weight=5;  # 16 CPU, 64GB RAM
    server app2.example.com weight=3;  # 8 CPU, 32GB RAM
    server app3.example.com weight=1;  # 2 CPU, 8GB RAM
}
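
Naive weighted rotation would send 5 requests to app1 in a burst before touching app2. A "smooth" variant (popularized by NGINX's implementation) interleaves picks while still honoring the weights exactly over each cycle. A TypeScript sketch of that variant (class and field names are my own):

```typescript
// Smooth weighted round robin — a sketch of the interleaving variant.
interface Peer { name: string; weight: number; current: number; }

class SmoothWRR {
    private peers: Peer[];
    private total: number;

    constructor(weights: Record<string, number>) {
        this.peers = Object.entries(weights).map(
            ([name, weight]) => ({ name, weight, current: 0 })
        );
        this.total = this.peers.reduce((sum, p) => sum + p.weight, 0);
    }

    next(): string {
        let best = this.peers[0];
        for (const p of this.peers) {
            p.current += p.weight;          // every peer gains its weight
            if (p.current > best.current) best = p;
        }
        best.current -= this.total;         // the winner pays the total back
        return best.name;
    }
}
```

Over any window of (sum of weights) requests, each server is picked exactly `weight` times, but heavy servers never receive long consecutive bursts.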

3. Least Connections — Adaptive to Real Load

Least Connections

Complexity: O(n) or O(log n) with a heap | Stateful

Send requests to the server with the fewest active connections. Smarter than Round Robin because it reacts to real load — busy servers naturally receive fewer new requests.

✓ Pros

  • Adapts to varying request processing times
  • Self-balancing when a server is slow
  • Ideal for WebSocket, long-polling

✗ Cons

  • Must track state per connection
  • Higher overhead than Round Robin
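
A minimal sketch of the bookkeeping this requires (names are illustrative). Note that unlike Round Robin, the balancer must be told when each connection ends:

```typescript
// Least-connections selection — a sketch with an O(n) scan.
// A heap would make selection O(log n) for large pools.
class LeastConnections {
    private active = new Map<string, number>();

    constructor(servers: string[]) {
        for (const s of servers) this.active.set(s, 0);
    }

    // Pick the server with the fewest in-flight connections.
    acquire(): string {
        let best = "";
        let min = Infinity;
        for (const [server, count] of this.active) {
            if (count < min) { min = count; best = server; }
        }
        this.active.set(best, min + 1);
        return best;
    }

    // Must be called when the connection finishes, or counts drift.
    release(server: string): void {
        this.active.set(server, (this.active.get(server) ?? 1) - 1);
    }
}
```

The `release` call is exactly the "state per connection" the cons list refers to — forget it, and a server looks permanently busy.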

4. IP Hash — Simple Session Affinity

IP Hash

Complexity: O(1) | Deterministic

Hash the client's IP to pick the destination server. The same IP always routes to the same server — giving session stickiness without cookies or a shared session store.

# NGINX config
upstream backend {
    ip_hash;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
}

Beware of NAT

If many clients share the same IP (via NAT/proxy), they'll all pile onto a single server → uneven load. In enterprise networks, this is a common issue.
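
The selection logic is a plain modulo over a hash. A TypeScript sketch (the FNV-1a hash here is an illustrative choice, not what any particular LB uses):

```typescript
// IP-hash selection — a sketch using a tiny FNV-1a hash.
function fnv1a(text: string): number {
    let h = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        h ^= text.charCodeAt(i);
        h = Math.imul(h, 0x01000193) >>> 0;   // keep it an unsigned 32-bit value
    }
    return h;
}

// Same IP in, same server out — deterministic stickiness.
function pickServer(clientIp: string, servers: string[]): string {
    return servers[fnv1a(clientIp) % servers.length];
}
```

Because the result depends on `servers.length`, adding or removing a single server changes the mapping for roughly (n−1)/n of all clients — the weakness that motivates consistent hashing in the next section.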

5. Consistent Hashing — The King of Distributed Cache

Consistent Hashing

Complexity: O(log n) lookup | Virtual Nodes improve distribution

Uses a hash ring — both servers and request keys are hashed onto a circle. Requests go to the nearest server in the clockwise direction. When servers are added/removed, only ~1/n of keys are affected instead of remapping everything.

graph TB
    subgraph Ring["Hash Ring — Consistent Hashing"]
        direction TB
        N1["Server A<br/>position: 0°"]
        N2["Server B<br/>position: 120°"]
        N3["Server C<br/>position: 240°"]
        K1["Key 'user:42'<br/>→ Server A"]
        K2["Key 'session:99'<br/>→ Server B"]
        K3["Key 'cart:17'<br/>→ Server C"]
    end

    K1 -.->|hash → 35°| N1
    K2 -.->|hash → 155°| N2
    K3 -.->|hash → 280°| N3

    style Ring fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
    style N1 fill:#e94560,stroke:#fff,color:#fff
    style N2 fill:#e94560,stroke:#fff,color:#fff
    style N3 fill:#e94560,stroke:#fff,color:#fff
    style K1 fill:#2c3e50,stroke:#fff,color:#fff
    style K2 fill:#2c3e50,stroke:#fff,color:#fff
    style K3 fill:#2c3e50,stroke:#fff,color:#fff

A hash ring with 3 servers — each key routes to the nearest server clockwise

Virtual nodes are an important technique for evening out the distribution. Instead of each server occupying a single position on the ring, you create 100-200 virtual positions per physical server. This helps:

  • Distribute keys much more evenly
  • When one server fails, its load spreads across many servers rather than piling on the next one
  • Amazon DynamoDB, Apache Cassandra, and ScyllaDB all use this technique

// Consistent hashing with virtual nodes — runnable TypeScript sketch
import { createHash } from "node:crypto";

// Hash a string to an unsigned 32-bit position on the ring
function hash32(key: string): number {
    return createHash("md5").update(key).digest().readUInt32BE(0);
}

class ConsistentHash {
    // Ring kept as a sorted array of [position, server] pairs
    private ring: Array<[number, string]> = [];
    private virtualNodes = 150;

    addServer(server: string): void {
        for (let i = 0; i < this.virtualNodes; i++) {
            this.ring.push([hash32(`${server}:${i}`), server]);
        }
        this.ring.sort((a, b) => a[0] - b[0]);
    }

    getServer(key: string): string {
        const h = hash32(key);
        // Binary search for the nearest node clockwise (first position >= h)
        let lo = 0, hi = this.ring.length;
        while (lo < hi) {
            const mid = (lo + hi) >> 1;
            if (this.ring[mid][0] < h) lo = mid + 1;
            else hi = mid;
        }
        // Wrap around the ring when h is past the last node
        return this.ring[lo % this.ring.length][1];
    }

    removeServer(server: string): void {
        this.ring = this.ring.filter(([, s]) => s !== server);
        // Only ~1/n of keys remap — no global impact
    }
}

6. Random Two Choices — The "Just Smart Enough" Algorithm

Power of Two Random Choices

Complexity: O(1) | Near-optimal distribution

Randomly pick 2 servers, then send the request to the one with fewer active connections. It sounds simple, but the classic balls-into-bins analysis shows this is near optimal: the expected maximum load drops from Θ(log n / log log n) under purely random placement to Θ(log log n) — an exponential improvement for one extra comparison.

# NGINX config — the random directive is available in open source since 1.15.1
upstream backend {
    random two least_conn;
    server app1.example.com;
    server app2.example.com;
    server app3.example.com;
    server app4.example.com;
}
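
The mechanism behind that directive is just two random picks and one comparison. A TypeScript sketch (the injectable `rand` parameter is mine, added only so the example is deterministic under test):

```typescript
// Power of two random choices — a minimal sketch.
class TwoChoices {
    private active = new Map<string, number>();

    constructor(servers: string[], private rand: () => number = Math.random) {
        for (const s of servers) this.active.set(s, 0);
    }

    acquire(): string {
        const names = [...this.active.keys()];
        // Sample two distinct servers uniformly at random.
        const i = Math.floor(this.rand() * names.length);
        let j = Math.floor(this.rand() * (names.length - 1));
        if (j >= i) j++;
        const a = names[i], b = names[j];
        // Send to whichever of the two has fewer active connections.
        const winner = this.active.get(a)! <= this.active.get(b)! ? a : b;
        this.active.set(winner, this.active.get(winner)! + 1);
        return winner;
    }

    // Call when the connection finishes.
    release(server: string): void {
        this.active.set(server, this.active.get(server)! - 1);
    }
}
```

Note it only inspects 2 servers per request regardless of pool size — that O(1) cost is what makes it attractive for very large clusters where a full least-connections scan is too expensive.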

Algorithm Comparison at a Glance

| Algorithm | Stateful? | Session sticky? | Main use case | When NOT to use |
| --- | --- | --- | --- | --- |
| Round Robin | No | No | Stateless APIs, uniform microservices | Heterogeneous server configs |
| Weighted RR | No | No | Mixed fleets (on-prem + cloud) | Highly variable load |
| Least Connections | Yes | No | WebSockets, long-running requests | Very short, uniform requests |
| IP Hash | No | Yes | Legacy apps needing session affinity | Many clients behind NAT |
| Consistent Hash | No | Yes | Distributed caches, sharded DBs | Simple stateless services |
| Random Two Choices | Yes (light) | No | Large clusters needing near-optimal spread | Small clusters (<4 servers) |

Common Load Balancing Tools

NGINX — Reverse Proxy and Load Balancer

NGINX is the most popular choice for L7 Load Balancing thanks to its high performance (handling millions of concurrent connections), simple configuration, and rich module ecosystem.

# nginx.conf — Complete Load Balancing
http {
    upstream api_servers {
        least_conn;
        server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:8080 weight=3 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 weight=1 backup;  # Only used when the above 2 fail
    }

    server {
        listen 443 ssl http2;
        server_name api.example.com;

        # SSL Termination
        ssl_certificate     /etc/ssl/certs/api.crt;
        ssl_certificate_key /etc/ssl/private/api.key;

        # Implicit health check via max_fails/fail_timeout
        location /api/ {
            proxy_pass http://api_servers;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;

            # Timeout configuration
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
            proxy_send_timeout 10s;
        }
    }
}

HAProxy — Dedicated Load Balancer

HAProxy is renowned for its L4 and L7 capabilities, purpose-built for load balancing with active health checks, detailed metrics, and impressive performance.

# haproxy.cfg
frontend http_front
    bind *:443 ssl crt /etc/ssl/api.pem
    mode http

    # Content-based routing
    acl is_api path_beg /api
    acl is_ws  hdr(Upgrade) -i websocket

    use_backend api_servers if is_api
    use_backend ws_servers  if is_ws
    default_backend static_servers

backend api_servers
    mode http
    balance leastconn
    option httpchk GET /health
    http-check expect status 200

    server api1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server api2 10.0.1.11:8080 check inter 5s fall 3 rise 2

backend ws_servers
    mode http
    balance source   # IP Hash to keep WebSocket sessions sticky
    timeout tunnel 3600s
    server ws1 10.0.2.10:8080 check
    server ws2 10.0.2.11:8080 check

Cloud Load Balancers — Managed and Auto-scaling

| Service | Layer | Free tier | Strengths |
| --- | --- | --- | --- |
| AWS ALB | L7 | 750h/month (first 12 months) | Path-based routing, gRPC, WebSocket |
| AWS NLB | L4 | 750h/month (first 12 months) | Ultra-low latency, static IP, TLS passthrough |
| Azure Load Balancer | L4 | Basic SKU free | Zone-redundant, HA Ports |
| Azure App Gateway | L7 | None (from ~$18/month) | Integrated WAF, SSL offloading, URL rewrite |
| Cloudflare LB | L7 | None (from $5/month) | 330+ PoPs, geo-steering, global health checks |

Health Checks and Failover — The Safety Net

A Load Balancer without health checks is like an intersection without traffic lights. Health checks let the LB automatically remove dead servers from the pool and bring them back when they recover.

sequenceDiagram
    participant LB as Load Balancer
    participant S1 as Server A (healthy)
    participant S2 as Server B (failing)
    participant S3 as Server C (healthy)

    loop Health Check (every 5s)
        LB->>S1: GET /health
        S1-->>LB: 200 OK ✓
        LB->>S2: GET /health
        S2-->>LB: 503 Error ✗
        LB->>S3: GET /health
        S3-->>LB: 200 OK ✓
    end

    Note over LB,S2: Server B fails 3 times in a row → marked DOWN

    LB->>S1: Route traffic (50%)
    LB->>S3: Route traffic (50%)
    Note over S2: Receives no traffic

    S2-->>LB: 200 OK ✓ (after 2 successful checks)
    Note over LB,S2: Server B recovers → returned to the pool

Health Check flow: detect a failing server → remove from pool → auto-recover

There are 3 common types of health checks:

  • Active Health Check: The LB actively sends probes (HTTP GET /health, TCP connect, or custom scripts). HAProxy and cloud LBs support this by default.
  • Passive Health Check: The LB observes real traffic — if a server keeps returning errors (e.g., 5× 5xx in 30s), it is marked down automatically. This is the only kind NGINX Open Source supports; active checks require NGINX Plus.
  • Deep Health Check: Verify dependencies too (database connection, disk space, memory). Return details via a /health/detailed endpoint.
// ASP.NET — Deep Health Check
// Program.cs
builder.Services.AddHealthChecks()
    .AddSqlServer(connectionString, name: "database")
    .AddRedis(redisConnection, name: "cache")
    .AddCheck("disk-space", () =>
    {
        var drive = new DriveInfo("C");
        return drive.AvailableFreeSpace > 1_073_741_824  // > 1GB
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Degraded("Low disk space");
    });

app.MapHealthChecks("/health", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

Real-World Deployment Architectures

Pattern 1: Single-tier LB — For Small and Medium Systems

graph LR
    Client[Client] --> LB[NGINX / HAProxy<br/>L7 Load Balancer]
    LB --> S1[App Server 1]
    LB --> S2[App Server 2]
    LB --> S3[App Server 3]
    S1 --> DB[(Database)]
    S2 --> DB
    S3 --> DB

    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LB fill:#e94560,stroke:#fff,color:#fff
    style S1 fill:#2c3e50,stroke:#fff,color:#fff
    style S2 fill:#2c3e50,stroke:#fff,color:#fff
    style S3 fill:#2c3e50,stroke:#fff,color:#fff
    style DB fill:#4CAF50,stroke:#fff,color:#fff

Single-tier: simple, easy to operate, fits ~10K RPS

Pattern 2: Two-tier LB — For Large Systems

graph TB
    Client[Client] --> DNS[DNS / GeoDNS]
    DNS --> L4A[NLB - L4<br/>Region A]
    DNS --> L4B[NLB - L4<br/>Region B]
    L4A --> L7A1[ALB/NGINX - L7]
    L4A --> L7A2[ALB/NGINX - L7]
    L4B --> L7B1[ALB/NGINX - L7]
    L7A1 --> API1[API Pods]
    L7A2 --> WEB1[Web Pods]
    L7B1 --> API2[API Pods]

    style Client fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DNS fill:#ff9800,stroke:#fff,color:#fff
    style L4A fill:#e94560,stroke:#fff,color:#fff
    style L4B fill:#e94560,stroke:#fff,color:#fff
    style L7A1 fill:#2c3e50,stroke:#fff,color:#fff
    style L7A2 fill:#2c3e50,stroke:#fff,color:#fff
    style L7B1 fill:#2c3e50,stroke:#fff,color:#fff
    style API1 fill:#4CAF50,stroke:#fff,color:#fff
    style WEB1 fill:#4CAF50,stroke:#fff,color:#fff
    style API2 fill:#4CAF50,stroke:#fff,color:#fff

Two-tier: L4 edge → L7 routing, supports multi-region

Pattern 3: Global Load Balancing — For Worldwide Systems

When users are spread globally, you need load balancing at the DNS layer. GeoDNS or Anycast routing brings users to the nearest datacenter. Cloudflare, AWS Route 53, and Azure Traffic Manager all support this pattern.

3 Global LB strategies

  • Geo-proximity: Route to the geographically nearest datacenter. Simple, effective at reducing latency.
  • Latency-based: Measure actual latency from users to each region, route to the fastest. More accurate than geo-proximity.
  • Failover: Active-passive — 100% of traffic to the primary region. When primary is down, shift everything to secondary.

Common Pitfalls and How to Fix Them

1. Thundering Herd When a Server Recovers

When a server is brought back into the pool after passing health checks, using Least Connections means all new requests pile onto it (because it has 0 connections). Solution: Slow Start — gradually increase the recovered server's weight over 30-60 seconds.

# HAProxy slow start
backend api_servers
    server api1 10.0.1.10:8080 check slowstart 60s

2. Session Affinity Causing Imbalance

Sticky sessions (via cookies or IP hash) can make one server receive most of the traffic if "heavy users" cluster onto it. Solution: move to a stateless architecture — store sessions in a shared store (database or in-memory cache) instead of on the server.

3. Overly Sensitive or Overly Slow Health Checks

Too sensitive (interval=1s, fall=1): a server gets evicted because of one timeout → constant flapping. Too slow (interval=30s, fall=5): it takes 2.5 minutes to notice a dead server. Recommended: interval=5s, fall=3, rise=2 — detect in 15s, confirm recovery in 10s.

Load Balancing in Kubernetes

Kubernetes has its own load balancing system via Services and Ingress. Understanding how they work helps avoid duplication or conflicts with external LBs.

| Component | Layer | Scope | Default algorithm |
| --- | --- | --- | --- |
| kube-proxy (iptables) | L4 | Inside the cluster | Random (probability-based) |
| kube-proxy (IPVS) | L4 | Inside the cluster | Round Robin (also supports Least Conn, Source Hash...) |
| Ingress Controller | L7 | External → cluster | Depends on controller (NGINX, Traefik, Envoy) |
| Service type LoadBalancer | L4 | External → cluster | Cloud provider LB (ALB, NLB...) |
| Gateway API | L4/L7 | External → cluster | Depends on implementation; more flexible than Ingress |

Load Balancer Deployment Checklist

Production Readiness Checklist

  1. Pick the algorithm that fits your workload — Round Robin for stateless, Least Connections for variable latency, Consistent Hash for caches
  2. Configure health checks — active checks with a /health endpoint, 5s interval, fail threshold 3
  3. SSL Termination — terminate TLS at the LB to offload work from backends
  4. Logging & Monitoring — track request count, latency p50/p95/p99, error rate, active connections per backend
  5. LB High Availability — the LB itself needs redundancy: VRRP (keepalived), or use a managed cloud LB
  6. Rate Limiting — protect backends from unexpected traffic spikes
  7. Connection Draining — when removing a server from the pool, let in-flight requests finish (graceful shutdown)
  8. Sensible timeouts — short connect timeout (5s), read timeout matching SLA (30-60s)
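
Item 7 also has an application-side half: the backend itself must finish in-flight work before exiting. A hedged Node.js sketch (handler, port, and the 30s deadline are illustrative):

```typescript
// Connection draining at the application level — a Node.js sketch.
// Assumes the LB has already stopped sending new traffic to this
// instance (failed checks or deregistration) before SIGTERM arrives.
import { createServer, Server } from "node:http";

const server = createServer((_req, res) => {
    res.end("ok");
});
// Port 0 picks an ephemeral port for this sketch; a real service
// would bind its configured port (e.g., 8080).
server.listen(0);

process.on("SIGTERM", () => {
    // Stop accepting new connections; the callback fires once every
    // in-flight request has completed.
    server.close(() => process.exit(0));
    // Hard deadline in case a request hangs past the LB's read timeout.
    setTimeout(() => process.exit(1), 30_000).unref();
});
```

Pairing this with the LB's connection draining (e.g., deregistration delay on ALB) is what makes rolling deploys invisible to clients.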

Conclusion

Load Balancing isn't just "splitting requests evenly" — it's the art of distributing load so the system stays fast, stable, and resilient. There's no "best" algorithm — only the one that fits your specific context:

  • Stateless APIs → Round Robin or Random Two Choices
  • WebSockets / Long-running → Least Connections
  • Distributed Cache → Consistent Hashing
  • Legacy session-based apps → IP Hash (temporary; migrate to stateless)
  • Multi-region → Global LB (GeoDNS) + Regional L4/L7

Start simple with Round Robin, add health checks, and only add complexity when you truly need it. Over-engineering load balancing from day one is one of the most common mistakes in system design.
