# Distributed Caching: Designing a Distributed Cache System from A to Z

Posted on: 4/21/2026 2:12:49 AM

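- **<1ms**: local cache read latency
- **1–5ms**: distributed cache read latency
- **90%+**: target cache hit rate
- **10–100x**: database load reduction

Caching is one of the most important techniques in system design. But the gap between "drop Redis between the app and the database" and "design a truly reliable distributed cache system" is huge. This post digs into distributed cache architecture from a system-design perspective: the core patterns, invalidation strategies, data distribution via consistent hashing, cache stampede mitigation, and multi-layer caching design for production systems.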

## 1. Why do we need distributed caching?

Before jumping into patterns, let's understand the problem that distributed caching actually solves:

- **Local cache** (in-process, e.g., `MemoryCache` in .NET) has sub-microsecond latency, but each app instance keeps its own copy — leading to inconsistency when data changes
- **Distributed cache** (Redis, Memcached) adds 1–5ms of network latency but provides **a single consistent view** across all instances
- A typical database query costs 10–100ms, so a cache hit at 1–5ms is usually about an order of magnitude faster, even after paying for the network round trip

```mermaid
graph LR
    subgraph Local Cache
    A1[App Instance 1] --> LC1[MemoryCache A]
    A2[App Instance 2] --> LC2[MemoryCache B]
    A3[App Instance 3] --> LC3[MemoryCache C]
    LC1 -.->|Different| LC2
    LC2 -.->|Different| LC3
    end

    subgraph Distributed Cache
    B1[App Instance 1] --> DC[Redis Cluster]
    B2[App Instance 2] --> DC
    B3[App Instance 3] --> DC
    end

    style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LC1 fill:#ff9800,stroke:#fff,color:#fff
    style LC2 fill:#ff9800,stroke:#fff,color:#fff
    style LC3 fill:#ff9800,stroke:#fff,color:#fff
    style B1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DC fill:#4CAF50,stroke:#fff,color:#fff
```
Local cache: each instance has its own copy (inconsistent) — Distributed cache: single source of truth
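
To see how the hit rate turns these latency figures into an average, here is a back-of-the-envelope sketch. The 3 ms cache latency and 50 ms database latency are illustrative values picked from the ranges above, not measurements, and `ExpectedLatencyMs` and `DbLoadFraction` are hypothetical helper names.

```csharp
using System;

// Expected read latency for a given hit rate.
// A hit pays only for the cache lookup; a miss pays for the lookup AND the database query.
double ExpectedLatencyMs(double hitRate, double cacheMs, double dbMs) =>
    hitRate * cacheMs + (1 - hitRate) * (cacheMs + dbMs);

// Fraction of reads that still reach the database
double DbLoadFraction(double hitRate) => 1 - hitRate;

foreach (var hitRate in new[] { 0.50, 0.90, 0.99 })
{
    Console.WriteLine(
        $"hit rate {hitRate:P0}: ~{ExpectedLatencyMs(hitRate, 3, 50):F1} ms per read, " +
        $"{DbLoadFraction(hitRate):P0} of reads reach the database");
}
```

With these assumed numbers, a 90% hit rate averages about 8 ms per read and sends one read in ten to the database; at 99% it is about 3.5 ms and one read in a hundred.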

## 2. The four core caching patterns

Every caching system is built on one or more of these four patterns. Each makes its own trade-offs between consistency, latency, and complexity.

### 2.1 Cache-Aside (Lazy Loading)

The most common pattern. The application is in full control: on reads, check the cache first; on a miss, fetch from the database and write to the cache. On writes, write directly to the database, then invalidate the cache.

```mermaid
sequenceDiagram
    participant App
    participant Cache
    participant DB

    App->>Cache: GET user:123
    Cache-->>App: MISS
    App->>DB: SELECT * FROM users WHERE id=123
    DB-->>App: {name: "Alice", ...}
    App->>Cache: SET user:123 (TTL 5min)
    App-->>App: Return data

    Note over App,DB: Next read
    App->>Cache: GET user:123
    Cache-->>App: HIT -> {name: "Alice", ...}
```
Cache-Aside: the application controls both the read and write paths

```csharp
// Cache-Aside in .NET with IDistributedCache
public async Task<UserProfile?> GetUserProfile(int userId)
{
    var cacheKey = $"user:{userId}";

    // 1. Try the cache
    var cached = await _cache.GetStringAsync(cacheKey);
    if (cached != null)
        return JsonSerializer.Deserialize<UserProfile>(cached);

    // 2. Cache miss -> fetch from DB
    var user = await _dbContext.Users.FindAsync(userId);
    if (user == null) return null;

    // 3. Write to cache with TTL
    await _cache.SetStringAsync(cacheKey,
        JsonSerializer.Serialize(user),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });

    return user;
}
```

#### Pros and cons

**Pros:** only caches data that's actually read (avoids wasted memory), the application has full control of the logic, easy to implement. **Cons:** the first cache miss is always slow (cold start), and there's a staleness window between a DB update and cache invalidation.
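
The code above covers only the read path. Here is a minimal sketch of the write path (update the database, then invalidate the cache entry), using in-memory dictionaries as stand-ins so it runs without any infrastructure. `db`, `cache`, `ReadUser`, and `UpdateUser` are hypothetical names for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-ins for the database and the cache (assumptions for illustration only)
var db = new Dictionary<int, string> { [123] = "Alice" };
var cache = new Dictionary<string, string>();

// Read path: cache first, then database, then populate the cache
string? ReadUser(int id)
{
    var key = $"user:{id}";
    if (cache.TryGetValue(key, out var hit)) return hit;
    if (!db.TryGetValue(id, out var fromDb)) return null;
    cache[key] = fromDb;
    return fromDb;
}

// Write path: update the database first, then invalidate (not update) the cache entry
void UpdateUser(int id, string name)
{
    db[id] = name;
    cache.Remove($"user:{id}");
}

Console.WriteLine(ReadUser(123));   // miss: loads from db and fills the cache
UpdateUser(123, "Alice Smith");     // db updated, cache entry removed
Console.WriteLine(ReadUser(123));   // miss again: reloads the new value
```

Deleting the entry rather than overwriting it is a deliberate choice in this sketch: the next read repopulates the cache from the database, so two concurrent writers cannot leave the cache holding the older of their two values.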

### 2.2 Read-Through

Similar to Cache-Aside, but the **cache layer is responsible** for fetching from the database on a miss. The application only talks to the cache and doesn't need to know about the database.

```csharp
// Read-Through abstraction
public class ReadThroughCache<T>
{
    private readonly IDistributedCache _cache;
    private readonly Func<string, Task<T?>> _loader;

    // The loader is injected once, so callers never touch the data source directly
    public ReadThroughCache(IDistributedCache cache, Func<string, Task<T?>> loader)
    {
        _cache = cache;
        _loader = loader;
    }

    public async Task<T?> Get(string key)
    {
        var cached = await _cache.GetStringAsync(key);
        if (cached != null)
            return JsonSerializer.Deserialize<T>(cached);

        // Cache loads from data source itself
        var value = await _loader(key);
        if (value != null)
        {
            await _cache.SetStringAsync(key,
                JsonSerializer.Serialize(value),
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
                });
        }
        return value;
    }
}
```
**Key difference:** business code doesn't need to know about hits/misses — the cache layer encapsulates that completely. Fits when you want to keep cache logic separate from business logic.

### 2.3 Write-Through

Every write passes through the cache before going to the database. The cache always contains the latest data, but write latency increases because both writes happen synchronously.

```mermaid
sequenceDiagram
    participant App
    participant Cache
    participant DB

    App->>Cache: SET user:123 = {updated data}
    Cache->>DB: UPDATE users SET ... WHERE id=123
    DB-->>Cache: OK
    Cache-->>App: OK

    Note over App,DB: Reads are always consistent
    App->>Cache: GET user:123
    Cache-->>App: HIT -> {updated data}
```
Write-Through: the cache proxies every write, ensuring consistency

**When to use:** when consistency is the top priority — session management, user profiles, shopping carts. Not a fit for write-heavy workloads because every write incurs extra latency.
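
As a rough illustration, a write-through layer can be sketched with two in-memory dictionaries standing in for the cache and the database. `Write` and `Read` are hypothetical names, and persisting before caching is one possible ordering, assumed here for the sketch.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-ins (assumptions for illustration only)
var db = new Dictionary<string, string>();
var cache = new Dictionary<string, string>();

// Write-through: the call returns only after BOTH stores hold the new value
void Write(string key, string value)
{
    db[key] = value;      // 1. synchronous write to the backing store
    cache[key] = value;   // 2. cache updated as part of the same operation
}

// Reads are served from the cache, which always mirrors the last write
string? Read(string key) => cache.TryGetValue(key, out var v) ? v : null;

Write("cart:42", "[\"book\"]");
Write("cart:42", "[\"book\",\"pen\"]");
Console.WriteLine(Read("cart:42"));
```

Persisting first means a failed database write never leaves an unpersisted value in the cache. The price is that every write waits for both stores, which is the extra latency described above.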

### 2.4 Write-Behind (Write-Back)

Writes only go to the cache; the database is updated **asynchronously** later (batch, periodic flush). This is the fastest write pattern, but risks data loss if a cache node dies before flush.

```mermaid
sequenceDiagram
    participant App
    participant Cache
    participant Queue
    participant DB

    App->>Cache: SET counter:page_views = 15042
    Cache-->>App: OK (immediately)

    Note over Cache,DB: Async flush every 10s
    Cache->>Queue: Batch updates
    Queue->>DB: BULK UPDATE counters
    DB-->>Queue: OK
```
Write-Behind: write cache first, flush DB later — fast, but risks data loss

**When to use:** page-view counters, analytics events, log aggregation — write-heavy workloads where losing a few seconds of trailing data is acceptable.
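
A minimal sketch of the idea, again with in-memory stand-ins: writes touch only the cache and mark the key dirty, and a separate flush persists dirty keys in one batch. `Increment`, `Flush`, and the `dirty` set are hypothetical names. A real system would run the flush on a timer and typically put a durable queue in between.

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, long>();   // fast path, always current
var dirty = new HashSet<string>();            // keys changed since the last flush
var db = new Dictionary<string, long>();      // stand-in for the database
var dbWrites = 0;

// Write path: update only the cache and remember that the key is dirty
void Increment(string key)
{
    cache.TryGetValue(key, out var current);  // current is 0 when the key is absent
    cache[key] = current + 1;
    dirty.Add(key);
}

// Flush: called periodically (e.g. every 10s) to persist all dirty keys in one batch
void Flush()
{
    foreach (var key in dirty)
    {
        db[key] = cache[key];
        dbWrites++;
    }
    dirty.Clear();
}

for (var i = 0; i < 1000; i++) Increment("counter:page_views");
Flush();
var persisted = db["counter:page_views"];
Console.WriteLine($"db value = {persisted}, db writes = {dbWrites}");
```

A thousand increments collapse into a single database write. Anything incremented after the last flush exists only in the cache, which is exactly the data that is lost if the node dies.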

## 3. Comparing the four patterns

| Pattern | Read latency | Write latency | Consistency | Complexity |
| --- | --- | --- | --- | --- |
| **Cache-Aside** | Slow miss, fast hit | Low (direct DB write) | Eventual (staleness window) | Low |
| **Read-Through** | Slow miss, fast hit | Low | Eventual | Medium |
| **Write-Through** | Always fast (cache hit) | High (sync writes to both) | Strong | Medium |
| **Write-Behind** | Always fast | Very low (cache only) | Weak (possible data loss) | High |

#### In real production

Most large systems **combine multiple patterns**. Example: Cache-Aside for user profiles (read-heavy), Write-Behind for analytics counters (write-heavy), Write-Through for shopping carts (needs consistency).

## 4. Cache invalidation — the hardest problem
> "There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

When the database changes, the cache needs to know so it doesn't serve stale data. Three main strategies:

### 4.1 TTL-Based (Time To Live)

Each cache entry expires after a fixed time. Simplest, but creates the longest staleness window.

```csharp
// Fixed TTL
await _cache.SetStringAsync("product:456", data,
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
    });

// Sliding TTL — resets every read
await _cache.SetStringAsync("session:abc", sessionData,
    new DistributedCacheEntryOptions
    {
        SlidingExpiration = TimeSpan.FromMinutes(30)
    });

```
**TTL rule of thumb:** short TTLs (30s–2min) for rapidly changing data (stock prices, live scores). Medium TTLs (5–15min) for slow-changing data (product catalogs, user profiles). Long TTLs (1–24h) for near-static data (config, translations).
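
The difference between the two expiration modes used above can be simulated without a cache server. This sketch uses a fake clock (`now`) and two hypothetical helpers, `ReadAbsolute` and `ReadSliding`. It models the behavior described in this section, not the internals of any particular cache.

```csharp
using System;

// A simulated clock so the example runs instantly (an assumption for illustration)
var now = TimeSpan.Zero;

// Absolute expiration: a fixed deadline counted from the moment of the write
var absoluteDeadline = now + TimeSpan.FromMinutes(5);

// Sliding expiration: the deadline is pushed out on every successful read
var slidingWindow = TimeSpan.FromMinutes(5);
var slidingDeadline = now + slidingWindow;

bool ReadAbsolute() => now < absoluteDeadline;

bool ReadSliding()
{
    if (now >= slidingDeadline) return false;   // already expired
    slidingDeadline = now + slidingWindow;      // a hit resets the window
    return true;
}

// Read both entries every 4 minutes, five times in a row
var absoluteHits = 0;
var slidingHits = 0;
for (var i = 0; i < 5; i++)
{
    now += TimeSpan.FromMinutes(4);
    if (ReadAbsolute()) absoluteHits++;
    if (ReadSliding()) slidingHits++;
}
Console.WriteLine($"absolute hits: {absoluteHits}, sliding hits: {slidingHits}");
```

The absolute entry is hit once and then expires at the 5-minute mark, while the sliding entry is hit all five times because each read renews it. An entry that is read continuously never expires under sliding expiration alone, so when staleness matters it is worth pairing it with an absolute cap.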

### 4.2 Event-Based Invalidation

When data in the database changes, an event is emitted to invalidate the corresponding cache entry. Most accurate, but more complex.

```mermaid
graph LR
    A[App: UPDATE user] --> B[Database]
    B --> C[CDC / Change Event]
    C --> D[Message Bus]
    D --> E[Cache Invalidator]
    E --> F[Redis: DEL user:123]

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#4CAF50,stroke:#fff,color:#fff
```
Event-based invalidation via Change Data Capture (CDC)

```csharp
// Event-based invalidation in .NET (assuming MediatR's INotificationHandler)
public class UserUpdatedHandler : INotificationHandler<UserUpdatedEvent>
{
    private readonly IDistributedCache _cache;

    public UserUpdatedHandler(IDistributedCache cache) => _cache = cache;

    public async Task Handle(UserUpdatedEvent notification,
        CancellationToken cancellationToken)
    {
        // Invalidate every related cache key
        await _cache.RemoveAsync($"user:{notification.UserId}", cancellationToken);
        await _cache.RemoveAsync($"user-profile:{notification.UserId}", cancellationToken);
        await _cache.RemoveAsync($"user-permissions:{notification.UserId}", cancellationToken);
    }
}
```

### 4.3 Version-Based Invalidation

Instead of deleting the cache entry, embed a version number in the cache key. When data changes, increment the version — clients automatically miss on the old key.

```csharp
// Version-based: no invalidation, just a new version

// Read path: look up the current version, then read the versioned key
var version = await _cache.GetStringAsync("product:456:version") ?? "0";
var cacheKey = $"product:456:v{version}";

var cached = await _cache.GetStringAsync(cacheKey);
if (cached != null) return cached;

// Write path (on product update): bump the version so readers miss on the old key.
// This read-then-write is not atomic; with Redis, INCR on the version key avoids lost updates.
var newVersion = int.Parse(version) + 1;
await _cache.SetStringAsync("product:456:version", newVersion.ToString());
// Old keys expire naturally via TTL, no DEL needed
```
**Pros:** old and new versions can coexist, no race conditions during multi-instance deploys. **Cons:** one extra lookup for the version key.
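
To make the mechanics concrete, here is a self-contained sketch that uses a dictionary as the cache. `CurrentVersion`, `VersionedKey`, and `Bump` are hypothetical helpers introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, string>();   // in-memory stand-in for the cache

// The extra lookup: which version of this product is current?
long CurrentVersion(string id) =>
    cache.TryGetValue($"product:{id}:version", out var v) ? long.Parse(v) : 0;

string VersionedKey(string id) => $"product:{id}:v{CurrentVersion(id)}";

// On update: bump the version; entries under the old key are simply never read again
void Bump(string id) =>
    cache[$"product:{id}:version"] = (CurrentVersion(id) + 1).ToString();

cache[VersionedKey("456")] = "{\"price\": 10}";
Console.WriteLine(cache.ContainsKey(VersionedKey("456")));   // True: hit under v0
Bump("456");
Console.WriteLine(cache.ContainsKey(VersionedKey("456")));   // False: miss under v1
```

The old `v0` entry is still in the cache after the bump. Nothing deletes it, which is why versioned entries need a TTL to avoid accumulating dead data.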

## 5. Consistent hashing — distributing data across a cluster

When a cache cluster has multiple nodes, you need to decide which key lives where. **Consistent hashing** solves this with an important property: adding/removing a node only moves ~1/N of the data (N = number of nodes) instead of rehashing everything.

```
graph TD
    subgraph Hash Ring
    N1[Node A<br/>0-120 deg]
    N2[Node B<br/>120-240 deg]
    N3[Node C<br/>240-360 deg]
    end

    K1[Key user:1<br/>hash=85] --> N1
    K2[Key user:2<br/>hash=190] --> N2
    K3[Key user:3<br/>hash=310] --> N3
    K4[Key user:4<br/>hash=45] --> N1

    style N1 fill:#e94560,stroke:#fff,color:#fff
    style N2 fill:#2c3e50,stroke:#fff,color:#fff
    style N3 fill:#4CAF50,stroke:#fff,color:#fff
    style K1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style K2 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style K3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style K4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50

```
Consistent hashing ring: each key maps to the nearest node clockwise

In practice, Redis Cluster uses **hash slots** (16,384 slots) rather than a pure consistent-hashing ring, but the principle is the same: each node owns a slot range, and scaling only migrates part of the slots.
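
To make the slot mapping concrete, here is a minimal sketch of how a key could be mapped to a slot. Redis Cluster takes the CRC16 (XMODEM variant) of the key modulo 16384, and when the key contains a non-empty `{...}` hash tag, only the tag is hashed. The names `RedisSlot`, `Crc16`, and `KeySlot` are illustrative and not part of any client library; real clients do this calculation for you.

```csharp
using System;
using System.Text;

public static class RedisSlot
{
    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection.
    public static ushort Crc16(ReadOnlySpan<byte> data)
    {
        ushort crc = 0;
        foreach (var b in data)
        {
            crc ^= (ushort)(b << 8);
            for (var i = 0; i < 8; i++)
            {
                // Shift left; XOR in the polynomial when the top bit falls out
                crc = (crc & 0x8000) != 0
                    ? (ushort)(((crc << 1) ^ 0x1021) & 0xFFFF)
                    : (ushort)((crc << 1) & 0xFFFF);
            }
        }
        return crc;
    }

    public static int KeySlot(string key)
    {
        // Hash tag: if the key contains a non-empty {...}, hash only that part
        var open = key.IndexOf('{');
        if (open >= 0)
        {
            var close = key.IndexOf('}', open + 1);
            if (close > open + 1)
                key = key.Substring(open + 1, close - open - 1);
        }
        return Crc16(Encoding.UTF8.GetBytes(key)) % 16384;
    }
}
```

With this sketch, `RedisSlot.KeySlot("foo")` evaluates to 12182, and `{user:1}:profile` and `{user:1}:settings` land in the same slot because only `user:1` is hashed. That is what makes multi-key operations on related keys possible in a cluster.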

#### Virtual nodes fix uneven distribution

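With just 3 physical nodes on the hash ring, distribution can be very skewed (for example, node A gets 60% of keys while node C gets only 10%), because each node owns a single arc whose length depends on where its hash happens to land. Solution: represent each physical node by **100–200 virtual nodes** spread around the ring, so every node owns many small arcs. Result: near-even distribution, typically under 5% variance.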
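
The ring and its virtual nodes can be sketched in a few dozen lines. This is an illustrative, non-thread-safe version: the `HashRing` class, the `node#i` naming of virtual nodes, the default of 150 virtual nodes, and the use of MD5 as the ring hash are all assumptions made for the example, not something a particular library prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed class HashRing
{
    // Ring position -> physical node that owns it
    private readonly SortedDictionary<uint, string> _ring = new();
    private uint[] _points = Array.Empty<uint>(); // sorted ring positions
    private readonly int _virtualNodes;

    public HashRing(IEnumerable<string> nodes, int virtualNodes = 150)
    {
        _virtualNodes = virtualNodes;
        foreach (var node in nodes) AddNode(node);
    }

    public void AddNode(string node)
    {
        // Each physical node is placed on the ring many times
        for (var i = 0; i < _virtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;
        _points = _ring.Keys.ToArray();
    }

    public void RemoveNode(string node)
    {
        var owned = _ring.Where(p => p.Value == node).Select(p => p.Key).ToList();
        foreach (var point in owned) _ring.Remove(point);
        _points = _ring.Keys.ToArray();
    }

    public string GetNode(string key)
    {
        if (_points.Length == 0)
            throw new InvalidOperationException("The ring has no nodes.");

        // First ring position at or after the key's hash, wrapping to the start
        var index = Array.BinarySearch(_points, Hash(key));
        if (index < 0) index = ~index;
        if (index == _points.Length) index = 0;
        return _ring[_points[index]];
    }

    private static uint Hash(string value)
    {
        // MD5 is used only for its spread, not for security
        var digest = MD5.HashData(Encoding.UTF8.GetBytes(value));
        return BitConverter.ToUInt32(digest, 0);
    }
}
```

After `ring.AddNode("D")` on a ring that held A, B, and C, the only keys that change owner are the ones that now map to D; keys never shuffle between the existing nodes. That is the property that keeps most of the cache warm while scaling.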

## 6. Handling cache stampede (thundering herd)

```
sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2
    participant R3 as Request N...
    participant Cache
    participant DB

    Note over Cache: Key "hot:product" just expired!

    R1->>Cache: GET hot:product
    Cache-->>R1: MISS
    R2->>Cache: GET hot:product
    Cache-->>R2: MISS
    R3->>Cache: GET hot:product
    Cache-->>R3: MISS

    R1->>DB: SELECT * FROM products...
    R2->>DB: SELECT * FROM products...
    R3->>DB: SELECT * FROM products...

    Note over DB: Database overload!

```
Cache stampede: hundreds of concurrent misses overwhelm the database

Three mitigation techniques:

### 6.1 Mutex lock (singleflight)

Allow **only one request** to rebuild the cache; others wait for the result:

```csharp
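// Assumes _cache is an IDistributedCache and _redis is a StackExchange.Redis IDatabase
public async Task<T?> GetWithLock<T>(string key,
    Func<Task<T?>> factory, TimeSpan ttl, int maxAttempts = 50)
{
    var lockKey = $"lock:{key}";
    // Unique token so we only ever release a lock we still own
    var lockToken = Guid.NewGuid().ToString("N");

    // Bounded loop instead of unbounded recursion
    for (var attempt = 0; attempt < maxAttempts; attempt++)
    {
        var cached = await _cache.GetStringAsync(key);
        if (cached != null)
            return JsonSerializer.Deserialize<T>(cached);

        // SET lockKey token NX with a 10s expiry
        var lockAcquired = await _redis.StringSetAsync(
            lockKey, lockToken, TimeSpan.FromSeconds(10), When.NotExists);

        if (lockAcquired)
        {
            try
            {
                // Only 1 request rebuilds
                var value = await factory();
                if (value != null)
                {
                    await _cache.SetStringAsync(key,
                        JsonSerializer.Serialize(value),
                        new DistributedCacheEntryOptions
                        {
                            AbsoluteExpirationRelativeToNow = ttl
                        });
                }
                return value;
            }
            finally
            {
                // Deletes the lock only if it still holds our token, so a lock
                // that expired and was re-acquired by someone else is left alone
                await _redis.LockReleaseAsync(lockKey, lockToken);
            }
        }

        // Other requests: wait, then re-check the cache
        await Task.Delay(100);
    }

    // Gave up waiting: fall back to the source rather than retry forever
    return await factory();
}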

```

### 6.2 Randomized TTL (TTL jitter)

Add a random offset to TTL so keys don't expire simultaneously:

```csharp
// Instead of a fixed 5-minute TTL for every key
var baseTtl = TimeSpan.FromMinutes(5);
var jitter = TimeSpan.FromSeconds(Random.Shared.Next(0, 61)); // upper bound is exclusive
var ttl = baseTtl + jitter; // 5:00 to 6:00 minutes

```

### 6.3 Early recompute (probabilistic)

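Before a key expires, a randomly chosen request proactively rebuilds it in the background, so the key is refreshed before it can expire under load. In the full technique the rebuild probability grows as the TTL nears zero; the simplified version here uses a fixed 30% chance once less than 20% of the TTL remains: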

```csharp
// Check whether to rebuild early
var remainingTtl = await _redis.KeyTimeToLiveAsync(key);
if (remainingTtl.HasValue)
{
    var totalTtl = TimeSpan.FromMinutes(5);
    var ratio = remainingTtl.Value / totalTtl;
    // When TTL is under 20%, 30% chance to rebuild
    if (ratio < 0.2 && Random.Shared.NextDouble() < 0.3)
    {
        _ = Task.Run(() => RebuildCacheAsync(key)); // Fire-and-forget
    }
}

```
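
The snippet above uses a fixed probability below a threshold. A variant whose rebuild probability rises smoothly as expiry approaches is probabilistic early expiration, sometimes called XFetch: a request rebuilds when `now - delta * beta * ln(rand)` is at or past the expiry time. The sketch below assumes you store the entry's expiry time and `delta`, the time the last rebuild took, next to the value; `EarlyExpiration` and `ShouldRecompute` are hypothetical names.

```csharp
using System;

public static class EarlyExpiration
{
    // delta: how long the last rebuild took.
    // beta:  tuning knob; values above 1 favor earlier rebuilds.
    public static bool ShouldRecompute(
        DateTimeOffset now, DateTimeOffset expiry, TimeSpan delta,
        double beta = 1.0, Random? random = null)
    {
        random ??= Random.Shared;

        // 1 - NextDouble() lies in (0, 1], so Math.Log never receives 0
        var u = 1.0 - random.NextDouble();

        // Math.Log(u) <= 0, so the head start is a non-negative random duration
        // that is usually small and occasionally large
        var headStart = TimeSpan.FromSeconds(-delta.TotalSeconds * beta * Math.Log(u));

        return now + headStart >= expiry;
    }
}
```

Far from expiry the random head start almost never reaches the expiry time, so nearly every request is served from cache. Close to expiry more and more requests cross the line, and slow rebuilds (large `delta`) start earlier than fast ones.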

## 7. Multi-layer caching — real-world architecture

Production systems typically stack multiple cache layers, each with its own characteristics:

```
graph TD
    U[User Request] --> CDN[L1: CDN / Edge Cache<br/>Cloudflare, CloudFront]
    CDN -->|MISS| LB[Load Balancer]
    LB --> APP[Application Server]
    APP --> L2[L2: In-Process Cache<br/>MemoryCache, ConcurrentDict]
    L2 -->|MISS| L3[L3: Distributed Cache<br/>Redis Cluster]
    L3 -->|MISS| DB[Database<br/>SQL Server, PostgreSQL]

    DB -->|Populate| L3
    L3 -->|Populate| L2
    APP -->|Set Cache-Control| CDN

    style U fill:#e94560,stroke:#fff,color:#fff
    style CDN fill:#2c3e50,stroke:#fff,color:#fff
    style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style L2 fill:#4CAF50,stroke:#fff,color:#fff
    style L3 fill:#4CAF50,stroke:#fff,color:#fff
    style DB fill:#2c3e50,stroke:#fff,color:#fff

```
Multi-layer caching: CDN -> In-Process -> Distributed Cache -> Database

| Layer | Latency | Capacity | Consistency | Scope |
| --- | --- | --- | --- | --- |
| **L1: CDN/Edge** | <10ms (from edge) | Very large (distributed) | Eventual (TTL) | Public, static content |
| **L2: In-Process** | <0.1ms | Small (instance RAM) | Inconsistent across instances | Hot data, per-instance |
| **L3: Distributed** | 1-5ms | Large (cluster RAM) | Consistent (single source) | Shared state |
| **Database** | 10-100ms | Very large (disk) | Strong (ACID) | Source of truth |

```csharp
// Multi-layer cache in .NET
using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;

public class MultiLayerCache<T>
{
    private readonly IMemoryCache _l2;
    private readonly IDistributedCache _l3;

    // Both layers are injected; without this the fields would never be assigned
    public MultiLayerCache(IMemoryCache l2, IDistributedCache l3)
    {
        _l2 = l2;
        _l3 = l3;
    }

    public async Task<T?> Get(string key, Func<Task<T?>> factory)
    {
        // L2: In-process cache (sub-microsecond)
        if (_l2.TryGetValue(key, out T? l2Value))
            return l2Value;

        // L3: Distributed cache (1-5ms)
        var l3Data = await _l3.GetStringAsync(key);
        if (l3Data != null)
        {
            var value = JsonSerializer.Deserialize<T>(l3Data);
            // Repopulate L2 with a short TTL
            _l2.Set(key, value, TimeSpan.FromMinutes(1));
            return value;
        }

        // Database (10-100ms)
        var dbValue = await factory();
        if (dbValue != null)
        {
            var json = JsonSerializer.Serialize(dbValue);
            await _l3.SetStringAsync(key, json,
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
                });
            _l2.Set(key, dbValue, TimeSpan.FromMinutes(1));
        }
        return dbValue;
    }
}

```

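#### L2 cache: why must TTL be shorter than L3?

L2 lives inside each application instance, so a `DEL` or update in Redis never reaches it: every instance keeps serving its own copy until that copy expires. Keeping the L2 TTL much shorter than the L3 TTL (1 minute versus 10 minutes in the code above) bounds how long instances can disagree with each other and with Redis, while still absorbing most reads for hot keys.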

## 8. Redis vs Memcached — pick the right tool

The two most popular distributed cache solutions, each optimized for different use cases:

| Criterion | Redis | Memcached |
| --- | --- | --- |
| Data structures | String, Hash, List, Set, Sorted Set, Stream, JSON, Vector... | String only (key-value) |
| Persistence | RDB + AOF (optional) | None |
| Replication | Master-Replica, Redis Cluster | None (client-side sharding) |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Memory efficiency | Overhead due to data structures | Better for pure key-value |
| Multi-thread | Single-threaded (I/O threads since v6) | Natively multi-threaded |
| Max value size | 512MB | 1MB (default) |

#### Pick Redis when

- You need rich data structures (sorted sets for leaderboards, streams for event logs)
- You need persistence (cache recovery after restart)
- You need pub/sub for cache invalidation across services
- You need complex atomic operations (Lua scripts)
- The cache also serves as session store, rate limiter, or queue

#### Pick Memcached when

- You only need pure key-value caching, no data structures
- You need maximum memory efficiency for a large working set
- Your workload is simple: GET/SET/DELETE
- You want native multi-threading (Memcached scales across cores and handles 200K+ ops/sec more easily on multi-core machines)

## 9. Eviction policies — when the cache fills up

When the cache hits memory limits, you must decide which entries to remove. Common eviction policies:

| Policy | Logic | Fits when |
| --- | --- | --- |
| **LRU** (Least Recently Used) | Evict the entry not accessed for the longest time | Workload has temporal locality — recent access predicts future access |
| **LFU** (Least Frequently Used) | Evict the least frequently accessed entry | Workload has a stable hot set — certain keys are always "hot" |
| **Random** | Evict randomly | No clear access pattern, simplest approach |
| **TTL-based** | Evict expired entries first | When every entry has TTL and you want to prioritize fresh data |
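
To illustrate the LRU row, here is a minimal in-process LRU cache: a dictionary for O(1) lookup plus a linked list that tracks recency. It is a sketch under simple assumptions (single-threaded use, capacity counted in entries rather than bytes), and `LruCache` is a name made up for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    // Front of the list = most recently used, back = least recently used
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            // A read counts as a use: move the entry to the front
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            // Overwrite: drop the old node, the new one goes to the front
            _order.Remove(existing);
        }
        else if (_map.Count >= _capacity)
        {
            // Full: evict the entry at the back of the list
            var lru = _order.Last!;
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        _map[key] = _order.AddFirst((key, value));
    }
}
```

With a capacity of 2, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`: the read made `a` the most recently used entry, so `b` was the one at the back of the list.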

#### What does Redis use by default?

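Out of the box, Redis evicts nothing: the default `maxmemory-policy` is `noeviction`, so writes fail with an error once `maxmemory` is reached. For a cache, the usual choice is `allkeys-lru`, an **approximated LRU** that samples 5 keys by default (`maxmemory-samples`) and evicts the least recently used key of the sample. Since Redis 4.0, you can enable `maxmemory-policy allkeys-lfu` for LFU. In practice, LFU can yield higher hit rates than LRU (on the order of 5–10%) on workloads with clearly hot keys.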

## 10. Monitoring — measuring cache effectiveness

Caches don't prove their value on their own — you need to monitor these metrics:

#### Hit rate

Fraction of requests served from cache. Target: **>90%** for most workloads. If it drops below 80%, check: Is the TTL too short? Is the working set larger than the cache capacity? Are keys too fine-grained to be reused (for example, one key per user)?

```
hit_rate = cache_hits / (cache_hits + cache_misses)
```

#### Eviction rate

Entries evicted per second due to memory pressure. A high eviction rate means the cache is too small for the working set or TTLs are too long. Fix: add memory, shorten TTLs, or cache only hot data.

#### Latency (P50/P99)

Cache read P50 should be <1ms (local) or <5ms (distributed). A P99 above 10ms signals network issues, cluster overload, or oversized values ("big keys").

#### Memory usage

Redis `INFO memory` reports used_memory vs maxmemory. Keep usage under 80% of maxmemory to absorb peak traffic.
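
As a rough sketch of how these numbers can be derived, the helper below parses the text returned by Redis `INFO` and computes the hit rate and memory usage ratio from the `keyspace_hits`, `keyspace_misses`, `used_memory`, and `maxmemory` fields. `CacheStats` and its methods are hypothetical names, and how you fetch the `INFO` text depends on your client.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CacheStats
{
    // INFO output is "name:value" lines, with "# Section" headers in between
    public static Dictionary<string, string> ParseInfo(string info)
    {
        var fields = new Dictionary<string, string>();
        foreach (var raw in info.Split('\n'))
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue;
            var sep = line.IndexOf(':');
            if (sep > 0) fields[line[..sep]] = line[(sep + 1)..];
        }
        return fields;
    }

    // hit_rate = hits / (hits + misses); 0 when there has been no traffic
    public static double HitRate(IReadOnlyDictionary<string, string> fields)
    {
        var hits = long.Parse(fields["keyspace_hits"], CultureInfo.InvariantCulture);
        var misses = long.Parse(fields["keyspace_misses"], CultureInfo.InvariantCulture);
        var total = hits + misses;
        return total == 0 ? 0.0 : (double)hits / total;
    }

    // used_memory / maxmemory; null when maxmemory is 0 (no limit configured)
    public static double? MemoryUsageRatio(IReadOnlyDictionary<string, string> fields)
    {
        var used = long.Parse(fields["used_memory"], CultureInfo.InvariantCulture);
        var max = long.Parse(fields["maxmemory"], CultureInfo.InvariantCulture);
        return max == 0 ? (double?)null : (double)used / max;
    }
}
```

Note that `keyspace_hits` and `keyspace_misses` are counters accumulated since the server started (or since the stats were last reset), so for alerting it is usually more useful to compute the hit rate from the difference between two samples.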

## Conclusion
