Distributed Caching: Designing a Distributed Cache System from A to Z

Posted on: 4/21/2026 2:12:49 AM

<1msLocal cache read latency
1-5msDistributed cache read latency
90%+Target cache hit rate
10-100xDatabase load reduction

Caching is one of the most important techniques in system design. But the gap between "drop Redis between the app and the database" and "design a truly reliable distributed cache system" is huge. This post digs into distributed cache architecture from a system-design perspective: the core patterns, invalidation strategies, data distribution via consistent hashing, cache stampede mitigation, and multi-layer caching design for production systems.

1. Why do we need distributed caching?

Before jumping into patterns, let's understand the problem that distributed caching actually solves:

  • Local cache (in-process, e.g., MemoryCache in .NET) has sub-microsecond latency, but each app instance keeps its own copy — leading to inconsistency when data changes
  • Distributed cache (Redis, Memcached) adds 1–5ms of network latency but provides a single consistent view across all instances
  • A typical database query costs 10–100ms, so a cache hit at 1–5ms is still 10–100x faster
graph LR
    subgraph Local Cache
    A1[App Instance 1] --> LC1[MemoryCache A]
    A2[App Instance 2] --> LC2[MemoryCache B]
    A3[App Instance 3] --> LC3[MemoryCache C]
    LC1 -.->|Different| LC2
    LC2 -.->|Different| LC3
    end

    subgraph Distributed Cache
    B1[App Instance 1] --> DC[Redis Cluster]
    B2[App Instance 2] --> DC
    B3[App Instance 3] --> DC
    end

    style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style LC1 fill:#ff9800,stroke:#fff,color:#fff
    style LC2 fill:#ff9800,stroke:#fff,color:#fff
    style LC3 fill:#ff9800,stroke:#fff,color:#fff
    style B1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style DC fill:#4CAF50,stroke:#fff,color:#fff

Local cache: each instance has its own copy (inconsistent) — Distributed cache: single source of truth

2. The four core caching patterns

Every caching system is built on one of these four patterns. Each has its own trade-offs between consistency, latency, and complexity.

2.1 Cache-Aside (Lazy Loading)

The most common pattern. The application is in full control: on reads, check the cache first; on a miss, fetch from the database and write to the cache. On writes, write directly to the database, then invalidate the cache.

sequenceDiagram
    participant App
    participant Cache
    participant DB

    App->>Cache: GET user:123
    Cache-->>App: MISS
    App->>DB: SELECT * FROM users WHERE id=123
    DB-->>App: {name: "Alice", ...}
    App->>Cache: SET user:123 (TTL 5min)
    App-->>App: Return data

    Note over App,DB: Next read
    App->>Cache: GET user:123
    Cache-->>App: HIT -> {name: "Alice", ...}

Cache-Aside: the application controls both the read and write paths

// Cache-Aside in .NET with IDistributedCache
public async Task<UserProfile?> GetUserProfile(int userId)
{
    var cacheKey = $"user:{userId}";

    // 1. Try the cache
    var cached = await _cache.GetStringAsync(cacheKey);
    if (cached != null)
        return JsonSerializer.Deserialize<UserProfile>(cached);

    // 2. Cache miss -> fetch from DB
    var user = await _dbContext.Users.FindAsync(userId);
    if (user == null) return null;

    // 3. Write to cache with TTL
    await _cache.SetStringAsync(cacheKey,
        JsonSerializer.Serialize(user),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });

    return user;
}

Pros and cons

Pros: only caches data that's actually read (avoids wasted memory), the application has full control of the logic, easy to implement. Cons: the first cache miss is always slow (cold start), and there's a staleness window between a DB update and cache invalidation.

2.2 Read-Through

Similar to Cache-Aside, but the cache layer is responsible for fetching from the database on a miss. The application only talks to the cache and doesn't need to know about the database.

// Read-Through abstraction
public class ReadThroughCache<T>
{
    private readonly IDistributedCache _cache;
    private readonly Func<string, Task<T?>> _loader;

    public async Task<T?> Get(string key)
    {
        var cached = await _cache.GetStringAsync(key);
        if (cached != null)
            return JsonSerializer.Deserialize<T>(cached);

        // Cache loads from data source itself
        var value = await _loader(key);
        if (value != null)
        {
            await _cache.SetStringAsync(key,
                JsonSerializer.Serialize(value),
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
                });
        }
        return value;
    }
}

Key difference: business code doesn't need to know about hits/misses — the cache layer encapsulates that completely. Fits when you want to keep cache logic separate from business logic.

2.3 Write-Through

Every write passes through the cache before going to the database. The cache always contains the latest data, but write latency increases because both writes happen synchronously.

sequenceDiagram
    participant App
    participant Cache
    participant DB

    App->>Cache: SET user:123 = {updated data}
    Cache->>DB: UPDATE users SET ... WHERE id=123
    DB-->>Cache: OK
    Cache-->>App: OK

    Note over App,DB: Reads are always consistent
    App->>Cache: GET user:123
    Cache-->>App: HIT -> {updated data}

Write-Through: the cache proxies every write, ensuring consistency

When to use: when consistency is the top priority — session management, user profiles, shopping carts. Not a fit for write-heavy workloads because every write incurs extra latency.

2.4 Write-Behind (Write-Back)

Writes only go to the cache; the database is updated asynchronously later (batch, periodic flush). This is the fastest write pattern, but risks data loss if a cache node dies before flush.

sequenceDiagram
    participant App
    participant Cache
    participant Queue
    participant DB

    App->>Cache: SET counter:page_views = 15042
    Cache-->>App: OK (immediately)

    Note over Cache,DB: Async flush every 10s
    Cache->>Queue: Batch updates
    Queue->>DB: BULK UPDATE counters
    DB-->>Queue: OK

Write-Behind: write cache first, flush DB later — fast, but risks data loss

When to use: page-view counters, analytics events, log aggregation — write-heavy workloads where losing a few seconds of trailing data is acceptable.

3. Comparing the four patterns

PatternRead latencyWrite latencyConsistencyComplexity
Cache-AsideSlow miss, fast hitLow (direct DB write)Eventual (staleness window)Low
Read-ThroughSlow miss, fast hitLowEventualMedium
Write-ThroughAlways fast (cache hit)High (sync writes to both)StrongMedium
Write-BehindAlways fastVery low (cache only)Weak (possible data loss)High

In real production

Most large systems combine multiple patterns. Example: Cache-Aside for user profiles (read-heavy), Write-Behind for analytics counters (write-heavy), Write-Through for shopping carts (needs consistency).

4. Cache invalidation — the hardest problem

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

When the database changes, the cache needs to know so it doesn't serve stale data. Three main strategies:

4.1 TTL-Based (Time To Live)

Each cache entry expires after a fixed time. Simplest, but creates the longest staleness window.

// Fixed TTL
await _cache.SetStringAsync("product:456", data,
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
    });

// Sliding TTL — resets every read
await _cache.SetStringAsync("session:abc", sessionData,
    new DistributedCacheEntryOptions
    {
        SlidingExpiration = TimeSpan.FromMinutes(30)
    });

TTL rule of thumb: short TTLs (30s–2min) for rapidly changing data (stock prices, live scores). Medium TTLs (5–15min) for slow-changing data (product catalogs, user profiles). Long TTLs (1–24h) for near-static data (config, translations).

4.2 Event-Based Invalidation

When data in the database changes, an event is emitted to invalidate the corresponding cache entry. Most accurate, but more complex.

graph LR
    A[App: UPDATE user] --> B[Database]
    B --> C[CDC / Change Event]
    C --> D[Message Bus]
    D --> E[Cache Invalidator]
    E --> F[Redis: DEL user:123]

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#4CAF50,stroke:#fff,color:#fff

Event-based invalidation via Change Data Capture (CDC)

// Event-based invalidation in .NET
public class UserUpdatedHandler : INotificationHandler<UserUpdatedEvent>
{
    private readonly IDistributedCache _cache;

    public async Task Handle(UserUpdatedEvent notification,
        CancellationToken cancellationToken)
    {
        // Invalidate every related cache key
        await _cache.RemoveAsync($"user:{notification.UserId}");
        await _cache.RemoveAsync($"user-profile:{notification.UserId}");
        await _cache.RemoveAsync($"user-permissions:{notification.UserId}");
    }
}

4.3 Version-Based Invalidation

Instead of deleting the cache entry, embed a version number in the cache key. When data changes, increment the version — clients automatically miss on the old key.

// Version-based: no invalidation, just a new version
var version = await _cache.GetStringAsync("product:456:version") ?? "0";
var cacheKey = $"product:456:v{version}";

var cached = await _cache.GetStringAsync(cacheKey);
if (cached != null) return cached;

// On product update
var newVersion = int.Parse(version) + 1;
await _cache.SetStringAsync("product:456:version", newVersion.ToString());
// Old keys expire naturally via TTL — no DEL needed

Pros: old and new versions can coexist, no race conditions during multi-instance deploys. Cons: one extra lookup for the version key.

5. Consistent hashing — distributing data across a cluster

When a cache cluster has multiple nodes, you need to decide which key lives where. Consistent hashing solves this with an important property: adding/removing a node only moves ~1/N of the data (N = number of nodes) instead of rehashing everything.

graph TD
    subgraph Hash Ring
    N1[Node A
0-120 deg] N2[Node B
120-240 deg] N3[Node C
240-360 deg] end K1[Key user:1
hash=85] --> N1 K2[Key user:2
hash=190] --> N2 K3[Key user:3
hash=310] --> N3 K4[Key user:4
hash=45] --> N1 style N1 fill:#e94560,stroke:#fff,color:#fff style N2 fill:#2c3e50,stroke:#fff,color:#fff style N3 fill:#4CAF50,stroke:#fff,color:#fff style K1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style K2 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50 style K3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50 style K4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Consistent hashing ring: each key maps to the nearest node clockwise

In practice, Redis Cluster uses hash slots (16,384 slots) rather than a pure consistent-hashing ring, but the principle is the same: each node owns a slot range, and scaling only migrates part of the slots.

Virtual nodes fix uneven distribution

With just 3 physical nodes on the hash ring, distribution can be very skewed (node A gets 60% of keys, node C only 10%). Solution: each physical node is represented by 100–200 virtual nodes spread evenly on the ring. Result: near-even distribution, under 5% variance.

6. Handling cache stampede (thundering herd)

Cache stampede happens when a popular cache entry expires and hundreds of concurrent requests all miss the cache and hit the database at the same time. It's one of the most common failures when deploying a cache without guards.

sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2
    participant R3 as Request N...
    participant Cache
    participant DB

    Note over Cache: Key "hot:product" just expired!

    R1->>Cache: GET hot:product
    Cache-->>R1: MISS
    R2->>Cache: GET hot:product
    Cache-->>R2: MISS
    R3->>Cache: GET hot:product
    Cache-->>R3: MISS

    R1->>DB: SELECT * FROM products...
    R2->>DB: SELECT * FROM products...
    R3->>DB: SELECT * FROM products...

    Note over DB: Database overload!

Cache stampede: hundreds of concurrent misses overwhelm the database

Three mitigation techniques:

6.1 Mutex lock (singleflight)

Allow only one request to rebuild the cache; others wait for the result:

public async Task<T?> GetWithLock<T>(string key,
    Func<Task<T?>> factory, TimeSpan ttl)
{
    var cached = await _cache.GetStringAsync(key);
    if (cached != null)
        return JsonSerializer.Deserialize<T>(cached);

    var lockKey = $"lock:{key}";
    var lockAcquired = await _redis.StringSetAsync(
        lockKey, "1", TimeSpan.FromSeconds(10), When.NotExists);

    if (lockAcquired)
    {
        try
        {
            // Only 1 request rebuilds
            var value = await factory();
            if (value != null)
            {
                await _cache.SetStringAsync(key,
                    JsonSerializer.Serialize(value),
                    new DistributedCacheEntryOptions
                    {
                        AbsoluteExpirationRelativeToNow = ttl
                    });
            }
            return value;
        }
        finally
        {
            await _redis.KeyDeleteAsync(lockKey);
        }
    }

    // Other requests: wait and retry
    await Task.Delay(100);
    return await GetWithLock(key, factory, ttl);
}

6.2 Randomized TTL (TTL jitter)

Add a random offset to TTL so keys don't expire simultaneously:

// Instead of a fixed 5-minute TTL for every key
var baseTtl = TimeSpan.FromMinutes(5);
var jitter = TimeSpan.FromSeconds(Random.Shared.Next(0, 60));
var ttl = baseTtl + jitter; // 5:00 to 6:00 minutes

6.3 Early recompute (probabilistic)

Before a key expires, a random request proactively rebuilds it. The probability of rebuilding grows as the TTL nears zero:

// Check whether to rebuild early
var remainingTtl = await _redis.KeyTimeToLiveAsync(key);
if (remainingTtl.HasValue)
{
    var totalTtl = TimeSpan.FromMinutes(5);
    var ratio = remainingTtl.Value / totalTtl;
    // When TTL is under 20%, 30% chance to rebuild
    if (ratio < 0.2 && Random.Shared.NextDouble() < 0.3)
    {
        _ = Task.Run(() => RebuildCacheAsync(key)); // Fire-and-forget
    }
}

7. Multi-layer caching — real-world architecture

Production systems typically stack multiple cache layers, each with its own characteristics:

graph TD
    U[User Request] --> CDN[L1: CDN / Edge Cache
Cloudflare, CloudFront] CDN -->|MISS| LB[Load Balancer] LB --> APP[Application Server] APP --> L2[L2: In-Process Cache
MemoryCache, ConcurrentDict] L2 -->|MISS| L3[L3: Distributed Cache
Redis Cluster] L3 -->|MISS| DB[Database
SQL Server, PostgreSQL] DB -->|Populate| L3 L3 -->|Populate| L2 APP -->|Set Cache-Control| CDN style U fill:#e94560,stroke:#fff,color:#fff style CDN fill:#2c3e50,stroke:#fff,color:#fff style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style L2 fill:#4CAF50,stroke:#fff,color:#fff style L3 fill:#4CAF50,stroke:#fff,color:#fff style DB fill:#2c3e50,stroke:#fff,color:#fff

Multi-layer caching: CDN -> In-Process -> Distributed Cache -> Database

LayerLatencyCapacityConsistencyScope
L1: CDN/Edge<10ms (from edge)Very large (distributed)Eventual (TTL)Public, static content
L2: In-Process<0.1msSmall (instance RAM)Inconsistent across instancesHot data, per-instance
L3: Distributed1-5msLarge (cluster RAM)Consistent (single source)Shared state
Database10-100msVery large (disk)Strong (ACID)Source of truth
// Multi-layer cache in .NET
public class MultiLayerCache<T>
{
    private readonly IMemoryCache _l2;
    private readonly IDistributedCache _l3;

    public async Task<T?> Get(string key, Func<Task<T?>> factory)
    {
        // L2: In-process cache (sub-microsecond)
        if (_l2.TryGetValue(key, out T? l2Value))
            return l2Value;

        // L3: Distributed cache (1-5ms)
        var l3Data = await _l3.GetStringAsync(key);
        if (l3Data != null)
        {
            var value = JsonSerializer.Deserialize<T>(l3Data);
            _l2.Set(key, value, TimeSpan.FromMinutes(1));
            return value;
        }

        // Database (10-100ms)
        var dbValue = await factory();
        if (dbValue != null)
        {
            var json = JsonSerializer.Serialize(dbValue);
            await _l3.SetStringAsync(key, json,
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
                });
            _l2.Set(key, dbValue, TimeSpan.FromMinutes(1));
        }
        return dbValue;
    }
}

L2 cache: why must TTL be shorter than L3?

When data changes, you can invalidate L3 (Redis) via events. But L2 (in-process) on every instance doesn't receive those events — it must wait for TTL expiration. Therefore, L2 TTL should be 10–20% of L3 TTL to reduce the staleness window. Example: L3 = 10 minutes -> L2 = 1–2 minutes.

8. Redis vs Memcached — pick the right tool

The two most popular distributed cache solutions, each optimized for different use cases:

CriterionRedisMemcached
Data structuresString, Hash, List, Set, Sorted Set, Stream, JSON, Vector...String only (key-value)
PersistenceRDB + AOF (optional)None
ReplicationMaster-Replica, Redis ClusterNone (client-side sharding)
Pub/SubYesNo
Lua scriptingYesNo
Memory efficiencyOverhead due to data structuresBetter for pure key-value
Multi-threadSingle-threaded (I/O threads since v6)Natively multi-threaded
Max value size512MB1MB (default)

Pick Redis when

  • You need rich data structures (sorted sets for leaderboards, streams for event logs)
  • You need persistence (cache recovery after restart)
  • You need pub/sub for cache invalidation across services
  • You need complex atomic operations (Lua scripts)
  • The cache also serves as session store, rate limiter, or queue

Pick Memcached when

  • You only need pure key-value caching, no data structures
  • You need maximum memory efficiency for a large working set
  • Your workload is simple: GET/SET/DELETE
  • You want native multi-threading (Memcached handles 200K+ ops/sec on multi-core better)

9. Eviction policies — when the cache fills up

When the cache hits memory limits, you must decide which entries to remove. Common eviction policies:

PolicyLogicFits when
LRU (Least Recently Used)Evict the entry not accessed for the longest timeWorkload has temporal locality — recent access predicts future access
LFU (Least Frequently Used)Evict the least frequently accessed entryWorkload has a stable hot set — certain keys are always "hot"
RandomEvict randomlyNo clear access pattern, simplest approach
TTL-basedEvict expired entries firstWhen every entry has TTL and you want to prioritize fresh data

What does Redis use by default?

Redis uses approximated LRU (samples 5 random keys, evicts the oldest). Since Redis 4.0, you can enable maxmemory-policy allkeys-lfu for LFU. In practice, LFU typically yields 5–10% higher hit rates than LRU on workloads with clearly hot keys.

10. Monitoring — measuring cache effectiveness

Caches don't prove their value on their own — you need to monitor these metrics:

Hit rate

Fraction of requests served from cache. Target: >90% for most workloads. If below 80%, check: TTL too short? Working set larger than cache capacity? Key pattern too unique (a per-user key)?

hit_rate = cache_hits / (cache_hits + cache_misses)

Eviction rate

Entries evicted per second due to memory pressure. A high eviction rate = cache too small or TTL too long. Fix: add memory, shorten TTL, or only cache hot data.

Latency (P50/P99)

Cache read P50 should be <1ms (local) or <5ms (distributed). P99 >10ms signals network issues, cluster overload, or oversized keys.

Memory usage

Redis INFO memory reports used_memory vs maxmemory. Keep usage under 80% of maxmemory to absorb peak traffic.

Conclusion

Designing a distributed cache system isn't just "add Redis" — it's a set of architectural decisions: pick the right pattern (Cache-Aside, Write-Through, Write-Behind), design a matching invalidation strategy (TTL, event-based, version-based), distribute data efficiently via consistent hashing, mitigate cache stampedes, and set up multi-layer caching for each tier of the system.

There's no one-size-fits-all answer. Every system has its own workload pattern — read-heavy or write-heavy, strong or eventual consistency, large or small working set. Understanding the trade-offs helps you make the right design choices.

References: