Distributed Caching: Designing a Distributed Cache System from A to Z
Posted on: 4/21/2026 2:12:49 AM
Table of contents
- 1. Why do we need distributed caching?
- 2. The four core caching patterns
- 3. Comparing the four patterns
- 4. Cache invalidation — the hardest problem
- 5. Consistent hashing — distributing data across a cluster
- 6. Handling cache stampede (thundering herd)
- 7. Multi-layer caching — real-world architecture
- 8. Redis vs Memcached — pick the right tool
- 9. Eviction policies — when the cache fills up
- 10. Monitoring — measuring cache effectiveness
- Conclusion
Caching is one of the most important techniques in system design. But the gap between "drop Redis between the app and the database" and "design a truly reliable distributed cache system" is huge. This post digs into distributed cache architecture from a system-design perspective: the core patterns, invalidation strategies, data distribution via consistent hashing, cache stampede mitigation, and multi-layer caching design for production systems.
1. Why do we need distributed caching?
Before jumping into patterns, let's understand the problem that distributed caching actually solves:
- Local cache (in-process, e.g., MemoryCache in .NET) has sub-microsecond latency, but each app instance keeps its own copy, which leads to inconsistency when data changes
- Distributed cache (Redis, Memcached) adds 1–5ms of network latency but provides a single consistent view across all instances
- A typical database query costs 10–100ms, so a cache hit at 1–5ms is usually around an order of magnitude faster, and up to 100x faster for the slowest queries
graph LR
subgraph Local Cache
A1[App Instance 1] --> LC1[MemoryCache A]
A2[App Instance 2] --> LC2[MemoryCache B]
A3[App Instance 3] --> LC3[MemoryCache C]
LC1 -.->|Different| LC2
LC2 -.->|Different| LC3
end
subgraph Distributed Cache
B1[App Instance 1] --> DC[Redis Cluster]
B2[App Instance 2] --> DC
B3[App Instance 3] --> DC
end
style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style LC1 fill:#ff9800,stroke:#fff,color:#fff
style LC2 fill:#ff9800,stroke:#fff,color:#fff
style LC3 fill:#ff9800,stroke:#fff,color:#fff
style B1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style DC fill:#4CAF50,stroke:#fff,color:#fff
Local cache: each instance has its own copy (inconsistent) — Distributed cache: single source of truth
2. The four core caching patterns
Every caching system is built on one or more of these four patterns. Each makes a different trade-off between consistency, latency, and complexity.
2.1 Cache-Aside (Lazy Loading)
The most common pattern. The application is in full control: on reads, check the cache first; on a miss, fetch from the database and write to the cache. On writes, write directly to the database, then invalidate the cache.
sequenceDiagram
participant App
participant Cache
participant DB
App->>Cache: GET user:123
Cache-->>App: MISS
App->>DB: SELECT * FROM users WHERE id=123
DB-->>App: {name: "Alice", ...}
App->>Cache: SET user:123 (TTL 5min)
App-->>App: Return data
Note over App,DB: Next read
App->>Cache: GET user:123
Cache-->>App: HIT -> {name: "Alice", ...}
Cache-Aside: the application controls both the read and write paths
// Cache-Aside in .NET with IDistributedCache
public async Task<UserProfile?> GetUserProfile(int userId)
{
var cacheKey = $"user:{userId}";
// 1. Try the cache
var cached = await _cache.GetStringAsync(cacheKey);
if (cached != null)
return JsonSerializer.Deserialize<UserProfile>(cached);
// 2. Cache miss -> fetch from DB
var user = await _dbContext.Users.FindAsync(userId);
if (user == null) return null;
// 3. Write to cache with TTL
await _cache.SetStringAsync(cacheKey,
JsonSerializer.Serialize(user),
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
});
return user;
}
Pros and cons
Pros: only caches data that's actually read (avoids wasted memory), the application has full control of the logic, easy to implement. Cons: the first cache miss is always slow (cold start), and there's a staleness window between a DB update and cache invalidation.
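The snippet above covers only the read path. As a minimal sketch of the write path described at the start of this section (write to the database, then invalidate), the example below uses two in-memory dictionaries as stand-ins for the database and for Redis. The class name `CacheAsideUserStore` and its members are hypothetical and exist only for illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class CacheAsideUserStore
{
    private readonly ConcurrentDictionary<int, string> _db = new();       // stand-in for the database
    private readonly ConcurrentDictionary<string, string> _cache = new(); // stand-in for Redis
    private int _dbReads;

    // Number of reads that reached the "database" (i.e. cache misses)
    public int DbReads => Volatile.Read(ref _dbReads);

    public Task<string?> GetNameAsync(int userId)
    {
        var key = $"user:{userId}";
        if (_cache.TryGetValue(key, out var cached))
            return Task.FromResult<string?>(cached);          // cache hit

        Interlocked.Increment(ref _dbReads);                  // cache miss: read the "database"
        if (!_db.TryGetValue(userId, out var name))
            return Task.FromResult<string?>(null);

        _cache[key] = name;                                   // populate for the next reader
        return Task.FromResult<string?>(name);
    }

    public Task UpdateNameAsync(int userId, string name)
    {
        _db[userId] = name;                                   // 1. write the source of truth
        _cache.TryRemove($"user:{userId}", out _);            // 2. invalidate instead of overwriting
        return Task.CompletedTask;
    }
}
```

Deleting the key rather than writing the new value into the cache keeps the write path simple: the next read repopulates the entry from the database, at the cost of one extra miss.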
2.2 Read-Through
Similar to Cache-Aside, but the cache layer is responsible for fetching from the database on a miss. The application only talks to the cache and doesn't need to know about the database.
// Read-Through abstraction
public class ReadThroughCache<T>
{
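private readonly IDistributedCache _cache;
private readonly Func<string, Task<T?>> _loader;
// The loader is injected once, so callers never touch the data source directly
public ReadThroughCache(IDistributedCache cache, Func<string, Task<T?>> loader)
{
_cache = cache;
_loader = loader;
}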
public async Task<T?> Get(string key)
{
var cached = await _cache.GetStringAsync(key);
if (cached != null)
return JsonSerializer.Deserialize<T>(cached);
// Cache loads from data source itself
var value = await _loader(key);
if (value != null)
{
await _cache.SetStringAsync(key,
JsonSerializer.Serialize(value),
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
});
}
return value;
}
}
Key difference: business code doesn't need to know about hits/misses — the cache layer encapsulates that completely. Fits when you want to keep cache logic separate from business logic.
2.3 Write-Through
Every write passes through the cache before going to the database. The cache always contains the latest data, but write latency increases because both writes happen synchronously.
sequenceDiagram
participant App
participant Cache
participant DB
App->>Cache: SET user:123 = {updated data}
Cache->>DB: UPDATE users SET ... WHERE id=123
DB-->>Cache: OK
Cache-->>App: OK
Note over App,DB: Reads are always consistent
App->>Cache: GET user:123
Cache-->>App: HIT -> {updated data}
Write-Through: the cache proxies every write, ensuring consistency
When to use: when consistency is the top priority — session management, user profiles, shopping carts. Not a fit for write-heavy workloads because every write incurs extra latency.
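The write-through flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not a production implementation: `WriteThroughCache` is a hypothetical class, the cache is an in-memory dictionary standing in for Redis, and `writeToDb` is an assumed callback that persists the value.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical write-through wrapper: callers write only through this class.
public class WriteThroughCache
{
    private readonly Dictionary<string, string> _cache = new();   // stand-in for Redis
    private readonly Func<string, string, Task> _writeToDb;       // assumed persistence callback
    private readonly object _gate = new();

    public WriteThroughCache(Func<string, string, Task> writeToDb) => _writeToDb = writeToDb;

    public async Task SetAsync(string key, string value)
    {
        // The write is acknowledged only after the database accepted it.
        // If the database write throws, the cached value is left untouched.
        await _writeToDb(key, value);
        lock (_gate) { _cache[key] = value; }
    }

    public bool TryGet(string key, out string? value)
    {
        lock (_gate) { return _cache.TryGetValue(key, out value); }
    }
}
```

Persisting before updating the cached copy is one possible ordering; it avoids caching a value the database rejected. Either way the caller waits for both steps, which is where the extra write latency comes from.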
2.4 Write-Behind (Write-Back)
Writes only go to the cache; the database is updated asynchronously later (batch, periodic flush). This is the fastest write pattern, but risks data loss if a cache node dies before flush.
sequenceDiagram
participant App
participant Cache
participant Queue
participant DB
App->>Cache: SET counter:page_views = 15042
Cache-->>App: OK (immediately)
Note over Cache,DB: Async flush every 10s
Cache->>Queue: Batch updates
Queue->>DB: BULK UPDATE counters
DB-->>Queue: OK
Write-Behind: write cache first, flush DB later — fast, but risks data loss
When to use: page-view counters, analytics events, log aggregation — write-heavy workloads where losing a few seconds of trailing data is acceptable.
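For the counter use case, a write-behind buffer might look like the sketch below. It is a simplified illustration: `WriteBehindCounter` is a hypothetical class, it accumulates deltas in process memory rather than in Redis, and `bulkWrite` is an assumed callback that applies a batch to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical write-behind counter: increments are absorbed in memory and
// FlushAsync pushes the accumulated deltas to the database in one batch.
public class WriteBehindCounter
{
    private readonly object _gate = new();
    private Dictionary<string, long> _pending = new();
    private readonly Func<IReadOnlyDictionary<string, long>, Task> _bulkWrite; // assumed DB callback

    public WriteBehindCounter(Func<IReadOnlyDictionary<string, long>, Task> bulkWrite)
        => _bulkWrite = bulkWrite;

    public void Increment(string key, long delta = 1)
    {
        lock (_gate)
        {
            _pending.TryGetValue(key, out var current);
            _pending[key] = current + delta;
        }
    }

    // Call from a timer (for example every 10 seconds) and on shutdown.
    // Returns the number of distinct keys written.
    public async Task<int> FlushAsync()
    {
        Dictionary<string, long> batch;
        lock (_gate)
        {
            if (_pending.Count == 0) return 0;
            batch = _pending;                          // swap buffers so writers are not blocked
            _pending = new Dictionary<string, long>();
        }
        try
        {
            await _bulkWrite(batch);
        }
        catch
        {
            // Merge the deltas back so a failed flush does not drop them.
            lock (_gate)
            {
                foreach (var (key, delta) in batch)
                {
                    _pending.TryGetValue(key, out var current);
                    _pending[key] = current + delta;
                }
            }
            throw;
        }
        return batch.Count;
    }
}
```

Anything still sitting in the pending buffer when the process dies is lost, which is exactly the data-loss window this pattern accepts.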
3. Comparing the four patterns
| Pattern | Read latency | Write latency | Consistency | Complexity |
|---|---|---|---|---|
| Cache-Aside | Slow miss, fast hit | Low (direct DB write) | Eventual (staleness window) | Low |
| Read-Through | Slow miss, fast hit | Low | Eventual | Medium |
| Write-Through | Always fast (cache hit) | High (sync writes to both) | Strong | Medium |
| Write-Behind | Always fast | Very low (cache only) | Weak (possible data loss) | High |
In real production
Most large systems combine multiple patterns. Example: Cache-Aside for user profiles (read-heavy), Write-Behind for analytics counters (write-heavy), Write-Through for shopping carts (needs consistency).
4. Cache invalidation — the hardest problem
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
When the database changes, the cache needs to know so it doesn't serve stale data. Three main strategies:
4.1 TTL-Based (Time To Live)
Each cache entry expires after a fixed time. Simplest, but creates the longest staleness window.
// Fixed TTL
await _cache.SetStringAsync("product:456", data,
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
});
// Sliding TTL — resets every read
await _cache.SetStringAsync("session:abc", sessionData,
new DistributedCacheEntryOptions
{
SlidingExpiration = TimeSpan.FromMinutes(30)
});
TTL rule of thumb: short TTLs (30s–2min) for rapidly changing data (stock prices, live scores). Medium TTLs (5–15min) for slow-changing data (product catalogs, user profiles). Long TTLs (1–24h) for near-static data (config, translations).
4.2 Event-Based Invalidation
When data in the database changes, an event is emitted to invalidate the corresponding cache entry. Most accurate, but more complex.
graph LR
A[App: UPDATE user] --> B[Database]
B --> C[CDC / Change Event]
C --> D[Message Bus]
D --> E[Cache Invalidator]
E --> F[Redis: DEL user:123]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#4CAF50,stroke:#fff,color:#fff
Event-based invalidation via Change Data Capture (CDC)
// Event-based invalidation in .NET
public class UserUpdatedHandler : INotificationHandler<UserUpdatedEvent>
{
private readonly IDistributedCache _cache;
public async Task Handle(UserUpdatedEvent notification,
CancellationToken cancellationToken)
{
// Invalidate every related cache key, honoring cancellation
await _cache.RemoveAsync($"user:{notification.UserId}", cancellationToken);
await _cache.RemoveAsync($"user-profile:{notification.UserId}", cancellationToken);
await _cache.RemoveAsync($"user-permissions:{notification.UserId}", cancellationToken);
}
}
4.3 Version-Based Invalidation
Instead of deleting the cache entry, embed a version number in the cache key. When data changes, increment the version — clients automatically miss on the old key.
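// Read path: look up the current version, then read the versioned key
var version = await _cache.GetStringAsync("product:456:version") ?? "0";
var cacheKey = $"product:456:v{version}";
var cached = await _cache.GetStringAsync(cacheKey);
if (cached != null) return cached;
// Write path (on product update): bump the version so readers move to a new key.
// This read-then-write is not atomic; with concurrent writers, prefer an atomic
// increment on the version key (for example Redis INCR).
var newVersion = int.Parse(version) + 1;
await _cache.SetStringAsync("product:456:version", newVersion.ToString());
// Old keys expire naturally via TTL, so no DEL is needed
Pros: old and new versions can coexist, which avoids delete-versus-repopulate races during multi-instance deploys. Cons: one extra lookup for the version key, and superseded entries keep occupying memory until their TTL expires.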
5. Consistent hashing — distributing data across a cluster
When a cache cluster has multiple nodes, you need to decide which key lives where. Consistent hashing solves this with an important property: adding/removing a node only moves ~1/N of the data (N = number of nodes) instead of rehashing everything.
graph TD
subgraph Hash Ring
N1[Node A<br/>0-120 deg]
N2[Node B<br/>120-240 deg]
N3[Node C<br/>240-360 deg]
end
K1[Key user:1<br/>hash=85] --> N1
K2[Key user:2<br/>hash=190] --> N2
K3[Key user:3<br/>hash=310] --> N3
K4[Key user:4<br/>hash=45] --> N1
style N1 fill:#e94560,stroke:#fff,color:#fff
style N2 fill:#2c3e50,stroke:#fff,color:#fff
style N3 fill:#4CAF50,stroke:#fff,color:#fff
style K1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style K2 fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
style K3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style K4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Consistent hashing ring: each key maps to the nearest node clockwise
In practice, Redis Cluster uses hash slots (16,384 slots) rather than a pure consistent-hashing ring, but the principle is the same: each node owns a slot range, and scaling only migrates part of the slots.
Virtual nodes fix uneven distribution
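With just 3 physical nodes on the hash ring, distribution can be very skewed (for example, node A gets 60% of keys and node C only 10%). Solution: each physical node is represented by 100–200 virtual nodes scattered around the ring. Result: a much more even distribution, because the per-node deviation shrinks as the number of virtual nodes grows; with a few hundred virtual nodes per server it typically stays within roughly 5–10% of the mean.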
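A hash ring with virtual nodes can be sketched in a few dozen lines. The version below is an illustration only: the `HashRing` class, the `node#i` naming of virtual nodes, and the use of the first 4 bytes of MD5 as the ring position are all assumptions made for this example, and it is not how Redis Cluster assigns slots.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Illustrative consistent-hash ring with virtual nodes (not thread-safe).
public class HashRing
{
    private readonly SortedDictionary<uint, string> _ring = new(); // ring position -> physical node
    private readonly int _virtualNodes;
    private uint[] _points = Array.Empty<uint>();                  // sorted positions for binary search

    public HashRing(IEnumerable<string> nodes, int virtualNodes = 150)
    {
        _virtualNodes = virtualNodes;
        foreach (var node in nodes) AddNode(node);
    }

    public void AddNode(string node)
    {
        for (var i = 0; i < _virtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;
        Rebuild();
    }

    public void RemoveNode(string node)
    {
        for (var i = 0; i < _virtualNodes; i++)
        {
            var point = Hash($"{node}#{i}");
            // Only remove positions this node actually owns
            if (_ring.TryGetValue(point, out var owner) && owner == node)
                _ring.Remove(point);
        }
        Rebuild();
    }

    public string GetNode(string key)
    {
        if (_points.Length == 0) throw new InvalidOperationException("Ring is empty.");
        var idx = Array.BinarySearch(_points, Hash(key));
        if (idx < 0) idx = ~idx;                // first position clockwise from the key
        if (idx == _points.Length) idx = 0;     // wrap around the ring
        return _ring[_points[idx]];
    }

    private void Rebuild()
    {
        _points = new uint[_ring.Count];
        _ring.Keys.CopyTo(_points, 0);          // SortedDictionary keys are already ordered
    }

    private static uint Hash(string value)
    {
        var digest = MD5.HashData(Encoding.UTF8.GetBytes(value));
        return BitConverter.ToUInt32(digest, 0);
    }
}
```

With this structure, adding a fourth node to a three-node ring only reassigns the keys that now fall just before one of the new node's positions, and every reassigned key lands on the new node; the other nodes never exchange keys with each other.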
6. Handling cache stampede (thundering herd)
Cache stampede happens when a popular cache entry expires and hundreds of concurrent requests all miss the cache and hit the database at the same time. It's one of the most common failures when deploying a cache without guards.
sequenceDiagram
participant R1 as Request 1
participant R2 as Request 2
participant R3 as Request N...
participant Cache
participant DB
Note over Cache: Key "hot:product" just expired!
R1->>Cache: GET hot:product
Cache-->>R1: MISS
R2->>Cache: GET hot:product
Cache-->>R2: MISS
R3->>Cache: GET hot:product
Cache-->>R3: MISS
R1->>DB: SELECT * FROM products...
R2->>DB: SELECT * FROM products...
R3->>DB: SELECT * FROM products...
Note over DB: Database overload!
Cache stampede: hundreds of concurrent misses overwhelm the database
Three mitigation techniques:
6.1 Mutex lock (singleflight)
Allow only one request to rebuild the cache; others wait for the result:
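public async Task<T?> GetWithLock<T>(string key,
    Func<Task<T?>> factory, TimeSpan ttl, int maxAttempts = 50)
{
    var lockKey = $"lock:{key}";
    // Unique token so this caller can only release a lock it still owns
    var token = Guid.NewGuid().ToString("N");
    for (var attempt = 0; attempt < maxAttempts; attempt++)
    {
        var cached = await _cache.GetStringAsync(key);
        if (cached != null)
            return JsonSerializer.Deserialize<T>(cached);
        // SET lockKey token NX with a 10s expiry: only one caller wins
        var lockAcquired = await _redis.StringSetAsync(
            lockKey, token, TimeSpan.FromSeconds(10), When.NotExists);
        if (!lockAcquired)
        {
            // Other requests: wait, then re-check the cache (bounded loop, no recursion)
            await Task.Delay(100);
            continue;
        }
        try
        {
            // Only 1 request rebuilds
            var value = await factory();
            if (value != null)
            {
                await _cache.SetStringAsync(key,
                    JsonSerializer.Serialize(value),
                    new DistributedCacheEntryOptions
                    {
                        AbsoluteExpirationRelativeToNow = ttl
                    });
            }
            return value;
        }
        finally
        {
            // Deletes the lock only if it still holds our token, so a lock that
            // expired and was re-acquired by another request is left alone
            await _redis.LockReleaseAsync(lockKey, token);
        }
    }
    // The lock holder never finished in time: fall back to loading directly
    return await factory();
}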
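The Redis lock coordinates rebuilds across instances. Within a single process, the same idea can be applied without any network round trip by sharing one in-flight task per key. The sketch below is a minimal illustration; `SingleFlight<T>` is a hypothetical helper, not a .NET or Redis API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-process single-flight: concurrent callers for the same key
// share one running factory call instead of each starting their own.
public class SingleFlight<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _inflight = new();

    public async Task<T> RunAsync(string key, Func<Task<T>> factory)
    {
        // Lazy guarantees the factory is started at most once per registered entry
        var lazy = _inflight.GetOrAdd(key, _ => new Lazy<Task<T>>(factory));
        try
        {
            return await lazy.Value;
        }
        finally
        {
            // Remove only our own entry so a newer in-flight call is not dropped
            _inflight.TryRemove(new KeyValuePair<string, Lazy<Task<T>>>(key, lazy));
        }
    }
}
```

Combining both layers is a common arrangement: single-flight collapses concurrent misses inside each instance, and the distributed lock then only has to arbitrate between instances.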
6.2 Randomized TTL (TTL jitter)
Add a random offset to TTL so keys don't expire simultaneously:
// Instead of a fixed 5-minute TTL for every key
var baseTtl = TimeSpan.FromMinutes(5);
var jitter = TimeSpan.FromSeconds(Random.Shared.Next(0, 60));
var ttl = baseTtl + jitter; // 5:00 to 5:59 (the upper bound of Next is exclusive)
6.3 Early recompute (probabilistic)
Before a key expires, a random request proactively rebuilds it. The probability of rebuilding grows as the TTL nears zero:
// Check whether to rebuild early
var remainingTtl = await _redis.KeyTimeToLiveAsync(key);
if (remainingTtl.HasValue)
{
var totalTtl = TimeSpan.FromMinutes(5);
var ratio = remainingTtl.Value / totalTtl;
// In the last 20% of the TTL, the rebuild chance rises linearly from 0% to 100%
var rebuildChance = 1.0 - ratio / 0.2;
if (ratio < 0.2 && Random.Shared.NextDouble() < rebuildChance)
{
_ = Task.Run(() => RebuildCacheAsync(key)); // Fire-and-forget
}
}
7. Multi-layer caching — real-world architecture
Production systems typically stack multiple cache layers, each with its own characteristics:
graph TD
U[User Request] --> CDN[L1: CDN / Edge Cache<br/>Cloudflare, CloudFront]
CDN -->|MISS| LB[Load Balancer]
LB --> APP[Application Server]
APP --> L2[L2: In-Process Cache<br/>MemoryCache, ConcurrentDict]
L2 -->|MISS| L3[L3: Distributed Cache<br/>Redis Cluster]
L3 -->|MISS| DB[Database<br/>SQL Server, PostgreSQL]
DB -->|Populate| L3
L3 -->|Populate| L2
APP -->|Set Cache-Control| CDN
style U fill:#e94560,stroke:#fff,color:#fff
style CDN fill:#2c3e50,stroke:#fff,color:#fff
style LB fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style L2 fill:#4CAF50,stroke:#fff,color:#fff
style L3 fill:#4CAF50,stroke:#fff,color:#fff
style DB fill:#2c3e50,stroke:#fff,color:#fff
Multi-layer caching: CDN -> In-Process -> Distributed Cache -> Database
| Layer | Latency | Capacity | Consistency | Scope |
|---|---|---|---|---|
| L1: CDN/Edge | <10ms (from edge) | Very large (distributed) | Eventual (TTL) | Public, static content |
| L2: In-Process | <0.1ms | Small (instance RAM) | Inconsistent across instances | Hot data, per-instance |
| L3: Distributed | 1-5ms | Large (cluster RAM) | Consistent (single source) | Shared state |
| Database | 10-100ms | Very large (disk) | Strong (ACID) | Source of truth |
// Multi-layer cache in .NET
public class MultiLayerCache<T>
{
private readonly IMemoryCache _l2;
private readonly IDistributedCache _l3;
public MultiLayerCache(IMemoryCache l2, IDistributedCache l3)
{
_l2 = l2;
_l3 = l3;
}
public async Task<T?> Get(string key, Func<Task<T?>> factory)
{
// L2: In-process cache (sub-microsecond)
if (_l2.TryGetValue(key, out T? l2Value))
return l2Value;
// L3: Distributed cache (1-5ms)
var l3Data = await _l3.GetStringAsync(key);
if (l3Data != null)
{
var value = JsonSerializer.Deserialize<T>(l3Data);
_l2.Set(key, value, TimeSpan.FromMinutes(1));
return value;
}
// Database (10-100ms)
var dbValue = await factory();
if (dbValue != null)
{
var json = JsonSerializer.Serialize(dbValue);
await _l3.SetStringAsync(key, json,
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
});
_l2.Set(key, dbValue, TimeSpan.FromMinutes(1));
}
return dbValue;
}
}
L2 cache: why must TTL be shorter than L3?
When data changes, you can invalidate L3 (Redis) via events. But the L2 (in-process) cache on each instance does not receive those events unless you broadcast them to every instance separately, so by default it must wait for TTL expiration. Therefore, L2 TTL should be 10–20% of L3 TTL to keep the staleness window small. Example: L3 = 10 minutes -> L2 = 1–2 minutes.
8. Redis vs Memcached — pick the right tool
The two most popular distributed cache solutions, each optimized for different use cases:
| Criterion | Redis | Memcached |
|---|---|---|
| Data structures | String, Hash, List, Set, Sorted Set, Stream, JSON, Vector... | String only (key-value) |
| Persistence | RDB + AOF (optional) | None |
| Replication | Master-Replica, Redis Cluster | None (client-side sharding) |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Memory efficiency | Overhead due to data structures | Better for pure key-value |
| Multi-thread | Single-threaded (I/O threads since v6) | Natively multi-threaded |
| Max value size | 512MB | 1MB (default) |
Pick Redis when
- You need rich data structures (sorted sets for leaderboards, streams for event logs)
- You need persistence (cache recovery after restart)
- You need pub/sub for cache invalidation across services
- You need complex atomic operations (Lua scripts)
- The cache also serves as session store, rate limiter, or queue
Pick Memcached when
- You only need pure key-value caching, no data structures
- You need maximum memory efficiency for a large working set
- Your workload is simple: GET/SET/DELETE
- You want native multi-threading (Memcached spreads simple GET/SET traffic across cores and can sustain 200K+ ops/sec per node)
9. Eviction policies — when the cache fills up
When the cache hits memory limits, you must decide which entries to remove. Common eviction policies:
| Policy | Logic | Fits when |
|---|---|---|
| LRU (Least Recently Used) | Evict the entry not accessed for the longest time | Workload has temporal locality — recent access predicts future access |
| LFU (Least Frequently Used) | Evict the least frequently accessed entry | Workload has a stable hot set — certain keys are always "hot" |
| Random | Evict randomly | No clear access pattern, simplest approach |
| TTL-based | Evict expired entries first | When every entry has TTL and you want to prioritize fresh data |
What does Redis use by default?
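Out of the box, Redis uses the noeviction policy: once maxmemory is reached, commands that need more memory fail with an error instead of evicting anything. For cache workloads, set maxmemory-policy to allkeys-lru or volatile-lru, which use approximated LRU (Redis samples a handful of keys, 5 by default via maxmemory-samples, and evicts the least recently used of the sample). Since Redis 4.0, you can also enable maxmemory-policy allkeys-lfu for LFU. In practice, LFU often yields a higher hit rate than LRU on workloads with clearly hot keys (5–10% is a reasonable expectation, but measure on your own traffic).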
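For contrast with the sampled approximation, an exact LRU can be built from a dictionary plus a linked list. The sketch below is illustrative only: `LruCache` is a hypothetical class, it is not thread-safe, and it bounds the cache by entry count rather than by memory.

```csharp
using System;
using System.Collections.Generic;

// Exact LRU for illustration; not thread-safe.
public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _index = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new(); // front = most recently used

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Count => _index.Count;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_index.TryGetValue(key, out var node))
        {
            _order.Remove(node);                 // O(1): move to the front on every access
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default!;
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_index.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);             // overwrite: drop the old position
        }
        else if (_index.Count >= _capacity)
        {
            var oldest = _order.Last!;           // least recently used entry
            _order.RemoveLast();
            _index.Remove(oldest.Value.Key);
        }
        var node = new LinkedListNode<(TKey Key, TValue Value)>((key, value));
        _order.AddFirst(node);
        _index[key] = node;
    }
}
```

Exact LRU needs this extra per-entry bookkeeping, which is the overhead a sampling approach avoids at the price of occasionally evicting a key that is not strictly the oldest.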
10. Monitoring — measuring cache effectiveness
Caches don't prove their value on their own — you need to monitor these metrics:
Hit rate
Fraction of requests served from cache. Target: >90% for most workloads. If below 80%, check: TTL too short? Working set larger than cache capacity? Key pattern too unique (a per-user key)?
hit_rate = cache_hits / (cache_hits + cache_misses)
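To make the formula concrete, here is a small sketch that tracks hits and misses and estimates what a given hit rate means for average read latency. `CacheStats` and `ExpectedLatencyMs` are hypothetical names, and the latency model (a miss pays the cache lookup plus the database query) is a simplifying assumption.

```csharp
using System;
using System.Threading;

// Hypothetical in-process counters; real systems usually export these to a metrics backend.
public class CacheStats
{
    private long _hits;
    private long _misses;

    public void RecordHit() => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);

    // hit_rate = cache_hits / (cache_hits + cache_misses); 0 when nothing has been recorded
    public double HitRate
    {
        get
        {
            var hits = Interlocked.Read(ref _hits);
            var misses = Interlocked.Read(ref _misses);
            var total = hits + misses;
            return total == 0 ? 0.0 : (double)hits / total;
        }
    }

    // Average read latency: hits pay the cache cost, misses pay cache lookup plus database query
    public static double ExpectedLatencyMs(double hitRate, double cacheMs, double dbMs)
        => hitRate * cacheMs + (1 - hitRate) * (cacheMs + dbMs);
}
```

Under the assumed costs of 2ms for a cache read and 50ms for a database query, a 90% hit rate averages about 7ms per read while 80% averages about 12ms, which is why a drop of ten points in hit rate is worth investigating.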
Eviction rate
Entries evicted per second due to memory pressure. A high eviction rate = cache too small or TTL too long. Fix: add memory, shorten TTL, or only cache hot data.
Latency (P50/P99)
Cache read P50 should be <1ms (local) or <5ms (distributed). P99 >10ms signals network issues, cluster overload, or oversized values (so-called big keys).
Memory usage
Redis INFO memory reports used_memory vs maxmemory. Keep usage under 80% of maxmemory to absorb peak traffic.
Conclusion
Designing a distributed cache system isn't just "add Redis" — it's a set of architectural decisions: pick the right pattern (Cache-Aside, Write-Through, Write-Behind), design a matching invalidation strategy (TTL, event-based, version-based), distribute data efficiently via consistent hashing, mitigate cache stampedes, and set up multi-layer caching for each tier of the system.
There's no one-size-fits-all answer. Every system has its own workload pattern — read-heavy or write-heavy, strong or eventual consistency, large or small working set. Understanding the trade-offs helps you make the right design choices.