Database Sharding — Data Partitioning Strategies When Your System Hits the Ceiling
Posted on: 4/18/2026 1:11:32 AM
1. Why do we need Database Sharding?
As an app grows from thousands to millions of users, a single-node database hits hard limits in storage, throughput, and latency. Vertical scaling (bigger hardware) only takes you so far — you can't buy infinite CPU for a single machine.
Database Sharding (a.k.a. horizontal partitioning) splits one large database into multiple smaller pieces called shards, each running on its own server. Every shard holds a subset of the full dataset but uses the same schema.
graph TB
Client[👤 Client Application] --> Router[🔀 Shard Router / Proxy]
Router --> S1["📦 Shard 1<br/>User ID 1-1M"]
Router --> S2["📦 Shard 2<br/>User ID 1M-2M"]
Router --> S3["📦 Shard 3<br/>User ID 2M-3M"]
Router --> SN["📦 Shard N<br/>User ID ..."]
S1 --> R1[🔄 Replica 1a]
S2 --> R2[🔄 Replica 2a]
S3 --> R3[🔄 Replica 3a]
style Client fill:#e94560,stroke:#fff,color:#fff
style Router fill:#2c3e50,stroke:#fff,color:#fff
style S1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style S2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style S3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style SN fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style R1 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style R2 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
style R3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
A basic Database Sharding architecture with a shard router and read replicas
2. Sharding strategies
2.1 Hash-based Sharding
Apply a hash function to the shard key to pick the destination shard. This is the most common strategy because it spreads data evenly.
shard_id = hash(shard_key) % number_of_shards
-- Example: user_id = 12345
-- hash(12345) = 7829431
-- shard_id = 7829431 % 4 = 3 → Shard 3
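A minimal runnable version of this routing rule, as a sketch: it assumes a stable digest such as MD5 (Python's built-in hash() is randomized per process, so it isn't suitable for routing), and the shard count and key format are illustrative.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard id with a stable hash."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user:12345"))  # e.g. 3 → route this query to shard 3
```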
✅ Strengths
Even data distribution, no hotspots. PlanetScale recommends hash-based as the default strategy for most use cases because it's simple and effective.
⚠️ Weaknesses
Range queries are inefficient (they scan every shard). Adding/removing shards forces data to be rehashed — which is exactly why Consistent Hashing exists.
2.2 Range-based Sharding
Split data into contiguous ranges of the shard key. A natural fit for data with inherent ordering (timestamps, sequential IDs, geography).
Pros: very fast range queries (only 1-2 shards touched). Easy to reason about and operate.
Cons: prone to hotspots — the shard holding the newest data is hammered the most. For example, the "current month" order shard may absorb 90% of traffic.
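A small sketch of range-based routing, assuming a hard-coded list of range boundaries (the boundaries and shard names here are made up). It also shows why range queries are cheap: only the shards whose ranges overlap the interval are touched.

```python
import bisect

# Exclusive upper bounds of each shard's key range, in ascending order.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]         # shards 0, 1, 2
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]    # shard 3 = overflow

def shard_for(user_id: int) -> str:
    """Pick the shard whose range contains user_id."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]

def shards_for_range(lo: int, hi: int) -> list[str]:
    """Only the shards whose ranges overlap [lo, hi] need to be queried."""
    first = bisect.bisect_right(RANGE_BOUNDS, lo)
    last = bisect.bisect_right(RANGE_BOUNDS, hi)
    return SHARDS[first:last + 1]

print(shard_for(1_500_000))                  # shard_1
print(shards_for_range(900_000, 1_100_000))  # ['shard_0', 'shard_1']
```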
2.3 Directory-based Sharding
Use a separate lookup table (directory) to map each shard key → shard location. The most flexible option, but the directory becomes a single point of failure.
graph LR
App[Application] --> Dir[("📋 Directory<br/>Lookup Table")]
Dir --> |"user_id 1-500K"| S1[Shard 1]
Dir --> |"user_id 500K-800K"| S2[Shard 2]
Dir --> |"user_id 800K-2M"| S3[Shard 3]
Dir --> |"VIP users"| S4[Shard VIP]
style App fill:#e94560,stroke:#fff,color:#fff
style Dir fill:#2c3e50,stroke:#fff,color:#fff
style S1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style S2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style S3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style S4 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
Directory-based sharding enables flexible mapping — even special logic (a VIP shard)
Directory-based is often paired with a cache (Redis, Memcached) to keep lookup latency low. Moving data between shards simply updates the directory — no rehashing required.
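A toy sketch of that pattern: an in-memory dict stands in for the directory table and another for the cache (in production the directory would live in its own small, highly available store with Redis or Memcached in front). The class and key names are illustrative.

```python
class DirectoryRouter:
    """Directory-based routing: lookup table with a simple cache in front."""

    def __init__(self, directory_store: dict[str, str]):
        self.store = directory_store      # stands in for the directory DB table
        self.cache: dict[str, str] = {}   # stands in for Redis/Memcached

    def shard_for(self, shard_key: str) -> str:
        if shard_key in self.cache:
            return self.cache[shard_key]
        shard = self.store[shard_key]     # directory lookup: one extra hop
        self.cache[shard_key] = shard
        return shard

    def move(self, shard_key: str, new_shard: str) -> None:
        """Rebalancing = updating the directory entry (after copying the data)."""
        self.store[shard_key] = new_shard
        self.cache.pop(shard_key, None)   # invalidate the stale cache entry

router = DirectoryRouter({"user:1": "shard_1", "vip:acme": "shard_vip"})
print(router.shard_for("vip:acme"))       # shard_vip
```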
2.4 Geographic Sharding
Distribute data by the user's geographic location — keep it near where it's accessed. Especially important for global apps and data-residency regulations (GDPR, PDPA).
| Strategy | Even distribution | Range query | Resharding | Typical use case |
|---|---|---|---|---|
| Hash-based | ⭐⭐⭐ | ⭐ | ⭐⭐ | User data, session store |
| Range-based | ⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Time-series, log data |
| Directory-based | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Multi-tenant SaaS |
| Geographic | ⭐⭐ | ⭐⭐ | ⭐⭐ | Global apps, data residency |
3. Consistent Hashing & Virtual Nodes
The biggest issue with naïve hash-based sharding (hash(key) % N) is that adding or removing a shard forces most keys to move (rehash). Going from N = 4 to N = 5, about 80% of data must migrate — unacceptable in production.
Consistent Hashing addresses this by arranging the hash space into a circle (the hash ring). Each shard occupies a position on the ring, and each key is mapped to the nearest shard clockwise.
graph TB
subgraph Ring["🔵 Hash Ring (0 → 2³²)"]
direction TB
N1["🟢 Node A
position: 0°"]
N2["🔴 Node B
position: 90°"]
N3["🟡 Node C
position: 180°"]
N4["🟣 Node D
position: 270°"]
end
K1["Key X → hash: 45°"] -.->|"nearest clockwise"| N2
K2["Key Y → hash: 200°"] -.->|"nearest clockwise"| N4
K3["Key Z → hash: 350°"] -.->|"nearest clockwise"| N1
style N1 fill:#4CAF50,stroke:#fff,color:#fff
style N2 fill:#e94560,stroke:#fff,color:#fff
style N3 fill:#ff9800,stroke:#fff,color:#fff
style N4 fill:#9c27b0,stroke:#fff,color:#fff
style K1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style K2 fill:#f8f9fa,stroke:#9c27b0,color:#2c3e50
style K3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
The Consistent Hashing Ring — adding/removing a node only affects keys on adjacent nodes
When Node E is added at position 135°, only keys in the (90°, 135°] range migrate from Node C to Node E: about half of Node C's data, or 12.5% of all keys, instead of ~80% of the whole system.
Virtual Nodes (VNodes)
In practice, a ring with only 4-5 physical nodes ends up unevenly distributed. Virtual Nodes fix this by having each physical node occupy multiple positions on the ring:
-- Node A holds 3 virtual nodes at: 0°, 120°, 240°
-- Node B holds 3 virtual nodes at: 40°, 200°, 280°
-- Node C holds 3 virtual nodes at: 80°, 160°, 320°
-- → distribution is far more uniform
-- When Node A is removed:
-- Data at 0° → moves to the nearest vnode clockwise (Node B at 40°)
-- Data at 120° → moves to Node C at 160°
-- Data at 240° → moves to Node B at 280°
-- → the load is spread across the remaining nodes instead of dumped on one
💡 In practice
Cassandra's classic default was 256 virtual nodes (num_tokens) per physical node; Cassandra 4.0 lowered the default to 16. Amazon's Dynamo design likewise relies on consistent hashing with virtual nodes for partition management. For new designs, start with 128-256 VNodes per node to get an even distribution from day one.
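A compact sketch of a consistent hash ring with virtual nodes, assuming MD5 as the ring hash and a sorted position list with bisect for the clockwise lookup; the node names and vnode count are illustrative, not any particular database's implementation.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    """Stable position on a 32-bit hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 128):
        # Each physical node is placed on the ring `vnodes` times.
        self.ring = sorted((_h(f"{node}#vnode-{i}"), node)
                           for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self.ring]

    def node_for(self, key: str) -> str:
        """First virtual node at or after the key's position (clockwise)."""
        idx = bisect.bisect_left(self._positions, _h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:12345"))
# Adding a fourth node only claims the arcs just before its own vnodes;
# keys on every other arc keep their current owner.
```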
4. The art of picking a Shard Key
Picking the shard key is the single most important decision when designing sharding — a bad choice creates hotspots, spreads cross-shard queries everywhere, and is extremely expensive to fix. The core criteria:
4.1 High Cardinality
The shard key must have many distinct values. user_id (millions of values) is far better than country_code (~200 values). Low cardinality means data concentrates in a handful of shards.
4.2 Even Distribution
Data must spread evenly across shards. For instance, created_date is a poor shard key for range-based sharding because new data always piles onto the last shard.
4.3 Query Pattern Alignment
The shard key should match your most common WHERE clause. If 90% of queries filter by tenant_id, that's your shard key — each query only hits a single shard.
4.4 Stability (Immutable)
The shard key should never change. If a user updates their email and email is the shard key, that row must migrate to another shard — expensive and error-prone.
| Shard Key | Cardinality | Distribution | Stability | Assessment |
|---|---|---|---|---|
| user_id | Very high | Even (hash) | Immutable | ⭐ Best for multi-user apps |
| tenant_id | Medium | Can be skewed | Immutable | ⭐ Good for SaaS, monitor size |
| order_id | Very high | Even | Immutable | ⭐ Good for e-commerce |
| created_at | High | Skewed (new = hot) | Immutable | ⚠️ Only good for archive data |
| country | Low (~200) | Very skewed | Can change | ❌ Avoid using alone |
| email | High | Even | Can change | ❌ Not stable |
⚠️ Compound shard keys
When no single key is perfect, use a compound key. Example: (tenant_id, user_id) groups a tenant's data together (good for isolation), while the user_id component lets an unusually large tenant be split across shards instead of overloading a single one.
5. Challenges of sharding
5.1 Cross-Shard Queries
Queries that need data across multiple shards are the biggest challenge. E.g. SELECT * FROM orders WHERE product_id = 42 when the shard key is user_id — you must scatter the query to EVERY shard and gather the results.
sequenceDiagram
participant App as Application
participant Router as Shard Router
participant S1 as Shard 1
participant S2 as Shard 2
participant S3 as Shard 3
App->>Router: SELECT * FROM orders WHERE product_id = 42
Router->>S1: Forward query
Router->>S2: Forward query
Router->>S3: Forward query
S1-->>Router: 15 rows
S2-->>Router: 8 rows
S3-->>Router: 22 rows
Router->>Router: Merge + Sort + Limit
Router-->>App: Final result (45 rows)
The scatter-gather pattern — cross-shard queries fan out to every shard
Solutions: design the shard key around your primary query pattern. If you need to query along multiple dimensions, use denormalization (duplicate data across shards) or a separate secondary-index shard.
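A rough sketch of the scatter-gather step itself, assuming each shard client exposes some query(sql) call (a hypothetical API, not a real driver): the router fans the query out in parallel, then merges, sorts, and applies the limit.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, sql, order_key, limit):
    """Fan a query out to every shard, then merge, sort, and apply the limit."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: shard.query(sql), shards)
    merged = [row for rows in partials for row in rows]
    merged.sort(key=order_key)
    return merged[:limit]

# rows = scatter_gather(all_shards,
#                       "SELECT * FROM orders WHERE product_id = 42",
#                       order_key=lambda r: r["created_at"],
#                       limit=100)
```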
5.2 Distributed Transactions
Transactions spanning multiple shards require distributed protocols like 2-Phase Commit (2PC) — slow and complex. Prefer designs where every transaction fits within a single shard.
-- ❌ Cross-shard transaction (slow, complex)
BEGIN DISTRIBUTED TRANSACTION
UPDATE shard_1.accounts SET balance = balance - 100 WHERE user_id = 1;
UPDATE shard_3.accounts SET balance = balance + 100 WHERE user_id = 999;
COMMIT
-- ✅ Better design: use a Saga pattern
-- Step 1: Debit user 1 (shard 1) → emit an event
-- Step 2: Credit user 999 (shard 3) → consume the event
-- Step 3: If step 2 fails → compensating transaction on shard 1
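A minimal saga-style sketch of those three steps: each step pairs an action with a compensating action, and a failure rolls back the already-completed steps in reverse order. The debit/credit functions in the commented usage are placeholders, not a real API.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables, executed in order."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):   # undo what already succeeded
            compensate()
        raise

# run_saga([
#     (lambda: debit(shard=1, user_id=1, amount=100),
#      lambda: credit(shard=1, user_id=1, amount=100)),    # compensation
#     (lambda: credit(shard=3, user_id=999, amount=100),
#      lambda: debit(shard=3, user_id=999, amount=100)),
# ])
```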
5.3 Resharding (Rebalancing)
As data grows and you need more shards, moving data between shards (resharding) is extremely expensive. The usual steps:
- Double-write: write to both old and new shards in parallel
- Backfill: copy historical data from the old shard to the new one
- Verify: compare checksums between the two sides
- Cutover: switch traffic to the new shard layout
- Cleanup: delete the now-redundant data from the old shard
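A sketch of the double-write phase from the steps above, assuming writes pass through a small wrapper that targets both the old and new shard layouts while the migration is in flight; the route/insert/get calls on the shard maps are stand-ins, not a specific library.

```python
class DoubleWriteRepository:
    """During resharding, apply every write to both shard layouts."""

    def __init__(self, old_shards, new_shards, migration_active=True):
        self.old = old_shards
        self.new = new_shards
        self.migration_active = migration_active

    def write(self, key, row):
        self.old.route(key).insert(row)       # old layout stays authoritative
        if self.migration_active:
            self.new.route(key).insert(row)   # keep the new layout in sync

    def read(self, key):
        return self.old.route(key).get(key)   # reads stay on the old layout until cutover
```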
✅ Pre-split
If you need 3 shards today, start with 4 (a power of 2): with modulo-based hashing, doubling the shard count lets each shard split cleanly in two, so keys only ever move from shard i to shard i + N. Many teams start with 8-16 logical shards even when only 2-3 physical servers are needed: each server hosts multiple logical shards, and scaling just means moving a logical shard to a new machine.
5.4 Unique Constraints Across Shards
Enforcing a global UNIQUE constraint (e.g. unique email across all shards) can't use a simple database constraint. Options:
- Separate uniqueness table: a small centralized table mapping (email → shard_id)
- Globally unique IDs: Snowflake-style IDs that embed a shard/machine component, or UUIDs/ULIDs that are unique without any cross-shard coordination
- Application-level check: query before insert, accept the rare race condition
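For the Snowflake-style option above, a sketch of a generator that embeds the shard id, so any row's home shard can be recovered from the ID itself; the bit layout and epoch are illustrative, not a specific library's format.

```python
import time
import threading

class ShardAwareIdGenerator:
    """64-bit IDs: 41 bits of milliseconds since a custom epoch,
    10 bits of shard id, 12 bits of per-millisecond sequence."""

    EPOCH_MS = 1_700_000_000_000  # arbitrary custom epoch

    def __init__(self, shard_id: int):
        assert 0 <= shard_id < 1024
        self.shard_id = shard_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
            else:
                self.sequence, self.last_ms = 0, now
            return (now << 22) | (self.shard_id << 12) | self.sequence

def shard_of(snowflake_id: int) -> int:
    return (snowflake_id >> 12) & 0x3FF   # recover the shard id from any ID

gen = ShardAwareIdGenerator(shard_id=7)
print(shard_of(gen.next_id()))            # 7
```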
6. Comparing solutions: Vitess, Citus, CockroachDB, TiDB
As of 2026, four sharding solutions stand out — each serving a different use case:
| Criterion | Vitess | Citus | CockroachDB | TiDB |
|---|---|---|---|---|
| Foundation | MySQL middleware | PostgreSQL extension | Custom (Postgres-compatible) | Custom (MySQL-compatible) |
| Sharding model | Explicit (manual key) | Distributed tables | Automatic range-based | Automatic (region-based) |
| Resharding | Manual, online | Rebalancer tool | Automatic | Automatic (split/merge) |
| Cross-shard queries | Limited | Good | Very good | Very good |
| Consistency | Per-shard (MySQL) | Per-node (Postgres) | Strong (Raft) | Strong (Raft) |
| HTAP | No | Columnar addon | No | Yes (TiFlash) |
| Deployment | Kubernetes-native | Managed + self-host | Managed + self-host | Managed + self-host |
| Origin | YouTube/Google | Microsoft (Azure) | Cockroach Labs | PingCAP |
| Best for | Controlled MySQL scale-out | Postgres + analytics | Multi-region, Postgres-first | MySQL-first, HTAP workloads |
graph TB
subgraph Decision["🤔 Which solution to pick?"]
Start{Current database?}
Start -->|MySQL| MySQL_Q{Need auto-sharding?}
Start -->|PostgreSQL| PG_Q{Need multi-region?}
Start -->|Greenfield| New_Q{Top priority?}
MySQL_Q -->|Yes| TiDB_R[✅ TiDB]
MySQL_Q -->|No, want control| Vitess_R[✅ Vitess]
PG_Q -->|Yes| CRDB_R[✅ CockroachDB]
PG_Q -->|No| Citus_R[✅ Citus]
New_Q -->|HTAP| TiDB_R2[✅ TiDB]
New_Q -->|Multi-region| CRDB_R2[✅ CockroachDB]
New_Q -->|Postgres ecosystem| Citus_R2[✅ Citus]
end
style Start fill:#e94560,stroke:#fff,color:#fff
style MySQL_Q fill:#2c3e50,stroke:#fff,color:#fff
style PG_Q fill:#2c3e50,stroke:#fff,color:#fff
style New_Q fill:#2c3e50,stroke:#fff,color:#fff
style TiDB_R fill:#4CAF50,stroke:#fff,color:#fff
style TiDB_R2 fill:#4CAF50,stroke:#fff,color:#fff
style Vitess_R fill:#4CAF50,stroke:#fff,color:#fff
style CRDB_R fill:#4CAF50,stroke:#fff,color:#fff
style CRDB_R2 fill:#4CAF50,stroke:#fff,color:#fff
style Citus_R fill:#4CAF50,stroke:#fff,color:#fff
style Citus_R2 fill:#4CAF50,stroke:#fff,color:#fff
Decision tree for choosing the right sharding solution
Case study: Ninja Van picked TiDB over Vitess
Ninja Van — a Southeast Asian logistics platform handling millions of orders daily — migrated from MySQL to TiDB instead of choosing Vitess or CockroachDB. Key reasons:
- TiDB is MySQL-protocol-compatible → low migration cost
- Auto-sharding without having to define shard keys by hand
- TiFlash lets them run analytics directly on operational data (HTAP)
- Horizontal scaling on Kubernetes without manual resharding
7. When you should NOT shard
Sharding adds significant complexity. Before going there, make sure you've exhausted simpler options:
⚠️ Premature sharding is an anti-pattern
Many teams shard too early "just in case". In reality, a single-node PostgreSQL on modern (2026) hardware comfortably handles several TB of data and tens of thousands of transactions per second. Sharding before you need it creates operational overhead without adding value.
8. Conclusion
Database Sharding is a powerful tool — but comes with a major complexity tradeoff. Remember:
- Hash-based is the best default strategy for most use cases
- Consistent Hashing with Virtual Nodes solves the resharding problem
- The Shard Key must have high cardinality, be stable, and align with the primary query pattern
- Cross-shard queries are always expensive — design the schema to minimize them
- Modern solutions like TiDB and CockroachDB provide auto-sharding, reducing operational burden
- Always try query optimization → read replicas → caching → vertical scaling before reaching for sharding
In the distributed-systems era of 2026, understanding sharding isn't just about applying it — it's about knowing when not to. That's the mark of a mature systems architect.
References:
- PlanetScale — Sharding Strategies: Directory-based, Range-based, and Hash-based
- PingCAP — Why TiDB Beats Vitess and CockroachDB at Ninja Van
- Last9 — Database Sharding: How It Works and When You Actually Need It
- Hello Interview — Sharding in System Design Interviews
- Brandur — A Comparison of Advanced, Modern Cloud Databases