Database Sharding — Data Partitioning Strategies When Your System Hits the Ceiling

Posted on: 4/18/2026 1:11:32 AM

1. Why do we need Database Sharding?

As an app grows from thousands to millions of users, a single-node database hits hard limits in storage, throughput, and latency. Vertical scaling (bigger hardware) only takes you so far — you can't buy infinite CPU for a single machine.

Database Sharding (a.k.a. horizontal partitioning) splits one large database into multiple smaller pieces called shards, each running on its own server. Every shard holds a subset of the full dataset but uses the same schema.

  • ~100× write throughput gain with the right shards
  • <10 ms query latency on a small shard
  • Theoretically unlimited storage scalability
  • ~70% of large systems use sharding (DB-Engines 2026)
graph TB
    Client[👤 Client Application] --> Router[🔀 Shard Router / Proxy]
    Router --> S1[📦 Shard 1<br/>User ID 1-1M]
    Router --> S2[📦 Shard 2<br/>User ID 1M-2M]
    Router --> S3[📦 Shard 3<br/>User ID 2M-3M]
    Router --> SN[📦 Shard N<br/>User ID ...]
    S1 --> R1[🔄 Replica 1a]
    S2 --> R2[🔄 Replica 2a]
    S3 --> R3[🔄 Replica 3a]
    style Client fill:#e94560,stroke:#fff,color:#fff
    style Router fill:#2c3e50,stroke:#fff,color:#fff
    style S1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style SN fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style R1 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style R2 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style R3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

A basic Database Sharding architecture with a shard router and read replicas

2. Sharding strategies

2.1 Hash-based Sharding

Apply a hash function to the shard key to pick the destination shard. This is the most common strategy because it spreads data evenly.

shard_id = hash(shard_key) % number_of_shards

-- Example: user_id = 12345
-- hash(12345) = 7829431
-- shard_id = 7829431 % 4 = 3 → Shard 3

✅ Strengths

Even data distribution, no hotspots. PlanetScale recommends hash-based as the default strategy for most use cases because it's simple and effective.

⚠️ Weaknesses

Range queries are inefficient (they scan every shard). Adding/removing shards forces data to be rehashed — which is exactly why Consistent Hashing exists.
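The rehashing cost is easy to measure. Here is a minimal Python sketch (using MD5 purely as a stand-in for whatever hash your router uses) that counts how many keys change shards when the shard count grows from 4 to 5:

```python
import hashlib

def shard_for(key: int, num_shards: int) -> int:
    """Naive hash-based routing: hash the key, mod by shard count."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_shards

keys = range(100_000)
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
print(f"{moved / len(keys):.0%} of keys change shards going from 4 to 5")
# ≈ 80% of keys land on a different shard
```

Roughly 80% of keys move, which is exactly the resharding pain that consistent hashing (section 3) avoids.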

2.2 Range-based Sharding

Split data into contiguous ranges of the shard key. A natural fit for data with inherent ordering (timestamps, sequential IDs, geography).

  • Shard A: orders 2024-01 → 2024-06
  • Shard B: orders 2024-07 → 2024-12
  • Shard C: orders 2025-01 → 2025-06
  • Shard D: orders 2025-07 → 2026-04

Pros: very fast range queries (only 1-2 shards touched). Easy to reason about and operate.

Cons: prone to hotspots — the shard holding the newest data is hammered the most. For example, the "current month" order shard may absorb 90% of traffic.
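Routing for the ranges above can be sketched as a simple lookup. The RANGES table is a hypothetical in-memory copy of the shard map, matching the four shards listed above:

```python
from datetime import date

# Hypothetical shard map mirroring the ranges above: (start, end, shard)
RANGES = [
    (date(2024, 1, 1), date(2024, 6, 30), "shard_a"),
    (date(2024, 7, 1), date(2024, 12, 31), "shard_b"),
    (date(2025, 1, 1), date(2025, 6, 30), "shard_c"),
    (date(2025, 7, 1), date(2026, 4, 30), "shard_d"),
]

def shard_for_order(order_date: date) -> str:
    """Route an order to the single shard owning its date range."""
    for start, end, shard in RANGES:
        if start <= order_date <= end:
            return shard
    raise ValueError(f"no shard covers {order_date}")

# A query for March 2025 touches exactly one shard:
print(shard_for_order(date(2025, 3, 15)))  # shard_c
```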

2.3 Directory-based Sharding

Use a separate lookup table (directory) to map each shard key → shard location. The most flexible option, but the directory becomes a single point of failure.

graph LR
    App[Application] --> Dir[(📋 Directory<br/>Lookup Table)]
    Dir --> |"user_id 1-500K"| S1[Shard 1]
    Dir --> |"user_id 500K-800K"| S2[Shard 2]
    Dir --> |"user_id 800K-2M"| S3[Shard 3]
    Dir --> |"VIP users"| S4[Shard VIP]
    style App fill:#e94560,stroke:#fff,color:#fff
    style Dir fill:#2c3e50,stroke:#fff,color:#fff
    style S1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style S4 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

Directory-based sharding enables flexible mapping — even special logic (a VIP shard)

Directory-based is often paired with a cache (Redis, Memcached) to keep lookup latency low. Moving data between shards simply updates the directory — no rehashing required.
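A minimal sketch of directory-based routing, with a plain dict standing in for the Redis/Memcached cache (the function names here are illustrative, not any library's API):

```python
directory = {}   # authoritative mapping: user_id -> shard name
cache = {}       # stand-in for Redis/Memcached

def assign(user_id: int, shard: str) -> None:
    """Move a user to a shard: update the directory, invalidate the cache."""
    directory[user_id] = shard
    cache.pop(user_id, None)

def shard_for(user_id: int) -> str:
    if user_id in cache:          # fast path: cached lookup
        return cache[user_id]
    shard = directory[user_id]    # slow path: directory table
    cache[user_id] = shard
    return shard

assign(42, "shard_1")
print(shard_for(42))       # shard_1
assign(42, "shard_vip")    # moving a user is just a directory update
print(shard_for(42))       # shard_vip
```

Note that "moving" user 42 never rehashes anything: only the directory row changes.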

2.4 Geographic Sharding

Distribute data by the user's geographic location — keep it near where it's accessed. Especially important for global apps and data-residency regulations (GDPR, PDPA).

| Strategy | Even distribution | Range query | Resharding | Typical use case |
|---|---|---|---|---|
| Hash-based | ⭐⭐⭐ | ⭐ | ⭐ | User data, session store |
| Range-based | ⭐ | ⭐⭐⭐ | ⭐⭐ | Time-series, log data |
| Directory-based | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Multi-tenant SaaS |
| Geographic | ⭐⭐ | ⭐⭐ | ⭐⭐ | Global apps, data residency |

3. Consistent Hashing & Virtual Nodes

The biggest issue with naïve hash-based sharding (hash(key) % N) is that adding or removing a shard forces most keys to move (rehash). Going from N = 4 to N = 5, about 80% of data must migrate — unacceptable in production.

Consistent Hashing addresses this by arranging the hash space into a circle (the hash ring). Each shard occupies a position on the ring, and each key is mapped to the nearest shard clockwise.

graph TB
    subgraph Ring["🔵 Hash Ring (0 → 2³²)"]
        direction TB
        N1["🟢 Node A<br/>position: 0°"]
        N2["🔴 Node B<br/>position: 90°"]
        N3["🟡 Node C<br/>position: 180°"]
        N4["🟣 Node D<br/>position: 270°"]
    end
    K1["Key X → hash: 45°"] -.->|"nearest clockwise"| N2
    K2["Key Y → hash: 200°"] -.->|"nearest clockwise"| N4
    K3["Key Z → hash: 350°"] -.->|"nearest clockwise"| N1
    style N1 fill:#4CAF50,stroke:#fff,color:#fff
    style N2 fill:#e94560,stroke:#fff,color:#fff
    style N3 fill:#ff9800,stroke:#fff,color:#fff
    style N4 fill:#9c27b0,stroke:#fff,color:#fff
    style K1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style K2 fill:#f8f9fa,stroke:#9c27b0,color:#2c3e50
    style K3 fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50

The Consistent Hashing Ring — adding/removing a node only affects keys on adjacent nodes

When Node E is added at position 135°, only keys in the (90°, 135°] range migrate from Node C to Node E: half of Node C's data, or about 12.5% of the whole system, not 80% of everything.

Virtual Nodes (VNodes)

In practice, a ring with only 4-5 physical nodes distributes data unevenly. Virtual Nodes fix this by having each physical node occupy multiple positions on the ring:

-- Node A holds 3 virtual nodes at:   0°, 120°, 240°
-- Node B holds 3 virtual nodes at:  40°, 160°, 280°
-- → distribution is far more uniform

-- When Node A is removed:
-- Data at   0° → moves to the nearest vnode (Node B at 40°)
-- Data at 120° → moves to Node C at 160°
-- Data at 240° → moves to Node B at 280°
-- → Load is evenly redistributed across the remaining nodes

💡 In practice

Cassandra uses 256 virtual nodes per physical node by default. Amazon's DynamoDB also uses consistent hashing with virtual nodes for partition management. For new designs, start with 128-256 VNodes per node to get an even distribution from day one.
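The ring with virtual nodes can be sketched in a few lines of Python. This is an illustrative implementation, not any particular library's API; MD5 stands in for the ring hash, and 128 vnodes per node follows the guidance above:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # MD5 as a stand-in ring hash (any uniform hash works)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Illustrative consistent-hash ring; each node owns `vnodes` positions."""

    def __init__(self, nodes, vnodes=128):
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First vnode clockwise from the key's hash (wrapping at the top)
        idx = bisect.bisect_right(self._positions, _h(key)) % len(self._ring)
        return self._ring[idx][1]

ring4 = HashRing(["node_a", "node_b", "node_c", "node_d"])
before = {k: ring4.node_for(k) for k in map(str, range(10_000))}

ring5 = HashRing(["node_a", "node_b", "node_c", "node_d", "node_e"])
moved = sum(1 for k, n in before.items() if ring5.node_for(k) != n)
print(f"{moved / len(before):.0%} of keys moved after adding a 5th node")
```

Adding a fifth node moves only about 20% of keys (the new node's fair share), versus roughly 80% with naive modulo hashing.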

4. The art of picking a Shard Key

Picking the shard key is the single most important decision when designing sharding — a bad choice creates hotspots, spreads cross-shard queries everywhere, and is extremely expensive to fix. The core criteria:

4.1 High Cardinality

The shard key must have many distinct values. user_id (millions of values) is far better than country_code (~200 values). Low cardinality means data concentrates in a handful of shards.

4.2 Even Distribution

Data must spread evenly across shards. For instance, created_date is a poor shard key for range-based sharding because new data always piles onto the last shard.

4.3 Query Pattern Alignment

The shard key should match your most common WHERE clause. If 90% of queries filter by tenant_id, that's your shard key — each query only hits a single shard.

4.4 Stability (Immutable)

The shard key should never change. If a user updates their email and email is the shard key, that row must migrate to another shard — expensive and error-prone.

| Shard Key | Cardinality | Distribution | Stability | Assessment |
|---|---|---|---|---|
| user_id | Very high | Even (hash) | Immutable | ⭐ Best for multi-user apps |
| tenant_id | Medium | Can be skewed | Immutable | ⭐ Good for SaaS, monitor size |
| order_id | Very high | Even | Immutable | ⭐ Good for e-commerce |
| created_at | High | Skewed (new = hot) | Immutable | ⚠️ Only good for archive data |
| country | Low (~200) | Very skewed | Can change | ❌ Avoid using alone |
| email | High | Even | Can change | ❌ Not stable |

⚠️ Compound shard keys

When no single key is perfect, use a compound key. Example: (tenant_id, user_id) keeps a tenant's data on the same shard (good for isolation) while hashing on user_id distributes load evenly within a large tenant.
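One way to sketch this idea (an illustrative two-level scheme with assumed sizing, not any product's API): the tenant hash picks a shard group, giving tenant isolation, while the user hash picks a shard within that group, spreading a large tenant's load:

```python
import hashlib

GROUPS = 4             # shard groups, one per tenant bucket (assumed sizing)
SHARDS_PER_GROUP = 4   # shards inside each group (assumed sizing)

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def shard_for(tenant_id: int, user_id: int) -> tuple:
    """Tenant picks the group (isolation); user hash spreads load within it."""
    group = _h(f"t:{tenant_id}") % GROUPS
    member = _h(f"u:{user_id}") % SHARDS_PER_GROUP
    return (group, member)

# All of tenant 7's rows stay inside one group, spread across its shards:
g1, _ = shard_for(7, 123)
g2, _ = shard_for(7, 456)
assert g1 == g2
```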

5. Challenges of sharding

5.1 Cross-Shard Queries

Queries that need data across multiple shards are the biggest challenge. E.g. SELECT * FROM orders WHERE product_id = 42 when the shard key is user_id — you must scatter the query to EVERY shard and gather the results.

sequenceDiagram
    participant App as Application
    participant Router as Shard Router
    participant S1 as Shard 1
    participant S2 as Shard 2
    participant S3 as Shard 3

    App->>Router: SELECT * FROM orders WHERE product_id = 42
    Router->>S1: Forward query
    Router->>S2: Forward query
    Router->>S3: Forward query
    S1-->>Router: 15 rows
    S2-->>Router: 8 rows
    S3-->>Router: 22 rows
    Router->>Router: Merge + Sort + Limit
    Router-->>App: Final result (45 rows)

The scatter-gather pattern — cross-shard queries fan out to every shard

Solutions: design the shard key around your primary query pattern. If you need to query along multiple dimensions, use denormalization (duplicate data across shards) or a separate secondary-index shard.
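The scatter-gather flow in the diagram above can be sketched in Python. SHARDS and query_shard are in-memory stand-ins for real shard connections; the router fans out in parallel, then merges, sorts, and limits:

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for three shards of the orders table
SHARDS = {
    "shard_1": [{"order_id": 1, "product_id": 42, "total": 90}],
    "shard_2": [{"order_id": 7, "product_id": 13, "total": 20}],
    "shard_3": [{"order_id": 9, "product_id": 42, "total": 55}],
}

def query_shard(shard: str, product_id: int) -> list:
    """Stand-in for running the WHERE product_id = ? query on one shard."""
    return [row for row in SHARDS[shard] if row["product_id"] == product_id]

def scatter_gather(product_id: int, limit: int = 10) -> list:
    # Scatter: query every shard in parallel
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda s: query_shard(s, product_id), SHARDS)
    # Gather: merge, sort, limit on the router
    merged = [row for part in parts for row in part]
    return sorted(merged, key=lambda r: r["total"], reverse=True)[:limit]

print([r["order_id"] for r in scatter_gather(42)])  # [1, 9]
```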

5.2 Distributed Transactions

Transactions spanning multiple shards require distributed protocols like 2-Phase Commit (2PC) — slow and complex. Prefer designs where every transaction fits within a single shard.

-- ❌ Cross-shard transaction (slow, complex)
BEGIN DISTRIBUTED TRANSACTION
  UPDATE shard_1.accounts SET balance = balance - 100 WHERE user_id = 1;
  UPDATE shard_3.accounts SET balance = balance + 100 WHERE user_id = 999;
COMMIT

-- ✅ Better design: use a Saga pattern
-- Step 1: Debit user 1 (shard 1) → emit an event
-- Step 2: Credit user 999 (shard 3) → consume the event
-- Step 3: If step 2 fails → compensating transaction on shard 1
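The saga steps above can be sketched as follows; debit, credit, and the in-memory balances map are hypothetical stand-ins for single-shard local transactions:

```python
# Each (shard, user) pair stands in for a row committed locally on one shard
balances = {("shard_1", 1): 500, ("shard_3", 999): 100}

def debit(shard, user, amount):
    if balances[(shard, user)] < amount:
        raise ValueError("insufficient funds")
    balances[(shard, user)] -= amount

def credit(shard, user, amount):
    balances[(shard, user)] += amount

def transfer(amount):
    debit("shard_1", 1, amount)           # step 1: local txn on shard 1
    try:
        credit("shard_3", 999, amount)    # step 2: local txn on shard 3
    except Exception:
        credit("shard_1", 1, amount)      # compensating txn: refund shard 1
        raise

transfer(100)
print(balances)  # {('shard_1', 1): 400, ('shard_3', 999): 200}
```

No lock is held across shards at any point; consistency is restored by the compensating step instead of a coordinator.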

5.3 Resharding (Rebalancing)

As data grows and you need more shards, moving data between shards (resharding) is extremely expensive. The usual steps:

  1. Double-write: write to both old and new shards in parallel
  2. Backfill: copy historical data from the old shard to the new one
  3. Verify: compare checksums between the two sides
  4. Cutover: switch traffic to the new shard schema
  5. Cleanup: delete the now-redundant data from the old shard

✅ Pre-split

If you need 3 shards today, start with 4 (a power of 2): doubling a power-of-two shard count lets each shard split cleanly in half under hash % N, so data only splits and never reshuffles across unrelated shards. Many teams start with 8-16 logical shards even when only 2-3 physical servers are needed — each server hosts multiple logical shards, and scaling just means moving a logical shard to a new machine.
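A sketch of that logical-shard layout, assuming 16 logical shards and hypothetical server names: keys always hash to the same logical shard, and scaling out only edits the logical-to-physical mapping:

```python
import hashlib

LOGICAL_SHARDS = 16  # fixed forever; key routing never changes

# Two physical servers initially host 8 logical shards each
logical_to_physical = {ls: f"server_{ls % 2}" for ls in range(LOGICAL_SHARDS)}

def route(key: str) -> str:
    """key -> logical shard is stable; only the physical mapping moves."""
    ls = int(hashlib.md5(key.encode()).hexdigest(), 16) % LOGICAL_SHARDS
    return logical_to_physical[ls]

# Scale out: hand three logical shards to a new machine, copy their data,
# then flip the mapping. No key is ever rehashed.
for ls in (0, 3, 6):
    logical_to_physical[ls] = "server_2"
```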

5.4 Unique Constraints Across Shards

Enforcing a global UNIQUE constraint (e.g. unique email across all shards) can't use a simple database constraint. Options:

  • Separate uniqueness table: a small centralized table mapping (email → shard_id)
  • Globally unique IDs: Snowflake IDs, UUIDs, or ULIDs that encode shard info into the ID
  • Application-level check: query before insert, accept the rare race condition
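As one sketch of the globally-unique-ID option, here is a Snowflake-style generator that packs a shard id into the ID itself. The 41/10/12-bit layout and the epoch are assumptions for illustration, not any vendor's exact format:

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # assumed custom epoch

class IdGenerator:
    """Sketch: 41 bits of ms timestamp | 10 bits shard id | 12 bits sequence."""

    def __init__(self, shard_id: int):
        assert 0 <= shard_id < 1024
        self.shard_id = shard_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # same ms: bump sequence
            else:
                self.last_ms, self.seq = now, 0
            return (now << 22) | (self.shard_id << 12) | self.seq

def shard_of(snowflake_id: int) -> int:
    """Recover the shard id from the middle 10 bits."""
    return (snowflake_id >> 12) & 0x3FF

gen = IdGenerator(shard_id=7)
assert shard_of(gen.next_id()) == 7
```

Because the shard id rides inside the ID, any service can route a lookup without consulting a directory.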

6. Comparing solutions: Vitess, Citus, CockroachDB, TiDB

As of 2026, four sharding solutions stand out — each serving a different use case:

| Criterion | Vitess | Citus | CockroachDB | TiDB |
|---|---|---|---|---|
| Foundation | MySQL middleware | PostgreSQL extension | Custom (Postgres-compatible) | Custom (MySQL-compatible) |
| Sharding model | Explicit (manual key) | Distributed tables | Automatic range-based | Automatic (region-based) |
| Resharding | Manual, online | Rebalancer tool | Automatic | Automatic (split/merge) |
| Cross-shard queries | Limited | Good | Very good | Very good |
| Consistency | Per-shard (MySQL) | Per-node (Postgres) | Strong (Raft) | Strong (Raft) |
| HTAP | No | Columnar addon | No | Yes (TiFlash) |
| Deployment | Kubernetes-native | Managed + self-host | Managed + self-host | Managed + self-host |
| Origin | YouTube/Google | Microsoft (Azure) | Cockroach Labs | PingCAP |
| Best for | Controlled MySQL scale-out | Postgres + analytics | Multi-region, Postgres-first | MySQL-first, HTAP workloads |
graph TB
    subgraph Decision["🤔 Which solution to pick?"]
        Start{Current database?}
        Start -->|MySQL| MySQL_Q{Need auto-sharding?}
        Start -->|PostgreSQL| PG_Q{Need multi-region?}
        Start -->|Greenfield| New_Q{Top priority?}

        MySQL_Q -->|Yes| TiDB_R[✅ TiDB]
        MySQL_Q -->|No, want control| Vitess_R[✅ Vitess]

        PG_Q -->|Yes| CRDB_R[✅ CockroachDB]
        PG_Q -->|No| Citus_R[✅ Citus]

        New_Q -->|HTAP| TiDB_R2[✅ TiDB]
        New_Q -->|Multi-region| CRDB_R2[✅ CockroachDB]
        New_Q -->|Postgres ecosystem| Citus_R2[✅ Citus]
    end

    style Start fill:#e94560,stroke:#fff,color:#fff
    style MySQL_Q fill:#2c3e50,stroke:#fff,color:#fff
    style PG_Q fill:#2c3e50,stroke:#fff,color:#fff
    style New_Q fill:#2c3e50,stroke:#fff,color:#fff
    style TiDB_R fill:#4CAF50,stroke:#fff,color:#fff
    style TiDB_R2 fill:#4CAF50,stroke:#fff,color:#fff
    style Vitess_R fill:#4CAF50,stroke:#fff,color:#fff
    style CRDB_R fill:#4CAF50,stroke:#fff,color:#fff
    style CRDB_R2 fill:#4CAF50,stroke:#fff,color:#fff
    style Citus_R fill:#4CAF50,stroke:#fff,color:#fff
    style Citus_R2 fill:#4CAF50,stroke:#fff,color:#fff

Decision tree for choosing the right sharding solution

Case study: Ninja Van picked TiDB over Vitess

Ninja Van — a Southeast Asian logistics platform handling millions of orders daily — migrated from MySQL to TiDB instead of choosing Vitess or CockroachDB. Key reasons:

  • TiDB is MySQL-protocol-compatible → low migration cost
  • Auto-sharding without having to define shard keys by hand
  • TiFlash lets them run analytics directly on operational data (HTAP)
  • Horizontal scaling on Kubernetes without manual resharding

7. When you should NOT shard

Sharding adds significant complexity. Before going there, make sure you've exhausted simpler options:

Step 1: Query optimization
Proper indexes, rewrite slow queries, use EXPLAIN ANALYZE. Most DB bottlenecks come from bad queries, not missing hardware.
Step 2: Read replicas
80% of apps are read-heavy. Adding a read replica can boost throughput 3-5× without sharding. Every managed database (RDS, Cloud SQL) supports it out of the box.
Step 3: Caching layer
Redis/Memcached in front of the DB cuts 70-90% of read traffic. Pair with a CDN for static content. Low cost, fast to ship.
Step 4: Vertical scaling
Upgrade to 64 vCPU, 256 GB RAM, NVMe SSD. Cloud providers let you scale up in minutes. PostgreSQL on good hardware handles millions of rows per second.
Step 5: Table partitioning
Partitioning WITHIN a single database (not sharding). PostgreSQL native partitioning or MySQL partition by range/hash — much simpler than sharding.
Step 6: Sharding
Only when ALL the above aren't enough: a single-node DB genuinely can't sustain the write throughput, storage exceeds the TB range, or you need multi-region data locality.

⚠️ Premature sharding is an anti-pattern

Many teams shard too early "just in case". In reality, a single-node PostgreSQL on modern (2026) hardware comfortably handles several TB of data and tens of thousands of transactions per second. Sharding before you need it creates operational overhead without adding value.

8. Conclusion

Database Sharding is a powerful tool — but comes with a major complexity tradeoff. Remember:

  • Hash-based is the best default strategy for most use cases
  • Consistent Hashing with Virtual Nodes solves the resharding problem
  • The Shard Key must have high cardinality, be stable, and align with the primary query pattern
  • Cross-shard queries are always expensive — design the schema to minimize them
  • Modern solutions like TiDB and CockroachDB provide auto-sharding, reducing operational burden
  • Always try query optimization → read replicas → caching → vertical scaling before reaching for sharding

In the distributed-systems era of 2026, understanding sharding isn't just about applying it — it's about knowing when not to. That's the mark of a mature systems architect.
