MongoDB sharding.
Architecture
[clients] → [mongos] → [shard1 (RS)] [shard2 (RS)] [shard3 (RS)]
               ↓
     [config servers (RS)]
Enable
sh.enableSharding("myapp")
sh.shardCollection("myapp.events", { user_id: "hashed" })
Shard key choice
Good shard key:
- High cardinality: many unique values.
- Even distribution: avoids hot shards.
- Matches common queries: queries can be targeted.
- Not monotonically increasing: avoid { _id: 1 } (all writes go to one shard).
Examples:
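- { user_id: "hashed" }: even distribution; bad for range queries.
- { tenant_id: 1, _id: 1 }: tenant-isolated; queries within a tenant are fast.
- { created_date: 1, user_id: 1 }: time-based + secondary; the leading date still increases monotonically, so new writes concentrate on the newest range.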
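To see why a monotonically increasing ranged key creates a hot shard, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's router: the three chunk ranges, the split points and the `routeRanged` / `countWrites` helpers are all made up for illustration.

```javascript
// Toy model: three shards, each owning one contiguous range of a ranged shard key.
// The split points (1000, 2000) are hypothetical values for illustration.
const chunks = [
  { shard: "shard1", min: -Infinity, max: 1000 },
  { shard: "shard2", min: 1000, max: 2000 },
  { shard: "shard3", min: 2000, max: Infinity }, // open-ended top range
];

// Find the shard whose range [min, max) contains the key.
function routeRanged(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Count how many inserts each shard receives for a list of keys.
function countWrites(keys) {
  const counts = { shard1: 0, shard2: 0, shard3: 0 };
  for (const key of keys) counts[routeRanged(key)] += 1;
  return counts;
}

// New keys keep growing past the last split point, like ObjectIds or timestamps.
const monotonicKeys = Array.from({ length: 500 }, (_, i) => 2000 + i);
const monotonicCounts = countWrites(monotonicKeys);
console.log(monotonicCounts); // every insert lands on shard3
```

Until the top chunk is split and migrated, the shard holding the open-ended range takes all new writes.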
Hashed shard key
sh.shardCollection("myapp.events", { user_id: "hashed" })
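Creates the hashed index automatically when the collection is empty; on a collection that already holds data, create it first with createIndex({ user_id: "hashed" }). Writes are distributed across shards by hash value.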
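A rough sketch of why hashing spreads sequential ids, in plain JavaScript. Assumptions: integer user ids, three equally sized slices of the hash space, and a stand-in multiplicative hash (`toyHash`); MongoDB's real hash function is different, so only the shape of the result carries over.

```javascript
const SHARDS = ["shard1", "shard2", "shard3"];
const HASH_SPACE = 2 ** 32;

// Stand-in hash (Knuth multiplicative hashing); NOT MongoDB's real hash function.
function toyHash(userId) {
  return Math.imul(userId, 2654435761) >>> 0; // unsigned 32-bit result
}

// Each shard owns an equal slice of the hash space, mimicking range chunks over hashed values.
function routeHashed(userId) {
  const slice = Math.floor((toyHash(userId) / HASH_SPACE) * SHARDS.length);
  return SHARDS[slice];
}

// Sequential ids: the worst case for a ranged key.
const hashedCounts = { shard1: 0, shard2: 0, shard3: 0 };
for (let userId = 1; userId <= 3000; userId++) {
  hashedCounts[routeHashed(userId)] += 1;
}
console.log(hashedCounts); // roughly 1000 inserts per shard
```

The trade-off is the one noted above: neighbouring ids end up on different shards, so range queries on the hashed field cannot be targeted.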
Targeted vs scatter-gather
Query with shard key → routed to one shard (fast).
Query without → all shards queried (slow).
Include the shard key in queries wherever possible. With a hashed key, only equality matches on the key are targeted.
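The routing rule can be sketched as a small function. This is a simplification, not how mongos is implemented: it only handles equality filters, and the modulo placement rule in `shardForKey` is a hypothetical placeholder for the real chunk lookup.

```javascript
const ALL_SHARDS = ["shard1", "shard2", "shard3"];
const SHARD_KEY_FIELD = "user_id";

// Placeholder placement rule (hypothetical): integer user_id modulo shard count.
function shardForKey(userId) {
  return ALL_SHARDS[userId % ALL_SHARDS.length];
}

// Return the shards a router would have to contact for a simple equality filter.
function shardsForQuery(filter) {
  if (typeof filter[SHARD_KEY_FIELD] === "number") {
    return [shardForKey(filter[SHARD_KEY_FIELD])]; // targeted: one shard
  }
  return [...ALL_SHARDS]; // scatter-gather: every shard
}

const targeted = shardsForQuery({ user_id: 42, type: "click" });
const scattered = shardsForQuery({ type: "click" });
console.log(targeted.length, scattered.length); // 1 3
```

On a real cluster, running explain() on the query through mongos should show which case applies.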
Balancer
sh.getBalancerState()
sh.startBalancer()
sh.stopBalancer()
A background process redistributes chunks across shards. To restrict it to off-hours, set an active window in the config database:
sh.setBalancerState(true)
db.getSiblingDB("config").settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
  { upsert: true }
)
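A window such as 23:00 to 06:00 wraps past midnight, which is easy to get wrong when reasoning about it. The sketch below shows one way to check whether a given time falls inside such a window. `toMinutes` and `inActiveWindow` are illustrative helpers, not MongoDB functions, and treating the stop time as exclusive is an assumption.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// True when `now` lies inside [start, stop); handles windows that wrap past midnight.
// Assumption: the stop time itself is treated as outside the window.
function inActiveWindow(now, start, stop) {
  const n = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  if (s <= e) return n >= s && n < e; // same-day window
  return n >= s || n < e; // window crosses midnight
}

console.log(inActiveWindow("23:30", "23:00", "06:00")); // true
console.log(inActiveWindow("12:00", "23:00", "06:00")); // false
```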
Chunks
sh.status()
db.getSiblingDB("myapp").events.getShardDistribution()
Default chunk size: 128 MB (MongoDB 6.0+; 64 MB in earlier versions). Chunks are split and migrated as the data grows.
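For capacity planning, a back-of-the-envelope estimate of chunk counts can help. The helpers below are hypothetical and assume every chunk is completely full, so treat the result as a lower bound; real chunks are often smaller.

```javascript
const MB_PER_GB = 1024;

// Minimum number of chunks needed to hold the data if every chunk were full.
function estimateChunks(dataSizeGB, chunkSizeMB = 128) {
  return Math.ceil((dataSizeGB * MB_PER_GB) / chunkSizeMB);
}

// Chunks per shard, assuming the balancer has spread them evenly.
function chunksPerShard(dataSizeGB, shardCount, chunkSizeMB = 128) {
  return Math.ceil(estimateChunks(dataSizeGB, chunkSizeMB) / shardCount);
}

console.log(estimateChunks(300)); // 2400
console.log(chunksPerShard(300, 3)); // 800
```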
Jumbo chunks
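A jumbo chunk exceeds the maximum chunk size and cannot be split automatically, typically because too many documents share one shard key value (a low-cardinality key). Manual fix: split it, then move it.
sh.splitFind("myapp.events", { user_id: "x" })
sh.moveChunk("myapp.events", { user_id: "x" }, "shard2")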
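A chunk cannot be split inside a single shard key value, so key values whose documents add up to more than the chunk size are the usual jumbo suspects. A minimal sketch for spotting them in a sample follows. The `{ key, bytes }` record shape and the `jumboCandidates` helper are hypothetical, and the 128 MB limit is the default mentioned above.

```javascript
const MB = 1024 * 1024;
const CHUNK_LIMIT_BYTES = 128 * MB;

// Sum document sizes per shard key value and return the values that exceed the limit.
// `docs` is a sample of { key, bytes } records (hypothetical shape).
function jumboCandidates(docs, limitBytes = CHUNK_LIMIT_BYTES) {
  const totals = new Map();
  for (const { key, bytes } of docs) {
    totals.set(key, (totals.get(key) || 0) + bytes);
  }
  return [...totals]
    .filter(([, total]) => total > limitBytes)
    .map(([key]) => key);
}

const sample = [
  { key: "x", bytes: 100 * MB },
  { key: "x", bytes: 60 * MB },
  { key: "y", bytes: 20 * MB },
];
const oversized = jumboCandidates(sample);
console.log(oversized); // [ 'x' ]
```

If a value shows up here, splitting will not help for that value; the shard key itself needs more cardinality.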
Zones
sh.addShardToZone("shard1", "us")
sh.updateZoneKeyRange("myapp.users", { region: "us", _id: MinKey }, { region: "us", _id: MaxKey }, "us")
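Use zones for geographic or per-tenant routing: documents whose shard key falls inside a zone's range are kept on the shards assigned to that zone. The range fields must be a prefix of the collection's shard key, here { region: 1, _id: 1 }.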
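The effect of zones on placement can be sketched as a lookup. This toy model matches on `region` only and ignores the `_id` bounds. The "eu" zone on shard2 and the `eligibleShards` helper are hypothetical additions for illustration; only the "us" zone on shard1 comes from the commands above.

```javascript
// Which zones each shard belongs to ("eu" on shard2 is a hypothetical extra).
const shardZones = { shard1: ["us"], shard2: ["eu"], shard3: [] };

// Simplified zone ranges: one region value per zone.
const zoneRanges = [
  { zone: "us", region: "us" },
  { zone: "eu", region: "eu" },
];

// Return the shards allowed to hold a document.
function eligibleShards(doc) {
  const range = zoneRanges.find((r) => r.region === doc.region);
  if (!range) return Object.keys(shardZones); // no zone range covers it: any shard
  return Object.keys(shardZones).filter((s) => shardZones[s].includes(range.zone));
}

console.log(eligibleShards({ region: "us" })); // [ 'shard1' ]
console.log(eligibleShards({ region: "apac" })); // [ 'shard1', 'shard2', 'shard3' ]
```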
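Changing the shard key
sh.reshardCollection("myapp.events", { user_id: 1, created: 1 })
Resharding (MongoDB 5.0+) rewrites the collection under a new shard key; it is an expensive operation. Refining is a separate, cheaper command, refineCollectionShardKey (MongoDB 4.4+), which only appends suffix fields to the existing key.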
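Whether a key change counts as a refinement or needs a full reshard depends on the shape of the new key: a refinement keeps the existing key as an unchanged prefix and appends fields. A small sketch of that check follows; `isRefinement` is an illustrative helper, not a MongoDB API.

```javascript
// True when newKey starts with every field of currentKey (same order, same value)
// and adds at least one more field after them.
function isRefinement(currentKey, newKey) {
  const current = Object.entries(currentKey);
  const proposed = Object.entries(newKey);
  if (proposed.length <= current.length) return false;
  return current.every(
    ([field, kind], i) => proposed[i][0] === field && proposed[i][1] === kind
  );
}

// Appending a suffix to a ranged key is a refinement.
console.log(isRefinement({ user_id: 1 }, { user_id: 1, created: 1 })); // true

// Switching from hashed to ranged changes the prefix, so it needs a reshard.
console.log(isRefinement({ user_id: "hashed" }, { user_id: 1, created: 1 })); // false
```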
Connection
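mongodb://mongos1:27017,mongos2:27017/myapp
Connect to mongos, never directly to the shards. Listing more than one mongos lets the driver keep working if one router goes down.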
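If the mongos host list comes from configuration, a small helper can assemble the connection string. `buildMongosUri` is a hypothetical helper written for this example; the host names and port 27017 are the ones used above.

```javascript
// Build a connection string that lists every mongos router.
// Hosts without an explicit port get the default one.
function buildMongosUri(hosts, database, port = 27017) {
  const hostList = hosts
    .map((host) => (host.includes(":") ? host : `${host}:${port}`))
    .join(",");
  return `mongodb://${hostList}/${database}`;
}

const uri = buildMongosUri(["mongos1", "mongos2"], "myapp");
console.log(uri); // mongodb://mongos1:27017,mongos2:27017/myapp
```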
When to shard
- Working set exceeding single-node RAM.
- Write throughput exceeding single node.
- Geographic distribution.
If unsure: scale vertically first; shard last.
Common mistakes
- Sharding too early.
- { _id: 1 } shard key (all writes to one shard).
- No shard key in queries → scatter-gather.
- Imbalanced data (one tenant huge).
- Forgetting hashed index for hashed shard key.
Read this next
If you want my sharding playbook, it’s at rajpoot.dev.