MongoDB Cheatsheet 09 — Sharding

MongoDB sharding.

Architecture

[clients] → [mongos] → [shard1 (RS)] [shard2 (RS)] [shard3 (RS)]
                       ↓
                       [config servers (RS)]

Enable

sh.enableSharding("myapp")
sh.shardCollection("myapp.events", { user_id: "hashed" })

Shard key choice

Good shard key:

High cardinality: many unique values.
Even distribution: avoids hot shards.
Matches common queries: queries can be targeted.
Monotonically NOT increasing: avoid _id: 1 (all writes to one shard).

Examples:

{ user_id: "hashed" }: even distribution; bad for range queries.
{ tenant_id: 1, _id: 1 }: tenant-isolated; queries within tenant fast.
{ created_date: 1, user_id: 1 }: time-based + secondary.

Hashed shard key

sh.shardCollection("myapp.events", { user_id: "hashed" })

Auto-creates hashed index. Writes distributed.

Targeted vs scatter-gather

Query with shard key → routed to one shard (fast).
Query without → all shards queried (slow).

Try to include shard key in queries.

Balancer

sh.getBalancerState()
sh.startBalancer()
sh.stopBalancer()

Background process redistributes chunks. Run during off-hours if needed:

sh.setBalancerState(true)
db.settings.updateOne(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "23:00", stop: "06:00" } } }
)

Chunks

sh.status()
db.adminCommand({ getShardDistribution: 1 })

Default chunk: 128MB. Splits + migrates as it grows.

Jumbo chunks

Chunk exceeds size and can’t split (bad shard key choice). Manual fix:

sh.splitFind("myapp.events", { user_id: "x" })
sh.moveChunk("myapp.events", { user_id: "x" }, "shard02")

Zones

sh.addShardToZone("shard1", "us")
sh.updateZoneKeyRange("myapp.users", { region: "us", _id: MinKey }, { region: "us", _id: MaxKey }, "us")

Geo / tenant routing.

Refining shard key

sh.reshardCollection("myapp.events", { user_id: 1, created: 1 })

Available in newer versions; expensive operation.

Connection

mongodb://mongos1,mongos2:27017/myapp

Connect to mongos, not shards.

When to shard

single-node RAM for working set.
Write throughput exceeding single node.
Geographic distribution.

If unsure: scale vertically first; shard last.

Common mistakes

Sharding too early.
_id: 1 shard key (all writes to one shard).
No shard key in queries → scatter-gather.
Imbalanced data (one tenant huge).
Forgetting hashed index for hashed shard key.

Architecture#

Enable#

Shard key choice#

Hashed shard key#

Targeted vs scatter-gather#

Balancer#

Chunks#

Jumbo chunks#

Zones#

Refining shard key#

Connection#

When to shard#

Common mistakes#

Read this next#