MongoDB replication.

Replica set basics

3 members: 1 primary + 2 secondaries (or 1 secondary + 1 arbiter, which only supplies the 3rd vote).
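
Why three? Elections need a strict majority of voting members. A minimal Python sketch of that arithmetic (the function names are illustrative, not part of any MongoDB API):

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (2, 3, 5):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n), "down")
```

Two members tolerate zero failures, the same as a single node; three tolerate one.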

# mongod.conf
replication:
  replSetName: rs0

// mongosh: run once, on one member
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "node1:27017" },
        { _id: 1, host: "node2:27017" },
        { _id: 2, host: "node3:27017" },
    ],
})

rs.status()
rs.conf()

Connection

mongodb://node1,node2,node3/myapp?replicaSet=rs0

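The hosts are a seed list: the driver discovers the full member set from whichever seed it reaches, finds the current primary, and follows it across failovers.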
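
A small sketch of assembling that URI in Python, assuming no credentials (those would need percent-encoding). `replica_set_uri` is a hypothetical helper, not a driver function:

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, database, replica_set, **options):
    """Build a mongodb:// seed-list URI. Hosts without a port use the default 27017."""
    if not hosts:
        raise ValueError("at least one seed host is required")
    # replicaSet comes first, then any extra URI options in the order given.
    query = urlencode({"replicaSet": replica_set, **options})
    return f"mongodb://{','.join(hosts)}/{database}?{query}"


uri = replica_set_uri(["node1", "node2", "node3"], "myapp", "rs0")
print(uri)
```

Listing several seeds matters: if the only listed host is down at startup, the driver has nothing to discover the set from.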

Write/read concern

db.coll.insertOne({...}, { writeConcern: { w: "majority", j: true, wtimeout: 5000 } })
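  • w: 1: acknowledged by the primary only. This was the default before MongoDB 5.0; since 5.0 the implicit default is w: "majority" (deployments with arbiters can differ).
  • w: "majority": acknowledged by a majority of data-bearing voting members → durable on failover.
  • j: true: written to the on-disk journal before the ack.
  • wtimeout: 5000: stop waiting after 5 s. A timeout returns an error but does not undo the write.

db.coll.find({}).readConcern("majority")   // only majority-acknowledged data
db.coll.find({}).readConcern("snapshot")   // majority-committed data from a single point in time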

Read preference

  • primary (default)
  • primaryPreferred
  • secondary
  • secondaryPreferred
  • nearest

# PyMongo (the other snippets here are mongosh)
from pymongo import ReadPreference
db.coll.with_options(read_preference=ReadPreference.SECONDARY_PREFERRED).find({})
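
What the five modes mean for routing, as a toy Python model. It ignores latency windows, tag sets, and staleness limits, which real drivers also apply; `eligible_members` is an illustrative name, not a driver API:

```python
def eligible_members(mode, members):
    """Return the hosts a read may be routed to under a read preference mode.

    `members` maps host -> state ("PRIMARY" or "SECONDARY"); unreachable
    members are simply left out of the dict.
    """
    primaries = [h for h, s in members.items() if s == "PRIMARY"]
    secondaries = [h for h, s in members.items() if s == "SECONDARY"]
    if mode == "primary":
        return primaries
    if mode == "primaryPreferred":
        return primaries or secondaries      # fall back only if no primary
    if mode == "secondary":
        return secondaries
    if mode == "secondaryPreferred":
        return secondaries or primaries      # fall back only if no secondary
    if mode == "nearest":
        return primaries + secondaries       # any member; the driver picks by latency
    raise ValueError(f"unknown read preference mode: {mode}")


healthy = {"node1": "PRIMARY", "node2": "SECONDARY", "node3": "SECONDARY"}
mid_election = {"node2": "SECONDARY", "node3": "SECONDARY"}
```

With `mid_election`, mode "primary" has nowhere to read from, while "primaryPreferred" keeps serving (possibly stale) reads from the secondaries.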

Failover

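Automatic: when the primary becomes unreachable, the remaining members hold an election and promote a secondary. Expect roughly 5-30 s without a primary; with default settings (electionTimeoutMillis of 10 s) MongoDB's documentation puts the typical figure at about 12 s. Writes cannot be accepted during that window.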

rs.stepDown(60)   // primary steps down and stays ineligible for re-election for 60 s
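
Applications have to ride out the window with no primary. A minimal retry sketch in Python, assuming the operation is idempotent; `NoPrimaryAvailable` is a hypothetical stand-in for whatever error your driver raises, and this does not replace the driver's own retryable reads and writes:

```python
import time


class NoPrimaryAvailable(Exception):
    """Hypothetical stand-in for a driver's 'no primary' / 'not primary' error."""


def retry_during_failover(operation, attempts=6, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on NoPrimaryAvailable, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except NoPrimaryAvailable:
            if attempt == attempts - 1:
                raise                       # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
```

With the defaults the waits add up to 15.5 s (0.5 + 1 + 2 + 4 + 8), which is an assumption sized to cover a typical election; tune it to your own tolerance.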

Arbiter

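3rd member that votes in elections but stores no data. Avoid in production: with one data-bearing member down, only a single copy of the data remains and w: "majority" writes can no longer be acknowledged.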
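
The difference between three data-bearing members and two plus an arbiter shows up when one data node is down. A toy Python model, assuming every member has one vote (names are illustrative):

```python
def cluster_health(members, down=()):
    """Toy model: `members` maps host -> "data" or "arbiter"; every member votes.

    Returns (can_elect_primary, can_ack_majority_writes).
    """
    up = {h: kind for h, kind in members.items() if h not in down}
    needed = len(members) // 2 + 1                      # majority of voting members
    data_up = sum(1 for kind in up.values() if kind == "data")
    can_elect = len(up) >= needed and data_up >= 1      # arbiters vote but cannot be primary
    can_ack_majority = can_elect and data_up >= needed  # only data-bearing members can ack
    return can_elect, can_ack_majority


pss = {"node1": "data", "node2": "data", "node3": "data"}
psa = {"node1": "data", "node2": "data", "node3": "arbiter"}

print(cluster_health(pss, down={"node2"}))
print(cluster_health(psa, down={"node2"}))
```

Both layouts can still elect a primary, but only the all-data layout can still acknowledge majority writes.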

Hidden / delayed members

rs.reconfig({
    _id: "rs0",
    members: [
        ...,
        { _id: 3, host: "...", hidden: true, priority: 0, secondaryDelaySecs: 3600 },
    ],
})

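Hidden: invisible to clients, so it serves no application reads. Delayed: applies writes 1 hour late (secondaryDelaySecs: 3600; the field was slaveDelay before MongoDB 5.0), leaving a window to recover from a bad write or dropped collection. Hidden + delayed is a common setup for offline backups.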
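
Hidden and delayed members must both have priority 0, so they can never become primary. A small Python check for a member document before passing it to rs.reconfig (a sketch covering only these two rules; the host name in the example is a placeholder):

```python
def member_config_errors(member):
    """Return a list of problems with one replica set member document."""
    errors = []
    priority = member.get("priority", 1)   # priority defaults to 1 when omitted
    if member.get("hidden", False) and priority != 0:
        errors.append("hidden members must have priority 0")
    if member.get("secondaryDelaySecs", 0) > 0 and priority != 0:
        errors.append("delayed members must have priority 0")
    return errors


backup = {"_id": 3, "host": "backup.example:27017", "hidden": True,
          "priority": 0, "secondaryDelaySecs": 3600}
print(member_config_errors(backup))
```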

Oplog

use local
db.oplog.rs.find().sort({ ts: -1 }).limit(5)

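The replication log: a capped collection recording every write, which secondaries replay to stay in sync. Because it is capped, old entries roll off, so size matters: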

replication:
  oplogSizeMB: 10240

Bigger oplog = longer history window = a secondary can be offline or lagging for longer and still catch up without a full resync.
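
A back-of-the-envelope way to size it, sketched in Python. It assumes a steady write rate, which real workloads rarely have; the rates below are made-up inputs, and rs.printReplicationInfo() in mongosh reports the actual window:

```python
def oplog_window_hours(oplog_size_mb, oplog_mb_per_hour):
    """Rough hours of history an oplog holds at a steady write rate."""
    if oplog_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / oplog_mb_per_hour


def survives_outage(oplog_size_mb, oplog_mb_per_hour, outage_hours):
    """True if a member offline for outage_hours can still catch up from the oplog."""
    return outage_hours < oplog_window_hours(oplog_size_mb, oplog_mb_per_hour)


# 10240 MB oplog, assumed 512 MB of oplog entries generated per hour
print(oplog_window_hours(10240, 512))
```

Size for the longest outage you want to survive (a weekend, a maintenance window) at peak write rate, not the average.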

Sync sources

rs.syncFrom("node2:27017")

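Forces the sync source for the member you run it on. The override is temporary: the member reverts to automatic selection after a restart or if the connection to that source drops.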

Resync

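If a member has fallen so far behind that the entries it needs have rolled off the oplog, it cannot catch up and must be resynced:

# On the stale member only: stop, wipe the data directory, restart
systemctl stop mongod
rm -rf /var/lib/mongo/*    # must match storage.dbPath in mongod.conf
systemctl start mongod

Starting with an empty data directory triggers an initial sync: a full copy from another member (not necessarily the primary).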

Change streams

db.coll.watch([{ $match: { operationType: "insert" } }])

For CDC (change data capture). Requires a replica set or sharded cluster, because change streams are built on the oplog.
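
A consumer usually checkpoints the resume token (each event's _id) so it can pick up where it left off. A minimal Python sketch of that loop, written against any iterable of change event documents so it runs without a server; `handle` and `save_token` are hypothetical callbacks you would supply:

```python
def consume_changes(events, handle, save_token):
    """Process change events in order, checkpointing each resume token.

    `events` is any iterable of change event documents; with PyMongo this
    could be the cursor returned by collection.watch(...).
    """
    processed = 0
    for event in events:
        if event["operationType"] == "insert":
            handle(event["fullDocument"])
        save_token(event["_id"])   # checkpoint only after handling succeeds
        processed += 1
    return processed
```

After a restart, the saved token can be passed back as resume_after when reopening the stream, provided the corresponding entry is still in the oplog.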

Election priorities

{ _id: 0, host: "node1", priority: 10 },   // highest priority: preferred primary
{ _id: 1, host: "node2", priority: 5 },
{ _id: 2, host: "node3", priority: 1 },

Common mistakes

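  • 2-node “replica set”: no quorum once either member is down, so no primary can be elected.
  • w: 1, then surprise when acknowledged writes are rolled back on failover.
  • Bypassing the replica set with a direct connection to one host, which breaks as soon as that host stops being primary.
  • Oplog too small → lagging members need a full resync.
  • Reading from a secondary and expecting strong consistency. Replication is asynchronous, so secondaries can return stale data.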


