Level 2 · 25 min

Replication

MongoDB replication provides high availability and data durability through replica sets. Understanding the oplog, write concerns, and read preferences is essential for building systems with the right consistency guarantees.

Replica Set Architecture

A replica set is a group of mongod instances that maintain the same data set. A typical deployment has one primary and two secondaries; three voting members is the minimum for automatic failover. The primary accepts all writes. Secondaries replicate from the primary via the oplog and can serve reads, subject to the staleness caveats covered under Read Preference and Consistency. An arbiter votes in elections but holds no data; it is used to break ties in sets with an even number of data-bearing members. Hidden members (priority: 0, hidden: true) receive replication but are invisible to applications; they are used for analytics and backups.

Elections are triggered when the primary becomes unavailable (no heartbeat for 10 seconds by default), when the primary steps down (rs.stepDown()), or when a network partition cuts the primary off from a majority of the set. In an election, every voting member casts a vote, and a candidate needs a majority of all voting members to become primary. Priority settings control which members are preferred as primary.
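The member roles above can be sketched as a configuration document. This is a minimal illustration, not a recommended production layout: the set name and host names are placeholders, and the helpers electableHosts and visibleHosts are hypothetical functions written for this example, not part of MongoDB.

```javascript
// Hypothetical three-member configuration; set name and hosts are placeholders.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017", priority: 2 },
    { _id: 1, host: "db1.example.net:27017", priority: 1 },
    // Hidden member: replicates and votes, but never becomes primary
    // and is not visible to application reads.
    { _id: 2, host: "db2.example.net:27017", priority: 0, hidden: true },
  ],
};

// Members that can be elected primary: data-bearing with priority above 0.
// A missing priority is treated as the default of 1.
function electableHosts(config) {
  return config.members
    .filter((m) => !m.arbiterOnly && (m.priority ?? 1) > 0)
    .map((m) => m.host);
}

// Members that applications can be routed to for reads.
function visibleHosts(config) {
  return config.members
    .filter((m) => !m.arbiterOnly && !m.hidden)
    .map((m) => m.host);
}

console.log(electableHosts(replicaSetConfig));
console.log(visibleHosts(replicaSetConfig));
```

In mongosh, a document of this shape would be passed to rs.initiate() when creating the set. With these priorities, db0 is the preferred primary, and db2 can serve backups or analytics without receiving application traffic.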

Oplog and Write Concern

The oplog (operations log) is a special capped collection (local.oplog.rs) that records every data-modifying operation in an idempotent form. Idempotent means each oplog entry can be applied any number of times and always produces the same result, which makes re-application after a network interruption safe. Secondaries tail the oplog and replay its operations. Because the oplog is capped, the oldest entries are overwritten once it is full.

Key insight from MongoDB: The Definitive Guide (3rd ed., Bradshaw, Brazil, Chodorow): 'A general rule of thumb is that the oplog should provide coverage (replication window) for two to three days' worth of normal operations.' The authors are explicit: 'Your primary's oplog should be thought of as your maintenance window. If your primary has an oplog that is an hour long, then you only have one hour to fix anything that goes wrong before your secondaries fall too far behind and must be resynced from scratch.'

Default election configuration: heartbeatIntervalMillis: 2000 (a heartbeat every 2 seconds), heartbeatTimeoutSecs: 10, and electionTimeoutMillis: 10000, meaning a primary is declared unavailable after about 10 seconds of missed heartbeats.

Write concern: w:majority requires acknowledgment from more than half of the voting members before a write is reported as successful (3-member set: 2; 5-member set: 3; 7-member set: 4).
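The majority arithmetic and the oplog window rule of thumb can be checked with a few lines of plain JavaScript. This is a rough sketch under stated assumptions: the function names are hypothetical, the oplog size and write rate are made-up numbers, the window estimate assumes a steady write rate, and the majority count assumes every voting member holds data.

```javascript
// Acknowledgments needed for w:majority in a set with `votingMembers` voters
// (assumes all voting members are data-bearing).
function majorityCount(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new RangeError("votingMembers must be a positive integer");
  }
  return Math.floor(votingMembers / 2) + 1;
}

// Hours of writes the oplog can hold before the oldest entries are overwritten,
// assuming a steady rate of oplog growth.
function oplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  if (oplogMBPerHour <= 0) {
    throw new RangeError("oplogMBPerHour must be positive");
  }
  return oplogSizeMB / oplogMBPerHour;
}

// Compare a window against the two-day lower bound of the rule of thumb.
function coversRuleOfThumb(windowHours, minDays = 2) {
  return windowHours >= minDays * 24;
}

console.log([3, 5, 7].map(majorityCount)); // [ 2, 3, 4 ]

// Hypothetical numbers: a 48,000 MB oplog filling at 500 MB per hour.
const windowHours = oplogWindowHours(48000, 500);
console.log(windowHours, coversRuleOfThumb(windowHours)); // 96 true
```

On a running set, rs.printReplicationInfo() reports the actual oplog size and the time span between its first and last entries, which is a more reliable figure than an estimate from average write rates.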

Read Preference and Consistency

Read preference controls which replica set member serves reads:

primary: all reads go to the primary (strong consistency; the default).
primaryPreferred: reads go to the primary if it is available, otherwise to a secondary.
secondary: all reads go to secondaries (reads may be stale because of replication lag).
secondaryPreferred: reads go to secondaries, falling back to the primary if no secondary is available.
nearest: reads go to the member with the lowest network latency, whether primary or secondary.

Reading from secondaries can return stale data; how stale depends on replication lag. Causal consistency (MongoDB 3.6+) ensures that within a session, reads never go back in time: after a write, subsequent reads in that session see that write or a later state. These guarantees hold when the session's reads use read concern majority and its writes use write concern majority. Use causally consistent sessions for operations that depend on their own previous writes.
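The routing rules for the five modes can be made concrete with a small model. This is a simplified sketch, not the drivers' actual server selection algorithm, which also applies a latency window, tag sets, and maxStalenessSeconds. The selectMember function, the member objects, and the latency values are all hypothetical.

```javascript
// Simplified model of read preference routing. Each member is
// { host, state: "PRIMARY" | "SECONDARY", latencyMs }.
function selectMember(members, mode) {
  const byLatency = (a, b) => a.latencyMs - b.latencyMs;
  const primary = members.find((m) => m.state === "PRIMARY");
  const secondaries = members
    .filter((m) => m.state === "SECONDARY")
    .sort(byLatency);

  switch (mode) {
    case "primary":
      return primary ?? null;
    case "primaryPreferred":
      return primary ?? secondaries[0] ?? null;
    case "secondary":
      return secondaries[0] ?? null;
    case "secondaryPreferred":
      return secondaries[0] ?? primary ?? null;
    case "nearest": {
      // Primary and secondaries compete on latency alone.
      const candidates = primary ? [primary, ...secondaries] : secondaries;
      return candidates.sort(byLatency)[0] ?? null;
    }
    default:
      throw new Error(`unknown read preference: ${mode}`);
  }
}

// Hypothetical set: the primary is not the lowest-latency member.
const members = [
  { host: "a", state: "PRIMARY", latencyMs: 12 },
  { host: "b", state: "SECONDARY", latencyMs: 3 },
  { host: "c", state: "SECONDARY", latencyMs: 40 },
];

for (const mode of ["primary", "secondary", "nearest"]) {
  console.log(mode, "->", selectMember(members, mode).host);
}
```

In this model only the primary mode guarantees that a read reflects the latest acknowledged write; every other mode can land on a secondary and therefore return stale data.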

Key Takeaways

w:majority + j:true provides the strongest durability guarantee: acknowledged writes are journaled and survive a primary failure. w:1 is faster but risks data loss if the primary fails before secondaries replicate the write.
Reading from secondaries can return stale data. The staleness depends on replication lag. Use causally consistent sessions if reads must see their own writes.
The oplog window must be longer than the longest time a secondary could lag or be offline. If a secondary falls behind and the oplog overwrites entries it has not yet applied, the secondary must resync from scratch (expensive).

Code example

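// Write with majority durability: acknowledged by a majority and journaled,
// with a 5-second limit on waiting for acknowledgment
db.orders.insertOne(
  { orderId: '123', amount: 99.99 },
  { writeConcern: { w: 'majority', j: true, wtimeout: 5000 } }
)

// Read from the lowest-latency member, which may be the primary (analytics)
db.orders.find({ status: 'completed' }).readPref('nearest')

// Check oplog size and time window on the current member
rs.printReplicationInfo()

// Check how far each secondary lags behind the primary
rs.printSecondaryReplicationInfo()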