Hi Simon!
To address your first question:
1) MongoDB uses a single replication algorithm, regardless of the load. At no time does MongoDB intentionally throttle the speed of the replication.
2) The replication algorithm is as follows:
- The primary (read/write) member of the replica set writes all data modification commands into the oplog.
- Any secondary that is replicating from the primary runs a query against the primary's oplog. This query is run using a tailable cursor, so the secondary will receive updates from the primary as fast as possible.
- When the secondary receives the oplog instructions, it applies them as fast as it can.
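To make the steps above concrete, here is a toy sketch in Python. It is not MongoDB code: the `Primary` and `Secondary` classes, the integer `ts` counter, and the `pull` method are all made up for illustration, and a real secondary keeps a tailable cursor open rather than rescanning the oplog on every call.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: every write is applied locally and appended to the oplog."""
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        # Each oplog entry records the operation so a secondary can replay it.
        # The integer "ts" is a stand-in for a real oplog timestamp.
        self.oplog.append({"ts": len(self.oplog) + 1, "key": key, "value": value})


@dataclass
class Secondary:
    """Toy secondary: remembers the last entry applied and replays newer ones."""
    data: dict = field(default_factory=dict)
    last_ts: int = 0

    def pull(self, primary):
        # Fetch only entries newer than the last one applied, in oplog order.
        new_entries = [e for e in primary.oplog if e["ts"] > self.last_ts]
        for entry in new_entries:
            self.data[entry["key"]] = entry["value"]
            self.last_ts = entry["ts"]
        return len(new_entries)


primary = Primary()
secondary = Secondary()
primary.write("a", 1)
primary.write("b", 2)
applied = secondary.pull(primary)  # number of oplog entries replayed
```

The point of the sketch is that the secondary's progress depends entirely on being able to read the primary's oplog; anything that delays that read delays replication.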
3) Unfortunately, in MongoDB version 2.0.x, the performance of replication under an extreme write load is not as robust as it might be.
The problem is that in that version, MongoDB uses a single reader/writer latch to control all access to the database. The latch allows multiple concurrent readers, but only a single writer at one time. In addition, when the latch is held by a writer, all readers are blocked.
Normally, this isn't a problem. However, when the primary is handling a very heavy write load, writers may hold the latch almost continuously, and all readers will block. The problem with replication arises because the secondary is getting its data via a read operation on the oplog, and that read will block as well.
4) The good news is that this problem is largely alleviated in MongoDB 2.2, which is due for production release in the next few months. In version 2.2, the global reader/writer latch is supplemented with an additional set of reader/writer latches -- one for each database. Since the oplog is kept in a separate database from the other collections, this means that heavy write operations on the primary don't impact the ability of the secondaries to read from the oplog as much as they do in version 2.0.x.
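As a rough way to picture the difference, here is a deliberately simplified model in Python. The function `reader_blocked` and the database name `app` are hypothetical; `local` is the database that holds the oplog. The model ignores the fact that each write on the primary also has to append to the oplog, which is one reason the improvement in 2.2 is partial rather than complete.

```python
def reader_blocked(read_db, active_writer_dbs, per_database_latches):
    """Return True if a read on read_db must wait behind an active writer.

    Simplified model only: it treats a latch as either held or not held
    and does not model queuing, yielding, or the write to the oplog itself.
    """
    if per_database_latches:
        # 2.2-style model: only a writer on the same database blocks the read.
        return read_db in active_writer_dbs
    # 2.0-style model: any writer holds the single global latch.
    return len(active_writer_dbs) > 0


# A heavy write is in progress on a hypothetical application database.
writers = {"app"}

# The secondary's oplog query is a read against the "local" database.
blocked_20 = reader_blocked("local", writers, per_database_latches=False)
blocked_22 = reader_blocked("local", writers, per_database_latches=True)
```

Under the global latch the oplog read waits behind the application write; with per-database latches it does not, as long as the writer is holding only the application database's latch.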
5) You can read more about replication internals here:
- http://docs.mongodb.org/manual/replication/
- http://docs.mongodb.org/manual/core/replication-internals/
To address your second question:
1) While there isn't enough data in your problem description to definitively diagnose what happened, the most likely explanation is that your secondaries were starved of data due to heavy write load on the primary, as described above.
2) Replication lag is normal when your primary is heavily overloaded. Once the load on the primary eases, the secondaries should eventually catch up.
3) There are many possible causes of replication lag: some of the most common include an overloaded primary, slow network, and slow disks on the secondary.
4) Some of the possible reasons you could have overloaded your primary include:
- Working set size significantly larger than RAM
- Heavy write load
- Disk slowdown (disk overload, SAN slowdown, controller failure, etc.)
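If you want to keep an eye on lag while the secondaries catch up, one approach is to compare each secondary's last applied optime with the primary's. Below is a minimal Python sketch. It assumes you already have the `members` list from the output of `rs.status()` and that each member carries `name`, `stateStr`, and `optimeDate` fields; please check those field names against your version. The hostnames and timestamps are invented sample data.

```python
from datetime import datetime


def replication_lag_seconds(members):
    """Return {secondary name: seconds behind the primary's last optime}."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Invented sample data shaped like the members array of a status document.
members = [
    {"name": "db1.example.net:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2012, 6, 1, 12, 0, 0)},
    {"name": "db2.example.net:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2012, 6, 1, 11, 59, 55)},
    {"name": "db3.example.net:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2012, 6, 1, 11, 58, 30)},
]

lag = replication_lag_seconds(members)
```

A lag that shrinks once the write load drops points to an overloaded primary; a lag that keeps growing regardless of load is more likely a network or disk problem on the secondary.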
Please let me know if you have further questions.
-William