Hi,
I also posted my findings in the topic "Strange behavior of W", as I think this is related.
We have a replica set with 4 nodes (plus one arbiter), all with different priorities; one of them has priority 0.
Today I found that setting W in the write concern to more than 1 resulted in a timeout. I then ran some tests and found that the replication lag was about 1500 seconds for all nodes. I cannot explain why this happened, as there was no heavy writing at that time.
The network was fine the whole time (10GB network, not much load on the servers).
During that time, the load on all secondary nodes was about 4.0, whereas the load on the primary node was only ~0.2!
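
For reference, here is a minimal sketch of how a lag figure like the one above can be derived. It assumes a status document shaped like the output of replSetGetStatus (a "members" list whose entries carry "name", "stateStr" and "optimeDate"). The host names and timestamps are made up for illustration and are not taken from our cluster.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Return {member name: seconds behind the primary} for every secondary.

    `status` is assumed to look like replSetGetStatus output: a dict with a
    "members" list, each entry having "name", "stateStr" and, for
    data-bearing members, "optimeDate".
    """
    members = status["members"]
    # The primary's last applied operation time is the reference point.
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"  # skips the arbiter
    }


# Hypothetical sample data: one primary, three secondaries, one arbiter.
now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "node1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "node2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=1500)},
        {"name": "node3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=1498)},
        {"name": "node4:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=1502)},
        {"name": "arb1:27017", "stateStr": "ARBITER"},
    ]
}

lags = replication_lag_seconds(sample_status)
for name, lag in sorted(lags.items()):
    print(f"{name}: {lag:.0f} s behind primary")
```

With pymongo, a real status document could be fetched with client.admin.command("replSetGetStatus") and passed to the same function. Note that on an idle primary this difference can overstate the lag, since the primary's optime only advances when something is written.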
My questions:
- What operations might cause an increase in replication lag?
- How could that be avoided?
- Is there a setup error, or what kind of error might cause such behavior?