Hi guys,
I have a question about the replication that occurs in a replica set.
We have a small setup with 3 servers in one DC (one of them is an arbiter) and 2 more servers in another DC (hidden members).
A few days ago we created a capped collection of approx. 45 GB and then noticed that the replication lag was going up a lot. Researching the issue further, we found on this group and in other sources that it occurs when a capped collection is replicated and does not have a unique index on the _id field. After creating this index on the primary and the secondary, we restarted the secondary to resume replication. It did go much faster, and after a couple of hours the secondary was fully synced.
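For reference, this is roughly how the replication lag can be read off the rs.status() output. It is only a minimal sketch: it assumes the status document has already been fetched into a Python dict, it relies on the "name", "stateStr" and "optimeDate" fields of each member, and the host names and timestamps in the sample are made up for illustration.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for every SECONDARY,
    measured against the PRIMARY's last applied operation time."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"  # arbiters hold no data, so they are skipped
    }


# Hypothetical sample shaped like rs.status() output (hosts and times are invented).
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2012, 1, 1, 12, 0, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 1, 1, 11, 15, 0)},
        {"name": "arb:27017", "stateStr": "ARBITER"},
    ]
}

lag = replication_lag_seconds(sample_status)
print(lag)
```

A lag that keeps growing over successive checks means the secondary is applying operations more slowly than the primary is writing them, which is what we saw before the index was created.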
What was quite strange is that the primary apparently became overloaded while the secondary was replicating, and this affected our production system. At some point, all connections to the replica set started failing with the error “Unable to connect to the primary member of the replica set”. At the same time, I issued some commands in the shell, and the replica set looked fine in both rs.status() and the logs.
While the secondary was replicating, there was no other mongod available to serve queries, since our application connects only to these 2 members. Could the replication process overload the primary with a lot of read operations and cause this issue?
Also, the application went into a state where it kept a lot of connections open (maybe in the pool), but these connections were not sending or receiving any data. I guess this is another part of the problem that could be diagnosed on the mongodb-csharp group.
Any tips would be appreciated.
Best,
Alexandre Rocco