Replication overloading primary member

Alexandre Rocco
Jul 27, 2012, 1:51:54 PM
to mongod...@googlegroups.com
Hi guys,

I have a question about the replication that occurs in a replica set.
We have a small setup with 3 servers in one DC (one of them is an arbiter) and 2 more servers in another DC (hidden members).

A few days ago we created a capped collection of approx. 45GB and then noticed that the replication lag was climbing steeply. Researching the issue a bit more, we found (here on the group and in other sources) that this happens when a capped collection is replicated without a unique index on the _id field. After creating this index on the primary and the secondary, we restarted the secondary to resume replication. Replication did go much faster, and after a couple of hours the secondary was in sync.
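
For anyone trying to watch the lag described above, here is a minimal sketch (Python, not part of the original post) that compares each secondary's `optimeDate` with the primary's in a document shaped like `rs.status()` output, for example as returned by `client.admin.command("replSetGetStatus")` in PyMongo. The function name `replication_lag` and the host names are hypothetical, and the field names (`members`, `name`, `stateStr`, `optimeDate`) should be checked against the documentation for your server version.

```python
from datetime import datetime


def replication_lag(status):
    """Return {member_name: timedelta} for every SECONDARY member.

    `status` is assumed to be shaped like rs.status() output: a "members"
    list whose entries carry "name", "stateStr" and, for data-bearing
    members, "optimeDate" as a datetime.
    """
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no PRIMARY member in the status document")
    primary_optime = primaries[0]["optimeDate"]
    # Arbiters hold no data, so only SECONDARY members (hidden ones included)
    # are compared against the primary's last applied operation time.
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


if __name__ == "__main__":
    # Hypothetical host names and times, for illustration only.
    example = {
        "members": [
            {"name": "db1:27017", "stateStr": "PRIMARY",
             "optimeDate": datetime(2012, 7, 27, 13, 0, 0)},
            {"name": "db2:27017", "stateStr": "SECONDARY",
             "optimeDate": datetime(2012, 7, 27, 12, 15, 0)},
            {"name": "arb:27017", "stateStr": "ARBITER"},
        ]
    }
    for name, lag in replication_lag(example).items():
        print(name, "is behind by", lag)  # db2:27017 is behind by 0:45:00
```

Sampling this every few minutes while the secondary catches up shows whether the lag is shrinking, which is a quick way to confirm that a fix such as the _id index actually helped.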

What was strange is that the primary apparently became overloaded while the secondary was replicating, and this had some impact on our production system. At some point, all connections to the replica set started failing with the error “Unable to connect to the primary member of the replica set”. At the same time, I tried issuing some commands in the shell, and the replica set looked fine in rs.status() and in the logs.

While the secondary was replicating, there was no other mongod to serve the queries, since our application connects only to these 2 members. Could the replication process overload the primary with a lot of read operations and cause this issue?

Also, the application went into a state where it kept a lot of connections open (maybe in the pool), but these connections were not sending or receiving any data. I guess that is another part of the problem, which could be diagnosed on the mongodb-csharp group.
Any tips would be appreciated.

Best,
Alexandre Rocco

Scott Hernandez
Jul 27, 2012, 2:09:46 PM
to mongod...@googlegroups.com
I would suggest looking through the logs on the primary at that time;
it sounds like you may have hit a limit on the number of open files
(including connections).
http://www.mongodb.org/display/DOCS/Too+Many+Open+Files
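
To check the open-files theory, here is a minimal sketch (Python, not part of the original reply) with two hypothetical helpers: one reads the soft and hard "Max open files" limits of the mongod process from `/proc/<pid>/limits` (Linux only), and one computes how much of the connection capacity is in use from a document shaped like `db.serverStatus()` output (for example from `client.admin.command("serverStatus")` in PyMongo), using its `connections.current` and `connections.available` counters. The 80% warning threshold is an arbitrary assumption.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits on Linux.

    A value of None means "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # e.g. "Max open files  1024  4096  files" -> ["1024", "4096"]
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def connection_usage(server_status, warn_fraction=0.8):
    """Return (fraction_used, should_warn) from a serverStatus-shaped dict.

    Uses the "current" and "available" counters under "connections";
    each open connection also consumes a file descriptor.
    """
    conns = server_status["connections"]
    capacity = conns["current"] + conns["available"]
    used = conns["current"] / capacity if capacity else 1.0
    return used, used >= warn_fraction


if __name__ == "__main__":
    import sys

    # Usage (Linux only): python check_fds.py <mongod pid>
    # Falls back to this script's own process when no pid is given.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/limits") as f:
        print("max open files (soft, hard):", parse_max_open_files(f.read()))
```

Note that the limit that matters is the one the running mongod inherited, which is why the sketch reads it per process rather than calling `ulimit` in a fresh shell.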

I have almost never seen replication be the cause of the primary
having issues. The case where I have seen this is when there is a lot of
disk I/O contention (look for page faults) on the primary and the parts
of the oplog that the secondaries needed were not in memory. It is also
worth checking whether the primary had non-zero write queues during
the time when replication fell behind.
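
The two signals mentioned above can be read from `serverStatus`. Here is a minimal sketch (Python, not part of the original reply) that takes two snapshots shaped like `db.serverStatus()` output, taken a known number of seconds apart, and reports the page-fault rate plus the current read and write queues. It assumes the `extra_info.page_faults` counter (cumulative, and reported on Linux) and `globalLock.currentQueue.readers`/`writers`; the helper name `io_pressure` and the sample numbers are hypothetical.

```python
def io_pressure(before, after, interval_seconds):
    """Summarise disk pressure from two serverStatus-shaped snapshots.

    Assumes "extra_info.page_faults" (a cumulative counter, reported on
    Linux) and "globalLock.currentQueue" with "readers"/"writers".
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # page_faults only ever grows, so the rate is the delta over the interval.
    faults = after["extra_info"]["page_faults"] - before["extra_info"]["page_faults"]
    queue = after["globalLock"]["currentQueue"]
    return {
        "page_faults_per_sec": faults / interval_seconds,
        "queued_readers": queue["readers"],
        "queued_writers": queue["writers"],
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    t0 = {"extra_info": {"page_faults": 120000},
          "globalLock": {"currentQueue": {"readers": 0, "writers": 0}}}
    t1 = {"extra_info": {"page_faults": 126000},
          "globalLock": {"currentQueue": {"readers": 3, "writers": 12}}}
    print(io_pressure(t0, t1, interval_seconds=60))
```

A sustained high fault rate together with non-zero queued writers during the catch-up window would fit the disk-contention scenario described above; either number on its own is weaker evidence.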