Secondary falling behind. Socket exceptions?


Mike Templeman

Jul 15, 2015, 11:36:59 AM
to mongod...@googlegroups.com

I have a secondary (running 3.0.4) that falls behind the primary about once a day, at no particular time that I can determine. Each time, I get an alert from Cloud Manager that the secondary is falling behind. The only way I have found to recover is to restart the secondary.
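
For anyone who wants to measure the lag directly rather than wait for the alert, here is a minimal sketch. It assumes a status document shaped like the output of replSetGetStatus, with `name`, `stateStr` and `optimeDate` on each member. The host names and timestamps in the sample are made up for illustration, and `replication_lag_seconds` is a hypothetical helper, not part of any MongoDB driver.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member_name: lag_in_seconds} for every SECONDARY member.

    Lag is the primary's optimeDate minus the secondary's optimeDate.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample document (host names and times are invented).
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2015, 7, 15, 15, 21, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2015, 7, 15, 15, 16, 30)},
    ]
}

lag = replication_lag_seconds(sample_status)
print(lag)
```

Logging this value every minute or so would show whether the lag grows gradually or jumps all at once.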

I have checked for replication errors with {getLog:"rs"} after bumping the replication verbosity to {verbosity:4} and found nothing to explain this. A comment on StackExchange led me to look for socket exceptions in the secondary log. What I found were multiple exceptions of the type:

2015-07-15T15:21:01.458+0000 D NETWORK  [conn10094] SocketException: remote: 10.146.238.149:57214 error: 9001 socket exception [CLOSED] server [10.146.238.149:57214


These errors seem to happen every 30 seconds or so while the secondary is in this state. I should also mention that the log reports serverStatus as very slow.
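
To check the "every 30 seconds or so" impression, here is a rough sketch that pulls the timestamps of SocketException lines out of the log and prints the gaps between them. The regular expression is an assumption based only on the one line quoted above. The second sample line is invented for illustration, and the third is a dummy line included to show that non-matching lines are ignored.

```python
import re
from datetime import datetime

# Pattern guessed from the single log line quoted in this post.
LINE_RE = re.compile(r"^(\S+) \w\s+NETWORK\s+\[conn\d+\] SocketException:")


def socket_exception_times(lines):
    """Return the timestamps of all lines that look like SocketException entries."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z"))
    return times


def gaps_seconds(times):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample_lines = [
    # Real line from the secondary log, as quoted above.
    "2015-07-15T15:21:01.458+0000 D NETWORK  [conn10094] SocketException: remote: "
    "10.146.238.149:57214 error: 9001 socket exception [CLOSED] server [10.146.238.149:57214",
    # Invented line, 30 seconds later, for illustration only.
    "2015-07-15T15:21:31.458+0000 D NETWORK  [conn10095] SocketException: remote: "
    "10.146.238.149:57215 error: 9001 socket exception [CLOSED] server [10.146.238.149:57215",
    # Dummy non-matching line; it should be ignored.
    "2015-07-15T15:21:32.000+0000 I OTHER    [conn10096] unrelated entry",
]

times = socket_exception_times(sample_lines)
gaps = gaps_seconds(times)
print(gaps)
```

On a real log, pass the open file object in place of `sample_lines`.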


mongostat shows activity (inserts, deletes) on the secondary, but at a much lower rate than normal. iostat 5 shows very bursty write activity on the secondary.


The primary has ~470 connections, 22GB cache and is running 3.0.3. 


Any ideas as to what is causing this odd behavior?


Mike




Mike Templeman

Jul 15, 2015, 1:10:40 PM
to mongod...@googlegroups.com
Another note. After writing this I realized I hadn't googled "mongodb secondary socket exceptions". Doing that led me to the MongoDB FAQ on keepalive, which recommends a TCP keepalive value of 120 seconds instead of the default of 7200 (on an AWS Linux OS). I have changed tcp_keepalive_time to 120 and will report the results tomorrow.
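
A small sketch for confirming that the new value is actually in effect on a Linux host. It reads the standard procfs entry for the setting. The 120-second threshold is the FAQ recommendation mentioned above, and `read_keepalive_seconds` and `keepalive_ok` are hypothetical helper names.

```python
import os
from pathlib import Path

# Standard Linux location of the TCP keepalive idle time, in seconds.
KEEPALIVE_PATH = "/proc/sys/net/ipv4/tcp_keepalive_time"


def read_keepalive_seconds(path=KEEPALIVE_PATH):
    """Read the keepalive time in seconds from the given file."""
    return int(Path(path).read_text().strip())


def keepalive_ok(seconds, recommended=120):
    """Return True if the keepalive time is at or below the recommended value."""
    return seconds <= recommended


# Only touch /proc when it exists, so the sketch also runs on non-Linux hosts.
if os.path.exists(KEEPALIVE_PATH):
    current = read_keepalive_seconds()
    print(current, "ok" if keepalive_ok(current) else "higher than recommended")
```

A value set with `sysctl -w net.ipv4.tcp_keepalive_time=120` does not survive a reboot unless it is also added to /etc/sysctl.conf.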

Mike

Mike Templeman

Jul 17, 2015, 1:16:41 AM
to mongod...@googlegroups.com

Changing tcp_keepalive_time from 7200 to 120 did not prevent the secondary from falling behind. Oh well, time for another hypothesis.

Kelvin Shek

Dec 5, 2016, 4:47:09 PM
to mongodb-user
Did you ever figure out what was going on?