serverStatus was very slow - mongod instance killed at least once a day


JrMaster

Aug 13, 2017, 6:20:49 PM
to mongodb-user
Hi,

Several of our mongod instances are killed at least once a day, across several different replica sets.
We are using MongoDB 3.4.4 on all instances in the environment.

Our topology consists of the following:
  1. 1 x config replica set (3 nodes)
  2. 4 x data bearing replica sets (3 nodes - 1 of the nodes is hidden and has a priority of 0)
  3. 6 x mongos routers

I'm seeing these cryptic messages and was wondering how I could use them to understand what's going on...
2017-08-12T01:46:45.483+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 10, after asserts: 20, after backgroundFlushing: 31, after connections: 78, after dur: 118, after extra_info: 169, after globalLock: 230, after locks: 230, after network: 230, after opLatencies: 285, after opcounters: 315, after opcountersRepl: 315, after repl: 595, after security: 636, after storageEngine: 696, after tcmalloc: 767, after wiredTiger: 964, at end: 1304 }

2017-08-12T01:46:52.127+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 82, after asserts: 82, after backgroundFlushing: 92, after connections: 92, after dur: 92, after extra_info: 159, after globalLock: 200, after locks: 269, after network: 279, after opLatencies: 289, after opcounters: 289, after opcountersRepl: 289, after repl: 521, after security: 541, after storageEngine: 625, after tcmalloc: 645, after wiredTiger: 809, at end: 1024 }

2017-08-12T01:47:09.463+0000 I COMMAND [conn24] serverStatus was very slow: { after basic: 10, after advisoryHostFQDNs: 110, after asserts: 110, after backgroundFlushing: 120, after connections: 120, after dur: 120, after extra_info: 130, after globalLock: 130, after network: 205, after opLatencies: 215, after opcounters: 215, after opcountersRepl: 215, after oplog: 5185, after repl: 5185, after security: 5185, after sharding: 5195, after storageEngine: 5195, after tcmalloc: 5195, after wiredTiger: 5206, at end: 5221 }
2017-08-12T01:47:09.465+0000 I COMMAND [conn24] command local.oplog.rs command: serverStatus { serverStatus: 1, advisoryHostFQDNs: 1, locks: 0, recordStats: 0, oplog: 1 } numYields:0 reslen:23635 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 4943878 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } protocol:op_query 5250ms
2017-08-12T07:04:20.914+0000 I CONTROL [main] ***** SERVER RESTARTED *****

Many thanks in advance.

Rhys Campbell

Aug 14, 2017, 3:39:23 AM
to mongodb-user
Check your syslog to see if the OOM Killer is being invoked.
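
For example, on most Linux systems something like the following should show whether the kernel killed a mongod process (the exact syslog path varies by distribution, so adjust as needed):

  dmesg -T | grep -i -E "out of memory|killed process"
  grep -i oom /var/log/syslog /var/log/messages 2>/dev/null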

JrMaster

Aug 15, 2017, 5:17:18 PM
to mongodb-user
Hi Rhys,

Thank you for your response. I've checked dmesg and it is indeed the OOM killer...
I've read that allocating swap space should help mitigate the issue, so I will be adding some ASAP.

Is there any way to understand why mongod keeps allocating more and more memory? And if so, is there anything I can do to prevent this behavior?

Rhys Campbell

Aug 18, 2017, 2:09:34 PM
to mongodb-user
You probably want to avoid swapping on your mongod instances. Have you modified the WiredTiger cache size at all? Have you looked at optimising your queries and indexes? Improving these can have a big impact on memory usage.
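
If you want to see what the cache is currently set to and how much of it is in use, something like this in the mongo shell should do it (the field names come from the wiredTiger.cache section of serverStatus):

  db.serverStatus().wiredTiger.cache["maximum bytes configured"]
  db.serverStatus().wiredTiger.cache["bytes currently in the cache"]

The cache size itself is controlled by storage.wiredTiger.engineConfig.cacheSizeGB in mongod.conf (or the --wiredTigerCacheSizeGB startup option).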

Kevin Adistambha

Aug 24, 2017, 1:08:42 AM
to mongodb-user

Hi,

Is there any way to understand why mongod keeps allocating more and more memory?

Just to add to what Rhys has suggested: if your mongod processes are regularly killed by the OOM killer, it typically means the RAM you have provisioned is smaller than what your workload requires. Increasing the amount of RAM in your deployment is one way to improve performance.

To improve your queries and indexes, the low-hanging fruit is to check your logs to see whether any query is performing a COLLSCAN stage. This stage scans the entire collection, pulling it into RAM. If your collection is small you may be able to get away with it, but on large collections it will be quite disruptive to your working set and RAM. Use explain() to check whether your queries are doing a COLLSCAN, and create indexes to support your queries.
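
For example, in the mongo shell (the collection and field names below are just placeholders for illustration):

  // show the query plan; a COLLSCAN stage means no index was used
  db.orders.find({ customerId: 12345 }).explain("executionStats")

  // create an index to support the query so it can use an IXSCAN instead
  db.orders.createIndex({ customerId: 1 })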

For more information, please see Query Optimization.

Best regards,
Kevin
