Sharded mongodb-3.0.5 slowdowns

Māris Ruskulis

Aug 26, 2015, 5:35:51 PM

to mongodb-user

Greetings! We have a 3-node setup where each node hosts 3 shards and each shard has 3 replicas, 9 replicas total. From time to time, once or sometimes twice a week, one replica becomes very slow; a restart fixes the problem. The logs do not show any signs of problems in mongodb. The chunk balancer is turned off.

We are using mongodb 3.0.5 with WiredTiger; our main collection is ~80GB.

Any clue would be appreciated!

Asya Kamsky

Aug 27, 2015, 3:09:42 AM

to mongod...@googlegroups.com

What do you mean by very slow? If there is nothing in the logs, then how do you know it's slow?

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/7135b127-b8ba-41bc-bc70-b301950c73f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Māris Ruskulis

Aug 27, 2015, 4:09:37 AM

to mongodb-user

For example:

2015-08-27T10:43:33.238+0300 I COMMAND [conn303] command contacts.$cmd command: count { count: "contacts", query: { displayName: { $exists: true }, emails.address: { $exists: true }, user: "SOME_USER" } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:20 reslen:154 locks:{ Global: { acquireCount: { r: 42 }, acquireWaitCount: { r: 21 }, timeAcquiringMicros: { r: 2824107 } }, Database: { acquireCount: { r: 21 } }, Collection: { acquireCount: { r: 21 } } } 9109ms

This query is covered by an index, and normally it takes 50ms per HTTP request. When a single replica becomes slow, the logs are spammed with such messages, but it's unclear what causes this behaviour; the load on the web app is the same.
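For anyone reading along, here is a rough sketch of how the interesting numbers could be pulled out of a log line like the one above. The function name `summarize_slow_op` and the regular expressions are my own assumptions based only on the shape of this one 3.0.x line; they are not part of any MongoDB tool.

```python
import re

# The slow-operation log line quoted above, joined into a single string.
LOG_LINE = (
    '2015-08-27T10:43:33.238+0300 I COMMAND [conn303] command contacts.$cmd '
    'command: count { count: "contacts", query: { displayName: { $exists: true }, '
    'emails.address: { $exists: true }, user: "SOME_USER" } } '
    'ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:20 reslen:154 '
    'locks:{ Global: { acquireCount: { r: 42 }, acquireWaitCount: { r: 21 }, '
    'timeAcquiringMicros: { r: 2824107 } }, Database: { acquireCount: { r: 21 } }, '
    'Collection: { acquireCount: { r: 21 } } } 9109ms'
)


def summarize_slow_op(line):
    """Extract duration, yield count, and lock wait time from one log line.

    Hypothetical helper: the patterns only assume the layout seen above.
    """
    duration = re.search(r'(\d+)ms\s*$', line)
    yields = re.search(r'numYields:(\d+)', line)
    # Sum every timeAcquiringMicros entry (any lock mode) found on the line.
    lock_wait_micros = 0
    for block in re.findall(r'timeAcquiringMicros: \{([^}]*)\}', line):
        lock_wait_micros += sum(int(n) for n in re.findall(r'\w+: (\d+)', block))
    return {
        'duration_ms': int(duration.group(1)) if duration else None,
        'num_yields': int(yields.group(1)) if yields else None,
        'lock_wait_ms': lock_wait_micros / 1000.0,
    }


summary = summarize_slow_op(LOG_LINE)
print(summary)
```

For this line the summary reports a 9109ms operation with 20 yields, of which roughly 2.8 seconds were spent waiting to acquire the global lock.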

Chris De Bruyne

Aug 27, 2015, 5:34:30 AM

to mongodb-user

Hi Māris,

Do you have a lot of inserts and updates?

Maybe you are suffering from https://jira.mongodb.org/browse/SERVER-19522 and should upgrade to 3.0.6?

Asya Kamsky

Aug 27, 2015, 3:18:00 PM

to mongod...@googlegroups.com

I didn't see the OP mention anything about capped collections, and in any case this is a count on another collection, which would not be affected by inserts on a different collection in WiredTiger.

However, that log line does show a very high time for acquiring the lock, and that's probably because of the high number of yields. OP - I would like to see the explain output for the query the count is using. For a covered index query even 50ms seems a bit on the high side. Explain will show how many entries in the index have to be examined and whether it is in fact a covered query.

numYields:20

Asya
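
As an illustration of the explain request above, this is a minimal sketch that only builds the command document for explaining the count; it does not contact a server. The helper name `build_count_explain` is hypothetical, and passing the result to a driver call such as pymongo's `Database.command` is an assumption about the setup, not something shown in this thread.

```python
import json


def build_count_explain(collection, query, verbosity="executionStats"):
    """Build an `explain` command document that wraps a `count` command.

    Hypothetical helper. The command name must be the first key, which
    Python 3.7+ dicts preserve through insertion order.
    """
    return {
        "explain": {"count": collection, "query": query},
        "verbosity": verbosity,
    }


# Query copied from the slow log line; "SOME_USER" is the placeholder used there.
query = {
    "displayName": {"$exists": True},
    "emails.address": {"$exists": True},
    "user": "SOME_USER",
}

command = build_count_explain("contacts", query)
print(json.dumps(command, indent=2))
```

In the output of such an explain, comparing the number of index keys examined with the number of documents examined should indicate whether the count is really answered from the index alone.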


Māris Ruskulis

Aug 31, 2015, 5:29:30 AM

to mongodb-user

Hi Chris! We already upgraded to the latest version, and it does not help. We are not using capped collections, and the ratio of query to write ops is ~5/1.

Chris De Bruyne

Aug 31, 2015, 2:48:27 PM

to mongodb-user

Sorry for the mistake, but I meant https://jira.mongodb.org/browse/SERVER-18875, which was mentioned in another discussion about SERVER-19522.

Chris De Bruyne

Aug 31, 2015, 2:50:29 PM

to mongodb-user

Hi Maris,

Any chance you have already looked into Asya's questions?

They might give some useful hints.

Regards

Chris

Māris Ruskulis

Sep 1, 2015, 6:19:56 AM

to mongodb-user

Hi Chris!

Currently running a heavy map/reduce task that will complete in a few days; then I will continue with the issue.

Māris Ruskulis

Sep 4, 2015, 3:30:46 AM

to mongodb-user

At some point, mongod starts using a lot of memory. It looks like the slowdowns happen because of heavy swap usage. I disabled swap on the node that hosts the primary replica of the main shard, which previously caused trouble. The node has 16GB of RAM and mongod runs with wiredTigerCacheSize=2GB. Some time later mongod was killed by the OOM killer after consuming far more than 2GB; Zabbix memory usage graph: http://postimg.org/image/56uvyjaeh/. I could not find any documentation on how memory is used by WiredTiger.

Chris De Bruyne

Sep 4, 2015, 11:22:24 AM

to mongod...@googlegroups.com

Can you confirm the version you are running?

And can you show some logs around the time of the slowdowns?

And I presume there are no other processes running on the same machine?

Kind regards

Chris

Matthieu Rigal

Sep 7, 2015, 10:42:56 AM

to mongodb-user

Hi Maris,

We actually have the same problem on the VMs composing our replica sets (we also have physical machines with higher priority ;-)). When backups are enabled, or any LVM- or rsync-related process runs while Mongo is running, it can drag the VM's performance far down, and the VM never escapes this state until the process is restarted. Since it happens once a week, could it be that you have a similar process running?

I don't know how far this is doable, but it is a quite annoying problem that could also be fixed on the Mongo side. It is an ongoing problem that we also had earlier with 2.X versions.

Best,

Matthieu

Asya Kamsky

Sep 9, 2015, 6:07:10 PM

to mongodb-user

First - don't remove swap configuration, otherwise you will get OOM
killer killing mongod (unless that's actually what you want which
seems unusual).

Wondering why you have such low memory for wired tiger cache - normal
would be half the physical memory - is this node running other
processes?

You showed a single slow operation in the log - I'm guessing whatever
is causing the problem is running at the same time so you should
examine the full log - I recommend mplotqueries from mtools here:
http://blog.rueckstiess.com/mtools/2014/04/13/mtools-introduction.html
It can give you a nice visual representation of what is slow and you
may be able to spot from that what is causing this - heavy map/reduce
jobs, aggregations, sorts in memory can all cause *some* of this, but
they would all leave evidence in the logs.

Asya
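
To put numbers on the cache sizing remark above, here is a small arithmetic sketch. It assumes the rule of thumb mentioned in the reply (about half of physical RAM) plus a 1 GB floor, which is my assumption about the 3.0-era default, and it assumes the node runs nothing but mongod. The function name is hypothetical.

```python
def half_ram_cache_gb(physical_ram_gb):
    """Rule-of-thumb WiredTiger cache size in GB.

    Assumption: half of physical RAM, never less than 1 GB.
    """
    return max(physical_ram_gb / 2.0, 1.0)


ram_gb = 16.0               # node RAM reported in the thread
configured_cache_gb = 2.0   # wiredTigerCacheSize reported in the thread

suggested_cache_gb = half_ram_cache_gb(ram_gb)
share = 100.0 * configured_cache_gb / suggested_cache_gb

print("suggested cache: %.1f GB" % suggested_cache_gb)
print("configured cache is %.0f%% of that" % share)
```

Note that the cache size is not a cap on the whole mongod process: connections, in-memory sorts, and map/reduce jobs use memory outside the WiredTiger cache.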

Māris Ruskulis

Sep 17, 2015, 4:39:41 AM

to mongodb-user

Added more RAM to the server and increased the WiredTiger cache so that the indexes fit in memory. Now the problem is gone.
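
For readers who want to check the same thing on their own deployment, a rough sketch of the comparison follows. It assumes you have already collected `collStats` output per collection and reads its `totalIndexSize` field (in bytes). The collection names other than `contacts` and all sizes below are made-up placeholders, not figures from this cluster, and `indexes_fit_in_cache` is a hypothetical helper.

```python
GIB = 1024 ** 3


def indexes_fit_in_cache(coll_stats, cache_bytes):
    """Return (total index bytes, whether they fit within cache_bytes).

    coll_stats maps collection name -> a collStats-like dict that
    contains at least the `totalIndexSize` field, in bytes.
    """
    total = sum(stats["totalIndexSize"] for stats in coll_stats.values())
    return total, total <= cache_bytes


# Placeholder numbers for illustration only.
sample_stats = {
    "contacts": {"totalIndexSize": 3 * GIB},
    "other_collection": {"totalIndexSize": 1 * GIB},
}

total_small, fits_small = indexes_fit_in_cache(sample_stats, 2 * GIB)
total_large, fits_large = indexes_fit_in_cache(sample_stats, 8 * GIB)

print("index total: %.1f GiB" % (total_small / float(GIB)))
print("fits in 2 GiB cache:", fits_small)
print("fits in 8 GiB cache:", fits_large)
```

The cache also has to hold the working set of documents, so fitting the indexes is a lower bound rather than a complete sizing rule.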
