MongoDB High iowait, no disk usage

Elmar Weber

May 26, 2016, 7:36:29 AM

to mongodb-user

Hello,

On a cold, i.e. restarted, MongoDB we are seeing some issues with iowait. On a machine with 16 cores, 60 GB of memory, and an SSD (max I/O around 250 MB/s), running as a virtual machine on Google, top shows a constant iowait of around 20%, and iotop shows that it is the MongoDB process. Only around 10 MB/s is actually read from the disk and, as I said, we know from other tests that the machine can do much more.

The query producing this pattern sorts the result of a query by an attribute (all indexed). After running queries like this a few times it works better. The total index size is around 4 GB, so I assume it gets loaded into memory directly.

Are there any suggestions on how to debug this? I can't see how there can be such high iowait when the computing resources are barely utilized.

* Running MongoDB 3.0.10 on Ubuntu 14.04

Thanks,
Elmar

Kevin Adistambha

Jun 3, 2016, 4:15:56 AM

to mongodb-user

Hi Elmar,

> On a cold, i.e. restarted, MongoDB we are seeing some issues with iowait. On a machine with 16 cores, 60 GB of memory, and an SSD (max I/O around 250 MB/s), running as a virtual machine on Google, top shows a constant iowait of around 20%, and iotop shows that it is the MongoDB process. Only around 10 MB/s is actually read from the disk and, as I said, we know from other tests that the machine can do much more.
>
> The query producing this pattern sorts the result of a query by an attribute (all indexed). After running queries like this a few times it works better. The total index size is around 4 GB, so I assume it gets loaded into memory directly.
>
> Are there any suggestions on how to debug this? I can't see how there can be such high iowait when the computing resources are barely utilized.

High iowait on the MongoDB process implies that MongoDB is waiting for the disk to read/write data. Are the resources consistently not being utilized, or does this only happen for a period of time after a cold start?
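
For reference, the iowait percentage that top reports can be approximated from two samples of the aggregate cpu line in /proc/stat. The following is a minimal sketch, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the function names are hypothetical and not part of any MongoDB tooling:

```python
def parse_cpu_line(line):
    """Return the numeric counters from an aggregate 'cpu ...' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Sum only user..steal (the first eight counters); guest time is already
    # accounted for inside user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter
```

On a live machine the two strings would be the first line of /proc/stat read a few seconds apart. Note that iowait counts time the CPUs sit idle while I/O is outstanding; it does not measure throughput, so a high value can coincide with a low MB/s figure.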

There are a number of potential reasons for your initial performance issues, but collecting more detailed metrics may be able to provide a hint:

* Does this happen after a cold start of the machine or of mongod? E.g., does restarting mongod exhibit this issue as well?
* Do you have multiple mongod instances running on the machine? E.g. on different ports, using Docker, or any virtualization?
* Do you see anything suspicious in the server logs or the mongod logs? You could try mtools, a collection of tools that analyze a MongoDB deployment from its log files, for example: mloginfo --queries, mplotqueries, etc.
* Can you include the output of explain(true) for the query?
* Which storage engine is used: MMAPv1 or WiredTiger?
* What size and type of SSD is provisioned on the server?
* How many times do you need to run the query before it reaches an acceptable level of performance, and does the performance improve significantly? Is there any difference in the log files between the two cases?
* What is the size of the collection?
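
Two of these answers (storage engine and collection size) can be collected in one go. This is only a sketch, assuming a pymongo-style database object whose command() method runs the serverStatus and collStats commands; collect_basics is a hypothetical helper name:

```python
def collect_basics(db, collection_name):
    """Gather the storage engine name and collection size figures.

    `db` only needs a pymongo-style `command()` method.
    """
    status = db.command("serverStatus")
    stats = db.command("collStats", collection_name)
    return {
        # serverStatus reports the engine under storageEngine.name
        "storageEngine": status.get("storageEngine", {}).get("name"),
        "count": stats.get("count"),
        "size": stats.get("size"),
        "storageSize": stats.get("storageSize"),
        "totalIndexSize": stats.get("totalIndexSize"),
    }
```

With pymongo installed this could be called as collect_basics(MongoClient("mongodb://localhost:27017")["mydb"], "mycoll"), where the connection string, database name, and collection name are placeholders.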

You may find the following links useful:

> Running MongoDB 3.0.10 on Ubuntu 14.04

I would recommend upgrading to the latest 3.0 release, currently 3.0.12, for bug fixes and improvements.

Best regards,
Kevin

Elmar Weber

Jun 5, 2016, 1:30:14 PM

to mongodb-user

Hi Kevin,

Thanks for your feedback. Here is a quick rundown of the answers:

> Does this happen after a cold start of the machine or mongod? E.g., does restarting mongod exhibit this issue as well?

No, then the issue does not happen.

> Do you have multiple mongod instances running on the machine? E.g. on different ports, using Docker, or any virtualization?

No.

> Do you see anything suspicious in the server logs or the mongod logs? You could try mtools, a collection of tools that analyze a MongoDB deployment from its log files, for example: mloginfo --queries, mplotqueries, etc.

Yes, lots of these, but they also show up when the iowait issue is not happening:
"I COMMAND [conn252552] getmore cpy-engine.events query:"<query> cursorid:295985550697 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:54 nreturned:1403 reslen:4194730 locks:{ Global: { acquireCount: { r: 110 } }, Database: { acquireCount: { r: 55 } }, Collection: { acquireCount: { r: 55 } } } 883ms
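
To put a log line like this into perspective, its counters can be pulled out and turned into rates. The following is a rough sketch using only the standard library; parse_slow_op and the derived field names are hypothetical, and the sample line is the one above with the stray leading quote dropped:

```python
import re

LINE = ('I COMMAND [conn252552] getmore cpy-engine.events query:"<query> '
        'cursorid:295985550697 ntoreturn:0 keyUpdates:0 writeConflicts:0 '
        'numYields:54 nreturned:1403 reslen:4194730 '
        'locks:{ Global: { acquireCount: { r: 110 } }, '
        'Database: { acquireCount: { r: 55 } }, '
        'Collection: { acquireCount: { r: 55 } } } 883ms')


def parse_slow_op(line):
    """Extract the key:number counters and the trailing duration from a log line."""
    op = {key: int(value) for key, value in re.findall(r"(\w+):(\d+)\b", line)}
    duration = re.search(r"(\d+)ms\s*$", line)
    if duration is None:
        raise ValueError("no trailing '<n>ms' duration found")
    op["millis"] = int(duration.group(1))
    # Derived figures: average bytes per returned document and the rate at
    # which the response was produced (1 MB taken as 10**6 bytes).
    if op.get("nreturned"):
        op["bytes_per_doc"] = op["reslen"] / op["nreturned"]
    if op["millis"] > 0:
        op["mb_per_s"] = op["reslen"] / (op["millis"] / 1000) / 1_000_000
    return op


op = parse_slow_op(LINE)
print(op["nreturned"], op["reslen"], op["millis"])
```

For this line the getmore returned about 4 MB in 883 ms, i.e. under 5 MB/s for a single cursor batch, which is at least in the same range as the roughly 10 MB/s read from disk.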

> Can you include the output of explain(true) for the query?

At the end, as I read it, it uses indexes for everything.

> What is the storage engine used: MMAPv1 or WiredTiger?

WiredTiger

> What size and type of SSD is provisioned on the server?

750GB persistent SSD

> How many times do you need to run the query before it gets to an acceptable level of performance, and does the performance increase significantly? Is there any difference in the log files between the two cases?

After the initial run everything is OK. We run the query with different filters, and the performance then shows no difference, even when that exact query has not been run before. No difference in the log files as far as I can see.

> What is the size of the collection?

Here are the stats:
        "count" : 10666813,
        "size" : 35794866912,
        "avgObjSize" : 3355,
        "storageSize" : 12123406336,
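
As a back-of-the-envelope check on these numbers, here is a small sketch. It assumes 1 MB = 10^6 bytes and, purely for illustration, that the whole on-disk collection would have to be read once to warm up, which nothing in this thread establishes:

```python
count = 10666813             # documents
size = 35794866912           # uncompressed data size in bytes
storage_size = 12123406336   # on-disk (compressed) size in bytes

# Sanity check against the reported avgObjSize of 3355.
avg_obj_size = size // count

# How much smaller the data is on disk than uncompressed.
compression_ratio = size / storage_size


def seconds_to_read(num_bytes, mb_per_s):
    """Time to read num_bytes at a sustained rate given in MB/s."""
    return num_bytes / (mb_per_s * 1_000_000)


t_observed = seconds_to_read(storage_size, 10)   # the ~10 MB/s seen in iotop
t_max = seconds_to_read(storage_size, 250)       # the ~250 MB/s the SSD can do

print(avg_obj_size, round(compression_ratio, 2))
print(round(t_observed / 60, 1), "min vs", round(t_max), "s")
```

So the data is roughly 36 GB uncompressed and 12 GB on disk, and reading all of it at the observed rate would take on the order of 20 minutes, compared with under a minute at the disk's maximum.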

Thanks,
Elmar
