mongo iowait

tetlika

Sep 6, 2012, 10:43:49 AM
to mongodb-user
Hi!

We run a sharded MongoDB 2.0.6 cluster on Amazon EC2 m2.4xlarge
instances, with 4 disks in RAID 0. The index size on every shard is
less than 50 GB.

m2.4xlarge instances have 69 GB of RAM.

We've noticed some weird behavior from mongod:

1) As soon as the "res" value reaches around 55-60 GB on any of our
shards, we see high, unexplained iowait on that shard's master, and
the application slows down severely.

2) We do a stepdown and things are normal again until res reaches
55-60 GB once more (after a month or so).

This behavior looks very weird. Any thoughts on what it could be?

Thanks

tetlika

Sep 6, 2012, 10:49:01 AM
to mongodb-user
I forgot to say that it does not happen every time res reaches that
value. A shard can "live" for weeks with that res and then suddenly
start behaving this way; sometimes a shard "lives" just a couple of
days with that res and then starts iowaiting.

Samuel García Martínez

Sep 6, 2012, 5:33:52 PM
to mongod...@googlegroups.com
Maybe it is not related to resident size. This may give you a hint: http://www.10gen.com/presentations/MongoNYC-2012/MongoDB-at-foursquare

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb



--
Regards,
Samuel García.

tetlika

Sep 7, 2012, 12:58:40 AM
to mongodb-user
Thanks, nice video.

But it looks like in our specific case there are no random EBS I/O
problems.

On 7 Sep, 00:34, Samuel García Martínez <samuelgmarti...@gmail.com>
wrote:

David Hows

Sep 7, 2012, 1:16:37 AM
to mongod...@googlegroups.com
Hi Tetlika,

How are you collecting the iowait metric when these issues occur?
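
For reference, iowait on Linux is usually derived from the aggregate "cpu" line of /proc/stat by sampling it twice and comparing the counters. The following is a minimal sketch of that calculation, not the method RightScale or any other monitoring tool is known to use. The function names (parse_cpu_line, iowait_percent, sample_iowait) are hypothetical and chosen only for this illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Field order: user nice system idle iowait irq softirq steal ...
    # Only the first eight fields are summed, because guest time is
    # already included in user and nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Sample the given stat file twice and return the iowait percentage."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on the shard master while the slowdown is happening gives a number that can be compared with the plotted value, which helps rule out a monitoring artifact.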

Are there any mentions of long running queries or yields in your mongod logs when this occurs? 

If the res value increases and you see a spike in iowait, it sounds like you are page faulting.

Do you have MMS installed? If so can you provide a link to your dashboard? And which version of the DB are you running?

Cheers,

David

tetlika

Sep 7, 2012, 1:27:40 AM
to mongodb-user
We are on RightScale, and iowait is collected by default and shown in
the plots. During those periods I also see queues in mongostat.
Yes, there are long-running queries in the log, but they are regular
queries that start taking longer during the iowait periods and so get
written to the log. We have already investigated that.
Also, after stepDown things are back to normal on both the new master
and the old one.
The problem always occurs when the res value is high; we have not
faced unexpected iowait while res is low.

We are on 2.0.6.

David Hows

Sep 7, 2012, 1:47:44 AM
to mongod...@googlegroups.com
Hi Tetlika,

This very much sounds as if your system is paging at the time you are seeing these spikes.

The spike in resident memory correlating with a spike in I/O Wait sounds like your instance is trying to pull paged files from disk into memory.

In MMS we capture this metric explicitly. 

Common solutions here are to look at faster disks or to increase the amount of RAM available to your server.

Cheers,

David

tetlika

Sep 7, 2012, 1:54:56 AM
to mongodb-user
But why don't we see such spikes while res is low and data is actively
being paged into memory (for example, when a slave becomes master)?

David Hows

Sep 7, 2012, 2:20:43 AM
to mongod...@googlegroups.com
Hi Tetlika,

Perhaps I am being unclear. If the amount of data currently held in resident memory is high and you suddenly need to access data which is not in memory, you will need to perform some juggling. To do this you will need to write data that is in memory out to disk in order to create free space into which you can load the desired data.

If the amount of data resident in memory is lower, it's likely that you will not need to perform these swaps to hold the new data, as there is enough available space to hold it.
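
To make the difference concrete, here is a toy simulation of a page cache with a fixed capacity. It is only a sketch under simplifying assumptions: it uses strict least-recently-used eviction, which is not exactly what the Linux kernel does, and the function name simulate and its parameters are made up for this example. It counts how many accesses miss the cache (faults) and how many of those misses also force an existing page out (evictions).

```python
from collections import OrderedDict


def simulate(accesses, capacity_pages):
    """Replay page accesses against an LRU cache; return (faults, evictions)."""
    cache = OrderedDict()
    faults = 0
    evictions = 0
    for page in accesses:
        if page in cache:
            # Hit: mark the page as most recently used.
            cache.move_to_end(page)
            continue
        # Miss: the page has to be read from disk.
        faults += 1
        if len(cache) >= capacity_pages:
            # Cache is full: drop the least recently used page first.
            cache.popitem(last=False)
            evictions += 1
        cache[page] = True
    return faults, evictions
```

With room to spare, simulate([0, 1, 2, 0, 1, 2], 4) faults only on the first touch of each page and never evicts. With simulate([0, 1, 2, 0, 1, 2], 2) every access faults, and most faults also evict a page that is needed again shortly afterwards.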

Hope that clears things up,

David 

tetlika

Sep 7, 2012, 2:24:07 AM
to mongodb-user
OK, but it happens when res is near 50 GB, while the nodes still have
almost 20 GB free.

tetlika

Sep 7, 2012, 2:26:58 AM
to mongodb-user
By the way, I have never seen res go above 60 GB, while the node has
70 GB and only mongod runs on it.

David Hows

Sep 7, 2012, 3:05:06 AM
to mongod...@googlegroups.com
Have you considered installing MMS? Available for free at http://mms.10gen.com

The MMS agent can collect a lot of these internals and give some good insights into what is going on internally within mongo.

If you're worried about these kinds of spikes causing performance issues, it would be best to install MMS and see what data you can gather from it to compare with the data from your EC2 instance.

Cheers,

David

tetlika

Sep 7, 2012, 3:06:43 AM
to mongodb-user
is it free?

On 7 Sep, 10:05, David Hows <david.h...@10gen.com> wrote:
> Have you considered installing MMS? Available for free at http://mms.10gen.com

David Hows

Sep 7, 2012, 3:22:49 AM
to mongod...@googlegroups.com
Yes. It's available for free at http://mms.10gen.com

10gen recommends that everyone install MMS.

tetlika

Sep 7, 2012, 3:30:32 AM
to mongodb-user
thanks, David

Which plots in MMS should I pay the most attention to in my
situation?

On 7 Sep, 10:22, David Hows <david.h...@10gen.com> wrote:
> Yes. It's available for free at http://mms.10gen.com

David Hows

Sep 7, 2012, 3:43:03 AM
to mongod...@googlegroups.com
I'd start by looking at the following:
pagefaults
queues
lock%
opcounters

Those are normally very good for detecting issues with your mongo instance.
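
If you want to look at some of the same numbers before MMS is set up, they can be approximated from two serverStatus documents taken a known number of seconds apart. The sketch below is an illustration only: the function name metric_rates is hypothetical, and the field names used (extra_info.page_faults, opcounters, globalLock.currentQueue.total) are assumed from typical serverStatus output on Linux and should be checked against your mongod version. Lock percentage is left out because its location in serverStatus differs between versions.

```python
def metric_rates(prev, curr, seconds):
    """Per-second rates between two serverStatus-style documents.

    prev and curr are plain dicts, for example the result of running
    the serverStatus command twice through a driver.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rates = {
        "page_faults": (curr["extra_info"]["page_faults"]
                        - prev["extra_info"]["page_faults"]) / seconds,
    }
    # opcounters are cumulative counters, so report them as rates.
    for op, count in curr["opcounters"].items():
        rates["op_" + op] = (count - prev["opcounters"].get(op, 0)) / seconds
    # Queue length is a gauge, not a counter, so report the current value.
    rates["queued_total"] = curr["globalLock"]["currentQueue"]["total"]
    return rates
```

A page fault rate that jumps at the same moment the queue grows would support the paging explanation; a flat page fault rate during an iowait spike would point elsewhere.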

Depending on your EC2 setup, you may also want to look at network, as there may be a significant correlation between page faults and network operations: with network-attached volumes such as EBS, the OS retrieves data from disk over the network.

Finally, have a look at the munin-node application (instructions are available on the MMS page). This tool gathers hardware statistics (including iowait), so you can compare them directly with the MMS data.

Cheers,

David