CPU usage / Connection problems and sharding

Stephan

May 29, 2012, 9:07:28 AM
to mongod...@googlegroups.com
Hi everybody,

We're running MongoDB in our production environment with 2 shards (each one a replica set). Almost every day I see several of these nodes hit 100% CPU usage and stop responding to queries for some time. This results in errors and strange behavior on our website. A closer look at the MongoDB logs suggests that shard rebalancing is going on at those times. Unfortunately this causes the Java driver to fail with an exception ("can't call something").

We're actually not sure whether sharding is the right approach here. We have a collection that stores business-related events and profiling information. Several entries are written per second, and they are processed offline. Right now there are about 112847590 entries in this collection, half of them on each shard.

So, my question is: is it possible to tune sharding so that the rebalancing does not bring the whole server to a stop?
And: what happens if I drop a sharded collection in an environment where several entries are written per second? Will this work, or will it bring us down?

I was thinking about exporting the data, dropping the sharded collection, and re-importing only the data we need, while keeping the historic data safe.

Any ideas?

Thanks


Scott Hernandez

May 29, 2012, 9:15:12 AM
to mongod...@googlegroups.com
When CPU usage is high, are there lots of slow operations? Have you
checked that there are indexes for these queries and that they
are efficient? (Generally, high CPU is caused by queries that cannot
use an index efficiently.)

Are you monitoring with MMS + Munin, or can you provide iostat -xm 2
data during these periods along with mongostat numbers from each
primary?

You may want to turn off the balancer to see how far out of balance
your shards get, or to eliminate balancing as the root cause:
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing
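
As a rough sketch of what turning off the balancer can look like on a 2.0-era cluster: the balancer state is assumed to live in the `settings` collection of the `config` database, in the document with `_id: "balancer"`. The helper name `balancerStateUpdate` is made up for illustration; only the shape of the update document matters. Check the linked page for the exact procedure on your version.

```javascript
// Build the upsert that stops (or restarts) the balancer.
// Assumption: the balancer reads its state from config.settings,
// in the document whose _id is "balancer".
// balancerStateUpdate is a hypothetical helper, not part of MongoDB.
function balancerStateUpdate(stopped) {
  return {
    query: { _id: "balancer" },
    update: { $set: { stopped: Boolean(stopped) } },
    upsert: true
  };
}

// In a mongo shell connected to a mongos (not directly to a shard):
//   var u = balancerStateUpdate(true);
//   var conf = db.getSiblingDB("config");
//   conf.settings.update(u.query, u.update, u.upsert);
//   conf.settings.find({ _id: "balancer" });   // verify the stored state
```

A chunk migration that is already in flight may still finish after the flag is set, so the effect may not be immediate. Passing `false` builds the update that switches the balancer back on.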

In general you don't want to drop a sharded collection, as that will
not only get rid of all the data but also the sharding metadata about
the shard key, and will leave the collection unsharded. There is a
bug related to dropping a sharded collection and then adding it back
-- what version are you using?

Stephan Bösebeck

May 30, 2012, 7:33:26 AM
to mongod...@googlegroups.com
Hi Scott,

Let me just answer the questions you asked:
- There are no expensive operations/queries running. The same queries that work now end in a timeout 5 minutes later.
- Yes, we use MMS and Munin...
- Version 2.0.2 of MongoDB is installed on all nodes

I was thinking about dropping the sharded collection in order to re-create it as an unsharded one, which would get rid of the balancing issue.

Thanks for the docs you mentioned. I'll try to configure the balancer to run only at night.
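
A minimal sketch of restricting the balancer to a nightly window, assuming the window is stored as `activeWindow: { start, stop }` (strings in "HH:MM" form) in the `balancer` document of `config.settings`. The times 23:00 and 6:00 and the helper names are placeholders for illustration. `inWindow` is only a local sanity check for windows that cross midnight; it is not how the balancer itself evaluates the window.

```javascript
// Convert "H:MM" or "HH:MM" to minutes after midnight.
function toMinutes(hhmm) {
  var parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// True when `now` lies in [start, stop), allowing windows that cross midnight.
// Hypothetical helper for checking a planned window before applying it.
function inWindow(start, stop, now) {
  var s = toMinutes(start), e = toMinutes(stop), n = toMinutes(now);
  return s <= e ? (n >= s && n < e) : (n >= s || n < e);
}

// Build the upsert that sets the balancing window.
// Assumption: the window lives in config.settings under _id "balancer".
function balancerWindowUpdate(start, stop) {
  return {
    query: { _id: "balancer" },
    update: { $set: { activeWindow: { start: start, stop: stop } } },
    upsert: true
  };
}

// In a mongo shell connected to a mongos:
//   var w = balancerWindowUpdate("23:00", "6:00");
//   db.getSiblingDB("config").settings.update(w.query, w.update, w.upsert);
```

Which clock and time zone the cluster applies to the window should be checked against the documentation for the installed version before relying on it.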

Scott Hernandez

May 30, 2012, 7:39:28 AM
to mongod...@googlegroups.com
On Wed, May 30, 2012 at 7:33 AM, Stephan Bösebeck
<sboes...@googlemail.com> wrote:
> Hi Scott,
>
> Let me just answer the questions you asked:
> - There are no expensive operations/queries running. The same queries that work now end in a timeout 5 minutes later.
That isn't exactly what I was getting at. When you get to these slow
points can you please provide db.currentOp() output along with the
time things are happening so we can look in MMS?
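
One way to condense db.currentOp() output during a slow period is sketched below. It assumes the result has an `inprog` array whose entries carry `opid`, `active`, `secs_running`, `op` and `ns`. The helper name `slowOps` and the 5-second threshold are illustrative choices, not something from this thread.

```javascript
// Return the active operations that have been running for at least
// `minSecs` seconds, longest-running first.
// `currentOp` is the document returned by db.currentOp(); its `inprog`
// array holds one entry per in-progress operation.
// slowOps is a hypothetical helper, not part of MongoDB.
function slowOps(currentOp, minSecs) {
  return (currentOp.inprog || [])
    .filter(function (op) {
      return op.active === true && (op.secs_running || 0) >= minSecs;
    })
    .sort(function (a, b) { return b.secs_running - a.secs_running; })
    .map(function (op) {
      return { opid: op.opid, op: op.op, ns: op.ns, secs_running: op.secs_running };
    });
}

// In a mongo shell, on each shard primary, while the problem is occurring:
//   print(new Date());                       // timestamp to correlate with MMS
//   printjson(slowOps(db.currentOp(), 5));
```

Printing the timestamp next to the output makes it easier to line the result up with the MMS charts.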

> - Yes, we use MMS and Munin...
Okay, when does this happen and what is your group name?

> - Version 2.0.2 of MongoDB is installed on all nodes
You should upgrade to 2.0.5 or 2.0.6 later this week as there are some
important fixes related to sharding and balancing.

> I was thinking about dropping the sharded collection in order to re-create it as an unsharded one, which would get rid of the balancing issue.

Please turn off the balancer first to check if this is the issue.

Stephan Bösebeck

May 30, 2012, 8:01:05 AM
to mongod...@googlegroups.com

On 30.05.2012 at 13:39, Scott Hernandez wrote:

> On Wed, May 30, 2012 at 7:33 AM, Stephan Bösebeck
> <sboes...@googlemail.com> wrote:
>> Hi Scott,
>>
>> Let me just answer the questions you asked:
>> - There are no expensive operations/queries running. The same queries that work now end in a timeout 5 minutes later.
> That isn't exactly what I was getting at. When you get to these slow
> points can you please provide db.currentOp() output along with the
> time things are happening so we can look in MMS?

>
>> - Yes, we use MMS and Munin...
> Okay, when does this happen and what is your group name?
Look in MMS; the group name is holidayinsider.com
>
>> - Version 2.0.2 of MongoDB is installed on all nodes
> You should upgrade to 2.0.5 or 2.0.6 later this week as there are some
> important fixes related to sharding and balancing.
We actually use 2.0.4. We need to plan the upgrade a bit: the last time we upgraded, there was a problem with the replica set that kept the cluster from starting at all... (another issue)
>
>> I was thinking about dropping the sharded collection in order to re-create it as an unsharded one, which would get rid of the balancing issue.
>
> Please turn off the balancer first to check if this is the issue.
Done - I'll keep an eye on the problem.

Scott Hernandez

May 30, 2012, 8:13:50 AM
to mongod...@googlegroups.com
On Wed, May 30, 2012 at 8:01 AM, Stephan Bösebeck
<sboes...@googlemail.com> wrote:
>
> On 30.05.2012 at 13:39, Scott Hernandez wrote:
>
>> On Wed, May 30, 2012 at 7:33 AM, Stephan Bösebeck
>> <sboes...@googlemail.com> wrote:
>>> Hi Scott,
>>>
>>> Let me just answer the questions you asked:
>>> - There are no expensive operations/queries running. The same queries that work now end in a timeout 5 minutes later.
>> That isn't exactly what I was getting at. When you get to these slow
>> points can you please provide db.currentOp() output along with the
>> time things are happening so we can look in MMS?

You may want to enable database profiling on the primaries of your
shards to get an idea of the types of queries and their timing
historically. You can also enable this in MMS so it is easier to
diagnose during the slow times.
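
A sketch of how profiler output could be summarized once profiling is enabled. It assumes each `system.profile` entry has `ns`, `op` and `millis` fields. The 100 ms threshold and the helper name `summarizeProfile` are illustrative assumptions.

```javascript
// Group profiler entries by namespace and operation type, and report
// how many were logged, their total time, and the slowest one.
// summarizeProfile is a hypothetical helper, not part of MongoDB.
function summarizeProfile(entries) {
  var byKey = {};
  entries.forEach(function (e) {
    var key = e.ns + " " + e.op;
    var s = byKey[key] || (byKey[key] = { count: 0, totalMillis: 0, maxMillis: 0 });
    s.count += 1;
    s.totalMillis += e.millis;
    s.maxMillis = Math.max(s.maxMillis, e.millis);
  });
  return byKey;
}

// In a mongo shell, on the primary of each shard, per database:
//   db.setProfilingLevel(1, 100);   // record operations slower than 100 ms
//   printjson(summarizeProfile(db.system.profile.find().toArray()));
//   db.setProfilingLevel(0);        // switch profiling off again afterwards
```

Level 1 records only operations above the threshold, which keeps the overhead lower than level 2 (all operations) on a busy primary.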

>
>>
>>> - Yes, we use MMS and Munin...
>> Okay, when does this happen and what is your group name?
> Look in MMS; the group name is holidayinsider.com

You don't seem to be collecting hardware stats with MMS. Can you make
sure connectivity and munin are set up correctly on all your hosts?
http://mms.10gen.com/help/install.html#hardware-monitoring-with-munin-node

>>
>>> - Version 2.0.2 of MongoDB is installed on all nodes
>> You should upgrade to 2.0.5 or 2.0.6 later this week as there are some
>> important fixes related to sharding and balancing.
> We actually use 2.0.4. We need to plan the upgrade a bit: the last time we upgraded, there was a problem with the replica set that kept the cluster from starting at all... (another issue)
>>
>>> I was thinking about dropping the sharded collection in order to re-create it as an unsharded one, which would get rid of the balancing issue.

It seems like you have lots of unsharded databases which live on the
hi2 shard. Can you run mongotop on that shard and see where all the
traffic is going -- which collections are active? The operations and
load on the two shards do not seem very even, and I'm guessing that is
the reason, since your three sharded collections are balanced.
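
To confirm that a sharded collection's chunks are evenly spread, the chunk metadata can be counted per shard. This is a sketch assuming `config.chunks` documents carry `ns` and `shard` fields, as they do in this era of MongoDB. The helper name `chunksPerShard` and the namespace "mydb.events" are placeholders.

```javascript
// Count the chunks of one sharded collection on each shard.
// `chunks` is an array of documents from config.chunks.
// chunksPerShard is a hypothetical helper, not part of MongoDB.
function chunksPerShard(chunks, ns) {
  var counts = {};
  chunks.forEach(function (c) {
    if (c.ns !== ns) return;   // skip chunks of other collections
    counts[c.shard] = (counts[c.shard] || 0) + 1;
  });
  return counts;
}

// In a mongo shell connected to a mongos:
//   var all = db.getSiblingDB("config").chunks.find().toArray();
//   printjson(chunksPerShard(all, "mydb.events"));
```

Even chunk counts say nothing about unsharded databases, which all live on their primary shard, so mongotop on that shard remains the more direct check for where the traffic goes.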