moveChunk taking a very long time?

hespoddi

Oct 17, 2012, 6:40:00 PM

to mongod...@googlegroups.com

Hi all,

We've noticed that moveChunk is taking a very long time. It's been running for over 2 hrs and seems to be stuck at step #3:

{
    "opid" : "RS_Crawler_2:1419152355",
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 9597,
    "op" : "query",
    "ns" : "crawlerDb.items",
    "query" : {
        "moveChunk" : "crawlerDb.items",
        "from" : "RS_Crawler_2/<removed>",
        "to" : "RS_Crawler/<removed>",
        "min" : {
            "itemId" : NumberLong(43206356),
            "feedId" : 1420
        },
        "max" : {
            "itemId" : NumberLong(43427248),
            "feedId" : 35350
        },
        "maxChunkSizeBytes" : NumberLong(67108864),
        "shardId" : "crawlerDb.items-itemId_43206356feedId_1420",
        "configdb" : "<removed>"
    },
    "client_s" : "10.100.0.35:54401",
    "desc" : "conn",
    "threadId" : "0x7f143287e700",
    "connectionId" : 417298,
    "msg" : "step5",
    "numYields" : 2081
}

There is currently no load on the db at all, only a high lock. What is step #3? Is this normal?

Please let me know what other information is needed and I'll post... Thanks!

Chris

Andre de Frere

Oct 19, 2012, 12:14:55 AM

to mongod...@googlegroups.com

Hi Chris,

There could be a couple of things going on here. Where the chunks are migrating from and migrating to is going to make some difference in the time it takes to move a chunk. This will normally be due to different write loads on the different shards. You've mentioned that you were not seeing high load, but were seeing high lock. What metrics were you using to measure load? In particular I would be interested in seeing I/O metrics such as I/O wait and flush times for the nodes involved with moving chunks.

Are these hosts in MMS?

High lock would normally be an indication of high I/O, and may have an impact on the performance of moving chunks, even though the CPU may not be heavily loaded. As MongoDB works with memory-mapped files, it is less common for CPU to be the limit on performance. Memory and disk are the more common bottlenecks, especially if mongod needs to page fault. When CPU load does start to ramp up, this is normally caused by poor indexing strategies, or by counts and sorts on unindexed keys.

Could you give us an idea of how high your lock % was? Were you seeing a corresponding high page fault metric?
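If it helps to put a number on it: lock % is normally derived from two samples of cumulative counters taken some seconds apart, not read off directly. A minimal sketch, assuming each sample is a dict with `lockTime` and `totalTime` in microseconds. Those field names follow my recollection of the `globalLock` section of `serverStatus` in releases of this era, so treat them as an assumption and check your version. The function name `lock_percent` is hypothetical.

```python
def lock_percent(prev, curr):
    """Percentage of wall-clock time spent under the lock between two samples.

    Assumption: both samples hold cumulative 'lockTime' and 'totalTime'
    counters in the same unit (microseconds here).
    """
    lock_delta = curr["lockTime"] - prev["lockTime"]
    total_delta = curr["totalTime"] - prev["totalTime"]
    if total_delta <= 0:
        raise ValueError("samples must be taken at increasing times")
    return 100.0 * lock_delta / total_delta


# Made-up sample values, for illustration only.
earlier = {"totalTime": 1000000, "lockTime": 200000}
later = {"totalTime": 3000000, "lockTime": 1800000}
print("lock %%: %.1f" % lock_percent(earlier, later))
```

Comparing this figure with the page fault counters over the same interval should show whether the lock time is being spent waiting on disk.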

Would it be possible to see some output of mongostat, mongotop and iostat -xm 2? With MMS we can get several of these metrics automatically, but I'm not sure if you have that installed.

Are you seeing this slow behaviour every time a move chunk occurs in your environment?
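One way to answer that is to capture currentOp output periodically and pick out the moveChunk entries. A minimal sketch, assuming you already have the `inprog` list from currentOp as plain Python dicts shaped like the entry in your post. The helper name `find_move_chunks` and the `min_secs` threshold are hypothetical, and nothing here connects to a server.

```python
def find_move_chunks(inprog, min_secs=0):
    """Summarise active moveChunk operations from a currentOp 'inprog' list."""
    found = []
    for op in inprog:
        query = op.get("query") or {}
        # Only commands whose query document carries a moveChunk key.
        if not isinstance(query, dict) or "moveChunk" not in query:
            continue
        if not op.get("active", False):
            continue
        secs = op.get("secs_running", 0)
        if secs < min_secs:
            continue
        found.append({
            "opid": op.get("opid"),
            "ns": query["moveChunk"],
            "from": query.get("from"),
            "to": query.get("to"),
            "secs_running": secs,
            "step": op.get("msg"),
        })
    # Longest-running migrations first.
    return sorted(found, key=lambda m: m["secs_running"], reverse=True)


# First entry is copied from the post; the second is made up for contrast.
sample = [
    {"opid": "RS_Crawler_2:1419152355", "active": True, "secs_running": 9597,
     "op": "query", "ns": "crawlerDb.items", "msg": "step5",
     "query": {"moveChunk": "crawlerDb.items",
               "from": "RS_Crawler_2/<removed>",
               "to": "RS_Crawler/<removed>"}},
    {"opid": "hypothetical:1", "active": True, "secs_running": 2,
     "op": "insert", "ns": "crawlerDb.items", "query": {}},
]

for m in find_move_chunks(sample, min_secs=600):
    print(m["opid"], m["ns"], m["secs_running"], m["step"])
```

Logging the result every minute or so would show whether every migration is slow or only those between particular shards.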

Regards,

André
