Why are queries VERY slow?

129 views
Skip to first unread message

Shi Shei

unread,
Apr 2, 2012, 4:40:55 AM4/2/12
to mongodb-user
I'm running 4 shards, each consiting of 3 nodes, on Linux 64 Bit
v2.0.2.

I observed that some (very few) queries are extremely slow, even when
they use only indexed fields.
Read queries are almost always executed in slaveOk mode.

During the weekend I profiled all nodes by storing queries which took
more than 3 seconds.
Nearly 7000 queries were filtered during this time. Given the total
number of queries, this seems not too much to me but I'm concerned
about the slowest queries which took several minutes up to an hour!

Here is the slowest query:
{
"_id" : ObjectId("4f795741559a0435df3b2cc0"),
"ts" : "2012-04-02T08:11:37Z",
"op" : "query",
"ns" : "offerStore.offer",
"query" : {
"shopId" : 222377,
"onlineProductIds" : {
"$ne" : null
},
"smallPicture" : {
"$ne" : null
},
"bokey" : {
"$ne" : null
},
"_id" : {
"$gt" : 0
},
"lastShopDataChange" : {
"$gt" : "2012-03-02T00:00:00Z"
}
},
"ntoreturn" : 2000,
"nscanned" : 107068,
"nreturned" : 152,
"responseLength" : 30897,
"millis" : 3571537,
"client" : "#.#.#.#",
"user" : ""
}

How can a query that slow (nearly 1 hour!)? A query that returns only
152 documents by scanning a total of only 107068 documents.

Another type of query, which executed very slowly, was this one:
{
"_id" : ObjectId("4f795741559a0435df3b2cb6"),
"ts" : "2012-04-02T04:56:45Z",
"op" : "getmore",
"ns" : "offerStore.offer",
"query" : {
"shopId" : 83769
},
"cursorid" : NumberLong("625703795491758659"),
"nreturned" : 12803,
"responseLength" : 4194466,
"millis" : 864696,
"client" : "#.#.#.#",
"user" : ""
}

The field "shopId" is an indexed field.


Here are the stats of my database:

db.stats()
{
"raw" : {
"offerStoreDE2/s197:27018,s209:27018,s198:27018" : {
"db" : "offerStore",
"collections" : 5,
"objects" : 29862952,
"avgObjSize" : 2137.5809535507406,
"dataSize" : 63834477412,
"storageSize" : 114947460352,
"numExtents" : 88,
"indexes" : 9,
"indexSize" : 11303606160,
"fileSize" : 130866479104,
"nsSizeMB" : 16,
"ok" : 1
},
"offerStoreDE3/s117:27018,s129:27018,s118:27018" : {
"db" : "offerStore",
"collections" : 6,
"objects" : 20098901,
"avgObjSize" : 2126.199923866484,
"dataSize" : 42734281776,
"storageSize" : 69905829456,
"numExtents" : 75,
"indexes" : 10,
"indexSize" : 6773039280,
"fileSize" : 81498472448,
"nsSizeMB" : 16,
"ok" : 1
},
"offerStoreDE4/s115:27018,s132:27018,s125:27018" : {
"db" : "offerStore",
"collections" : 6,
"objects" : 23699669,
"avgObjSize" : 1902.8760452308427,
"dataSize" : 45097532420,
"storageSize" : 59188194976,
"numExtents" : 71,
"indexes" : 10,
"indexSize" : 7836336256,
"fileSize" : 70766297088,
"nsSizeMB" : 16,
"ok" : 1
},
"offerstore/s142:27018,s144:27018,s143:27018" : {
"db" : "offerStore",
"collections" : 8,
"objects" : 18242133,
"avgObjSize" : 1987.735232277936,
"dataSize" : 36260530476,
"storageSize" : 48505167600,
"numExtents" : 75,
"indexes" : 14,
"indexSize" : 6151573344,
"fileSize" : 57887686656,
"nsSizeMB" : 16,
"ok" : 1
}
},
"objects" : 91903655,
"avgObjSize" : 2044.8242464785542,
"dataSize" : 187926822084,
"storageSize" : 292546652384,
"numExtents" : 309,
"indexes" : 43,
"indexSize" : 32064555040,
"fileSize" : 341018935296,
"ok" : 1
}

How to proceed to find the bottleneck and make my queries quicker?
Thank you!

Shi Shei

unread,
Apr 2, 2012, 4:57:31 AM4/2/12
to mongodb-user
Btw. every node is a physical machine (not virtual) equipped with 96
GB RAM.

Eliot Horowitz

unread,
Apr 2, 2012, 9:15:18 AM4/2/12
to mongod...@googlegroups.com
- can you send an explain for a slow query
- have you looked at hardware stats: cpu, disk, etc...

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Shi Shei

unread,
Apr 2, 2012, 10:22:57 AM4/2/12
to mongodb-user
Here comes explain for the slow query:

mongos> db.offer.find({"shopId" : 83769}).explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"offerStoreDE2/s197:27018,s209:27018,s198:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 41327,
"nscannedObjects" : 41327,
"n" : 41327,
"millis" : 45530,
"nYields" : 652,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
],
"offerStoreDE3/s117:27018,s129:27018,s118:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
],
"offerStoreDE4/s115:27018,s132:27018,s125:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
],
"offerstore/s143:27018,s126:27018,s131:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
]
},
"n" : 41327,
"nChunkSkips" : 0,
"nYields" : 652,
"nscanned" : 41327,
"nscannedObjects" : 41327,
"millisTotal" : 45530,
"millisAvg" : 11382,
"numQueries" : 4,
"numShards" : 4
}

I wonder about "millisTotal" of 45530 because the query run much
longer than 45 seconds.

Btw. after some minutes query time, I executed db.currentOp() and saw
my query running already since 453 seconds:

{
"opid" : "offerStoreDE2:261800283",
"active" : true,
"lockType" : "read",
"waitingForLock" : true,
"secs_running" : 453,
"op" : "query",
"ns" : "offerStore.offer",
"query" : {
"query" : {
"shopId" : 83769
},
"$explain" : true
},
"client_s" : "#.#.#.#:37161",
"desc" : "conn",
"threadId" : "0x7f6ff55ec950",
"connectionId" : 106306,
"numYields" : 2004
},

During the query, I also monitored the shards using mongostats for a
few minutes.
Here is the snippet:
http://pastebin.com/raw.php?i=peirBN0E

Can you see something wrong with it?

Shi Shei

unread,
Apr 4, 2012, 6:10:54 AM4/4/12
to mongodb-user
anyone?

Adam C

unread,
Apr 4, 2012, 9:41:56 AM4/4/12
to mongod...@googlegroups.com
Well, offerStoreDE2 is the shard that you are hitting for that query and that shard seems to be the busiest in the mongostat in general.  If I were to guess, I would suspect that you have a hotspot in one of your shard keys and that offerStoreDE2 is being overloaded.  That primary has much higher queues, locking etc. that the others.

Run a similar query that hits the other shards and I'll bet it returns nice and fast with limited yielding.

Adam
> > > To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com.

Shi Shei

unread,
Apr 4, 2012, 1:05:13 PM4/4/12
to mongodb-user
Thank you Adam!
We restarted all nodes and the same query executes now in only a few
miliseconds:

mongos> db.offer.find({shopId:83769}).explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"offerStoreDE2/s197:27018,s209:27018,s198:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 19829,
"nscannedObjects" : 19829,
"n" : 19829,
"millis" : 47,
"nYields" : 3,
"nscanned" : 13252,
"nscannedObjects" : 13252,
"n" : 13252,
"millis" : 57,
"nYields" : 10,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
],
"offerstore/s143:27018,s126:27018,s131:27018" : [
{
"cursor" : "BtreeCursor shopId",
"nscanned" : 9445,
"nscannedObjects" : 9445,
"n" : 9445,
"millis" : 34,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"shopId" : [
[
83769,
83769
]
]
}
}
]
},
"n" : 42526,
"nChunkSkips" : 0,
"nYields" : 13,
"nscanned" : 42526,
"nscannedObjects" : 42526,
"millisTotal" : 138,
"millisAvg" : 34,
"numQueries" : 4,
"numShards" : 4
}

Also, the number of slow queries has decreased since the restart.
The longest queries take now less than 5 seconds.

I do have some questions though:

1) How do you interprete "millisTotal:45530" of the explain command
eventhough the query run much
longer than 45 seconds? Even db.currentOp() showed that the query was
running already since 453 seconds.

2) Eliot suggested to profile disk. Do you suggest to run "iostat -xm
2" on every node (12 machines) while executing the query? Or is
mongostat appropriate?

Thank you!
> ...
>
> read more »

Brandon Diamond

unread,
Apr 18, 2012, 5:13:21 PM4/18/12
to mongod...@googlegroups.com
Hi Shi Shei,

The millisTotal field indicates the total number of milliseconds that a given query was executing on a server. I'm not certain why the actual time is different from the reported time; is there a bottleneck elsewhere in the system perhaps?

More details on "explain" fields:

Eliot's suggestion is probably worth trying out -- it'll help you identify if there is a disk bottleneck issue affecting performance of your query. If any numbers look anomalous, they'd be worth investigating.

Thanks,
 - Brandon
Reply all
Reply to author
Forward
0 new messages