Re: Possible bug with Compound index containing Dates

136 views
Skip to first unread message

keith kirk

unread,
Sep 18, 2012, 8:11:44 PM9/18/12
to mongod...@googlegroups.com
This is on 2.2.0 - disregard the " 08 is not a legal ECMA-262 octal constant (shell):1 " error, I have a bad habit of writing months with a leading zero and the console complains.  Tried the query with out that leading zero with no change in results.

Jenna deBoisblanc

unread,
Sep 19, 2012, 1:37:16 PM9/19/12
to mongod...@googlegroups.com
Hi Keith,

I'm not sure if you noticed the difference in the queries, so I wanted to confirm before investigating further- the second query uses a different index, "BtreeCursor site.id_1_lrt_-1," rather than, "BtreeCursor site.id_1_frt_1." Since "lrt" doesn't appear in the second query, it follows that the bounds on "lrt" are unlimited (i.e. $minElement to $maxElement).

Could you run the second query again and use hint() to force mongo to use the index on site.id and frt? e.g.:
> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 08, 18 ) }, "ue": false, "bot": false }).hint("site.id_1_frt_1").explain()


On Tuesday, September 18, 2012 6:22:55 PM UTC-4, keith kirk wrote:
Ran into some issues with an aggregation query returning very slowly and decided to run the $match through an explain - ended up with some very weird results.
I'm running this query on a sharded collection, checking for a specific site and first request time (by index / shard key) and filter the results by a couple boolean flags.  However, different site.id's causes query optimizer to decide to only use the first part of my compound index and wildcards the date, causing it to scan far more documents ( and shards ) than necessary.

If I supply a hint and specify my index, it works as expected, or if I remove the boolean fields - it again works as expected.  It seems isolated to searching on those boolean flags, if I remove those and instead query another field with a string, it again works.

Any ideas here?  I don't believe you can pass a .hint() to an aggregation query...


First query ( expected results ):
mongos> db.visit.find({ "site.id": 253, "frt": { $gte: new Date(2012, 08, 18 ) }, "test": false, "bot": false }).explain()
Tue Sep 18 15:05:56 08 is not a legal ECMA-262 octal constant (shell):1
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax1/s1.dynotrax:2100,s2.dynotrax:2100,s3.dynotrax:2100" : [
{
"cursor" : "BtreeCursor site.id_1_frt_1",
"isMultiKey" : false,
"n" : 31429,
"nscannedObjects" : 32230,
"nscanned" : 32230,
"nscannedObjectsAllPlans" : 32432,
"nscannedAllPlans" : 32432,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 81,
"nChunkSkips" : 0,
"millis" : 555,
"indexBounds" : {
"site.id" : [
[
253,
253
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "ip-10-6-50-253:2100"
}
]
},
"cursor" : "BtreeCursor site.id_1_frt_1",
"n" : 31429,
"nChunkSkips" : 0,
"nYields" : 81,
"nscanned" : 32230,
"nscannedAllPlans" : 32432,
"nscannedObjects" : 32230,
"nscannedObjectsAllPlans" : 32432,
"millisShardTotal" : 555,
"millisShardAvg" : 555,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
253,
253
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"millis" : 557
}

This second query confuses me, changing the site.id value causes Mongo to only use the first part of my compound index and wildcards the Date field:
mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 08, 18 ) }, "ue": false, "bot": false }).explain()
Tue Sep 18 15:06:45 08 is not a legal ECMA-262 octal constant (shell):1
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
{
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"isMultiKey" : false,
"n" : 312,
"nscannedObjects" : 48710,
"nscanned" : 48710,
"nscannedObjectsAllPlans" : 49267,
"nscannedAllPlans" : 49267,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 433,
"nChunkSkips" : 0,
"millis" : 5152,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "ip-10-4-211-107:2200"
}
]
},
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"n" : 312,
"nChunkSkips" : 0,
"nYields" : 433,
"nscanned" : 48710,
"nscannedAllPlans" : 49267,
"nscannedObjects" : 48710,
"nscannedObjectsAllPlans" : 49267,
"millisShardTotal" : 5152,
"millisShardAvg" : 5152,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"millis" : 5208
}

Last query, searching by the same site.id and filtering by another field, non-boolean, it works:
mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 08, 18 ) }, "tv": 1 }).explain()
Tue Sep 18 15:18:15 08 is not a legal ECMA-262 octal constant (shell):1
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
{
"cursor" : "BtreeCursor site.id_1_frt_1",
"isMultiKey" : false,
"n" : 750,
"nscannedObjects" : 787,
"nscanned" : 787,
"nscannedObjectsAllPlans" : 989,
"nscannedAllPlans" : 989,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 15,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "ip-10-4-211-107:2200"
}
]
},
"cursor" : "BtreeCursor site.id_1_frt_1",
"n" : 750,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 787,
"nscannedAllPlans" : 989,
"nscannedObjects" : 787,
"nscannedObjectsAllPlans" : 989,
"millisShardTotal" : 15,
"millisShardAvg" : 15,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"millis" : 17
}

visit collection indexes: 
mongos> db.visit.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "dynotrax.visit",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"site.id" : 1,
"frt" : 1
},
"ns" : "dynotrax.visit",
"name" : "site.id_1_frt_1"
},
{
"v" : 1,
"key" : {
"site.id" : 1,
"lrt" : -1
},
"ns" : "dynotrax.visit",
"name" : "site.id_1_lrt_-1"
}
]

keith kirk

unread,
Sep 19, 2012, 2:16:02 PM9/19/12
to mongod...@googlegroups.com
Hey Jenna

I noticed the difference in the index after posting this - so the question would be, what causes Mongo to decide to use the 'last request time' index when that field is not provided, over the 'first request time', which is provided?  Is there a way to force an index in a $match of an aggregation query?

When supplying a hint, the query returns as expected, here are the two again, with and without the hint:

mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 8, 18 ) }, "ue": false, "bot": false }).hint("site.id_1_frt_1").explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
{
"cursor" : "BtreeCursor site.id_1_frt_1",
"isMultiKey" : false,
"n" : 691,
"nscannedObjects" : 1788,
"nscanned" : 1788,
"nscannedObjectsAllPlans" : 1788,
"nscannedAllPlans" : 1788,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 24,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "ip-10-4-211-107:2200"
}
]
},
"cursor" : "BtreeCursor site.id_1_frt_1",
"n" : 691,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 1788,
"nscannedAllPlans" : 1788,
"nscannedObjects" : 1788,
"nscannedObjectsAllPlans" : 1788,
"millisShardTotal" : 24,
"millisShardAvg" : 24,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"millis" : 25
}

mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 8, 18 ) }, "ue": false, "bot": false }).explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
{
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"isMultiKey" : false,
"n" : 691,
"nscannedObjects" : 49719,
"nscanned" : 49719,
"nscannedObjectsAllPlans" : 50116,
"nscannedAllPlans" : 50116,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 5,
"nChunkSkips" : 0,
"millis" : 486,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "ip-10-4-211-107:2200"
}
]
},
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"n" : 691,
"nChunkSkips" : 0,
"nYields" : 5,
"nscanned" : 49719,
"nscannedAllPlans" : 50116,
"nscannedObjects" : 49719,
"nscannedObjectsAllPlans" : 50116,
"millisShardTotal" : 486,
"millisShardAvg" : 486,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"millis" : 488
}

keith kirk

unread,
Sep 20, 2012, 3:35:37 PM9/20/12
to mongod...@googlegroups.com
Interesting that different shards are using different query plans - what am I missing here?

shard1:PRIMARY> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 8, 18 ) }, "ue": false, "bot": false }).explain()
{
"cursor" : "BtreeCursor site.id_1_frt_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 47,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-09-18T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "ip-10-6-50-253:2100"
}
 
shard2:PRIMARY> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 8, 18 ) }, "ue": false, "bot": false }).explain()
{
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"isMultiKey" : false,
"n" : 3907,
"nscannedObjects" : 53761,
"nscanned" : 53761,
"nscannedObjectsAllPlans" : 54022,
"nscannedAllPlans" : 54022,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 453,
"nChunkSkips" : 0,
"millis" : 4368,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "ip-10-4-211-107:2200"
}

Jenna deBoisblanc

unread,
Oct 11, 2012, 4:16:22 PM10/11/12
to mongod...@googlegroups.com
Hi Keith,

Could you run the same queries with .explain(true) instead of just .explain()? It's possible that the query optimizer is doing a bad job of picking a plan based on the initial data that it encounters.

keith kirk

unread,
Oct 12, 2012, 1:02:12 PM10/12/12
to mongod...@googlegroups.com
Here is the .explain(true) - any thoughts?

mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 9, 11 ) }, "ue": false, "bot": false }).explain(true)
{
"clusteredType" : "ParallelSort",
"shards" : {
"dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
{
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"isMultiKey" : false,
"n" : 556,
"nscannedObjects" : 80146,
"nscanned" : 80146,
"nscannedObjectsAllPlans" : 80487,
"nscannedAllPlans" : 80487,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2244,
"nChunkSkips" : 0,
"millis" : 47873,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor site.id_1_frt_1",
"n" : 25,
"nscannedObjects" : 171,
"nscanned" : 171,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"frt" : [
[
ISODate("2012-10-11T07:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
}
},
{
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"n" : 531,
"nscannedObjects" : 80146,
"nscanned" : 80146,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
},
{
"cursor" : "BasicCursor",
"n" : 0,
"nscannedObjects" : 170,
"nscanned" : 170,
"indexBounds" : {
}
}
],
"server" : "ip-10-4-211-107:2200"
}
]
},
"cursor" : "BtreeCursor site.id_1_lrt_-1",
"n" : 556,
"nChunkSkips" : 0,
"nYields" : 2244,
"nscanned" : 80146,
"nscannedAllPlans" : 80487,
"nscannedObjects" : 80146,
"nscannedObjectsAllPlans" : 80487,
"millisShardTotal" : 47873,
"millisShardAvg" : 47873,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"site.id" : [
[
128,
128
]
],
"lrt" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"millis" : 47875

Jenna deBoisblanc

unread,
Oct 31, 2012, 6:15:33 PM10/31/12
to mongodb-user
In this case, it appears that the query optimizer is failing to
determine the best index choice. The query optimizer works by
examining several plans to find an initial subset of the results. Then
the "best" query plan for that initial subset is used to retrieve the
remaining results of the query.

The provided explain output shows that the site.id_1_frt_1 query plan
had 25 matches and the BasicCursor query plan had 0 matches before the
"winning" query plan site.id_1_lrt_-1 was selected. That winning plan
would have had 101 matches at the time it was selected (a plan wins
once it finds 101 matches). Then after the plan was selected as the
"winner" the query continued running with this plan until eventually
531 total results were found.

This basically means that the query optimizer picked the
site.id_1_lrt_-1 query plan because it had more frequent matches
during the beginning of its index scan than the other query plans did.
Even though the site.id_1_frt_1 is a better index to use for the
entire query, the query optimizer is not smart enough to realize this
and goes with site.id_1_lrt_-1 instead. You will have to use hint() if
site.id_1_frt_1 significantly improves query performance.
> >> *First query ( expected results ):*
> >> *This second query confuses me, changing the site.id value causes Mongo
> >> to only use the first part of my compound index and wildcards the Date
> >> field:*
> >> *Last query, searching by the same site.id and filtering by another
> >> field, non-boolean, it works:*
> >> *visit collection indexes: *

keith kirk

unread,
Nov 13, 2012, 7:03:07 PM11/13/12
to mongod...@googlegroups.com
Thanks for the thorough reply.  Any suggestion in cases of Aggregation queries, where initial matches can use index, but there is no way to supply a hint?

Adam C

unread,
Feb 9, 2013, 4:16:43 PM2/9/13
to mongod...@googlegroups.com
Realize this has been a while, but in the interest of tying up an old thread.  At the moment there is no way to hint in the aggregation command, hence this open feature request (please vote/watch appropriately)


Adam
Reply all
Reply to author
Forward
0 new messages