Message from discussion
Possible bug with Compound index containing Dates
Received: by 10.236.191.68 with SMTP id f44mr27115433yhn.4.1351721745007;
Wed, 31 Oct 2012 15:15:45 -0700 (PDT)
X-BeenThere: mongodb-user@googlegroups.com
Received: by 10.236.120.244 with SMTP id p80ls3151035yhh.0.gmail; Wed, 31 Oct
2012 15:15:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.236.73.230 with SMTP id v66mr2878295yhd.9.1351721733701; Wed,
31 Oct 2012 15:15:33 -0700 (PDT)
Authentication-Results: ls.google.com; spf=pass (google.com: domain of
jenna.deboisbl...@10gen.com designates internal as permitted sender)
smtp.mail=jenna.deboisbl...@10gen.com; dkim=pass
header...@10gen.com
Received: by m4g2000yqf.googlegroups.com with HTTP; Wed, 31 Oct 2012 15:15:33
-0700 (PDT)
Date: Wed, 31 Oct 2012 15:15:33 -0700 (PDT)
In-Reply-To: <112cf5f3-f1c7-4fc2-a7f1-46492b7bd36f@googlegroups.com>
References: <bf090dcf-5ed9-4053-9aa8-0858781acf0f@googlegroups.com>
<d6c5f5d3-da26-46c3-9fc2-7549301670e3@googlegroups.com> <112cf5f3-f1c7-4fc2-a7f1-46492b7bd36f@googlegroups.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)
AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4,gzip(gfe)
Message-ID: <fce29b24-7f98-4f1e-804a-a384c1500120@m4g2000yqf.googlegroups.com>
Subject: Re: Possible bug with Compound index containing Dates
From: Jenna deBoisblanc <jenna.deboisbl...@10gen.com>
To: mongodb-user <mongodb-user@googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
In this case, it appears that the query optimizer is failing to
determine the best index choice. The query optimizer works by
examining several plans to find an initial subset of the results. Then
the "best" query plan for that initial subset is used to retrieve the
remaining results of the query.
The provided explain output shows that the site.id_1_frt_1 query plan
had 25 matches and the BasicCursor query plan had 0 matches before the
"winning" query plan site.id_1_lrt_-1 was selected. That winning plan
would have had 101 matches at the time it was selected (a plan wins
once it finds 101 matches). Then after the plan was selected as the
"winner" the query continued running with this plan until eventually
531 total results were found.
This basically means that the query optimizer picked the
site.id_1_lrt_-1 query plan because it had more frequent matches
during the beginning of its index scan than the other query plans did.
Even though the site.id_1_frt_1 is a better index to use for the
entire query, the query optimizer is not smart enough to realize this
and goes with site.id_1_lrt_-1 instead. You will have to use hint() if
site.id_1_frt_1 significantly improves query performance.
On Oct 12, 1:02=A0pm, keith kirk <kmf...@gmail.com> wrote:
> Here is the .explain(true) - any thoughts?
>
> mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012, 9, =
11
> ) }, "ue": false, "bot": false }).explain(true)
> {
> "clusteredType" : "ParallelSort",
> "shards" : {
> "dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
> {
> "cursor" : "BtreeCursor site.id_1_lrt_-1",
> "isMultiKey" : false,
> "n" : 556,
> "nscannedObjects" : 80146,
> "nscanned" : 80146,
> "nscannedObjectsAllPlans" : 80487,
> "nscannedAllPlans" : 80487,
> "scanAndOrder" : false,
> "indexOnly" : false,
> "nYields" : 2244,
> "nChunkSkips" : 0,
> "millis" : 47873,
> "indexBounds" : {
> "site.id" : [
> [
> 128,
> 128
> ]
> ],
> "lrt" : [
> [
> {
> "$maxElement" : 1},
>
> {
> "$minElement" : 1}
>
> ]
> ]},
>
> "allPlans" : [
> {
> "cursor" : "BtreeCursor site.id_1_frt_1",
> "n" : 25,
> "nscannedObjects" : 171,
> "nscanned" : 171,
> "indexBounds" : {
> "site.id" : [
> [
> 128,
> 128
> ]
> ],
> "frt" : [
> [
> ISODate("2012-10-11T07:00:00Z"),
> ISODate("292278995-01--2147483647T07:12:56.808Z")
> ]
> ]}
> },
>
> {
> "cursor" : "BtreeCursor site.id_1_lrt_-1",
> "n" : 531,
> "nscannedObjects" : 80146,
> "nscanned" : 80146,
> "indexBounds" : {
> "site.id" : [
> [
> 128,
> 128
> ]
> ],
> "lrt" : [
> [
> {
> "$maxElement" : 1},
>
> {
> "$minElement" : 1}
>
> ]
> ]}
> },
>
> {
> "cursor" : "BasicCursor",
> "n" : 0,
> "nscannedObjects" : 170,
> "nscanned" : 170,
> "indexBounds" : {
> =A0}}
>
> ],
> "server" : "ip-10-4-211-107:2200"}
> ]
> },
>
> "cursor" : "BtreeCursor site.id_1_lrt_-1",
> "n" : 556,
> "nChunkSkips" : 0,
> "nYields" : 2244,
> "nscanned" : 80146,
> "nscannedAllPlans" : 80487,
> "nscannedObjects" : 80146,
> "nscannedObjectsAllPlans" : 80487,
> "millisShardTotal" : 47873,
> "millisShardAvg" : 47873,
> "numQueries" : 1,
> "numShards" : 1,
> "indexBounds" : {
> "site.id" : [
> [
> 128,
> 128
> ]
> ],
> "lrt" : [
> [
> {
> "$maxElement" : 1},
>
> {
> "$minElement" : 1}
>
> ]
> ]
>
>
>
>
>
>
>
> },
> "millis" : 47875
> }
> On Thursday, October 11, 2012 1:16:22 PM UTC-7, Jenna deBoisblanc wrote:
>
> > Hi Keith,
>
> > Could you run the same queries with .explain(true) instead of just
> > .explain()? It's possible that the query optimizer is doing a bad job o=
f
> > picking a plan based on the initial data that it encounters.
>
> > On Tuesday, September 18, 2012 6:22:55 PM UTC-4, keith kirk wrote:
>
> >> Ran into some issues with an aggregation query returning very slowly a=
nd
> >> decided to run the $match through an explain - ended up with some very
> >> weird results.
> >> I'm running this query on a sharded collection, checking for a specifi=
c
> >> site and first request time (by index / shard key) and filter the resu=
lts
> >> by a couple boolean flags. =A0However, different site.id's causes quer=
y
> >> optimizer to decide to only use the first part of my compound index an=
d
> >> wildcards the date, causing it to scan far more documents ( and shards=
)
> >> than necessary.
>
> >> If I supply a hint and specify my index, it works as expected, or if I
> >> remove the boolean fields - it again works as expected. =A0It seems is=
olated
> >> to searching on those boolean flags, if I remove those and instead que=
ry
> >> another field with a string, it again works.
>
> >> Any ideas here? =A0I don't believe you can pass a .hint() to an aggreg=
ation
> >> query...
>
> >> *First query ( expected results ):*
> >> mongos> db.visit.find({ "site.id": 253, "frt": { $gte: new Date(2012,
> >> 08, 18 ) }, "test": false, "bot": false }).explain()
> >> Tue Sep 18 15:05:56 08 is not a legal ECMA-262 octal constant (shell):=
1
> >> {
> >> "clusteredType" : "ParallelSort",
> >> "shards" : {
> >> "dynotrax1/s1.dynotrax:2100,s2.dynotrax:2100,s3.dynotrax:2100" : [
> >> {
> >> "cursor" : "BtreeCursor site.id_1_frt_1",
> >> "isMultiKey" : false,
> >> "n" : 31429,
> >> "nscannedObjects" : 32230,
> >> "nscanned" : 32230,
> >> "nscannedObjectsAllPlans" : 32432,
> >> "nscannedAllPlans" : 32432,
> >> "scanAndOrder" : false,
> >> "indexOnly" : false,
> >> "nYields" : 81,
> >> "nChunkSkips" : 0,
> >> "millis" : 555,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 253,
> >> 253
> >> ]
> >> ],
> >> "frt" : [
> >> [
> >> ISODate("2012-09-18T07:00:00Z"),
> >> ISODate("292278995-01--2147483647T07:12:56.808Z")
> >> ]
> >> ]
> >> },
> >> "server" : "ip-10-6-50-253:2100"
> >> }
> >> ]
> >> },
> >> "cursor" : "BtreeCursor site.id_1_frt_1",
> >> "n" : 31429,
> >> "nChunkSkips" : 0,
> >> "nYields" : 81,
> >> "nscanned" : 32230,
> >> "nscannedAllPlans" : 32432,
> >> "nscannedObjects" : 32230,
> >> "nscannedObjectsAllPlans" : 32432,
> >> "millisShardTotal" : 555,
> >> "millisShardAvg" : 555,
> >> "numQueries" : 1,
> >> "numShards" : 1,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 253,
> >> 253
> >> ]
> >> ],
> >> "frt" : [
> >> [
> >> ISODate("2012-09-18T07:00:00Z"),
> >> ISODate("292278995-01--2147483647T07:12:56.808Z")
> >> ]
> >> ]
> >> },
> >> "millis" : 557
> >> }
>
> >> *This second query confuses me, changing the site.id value causes Mong=
o
> >> to only use the first part of my compound index and wildcards the Date
> >> field:*
> >> mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012,
> >> 08, 18 ) }, "ue": false, "bot": false }).explain()
> >> Tue Sep 18 15:06:45 08 is not a legal ECMA-262 octal constant (shell):=
1
> >> {
> >> "clusteredType" : "ParallelSort",
> >> "shards" : {
> >> "dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
> >> {
> >> "cursor" : "BtreeCursor site.id_1_lrt_-1",
> >> "isMultiKey" : false,
> >> "n" : 312,
> >> "nscannedObjects" : 48710,
> >> "nscanned" : 48710,
> >> "nscannedObjectsAllPlans" : 49267,
> >> "nscannedAllPlans" : 49267,
> >> "scanAndOrder" : false,
> >> "indexOnly" : false,
> >> "nYields" : 433,
> >> "nChunkSkips" : 0,
> >> "millis" : 5152,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 128,
> >> 128
> >> ]
> >> ],
> >> "lrt" : [
> >> [
> >> {
> >> "$maxElement" : 1
> >> },
> >> {
> >> "$minElement" : 1
> >> }
> >> ]
> >> ]
> >> },
> >> "server" : "ip-10-4-211-107:2200"
> >> }
> >> ]
> >> },
> >> "cursor" : "BtreeCursor site.id_1_lrt_-1",
> >> "n" : 312,
> >> "nChunkSkips" : 0,
> >> "nYields" : 433,
> >> "nscanned" : 48710,
> >> "nscannedAllPlans" : 49267,
> >> "nscannedObjects" : 48710,
> >> "nscannedObjectsAllPlans" : 49267,
> >> "millisShardTotal" : 5152,
> >> "millisShardAvg" : 5152,
> >> "numQueries" : 1,
> >> "numShards" : 1,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 128,
> >> 128
> >> ]
> >> ],
> >> "lrt" : [
> >> [
> >> {
> >> "$maxElement" : 1
> >> },
> >> {
> >> "$minElement" : 1
> >> }
> >> ]
> >> ]
> >> },
> >> "millis" : 5208
> >> }
>
> >> *Last query, searching by the same site.id and filtering by another
> >> field, non-boolean, it works:*
> >> mongos> db.visit.find({ "site.id": 128, "frt": { $gte: new Date(2012,
> >> 08, 18 ) }, "tv": 1 }).explain()
> >> Tue Sep 18 15:18:15 08 is not a legal ECMA-262 octal constant (shell):=
1
> >> {
> >> "clusteredType" : "ParallelSort",
> >> "shards" : {
> >> "dynotrax2/s1.dynotrax:2200,s2.dynotrax:2200,s3.dynotrax:2200" : [
> >> {
> >> "cursor" : "BtreeCursor site.id_1_frt_1",
> >> "isMultiKey" : false,
> >> "n" : 750,
> >> "nscannedObjects" : 787,
> >> "nscanned" : 787,
> >> "nscannedObjectsAllPlans" : 989,
> >> "nscannedAllPlans" : 989,
> >> "scanAndOrder" : false,
> >> "indexOnly" : false,
> >> "nYields" : 0,
> >> "nChunkSkips" : 0,
> >> "millis" : 15,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 128,
> >> 128
> >> ]
> >> ],
> >> "frt" : [
> >> [
> >> ISODate("2012-09-18T07:00:00Z"),
> >> ISODate("292278995-01--2147483647T07:12:56.808Z")
> >> ]
> >> ]
> >> },
> >> "server" : "ip-10-4-211-107:2200"
> >> }
> >> ]
> >> },
> >> "cursor" : "BtreeCursor site.id_1_frt_1",
> >> "n" : 750,
> >> "nChunkSkips" : 0,
> >> "nYields" : 0,
> >> "nscanned" : 787,
> >> "nscannedAllPlans" : 989,
> >> "nscannedObjects" : 787,
> >> "nscannedObjectsAllPlans" : 989,
> >> "millisShardTotal" : 15,
> >> "millisShardAvg" : 15,
> >> "numQueries" : 1,
> >> "numShards" : 1,
> >> "indexBounds" : {
> >> "site.id" : [
> >> [
> >> 128,
> >> 128
> >> ]
> >> ],
> >> "frt" : [
> >> [
> >> ISODate("2012-09-18T07:00:00Z"),
> >> ISODate("292278995-01--2147483647T07:12:56.808Z")
> >> ]
> >> ]
> >> },
> >> "millis" : 17
> >> }
>
> >> *visit collection indexes: *
> >> mongos> db.visit.getIndexes()
> >> [
> >> {
> >> "v" : 1,
> >> "key" : {
> >> "_id" : 1
> >> },
> >> "ns" : "dynotrax.visit",
> >> "name" : "_id_"
> >> },
> >> {
> >> "v" : 1,
> >> "key" : {
> >> "site.id" : 1,
> >> "frt" : 1
> >> },
> >> "ns" : "dynotrax.visit",
> >> "name" : "site.id_1_frt_1"
> >> },
> >> {
> >> "v" : 1,
> >> "key" : {
> >> "site.id" : 1,
> >> "lrt" : -1
> >> },
> >> "ns" : "dynotrax.visit",
> >> "name" : "site.id_1_lrt_-1"
> >> }
> >> ]