Index intersection


damir

Apr 13, 2014, 1:41:08 PM
to mongod...@googlegroups.com
Hi,
has anyone tried Index Intersection in Mongo 2.6.0?
It doesn't seem to be working for me:

The Test collection:

> for (a = 0; a < 100; a++) {
      for (b = 0; b < 100; b++) {
          for (c = 0; c < 100; c++) {
              db.abc.insert({ "a": a, "b": b, "c": c })
          }
      }
  }

> db.abc.count()
1000000


Test indexes

> db.abc.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.abc"
    },
    {
        "v" : 1,
        "key" : {
            "a" : 1
        },
        "name" : "a_1",
        "ns" : "test.abc"
    },
    {
        "v" : 1,
        "key" : {
            "b" : 1
        },
        "name" : "b_1",
        "ns" : "test.abc"
    },
    {
        "v" : 1,
        "key" : {
            "c" : 1
        },
        "name" : "c_1",
        "ns" : "test.abc"
    }
]

The Test query explain:

> db.abc.find({a:46, b:23, c:34}).explain()
{
    "cursor" : "BtreeCursor a_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 10000,
    "nscanned" : 10000,
    "nscannedObjectsAllPlans" : 30000,
    "nscannedAllPlans" : 60009,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 468,
    "nChunkSkips" : 0,
    "millis" : 53,
    "indexBounds" : {
        "a" : [
            [
                46,
                46
            ]
        ]
    },
    "server" : "boountu:27017",
    "filterSet" : false
}

So, the explain output shows that only one index was used, the one on key "a".

Any ideas why the Index Intersection was not used for this query?

Regards, Damir

David Hows

Apr 13, 2014, 9:30:40 PM
to mongod...@googlegroups.com
Hi Damir,

I've tested this locally, and it seems the query planner will evaluate plans that involve multiple indexes, but not select them in this case.

I've loaded your example into 2.6 and run the query with explain(true); the "allPlans" section includes the following:
{
    "cursor" : "Complex Plan",
    "n" : 1,
    "nscannedObjects" : 0,
    "nscanned" : 10003,
    "nChunkSkips" : 0
},
{
    "cursor" : "Complex Plan",
    "n" : 1,
    "nscannedObjects" : 0,
    "nscanned" : 10003,
    "nChunkSkips" : 0
},
{
    "cursor" : "Complex Plan",
    "n" : 1,
    "nscannedObjects" : 0,
    "nscanned" : 10003,
    "nChunkSkips" : 0
}

This shows that the plans with intersection were tested, but not found to be the best for executing this query. Given the sizes of the documents and the indexes, there is likely more overhead in performing the index intersection than simply walking all the documents that match the given "a" value.
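
If you want to see what the intersection plan would have done, you can force the planner to prefer intersection plans while testing. Note this uses an internal server parameter, so treat it as an experiment-only sketch, not something for production:

> db.adminCommand({ setParameter: 1, internalQueryForceIntersectionPlans: true })
> db.abc.find({ a: 46, b: 23, c: 34 }).explain()

The winning cursor should then report "Complex Plan", and the "stats" tree in explain(true) should show an AND_SORTED (or AND_HASH) stage instead of the plain IXSCAN. Remember to set the parameter back to false afterwards.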



Felipe Albrecht

Apr 14, 2014, 6:19:30 AM
to mongod...@googlegroups.com
How can I check whether MongoDB is doing index intersection? What new information appears in the .explain() output? And is it possible to give a .hint() for two indexes?



damir

Apr 14, 2014, 10:04:32 AM
to mongod...@googlegroups.com
Hi David,

You are right; I have also run explain(true) and got three "Complex Plan" entries, which means index intersection was tested.
But I am still confused: between scanning 10,000 documents and doing three index scans, I'd always go for the latter...

Regards, Damir

Matthieu Rigal

Apr 14, 2014, 12:40:43 PM
to mongod...@googlegroups.com
Hi David,

I am experiencing exactly the same thing, and it is a big disappointment for us. We had to build a new feature and we chose to rely on this announced capability, which is not working.

First, it should be stated more explicitly that only the intersection of two indexes is supported (great !!! :-( )

Then, it should simply work! I want to give users the possibility to filter a list of invoices by user, table and/or time range, for example, and I need 2000 to 5000 ms to return a limit of 10 results!

I can't understand why the following query:
db.invoices.find({ is_reopened: false, "table.uuid": { $in: [ "UUID1" ] }, venue_id: 5000, invoice_closed_at: { $lte: new Date(1397426399999), $gte: new Date(1363906800000) } })
is answered with "BtreeCursor table.uuid_1",
while the following query:
db.invoices.find({ is_reopened: false, "table.uuid": { $in: [ "UUID1" ] }, venue_id: 5000, invoice_closed_at: { $lte: new Date(1397426399999), $gte: new Date(1363906800000) }, "closed_by_user.uuid": { $in: [ "UUID2" ] } })
is answered with "BtreeCursor venue_id_1".

(Yes, I use $in because I want to allow users to specify multiple values for these searches.)

With venue_id narrowing 4 million documents down to 40,000 docs, table.uuid having 30 distinct values for the given venue, and the user field having only 2 for the current user... And with indexes on all queried fields, as well as a compound index on venue_id and invoice_closed_at!

Filtering down to 100 documents, limiting the returned query to 10, out of 4 million entries, and only working on indexes should never take several seconds...
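
For now, the workaround we are looking at is compound indexes tuned to the most common filter combinations, roughly like this (a sketch only, with equality fields first and the range field last):

db.invoices.ensureIndex({ venue_id: 1, "table.uuid": 1, invoice_closed_at: 1 })
db.invoices.ensureIndex({ venue_id: 1, "closed_by_user.uuid": 1, invoice_closed_at: 1 })

But that cannot cover every combination of optional filters, which is exactly what index intersection was supposed to solve.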

Matthieu

Asya Kamsky

Apr 14, 2014, 1:40:22 PM
to mongodb-user
One thing you might check is what happens if you run the query with explain(true) multiple times.

It is possible that one or more of the indexes you expected it to use for intersection was "cold" (i.e. not in RAM) in which case it would perform worse than using a hot index and scanning the documents for a match of other conditions.
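
If you suspect a cold index (on MMAPv1), you can warm the collection's indexes before re-testing, for example with the touch command (a sketch; substitute your own collection name):

> db.runCommand({ touch: "invoices", data: false, index: true })

and then run the query with explain(true) again.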

I've also noticed that for index intersection to be selected it has to be a more performant query plan than one that uses a single index followed by document examination - and it can be complicated to create this scenario.

Having said that, the more information you can provide in your feedback about the results of testing the better - we may be able to see if there is a bug, or if the expected behavior can be improved on.

Could you provide db.collection.stats() of the collection you are testing on along with output from the queries you tried with explain(true)?

Asya




Matthieu Rigal

Apr 14, 2014, 1:55:22 PM
to mongod...@googlegroups.com
Hi Asya,

Part of the data is slightly sensitive, but it should be feasible...

First, I've also noticed the same: performance got better over time. But I couldn't find any query using index intersection, whereas we were hoping it would be used all the time.

Here is the output of the stats:

{
    "ns" : "collection.invoices",
    "count" : 3980482,
    "size" : 20574449136,
    "avgObjSize" : 5168,
    "storageSize" : 20921180064,
    "numExtents" : 28,
    "nindexes" : 19,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 4288459168,
    "indexSizes" : {
        "_id_" : 141681904,
        "uuid_1" : 308464128,
        "shift_uuid_1_modified_at_1" : 227897824,
        "waiters.name_1" : 209232016,
        "waiters.uuid_1" : 433679568,
        "bill_number_1" : 167616176,
        "closed_by_user.name_1" : 204727040,
        "closed_by_user.uuid_1" : 425037536,
        "invoice_closed_at_1" : 147257936,
        "table.name_1" : 176544368,
        "table.uuid_1" : 420328160,
        "payments.type.name_1" : 169790992,
        "payments.type.uuid_1" : 431831792,
        "total.amount_1" : 192806432,
        "object_uuid_1" : 197916432,
        "venue_id_1" : 100156000,
        "venue_id_1_invoice_closed_at_1" : 136751776,
        "is_reopened_1" : 68097904,
        "table.uuid_hashed" : 128641184
    },
    "ok" : 1
}

And here is the (long) output of the explain:

{
    "cursor" : "BtreeCursor venue_id_1",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 31857,
    "nscanned" : 31857,
    "nscannedObjectsAllPlans" : 223004,
    "nscannedAllPlans" : 350449,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2737,
    "nChunkSkips" : 0,
    "millis" : 1132,
    "indexBounds" : {
        "venue_id" : [
            [
                5000,
                5000
            ]
        ]
    },
    "allPlans" : [
        {
            "cursor" : "BtreeCursor venue_id_1",
            "isMultiKey" : false,
            "n" : 10,
            "nscannedObjects" : 31857,
            "nscanned" : 31857,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "venue_id" : [
                    [
                        5000,
                        5000
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor venue_id_1_invoice_closed_at_1",
            "isMultiKey" : false,
            "n" : 10,
            "nscannedObjects" : 31857,
            "nscanned" : 31857,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "venue_id" : [
                    [
                        5000,
                        5000
                    ]
                ],
                "invoice_closed_at" : [
                    [
                        ISODate("2013-03-21T23:00:00Z"),
                        ISODate("2014-04-13T21:59:59.999Z")
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor closed_by_user.uuid_1",
            "isMultiKey" : false,
            "n" : 10,
            "nscannedObjects" : 31858,
            "nscanned" : 31859,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "closed_by_user.uuid" : [
                    [
                        "55DB9DCC-7558-4941-9D3E-396B18EB489B",
                        "55DB9DCC-7558-4941-9D3E-396B18EB489B"
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor table.uuid_1",
            "isMultiKey" : false,
            "n" : 10,
            "nscannedObjects" : 31858,
            "nscanned" : 31859,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "table.uuid" : [
                    [
                        "FC92F7B6-539A-470C-8862-C15C0604A300",
                        "FC92F7B6-539A-470C-8862-C15C0604A300"
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor table.uuid_hashed",
            "isMultiKey" : false,
            "n" : 10,
            "nscannedObjects" : 31858,
            "nscanned" : 31859,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "table.uuid" : [
                    [
                        NumberLong("7110125637611175919"),
                        NumberLong("7110125637611175919")
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor is_reopened_1",
            "isMultiKey" : false,
            "n" : 7,
            "nscannedObjects" : 31858,
            "nscanned" : 31859,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "is_reopened" : [
                    [
                        false,
                        false
                    ]
                ]
            }
        },
        {
            "cursor" : "Complex Plan",
            "n" : 10,
            "nscannedObjects" : 0,
            "nscanned" : 31860,
            "nChunkSkips" : 0
        },
        {
            "cursor" : "Complex Plan",
            "n" : 10,
            "nscannedObjects" : 0,
            "nscanned" : 31860,
            "nChunkSkips" : 0
        },
        {
            "cursor" : "BtreeCursor invoice_closed_at_1",
            "isMultiKey" : false,
            "n" : 5,
            "nscannedObjects" : 31858,
            "nscanned" : 31859,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "invoice_closed_at" : [
                    [
                        ISODate("2013-03-21T23:00:00Z"),
                        ISODate("2014-04-13T21:59:59.999Z")
                    ]
                ]
            }
        },
        {
            "cursor" : "Complex Plan",
            "n" : 0,
            "nscannedObjects" : 0,
            "nscanned" : 31860,
            "nChunkSkips" : 0
        },
        {
            "cursor" : "Complex Plan",
            "n" : 0,
            "nscannedObjects" : 0,
            "nscanned" : 31860,
            "nChunkSkips" : 0
        }
    ],
    "server" : "myserver",
    "filterSet" : false,
    "stats" : {
        "type" : "KEEP_MUTATIONS",
        "works" : 31859,
        "yields" : 2737,
        "unyields" : 2737,
        "invalidates" : 0,
        "advanced" : 10,
        "needTime" : 31847,
        "needFetch" : 0,
        "isEOF" : 1,
        "children" : [
            {
                "type" : "FETCH",
                "works" : 31858,
                "yields" : 2737,
                "unyields" : 2737,
                "invalidates" : 0,
                "advanced" : 10,
                "needTime" : 31847,
                "needFetch" : 0,
                "isEOF" : 1,
                "alreadyHasObj" : 0,
                "forcedFetches" : 0,
                "matchTested" : 10,
                "children" : [
                    {
                        "type" : "IXSCAN",
                        "works" : 31857,
                        "yields" : 2737,
                        "unyields" : 2737,
                        "invalidates" : 0,
                        "advanced" : 31857,
                        "needTime" : 0,
                        "needFetch" : 0,
                        "isEOF" : 1,
                        "keyPattern" : "{ venue_id: 1.0 }",
                        "boundsVerbose" : "field #0['venue_id']: [5000.0, 5000.0]",
                        "isMultiKey" : 0,
                        "yieldMovedCursor" : 0,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 0,
                        "keysExamined" : 31857,
                        "children" : [ ]
                    }
                ]
            }
        ]
    }
}

While you are looking at this, I was wondering: why is the "normal" index on a uuid always preferred to its hashed version? I'm afraid to drop the original ones, even though they are 4 times bigger, since the hashed ones could be less performant...

Best and thanks,
Matthieu

Asya Kamsky

Apr 14, 2014, 2:03:30 PM
to mongodb-user
Can you provide a sample document? You can X-out the values of the fields; I just want to see the structure as far as arrays vs. embedded documents, etc.

Also, was the exact query passed like the one in your previous email? I noticed you mentioned a limit, but I didn't see a limit on the query... I assume that if there is a limit, there is also a sort?

Asya



Asya Kamsky

Apr 14, 2014, 2:07:12 PM
to mongodb-user
As for your question about hashed:

> While you are looking at this, I was wondering: why is the "normal" index on a uuid always preferred to its hashed version?
> I'm afraid to drop the original ones, even though they are 4 times bigger, since the hashed ones could be less performant

What was the purpose of the hashed index?   Are you sharding on it?  Or planning on using it to shard on in the future?

A hashed index can and will be used for equality queries but it can't be used for range queries.
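
For example (a sketch using your field names):

db.invoices.find({ "table.uuid": "UUID1" })                        // equality: can use the hashed index
db.invoices.find({ "table.uuid": { $in: [ "UUID1", "UUID2" ] } })  // a list of equalities: should also work
db.invoices.find({ "table.uuid": { $gt: "UUID1" } })               // range: cannot use the hashed index

So if you only ever run equality (or $in) matches on that field, the hashed index can serve them.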

Asya

Matthieu Rigal

Apr 15, 2014, 5:24:14 AM
to mongod...@googlegroups.com
First, an answer to this easier point:

We are not sharding on it, nor planning to do so in the future. We just need an index, since we query for equality (or membership in a list of values), but never ranges. We would like to use the hashed index since it is much smaller!

Matthieu

Matthieu Rigal

Apr 15, 2014, 6:46:08 AM
to mongod...@googlegroups.com
Hi Asya,

Thanks for your interest. Here is a sample document below.

As for the query, the intent is to first filter (applying up to 4-5 filters), get a count of the matching documents (per venue), then sort by a single field (sometimes one of the filtered fields, sometimes not) and limit the query to 10, 25 or 50 results. This is why the query (issued through PyMongo) does not use the limit from the beginning. I am reproducing the queries that arrive at the DB by extracting output from the log.

What we are actually building is a kind of AJAX-driven view on the document collection. We started by loading 30-50k documents for a couple of venues to test the system, and it worked brilliantly; we had all results within dozens of milliseconds. Then we raised the number of venues to a couple of thousand, without changing the number of documents per venue, and the system became horribly slow (2000-5000 milliseconds). We are very surprised by that, since a good index on venue_id should make the growth of the total number of documents almost imperceptible.

We are actually hitting two points here:
- We have always read that Mongo has scalability problems, but we never really felt them on our previous collections (and their usage). Here it is obvious that scanning an indexed collection scales badly.
- The timing gets even worse when adding filters, because the query planner is not using index intersection and ends up fetching 40k docs only to return a count of a couple hundred (not to mention the limitation of index intersection to two indexes, nor the fact that an index is only used for sorting if its field also appears in the filter)

Matthieu

{
  "_id": { "$oid" : "5343c1f1ec565773d29baa27" },
  "venue_uuid": { "$binary" : "bla2", "$type" : "3" },
  "_venue_uuid": "UUID-V",
  "venue_id": 247,
  "object_uuid": { "$binary" : "bla1", "$type" : "3" },
  "_object_uuid": "UUID-O",
  "waiters": [
    {
      "uuid": "UUID-W1",
      "name": "K"
    }
  ],
  "closed_by_device": {
    "identifier": "UUID-D",
    "name": "K H"
  },
  "closed_by_user": {
    "uuid": "UUID-U",
    "name": "K"
  },
  "tab_opened_at": { "$date": 1396949487000 },
  "invoice_created_at": { "$date": 1396949487000 },
  "invoice_closed_at": { "$date": 1396949487000 },
  "timezone": "Europe/Berlin",
  "created_at": { "$date": 1396956689221 },
  "shift_uuid": { "$binary" : "bla2", "$type" : "3" },
  "session_uuid": { "$binary" : "bla3", "$type" : "3" },
  "uuid": "UUID",
  "is_reopened": false,
  "invoice_number": "3625",
  "bill_number": 3625,
  "table": {
    "uuid": "UUID_T",
    "name": "T"
  },
  "total": {
    "amount": 188,
    "currency": "EUR",
    "exp": -2
  },
  "discounts": [DISCOUNT_LIST],
  "taxes": [TAX_LIST],
  "payments": [PAYMENT_LIST],
  "positions": [POSITION_LIST],
  "printed_text": "TEXT"
}

Matthieu Rigal

Apr 15, 2014, 7:10:01 AM
to mongod...@googlegroups.com
One last input maybe, here is an example of the mongo log:

query collection.invoices query: { $query: { is_reopened: false, venue_id: 5000, invoice_closed_at: { $lte: new Date(1397512799999), $gte: new Date(1363906800000) } }, $orderby: { bill_number: 1 } } planSummary: IXSCAN { bill_number: 1 } cursorid:121498310751 ntoreturn:10 ntoskip:0 nscanned:119904 nscannedObjects:119903 keyUpdates:0 numYields:119 locks(micros) r:3790280 nreturned:10 reslen:3646 2133ms

When there are 30k objects belonging to this venue_id, and I want 10 objects back, I don't see how it is possible to have to scan 120k objects... And how it can take 2133 ms to return...
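
I suppose a compound index covering both the filter and the sort would avoid this, something like (a sketch):

db.invoices.ensureIndex({ venue_id: 1, bill_number: 1 })

so the planner could walk only this venue's entries, already in bill_number order, instead of scanning the whole bill_number_1 index.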

Matthieu

et...@orderbird.com

Apr 15, 2014, 10:04:36 AM
to mongod...@googlegroups.com
Hi, I am working on the same project as Matthieu. To work around the problem with index intersection, I am trying to make use of compound indexes on our staging instance. Since all queries have a venue and a time range, I made a compound index on them. Ideally the other criteria would use index intersection, but we could at least make the page usable by limiting the time range. Here are three different queries with an interesting result. Can you find the problem with the query planner in the third query?

1) Searching and sorting on the compound index uses it well: it scans 7408 objects to find the 7314 matches (the rest filtered out by closed_by_user.uuid) in ~35 ms:

> db.invoices.find({venue_id: 5000, invoice_closed_at: {$gt: ISODate("2013-10-27T20:05:45.410Z"), $lt: ISODate("2014-02-02T10:10:10.123Z")}, "closed_by_user.uuid": "FCD27EE6-3F8A-4EE0-8261-61D7FB086C3C"}).sort({invoice_closed_at: -1}).explain()
{
    "cursor" : "BtreeCursor venue_id_1_invoice_closed_at_1 reverse",
    "isMultiKey" : false,
    "n" : 7314,
    "nscannedObjects" : 7408,
    "nscanned" : 7408,
    "nscannedObjectsAllPlans" : 7709,
    "nscannedAllPlans" : 8020,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 63,
    "nChunkSkips" : 0,
    "millis" : 35,

    "indexBounds" : {
        "venue_id" : [
            [
                5000,
                5000
            ]
        ],
        "invoice_closed_at" : [
            [
                ISODate("2014-02-02T10:10:10.123Z"),
                ISODate("2013-10-27T20:05:45.410Z")
            ]
        ]
    },
    "server" : "XXXXX",
    "filterSet" : false
}

2) I can remove the sort and the planner is still smart enough to use the index well.

> db.invoices.find({venue_id: 5000, invoice_closed_at: {$gt: ISODate("2013-10-27T20:05:45.410Z"), $lt: ISODate("2014-02-02T10:10:10.123Z")}, "closed_by_user.uuid": "FCD27EE6-3F8A-4EE0-8261-61D7FB086C3C"}).explain()

{
    "cursor" : "BtreeCursor venue_id_1_invoice_closed_at_1",
    "isMultiKey" : false,
    "n" : 7314,
    "nscannedObjects" : 7408,
    "nscanned" : 7408,
    "nscannedObjectsAllPlans" : 7993,
    "nscannedAllPlans" : 8587,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 67,
    "nChunkSkips" : 0,
    "millis" : 60,

    "indexBounds" : {
        "venue_id" : [
            [
                5000,
                5000
            ]
        ],
        "invoice_closed_at" : [
            [
                ISODate("2013-10-27T20:05:45.410Z"),
                ISODate("2014-02-02T10:10:10.123Z")
            ]
        ]
    },
    "server" : "XXXXXX",
    "filterSet" : false
}

3) But as soon as I add a sort on another field... it scans the entire collection!!! (almost 4 million entries). How is this a better query plan than loading the 7314 matches into memory and sorting them? This machine has no load other than our manual testing.

> db.invoices.find({venue_id: 5000, invoice_closed_at: {$gt: ISODate("2013-10-27T20:05:45.410Z"), $lt: ISODate("2014-02-02T10:10:10.123Z")}, "closed_by_user.uuid": "FCD27EE6-3F8A-4EE0-8261-61D7FB086C3C"}).sort({"table.name": 1}).explain()
{
    "cursor" : "BtreeCursor table.name_1",
    "isMultiKey" : false,
    "n" : 7314,
    "nscannedObjects" : 3963270,
    "nscanned" : 3963270,
    "nscannedObjectsAllPlans" : 4018234,
    "nscannedAllPlans" : 4067333,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 31779,
    "nChunkSkips" : 0,
    "millis" : 7715,
    "indexBounds" : {
        "table.name" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "XXXXX",
    "filterSet" : false
}


To give an idea of the collection size, here are some stats: almost 4 million objects, around 27 GB of storage (sizes below are scaled to MB):

> db.invoices.stats(1024*1024)
{
    "ns" : "XXXXX.invoices",
    "count" : 3963270,
    "size" : 26663,
    "avgObjSize" : 7054,
    "storageSize" : 27297,
    .....
    "indexSizes" : {
        "_id_" : 122,
        "venue_id_1__cls_1" : 99,
        "uuid_1" : 206,
        "shift_uuid_1_modified_at_1" : 134,
        "waiters.name_1" : 102,
        "waiters.uuid_1" : 210,
        "bill_number_1" : 95,
        "closed_by_user.name_1" : 100,
        "closed_by_user.uuid_1" : 206,
        "invoice_closed_at_1" : 95,
        "table.name_1" : 86,
        "table.uuid_1" : 206,
        "payments.type.name_1" : 80,
        "payments.type.uuid_1" : 206,
        "total.amount_1" : 95,
        "object_uuid_1" : 129,
        "venue_id_1_invoice_closed_at_1" : 129
    },
    "ok" : 1

Asya Kamsky

Apr 16, 2014, 10:04:35 AM
to mongodb-user
TL;DR: if the index that would work well isn't being picked, you can hint() the index.

Ethan,

What's happening here is that you know there will only be 7K documents to sort in memory, and I know that, but the query optimizer does not know that, so it errs on the side of using an index that returns the documents already in order (table.name) and filtering through them on the fly. This processes documents in the order in which they should be returned. If the match instead produced 70,000 documents, or 700,000, or 7 million, then an in-memory sort could be fatal (there is a hard limit of 32MB on in-memory sorts, to avoid completely derailing performance for all other operations).
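
In your case you could force the compound index and let the ~7K matches be sorted in memory, for example (a sketch reusing your query and index name):

db.invoices.find({
    venue_id: 5000,
    invoice_closed_at: { $gt: ISODate("2013-10-27T20:05:45.410Z"),
                         $lt: ISODate("2014-02-02T10:10:10.123Z") },
    "closed_by_user.uuid": "FCD27EE6-3F8A-4EE0-8261-61D7FB086C3C"
}).sort({ "table.name": 1 }).hint("venue_id_1_invoice_closed_at_1")

Since only about 7K documents match, the in-memory sort should stay well under the 32MB limit.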

There is a really good article that a colleague of mine wrote that you
might find helpful:
http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/

Let me know if this makes sense.

Asya

Eric Xu

Jul 11, 2014, 2:15:15 PM
to mongod...@googlegroups.com
I'm encountering the same problem. Index intersection never worked for me, no matter what I tried. It seems the complex plans are always evaluated, but never picked.
A single index with 2 million scans and minutes of execution time is favored over index intersection.

> db.xxx.find({'byUser._id': {$gt: ObjectId("53bf75898de2fb8790ef7f0b")}, activityId: {$gt: ObjectId("53bf75898de2fb8790ef83b6")}, 'activityType': 'SHEET' }).explain(true)

{
    "cursor":"BtreeCursor byUser._id_1",
    "isMultiKey":false,
    "n":2271416,
    "nscannedObjects":2272000,
    "nscanned":2272000,
    "nscannedObjectsAllPlans":2272404,
    "nscannedAllPlans":2273021,
    "scanAndOrder":false,
    "indexOnly":false,
    "nYields":54736,
    "nChunkSkips":0,
    "millis":438548,
    "indexBounds":{
        "byUser._id":[
            [
                ObjectId("53bf75898de2fb8790ef7f0b"),
                ObjectId("ffffffffffffffffffffffff")
            ]
        ]
    },
    "allPlans":[
        {
            "cursor":"BtreeCursor byUser._id_1",
            "isMultiKey":false,
            "n":2271416,
            "nscannedObjects":2272000,
            "nscanned":2272000,
            "scanAndOrder":false,
            "indexOnly":false,
            "nChunkSkips":0,
            "indexBounds":{
                "byUser._id":[
                    [
                        ObjectId("53bf75898de2fb8790ef7f0b"),
                        ObjectId("ffffffffffffffffffffffff")
                    ]
                ]
            }
        },
        {
            "cursor":"BtreeCursor activityId_1",
            "isMultiKey":false,
            "n":0,
            "nscannedObjects":202,
            "nscanned":203,
            "scanAndOrder":false,
            "indexOnly":false,
            "nChunkSkips":0,
            "indexBounds":{
                "activityId":[
                    [
                        ObjectId("53bf75898de2fb8790ef83b6"),
                        ObjectId("ffffffffffffffffffffffff")
                    ]
                ]
            }
        },
        {
            "cursor":"BtreeCursor activityType_1",
            "isMultiKey":false,
            "n":0,
            "nscannedObjects":202,
            "nscanned":203,
            "scanAndOrder":false,
            "indexOnly":false,
            "nChunkSkips":0,
            "indexBounds":{
                "activityType":[
                    [
                        "SHEET",
                        "SHEET"
                    ]
                ]
            }
        },
        {
            "cursor":"Complex Plan",
            "n":0,
            "nscannedObjects":0,
            "nscanned":205,
            "nChunkSkips":0
        },
        {
            "cursor":"Complex Plan",
            "n":0,
            "nscannedObjects":0,
            "nscanned":205,
            "nChunkSkips":0
        },
        {
            "cursor":"Complex Plan",
            "n":0,
            "nscannedObjects":0,
            "nscanned":205,
            "nChunkSkips":0
        }
    ],
    "server":"admins-MacBook-Pro.local:27017",
    "filterSet":false,
    "stats":{
        "type":"KEEP_MUTATIONS",
        "works":2309234,
        "yields":54736,
        "unyields":54736,
        "invalidates":0,
        "advanced":2271416,
        "needTime":584,
        "needFetch":37233,
        "isEOF":1,
        "children":[
            {
                "type":"FETCH",
                "works":2309234,
                "yields":54736,
                "unyields":54736,
                "invalidates":0,
                "advanced":2271416,
                "needTime":584,
                "needFetch":37233,
                "isEOF":1,
                "alreadyHasObj":0,
                "forcedFetches":0,
                "matchTested":2271416,
                "children":[
                    {
                        "type":"IXSCAN",
                        "works":2272000,
                        "yields":54736,
                        "unyields":54736,
                        "invalidates":0,
                        "advanced":2272000,
                        "needTime":0,
                        "needFetch":0,
                        "isEOF":1,
                        "keyPattern":"{ byUser._id: 1 }",
                        "boundsVerbose":"field #0['byUser._id']: (ObjectId('53bf75898de2fb8790ef7f0b'), ObjectId('ffffffffffffffffffffffff')]",
                        "isMultiKey":0,
                        "yieldMovedCursor":0,
                        "dupsTested":0,
                        "dupsDropped":0,
                        "seenInvalidated":0,
                        "matchTested":0,
                        "keysExamined":2272000,
                        "children":[ ]
                    }
                ]
            }
        ]
    }
}



Asya Kamsky

Jul 12, 2014, 2:36:17 AM
to mongodb-user
I think you are running into a different bug, which is being fixed for 2.6.4, where two plans tie in the plan race-off even when one of them is much more selective than the other.

Asya



Eric Xu

Jul 12, 2014, 3:08:02 AM
to mongod...@googlegroups.com
cool!  thank you!

