Hi John
It’s difficult to say for sure what indexes is optimal without knowing for sure the cardinality of your data. However, I tried to make an educated guess regarding your data, and populated a test collection using 50,000 randomly generated documents to check out your queries.
I tried with the arguably simpler second query you posted:
{start: {$gte: 12345, $lte: 5678}, amount_sent: 88}
For this query, you need an index having the fields start
and amount_sent
. You suggested using {start:-1, amount_sent:1}
. However, this is not optimal, which we can see in the explain()
output. Note the use of hint()
to force MongoDB to use a certain index:
> db.test.explain('executionStats').find({amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({start:-1, amount_sent:1})
...
"nReturned": 253,
"executionTimeMillis": 38,
"totalKeysExamined": 24947,
"totalDocsExamined": 253,
...
From the explain('executionStats')
output, MongoDB needs to scan ~25,000 keys to return 253 documents matching the query.
A better index is {amount_sent:1, start:1}
. This index is structured by first having an exact value of amount_sent
, where for each value of amount_sent
, sorted values of start
is stored. Here’s an illustration of the index structure:
amount_sent | 1 | 2 | 3 | … | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
start | 1 | 2 | 3 | … | 1 | 2 | 3 | … | 1 | 2 | 3 | … | … |
Since you need an exact value of amount_sent
and a range of start
(and possibly sort using start
), you can verify from the illustration above that the index facilitates the query perfectly.
The explain()
result of the query also shows its efficiency:
> db.test.explain('executionStats').find({amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({amount_sent:1, start:1})
...
"nReturned": 253,
"executionTimeMillis": 1,
"totalKeysExamined": 253,
"totalDocsExamined": 253,
...
Now MongoDB examines 253 keys (instead of ~25,000) and returns the same number of documents. One takeaway from this is that an index fits the query if the numbers in nReturned
, totalKeysExamined
, and totalDocsExamined
are the same.
Your first query is more involved:
{$or: [{sender: “Bob”, receiver: “Bob}], amount_sent: 30, start: {$gte: 12345, $lte: 5678}}
Let’s say we have an index {sender:1, amount_sent:1, start:1}
:
> db.test.explain('executionStats').find({$or:[{sender:"Oscar"}, {receiver:"Oscar"}], amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({sender:1, amount_sent:1, start:1})
...
"nReturned": 4,
"executionTimeMillis": 118,
"totalKeysExamined": 50000,
"totalDocsExamined": 50000,
...
"stage": "SORT_KEY_GENERATOR",
...
This is not great at all. MongoDB looks at 50,000 keys and documents (basically the whole collection), only to return 4 of them. Also, note the stage called “SORT_KEY_GENERATOR” there. The existence of this stage means that MongoDB cannot use the index to present a sorted result, and thus is doing an in-memory sort instead, which is limited to 32 MB. If your query returns a large result set that exceeds 32 MB, the query will fail. See Use Indexes to Sort Query Results.
We can improve this by refactoring your query (by unwrapping the $or statement) and creating a couple of indexes to support both terms of the $or query.
The refactored query looks like:
{$or:[
{sender:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}},
{receiver:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}
]}
and the two indexes are {sender:1, amount_sent:1, start:1}
and {receiver:1, amount_sent:1, start:1}
, to satisfy both terms of the $or query. The explain()
result I got is:
> db.test.explain('executionStats').find({$or:[{sender:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}, {receiver:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}]}).sort({start:-1})
...
"nReturned": 4,
"executionTimeMillis": 1,
"totalKeysExamined": 4,
"totalDocsExamined": 4,
...
"stage": "SORT_MERGE",
...
The refactored query and the two indexes appear to work together well. Now MongoDB returns 4 documents, and examined exactly the same number of index keys and documents. Also note the existence of a “SORT_MERGE” stage. This is basically the merge stage of a merge sort, which is not bound to the 32 MB limit. This stage merges the two (already sorted) results for each of the $or terms. Also, execution time of the query drops from 118 ms to just 1 ms.
Please note that the analysis I did is mostly an educated guess since I don’t know the exact nature of your data. To optimize your use case, I would encourage you to fully understand the explain()
result: https://docs.mongodb.com/v3.2/reference/explain-results/
Best regards,
Kevin