MongoDB create index efficiently to support query with "$OR" logic and sort operation

27 views
Skip to first unread message

Bình Lê Việt Duy

unread,
Dec 7, 2017, 4:03:01 PM12/7/17
to mongodb-user
Hello everyone, suppose that I have a very large collection students with this structure:
{
   
"_id": ObjectId(""),
   
"name": "Nguyen Van A",
   
"age": 23,
   
"city": "HCM",
   
"gender": "male",
   
"gpa: 8.26
}
My question is how to create the best index for the query:
db.students.find({
"gender": "male",
"$or": [{
"name": {
"$regex": "nam"
}
},
{
"city": {
"$regex": "nam"
}
}]
}).sort({
"gpa": -1
});
I think the best way is to re-construct the query like that:
db.students.find({
"$or": [{
"gender": "male",
"name": {
"$regex": "nam"
}
},
{
"gender": "male",
"city": {
"$regex": "nam"
}
}]
}).sort({
"gpa": -1
})
After that we create 2 indexes which will support our query and sort operation like this:
db.students.createIndex({"name": 1, "gender": 1, "gpa": 1});
db.students.createIndex({"city": 1, "gender": 1, "gpa": 1});
Unfortunately these indexes cannot help with the sort operation although it efficiently works with the two "OR" operator query.

Do you have any solutions for this?

I really appreciate your answers and ideas.

Thank you in advance.

Kevin Adistambha

unread,
Dec 18, 2017, 1:19:51 AM12/18/17
to mongodb-user

Hi

How did you determine that the indexes didn’t help with your query? I assume that you used the output of explain(). The explain() method can show you how the query planner is using indexes to answer your query. Most of the time, you can determine the query’s performance by examining the stages planned by the query planner. More information are available in the Explain Results page.

With regard to your question, the indexes you currently have:

db.students.createIndex({“name”: 1, “gender”: 1, “gpa”: 1});
db.students.createIndex({“city”: 1, “gender”: 1, “gpa”: 1});

are not the best indexes to answer your query. This is due to the presence of the $regex field in your query. The explain() result would contain the SORT_KEY_GENERATOR stage. This stage means that MongoDB is using an in-memory sort stage, which is limited to 32MB in memory usage. If your query returns a lot of data, the sorting stage could use more than 32MB of memory, at which point your query will fail.

To read more about how to improve your compound index usage, please see: Optimizing MongoDB Compound Indexes. Please note that the stage names in the post are outdated (e.g. BasicCursor instead of COLLSCAN, BtreeCursor instead of IXSCAN, etc.) but the information are still relevant.

By using the guidelines presented in that page, your indexes could be changed to:

db.students.createIndex({gender:1, gpa:1, name:1})
db.students.createIndex({gender:1, gpa:1, city:1})

Examining the explain() output of your query, you can see that the SORT_KEY_GENERATOR stage is replaced by a SORT_MERGE stage. This stage doesn’t use an in-memory sort, and your query should be more efficient as a result.

As a side note, you are correct that refactoring your query from:

db.students.find({gender:..., $or: [{name:...}, {city:...}]})

to:

db.students.find({$or: [{gender:..., name:...}, {gender:..., city:...}]})

will result in better index usage in MongoDB <= 3.4.10. However, this is changed in MongoDB 3.6.0, and the first form of the query should be equally performant as the second form. See SERVER-13732 for more details regarding this improvement.

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages