Index Optimization & Sorting

55 views
Skip to first unread message

Χρήστος Μάλλιος

unread,
Aug 22, 2017, 7:36:48 AM8/22/17
to mongodb-user
I have seen in many posts that in general the optimal way to create an index in mongodb for a query is to set the equal fields firstly, after that the sort fields and in the end the range fields. In this way, because the sort fields are not in the beginning of the index, mongodb should do in memory sort(there is no global sorting). So my first question is how this sorting is implementing. I guess, because the elements are sorted per "segment", the optimal way to do sort is by merging the already sorted segments. Is this correct or not? 

My seconds question has to do with the above mentioned optimal field order. Why is it more optimal to use the equality fields first and not the sort fields. I understand that the equal filters can probably reduce the size of the result too much, but even though, the result is not already sorted. On the contrary, if the sorted values were in the beginning of the index, the results would be already sorted and I guess that because the index includes the filter fields after that, mongodb will not load useless documents in memory. Is it something wrong in my way of thinking?

My last question is about the limit of 32MB on data that will be sorted in memory. Does this limitation exists? If this limit exists, the sorting is done by segments(for example traditional SQL dbms use in memory some blocks and not the whole data) or not? In case that the sorting is not done in segments, that means that if the desired resultset that we want to sort is larger than 32MB, then the query will not be completed?

Rhys Campbell

unread,
Aug 23, 2017, 4:28:34 AM8/23/17
to mongodb-user
"I understand that the equal filters can probably reduce the size of the result too much, but even though, the result is not already sorted."

It is already sorted if you querying using the equality.

db.collection.find( { "username": "rhys" }, {} ).sort({ "timestamp": -1})

Assuming we have an index on { username: 1, timestamp: -1 } then the data is already in the required sort order. If we switched to an index of (timestamp, username) then this query would scan a much larger number of index keys (assuming the collection size is non-trivial).

You can view this on your collection / queries with explain...



Χρήστος Μάλλιος

unread,
Sep 30, 2017, 12:36:00 PM9/30/17
to mongodb-user
You are right mate. The main problem is that until now I didn't know that when there is an "or" operator, mongoDB will use one index per clause and not one totally.
Reply all
Reply to author
Forward
0 new messages