$natural order avoids indexes

808 views
Skip to first unread message

David Nellessen

unread,
Dec 5, 2013, 7:08:19 AM12/5/13
to mongod...@googlegroups.com
Profiling slow queries I found something really strange: For the following operation the entire collection was scanned (33061 documents) even though there is an index on the query parameter family_id:

{
 
"ts" : ISODate("2013-11-27T10:20:26.103Z"),
 
"op" : "query",
 
"ns" : "mydb.zones",
 
"query" : {
 
"$query" : {
 
"family_id" : ObjectId("52812295ea84d249934f3d12")
 
},
 
"$orderby" : {
 
"$natural" : 1
 
}
 
},
 
"ntoreturn" : 20,
 
"ntoskip" : 0,
 
"nscanned" : 33061,
 
"keyUpdates" : 0,
 
"numYield" : 91,
 
"lockStats" : {
 
"timeLockedMicros" : {
 
"r" : NumberLong(83271),
 
"w" : NumberLong(0)
 
},
 
"timeAcquiringMicros" : {
 
"r" : NumberLong(388988),
 
"w" : NumberLong(22362)
 
}
 
},
 
"nreturned" : 7,
 
"responseLength" : 2863,
 
"millis" : 393,
 
"client" : "127.0.0.1",
 
"user" : "mydb"
}

After some Google searches without results I found out that leaving out the "$orderby": { "$natural" : 1} the query is very fast and only 7 documents are scanned instead of 33061. So I assume using $orderby in my case does avoid using the index on family_id. The strange thing is that the resulting order is not different in either case. As far as I understand $natural order it is tautologically to use  "$orderby": { "$natural" : 1} or no explicit order. Another very interesting observation is that this issue does not arise on capped collection!!
 This issue  arises the following questions:
  1. If not using any ordering/sorting, shouldn't the resulting order be the order on disk, i.e. $natural order?
  2. Can I create a (compound-)index that would be used sorting naturally?
  3. How can I invert the ordering of a simple query that uses an index an no sorting without severe performance losses?
  4. What happens behind the scenes when using query parameters and orderby? Why is this not happening on capped collections? I would like to understand this strange behaviour.
  5. Are the answers of the above questions independent of whether you use sharding/replication or not? What is the natural order of a query over multiple shards?
Note I am using MongoDB 2.2. There is a ticket related to this issue: https://jira.mongodb.org/browse/SERVER-5672. Though it seems in that ticket that the issue occures in capped collections too, which I cannot confirm (maybe due to different mongo versions).

Eoin Brazil

unread,
Jan 3, 2014, 10:03:29 AM1/3/14
to mongod...@googlegroups.com
Hi David,

In general, best practise is to use an index to improve the speed of retrieval of data. The $natural option should not be used for queries as it will force a table scan which can be very slow for large collections. The documentation has a good starting point on indexing, http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/ and then on how to make an index more selective to improve its effectiveness, http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/.

I created an example of a simple one criteria query with an index and documented its behaviour below showing how you can use the same index to sort the results either for ascending or descending ordering. Is this what you mean by a simple query ?

Taking a sample document with fields 'age' in a collection 'test' and querying it on age with an index on this field and sorting it on 'age'.
i)
{code:js}
for (i=0; i<10000; i++) {db.test.insert({"age":Math.floor(Math.random()*120)});}
{code}
ii) {code:js}
db.test.ensureIndex({"age": 1})
{code}
iii) a) age sorted descending query {code:js}
db.test.find({"age": 25}).sort("age": -1).limit(5)
{ "_id" : ObjectId("52c6ccdf70aa401c79ba8965"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdf70aa401c79ba88ca"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdf70aa401c79ba8868"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdf70aa401c79ba87a2"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdf70aa401c79ba86e4"), "age" : 25 }
{code}
iii) b) age sorted ascending query {code:js}
db.test.find({"age": 25}).sort("age": 1).limit(5)
{ "_id" : ObjectId("52c6ccdd70aa401c79ba643a"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdd70aa401c79ba64d2"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdd70aa401c79ba6531"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdd70aa401c79ba661f"), "age" : 25 }
{ "_id" : ObjectId("52c6ccdd70aa401c79ba6631"), "age" : 25 }
{code}

In MongoDB when you do not include a sort criteria then the ordering of the results is undefined. It is not possible to specify a $natural sort in an index as when you specify the $natural option for a query then no index will be used. 

The ticket https://jira.mongodb.org/browse/SERVER-5672 highlights that the use of $natural forces the database to do a table scan to return the documents in the order they are on disk. You should not $natural for your queries, the answer to ticket (SERVER-5672) is to use a query with an index which returns the duplicates in their disk order.

1) The resulting order on disk cannot be guaranteed to be natural order since documents can grow in size once updated or they can be deleted/removed and therefore can be moved around on the disk creating space for new documents. This is somewhat related to your 4th question below.
2) You can create a compound index (http://docs.mongodb.org/manual/core/index-compound/#index-type-compound) or you could use the hint(http://docs.mongodb.org/manual/reference/method/cursor.hint/#cursor.hint) to explicitly use your existing index, which is probably preferable than creating an additional index and also relates to how queries are optimised (see question 4 below).
3) MongoDB can traverse an index in either direction. This will not have any performance issues for a simple index. A further tip is to put a sort() call before a limit() call when querying to reduce the scanning of your index.
4) here is a good overview of how the query optimiser chooses the index to use in a query in the documentation (http://docs.mongodb.org/manual/core/query-plans/#read-operations-query-optimization). The reason you are not seeing this behaviour in capped collections is that the records are of a fixed size so they do not move around the disk (see answer 1) as documents cannot be removed, deleted or increase in size within a capped collection (note: they can age-out).
5) This behaviour is independent of whether or not you shard your collection. You can use explain() with sharding to see how the query is processed across shards. The natural order of a query on a shard will depend on your shard key strategy and your data distribution. In essence, MongoS will route your query to the database contain that portion of the data and it will be queried on the database as normal (see question 4).

Hope this answers your questions!

David Nellessen

unread,
Jan 8, 2014, 12:42:16 PM1/8/14
to mongod...@googlegroups.com
Hi Eoin.

Thank you very much for your detailed answer and for clearing thinks up. I do have a much better understanding of how natural sorting works. Though I am a bit disappointed because it is basically useless unless you are working on capped collection. Natural order was my hope to query a collection by insertion order for which there is no other way as far as I know. But that is another topic...

Cheers
David
--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb
 
---
You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/-Pb7lEahvf4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Asya Kamsky

unread,
Jan 9, 2014, 5:19:36 AM1/9/14
to mongodb-user
David,

Two things to clarify:

$natural order is NOT insertion order, except in a capped collection.

You never need to specify $natural order - that is what you will get by default.

Asya
P.S. but if you are using MongoDB generated ObjectId() then ordering by _id will give you insertion order to a reasonably good approximation.



You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.

David Nellessen

unread,
Jan 13, 2014, 8:31:15 AM1/13/14
to mongod...@googlegroups.com


Am Donnerstag, 9. Januar 2014 11:19:36 UTC+1 schrieb Asya Kamsky:
David,

Two things to clarify:

$natural order is NOT insertion order, except in a capped collection.

You never need to specify $natural order - that is what you will get by default.

Asya
P.S. but if you are using MongoDB generated ObjectId() then ordering by _id will give you insertion order to a reasonably good approximation.
Only if you consider one second as a good approximation :-( 

Asya Kamsky

unread,
Jan 13, 2014, 4:11:06 PM1/13/14
to mongodb-user
On Mon, Jan 13, 2014 at 5:31 AM, David Nellessen
<davidne...@gmail.com> wrote:
>> P.S. but if you are using MongoDB generated ObjectId() then ordering by _id will give you insertion order to a reasonably good approximation.
> Only if you consider one second as a good approximation :-(

If millisecond granularity is important, I would recommend adding a
timestamp field that's populated on insert.

Asya
P.S. don't forget to index it! :)

David Nellessen

unread,
Feb 19, 2014, 12:40:51 PM2/19/14
to mongod...@googlegroups.com
And still that wouldn't work on distributed systems :-(
Reply all
Reply to author
Forward
0 new messages