Compound index order question

186 views
Skip to first unread message

Luke Ehresman

unread,
Apr 28, 2011, 10:45:42 AM4/28/11
to mongod...@googlegroups.com
I have a collection of 50 million documents.  Every query against this collection will contain two parameters:  data_type and datetime.  I am not doing any sorting on the results of the queries.

data_type is a somewhat small list of discrete values (about 100 values, and each of the 50 million documents as one data type).

datetime is an integer in the format YYYYMMDDHH (i.e. 2011042606).

A common query is to find all the documents for 4 data types within the last 45 minutes.

I want a compound index on these two values.  Is there any difference between a compound index of {data_type:1, datetime:1}  and {datetime:1, data_type:1} in this case?  Remember, every query against this collection has both of those parameters, and I will not be sorting the results.

Thanks,
Luke

--
Luke Ehresman, lu...@ehresman.org
CopperEgg - http://copperegg.com
Tebros Systems - http://tebros.com

Scott Hernandez

unread,
Apr 28, 2011, 11:00:45 AM4/28/11
to mongod...@googlegroups.com
Yes, in general, you want the range argument to be last in the index.

This is the one you want: {data_type:1, datetime:1}


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Luke Ehresman

unread,
Apr 28, 2011, 11:05:41 AM4/28/11
to mongod...@googlegroups.com
Mind explaining that a bit?  Why is that the case?

Thanks,
Luke

Scott Hernandez

unread,
Apr 28, 2011, 11:36:00 AM4/28/11
to mongod...@googlegroups.com
Sure :)

It is because mongodb indexes are btrees and the fields are essentially concatenated in the index entry. Each branch of the tree will be a greater or lesser value from parent node. This will lead to skipping and generally worse performance if the distinct values are after the range value in the index spec.

It all depends on your actual cardinality of data between those two fields but that is a general rule when traversing btree indexes with a mix of equality and range restrictions.

If you aren't familiar with the btree data structure, or performance characteristics, then you can read up a little here: http://en.wikipedia.org/wiki/Btree



However, there are some reasons you might want to use the other order; like if you know that the number of data_type values will be small for any date range and you want to keep lots of historic data. If you put the timestamp first then much of the index can be ignored, and never used, if you always look for a small date range on either ends of the index (oldest/newest).
Reply all
Reply to author
Forward
0 new messages