indexing queries involving dates and boolean flags

2,594 views
Skip to first unread message

Jeff Lee

unread,
Sep 20, 2011, 12:38:21 PM9/20/11
to mongodb-user
Apologies, I've got a very noob-seeming question here.

I've got a collection that has boolean flag and a date field. In all
cases I want to sort by the date field, and sometime I care about the
value of the boolean flag, and sometimes I don't.

In other words, sometime I will do this:

db.collection.find().sort({my_date_field:1})

And sometimes I will do this:

db.collection.find({my_flag_field:false}).sort({my_date_field:1})

In this case, is it dramatically preferable to use a multi-key index,
involving both the flag and date fields, rather than a single-key
index on the date field? How can I conceive of the differences between
these two, under the hood?

Sergei Tulentsev

unread,
Sep 20, 2011, 5:26:46 PM9/20/11
to mongod...@googlegroups.com
If you don't index a field, then when you query by it, you will get a full scan.
On the other hand, index on a single boolean field is almost useless, it just takes up space and doesn't dramatically accelerate queries.

From exampes of your queries I deduced that your collection is (likely) small, so you might do fine with no indexes at all.

But this is just my (dilettantish) opinion. Set up your test env and run some benchmarks :-)


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.




--
Best regards,
Sergei Tulentsev

Bernie Hackett

unread,
Sep 20, 2011, 7:17:55 PM9/20/11
to mongodb-user
One idea would be to create a compound key index on {my_date_field: 1,
my_flag_field: 1}. That index will be used for both queries.

http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeysIndexes

Sergei Tulentsev

unread,
Sep 20, 2011, 7:51:58 PM9/20/11
to mongod...@googlegroups.com
To use compound index, you have to query only on fields that are listed first in the index definition.
So, this one won't support second query.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Bernie Hackett

unread,
Sep 20, 2011, 8:02:43 PM9/20/11
to mongodb-user
It will work:

> db.foo.insert({'my_flag_field': true, 'my_date_field': new Date()})
> db.foo.find()
{ "_id" : ObjectId("4e791e6a45c4bf56905ed349"), "my_flag_field" :
true, "my_date_field" : ISODate("2011-09-20T23:14:50.592Z") }

> db.foo.ensureIndex({'my_date_field': 1, 'my_flag_field': 1})

> db.foo.find({my_flag_field:true}).sort({my_date_field:1}).explain()["cursor"]
BtreeCursor my_date_field_1_my_flag_field_1

> db.foo.find().sort({my_date_field: 1}).explain()["cursor"]
BtreeCursor my_date_field_1_my_flag_field_1


On Sep 20, 4:51 pm, Sergei Tulentsev <sergei.tulent...@gmail.com>
wrote:

Sergei Tulentsev

unread,
Sep 20, 2011, 8:05:39 PM9/20/11
to mongod...@googlegroups.com
How curious. But the doc says otherwise.

Sergei Tulentsev

unread,
Sep 20, 2011, 8:06:57 PM9/20/11
to mongod...@googlegroups.com

If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on


a,b,c

you can use it query on


a

a,b

a,b,c

And you're querying on b. How's that possible?

Bernie Hackett

unread,
Sep 20, 2011, 8:15:28 PM9/20/11
to mongodb-user
Let me clarify. The index will be used for the specified query:

> db.foo.find({my_flag_field:true}).sort({my_date_field:1}).explain()
{
"cursor" : "BtreeCursor my_date_field_1_my_flag_field_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"my_date_field" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"my_flag_field" : [
[
true,
true
]
]
}
}

It will not be used for a query on my_flag_field without sorting on
my_date_field:

> db.foo.find({my_flag_field:true}).explain()["cursor"]
BasicCursor

Sergei Tulentsev

unread,
Sep 20, 2011, 8:22:30 PM9/20/11
to mongod...@googlegroups.com
Interesting. But how efficient it is? To my current knowledge (and I might be wrong again) it will scan the whole index.

And does sorting serve as a hint to use index? I think it should be clarified in the documentation.

Sergei Tulentsev

unread,
Sep 20, 2011, 8:42:19 PM9/20/11
to mongod...@googlegroups.com
I populated test collection with 1M items and ran some tests.

> db.testcoll.find({bf : false}).sort({df : 1}).explain() 
{
"cursor" : "BtreeCursor df_1_bf_1",
"nscanned" : 613642,
"nscannedObjects" : 499594,
"n" : 499594,
"millis" : 3347,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"df" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"bf" : [
[
false,
false
]
]
}
}

Why nScanned == 613642 instead of 1000000? It seems that I don't understand something very fundamental to compound indexes :-)

Bernie Hackett

unread,
Sep 21, 2011, 5:30:00 PM9/21/11
to mongodb-user
It's not going to be super efficient, mainly because an index on a
boolean field isn't very selective. It's better than doing a table
scan, particularly for the sort.

On Sep 20, 5:22 pm, Sergei Tulentsev <sergei.tulent...@gmail.com>

vacuum regular

unread,
Apr 6, 2012, 5:21:24 AM4/6/12
to mongod...@googlegroups.com
How to effectively solve problem with this kind of queries? When you need to find Documents with certain flag and do sort on it.
> > > > > mongodb-user+unsubscribe@googlegroups.com.
> > > > > For more options, visit this group at
> > > > >http://groups.google.com/group/mongodb-user?hl=en.
>
> > > > --
> > > > Best regards,
> > > > Sergei Tulentsev
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "mongodb-user" group.
> > To post to this group, send email to mongod...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > mongodb-user+unsubscribe@googlegroups.com.

Max Schireson

unread,
Apr 6, 2012, 10:18:03 AM4/6/12
to mongod...@googlegroups.com

The most efficient way for the query with the flag is likely an index on flag, date.

An index on date, flag will be better than no index likely slower than the first. How much slower depends on how selective the field is. If you have a field called 'rare' which is usually false then rare = false sorted by date will be good performance but rare = true sorted by date will be much slower than the if the index were on rare, date.

Btw if you have a date and an in-query on a multi-valued attribute (rather than a simple boolean) this way will be much faster if you are doing a limit on your query.

Agree doc should be clearer here.

Back to your overall problem it depends on thr data volumes and distributions whether you want both indexes (one on date alone and one on flag,date). I'd test and see with your data how much faster it is. If you aren't returning massive data sets or searching on highly selective flags, it may not be worth it to have both.

-- Max

To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/c5I4fTbNrOoJ.

To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.

Max Schireson

unread,
Apr 6, 2012, 10:26:49 AM4/6/12
to mongod...@googlegroups.com

Surprises me too. Maybe someone smarter than me will know :)

-- Max

Max Schireson

unread,
Apr 6, 2012, 10:31:51 AM4/6/12
to mongod...@googlegroups.com

Oh wait a sec - did you have a bunch of docs sharing dates in the index? It might eg skip a whole bucket of bf: true thay share a common df.

-- Max

Reply all
Reply to author
Forward
0 new messages