distinct() not using indices?

860 views
Skip to first unread message

MG

unread,
Aug 11, 2010, 2:51:55 PM8/11/10
to mongodb-user
Hello!

I'm now testing mongodb performance in order to determine if it can
meet our needs. A first simple test I did was populate a collection
with 150 million "documents", (let's say "flat" records with a,b,c,d,e
fields), create two indices (a=1,b=1,c=-1,d=1) and (a=1,c=-1,d=1) and
make a query

db.collection.distinct("a")

In this setup "a" has only two values, so I assumed the query will
return instantly (because it is indexed), but it took a couple of
minutes to return. And

db.collection.distinct("a").hint(...) returned with "distinct("a).hint
is not a function error. explain() produces the same error.

Question: am I correct in assuming that distinct() does not use
indices? If yes, why? If not, why the query took so long to return?

Thanks,
MG

Michael Dirolf

unread,
Aug 11, 2010, 3:06:29 PM8/11/10
to mongod...@googlegroups.com
On Wed, Aug 11, 2010 at 2:51 PM, MG <michael...@gmail.com> wrote:
> Hello!
>
> I'm now testing mongodb performance in order to determine if it can
> meet our needs. A first simple test I did was populate a collection
> with 150 million "documents", (let's say "flat" records with a,b,c,d,e
> fields), create two indices (a=1,b=1,c=-1,d=1) and (a=1,c=-1,d=1) and
> make a query
>
> db.collection.distinct("a")
>
> In this setup "a" has only two values, so I assumed the query will
> return instantly (because it is indexed), but it took a couple of
> minutes to return. And
>
> db.collection.distinct("a").hint(...) returned with "distinct("a).hint
> is not a function error. explain() produces the same error.

distinct is a database command, not a regular query - so hint and
explain don't work.

> Question: am I correct in assuming that distinct() does not use
> indices? If yes, why? If not, why the query took so long to return?

I think distinct will use an index (somebody please correct me if I'm
wrong), but even so it still needs to scan the index to get all of the
distinct values - in your case there are only two values, but there is
no way for the command to know that a priori.

MG

unread,
Aug 11, 2010, 3:25:07 PM8/11/10
to mongodb-user

> I think distinct will use an index (somebody please correct me if I'm
> wrong), but even so it still needs to scan the index to get all of the
> distinct values - in your case there are only two values, but there is
> no way for the command to know that a priori.

As there are only two values at the root, there is no need to scan the
whole index, only the root of the B-tree, which should be done
instantaneously. Anyway, I have a bigger issue (will ask in a separate
post).

Thanks,
MG

Michael Dirolf

unread,
Aug 12, 2010, 3:12:27 PM8/12/10
to mongod...@googlegroups.com

I don't think that quite works (someone please correct me if I'm
wrong) - there's still no way of knowing if the values in the root
node are the only values that appear in the rest of tree.

Eliot Horowitz

unread,
Aug 12, 2010, 3:52:20 PM8/12/10
to mongod...@googlegroups.com
It could make much better use of indices, so not just look at root,
but be much faster.
Someone want to open a jira?

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

Kevin McCarthy

unread,
Aug 24, 2010, 6:22:29 PM8/24/10
to mongodb-user
> Someone want to open a jira?

Sure, I will. This would be immensely useful.

http://jira.mongodb.org/browse/SERVER-1673
Reply all
Reply to author
Forward
0 new messages