sorted skip queries very slow

441 views
Skip to first unread message

David Fooks

unread,
Apr 6, 2011, 1:33:26 PM4/6/11
to mongodb-user
Hi,

I can't find any way to quickly produce page results (after a few
pages in). For example say I have a scoreboard and I want to skip too
the 10,000th result and see the 25 scores around them then I would do
the following

db.leaderboard.find().sort({'score': -1}).skip(9975).limit(50)

(with an index on score)
However, if we look at the output of the explain:

db.leaderboard.find().sort({'score':
-1}).skip(9975).limit(50).explain()

{
"cursor" : "BtreeCursor score_-1",
"nscanned" : 10025,
"nscannedObjects" : 10025,
"n" : 50,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"score" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
}

We can see that it is scanning all 9,975 documents upto the skip
start! Is there anyway to avoid this?

Keeping a record of the number of documents under each child of the
index b-tree would solve this (it would also improve the performance
of count). This would cause upserts/deletes to be slower (don't know
whether to increment/decrement the counts until we find the document)
but perhaps a special index type could allow the option for fast skip
operations rather than fast upserts.

Thanks,

David Fooks

Scott Hernandez

unread,
Apr 6, 2011, 1:41:17 PM4/6/11
to mongod...@googlegroups.com
That is the problem with using skip this way. It is better to use a
range query, if you know the starting value (in the index).

Are you doing this for pagination? If so, a range based pagination
scheme is much better for this exact reason -- the index must be
scanned to the end of the skip/offset.

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

David Fooks

unread,
Apr 6, 2011, 2:02:53 PM4/6/11
to mongodb-user
Unfortunately, there is no way of knowing where these range queries
are for the score. I would need to know the spread of the scores to
even guess. This would require gathering heuristics of the spread of
scores and isn't really preferable.
I should also note that I am finding the skip number or "rank" - the
number of positions behind the player with the highest score, by first
finding their score and then counting all players with a score higher
than that score:

rank = leaderboard.find({'score': {'$gt': score}}, []).count()

If there was a better way to find the scores around a player then that
would solve this problem.

Thanks for your reply!

David Fooks
Reply all
Reply to author
Forward
0 new messages