find/distinct/pagination

Valentin Kuznetsov

May 23, 2011, 9:57:19 AM

to mongodb-user

Hi,
I have run into a problem that does not seem to have an easy solution. A
common operation in a web application is pagination. Assume the DB is
populated with documents whose values for the key a user queries on are
not unique. How can we show only the distinct results for a given query?
Say the web page should show the total number of results and 10 results
per page. If we use count(), it does not account for duplicate values. If
we use distinct(), we can't combine it with count(). Moreover, the return
type of distinct() is a list, while the return type of find() is a cursor
to which we can apply skip(idx).limit(limit). The former consumes
application memory, the latter can fetch documents on the fly, but the
two can't be combined.

When uniqueness is not required, everything is trivial (Python syntax):
    nresults = db.col.find(spec).count()
    for row in db.col.find(spec).skip(idx).limit(limit):
        yield row

If uniqueness matters, we need something like:
    result_list = db.col.find(spec).distinct(key)
    nresults = len(result_list)
    for row in result_list[idx:idx+limit]:
        yield row

The problem is the list returned by distinct(): it can be very large,
and it has to be rebuilt on every request, for every page the user
asks for. I'm not in favor of caching the list, because of the concurrent
nature of the application and user queries. A map-reduce (MR) solution
has its own limitation: it requires writing JS code explicitly, which
does not always suit the dynamic nature of an application that accepts
arbitrary query specs. Requiring a unique key on documents would ruin the
advantage of storing documents that may share a common key but differ in
structure.

Any ideas/suggestions?

Thanks,
Valentin.

Kyle Banker

May 23, 2011, 5:24:50 PM

to mongodb-user

If I understand correctly, you want to paginate a collection of
documents that may have a non-unique value uniquely.

At the moment, the best way to do this is to perform a standard query
and then filter for uniqueness on the application level. You could
also have a separate collection that stores the values uniquely, but,
as you've said, this isn't ideal.
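
A minimal sketch of the application-level filter, assuming the key values
are hashable and the documents arrive in a stable order. The name
unique_page and its parameters are hypothetical, not part of any driver
API. It streams over any iterable of documents (a find() cursor would do)
and stops as soon as one page of distinct values has been collected:

```python
def unique_page(docs, key, skip=0, limit=10):
    """Yield up to `limit` documents with distinct `key` values,
    after skipping the first `skip` distinct values.

    Assumptions: `docs` is any iterable of dict-like documents (for
    example a cursor), and the values stored under `key` are hashable.
    Documents without `key` are all treated as sharing the value None.
    """
    if limit <= 0:
        return
    seen = set()      # distinct values encountered so far
    emitted = 0       # documents yielded on this page
    for doc in docs:
        value = doc.get(key)
        if value in seen:
            continue  # duplicate value, ignore this document
        seen.add(value)
        if len(seen) <= skip:
            continue  # this distinct value belongs to an earlier page
        yield doc
        emitted += 1
        if emitted >= limit:
            return    # page is full, stop reading from the source


# Example with plain dicts standing in for documents
docs = [
    {"k": "a", "n": 1},
    {"k": "b", "n": 2},
    {"k": "a", "n": 3},
    {"k": "c", "n": 4},
    {"k": "b", "n": 5},
    {"k": "d", "n": 6},
]
page = list(unique_page(docs, "k", skip=1, limit=2))
```

With a live collection this might be called as
unique_page(db.col.find(spec).sort(key, 1), key, idx, limit), where the
sort is an added assumption to keep pages consistent between requests.
Memory grows only with the number of distinct values up to the end of the
requested page, but the total number of distinct results still needs a
separate distinct() call or a full scan.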

I'd recommend adding a feature request at jira.mongodb.org to add
distinct capability to a cursor.

Kyle

Valentin Kuznetsov

May 23, 2011, 6:44:37 PM

to mongodb-user

Kyle,

> If I understand correctly, you want to paginate a collection of
> documents that may have a non-unique value uniquely.

Precisely!

>
> At the moment, the best way to do this is to perform a standard query
> and then filter for uniqueness on the application level. You could
> also have a separate collection that stores the values uniquely, but,
> as you've said, this isn't ideal.
>

As I pointed out, I do know how to do it at the app level, but it looks
ugly, to say the least, and may lead to problems with large data sets.

> I'd recommend adding a feature request at jira.mongodb.org to add
> distinct capability to a cursor.
>

Thanks, will do.
