cannot iterate through queryset more than once


Rudy

Feb 8, 2010, 2:23:49 AM
to MongoEngine Developers
Take a look at this example. I am retrieving all the documents in the
collection and then want to iterate through the queryset *more than
once*, but the second attempt does not seem to work. Is this the
expected behaviour?

>>> blogposts = BlogPost.objects()
>>> for post in blogposts:
...     print post.title
(this outputs a row for each post)

(repeat the loop)
>>> for post in blogposts:
...     print post.title
(this time nothing outputs)

(attempt to access the first index of the queryset)
>>> print blogposts[0].title
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Projects\myproj\mongoengine\queryset.py", line 416, in __getitem__
    return self._document._from_son(self._cursor[key])
  File "C:\Projects\myproj\pymongo\cursor.py", line 223, in __getitem__
    self.__check_okay_to_chain()
  File "C:\Projects\myproj\pymongo\cursor.py", line 158, in __check_okay_to_chain
    raise InvalidOperation("cannot set options after executing query")
InvalidOperation: cannot set options after executing query
>>>

Why can't the queryset be iterated again once it has reached the end?

Steve

Feb 8, 2010, 7:58:38 AM
to MongoEngine Developers
This is a limitation of Python iterators, which unfortunately cannot
be reset.

If you want to access the results again then either perform another
query, or store the results of the first query in a list etc.
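
A minimal sketch of both options in plain Python 3, with a generator standing in for the query cursor. The `fetch_titles` function and its data are made up for illustration and are not part of MongoEngine:

```python
def fetch_titles():
    """Hypothetical stand-in for a query: yields results one at a time."""
    for title in ["First post", "Second post"]:
        yield title


# An iterator can only be consumed once.
results = fetch_titles()
first_pass = [t for t in results]
second_pass = [t for t in results]  # already exhausted, so this is empty

# Option 1: perform the query again.
requeried = list(fetch_titles())

# Option 2: store the results in a list, which can be looped over repeatedly.
cached = list(fetch_titles())
loop_one = [t for t in cached]
loop_two = [t for t in cached]
```

The list costs memory proportional to the number of results, while re-querying costs another round trip to the database.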

Harry

Feb 8, 2010, 8:04:21 AM
to MongoEngine Developers
PyMongo provides a way of rewinding the cursor. I'll add a wrapper
for this to the QuerySet object, which should mean that the following
works:

for post in blogposts:
    print post.title

blogposts.rewind()
for post in blogposts:
    print post.title

However, this will cause the query to be run multiple times. With
smaller result sets it would be preferable to cache the results in a
list, but this won't be the default behaviour: it would use up huge
amounts of memory when there are lots of results, and would mean that
QuerySets are no longer lazy.
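
To illustrate why rewinding means running the query again, here is a rough Python 3 sketch. `FakeCursor` is a made-up stand-in that needs no database; only the `rewind()` name mirrors the real PyMongo cursor method, and the `queries_run` counter is purely illustrative:

```python
class FakeCursor:
    """Hypothetical stand-in for a PyMongo cursor; not the real class."""

    def __init__(self, rows):
        self._rows = rows
        self._iter = None
        self.queries_run = 0  # counts simulated round trips to the server

    def __iter__(self):
        return self

    def __next__(self):
        if self._iter is None:
            # The query is only sent when iteration starts (lazy).
            self.queries_run += 1
            self._iter = iter(self._rows)
        return next(self._iter)

    def rewind(self):
        # Back to the unevaluated state; the next loop queries again.
        self._iter = None


cursor = FakeCursor(["First post", "Second post"])
first = list(cursor)
exhausted = list(cursor)  # nothing left without a rewind
cursor.rewind()
second = list(cursor)
```

After this runs, `cursor.queries_run` is 2: one query per full pass, which is the cost described above.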

Harry


Rudy

Feb 8, 2010, 2:46:18 PM
to MongoEngine Developers
I'm used to working with Django, where a queryset can be iterated
more than once. I don't believe the queryset (which is lazy) is run
again once it has been executed... or is it?

For example:

from django.contrib.auth.models import User
users = User.objects.all()
for u in users:
    print u.username

(repeat again and it works)
for u in users:
    print u.username
(I don't think it runs the query again in the second loop; it seems to
run through a cached version instead)


On a side note, I'm working with mongoengine and Django. I'm creating
a list of methods and/or helper tools that could be useful within
mongoengine. I can share them with you through the groups page and you
can decide whether to include them. I'll probably code a few of them
as needed and share them with you as well.

Rudy

Philip Plante

Feb 8, 2010, 2:54:00 PM
to mongoen...@googlegroups.com
QuerySets in Django are executed each time an iterator is called on them. So 2 loops = 2 queries to the DB. That's why Django recommends using the various caching methods, since they hit the cache rather than the DB.

Rudy

Feb 8, 2010, 4:39:20 PM
to MongoEngine Developers
I just ran a test and I'm only seeing 1 query to the DB when not
adding any extra methods to the queryset.

from django.contrib.auth.models import User

users = User.objects.all()  # using filter still performs the same: users = User.objects.filter(id__gt=0)

for u in users:
    print u.username
for u in users:
    print u.username

(The above performs 1 query to the DB)

users = User.objects.all()
for u in users:
    print u.username

for u in users.order_by('username'):
    print u.username

for u in users:
    print u.username

(The above performs 2 queries to the DB)

users = User.objects.all()
for u in users:
    print u.username

for u in users.order_by('username'):
    print u.username
for u in users.order_by('username'):
    print u.username

for u in users:
    print u.username

(The above performs 3 queries to the DB: one for the plain queryset and
two for the order_by version)
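
The counts above are consistent with a per-object result cache that is not carried over when a method such as order_by() returns a new queryset. Here is a toy Python 3 model of that behaviour. `CachingQuerySet` and its `queries_run` counter are invented for illustration and are not Django's actual implementation:

```python
class CachingQuerySet:
    """Toy model of a result-caching queryset; not Django's implementation."""

    queries_run = 0  # shared counter of simulated database hits

    def __init__(self, rows, ordering=None):
        self._rows = rows
        self._ordering = ordering
        self._result_cache = None

    def order_by(self, field):
        # Refining a queryset returns a new object with an empty cache.
        return CachingQuerySet(self._rows, ordering=field)

    def __iter__(self):
        if self._result_cache is None:
            # First iteration: hit the "database" and remember the rows.
            CachingQuerySet.queries_run += 1
            rows = self._rows
            if self._ordering:
                rows = sorted(rows, key=lambda r: r[self._ordering])
            self._result_cache = list(rows)
        return iter(self._result_cache)


users = CachingQuerySet([{"username": "rudy"}, {"username": "harry"}])

for _ in range(2):
    for u in users:
        pass
after_plain_loops = CachingQuerySet.queries_run  # 1

for _ in range(2):
    for u in users.order_by("username"):
        pass
after_ordered_loops = CachingQuerySet.queries_run  # 3
```

Looping over `users` again at this point would not add a query, because its cache is already filled.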

Philip Plante

Feb 8, 2010, 4:47:41 PM
to mongoen...@googlegroups.com
Good work Rudy.

Couldn't we just modify __iter__ of the QuerySet to rewind the cursor? I am thinking something like:

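def __iter__(self):
    # Rewind the PyMongo cursor so every loop starts from the first result.
    self._cursor.rewind()

    for doc in self._cursor:
        yield self._document._from_son(doc)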


Wouldn't that accomplish the same thing as Django is doing? Maybe I misunderstand the __iter__ function.

Harry

Feb 22, 2010, 5:32:51 PM
to MongoEngine Developers
Django does seem to use a result cache on QuerySet objects; I'd be
open to including this in MongoEngine, but perhaps there should be a
limit on the number of documents to cache to prevent too much memory
being eaten up. The cache limit could have a sensible default and
could be changed by the user if necessary. Alternatively, providing an
easy way of disabling the cache would work (in which case the cursor
would be automatically rewound). What are your thoughts?
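
One possible shape for such a limit, as a Python 3 design sketch only. `BoundedCacheQuerySet`, `cache_limit`, and `run_query` are hypothetical names that do not exist in MongoEngine, and the default of 1000 is an arbitrary placeholder:

```python
class BoundedCacheQuerySet:
    """Hypothetical design sketch; none of these names exist in MongoEngine."""

    def __init__(self, run_query, cache_limit=1000):
        self._run_query = run_query      # callable returning a fresh iterator
        self._cache_limit = cache_limit  # 0 disables caching entirely
        self._cache = None

    def __iter__(self):
        if self._cache is not None:
            # Small result set seen before: replay it from memory.
            yield from self._cache
            return
        buffered = []
        for doc in self._run_query():
            if buffered is not None:
                buffered.append(doc)
                if len(buffered) > self._cache_limit:
                    buffered = None  # too large; give up on caching
            yield doc
        # Only reached when the loop ran to completion.
        self._cache = buffered


calls = []

def run_query():
    calls.append(1)  # record one simulated query
    return iter(["a", "b", "c"])

small = BoundedCacheQuerySet(run_query, cache_limit=10)
small_first, small_second = list(small), list(small)
queries_for_small = len(calls)

large = BoundedCacheQuerySet(run_query, cache_limit=2)
large_first, large_second = list(large), list(large)
queries_for_large = len(calls) - queries_for_small
```

In this sketch a result set within the limit is queried once and replayed, while one over the limit falls back to querying on every loop, which is the equivalent of rewinding.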

Rudy

Feb 23, 2010, 4:20:29 PM
to MongoEngine Developers
I would recommend using the same default as Django, which caches by
default; if you want to bypass the cache, you call the queryset's
iterator() method. The same result could be achieved via switches as
you mentioned, or possibly a default setting in the Document class
meta. Either way, it would be nice to allow for results caching. Let me
know if there's any way I can help.

How Django deals with queryset results caching:
http://docs.djangoproject.com/en/dev/topics/db/queries/#caching-and-querysets

Info on the Queryset iterator() method:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator

Harry

Mar 8, 2010, 5:27:58 PM
to MongoEngine Developers
For now (and probably in v0.3), rewind will be called whenever
StopIteration is raised. I'm not totally against some kind of implicit
caching, but I'd rather it was done properly than rushed for 0.3.
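
A rough Python 3 sketch of what rewinding on StopIteration could look like. `RewindingQuerySet` is a hypothetical stand-in that uses a plain list iterator in place of a PyMongo cursor, so it shows the control flow rather than the actual v0.3 code:

```python
class RewindingQuerySet:
    """Hypothetical sketch of a queryset that acts as its own iterator."""

    def __init__(self, rows):
        self._rows = rows          # stand-in for the data behind the cursor
        self._cursor = iter(rows)  # stand-in for a PyMongo cursor

    def rewind(self):
        # A real cursor would re-run the query on the next iteration.
        self._cursor = iter(self._rows)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._cursor)
        except StopIteration:
            self.rewind()  # reset so the next loop starts from the top
            raise


posts = RewindingQuerySet(["First post", "Second post"])
first = [title for title in posts]
second = [title for title in posts]
```

Both loops see every row. In this sketch a loop that exits early with `break` never triggers the rewind, so the next loop would resume part-way through.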


Rudy

Mar 10, 2010, 11:08:57 PM
to MongoEngine Developers
Thanks for the update.