Oh, no, don't evaluate that QuerySet!

153 views

Skip to first unread message

Karen Tracey

unread,

Aug 23, 2006, 10:20:09 PM8/23/06

to Django users

I hit an interesting gotcha when passing a QuerySet in a template
context: if the template tries to access a nonexistant context
variable, the QuerySet gets evaluated. More than that, it gets repr()
called on it, which can be quite expensive for a large QuerySet. (In
fact it brings my poor machine to its knees while it tries to stuff the
more than 500,000 rows from the table into a string representation of
itself.)

It is this line in template/__init.py__ that does it (resolve_variable
[__init__.py:657]):

raise VariableDoesNotExist, "Failed lookup for key [%s] in %r" %
(bits[0], current)

bits[0] is the variable my template was trying to access (it was doing
{% if title %} and title wasn't being set by the calling code if no
title was to be output). current is the context, which happenened to
contain my potentially very large QuerySet. The template code itself
was only going to call count() on the QuerySet and then pass it on to a
template tag that would wrap it in a paginator before ever trying to
pull any values from the database. Meanwhile, the code that was
calling resolve_variable is this (resolve [__init__.py:548]):

def resolve(self, context, ignore_failures=False):
try:
obj = resolve_variable(self.var, context)
except VariableDoesNotExist:
if ignore_failures:
return None
else:
return settings.TEMPLATE_STRING_IF_INVALID

Regardless of the setting of ignore_failures, that ginormous string
resolve_variable was trying to build is going to get tossed away. I
realize it's used in other cases ('cause I think I've seen the message
when I've tried to access a non-existant variable in places where it is
absolutely needed), but I wonder if it's a good idea to be trying to
stuff the whole context into a string.

Anyway, things I've learned:

- It's better to explictly set a context variable to False vs. just not
setting it. (This perhaps should have been obvious, but it never
occurred to me that a side-effect of not setting a variable would be
the whole context getting stuffed into a string.)
- It's bad to pass around "bare" potentially large QuerySets. Wrapping
them up in a little object prevents unintentional evaluation.

Something I wonder:

Might it be a good idea to limit the size of the result of repr() on a
QuerySet? Other places where I have run into trouble here is in trying
to step through code in the debugger (Eclipse/PyDev) when I have a
potentially huge QuerySet variable -- the debugger goes out to lunch as
it tries to show me what's in it. I believe it would only show me a
small portion of it if it could (like it does for strings) but
something underlying is trying to generate the whole thing, and that's
just impossible given the limits of my machine and the size of the
table. Also if I have a bug in my code (invalid syntax, unhandled
exception), the framework tries to helpfully generate a page showing
all the local variables, etc. (since I have debug set to True in my
settings file), but it's never able to complete that task either in
this case.

The documentation
(http://www.djangoproject.com/documentation/db_api/#when-querysets-are-evaluated)
says this about repr(): "A QuerySet is evaluated when you call repr()
on it. This is for convenience in the Python interactive interpreter,
so you can immediately see your results when using the API
interactively." Would it not retain its convenience but be a bit safer
to truncate the repr() output after some limit?

Karen

Russell Keith-Magee

unread,

Aug 25, 2006, 10:06:47 PM8/25/06

to django...@googlegroups.com

On 8/24/06, Karen Tracey <gray...@bellsouth.net> wrote:

Anyway, things I've learned:

- It's better to explictly set a context variable to False vs. just not
setting it. (This perhaps should have been obvious, but it never
occurred to me that a side-effect of not setting a variable would be
the whole context getting stuffed into a string.)

This indicates that we might need to revise the exact error message that is returned here. Printing a debug message shouldn't bring the system to its knees.

IMHO, returning the name and type of current would be just as informative as the full __repr__ of context (and therefore the full contents of the queryset) for the purposes of debugging. I'm hard pressed to think of an example where that particular error message would actually help - the error indicates that you couldn't find a key in a context, or an attribute of a variable; the contents of the QuerySet would not be examined as a source. The fact that three exceptions are caught in one is also an indication that the error message could be a little smarter.

Feel free to open an enhancement request for this - You seem to have a decent handle on the problem; if you're feeling adventurous, try your hand at a fix, as well.

- It's bad to pass around "bare" potentially large QuerySets. Wrapping
them up in a little object prevents unintentional evaluation.

This is good advice. The corollary of this is that you should be putting your database operations in the view, not the template. If you are passing large QuerySets into your template, it suggests that you should be doing some pruning at the view level.

Of course, sometimes large QuerySets will be unavoidable (or are the intended result); for these occasions, the debug messages should be more accomodating.

Something I wonder:

Would it not retain its convenience but be a bit safer
to truncate the repr() output after some limit?

Less convinced about this one. If you explicitly ask for __repr__ on a query set, it makes sense that it should tell you what is contained in the set. The bigger problem is avoiding the unintended consequence that you encountered when context is printed. I think fixing the error message in the template generator is a better approach.