This seems to be about using Django rather than developing Django. I
think django-users is the right place to ask rather than
django-developers.
Something I've just noticed here. One of the comments links to the
documentation for QuerySet.iterator:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator
The documentation includes the statement, "A QuerySet typically reads
all of its results and instantiates all of the corresponding objects
the first time you access it; iterator() will instead read results and
instantiate objects in discrete chunks, yielding them one at a time."
Am I mistaken, or is this not exactly correct? As I understand it,
the difference between QuerySet.__iter__ and QuerySet.iterator isn't
that the former reads and instantiates everything all at once, but
that the former will make use of the QuerySet's result cache, reading
from it when available and filling it as a side effect of iteration.
Ian
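To make the distinction concrete, here is a minimal, self-contained sketch. `ToyQuerySet` and its `fetch_count` counter are hypothetical stand-ins, not Django's real implementation. In the sketch, `__iter__` fills and then reuses a result cache, while `iterator()` goes back to the "database" on every call and never touches the cache.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a QuerySet; not Django's real class."""

    def __init__(self, rows):
        self._rows = rows            # pretend database table
        self._result_cache = None    # filled lazily by __iter__
        self.fetch_count = 0         # how many times we "hit the database"

    def _fetch(self):
        # Simulate a database round trip that yields rows one at a time.
        self.fetch_count += 1
        for row in self._rows:
            yield row

    def __iter__(self):
        # First iteration fills the cache; later iterations reuse it.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)

    def iterator(self):
        # Always goes back to the "database" and leaves the cache alone.
        return self._fetch()


qs = ToyQuerySet(["q1", "q2", "q3"])
list(qs)
list(qs)
print(qs.fetch_count)   # 1: the second loop was served from the cache

qs2 = ToyQuerySet(["q1", "q2", "q3"])
list(qs2.iterator())
list(qs2.iterator())
print(qs2.fetch_count)  # 2: each iterator() call is a fresh fetch
```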
If I'm reading it right, it looks like ForNode doesn't use .iterator.
I can see why it might be useful to assume QS cache should be used--
maybe the same QS will be repeatedly iterated.
Even so, it seems like it'd be useful to have a built-in filter which
uses iter(object)?
{% for question in poll.questions.all()|iterate %}
?
Ugh.
Sorry, I'm an idiot.
{% for question in poll.questions.all.iterator %}
works just fine.
OK, last one from me.
As a 2.0 wish, I'd like to make .iterator the default behavior, and
the cached-version a special case. I realize this point is quite
debatable.
However-- is there already a place for 2.0-wishlist sort of things? I
know there's no sense discussing it for the 1.x line. So how do we
remember these sorts of issues when it comes 2.x time?
Well, it's obligatory for me first to say "wow, Blogger sucks", since
I can't actually read that post -- I just get a Blogger template with
a big white empty space where the article ought to be (looking even at
the HTML source, the content just ain't there).
It used to be there. I think the OP deleted the post.
I'm not sure. If you click the blog archive link for February, you
can still read the full post. It's only the direct link that's not
working, which means that the comments are (AFAIK) inaccessible.
I'd be somewhat against this, I think. It's *very* easy to reuse
querysets and inadvertently cause extra database queries. Unless you're
using really huge querysets, the memory usage is not going to kill you.
Pulling back the huge number of results already uses a bunch of memory
and that's a property of the db wrapper. There's a multiplier involved
for creating Python objects. Since we have a way to not use the caching
if somebody wants to optimise on that level and since doing that and
then doing a second database access is quite slow, we're trading memory
usage for speed and ease of use (and providing a way to improve the
former in "expert mode").
I really don't look forward to the five questions a day on django-users
about all the database queries that are happening. I know you're only
talking about the mythical 2.0, but that doesn't change how people will
behave. I'm strongly in favour of keeping Django's primary audience as
experienced developers wanting to work faster, but we do have a large
non-experienced and even absolute beginner userbase, so simple things
that can save them a lot of time aren't to be dismissed out of hand.
Regards,
Malcolm
Point taken.
I wish there were some way to issue a warning if _result_cache is
filled but __iter__ isn't used more than once. :-/
I could imagine a warning being issued if the functionality offered by
.iterator is used more than once. That might be a happy medium-- then
I could use .iterator as my default coding practice, and be slapped
when I iterate more than once after all.
if settings.DEBUG and self.prior_iteration:
    warnings.warn("dope!")
?
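A minimal sketch of that idea follows. It assumes a module-level `DEBUG` flag in place of `settings.DEBUG` and a hypothetical `_prior_iteration` attribute. Neither is a real QuerySet attribute. It uses the standard `warnings` module with `RuntimeWarning` rather than a custom warning class.

```python
import warnings

DEBUG = True  # stand-in for django.conf.settings.DEBUG (assumption)


class WarningIteratorQS:
    """Hypothetical QuerySet-like class whose iterator() warns on reuse."""

    def __init__(self, rows):
        self._rows = rows
        self._prior_iteration = False  # hypothetical flag, not Django's

    def iterator(self):
        # Warn at call time if this object has already handed out an iterator.
        if DEBUG and self._prior_iteration:
            warnings.warn(
                "iterator() called more than once on the same queryset; "
                "each call re-runs the query, so caching may be cheaper",
                RuntimeWarning,
                stacklevel=2,
            )
        self._prior_iteration = True
        return iter(self._rows)


qs = WarningIteratorQS([1, 2, 3])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    list(qs.iterator())   # first use: no warning
    list(qs.iterator())   # second use: RuntimeWarning
print(len(caught))        # 1
```

Because `iterator()` here is a plain method returning `iter(...)` rather than a generator, the warning fires when it is called, not when the first row is consumed.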
Possible. Requires relying on __del__ being called so that we know when
it's not being used any longer. I prefer your other option, however.
> I could imagine a warning being issued if the functionality offered by
> .iterator is used more than once. That might be a happy medium-- then
> I could use .iterator as my default coding practice, and be slapped
> when I iterate more than once after all.
>
> if settings.DEBUG and self.prior_iteration:
>     warnings.warn("dope!")
This certainly sounds reasonable and doable today without any real
overhead. Go ahead and make a patch/ticket.
Regards,
Malcolm
OK.
Do you think there should be a PerformanceWarning class, or just use
the default UserWarning?
It should be blue! :-)
Slight preference for using a standard warning type at the moment.
Either UserWarning or RuntimeWarning. Only a slight preference, though.
Please yourself here.
Please don't ask me what to do about issuing multiple times, because I
was thinking about that over lunch just now and it may be fiddly.
Issuing a warning always sounds right, since it's commonly going to be a
property of a template that will cause this to happen. But if you use
that template and hit a problem, you're going to be swamped with
warnings. We really need a "once per template" option that obviously
doesn't exist in the warnings module. I might be over-thinking it,
though. "Always" is probably the right answer.
Regards,
Malcolm
Speaking as someone who has (accidentally) brought down a beefy server
by evaluating a reasonably large QuerySet, I'd say
there's not a whole lot we can do without impacting usability in
other, more vital-to-support scenarios.
When we had our nasty server-crashing query (which thankfully never
made it out of staging; that's what staging servers are for and why
you should have one to test things before you ever think about
deploying), just fetching the data from the DB -- no object
instantiation at all -- was a significant drain. Actually trying to
instantiate the model objects kicked the usage up even higher, of
course, but it was mostly an interesting exercise in watching the
memory spike move from place to place as the data worked its way from
the DB to the Python process in which Django was running.
(Incidentally, the above sort of situation is one reason why a
QuerySet limits itself to a certain number of objects displayed in
__repr__; the real killer was that an error was being thrown, and as
part of the Django debug page it was trying to print the __repr__ of a
QuerySet of, IIRC, about half a million objects. A QuerySet doesn't
try to do that anymore.)
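The idea behind a bounded `__repr__` can be sketched by pulling at most N+1 items from an iterator with `itertools.islice`. That way, printing never forces the whole result set into memory. This is a standalone illustration, not a copy of Django's code. The limit of 20 and the `bounded_repr` name are assumptions.

```python
from itertools import count, islice

REPR_LIMIT = 20  # illustrative limit (assumption)


def bounded_repr(items, limit=REPR_LIMIT):
    """Return a list-style repr showing at most `limit` items of an iterable."""
    # Take one extra item so we can tell whether anything was cut off,
    # without ever consuming the rest of the iterable.
    head = list(islice(iter(items), limit + 1))
    parts = [repr(x) for x in head[:limit]]
    if len(head) > limit:
        parts.append("...(remaining elements truncated)...")
    return "[" + ", ".join(parts) + "]"


# Works even on an infinite iterator, because only limit + 1 items are read.
print(bounded_repr(count(), limit=3))  # [0, 1, 2, ...(remaining elements truncated)...]
```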
>
> On Tue, 2009-02-17 at 18:57 -0600, Jeremy Dunck wrote:
>> On Tue, Feb 17, 2009 at 6:49 PM, Malcolm Tredinnick
>> <mal...@pointy-stick.com> wrote:
>> ...
>>> I'd be somewhat against this, I think. It's *very* easy to reuse
>>> querysets and inadvertently cause extra database queries.
>> ...
>>> we're trading memory
>>> usage for speed and ease of use (and providing a way to improve the
>>> former in "expert mode").
>>
>> Point taken.
>>
>> I wish there were some way to issue a warning if _result_cache is
>> filled but __iter__ isn't used more than once. :-/
>
> Possible. Requires relying on __del__ being called so that we know
> when it's not being used any longer. I prefer your other option, however.
Well, since __del__ messes with cyclic GC, one could also make a tool
that records a (weakref(queryset), created_at) pair for every QuerySet
made, and at the end of each request prints out a list of querysets
which were never reused.
But you'd probably end up finding a lot of cases where, in the future,
a cache would help you. And you could start getting the opposite
issue, which is what the result cache alleviates: many queries.
Which raises the question: would it even be interesting to know if a QS
makes an iterator out of itself more than once? Easy to implement, hm
hm.
I think I'll play around with these two ideas next Someday.
- Ludvig
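A rough sketch of the weakref-based tracking tool described above follows. It assumes a toy `TrackedQS` class that counts how often it is iterated. None of these names exist in Django. The tracker holds only `weakref.ref` objects, so it neither keeps querysets alive nor needs `__del__`.

```python
import time
import weakref


class TrackedQS:
    """Hypothetical queryset stand-in that records how often it is iterated."""

    _registry = []  # (weak reference to instance, creation timestamp)

    def __init__(self, rows):
        self._rows = rows
        self.iterations = 0
        TrackedQS._registry.append((weakref.ref(self), time.time()))

    def __iter__(self):
        self.iterations += 1
        return iter(self._rows)


def end_of_request_report():
    """Return (queryset, created_at) for live querysets iterated at most once,
    then reset the registry as if a new request were starting."""
    never_reused = []
    for ref, created_at in TrackedQS._registry:
        qs = ref()  # None if the queryset was already garbage collected
        if qs is not None and qs.iterations <= 1:
            never_reused.append((qs, created_at))
    TrackedQS._registry.clear()
    return never_reused


a = TrackedQS([1, 2])
b = TrackedQS([3, 4])
list(a)
list(b)
list(b)  # b is reused, so it should not be reported
report = end_of_request_report()
print(len(report))  # 1
```

One limitation of this sketch: querysets collected before the report runs are silently skipped. A `weakref.finalize` callback that records the iteration count at collection time would be one way to catch those too.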