Is it true that Django really takes a lot of memory?

NitinHayaran

Feb 17, 2009, 7:40:33 AM
to Django developers
Hi All,
Today I read this article and was wondering whether the Django ORM is
really that bad.

http://dayhacker.blogspot.com/2009/02/why-django-orm-sucks-it-takes-hell-lot.html

I think this is the right place to ask?


Jeremy Dunck

Feb 17, 2009, 7:59:04 AM
to django-d...@googlegroups.com

This seems to be about using Django rather than developing Django. I
think django-users is the right place to ask rather than
django-developers.

Philippe Raoult

Feb 17, 2009, 7:52:05 AM
to django-d...@googlegroups.com
Hi,

I did a reply on the post. Might be some time before it's approved.
The gist is that yes, it's that bad if you're using it naively. As long
as you know what's gonna be loaded from the DB you can avoid those
cases pretty easily.

You can also check
http://github.com/dcramer/django-idmapper/tree/master which is a
rewrite of the infamous ticket 17
(http://code.djangoproject.com/ticket/17).

Regards,
Philippe

M N Islam Shihan

Feb 17, 2009, 9:40:26 AM
to django-d...@googlegroups.com
Hi,

Please go through the comments on the blog post you are referring to and you'll
understand why, how and where to use the Django ORM.

Regards,
Shihan

Ian Kelly

Feb 17, 2009, 1:50:54 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 7:40 AM, M N Islam Shihan <mnis...@gmail.com> wrote:
>
> Hi,
>
> Please go through the comments on the blog post you are referring to and you'll
> understand why, how and where to use the Django ORM.
>
> Regards,
> Shihan

Something I've just noticed here. One of the comments links to the
documentation for QuerySet.iterator:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator

The documentation includes the statement, "A QuerySet typically reads
all of its results and instantiates all of the corresponding objects
the first time you access it; iterator() will instead read results and
instantiate objects in discrete chunks, yielding them one at a time."

Am I mistaken, or is this not exactly correct? As I understand it,
the difference between QuerySet.__iter__ and QuerySet.iterator isn't
that the former reads and instantiates everything all at once, but
that the former will make use of the QuerySet's result cache, reading
from it when available and filling it as a side effect of iteration.

Ian
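
Ian's description can be checked against a small model. The sketch below is a toy, not Django's QuerySet: `ToyQuerySet`, `_fetch_rows` and `query_count` are hypothetical names, and an in-memory list stands in for the database. `iterator()` runs a fresh "query" on every call, while `__iter__` shares one underlying query and fills `_result_cache` as a side effect of iteration, so a second loop reads from the cache.

```python
class ToyQuerySet:
    """Toy model of QuerySet result caching (hypothetical; not Django's code)."""

    def __init__(self, rows):
        self._rows = rows           # stands in for the database table
        self._result_cache = None   # filled lazily by __iter__
        self._source = None         # the in-progress query feeding the cache
        self.query_count = 0        # number of "queries" executed

    def _fetch_rows(self):
        # Simulates executing one query and streaming its rows back.
        self.query_count += 1
        yield from self._rows

    def iterator(self):
        # Uncached path: every call runs a brand-new query.
        return self._fetch_rows()

    def __iter__(self):
        # Cached path: start one query on first use, then serve rows from
        # _result_cache, pulling more from the query only when needed.
        if self._result_cache is None:
            self._result_cache = []
            self._source = self.iterator()
        pos = 0
        while True:
            if pos < len(self._result_cache):
                yield self._result_cache[pos]
                pos += 1
            elif self._source is not None:
                try:
                    self._result_cache.append(next(self._source))
                except StopIteration:
                    self._source = None  # query exhausted; cache is complete
            else:
                return


qs = ToyQuerySet(["q1", "q2", "q3"])
first = list(qs)
second = list(qs)          # served from _result_cache, no new query

other = ToyQuerySet(["q1", "q2", "q3"])
list(other.iterator())
list(other.iterator())     # runs the "query" again
```

Iterating `qs` twice leaves `query_count` at 1, while calling `other.iterator()` twice raises it to 2. That is the memory-for-queries trade-off discussed further down the thread.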

Alex Gaynor

Feb 17, 2009, 3:11:14 PM
to django-d...@googlegroups.com
Neither is completely correct ;).  Both do chunked reads from the DB (__iter__ uses iterator for getting the data); however, __iter__ also caches them, so if you reiterate you don't do a second db query, whereas iterator doesn't cache them.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero
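
At the DB-API level, a chunked read is a loop over `cursor.fetchmany()`. The sketch below uses the standard-library `sqlite3` module rather than Django's own code (Django's `iterator()` reads through a similar `fetchmany()` loop, though details vary by version); `iter_in_chunks`, the `question` table and the chunk size of 100 are assumptions for illustration. Whether this actually bounds memory depends on the driver: some drivers load the entire result set into client memory as soon as the query executes, and `fetchmany()` then mostly limits how many Python objects exist at once.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_in_chunks(conn: sqlite3.Connection, sql: str,
                   chunk_size: int = 100) -> Iterator[Tuple]:
    """Yield rows one by one, fetching them from the cursor chunk_size at a time."""
    cursor = conn.execute(sql)
    try:
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:  # an empty list means the result set is exhausted
                return
            yield from rows
    finally:
        cursor.close()


# Hypothetical table with 250 rows, so the loop fetches 100 + 100 + 50.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO question (text) VALUES (?)",
                 [(f"question {i}",) for i in range(250)])

rows = list(iter_in_chunks(conn, "SELECT id, text FROM question ORDER BY id"))
```

Calling `list()` on the generator, as done here for brevity, of course defeats the purpose; a real consumer would loop over it and discard each row after use.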

Jeremy Dunck

Feb 17, 2009, 4:12:41 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 2:11 PM, Alex Gaynor <alex....@gmail.com> wrote:
...

>
> Neither is completely correct ;). Both do chunked reads from the
> DB (__iter__ uses iterator for getting the data); however, __iter__ also
> caches them, so if you reiterate you don't do a second db query, whereas
> iterator doesn't cache them.

If I'm reading it right, it looks like ForNode doesn't use .iterator.
I can see why it might be useful to assume QS cache should be used--
maybe the same QS will be repeatedly iterated.

Even so, it seems like it'd be useful to have a built-in filter which
uses iter(object)?

{% for question in poll.questions.all()|iterate %}
?

Jeremy Dunck

Feb 17, 2009, 4:15:34 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 3:12 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> Even so, it seems like it'd be useful to have a built-in filter which
> uses iter(object)?
>
> {% for question in poll.questions.all()|iterate %}

Ugh.

Sorry, I'm an idiot.

{% for question in poll.questions.all.iterator %}
works just fine.

Jeremy Dunck

Feb 17, 2009, 4:20:21 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 3:15 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
...

> {% for question in poll.questions.all.iterator %}
> works just fine.
>

OK, last one from me.

As a 2.0 wish, I'd like to make .iterator the default behavior, and
the cached-version a special case. I realize this point is quite
debatable.

However-- is there already a place for 2.0-wishlist sort of things? I
know there's no sense discussing it for the 1.x line. So how do we
remember these sorts of issues when it comes 2.x time?

James Bennett

Feb 17, 2009, 4:52:17 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 7:40 AM, NitinHayaran <nitinh...@gmail.com> wrote:
> Today i read this article and was wondering whether django orm is
> really that bad.
>
> http://dayhacker.blogspot.com/2009/02/why-django-orm-sucks-it-takes-hell-lot.html

Well, it's obligatory for me first to say "wow, Blogger sucks", since
I can't actually read that post -- I just get a Blogger template with
a big white empty space where the article ought to be (looking even at
the HTML source, the content just ain't there).


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Jeremy Dunck

Feb 17, 2009, 5:00:43 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 3:52 PM, James Bennett <ubern...@gmail.com> wrote:
>
> On Tue, Feb 17, 2009 at 7:40 AM, NitinHayaran <nitinh...@gmail.com> wrote:
>> Today i read this article and was wondering whether django orm is
>> really that bad.
>>
>> http://dayhacker.blogspot.com/2009/02/why-django-orm-sucks-it-takes-hell-lot.html
>
> Well, it's obligatory for me first to say "wow, Blogger sucks", since
> I can't actually read that post -- I just get a Blogger template with
> a big white empty space where the article ought to be (looking even at
> the HTML source, the content just ain't there).

It used to be there. I think the OP deleted the post.

Ian Kelly

Feb 17, 2009, 5:34:27 PM
to django-d...@googlegroups.com

I'm not sure. If you click the blog archive link for February, you
can still read the full post. It's only the direct link that's not
working, which means that the comments are (AFAIK) inaccessible.

Malcolm Tredinnick

Feb 17, 2009, 7:49:25 PM
to django-d...@googlegroups.com
On Tue, 2009-02-17 at 15:20 -0600, Jeremy Dunck wrote:
> On Tue, Feb 17, 2009 at 3:15 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> ...
> > {% for question in poll.questions.all.iterator %}
> > works just fine.
> >
>
> OK, last one from me.
>
> As a 2.0 wish, I'd like to make .iterator the default behavior, and
> the cached-version a special case. I realize this point is quite
> debatable.

I'd be somewhat against this, I think. It's *very* easy to reuse
querysets and inadvertently cause extra database queries. Unless you're
using really huge querysets, the memory usage is not going to kill you.
Pulling back the huge number of results already uses a bunch of memory
and that's a property of the db wrapper. There's a multiplier involved
for creating Python objects. Since we have a way to not use the caching
if somebody wants to optimise on that level and since doing that and
then doing a second database access is quite slow, we're trading memory
usage for speed and ease of use (and providing a way to improve the
former in "expert mode").

I really don't look forward to the five questions a day on django-users
about all the database queries that are happening. I know you're only
talking about the mythical 2.0, but that doesn't change how people will
behave. I'm strongly in favour of keeping Django's primary audience as
experienced developers wanting to work faster, but we do have a large
non-experienced and even absolute beginner userbase, so simple things
that can save them a lot of time aren't to be dismissed out of hand.

Regards,
Malcolm

Jeremy Dunck

Feb 17, 2009, 7:57:34 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 6:49 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:
...

> I'd be somewhat against this, I think. It's *very* easy to reuse
> querysets and inadvertently cause extra database queries.
...

> we're trading memory
> usage for speed and ease of use (and providing a way to improve the
> former in "expert mode").

Point taken.

I wish there were some way to issue a warning if _result_cache is
filled but __iter__ isn't used more than once. :-/

I could imagine a warning being issued if the functionality offered by
.iterator is used more than once. That might be a happy medium-- then
I could use .iterator as my default coding practice, and be slapped
when I iterate more than once after all.

if settings.DEBUG and self.prior_iteration:
    warnings.warn("dope!")
?
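
Jeremy's proposal could look something like this outside Django. `WarnOnReuseQuerySet` and the module-level `DEBUG` flag (standing in for `settings.DEBUG`) are hypothetical; only `prior_iteration` comes from the snippet above. The sketch uses the built-in `RuntimeWarning` category and fires on the second and later calls to `iterator()`, because each of those calls repeats the query that the result cache would have saved.

```python
import warnings

DEBUG = True  # stand-in for settings.DEBUG; no Django settings involved


class WarnOnReuseQuerySet:
    """Hypothetical sketch of the proposal; not Django's QuerySet."""

    def __init__(self, rows):
        self._rows = rows
        self.prior_iteration = False  # set once iterator() has been called

    def iterator(self):
        # A second uncached pass repeats the database query, which the
        # result cache would have avoided, so flag it while debugging.
        if DEBUG and self.prior_iteration:
            warnings.warn(
                "iterator() called more than once; each call re-runs the query",
                RuntimeWarning,
                stacklevel=2,
            )
        self.prior_iteration = True
        return iter(self._rows)
```

With `DEBUG = False` the check short-circuits, so production code pays only for a flag test and an attribute assignment per call.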

Malcolm Tredinnick

Feb 17, 2009, 8:13:00 PM
to django-d...@googlegroups.com
On Tue, 2009-02-17 at 18:57 -0600, Jeremy Dunck wrote:
> On Tue, Feb 17, 2009 at 6:49 PM, Malcolm Tredinnick
> <mal...@pointy-stick.com> wrote:
> ...
> > I'd be somewhat against this, I think. It's *very* easy to reuse
> > querysets and inadvertently cause extra database queries.
> ...
> > we're trading memory
> > usage for speed and ease of use (and providing a way to improve the
> > former in "expert mode").
>
> Point taken.
>
> I wish there were some way to issue a warning if _result_cache is
> filled but __iter__ isn't used more than once. :-/

Possible. Requires relying on __del__ being called so that we know when
it's not being used any longer. I prefer your other option, however.

> I could imagine a warning being issued if the functionality offered by
> .iterator is used more than once. That might be a happy medium-- then
> I could use .iterator as my default coding practice, and be slapped
> when I iterate more than once after all.
>
> if settings.DEBUG and self.prior_iteration:
>     warnings.warn("dope!")

This certainly sounds reasonable and doable today without any real
overhead. Go ahead and make a patch/ticket.

Regards,
Malcolm

Jeremy Dunck

Feb 17, 2009, 8:25:58 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 7:13 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:
...

>> if settings.DEBUG and self.prior_iteration:
>>     warnings.warn("dope!")
>
> This certainly sounds reasonable and doable today without any real
> overhead. Go ahead and make a patch/ticket.

OK.

Do you think there should be a PerformanceWarning class, or just use
the default UserWarning?

Malcolm Tredinnick

Feb 17, 2009, 9:11:49 PM
to django-d...@googlegroups.com

It should be blue! :-)

Slight preference for using a standard warning type at the moment.
Either UserWarning or RuntimeWarning. Only a slight preference, though.
Please yourself here.

Please don't ask me what to do about issuing multiple times, because I
was thinking about that over lunch just now and it may be fiddly.
Issuing a warning always sounds right, since it's commonly going to be a
property of a template that will cause this to happen. But if you use
that template and hit a problem, you're going to be swamped with
warnings. We really need a "once per template" option that obviously
doesn't exist in the warnings module. I might be over-thinking it,
though. "Always" is probably the right answer.

Regards,
Malcolm
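
For reference, the standard `warnings` module deduplicates by source location rather than by template. The `"default"` action shows a warning once per (message, category, line) in each module; `"once"` shows it once per message and category for the whole process; `"always"` shows every occurrence. `PerformanceWarning` below is a hypothetical subclass (Django does not define one), and `render_template_loop` merely simulates rendering the same template tag repeatedly.

```python
import warnings


class PerformanceWarning(RuntimeWarning):
    """Hypothetical category for 'this QuerySet was iterated again' warnings."""


def render_template_loop(times):
    # Stand-in for rendering the same {% for %} tag repeatedly: every
    # warning is raised from this one source line.
    for _ in range(times):
        warnings.warn("QuerySet.iterator() reused", PerformanceWarning)


def count_shown(action, times=5):
    """Return how many warnings get through under the given filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, PerformanceWarning)
        render_template_loop(times)
    return len(caught)
```

With five warnings raised from the same line, `"always"` lets all five through and `"default"` lets one through. Neither knows which template triggered them, which is the gap Malcolm describes; subclassing a standard category at least lets users filter this one warning separately.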

James Bennett

Feb 17, 2009, 9:31:30 PM
to django-d...@googlegroups.com
On Tue, Feb 17, 2009 at 7:49 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:
> I'd be somewhat against this, I think. It's *very* easy to reuse
> querysets and inadvertently cause extra database queries. Unless you're
> using really huge querysets, the memory usage is not going to kill you.
> Pulling back the huge number of results already uses a bunch of memory
> and that's a property of the db wrapper. There's a multiplier involved
> for creating Python objects.

Speaking as someone who has (accidentally) brought down a beefy server
by evaluating a reasonably large QuerySet, I'd say
there's not a whole lot we can do without impacting usability in
other, more vital-to-support scenarios.

When we had our nasty server-crashing query (which thankfully never
made it out of staging; that's what staging servers are for and why
you should have one to test things before you ever think about
deploying), just fetching the data from the DB -- no object
instantiation at all -- was a significant drain. Actually trying to
instantiate the model objects kicked the usage up even higher, of
course, but it was mostly an interesting exercise in watching the
memory spike move from place to place as the data worked its way from
the DB to the Python process in which Django was running.

(incidentally, the above sort of situation is one reason why a
QuerySet limits itself to a certain number of objects displayed in
__repr__; the real killer was that an error was being thrown, and as
part of the Django debug page it was trying to print the __repr__ of a
QuerySet of, IIRC, about half a million objects. A QuerySet doesn't
try to do that anymore)
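
The idea behind that limit can be sketched as a lazy collection whose `__repr__` only ever pulls one item more than it will display, the equivalent of adding `LIMIT 21` to the query. Recent Django versions do something similar in `QuerySet.__repr__` with a `REPR_OUTPUT_SIZE` of 20, but `LazyRows` below is a standalone toy, and modelling the query as a callable that returns a fresh iterator is an assumption.

```python
from itertools import count, islice

REPR_OUTPUT_SIZE = 20  # maximum number of items __repr__ will display


class LazyRows:
    """Toy lazy collection (hypothetical; not Django's QuerySet)."""

    def __init__(self, run_query):
        # run_query: a callable returning a fresh iterator of rows,
        # standing in for executing the SQL query.
        self._run_query = run_query

    def __repr__(self):
        # Fetch at most one row more than we display, so we can tell
        # whether output was cut off without loading everything.
        data = list(islice(self._run_query(), REPR_OUTPUT_SIZE + 1))
        if len(data) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."
        return "<%s %r>" % (type(self).__name__, data)


small = LazyRows(lambda: iter([1, 2, 3]))
huge = LazyRows(count)  # an unbounded "result set"; repr still returns quickly
```

Because `repr(huge)` consumes only 21 items from an infinite iterator, a debug page printing it stays cheap no matter how large the underlying result is.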

Ludvig Ericson

Feb 19, 2009, 7:49:19 PM
to django-d...@googlegroups.com
On Feb 18, 2009, at 02:13, Malcolm Tredinnick wrote:

>
> On Tue, 2009-02-17 at 18:57 -0600, Jeremy Dunck wrote:
>> On Tue, Feb 17, 2009 at 6:49 PM, Malcolm Tredinnick
>> <mal...@pointy-stick.com> wrote:
>> ...
>>> I'd be somewhat against this, I think. It's *very* easy to reuse
>>> querysets and inadvertently cause extra database queries.
>> ...
>>> we're trading memory
>>> usage for speed and ease of use (and providing a way to improve the
>>> former in "expert mode").
>>
>> Point taken.
>>
>> I wish there were some way to issue a warning if _result_cache is
>> filled but __iter__ isn't used more than once. :-/
>
> Possible. Requires relying on __del__ being called so that we know
> when
> it's not being used any longer. I prefer your other option, however.


Well, since __del__ messes with cyclic GC, one could also make a tool
that records a (weakref(queryset), created_at) pair for every QuerySet
made and, at the end of each request, prints out a list of the querysets
that were never reused.

But you'd probably end up finding a lot of cases where, in the future,
a cache would help you. And you could start getting the opposite
issue, which is what the result cache alleviates: many queries.

Which raises the question: would it even be interesting to know if a QS
makes an iterator out of itself more than once? Easy to implement, hm
hm.

I think I'll play around with these two ideas next Someday.

- Ludvig
