garbage collection on models after use?


Crispin Wellington

Mar 11, 2009, 4:13:53 AM
to django...@googlegroups.com
Hello,

I have a surprisingly simple bit of code, injecting data into a database
via Django's ORM. The "Hit" table has 1.5 million records. The problem
is, as the loop runs, more and more memory is consumed until my machine
starts thrashing on swap. The first 400,000 records finish in 5
minutes. Then the next 10,000 take over 30 minutes! As far as I can
tell, when the 'hit' variable drops out of scope, its ref count should
go to 0 and it should be garbage collected. It appears it is not, as
memory usage grows steadily over the loop, bringing the
machine to its knees. Here is the code snippet:

print "filling hits identities percent..."

total = Hit.objects.all().count()

for n, hit in enumerate(Hit.objects.all()):
    print "%d/%d..."%(n,total),
    sys.stdout.flush()

    hit.identities_percent = int(hit.deprecated_identities_percent())
    hit.save()
    print "done"

print "all done"

What am I missing? Surely each "hit" object should be garbage collected
at the end of the innermost block? Is Django holding onto these objects
internally? Why is it consuming so much RAM? I could break it into
LIMIT/OFFSET blocks with slice notation, but I'd rather understand why
this is misbehaving so badly.

Kind Regards

Crispin


Malcolm Tredinnick

Mar 11, 2009, 6:31:22 AM
to django...@googlegroups.com
On Wed, 2009-03-11 at 08:13 +0000, Crispin Wellington wrote:
> Hello,
>
> I have a surprisingly simple bit of code, injecting data into a database
> via Django's ORM. The "Hit" table has 1.5 million records. The problem
> is, as the loop runs, more and more memory is consumed until my machine
> starts thrashing on swap. The first 400,000 records finish in 5
> minutes. Then the next 10,000 take over 30 minutes! As far as I can
> tell, when the 'hit' variable drops out of scope, its ref count should
> go to 0 and it should be garbage collected. It appears it is not, as
> memory usage gradually grows and grows over the loop, bringing the
> machine to its knees. Here is the code snippet:

It's a reasonable guess, based on you not ruling it out, that you're
experiencing this:

http://docs.djangoproject.com/en/dev/faq/models/#why-is-django-leaking-memory
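
For context: that FAQ entry points at DEBUG = True, under which Django keeps a record of every SQL statement it runs on the connection (visible as connection.queries, and cleared by django.db.reset_queries()). The sketch below is not Django code. It is a toy stand-in, with a hypothetical FakeConnection class and run_updates helper, that only illustrates why such a log grows by one entry per save() in a long loop and how switching it off or resetting it periodically bounds it.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection that logs statements in debug mode."""

    def __init__(self, debug):
        self.debug = debug
        self.queries = []  # grows without bound while debug is on

    def execute(self, sql):
        # Only the bookkeeping is modelled here; nothing is sent to a database.
        if self.debug:
            self.queries.append({"sql": sql})

    def reset_queries(self):
        # Plays the role that django.db.reset_queries() plays in Django.
        self.queries = []


def run_updates(conn, n, reset_every=None):
    """Issue n UPDATE statements and return the peak length of the query log."""
    peak = 0
    for i in range(n):
        conn.execute(
            "UPDATE hit SET identities_percent = %d WHERE id = %d" % (i % 100, i)
        )
        peak = max(peak, len(conn.queries))
        if reset_every and (i + 1) % reset_every == 0:
            conn.reset_queries()
    return peak


if __name__ == "__main__":
    print(run_updates(FakeConnection(debug=True), 10000))
    print(run_updates(FakeConnection(debug=False), 10000))
    print(run_updates(FakeConnection(debug=True), 10000, reset_every=1000))
```

In a real project the equivalent choices would be setting DEBUG = False in settings, or calling django.db.reset_queries() every so often inside the loop.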

Regards,
Malcolm


Alex Gaynor

Mar 11, 2009, 11:27:50 AM
to django...@googlegroups.com
Try this method: http://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator
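
Some background on why this helps: iterating a QuerySet directly fills its internal result cache, so every model instance stays referenced for as long as the QuerySet is alive, whereas iterator() streams results without caching them. The sketch below is a toy model of that difference, not Django's implementation. Row, ToyQuerySet and alive_after_loop are hypothetical names, and weak references are used only to count how many objects survive the loop.

```python
import gc
import weakref


class Row:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, pk):
        self.pk = pk


class ToyQuerySet:
    """Hypothetical miniature of a queryset that caches its results."""

    def __init__(self, size):
        self.size = size
        self._result_cache = None

    def _fetch(self):
        # Pretend to read rows from the database one at a time.
        for pk in range(self.size):
            yield Row(pk)

    def __iter__(self):
        # Plain iteration stores every row so the results can be reused later.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)

    def iterator(self):
        # Streams rows without storing them on the queryset.
        return self._fetch()


def alive_after_loop(qs, use_iterator):
    """Loop over qs once and return how many Row objects are still alive."""
    rows = qs.iterator() if use_iterator else qs
    refs = []
    row = None
    for row in rows:
        refs.append(weakref.ref(row))  # a weak reference does not keep row alive
    row = None  # drop the loop variable's reference to the last row
    gc.collect()
    return sum(1 for ref in refs if ref() is not None)


if __name__ == "__main__":
    print(alive_after_loop(ToyQuerySet(1000), use_iterator=False))
    print(alive_after_loop(ToyQuerySet(1000), use_iterator=True))
```

Applied to the loop in the original post, the change would presumably be `for n, hit in enumerate(Hit.objects.all().iterator()):`.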

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Crispin Wellington

Mar 11, 2009, 10:10:16 PM
to django...@googlegroups.com
That would be it! Thanks!

Crispin