Multiprocess Queryset memory consumption increase (DEBUG=False and using iterator)

pc

May 28, 2012, 10:42:15 AM
to Django users
I am stumped. I am trying to process a lot of data (2 million records
and up). Once I have a QuerySet, I immediately feed it to a
queryset_iterator that fetches results in chunks of 1000 rows each. I
use MySQL, and the DB server is on another machine (so I don't think it
is MySQL caching).

I kick off 4 sub-processes using multiprocessing.Process, but even if
I limit it to 1, I eventually run out of memory.

RAM usage just seems to increase steadily. When I finish processing a
resultset, I assign a new resultset to the variable, and I would
expect gc to give me my memory back.

Any ideas?


def run(self):
    .....

    messages = queryset_iterator(MessageDAO.get_all_messages_for_date(now))
    self.process_messages(job, messages)
    messages = None
    gc.collect()
    ...

def queryset_iterator(queryset, chunksize=1000):
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()
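
For readers who want to try the chunking pattern outside Django, here is a minimal standalone sketch using sqlite3 from the standard library. The table name `message`, its columns, and the helper `chunked_rows` are hypothetical and not from the original post. Unlike queryset_iterator above, this variant stops when a chunk comes back empty instead of looking up the last pk first, so it also copes with an empty table. It assumes positive integer primary keys and is written for Python 3.

```python
import sqlite3


def chunked_rows(conn, chunksize=1000):
    """Yield (id, body) rows in id order, at most `chunksize` rows per query."""
    pk = 0  # assumes primary keys are positive integers
    while True:
        # Keyset pagination: fetch the next chunk strictly after the last seen id.
        rows = conn.execute(
            "SELECT id, body FROM message WHERE id > ? ORDER BY id LIMIT ?",
            (pk, chunksize),
        ).fetchall()
        if not rows:
            break  # no more rows, so we are done
        for row in rows:
            pk = row[0]
            yield row


# Hypothetical demo data: 2500 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO message (id, body) VALUES (?, ?)",
    [(i, "msg %d" % i) for i in range(1, 2501)],
)

seen = [row[0] for row in chunked_rows(conn, chunksize=1000)]
```

Only one chunk of rows is held by the generator at a time; whatever the consumer keeps a reference to is a separate matter.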

akaariai

May 28, 2012, 3:48:52 PM
to Django users
Are you sure the leak is not in process_messages?

I tested something similar on PostgreSQL, and the queryset_iterator
doesn't seem to leak memory:
def queryset_iterator(queryset, chunksize=100):
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        print len(connection.queries)
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()

for i in queryset_iterator(TestModel.objects.all()):
    print memory()

where memory() is from http://stackoverflow.com/questions/938733/python-total-memory-used

The result seems stable and doesn't indicate any memory leak.

TestModel contains 100000 objects.

- Anssi
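
A rough way to run the same kind of check without Django or the memory() helper linked above is the standard library's tracemalloc module (Python 3.4+). This is only a sketch under assumptions: `rows`, `traced_growth`, `discard`, and `retain` are hypothetical names, and tracemalloc only counts allocations made through Python's allocator, not the whole process RSS. It compares a consumer that drops each row with one that keeps every row, which is the sort of difference that would point at the processing code rather than the iterator.

```python
import gc
import tracemalloc


def rows(n):
    """Hypothetical stand-in for a chunked iterator: yields n small row dicts."""
    for i in range(n):
        yield {"pk": i, "body": "msg %d" % i}


def traced_growth(consume, n=20000):
    """Return the bytes still allocated after `consume` has drained rows(n)."""
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    result = consume(rows(n))  # kept alive on purpose while we measure
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return after - before


def discard(iterable):
    """Consumer that drops every row as soon as it has seen it."""
    for _row in iterable:
        pass


def retain(iterable):
    """Consumer that keeps every row, like code that accumulates results."""
    return list(iterable)


discard_growth = traced_growth(discard)
retain_growth = traced_growth(retain)
print("discard: %d bytes, retain: %d bytes" % (discard_growth, retain_growth))
```

If the first number stays small while the second grows with the row count, the iterator itself is not what is holding the memory.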

Pieter Claassen

May 30, 2012, 2:05:41 AM
to django...@googlegroups.com
Anssi,

Thanks for your trouble. You were right: the problem was in my other code.

Regards,
P