Questions on Django queryset iterator: select_related, prefetch_related, and how it works


Web Architect

Mar 17, 2017, 1:53:09 AM
to Django users
Hi,

Could someone please explain the implications of using the Django queryset iterator() together with select_related and prefetch_related?

Also, I am still not quite clear on the concept of iterator(), which I understand returns a generator. If my understanding is correct, whenever a for loop runs over the generator, the DB is queried for each element in the loop, and the query results are not stored in memory. So, for some model A,

qs = A.objects.all() probably executes "SELECT <all columns/fields> FROM A" in some order, which would fetch the results in one go. I am not sure how iterator() changes this.

BTW, I observed that the iterator doesn't work like a typical generator: repeated calls to next() on the generator produce the same value.
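One common cause of this behaviour (an assumption, since the code that was run isn't shown) is calling next() on a fresh generator each time, e.g. next(qs.iterator()). Each call creates a new iterator that starts from the beginning (and, in Django, runs the query again), so it returns the first item every time. A plain-Python sketch, with a hypothetical make_rows() standing in for qs.iterator():

```python
def make_rows():
    # Hypothetical stand-in for qs.iterator(): yields rows one at a time.
    for row in ["row1", "row2", "row3"]:
        yield row


# Each call to make_rows() creates a brand-new generator,
# so next() always returns the first item.
fresh = [next(make_rows()) for _ in range(3)]
print(fresh)  # ['row1', 'row1', 'row1']

# Keeping a reference to one generator advances it on each next() call.
gen = make_rows()
advanced = [next(gen) for _ in range(3)]
print(advanced)  # ['row1', 'row2', 'row3']
```

In Django terms, that means assigning it = qs.iterator() once and then calling next(it).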

I would appreciate it if someone could explain the above or point me to a reference.

Thanks.

Shawn Milochik

Mar 17, 2017, 1:57:52 AM
to django...@googlegroups.com
I think the benefit of using the iterator is best explained by an example:

Without iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, all the items are loaded and kept in memory (cached on the queryset), using up RAM. If you want to loop through the data again after the loop, you can. Upside: you can re-use the data. Downside: memory usage.

With iterator:

You loop through the queryset, using each item for whatever you're doing. As you do this, items you have already processed can be garbage-collected. If you want to loop through the data again, you'll have to hit the database again. Upside: lower memory usage.
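The two cases can be illustrated with a toy model of queryset caching. ToyQuerySet is hypothetical and heavily simplified, not Django's implementation; it only mimics the observable behaviour: looping over the queryset caches every result (Django keeps them in an internal _result_cache), while iterator() runs the query on each call and caches nothing.

```python
class ToyQuerySet:
    """Hypothetical, simplified model of queryset caching (not Django code)."""

    def __init__(self, rows):
        self._rows = rows          # stands in for the database table
        self._result_cache = None  # filled on the first full evaluation
        self.queries = 0           # how many times the "database" was hit

    def _run_query(self):
        # Counts as one database hit once iteration starts.
        self.queries += 1
        for row in self._rows:
            yield row

    def __iter__(self):
        # Without iterator(): evaluate once, keep every result in memory.
        if self._result_cache is None:
            self._result_cache = list(self._run_query())
        return iter(self._result_cache)

    def iterator(self):
        # With iterator(): query again on every call, cache nothing.
        return self._run_query()


qs = ToyQuerySet(["a", "b", "c"])
first = list(qs)
second = list(qs)
print(qs.queries)  # 1 (the second loop reused the cache)

qs2 = ToyQuerySet(["a", "b", "c"])
first_it = list(qs2.iterator())
second_it = list(qs2.iterator())
print(qs2.queries)        # 2 (each iterator() call re-queries)
print(qs2._result_cache)  # None
```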




Web Architect

Mar 17, 2017, 5:47:57 AM
to Django users, Sh...@milochik.com
Hi,

Thanks for your response. But I have observed the following:

Without iterator: It takes a while before the for loop starts executing, and both CPU and memory usage spike during that period, which implies the DB is accessed to fetch all the results.

With iterator: The for loop starts executing immediately and memory usage stays low. This probably implies that not all the results are fetched with a single query.

Based on what you have described, I am not sure how to explain the above behaviour.

Thanks,

knbk

Mar 17, 2017, 11:20:07 AM
to Django users, Sh...@milochik.com
Django uses client-side cursors. Django 1.11, which is currently in beta, switches iterator() to server-side cursors on PostgreSQL [1], but other databases still use client-side cursors. When a client-side cursor executes a query, it loads all results into memory before any of them can be accessed.
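The difference between taking all rows at once and pulling them in batches can be sketched with Python's DB-API, using the standard-library sqlite3 module as a stand-in (an assumption for illustration; this is not the PostgreSQL server-side cursor discussed above). fetchall() returns the whole result set as one list, while fetchmany(size) returns at most size rows per call. With a client-side cursor, the driver has typically already transferred the full result set to the client before either call; only a server-side cursor lets the database hand rows over in batches. The batched_rows() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO a (name) VALUES (?)",
                 [("item%d" % i,) for i in range(10)])

# All at once: a single list holding every row.
cur = conn.execute("SELECT id, name FROM a ORDER BY id")
all_rows = cur.fetchall()
print(len(all_rows))  # 10


def batched_rows(conn, size):
    """Hypothetical helper: yield rows in batches of `size` via fetchmany()."""
    cur = conn.execute("SELECT id, name FROM a ORDER BY id")
    while True:
        batch = cur.fetchmany(size)
        if not batch:
            break
        yield batch


batch_sizes = [len(b) for b in batched_rows(conn, 4)]
print(batch_sizes)  # [4, 4, 2]
conn.close()
```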

But that's just the raw results. Without iterator(), these raw results are immediately converted to model instances, and related objects that have been loaded with select_related() or prefetch_related() are converted to model instances as well. This can cause a spike in CPU and memory usage. When using iterator(), most of the CPU usage is in the database itself, and the raw results use quite a bit less memory than model instances. The CPU cost of converting the raw results to model instances is spread out over the loop iterations, and in most situations each model instance can be garbage-collected once the iteration moves on to the next one.
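The eager versus lazy conversion described above can be sketched in plain Python. Instance is a hypothetical stand-in for a model instance built from a raw row; the list comprehension plays the role of evaluating a queryset without iterator(), and the generator expression plays the role of iterator(). On the original question: Django's documentation states that iterator() causes earlier prefetch_related() calls to be ignored (in versions before 4.1, which added prefetching with iterator() when chunk_size is given), while select_related() still applies because the related columns come back in the same joined query.

```python
class Instance:
    """Hypothetical stand-in for a model instance built from a raw row."""
    created = 0

    def __init__(self, row):
        Instance.created += 1
        self.id, self.name = row


raw_rows = [(1, "x"), (2, "y"), (3, "z")]

# Like evaluating without iterator(): every row is converted up front.
Instance.created = 0
eager = [Instance(row) for row in raw_rows]
print(Instance.created)  # 3

# Like iterator(): each row is converted only when the loop reaches it,
# so processed instances can be garbage-collected if nothing else holds them.
Instance.created = 0
lazy = (Instance(row) for row in raw_rows)
print(Instance.created)  # 0
first = next(lazy)
print(Instance.created, first.name)  # 1 x
```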