Can the new `Prefetch` solve my problem?


cool-RR

Feb 25, 2015, 3:05:50 PM
to django...@googlegroups.com
Hi guys,

I'm trying to solve a problem using the new `Prefetch` but I can't figure out how to use it. 

I have these models:

    class Desk(django.db.models.Model):
        pass
    
    class Chair(django.db.models.Model):
        desk = django.db.models.ForeignKey('Desk', related_name='chair',)
        nearby_desks = django.db.models.ManyToManyField(
            'Desk',
            blank=True,
        )

I want to get a queryset for `Desk`, but it should also include a prefetched attribute `favorite_or_nearby_chairs`, whose value should be equal to: 

    Chair.objects.filter(
        (django.db.models.Q(nearby_desks=desk) | django.db.models.Q(desk=desk)),
        some_other_lookup=whatever,
    )

Is this possible with `Prefetch`? I couldn't figure out how to use the arguments.


Thanks,
Ram.

James Schneider

Feb 25, 2015, 4:19:06 PM
to django...@googlegroups.com
I assume that you are talking about the select_related() and prefetch_related() queryset methods?

https://docs.djangoproject.com/en/1.7/ref/models/querysets/#select-related

Both of those sections have excellent examples, and detail what the differences are (primarily joins vs. separate queries, respectively).

For better help, you'll need to go into more detail about the queries you are trying to make, what you've tried (with code examples if possible), and the results/errors you are seeing.

In general, I would try to get an initial queryset working and gathering the correct results first before looking at optimizations such as select_related(). Any sort of pre-fetching will only confuse the situation if the base queryset is incorrect.

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/46d9fdb7-c008-4496-acda-ac7cb30b4a89%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ram Rachum

Feb 25, 2015, 4:29:13 PM
to django-users, jrschn...@gmail.com
Hi James,

I've read the docs, but I still couldn't figure it out. My queryset works great in production; I'm trying to optimize it because our page loads are too slow. I know how to use querysets in Django pretty well, I just don't know how to use `Prefetch`.

Can you give me the solution for the simplified example I gave? This might help me figure out what I'm not understanding. One thing that might be unclear in my example: I want a queryset for `Desk` where every desk has an attribute named `favorite_or_nearby_chairs` that contains the queryset of chairs I described, prefetched.


Thanks,
Ram.


James Schneider

Feb 25, 2015, 11:02:15 PM
to Ram Rachum, django-users

Well, the Desk model you provided is blank, but I'll believe you that there's a favorite_or_nearby_chairs attribute. ;-)

Should be relatively simple. Just add a .select_related('nearby_desks') to your existing query and that should pull in the associated Desk object in a single query. You can also substitute in prefetch_related(), although you'll still have two queries at that point.

If you are trying to profile your site, I would recommend django-debug-toolbar. That should tell you whether or not that queryset is the culprit.

-James

cool-RR

Feb 26, 2015, 5:28:06 AM
to django...@googlegroups.com, r...@rachum.com, jrschn...@gmail.com
James, you misunderstood me.

There isn't supposed to be a `favorite_or_nearby_chairs` attribute. That's the new attribute I want the prefetching to add to the `Desk` queryset that I need. Also, I don't understand why you'd tell me to add a `.select_related('nearby_desks')` to my query. Are you talking about the query that starts with `Chair.objects`? I'm not looking to get a `Chair` queryset. I'm looking to get a `Desk` queryset, which has a prefetched attribute `favorite_or_nearby_chairs` which contains the `Chair` queryset I wrote down.


Thanks,
Ram.

aRkadeFR

Feb 26, 2015, 6:50:33 AM
to django...@googlegroups.com
I may not have completely understood your problem, but
why not prefetch all the chairs, and then have the (new)
attribute favorite_or_nearby_chairs load only the favorite
or nearby ones?

like:
@property
def favorite_or_nearby_chairs(self):
    ans = []  # collect the matching chairs
    for chair in self.chair_set.all():
        # filter...
        ans += ...
    return ans

It will only hit the DB once thanks to the first join of desk
<-> chair.
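
A minimal plain-Python sketch of that idea, with no Django involved. The `Chair` and `Desk` classes below are stand-in dataclasses, `all_chairs` plays the role of an already-prefetched `chair_set`, and the `is_favorite` flag is a hypothetical filter condition that is not part of the models in this thread:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chair:
    name: str
    is_favorite: bool = False  # hypothetical flag, stands in for "# filter..."


@dataclass
class Desk:
    # Stand-in for the prefetched chair_set: already in memory, no DB access.
    all_chairs: List[Chair] = field(default_factory=list)

    @property
    def favorite_or_nearby_chairs(self) -> List[Chair]:
        # Filter the in-memory list in Python instead of issuing a new query.
        return [chair for chair in self.all_chairs if chair.is_favorite]


desk = Desk(all_chairs=[Chair("a", True), Chair("b"), Chair("c", True)])
favorite_names = [chair.name for chair in desk.favorite_or_nearby_chairs]
print(favorite_names)
```

The trade-off is that every chair still has to be loaded into memory before the filter runs.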

Ram Rachum

Feb 26, 2015, 6:53:41 AM
to django-users, con...@arkade.info
There may be a large number of chairs, and I don't want all of them prefetched. I want the database to filter them according to the queryset I specified, in a single call; I don't want to filter them in Python or make a new call to filter them.

Thanks,
Ram.

aRkadeFR

Feb 26, 2015, 7:20:48 AM
to django...@googlegroups.com
got it, so you want to prefetch but not all chairs.

I will def follow this thread to see the possibilities of Prefetch :)

James Schneider

Feb 26, 2015, 1:18:21 PM
to django...@googlegroups.com
Yep, looks like I misunderstood.

So, you want to have something like this pseudo code:

desks = Desk.objects.filter(<some filter>)
for desk in desks:
    print desk.favorite_or_nearby_chairs

And have favorite_or_nearby_chairs be pre-populated with your Chair queryset mentioned earlier?

Due to the nature of the custom Chair queryset, I doubt you can do any sort of pre-fetching that would reduce the number of queries. You can probably simulate the effect of prefetch_related() though by overriding the __init__() method of your Desk model, and having the Desk model populate favorite_or_nearby_chairs whenever a Desk object is created. 

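class Desk(models.Model):
    def __init__(self, *args, **kwargs):
        # let Django initialize the model fields first
        super(Desk, self).__init__(*args, **kwargs)
        # using list() to force the queryset to be evaluated
        self.favorite_or_nearby_chairs = list(<custom Chair queryset>)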


However, that probably doesn't buy you much since you are still doing an extra query for every Desk you pull from your original query.

James Schneider

Feb 26, 2015, 2:03:10 PM
to django...@googlegroups.com
Whoops, accidentally sent that last one too early, here's the continuation:

Funny enough, I was googling around for an answer here, and stumbled across this:


which I think is what you were referring to initially in your OP. I wasn't even aware of its existence. Prefetch() is a helper class for prefetch_related(). Taking a quick glance through the source code, I would imagine that it probably won't help you much, since the functionality of that class only controls the action of prefetch_related().


Zooming out a bit, the crux of your problem is this: the attribute you wish to populate is not an FK or M2M field, it is an entirely separate queryset with some moderately complex filters. The built-in ORM functionality for pre-loading via prefetch_related()/select_related() is expecting an FK or M2M relationship and can't use another queryset AFAIK. The high-level functionality of prefetch_related() is probably close to what you want, which is to run a single second query to collect all of the favorite_or_nearby_chairs for all of the Desks in your original query, and then glue everything together behind the scenes in Python to make desk_obj.favorite_or_nearby_chairs available seamlessly.
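
The "single second query, then glue in Python" pattern can be sketched without Django at all. The snippet below is only an illustration using the standard-library `sqlite3` module; the `desk` and `chair` tables and their contents are made up for the example and are not the real schema Django would generate:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE desk (id INTEGER PRIMARY KEY);
    CREATE TABLE chair (id INTEGER PRIMARY KEY, desk_id INTEGER REFERENCES desk(id));
    INSERT INTO desk (id) VALUES (1), (2), (3);
    INSERT INTO chair (id, desk_id) VALUES (10, 1), (11, 1), (12, 2);
""")

# Query 1: the parent rows.
desk_ids = [row[0] for row in conn.execute("SELECT id FROM desk ORDER BY id")]

# Query 2: every related row for all parents at once, via IN (...).
placeholders = ", ".join("?" for _ in desk_ids)
rows = conn.execute(
    f"SELECT id, desk_id FROM chair WHERE desk_id IN ({placeholders}) ORDER BY id",
    desk_ids,
).fetchall()

# Glue step in Python: group the children under their parent id.
chairs_by_desk = defaultdict(list)
for chair_id, desk_id in rows:
    chairs_by_desk[desk_id].append(chair_id)

# Desks without chairs get an empty list rather than being dropped.
prefetched = {desk_id: chairs_by_desk.get(desk_id, []) for desk_id in desk_ids}
print(prefetched)
```

The number of queries stays at two no matter how many desks the first query returns, which is the property that makes this pattern attractive compared to one query per desk.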

I would then wonder if there is another way to organize this data to make it easier to work with. How about adding a 'favorite_chairs' field to the Desk model that has an M2M to Chair? I would also update your 'nearby_desks' model field to use 'nearby_chairs' as the related_name.

Then you could do something like the following:

desks = Desk.objects.filter(<filter here>).select_related('nearby_chairs', 'favorite_chairs')

Then, you can modify your model with a property that will return the concatenation of nearby_chairs and favorite_chairs:

class Desk(models.Model):
    @property
    def favorite_or_nearby_chairs(self):
        return self.nearby_chairs.all() + self.favorite_chairs

I don't believe this will spawn another query, since select_related() will have already run and have the results cached. You may also want to consider moving 'nearby_desks' out of Chair and renaming it to 'nearby_chairs' in Desk, and using a related_name of 'nearby_desks' instead. Then you can remove the .all() from the property definition from above and it definitely won't spawn a query. Obviously you'll need to create other processes that will populate desk.favorite_chairs, which may or may not be feasible.

TL;DR: I don't believe you can pre-fetch anything because of the extra SQL logic needed to calculate the favorite_or_nearby_chairs attribute. It might be possible via raw SQL though. Reformatting your data models may lead to an easier time since you can then take advantage of some of the optimizations Django offers.

I'm slightly out in right field on this one, so YMMV, but taking a hard look at the current model design would be where I would start to try and eliminate the need for that custom queryset.

Again, the django-debug-toolbar is your friend in these cases, but obviously a high number of even relatively fast queries can have a detrimental effect on your load times. Also ensure that the fields you are using to filter contain indexes, if appropriate/available.

-James



James Schneider

Feb 26, 2015, 2:27:52 PM
to django...@googlegroups.com
Heh, I just realized that aRkadeFR had replied with a similar idea to use a property. At least I know I'm not too far off on my thinking. :-D

-James

aRkadeFR

Feb 27, 2015, 3:54:02 AM
to django...@googlegroups.com
Yeah, but from my experience, your example triggers a new
query. You have to (and please correct me if I'm wrong or
there are other ways) use self.chair_set.all() in order
not to trigger a new query when you have prefetched the
chairs.

To keep it simple and answer the problem: the only solution I
have in mind is to prefetch all the objects (chairs), and
then filter them in Python with properties like I said. But as you
said, it will load too many objects...

Still watching the thread because I have a couple of problems
like this one :)

aRkadeFR

James Schneider

Feb 27, 2015, 5:39:15 AM
to django...@googlegroups.com


On Feb 27, 2015 12:51 AM, "aRkadeFR" <con...@arkade.info> wrote:
>
> Yeah, but from my experience, your example triggers a new
> query. You have to (and please correct me if I'm wrong or
> there are other ways) use self.chair_set.all() in order
> not to trigger a new query when you have prefetched the
> chairs.

AFAIK, self.chair_set.all() would always spawn a second query unless something like select_related() had been used previously to cache that query result.

I provided several examples, some of which spawn a second query (and may be appropriate, I can't provide an affirmative answer for the OP without knowing how many queries are being run per page load and average query time, etc.). You'll need to be more specific.

I think I accidentally referred to some of the model fields as if they were FK's, but they probably should have had .all() after all of the references since everything was an M2M relationship.

>
> To keep it simple and answer the problem: the only solution I
> have in mind is to prefetch all the objects (chairs), and
> then filter them in Python with properties like I said. But as you
> said, it will load too many objects...

I think there was some miscommunication here. The OP stated that loading all of the chairs was infeasible, and I would tend to agree.

My only clarification would be that loading all of the chairs via something like Chair.objects.all() would be a bad idea, since you have no idea how many chairs you may have in the entire database.

Loading >10k Chair objects into memory and having Django coerce those into model objects, and then performing post-processing in some custom app code to filter that list back down to something reasonable will give you a bad time, every time. Your users will be unhappy with pages that take seconds to load, and your server processes will be unhappy even assuming you have plenty of RAM/CPU to handle such a request. Now multiply that load by X number of users...and you're quickly hitting CPU and process limits for RAM allocation.

However, "loading all of the chairs" via a M2M or FK relationship such as desk_obj.nearby_chairs.all() (assuming the OP reconfigured the model fields as I suggested before, or even using the existing Chair M2M to Desk) would likely be perfectly valid and would probably be a small subset of the total chairs in the system (or maybe None or all of them).

>
> Still watching the thread because I have a couple of problems
> like this one :)
>

I don't necessarily run into this specific problem, so I'm not sure how much more I can contribute. A lot of the model changes I suggested were educated guesses and may not be valid at all given other requirements or design considerations in the project.


James Schneider

Mar 1, 2015, 7:11:54 PM
to django...@googlegroups.com

Ask and you shall receive (eventually). Another post in this list has an example using Prefetch(), perhaps that will help you:

https://groups.google.com/d/msgid/django-users/3daddb38-3260-4f7d-9559-7d0d3f17b59e%40googlegroups.com?utm_medium=email&utm_source=footer

Simon Charette

Mar 3, 2015, 12:59:37 AM
to django...@googlegroups.com
Hi cool-RR,

The following should do:

from itertools import chain

from django.db.models import Prefetch

filtered_chairs = Chair.objects.filter(some_other_lookup=whatever)
desks = Desk.objects.prefetch_related(
    Prefetch('chairs', filtered_chairs, to_attr='filtered_chairs'),
    Prefetch('nearby_chairs', filtered_chairs, to_attr='filtered_nearby_chairs'),
)

for desk in desks:
    for chair in chain(desk.filtered_chairs, desk.filtered_nearby_chairs):
        # ....

It will issue only three queries regardless of the number of Chairs or Desks you have.
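
To make the three-query claim concrete, here is a rough standard-library sketch of the same shape of work, not what Django actually emits. The table names, the `ok` column (a hypothetical stand-in for `some_other_lookup=whatever`), and the `run` helper are all assumptions made for the illustration:

```python
import sqlite3
from collections import defaultdict
from itertools import chain

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE desk (id INTEGER PRIMARY KEY);
    CREATE TABLE chair (id INTEGER PRIMARY KEY, desk_id INTEGER, ok INTEGER);
    CREATE TABLE chair_nearby_desks (chair_id INTEGER, desk_id INTEGER);
    INSERT INTO desk (id) VALUES (1), (2);
    INSERT INTO chair (id, desk_id, ok) VALUES (10, 1, 1), (11, 1, 0), (12, 2, 1);
    INSERT INTO chair_nearby_desks (chair_id, desk_id) VALUES (12, 1), (10, 1), (11, 2);
""")

executed = []  # every SQL statement issued after setup


def run(sql, params=()):
    """Hypothetical helper: record the statement, then run it."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the desks themselves.
desk_ids = [row[0] for row in run("SELECT id FROM desk ORDER BY id")]
marks = ", ".join("?" for _ in desk_ids)

# Query 2: filtered chairs attached through the foreign key.
own = defaultdict(list)
for chair_id, desk_id in run(
    f"SELECT id, desk_id FROM chair WHERE ok = 1 AND desk_id IN ({marks}) ORDER BY id",
    desk_ids,
):
    own[desk_id].append(chair_id)

# Query 3: filtered chairs attached through the many-to-many table.
nearby = defaultdict(list)
for chair_id, desk_id in run(
    "SELECT c.id, n.desk_id FROM chair c "
    "JOIN chair_nearby_desks n ON n.chair_id = c.id "
    f"WHERE c.ok = 1 AND n.desk_id IN ({marks}) ORDER BY c.id",
    desk_ids,
):
    nearby[desk_id].append(chair_id)

# Chain both lists per desk; dict.fromkeys drops a chair present in both.
combined = {
    d: list(dict.fromkeys(chain(own.get(d, []), nearby.get(d, []))))
    for d in desk_ids
}
print(combined, len(executed))
```

One thing the sketch highlights: a chair that is both attached to a desk and nearby it shows up in both lists, so plain chaining would yield it twice unless it is deduplicated, as done here with `dict.fromkeys`.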

Simon

aRkadeFR

Mar 3, 2015, 3:26:22 AM
to django...@googlegroups.com
Thanks a lot for your answer :)

I will definitely use it from now on.

Tom Evans

Mar 3, 2015, 6:24:51 AM
to django...@googlegroups.com
On Tue, Mar 3, 2015 at 5:59 AM, Simon Charette <chare...@gmail.com> wrote:
> Hi cool-RR,
>
> The following should do:
>
> filtered_chairs = Chair.objects.filter(some_other_lookup=whatever)
> desks = Desk.objects.prefetch_related(
> Prefetch('chairs', filtered_chairs, to_attr='filtered_chairs'),
> Prefetch('nearby_chairs', filtered_chairs,
> to_attr='filtered_nearby_chairs'),
> )
>
> from itertools import chain
> for desk in desks:
> for chair in chain(desk.filtered_chairs, desk.filtered_nearby_chairs):
> # ....
>
> It will issue only three queries regardless of the number of Chairs or
> Desks you have.
>
> Simon

Does filtered_chairs get executed 'as-is', or is it then filtered by
the query to which it is attached? I.e.:

filtered_chairs = Chair.objects.filter(some_other_lookup=whatever)
desks = Desk.objects.filter(pk__range=(1,1000)).prefetch_related(
    Prefetch('chairs', filtered_chairs, to_attr='filtered_chairs'),
    Prefetch('nearby_chairs', filtered_chairs,
             to_attr='filtered_nearby_chairs'),
)

Is filtered_chairs filtered by the 'desk__pk__range'?

Cheers

Tom