Making QuerySets composable

130 views
Skip to first unread message

Patryk Zawadzki

unread,
Jan 22, 2016, 5:25:24 AM1/22/16
to django-d...@googlegroups.com
Hi,

Currently the way QuerySets work makes it hard to reuse code. Consider an example where I have template tags to render an author and a book (that in turn displays author). I don't want my template code to execute any additional queries per row displayed so I am forced to prefetch everything upfront. Now if I want to display a list of authors and a list of books my code is vastly different:

```
authors = Author.objects.filter(visible=True)
authors_for_display = authors.annotate(num_books=Count('books'), num_awards=Count('awards')).prefetch_related('foo')
```

vs.

```
books = Book.objects.filter(visible=True, author__visible=True)
books_for_display = books.annotate(author_num_books=Count('author__books'), author_num_awards=Count('author__awards')).select_related('author').prefetch_related('author__foo')
```

Both cases need to do the exact same filtering, annotations and prefetches for the Author object but notation is so different that no code reuse can happen ("baz" vs. "bar__baz" vs. "foo__bar__baz"). The behaviour of annotations is also different as now they end up on the Book object which forces me to write some glue code. If my structure gains another level of depth I will be forced to go to even greater extents.

Now consider that I have many other objects that reference the Author and thus I have many places in code that implement the same logic but each time with a slight syntactic twist. What if a front-end engineer asks me to change the information fetched about the Author? Or perhaps I discover that half of the query time is spent fetching a megabyte of Author's extended biography in HTML that the front-end never uses outside of that Author's page. I am forced to hunt down and update all of these.

Currently we have Prefetch objects that kind of solve this problem for M2M relations but perhaps it would be nice to also be able to use QuerySets in select_related() or even in filter(). I don't think Prefetch objects are best suited for that and I personally find having to instantiate them explicitly kind of ugly. To me having another SelectRelated object is probably a no-go.

Ideas?

charettes

unread,
Jan 22, 2016, 11:44:41 AM1/22/16
to Django developers (Contributions to Django itself), pat...@room-303.com
Hi Patryk,


> Currently we have Prefetch objects that kind of solve this problem for M2M
> relations but perhaps it would be nice to also be able to use QuerySets in
> select_related() or even in filter(). I don't think Prefetch objects are best
> suited for that and I personally find having to instantiate them explicitly
> kind of ugly.

From what I understand you're not a fan of Prefetch objects but I think most
of your problems can be solved by using them and custom managers as the formers
also support foreign keys[1].

class AuthorDisplayManager(models.Manager):
    def get_queryset(self):
        queryset = super().get_queryset()
        return queryset.filter(
            visible=True,
        ).annotate(
            num_books=Count('books'),
            num_awards=Count('awards'),
        ).prefetch_related(
            'foo',
        )

class Author(models.Model):
    ...
    for_display = AuthorDisplayManager()


class BookDisplayManager(models.Manager):
    def get_queryset(self):
        queryset = super().get_queryset()
        return queryset.filter(
            visible=True,
        ).prefetch_related(
            Prefetch(
                'author',
                Author.for_display.all(),
            )
        )


class Book(models.Model):
    ...
    for_display = BookDisplayManager()

Simon

[1] https://code.djangoproject.com/ticket/17003

Patryk Zawadzki

unread,
Jan 22, 2016, 1:57:21 PM1/22/16
to charettes, Django developers (Contributions to Django itself)
pt., 22.01.2016 o 17:44 użytkownik charettes <chare...@gmail.com> napisał:
Hi Patryk,


> Currently we have Prefetch objects that kind of solve this problem for M2M
> relations but perhaps it would be nice to also be able to use QuerySets in
> select_related() or even in filter(). I don't think Prefetch objects are best
> suited for that and I personally find having to instantiate them explicitly
> kind of ugly.

From what I understand you're not a fan of Prefetch objects but I think most
of your problems can be solved by using them and custom managers as the formers
also support foreign keys[1].

Funny, I was aware of the featurebut always assumed prefetch_related would always execute an extra query instead of becoming an inner join.

Anssi Kääriäinen

unread,
Jan 25, 2016, 1:44:03 AM1/25/16
to django-d...@googlegroups.com
We have room for some improvement here. It should be possible to add
subqueries to the current query. This would be especially useful for
adding aggregates.

For aggregates we need this feature because the current API doesn't
work for anything but simple cases. For example
Book.objects.annotate(authors_cnt=Count('authors'),
suppliers_cnt=Count('suppliers')) will currently produce two joins,
one to authors and one to suppliers. Thus, if a book has three authors
and two suppliers, it will have both counts as six. Solving this
problem (taking in account .filter() calls, too) is extremely complex.

If the user could instead tell the ORM to add the counts in
subqueries, the ORM would have much easier job of generating a correct
query.

For the "add single aggregate case" it might be sufficient to add a
strategy flag to aggregates. So,
Book.objects.annotate(authors_cnt=Count('authors',
strategy=models.SUBQUERY)) is a possible API.

If you want multiple aggregates in a single subquery we likely need
something more explicit. Maybe something like
Book.objects.add_related(
authors_subq=models.AggregateSubquery(
Author.objects.annotate(cnt=Count('pk'), avg_age=Avg('age')))
This would add authors_subq__cnt and authors_subq__avg_age
automatically to the outer query as annotations.

The add_related() method would come handy in other cases, too.
Consider joining a translation for currently active language:
Book.objects.add_related(
current_translation=models.ModelRelation('translations',
only=Q(lang=get_current_language()))

You could also attach raw SQL in the same way:
Room.objects.add_related(
current_reservation=models.RawSQLRelation(
"select * from reservations where start_date <=
now()::date and end_date >= now()::date",
on_condition=Q(pk=F('current_reservation__room_id')),
model=Reservation)
))

We could later on add support for more esoteric concepts like lateral
joins or materializing the subquery into a temporary table.

I believe the current API for aggregates and multivalued relations has
come to a dead end. Simpler queries work well enough with the current
API. For more complex queries the API isn't easy to understand, and it
is extremely hard to translate complex ORM queries to SQL. Consider
Book.objects.filter(
Q(authors__height__gte=180), ~Q(authors__age__gte=50)
).annotate(author_cnt=Count('authors'))

This query is *extremely* hard to translate to SQL. Just the first
filter case is something nasty, the Q-object doesn't mean "keep all
rows that has author's age less than 50 and height greater than 180".
It means "a book must have at least one author taller than 180. *All*
of such authors for the book must have age less than 50 or the age
must be null". Still, we need to generate a count over such authors
for the book. And this example doesn't even reveal the full complexity
of the problem.

In summary, I vote +1 for the ability to compose querysets. The first
issue to tackle is to come up with a good extensible API for this.

- Anssi
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/ebad6483-f90c-41f9-8710-65b328320cdd%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages