Use an aggregation function's filter with aggregate of a value from an annotation with a subquery

877 views
Skip to first unread message

Daniel Gilge

unread,
Dec 5, 2018, 7:16:30 PM12/5/18
to Django users
Hi,

I think I've found a bug. But I'm not sure and I never opened a ticket for Django before. So, I first wanted to ask here.

I'm using Django 2.0 and this doesn't work:

subquery = Subquery(
    Vote.objects
    .filter(comment=OuterRef('pk'))
    .values('value')[:1]
)

Comment.objects.annotate(vote=subquery).aggregate(
    count=Count('vote', filter=Q(vote__gte=4)),
)

It results in a quite useless AssertionError:

django/db/models/expressions.py

    168 
    169     def set_source_expressions(self, exprs):
--> 170         assert len(exprs) == 0
    171 
    172     def _parse_expressions(self, *expressions):

AssertionError:


Vars:
exprs
[Ref(__col8, Col(U0, myapp.Vote.comment))]
self
<django.db.models.expressions.Subquery object at 0x1080077b8>


It probably doesn't make sense because I simplified it. Why I'm using subqueries is that I have several sums involved in the query:

subquery = Subquery(
    Vote.objects
    .filter(comment=OuterRef('pk'))
    .values('comment_id')
    .annotate(sum=Sum('value', filter=Q(**filter_fields)))
    .values('sum')[:1]
)

However, what I had to remove is a filter statement and then it works:

Comment.objects.annotate(vote=subquery).aggregate(
    count=Count('vote'),
)

Any advice appreciated!

Matthew Pava

unread,
Dec 5, 2018, 9:48:18 PM12/5/18
to django...@googlegroups.com

Though I can’t address the issue of whether it is a bug, there is a note in the documentation:

 

https://docs.djangoproject.com/en/2.1/topics/db/aggregation/#filtering-on-annotations

“Avoid using the filter argument…”

 

You may want to consider conditional expressions (Case … When) to achieve your goal.

https://docs.djangoproject.com/en/2.1/ref/models/conditional-expressions/

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/d405cbf2-0a57-4639-aac1-4376a09099b8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Simon Charette

unread,
Dec 6, 2018, 12:37:36 AM12/6/18
to Django users
Hello Daniel,

It's hard to tell was causes the exception without digging a bit more but
based on your mention of multiple sums I assume you are trying to work
around the cross join of multiple table annotations?

I understand you simplified your query but from what I can see right now
it can be expressed as

commens_count = Comment.objects.annotate(
    votes_count=Count('votes', filter=votes_filter),
).filter(
    votes_count__gte=4
).count()

That should result in

SELECT COUNT(*) FROM (
    SELECT 1
    FROM comment
    LEFT OUTER JOIN vote ON (vote.comment_id = comment.id)
    GROUP BY comment.id
    HAVING COUNT(*) FILTER (WHERE ...) > 4
)

There's tickets tracking adding subquery support to aggregate functions but
using subqueries doesn't seem to be necessary here?

Cheers,
Simon

Daniel Gilge

unread,
Dec 6, 2018, 10:32:21 AM12/6/18
to django...@googlegroups.com
Hi everyone,

@Matthew and @Simon Thanks for your answers!

I assume you are trying to work
around the cross join of multiple table annotations?

This is correct.

There's tickets tracking adding subquery support to aggregate functions but
using subqueries doesn't seem to be necessary here?

Does this mean I can’t use subqueries in aggregate right now? (Doesn’t seem to be the case because when I replace the filter in my code below with the commented filter, the query produces expected results so far.) I had a look at following ticket but it seems to be another case:

Add support for aggregation through subqueries

You may want to consider conditional expressions (Case … When) to achieve your goal.

These functions attracted my attention when I was looking for a solution for the described exception. I want to try to adjust my query and use them. (But I thought that what I found is a bug. Therefore, I decided to write this e-mail.) I will see if conditional expressions help.

I actually didn’t want to bother you with the current code of my complete query but I should probably send it to you to understand why I use subqueries and filters that way. However, I’m not experienced with votes and such complex queries and there might be a better solution. But I didn’t want to ask you to take this work.

This subquery is also used in other queries. Therefore, I placed it in a function:

def get_vote_subquery(role, user=None):
    filter_fields = {'role': role}
    if user:
        filter_fields.update({'user': user})
    subquery = Subquery(
        models.Vote.objects
        .filter(segment=OuterRef('pk'))
        .values('segment_id')
        .annotate(sum=Sum('value', filter=Q(**filter_fields)))
        .values('sum')[:1]
    )
    return subquery

The following code is slightly simplified. I didn’t manage to get the necessary data with one query so far. (I’ll use Redis in the future to get the votes. Therefore, it doesn’t have to be the most effective query for now).

    @property
    def table_of_contents(self):
        if self._table_of_contents:
            return self._table_of_contents

        # Objects
        headings = list(
            models.Segment.objects
            .filter(tag__in=IMPORTANT_HEADINGS)
            .order_by('position')
            .values(
                'pk',
                'position',
                'content',
            )
        )

        # Translation state (aggregation because annotation not possible)
        aggregates = {}
        required = REQUIRED_APPROVALS[self.language]
        for i, h in enumerate(headings, start=1):
            filters = {'position__gte': h['position']}
            try:
                filters['position__lt'] = headings[i]['position']
            except IndexError:
                # The last item
                pass
            aggregates['translation_done_{}'.format(i)] = Count(
                'progress',  # this is a field of the model
                # filter=Q(**filters),
                filter=Q(**filters) & (
                    Q(progress__gte=TRANSLATION_DONE)
                    | Q(reviewers_vote__gte=1)
                ),
            )
            aggregates['review_done_{}'.format(i)] = Count(
                'reviewers_vote',
                # filter=Q(**filters),
                filter=Q(reviewers_vote__gte=required['reviewer'], **filters),
            )
            aggregates['trustee_done_{}'.format(i)] = Count(
                'trustees_vote',
                # filter=Q(**filters),
                filter=Q(trustees_vote__gte=required['trustee'], **filters),
            )
        aggregates = (
            models.Segment.objects
            .annotate(
                reviewers_vote=get_vote_subquery('reviewer'),
                trustees_vote=get_vote_subquery('trustee'),
            )
            .aggregate(**aggregates)
        )

        # Assign some values
        for i, h in enumerate(headings, start=1):
            try:
                h['segments'] = headings[i]['position'] - h['position']
            except IndexError:
                # The last item
                h['segments'] = self.segments_count - h['position'] + 1
            h['translation_done'] = aggregates['translation_done_{}'.format(i)]
            h['review_done'] = aggregates['review_done_{}'.format(i)]
            h['trustee_done'] = aggregates['trustee_done_{}'.format(i)]
        self._table_of_contents = headings
        return self._table_of_contents


Cheers,

Daniel

-- 
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/JfZguPo-G_g/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-users...@googlegroups.com.

To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.

Daniel Gilge

unread,
Dec 16, 2018, 1:55:20 PM12/16/18
to django...@googlegroups.com
Hi,

I’ve spent several hours to find a solution but I couldn’t: Case … When doesn’t seem to help. I can’t use Window because I have to do a calculation over rows not included in the queryset. I don’t see another solution than using raw SQL (or so) or fixing an (assumed) Django bug.

Does somebody know a solution? (I’m directly asking because it’s recommended in the docs to ask in this group: https://docs.djangoproject.com/en/2.1/topics/db/sql/#performing-raw-sql-queries)

Anyway, thank you to every developer of the Django ORM with these powerful features you find in Django 2.0!!

Daniel
Reply all
Reply to author
Forward
0 new messages