more than one field for aggregates? (ArrayAgg, StringAgg)

5,143 views
Skip to first unread message

Tomasz Nowak

unread,
Feb 29, 2016, 7:31:15 PM2/29/16
to Django users
Hi!

Module django.contrib.postgres.aggregates provides classes for ARRAY_AGG and STRING_AGG PostgreSQL aggregates.

Is it possible to provide more than one field to these aggregates?
Like in SQL, where you can provide more fields ("row" is optional):

ARRAY_AGG(row(table.field1, table.field2))

Best wishes,
Tomasz

Simon Charette

unread,
Feb 29, 2016, 9:52:58 PM2/29/16
to Django users
Hi Tomasz,

I suppose you could implement `ROW` as a custom expression[1] and use it in
your `ArrayAgg` aggregate as follow:

from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Expression

class Row(Expression):
    template = 'ROW(%(expressions)s)'

    # ...

MyModel.objects.aggregate(
    ArrayAgg(Row(F('field1'), F('field2'))),
)


Cheers,
Simon

[1] https://docs.djangoproject.com/en/1.9/ref/models/expressions/#writing-your-own-query-expressions

Tomasz Nowak

unread,
Mar 4, 2016, 9:57:43 AM3/4/16
to Django users
Hi Simon, thank you for your suggestion!
I managed to accomplish that for one column with custom ROW function and JSON version with JSON_AGG and JSON_BUILD_OBJECT.

However when using two such annotations the number of aggregated elements gets multiplied by the number of elements in the other relation.
I found a different way to get such aggregations independently: http://dba.stackexchange.com/a/130212/88817
but when I use extra(select={...}) the same SQL code is placed in GROUP BY clause, which results in:

ProgrammingError: could not identify an equality operator for type json

How to prevent adding this extra select to GROUP BY? Without it the query works well.

Best wishes
Tomasz

Simon Charette

unread,
Mar 4, 2016, 12:51:51 PM3/4/16
to Django users
Hi Thomasz,

Using extra() is a dead end as it's planned to be removed.

It's hard for me to guide you about how you should use the expression API with no
model definition or queryset.

Could you provide us with the set of model and the annotated querysets you're building?

The queryset's SQL representation would also be useful (str(queryset.query)).

Cheers,
Simon

Tomasz Nowak

unread,
Mar 4, 2016, 4:02:14 PM3/4/16
to Django users
Hi!

I would like to create a JSON structure of chosen objects (animals) with corresponding events in one database hit.
The model represents animals and different types of events that happened to them:

class Animal(models.Model):
    life_number = models.TextField(primary_key=True)
    birthday = models.DateField(blank=True, null=True)

class Inseminacja(models.Model):                                                                                     
    cow = models.ForeignKey(Animal, models.DO_NOTHING, db_column='cow_life_number')
    date = models.DateTimeField()                                                                                      
    number = models.IntegerField()                                                                                     

class WynikBadania(models.Model):
  animal = models.ForeignKey(Animal, models.DO_NOTHING, db_column='anilifenumber')
  date_entered = models.DateTimeField(db_column='timestamp')

I use my JsonAgg and JsonBuildObject functions submitted to Trac at:
https://code.djangoproject.com/ticket/26327#no1

>>> qs = Animal.objects.filter(pk='PL005257066205').annotate(
    ins          = JsonAgg(JsonBuildObject([F('inseminacja__date'),F('inseminacja__number')], output_field=CharField())),
    examinations = JsonAgg(JsonBuildObject([F('wynikbadania__date_entered')], output_field=CharField()))
  )

The resulting query naturally results in cartesian product of the events, which is exactly my problem:

>>> str(qs.query)

'SELECT
    "animal"."life_number",
    "animal"."birthday",
    JSON_AGG(JSON_BUILD_OBJECT(\'timestamp\',"wynikbadania"."timestamp")) AS "examinations",
    JSON_AGG(JSON_BUILD_OBJECT(\'date\',"inseminacja"."date",\'number\',"inseminacja"."number")) AS "ins"
FROM "animal"
LEFT OUTER JOIN "wynikbadania" ON ("animal"."life_number" = "wynikbadania"."anilifenumber")
LEFT OUTER JOIN "inseminacja" ON ("animal"."life_number" = "inseminacja"."cow_life_number")
WHERE "animal"."life_number" = PL005257066205
GROUP BY "animal"."life_number"'

>>> qs.values()

[{
  'examinations': [
    {u'timestamp': u'2014-11-20T12:41:54'},
    {u'timestamp': u'2014-11-20T12:41:54'},
    {u'timestamp': u'2015-07-13T11:41:08.40319'},
    {u'timestamp': u'2015-07-13T11:41:08.40319'}
  ],
  'birthday': None,
  'ins': [
    {u'date': u'2013-11-16T15:00:00', u'number': 1},
    {u'date': u'2013-12-09T10:00:00', u'number': 2},
    {u'date': u'2013-11-16T15:00:00', u'number': 1},
    {u'date': u'2013-12-09T10:00:00', u'number': 2}
  ],
  'life_number': u'PL005257066205'
}]

I don't know exact SQL query that should be used.
Using DISTINCT is not possible, as "JSON_AGG(DISTINCT JSON_BUILD_OBJECT(" results in "ERROR:  could not identify an equality operator for type json".
Using GROUP BY "animal"."life_number", "wynikbadania"."timestamp","inseminacja"."date" naturally results in 4 separate records.

Do you have any ideas?

Best wishes,
Tomasz

Simon Charette

unread,
Mar 4, 2016, 4:28:16 PM3/4/16
to Django users
Hi Thomasz,

I think this might be a bug in either your custom expressions or Django's
GROUP BY handling on PostgreSQL.

Can you try setting `django.db.backends.postgresql.features.DatabaseFeatures.allows_group_by_selected_pks`
to `False` and see if the appropriate GROUP BY clause is generated?

Meanwhile I'll have a look at both your custom expressions to try to identify
the bug.

Simon

Simon Charette

unread,
Mar 4, 2016, 4:41:31 PM3/4/16
to Django users
In order to make sure the bug doesn't lie in your expressions could you try
running the following queryset:

Animal.objects.filter(pk='PL005257066205').annotate(
    inseminacja_cnt=Count('inseminacja'),
    wynikbadania_cnt=Count('wynikbadania'),
)

If the results are `inseminacja_cnt=4` and `wynikbadania_cnt=4` I'm afraid you
might have hit a Django bug with multiple annotations over multi-valued
relationships.

Adding a `DISTINCT ON ('animal.life_number', 'wynikbadania.id', 'inseminacja.id')`
might help you but I wouldn't be surprised if it affected performance badly.

You should be able to do that using the `distinct()` method:

Animal.objects.filter(pk='PL005257066205').annotate(
    ins=JsonAgg(
        JsonBuildObject(

            [F('inseminacja__date'),F('inseminacja__number')], output_field=CharField()
        )
    ),
    examinations=JsonAgg(
        JsonBuildObject(
            [F('wynikbadania__date_entered')], output_field=CharField()
        )
    ),
).distinct('life_number', 'inseminacja__id', 'wynikbadania__id')

Cheers,
Simon

Tomasz Nowak

unread,
Mar 4, 2016, 5:49:00 PM3/4/16
to Django users
Hi Simon!
Thank you for your suggestions.

Indeed, I get {'wynikbadania_cnt': 4, 'inseminacja_cnt': 4}.

Adding the 'DISTINCT ON' (just after the initial SELECT) results in a message that both ".id" columns "must appear in the GROUP BY clause or be used in an aggregate function".
On the other side, the qs.distinct(...) results in an error:

Traceback (most recent call last):
...
  File "django/db/models/sql/compiler.py", line 824, in execute_sql
    sql, params = self.as_sql()
  File "django/db/models/sql/compiler.py", line 412, in as_sql
    "annotate() + distinct(fields) is not implemented.")
NotImplementedError: annotate() + distinct(fields) is not implemented.

This happens on both 1.9.x branch and 1.10.0.alpha.

Best wishes,
Tomasz

Tomasz Nowak

unread,
Mar 4, 2016, 5:54:58 PM3/4/16
to Django users
After setting DatabaseFeatures.allows_group_by_selected_pks = False the "extra(select=...)" version still copies the expression to the "GROUP BY":

  GROUP BY (( SELECT json_agg(inseminacja.*) FROM inseminacja WHERE inseminacja.cow_life_number = animal.life_number )), ...



W dniu piątek, 4 marca 2016 22:28:16 UTC+1 użytkownik Simon Charette napisał:
Hi Thomasz,

Simon Charette

unread,
Mar 6, 2016, 11:26:39 PM3/6/16
to Django users
Hi Tomasz,

Unfortunately I'm now pretty sure you are affected by an ORM bug when annotating
over multiple multi-valued relationships[1].

Since this bug is 7 years old and seemed quite hard to fix it was recently
documented as a limitation[2].

From this point I suggest you CC on ticket to get notified when it's finally
fixed and rely on .raw()[3] until then.

Cheers,
Simon

[1] https://code.djangoproject.com/ticket/10060
[2] https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#combining-multiple-aggregations
[3] https://docs.djangoproject.com/en/1.9/topics/db/sql/#performing-raw-queries

Tomasz Nowak

unread,
Mar 7, 2016, 6:51:39 AM3/7/16
to Django users
Thank you, Simon!
Reply all
Reply to author
Forward
0 new messages