Hi,
I have a model MyModel which has a JSONField() called 'snapshot'. In Python terms, each snapshot looks like this:
======================
snapshot = {
    'pay_definition': {
        '1234': {..., 'name': 'foo', ...},
        '99': {..., 'name': 'bar', ...},
    }
}
======================
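For reference, the model is roughly this (a minimal sketch, assuming Django 3.1+ where JSONField lives in django.db.models; any other fields are omitted):
======================
from django.db import models

class MyModel(models.Model):
    # Stored as jsonb in PostgreSQL; each instance holds one snapshot.
    snapshot = models.JSONField()
======================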
I'd like to find all unique values of 'name' in all instances of MyModel. I have this working using native JSON functions like this:
======================
from django.db.models import F, Func

class PayDefs(Func):
    function = 'to_jsonb'
    template = "%(function)s(row_to_json(jsonb_each((%(expressions)s->'pay_definition')))->'value'->'name')"

MyModel.objects.annotate(xxx=PayDefs(F('snapshot'))).order_by().distinct('xxx').values_list('xxx', flat=True)
======================
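(The empty order_by() is there to clear the model's default ordering so it can't conflict with distinct('xxx'): Django emits DISTINCT ON, and PostgreSQL requires any ORDER BY to start with the same expression.)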
My question is whether this is the best way to solve this problem. The way my current logic works, reading from the inside out, is, I think:
- Pass in the 'snapshot'.
- Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this key.
- To skip the unknown numeric keys, "jsonb_each()" turns each key, value pair into an inner row like ['1234', {...}].
- To get at the value column of the inner row, "row_to_json()->'value'".
- To get the name field's value, "->'name'".
- A final call to "to_jsonb" in the PayDefs class.
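Putting those steps together, the SQL this generates should look roughly like the following (a sketch; I'm assuming the table name is myapp_mymodel):
======================
SELECT DISTINCT
       to_jsonb(row_to_json(jsonb_each(snapshot -> 'pay_definition'))
                -> 'value' -> 'name')
FROM myapp_mymodel;
======================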
For example, since all I care about is the string value of 'name', is there a way to get rid of the PayDefs class and its invocation of to_jsonb? Likewise, is there a better way to do the inner parts? (I've sketched the raw SQL I'd hope to approximate after the list below.) To provide context on what "better" might be:
- Snapshot JSONs might easily be 20MB in size.
- Each 'pay_definition' is probably only about 1kB in size, and there might be 50 of them in a snapshot.
- There might be 1000 MyModel instances in a given query.
- I'm using PostgreSQL 12.
So my concern is not to have the database server or Django perform extraneous work, for example converting between strings and JSON.
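For comparison, the hand-written SQL I'd hope to approximate is something like this (again a sketch with the table name assumed; ->> pulls 'name' out as text directly, so there is no to_jsonb round trip):
======================
SELECT DISTINCT pd.value ->> 'name'
FROM myapp_mymodel,
     jsonb_each(snapshot -> 'pay_definition') AS pd;
======================
Is there a way to express that, or something equally direct, through the ORM?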
Thanks, Shaheed