Hi,
I have a model MyModel which has a JSONField() called 'snapshot'. In Python terms, each snapshot looks like this:
======================
snapshot = {
    'pay_definition': {
        '1234': {..., 'name': 'foo', ...},
        '99': {..., 'name': 'bar', ...},
    }
}
======================
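For reference, the model is roughly this (a minimal sketch, assuming Django 3.1+ where JSONField lives in django.db.models; any other fields are omitted):
======================
from django.db import models

class MyModel(models.Model):
    # Stored as jsonb in PostgreSQL; each instance holds one snapshot.
    snapshot = models.JSONField()
======================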
I'd like to find all unique values of 'name' in all instances of MyModel. I have this working using native JSON functions like this:
======================
from django.db.models import F, Func

class PayDefs(Func):
    function = 'to_jsonb'
    template = "%(function)s(row_to_json(jsonb_each((%(expressions)s->'pay_definition')))->'value'->'name')"

MyModel.objects.annotate(xxx=PayDefs(F('snapshot'))).order_by().distinct('xxx').values_list('xxx', flat=True)
======================
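(The empty order_by() is there to clear the model's default ordering so it can't conflict with distinct('xxx'): Django emits DISTINCT ON, and PostgreSQL requires any ORDER BY to start with the same expression.)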
My question is whether this is the best way to solve this problem. The way my current logic works, reading from the inside out, is, I think:
- Pass in the 'snapshot'.
- Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this key.
- To skip the unknown numeric keys, "jsonb_each()" turns each key, value pair into an inner row like ['1234', {...}].
- To get at the value column of the inner row, "row_to_json()->'value'".
- To get the name field's value, "->'name'".
- A final call to "to_jsonb" in the PayDefs class.
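Putting those steps together, the SQL this generates should look roughly like the following (a sketch; I'm assuming the table name is myapp_mymodel):
======================
SELECT DISTINCT
       to_jsonb(row_to_json(jsonb_each(snapshot -> 'pay_definition'))
                -> 'value' -> 'name')
FROM myapp_mymodel;
======================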
For example, since all I care about is the string value of 'name', is there a way to get rid of the PayDefs class and its invocation of to_jsonb? Likewise, is there a better way to do the inner parts? (I've sketched the raw SQL I'd hope to approximate after the list below.) To provide context on what "better" might be:
- Snapshot JSONs might easily be 20MB in size.
- Each 'pay_definition' is probably only about 1kB in size, and there might be 50 of them in a snapshot.
- There might be 1000 MyModel instances in a given query.
- I'm using PostgreSQL 12.
So my concern is not to have the database server or Django perform extraneous work, for example converting between strings and JSON.
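For comparison, the hand-written SQL I'd hope to approximate is something like this (again a sketch with the table name assumed; ->> pulls 'name' out as text directly, so there is no to_jsonb round trip):
======================
SELECT DISTINCT pd.value ->> 'name'
FROM myapp_mymodel,
     jsonb_each(snapshot -> 'pay_definition') AS pd;
======================
Is there a way to express that, or something equally direct, through the ORM?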
Thanks, Shaheed