[Django] #35568: Searching nested JSON values

14 views
Skip to first unread message

Django

unread,
Jun 29, 2024, 4:37:41 PM6/29/24
to django-...@googlegroups.com
#35568: Searching nested JSON values
-----------------------------------------+--------------------------------
Reporter: Vasu Nagendra | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 5.0
Severity: Normal | Keywords: QuerySet.extra
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+--------------------------------
As requested in the documentation I am filing the use case for extra. The
data in my database looks similar to the data here
https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-SQLJSON-
PATH. It is in a column called `data` in my database in a table called
`alerts` (model called Alert). If I have a key of `track.segments.*.HR`
(as shown in the example), I can't quite use a `Q` operator. If the data
was in the upper level, Q operators work just fine -- but nested arrays it
doesn't work.

The way to do this is really just using a QuerySet extra keyword so using
the example they've provided here that would translate to

{{{
Alert.objects.extra(where=[r"""data @? '$.track.segments[*].HR ? (@ >
130)'])
}}}

It is quite scary in this case if 130 OR path track.segments[*].HR was
obtained from the user because there is no way to send those in a params
tuple. Whenever params is used with the `where` clause, it adds a quote
which JSONPath doesn't like.

I looked at KeyTransform as documented in django.db.models.fields.json.
This can _somewhat_ be managed if there was only one nesting by messing
with the key and using `__contains`. But when there are multiple levels of
nesting like this one for example https://learn.microsoft.com/en-
us/graph/api/security-alert-get?view=graph-rest-1.0&tabs=http#response-1
and I am trying to find `'$.evidence[*].roles[*] ? (@ == "compromised")'`
there is no choice but to use `extra`.

I am happy to take this on if anyone has any ideas on how best to approach
this. I have plenty of data to test and am familiar with the general
organization of the Django project (have not contributed before).
--
Ticket URL: <https://code.djangoproject.com/ticket/35568>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 29, 2024, 6:21:09 PM6/29/24
to django-...@googlegroups.com
#35568: Searching nested JSON values
--------------------------------+--------------------------------------
Reporter: Vasu Nagendra | Owner: nobody
Type: Uncategorized | Status: closed
Component: Uncategorized | Version: 5.0
Severity: Normal | Resolution: invalid
Keywords: QuerySet.extra | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Changes (by Simon Charette):

* resolution: => invalid
* status: new => closed

Comment:

Hello Vasu,

As referenced in the docs you can simply use `RawSQL` assuming you
specifying an `output_field=BooleanField()`

{{{#!python
Alert.objects.filter(
RawSQL("data @? %s", ('$.track.segments[*].HR ? (@ > 130)',),
output_field=BooleanField())
)
}}}

In the case of lookups though [https://docs.djangoproject.com/en/5.0/howto
/custom-lookups/#a-lookup-example you can create your own] to avoid having
to use `RawSQL` in the first place

{{{#!python
from django.db.models import JSONField, Lookup


class AnyJSONPath(Lookup):
lookup_name = "anypath"

def as_postgresql(self, compiler, connection):
lhs_sql, lhs_params = self.process_lhs(compiler, connection)
rhs_sql, rhs_params = self.process_rhs(compiler, connection)
params = lhs_params + rhs_params
return f"{lhs_sql} @? {rhs_sql}", params

JSONField.register_lookup(AnyJSONPath)
}}}

Are there any admonition that could have been made to the existing
documentation that could have pointed you in the right direction without
requiring the creation of ticket? Would you be interested in submitting
them?
--
Ticket URL: <https://code.djangoproject.com/ticket/35568#comment:1>

Django

unread,
Jun 29, 2024, 6:48:09 PM6/29/24
to django-...@googlegroups.com
#35568: Searching nested JSON values
--------------------------------+--------------------------------------
Reporter: Vasu Nagendra | Owner: nobody
Type: Uncategorized | Status: closed
Component: Uncategorized | Version: 5.0
Severity: Normal | Resolution: invalid
Keywords: QuerySet.extra | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Comment (by Vasu Nagendra):

Hi Simon -
Thanks for the pointer. My apologies, I didn't realize `RawSQL` could be
written like that by specifying an `output_field`. I thought the only way
to do this was `extra` and the documentation suggested I file a ticket
with my specific use case for consideration during deprecation. I will
play with this a bit and then submit changes to docs as you suggested. I
think generally your example there is perfect to add to the docs -- I can
do that though, you don't need to. I think the right place where it
belongs is here https://docs.djangoproject.com/en/5.0/topics/db/queries
/#containment-and-key-lookups. This is the first place I looked to
understand what is possible with `KT()` expressions. If you have a
better/different place, that's OK too - please suggest.

I'll test this out a bit with different use cases and add an example.

In my specific use case I have a few keys/values like this that I am
trying to mess around with for the `filter` clause. For everything that
doesn't have a `*` I was easily able to get `Q` objects working with
multiple `AND` and `OR` conditions and it beautifully does the right
thing. I want to try out your example above where I'd prefer to write
something like this (simplified, since I have other conditions that the
PostgreSQL JSONPath operator doesn't support directly like `endswith`)...

{{{
conditions = []
for k, v in search.items():
if "*" in k:
conditions.append(Q(**{f"data__{k}__anypath": v}))
else:
conditions.append(Q(**{f"data__{k)__exact": v}))
for condition in conditions:
combined &= Q(condition)
Alert.objects.filter(combined)
}}}

I appreciate your patience and friendliness! Thank you!
--
Ticket URL: <https://code.djangoproject.com/ticket/35568#comment:2>
Reply all
Reply to author
Forward
0 new messages