[Django] #35885: JSONField does accept strings that look like dicts and incorrectly saves them as strings, breaking JSON filtering

12 views
Skip to first unread message

Django

unread,
Nov 4, 2024, 11:35:55 AM11/4/24
to django-...@googlegroups.com
#35885: JSONField does accept strings that look like dicts and incorrectly saves
them as strings, breaking JSON filtering
---------------------------+-----------------------------------------
Reporter: DataGreed | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 5.0 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------+-----------------------------------------
Steps to reproduce:

1. Create a model with JSONField, create a migration and migrate
2. Create an instance of this model, set the value of JSON field to a
string generated by `json.dumps()` from any dictionary, e.g.
"{"foo":"bar"}
3. Save the model instance
4. Instance save successfully

Expected result:

Instance is saved with the JSONField set to a dictionary {"foo": "bar"},
and JSONField deep filtering works properly

Actual result:

Instance is saved with a JSONField value saved as a string. Filtering does
not work, and when you query this instance from a database, the JSONField
value will be a string, not a dict.

I tried it on PostgreSQL and sqlite - both working incorrectly.
Note that if I set JSONField value to an actual dict it works correctly,
but it is too easy to make a mistake and then waste hours of debugging the
reasons the JSON filtering does not work, since developers usually
sanitize the data at least by dumping the dict to json to make sure it is
even possible to dump.

It seems like the fix would be to parse the string value in JSONField
before saving.
My current workaround is to do the following on my models with JSONFields:



{{{
class SomeModel(models.Model):

json_field = models.JSONField(blank=True, null=True)

def clean(self):
try:
if isinstance(self.json_field, str):
# make sure we are not trying to save string to JSONField,
django allows this for some reason
# and it breaks filtering on json
self.arguments = json.loads(self.json_field)
except ValueError as e:
# reraise as validation error. We do this to get more
actionable description for the error
raise ValidationError(str(e))

def save(self, *args, **kwargs):

# ensure clean runs. Well, kind of - we can still directly update
the fields,
# and it will somewhat break data integrity, so just don't do that
maybe? thanks :)
self.full_clean()
super().save(*args, **kwargs)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35885>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 4, 2024, 12:47:21 PM11/4/24
to django-...@googlegroups.com
#35885: JSONField does accept strings that look like dicts and incorrectly saves
them as strings, breaking JSON filtering
-------------------------------------+-------------------------------------
Reporter: DataGreed | Owner: (none)
Type: Uncategorized | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* component: Uncategorized => Database layer (models, ORM)
* resolution: => invalid
* status: new => closed

Comment:

JSON accepts top level string as valid input; that is including JSON
serialized as a string.

> [...] since developers usually sanitize the data at least by dumping the
dict to json to make sure it is even possible to dump.

I don't think this is a normal behavior. Usually data transits through
through serialization layers that are in charge of parsing JSON strings as
Python objects just like for datetimes, decimal, float, and other types.

Having `JSONField` auto-magically perform a `json.loads` when provided a
string input is backward incompatible and prevents storing JSON de-
serializable strings as top level values (e.g. `"1", "false", "null"`).
--
Ticket URL: <https://code.djangoproject.com/ticket/35885#comment:1>

Django

unread,
Nov 21, 2024, 2:30:45 PM11/21/24
to django-...@googlegroups.com
#35885: JSONField does accept strings that look like dicts and incorrectly saves
them as strings, breaking JSON filtering
-------------------------------------+-------------------------------------
Reporter: DataGreed | Owner: (none)
Type: Uncategorized | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by DataGreed2):

Replying to [comment:1 Simon Charette]:
> JSON accepts top level string as valid input; that is including JSON
serialized as a string.
>
> > [...] since developers usually sanitize the data at least by dumping
the dict to json to make sure it is even possible to dump.
>
> I don't think this is a normal behavior. Usually data transits through
through serialization layers that are in charge of parsing JSON strings as
Python objects just like for datetimes, decimal, float, and other types.

So what do you suggest? I encounter this bug throughout the years and I
spend considerable amount of time trying to figure out the issue when the
json deep filtering suddenly does not work.

Why would you think this is not normal behaviour? Why IntegerField breaks
if you try to save "FOO" in it and JSONField doesn't break when you save
string in it? It should throw a validation error at least. Same goes with
other fields that work properly, unlike JSONField that allows you to write
data that cannot be parsed properly.

> Having `JSONField` auto-magically perform a `json.loads` when provided a
string input is backward incompatible and prevents storing JSON de-
serializable strings as top level values (e.g. `"1", "false", "null"`).

Well, you can jsut check it for these values before calling json.loads.
Although, I have no ide why would you even try to write "false" or "1" in
a JSONField.
--
Ticket URL: <https://code.djangoproject.com/ticket/35885#comment:2>

Django

unread,
Nov 22, 2024, 1:09:08 AM11/22/24
to django-...@googlegroups.com
#35885: JSONField does accept strings that look like dicts and incorrectly saves
them as strings, breaking JSON filtering
-------------------------------------+-------------------------------------
Reporter: DataGreed | Owner: (none)
Type: Uncategorized | Status: closed
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> Why IntegerField breaks if you try to save "FOO" in it and JSONField
doesn't break when you save string in it? It should throw a validation
error at least.

Because a string is not a valid value for an integer field while a string
is a valid value for a JSON field? It's completely valid to store a JSON
de-serializable string such as `"true"` in a JSON document. Would you
expect `json.dumps(json.dumps({"foo": "bar"}}` to raise an exception
because you attempt to serialize a JSON de-serializable string?

> Well, you can jsut check it for these values before calling json.loads.
Although, I have no ide why would you even try to write "false" or "1" in
a JSONField.

JSON is a standard and `JSONField` adheres to it. Django doesn't offer a
JSON except top level values cannot be JSON de-serializable strings kind
of field. To me what you are asking for is analogous to the implicit bytes
to text conversion that Python 2 use to make when asked to shove bytes
into text expecting interfaces. It is much better from a data hygiene
perspective to deserialize data at input boundaries (forms, serializers)
and pass it around in its deserialized form through interfaces.
--
Ticket URL: <https://code.djangoproject.com/ticket/35885#comment:3>
Reply all
Reply to author
Forward
0 new messages