[Django] #32411: JSONField filter icontains is case sensitive in mysql

350 views
Skip to first unread message

Django

unread,
Feb 3, 2021, 2:45:50 AM2/3/21
to django-...@googlegroups.com
#32411: JSONField filter icontains is case sensitive in mysql
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: nobody
Type: Bug | Status: new
Component: Database | Version: 3.1
layer (models, ORM) |
Severity: Normal | Keywords: JSONField
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
given model

{{{
class Paper(models.Model):
...
authors = models.JSONField(default=list)
...
}}}


{{{
In [53]: p = Paper.objects.get(id=1)

In [54]: p.__dict__
Out[54]:
{'_state': <django.db.models.base.ModelState at 0x7f2aa74b3790>,
'id': 1,
...
'authors': [{'name': 'Rehmann Kim-Thomas',
'workplace': 'SAP SE, Walldorf, Germany'},
{'name': 'Folkerts Enno', 'workplace': 'SAP SE, Walldorf, Germany'}],
...
}

In [55]: print(Paper.objects.filter(authors__icontains="rehmann").count())
0

In [56]: print(Paper.objects.filter(authors__icontains="Rehmann").count())
1

In [57]: print(Paper.objects.filter(authors__icontains="Rehmann").query)
SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`,
`scholardata_paper`.`modified`, `scholardata_paper`.`doi`,
`scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`,
`scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`,
`scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`,
`scholardata_paper`.`mag_id`, `scholardata_paper`.`language`,
`scholardata_paper`.`url`, `scholardata_paper`.`abstract`,
`scholardata_paper`.`title`, `scholardata_paper`.`authors`,
`scholardata_paper`.`keywords`, `scholardata_paper`.`issued`,
`scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`,
`scholardata_paper`.`volume`, `scholardata_paper`.`issue`,
`scholardata_paper`.`page`, `scholardata_paper`.`references_count`,
`scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`,
`scholardata_paper`.`references_refreshed`,
`scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE
JSON_UNQUOTE(`scholardata_paper`.`authors`) LIKE %Rehmann% ORDER BY
`scholardata_paper`.`id` ASC

In [58]: print(Paper.objects.filter(authors__iregex="rehmann").count())
1

In [59]: print(Paper.objects.filter(authors__iregex="Rehmann").count())
1

In [60]: print(Paper.objects.filter(authors__iregex="Rehmann").query)
SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`,
`scholardata_paper`.`modified`, `scholardata_paper`.`doi`,
`scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`,
`scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`,
`scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`,
`scholardata_paper`.`mag_id`, `scholardata_paper`.`language`,
`scholardata_paper`.`url`, `scholardata_paper`.`abstract`,
`scholardata_paper`.`title`, `scholardata_paper`.`authors`,
`scholardata_paper`.`keywords`, `scholardata_paper`.`issued`,
`scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`,
`scholardata_paper`.`volume`, `scholardata_paper`.`issue`,
`scholardata_paper`.`page`, `scholardata_paper`.`references_count`,
`scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`,
`scholardata_paper`.`references_refreshed`,
`scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE
REGEXP_LIKE(JSON_UNQUOTE(`scholardata_paper`.`authors`), Rehmann, 'i')
ORDER BY `scholardata_paper`.`id` ASC

}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32411>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 3, 2021, 2:48:46 AM2/3/21
to django-...@googlegroups.com
#32411: JSONField filter icontains is case sensitive in mysql
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: JSONField | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by elonzh:

Old description:

New description:

given model

In [63]:
print(Paper.objects.filter(authors__0__name__icontains="rehmann").count())
1

In [64]:
print(Paper.objects.filter(authors__0__name__icontains="rehmann").query)


SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`,
`scholardata_paper`.`modified`, `scholardata_paper`.`doi`,
`scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`,
`scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`,
`scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`,
`scholardata_paper`.`mag_id`, `scholardata_paper`.`language`,
`scholardata_paper`.`url`, `scholardata_paper`.`abstract`,
`scholardata_paper`.`title`, `scholardata_paper`.`authors`,
`scholardata_paper`.`keywords`, `scholardata_paper`.`issued`,
`scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`,
`scholardata_paper`.`volume`, `scholardata_paper`.`issue`,
`scholardata_paper`.`page`, `scholardata_paper`.`references_count`,
`scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`,
`scholardata_paper`.`references_refreshed`,
`scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE

LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`,
$[0]."name"))) LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC

}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:1>

Django

unread,
Feb 3, 2021, 4:55:54 AM2/3/21
to django-...@googlegroups.com
#32411: JSONField filter icontains is case sensitive in mysql
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution: invalid

Keywords: JSONField | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* cc: Mariusz Felisiak (added)
* status: new => closed
* resolution: => invalid


Comment:

Hi. Thanks for the report.

Your usage is wrong for what you're trying to do. See the
[https://docs.djangoproject.com/en/3.2/topics/db/queries/#querying-
jsonfield Querying JSONField topic docs].

For JSONField `contains` takes key pairs, rather than a string:

{{{
>>> Dog.objects.filter(data__contains={'owner': 'Bob'})
}}}

`icontains` is not explicitly implemented on the field itself, so it's
falling back to the base implementation (wrongly — the
[https://database.guide/json_unquote-remove-quotes-from-a-json-document-
in-mysql/ JSON_UNQUOTE] gives us a string but we didn't generate the
`LOWER`) **but** it's not clear that makes much sense. (Mariusz: I'm
closing this as invalid, but do we want to accept in order to raise an
error here? 🤔)

As you've discovered the correct usage is to query by field
([https://docs.djangoproject.com/en/3.1/topics/db/queries/#key-index-and-
path-transforms as per here]):

{{{
>>> Dog.objects.filter(data__owner__name__icontains='bob')
}}}

For your example
`Paper.objects.filter(authors__name__icontains="rehmann")` should give you
what you want.
I hope that helps.

This seems related to the documentation issue #26511.

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:2>

Django

unread,
Feb 3, 2021, 8:34:07 AM2/3/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.

-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: JSONField mysql | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: closed => new
* cc: sage (added)
* type: Bug => Cleanup/optimization
* keywords: JSONField => JSONField mysql
* resolution: invalid =>
* stage: Unreviewed => Accepted


Comment:

This is not a documented usage, however it doesn't work only on MySQL, so
IMO we can fix this for consistency. MySQL handles strings used in `JSON`
context using the `utf8mb4_bin` binary collation, that's why comparison
of JSON values is case-sensitive and we need to use
`CaseInsensitiveMixin`. We can fix this be registering a custom case-
insensitive lookup for `JSONField`, e.g.
{{{
class JSONFieldIContains(CaseInsensitiveMixin, lookups.IContains):
pass


JSONField.register_lookup(JSONFieldIContains)
}}}
I don't think it makes sense to do the same for `__istartswith`,
`iendswith`, and `__iexact`.

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:3>

Django

unread,
Feb 5, 2021, 7:02:07 AM2/5/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: assigned

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: JSONField mysql | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Hasan Ramezani):

* owner: nobody => Hasan Ramezani
* status: new => assigned
* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/13979 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:4>

Django

unread,
Feb 5, 2021, 8:39:07 AM2/5/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: assigned
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: JSONField mysql | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:5>

Django

unread,
Feb 5, 2021, 10:39:05 AM2/5/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed

Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: JSONField mysql | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"63d239db037f02d98b7771c90422840bbb4a319a" 63d239db]:
{{{
#!CommitTicketReference repository=""
revision="63d239db037f02d98b7771c90422840bbb4a319a"
Fixed #32411 -- Fixed __icontains lookup for JSONField on MySQL.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:6>

Django

unread,
Feb 18, 2021, 9:55:35 PM2/18/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: JSONField mysql | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by elonzh):

Replying to [comment:2 Carlton Gibson]:


> Hi. Thanks for the report.
>
> Your usage is wrong for what you're trying to do. See the
[https://docs.djangoproject.com/en/3.2/topics/db/queries/#querying-
jsonfield Querying JSONField topic docs].
>
> For JSONField `contains` takes key pairs, rather than a string:
>
> {{{
> >>> Dog.objects.filter(data__contains={'owner': 'Bob'})
> }}}
>
> `icontains` is not explicitly implemented on the field itself, so it's
falling back to the base implementation (wrongly — the
[https://database.guide/json_unquote-remove-quotes-from-a-json-document-
in-mysql/ JSON_UNQUOTE] gives us a string but we didn't generate the
`LOWER`) **but** it's not clear that makes much sense. (Mariusz: I'm
closing this as invalid, but do we want to accept in order to raise an
error here? 🤔)
>
> As you've discovered the correct usage is to query by field
([https://docs.djangoproject.com/en/3.1/topics/db/queries/#key-index-and-
path-transforms as per here]):
>
> {{{
> >>> Dog.objects.filter(data__owner__name__icontains='bob')
> }}}
>
> For your example
`Paper.objects.filter(authors__name__icontains="rehmann")` should give you
what you want.
> I hope that helps.
>
> This seems related to the documentation issue #26511.

Thanks for your information, I know how to query field in a json object,
but the problem is the `authors` fields is a list and Django does not
support filter it.

`Paper.objects.filter(authors__name__icontains="rehmann")` won't give me
what I want.


{{{
>>> print(Paper.objects.filter(authors__name__icontains="rehmann").query)


SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`,
`scholardata_paper`.`modified`, `scholardata_paper`.`doi`,
`scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`,
`scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`,
`scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`,
`scholardata_paper`.`mag_id`, `scholardata_paper`.`language`,
`scholardata_paper`.`url`, `scholardata_paper`.`abstract`,
`scholardata_paper`.`title`, `scholardata_paper`.`authors`,
`scholardata_paper`.`keywords`, `scholardata_paper`.`issued`,
`scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`,
`scholardata_paper`.`volume`, `scholardata_paper`.`issue`,
`scholardata_paper`.`page`, `scholardata_paper`.`references_count`,
`scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`,
`scholardata_paper`.`references_refreshed`,
`scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE

LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`, $."name")))
LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:7>

Django

unread,
Feb 19, 2021, 7:53:30 AM2/19/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: JSONField mysql | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> the problem is the authors fields is a list and Django does not support
filter it

That little to do with this ticket though. The ORM has no way of
''knowing'' `authors` is a JSON Array because `JSONField` are schema less.
If you want to filter out papers that have at least one author with a
matching name you'll have to use the MySQL equivalent of
`json_array_elements` and a subquery through a `Func` expression but
that's something you should ask about on
[https://docs.djangoproject.com/en/3.1/faq/help/#how-do-i-do-x-why-
doesn-t-y-work-where-can-i-go-to-get-help support channels].

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:8>

Django

unread,
Feb 19, 2021, 8:50:55 AM2/19/21
to django-...@googlegroups.com
#32411: Case-insensitive lookups on JSONField doesn't work on MySQL.
-------------------------------------+-------------------------------------
Reporter: elonzh | Owner: Hasan
Type: | Ramezani
Cleanup/optimization | Status: closed
Component: Database layer | Version: 3.1
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: JSONField mysql | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by elonzh):

Replying to [comment:8 Simon Charette]:


> > the problem is the authors fields is a list and Django does not
support filter it
>
> That little to do with this ticket though. The ORM has no way of
''knowing'' `authors` is a JSON Array because `JSONField` are schema less.
If you want to filter out papers that have at least one author with a

matching name you'll have to use the MySQL equivalent of PostgreSQL


`json_array_elements` and a subquery through a `Func` expression but
that's something you should ask about on
[https://docs.djangoproject.com/en/3.1/faq/help/#how-do-i-do-x-why-
doesn-t-y-work-where-can-i-go-to-get-help support channels].

Yes, that little to do with this ticket. I just want to tell Carlton
Gibson that he has given the wrong solution.

--
Ticket URL: <https://code.djangoproject.com/ticket/32411#comment:9>

Reply all
Reply to author
Forward
0 new messages