About the ORM icontains operator's disadvantage on PostgreSQL performance and query results.

479 views
Skip to first unread message

Mesut Öncel

unread,
Feb 27, 2021, 5:13:32 PM2/27/21
to Django developers (Contributions to Django itself)
Hi,
I opened the ticket https://code.djangoproject.com/ticket/3575. In this ticket, I wanted to prove the effect of UPPER function used in icontains operator on indexes and query result. UPPER function causes performance problems regarding indexing in PostgreSQL. Another issue is that it does not give correct results especially for Turkish characters.

Database administrators do not want to add individual indexes in this regard. Because adding more than one index to the same column in full text search will negatively affect cost and performance.

An expression index should be created to prevent performance loss in database queries to be used outside the framework. However, the existence of such an index should be notified to everyone who uses this database, and the use of "UPPER (name) ilike '% AA%" instead of "ilike" should be encouraged. This will cause PostgreSQL to block the "ilike" operator. 

If you give the necessary approval in this regard, I want to open a pull request for the field specified on the ticket.

Thanks.

Tom Forbes

unread,
Feb 27, 2021, 5:17:21 PM2/27/21
to django-d...@googlegroups.com
Hey,
I think it would be best if you opened a new ticket explaining your issue and the performance problems you’ve found (especially around Turkish characters). I’m not entirely clear what you wish to be changed: you wish for an expression index to be created automatically?

Tom

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ba167e5c-49dd-4d96-a2c4-855082639d4en%40googlegroups.com.

Mesut Öncel

unread,
Feb 28, 2021, 4:31:05 AM2/28/21
to django-d...@googlegroups.com
The reason I opened all the problems in a single ticket was to show the disadvantages of the "UPPER" function used in the "icontains" operator.I think that instead of using "UPPER (column) like" used in the Django ORM filter method, the "icontains" operator should use the "ilike" operator and this should be changed in the source code I specified in the ticket. From the query examples I added to the ticket, you can see that the query "UPPER (column) like" generated by the "icontains" operator has a performance problem and produces incorrect results. If you use "ilike" instead of "UPPER (column)" in the icontains operator, the 2 problems I mentioned will be solved.

Tom Forbes <t...@tomforb.es>, 28 Şub 2021 Paz, 01:17 tarihinde şunu yazdı:


--
İyi çalışmalar. Saygılarımla.

Mesut Öncel

Tom Forbes

unread,
Feb 28, 2021, 4:36:35 AM2/28/21
to django-d...@googlegroups.com
Unless I’m missing something, the ticket you linked was last updated 7 years ago with two now-broken images. Which ticket did you open with the examples?

Tom

Mesut Öncel

unread,
Feb 28, 2021, 5:23:23 AM2/28/21
to django-d...@googlegroups.com
Yes, 7 years ago I reviewed the ticket for the updated code, but there are problems there.https: //code.djangoproject.com/ticket/32485 The ticket I opened has the necessary details. If you want more details, I can make the necessary explanations.

Tom Forbes <t...@tomforb.es>, 28 Şub 2021 Paz, 12:36 tarihinde şunu yazdı:

Tom Forbes

unread,
Feb 28, 2021, 5:28:49 AM2/28/21
to django-d...@googlegroups.com
Thank you for the clarification! 

Tom Forbes

unread,
Feb 28, 2021, 5:39:32 AM2/28/21
to django-d...@googlegroups.com
Thank you for the clarification! I think the biggest argument for this change is the fact that uppercasing Unicode can cause incorrect results to be returned.

Given that we now have much better support for custom index types, perhaps we should change this? We need a custom expression index anyway, so it might not be a huge ask to say “now you should use a gin index”?

Tom

Mesut Öncel

unread,
Feb 28, 2021, 6:06:50 AM2/28/21
to django-d...@googlegroups.com
You're welcome. Yes, database administrators can create an index as desired. However, creating multiple cin indexes in a single column with different methods will cause performance issues. I don't think using expressions when using like and ilike is a correct use for query performance.
It doesn't hurt to add an extra expression string for the "exact" and "iexact" operators. However, it will be a huge cost for full text search.

Tom Forbes <t...@tomforb.es>, 28 Şub 2021 Paz, 13:39 tarihinde şunu yazdı:

Mesut Öncel

unread,
Feb 28, 2021, 6:47:09 AM2/28/21
to django-d...@googlegroups.com
A discussion was requested from here to make changes on this issue. Do you think it would be appropriate if I open a pull request for this change?

Mesut Öncel <mesuto...@gmail.com>, 28 Şub 2021 Paz, 14:06 tarihinde şunu yazdı:

Sky Lord

unread,
Feb 28, 2021, 6:59:28 AM2/28/21
to django-d...@googlegroups.com
Guys, am noob anyone gonna guide?

Mesut Öncel

unread,
Feb 28, 2021, 7:13:43 AM2/28/21
to django-d...@googlegroups.com
The ticket I posted in the first email does not belong to me. The ticket I opened is https://code.djangoproject.com/ticket/32485.

Sky Lord <skylo...@gmail.com>, 28 Şub 2021 Paz, 14:59 tarihinde şunu yazdı:

James Bennett

unread,
Feb 28, 2021, 8:05:56 PM2/28/21
to django-d...@googlegroups.com
On Sun, Feb 28, 2021 at 2:39 AM Tom Forbes <t...@tomforb.es> wrote:
>
> Thank you for the clarification! I think the biggest argument for this change is the fact that uppercasing Unicode can cause incorrect results to be returned.
>
> Given that we now have much better support for custom index types, perhaps we should change this? We need a custom expression index anyway, so it might not be a huge ask to say “now you should use a gin index”?

It's worth pointing out that case mapping and transformation in
Unicode is difficult and complex. I wrote up an intro to the problem a
while back:

https://www.b-list.org/weblog/2018/nov/26/case/

One thing that's important to note is that there is no generic
one-size-fits-all-languages option that Django can just do by default
and get the right results. For example, a case mapping that does the
right thing for Turkish will do the wrong thing for (to pick a random
example) French, and vice-versa. Unicode itself provides a basic "hope
for the best" set of default case mappings that do the right thing for
many cased scripts, but also is clear in saying that you may need to
use a locale-specific mapping to get what you really want.

Postgres has the ability to configure locale, and when configured it
does the "right thing" -- for example, when the locale is tr_TR or
another Turkish locale variant, the UPPER() function should correctly
handle dotted versus dotless 'i' as required for Turkish. But Postgres
also warns that this will have performance impact, which I think is
what's being noted in the ticket.

I'm not sure there will be an easy or obvious solution here.

Mesut Öncel

unread,
Mar 1, 2021, 1:06:59 AM3/1/21
to django-d...@googlegroups.com
Hi James,
Thanks for your explanations. However, I wanted to explain the disadvantage of using "UPPER like"  instead of "ilike" in icontains, istartswith and iendswith. The performance problem should not be ignored.

James Bennett <ubern...@gmail.com>, 1 Mar 2021 Pzt, 04:05 tarihinde şunu yazdı:
--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

Hannes Ljungberg

unread,
Mar 1, 2021, 3:31:03 AM3/1/21
to Django developers (Contributions to Django itself)
I kind of agree that using `UPPER` instead of `ILIKE` for `__icontains` on PostgreSQL isn’t optimal. But it is quite easy to create a functional trigram GIN-index which use `UPPER` to allow these lookups to use an index. This will be even easier in Django 3.2 where you can create functional indexes in your model definitions.

The recommended way to handle case insensitive searches in PostgreSQL is by using functional indexes (https://www.postgresql.org/docs/current/indexes-expressional.html).  But with trigram indexes this isn’t needed and they are required to get index scans for leading wildcard searches. Note that trigram GIN indexes do have drawbacks, for example they don’t support the `=`-operator and will not support search strings with less than 3 characters.

I’ve been in a similar position as you and to solve this I created a custom lookup which used `ILIKE` and used that instead of `__icontains`.

It might make sense to change the `__icontains`  and `__iendswith` lookups to use `ILIKE` instead of `UPPER` but I’m not really sure that it’s justified to break the indexed queries people already have in place. That is where people have the GIN trigram `UPPER` index opposed from the regular GIN trigram index. The performance issue you’re describing doesn’t really change if we use `ILIKE` over `UPPER`. You need to install an index anyway.

Regarding your issue with Turkish characters I think that it works because `ILIKE` internally uses some form of `LOWER` and `LOWER('İstanbul') = LOWER('istanbul')` would’ve worked in your case. As James wrote this behaviour depends on your configured locale. I think the one way to do these kind of searches without changing the locale is to use `tsvector` and `tsquery` with a Turkish configuration, you can even make them unaccented to allow matching both “Istanbul” and “İstanbul” with the search string “istanbul”.

Mesut Öncel

unread,
Mar 1, 2021, 5:28:17 AM3/1/21
to django-d...@googlegroups.com
Thank you for your detailed explanation. You are right, they have tried to shape the database suitable for this structure for a long time, but removing the expression indexes will not cause a crisis. People and products using the database created by Django will already have to create a standard index and then an expression index for "UPPER". I have presented the gin index as an example, but indexes can be produced for full text search in different structures. But regardless, an expression index must be added for the "UPPER" function. Creating an expression gin index in addition to a standard gin index can be costly and even cause performance problems in large databases. Of course, it is also necessary to consider its reindex. As a result, my main expectation is not to ignore the current structure of PostgreSQL. Or if you are using "icontains", be sure to include an expression string because we should say we prefer not to use "ilike". :)

Hannes Ljungberg <hannes.l...@gmail.com>, 1 Mar 2021 Pzt, 11:31 tarihinde şunu yazdı:

Mesut Öncel

unread,
Mar 3, 2021, 9:24:35 AM3/3/21
to django-d...@googlegroups.com
Does anyone have a different opinion or will this topic be closed?

Mesut Öncel <mesuto...@gmail.com>, 1 Mar 2021 Pzt, 13:27 tarihinde şunu yazdı:
Reply all
Reply to author
Forward
0 new messages