Case-insensitive non-deterministic collation

Mike Dewhirst

Aug 5, 2023, 4:05:42 AM
to Django users
The following warning triggered a bit of research, and it looks like a significant amount of study will be required to find the collation needed ...


django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1.
        HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead.


Does anyone have experience they would like to share? What replaces that ellipsis?

The primary use case is to establish case-insensitivity when checking names - including usernames, company names and abbreviations/acronyms. Maybe there is a better way to handle that?

This is my typical PostgreSQL database spec ...

CREATE DATABASE xxxx
    WITH
    OWNER = miked
    ENCODING = 'UTF8'
    LC_COLLATE = 'C'
    LC_CTYPE = 'C'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1
    IS_TEMPLATE = False;

Many thanks for any help

Cheers

Mike


  

Chetan Ganji

Aug 5, 2023, 6:00:07 AM
to django...@googlegroups.com
Hi Mike

RE: The primary use case is to establish case-insensitivity when checking names - including usernames, company names and abbreviations/acronyms. 

I don't know anything about db_collation.
The four lookups below should cover the most common scenarios.
https://docs.djangoproject.com/en/4.2/ref/models/querysets/#field-lookups

ksnip_20230805-152408.png

Regards,

Chetan Ganji
+91-900-483-4183


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/2eccab9e-e296-55e0-05de-e8d4cf708262%40dewhirst.com.au.

Mike Dewhirst

Aug 6, 2023, 3:03:14 AM
to django...@googlegroups.com
On 5/08/2023 7:58 pm, Chetan Ganji wrote:
> Hi Mike
>
> RE: The primary use case is to establish case-insensitivity when checking names - including usernames, company names and abbreviations/acronyms.
>
> I don't know anything about db_collation.

Me neither

> The four lookups below should cover the most common scenarios.

Actually that was how I did it originally. I switched to the PostgreSQL CI field because everything is done in the database, which is much faster, and my code is much reduced, leaving fewer opportunities for bugs.

Judging from the Django release notes and the PostgreSQL docs there should be a straightforward answer to my question. Researching the correct answer is complex enough to make me ask here first.

Cheers

Mike



-- 
Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically decrypt my signature and be sure it came from me. Your
email software can handle signing.

Chetan Ganji

Aug 6, 2023, 7:18:50 AM
to django...@googlegroups.com
Check this out.
https://gist.github.com/hleroy/2f3c6b00f284180da10ed9d20bf9240a

# According to the Django documentation, it's preferable to use non-deterministic collations
# instead of the citext extension on PostgreSQL 12 and later.
# Example migration to create the case-insensitive collation

from django.contrib.postgres.operations import CreateCollation
from django.db import migrations


class Migration(migrations.Migration):

    operations = [
        CreateCollation(
            'case_insensitive',
            provider='icu',
            locale='und-u-ks-level2',
            deterministic=False,
        ),
    ]


# Example model using the db_collation parameter introduced in Django 3.2

from django.db import models


class Tag(models.Model):
    name = models.CharField(max_length=50, db_collation='case_insensitive')

    class Meta:
        ordering = ['name']

    def __str__(self):
        return self.name
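
For anyone who would rather create the collation by hand (in psql, say) than through a migration, the operation above corresponds roughly to a PostgreSQL `CREATE COLLATION` statement. The sketch below only builds the SQL text. The helper name `create_collation_sql` is made up for illustration and is not part of Django, and it assumes trusted input because it does no escaping.

```python
def create_collation_sql(name, provider, locale, deterministic=True):
    """Build a PostgreSQL CREATE COLLATION statement as a string.

    Hypothetical helper for illustration only; inputs are assumed to be
    trusted, so no escaping is attempted.
    """
    options = [
        f"provider = {provider}",
        f"locale = '{locale}'",
        f"deterministic = {'true' if deterministic else 'false'}",
    ]
    joined = ", ".join(options)
    return f'CREATE COLLATION "{name}" ({joined});'


# Same values as the migration in this message.
print(create_collation_sql(
    "case_insensitive", "icu", "und-u-ks-level2", deterministic=False
))
```

Non-deterministic collations need PostgreSQL 12 or later, and the `icu` provider needs a server built with ICU support.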

Regards,
Chetan Ganji
+91-900-483-4183

Mithilesh Rawani

Aug 6, 2023, 12:39:11 PM
to django...@googlegroups.com
Any help visit www.bansloi.com this is django website 

Mike Dewhirst

Aug 6, 2023, 10:29:28 PM
to django...@googlegroups.com
On 6/08/2023 9:17 pm, Chetan Ganji wrote:

Thanks Chetan

I have seen that 'icu' and 'und-whatever...' in various places on the web - so it seems to be spreading - but I haven't had the brainspace to understand it yet.

I'll try an experiment with provider='C' and locale='C' because that is how most of my databases are already established. If that passes my tests I might move on to other things.

From what I can see, PostgreSQL are likely to deprecate citext as inelegant. That would be why Django has deprecated it.

Thanks again.

Mike


Mike Dewhirst

Aug 9, 2023, 2:57:23 AM
to Django users
My tests stopped working so I have decided to abandon case-insensitive fields and do it all manually.

Thanks everyone.

Cheers

Mike

Mike Dewhirst

Aug 15, 2023, 3:30:48 AM
to django...@googlegroups.com
Found a great article by Adam Johnson written in February ...

https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/

Covers all the bases.

Thank you Adam

Cheers

Mike

Vitor Freitas

Aug 17, 2023, 10:35:16 PM
to django...@googlegroups.com
Hi Mike,

On Tue, Aug 15, 2023 at 4:30 AM Mike Dewhirst <mi...@dewhirst.com.au> wrote:

This is a great reference. It helped me out with the migration from postgresql ci fields to db collations.

Everything about this is new for me as well. I'm sure the db collation strategy is more powerful and I can see the benefits.

However, the postgresql ci fields were way easier to implement.

Right now I'm testing it out on a smaller project. One problem I'm currently facing is that exposing fields that have the db_collation configuration to django-filters or to Django Admin search parameters causes an exception:

NotSupportedError
nondeterministic collations are not supported for LIKE

This is the collation that I'm using:

CreateCollation(
  "case_insensitive",
  provider="icu",
  locale="und-u-ks-level2",
  deterministic=False,
)

Anyway, all the icu / und-u-ks stuff looks a little bit confusing. It would be good to have some guidelines or quick recipes in the docs to help us make the migration :-)

Kind regards,
Vitor
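
For readers puzzled by the same locale string, it can be read piece by piece: `und` is the "undetermined" language (ICU's root locale), `-u-` starts the Unicode extension, and `ks-level2` sets the collation strength to level 2, which compares base letters and accents but ignores case. The sketch below just splits such a tag into its parts. The function name is made up, and the parsing is deliberately simplified: it assumes one language subtag and that each key after `-u-` has exactly one value.

```python
def parse_collation_locale(tag: str) -> tuple[str, dict[str, str]]:
    """Split a locale such as 'und-u-ks-level2' into (language, settings).

    Simplified illustration: assumes a single language subtag, an optional
    '-u-' extension, and exactly one value per extension key.
    """
    language, _, extension = tag.partition("-u-")
    parts = extension.split("-") if extension else []
    if len(parts) % 2:
        raise ValueError(f"expected key-value pairs after '-u-': {tag!r}")
    # Pair up alternating keys and values, e.g. ['ks', 'level2'].
    settings = dict(zip(parts[0::2], parts[1::2]))
    return language, settings


language, settings = parse_collation_locale("und-u-ks-level2")
print(language)  # und
print(settings)  # {'ks': 'level2'}
```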
 

Mike Dewhirst

Aug 18, 2023, 12:18:38 AM
to django...@googlegroups.com
On 18/08/2023 12:34 pm, Vitor Freitas wrote:
> Hi Mike,
>
> On Tue, Aug 15, 2023 at 4:30 AM Mike Dewhirst <mi...@dewhirst.com.au> wrote:
>
> This is a great reference. It helped me out with the migration from postgresql ci fields to db collations.
>
> Everything about this is new for me as well. I'm sure the db collation strategy is more powerful and I can see the benefits.
>
> However, the postgresql ci fields were way easier to implement.

I agree



> Right now I'm testing it out on a smaller project. One problem I'm currently facing is that exposing fields that have the db_collation configuration to django-filters or to Django Admin search parameters causes an exception:
>
> NotSupportedError
> nondeterministic collations are not supported for LIKE

Yes. I got the same error ...

https://code.djangoproject.com/ticket/33901   closed Bug (fixed)

From the discussion in that ticket I got the impression that maybe I should postpone using collations until after I upgrade from Django 3.2 to 4.2.

I was using a very similar collation.

I have already removed my CI fields so I won't put them back.

I manually (__iexact) check to prevent duplicate names for users and companies and the sorting is not critical for me at the moment.
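
A manual check of this kind might look roughly like the plain-Python sketch below. It is an analogue, not the code from this project: in Django the comparison would normally be pushed to the database with something like `filter(name__iexact=value).exists()`. The function name and sample data are invented for illustration, and `str.casefold()` may not match the database's own case mapping for every non-ASCII character.

```python
def find_case_insensitive_duplicate(candidate, existing_names):
    """Return the existing name equal to candidate ignoring case, else None."""
    wanted = candidate.casefold()
    for name in existing_names:
        if name.casefold() == wanted:
            return name
    return None


# Invented sample data standing in for rows already in the database.
companies = ["Acme Pty Ltd", "NASA", "Django Software Foundation"]

print(find_case_insensitive_duplicate("nasa", companies))     # NASA
print(find_case_insensitive_duplicate("Initech", companies))  # None
```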

I think I'll avoid collations for now. I'm running out of brainspace and don't have the time to do the exhaustive research needed to correctly define a bug for a new ticket. I'll press on with workarounds until I'm absolutely forced to upgrade to 4.2 and hope that some generous soul has sorted it out by then.

The fact is it is new for PostgreSQL as well as Django so it isn't surprising to see such wrinkles.

Cheers

Mike
