Deprecate CICharField, CIEmailField, CITextField


Mariusz Felisiak

Jan 25, 2022, 7:39:32 AM
to django-d...@googlegroups.com

Hi y'all,

Django 3.2+ supports "db_collation" [1] for "CharField" and "TextField", along with the migration operations "CreateCollation()" and "RemoveCollation()" [2] and the database function "Collate()" [3]. Moreover, CI fields and the entire "citext" module have been discouraged since PostgreSQL 12 [4] in favor of collations. I think it's time to deprecate the CI fields in "contrib.postgres" in favor of "CharField" and "TextField" with case-insensitive collations (and remove them in Django 5.0).

Best,
Mariusz

[1] https://code.djangoproject.com/ticket/31777
[2] https://code.djangoproject.com/ticket/32046
[3] https://code.djangoproject.com/ticket/21181
[4] https://www.postgresql.org/docs/12/citext.html

Adam Johnson

Jan 25, 2022, 7:59:54 AM
to Django developers (Contributions to Django itself)
My initial concern was around the minimum PostgreSQL version that Django 5.0 will support. According to https://en.wikipedia.org/wiki/PostgreSQL#Release_history, PostgreSQL 10 is supported until 2022-11-10, and version 11 until 2023-11-09. With Django 5.0 expected in 2024-01, it should be fine to deprecate the CI fields for removal in Django 5.0. Users on old PostgreSQL versions can manage the deprecation warning until they upgrade.

So +1 from me.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/833bb13b-3db1-f35b-3d51-a2a4671b45a9%40gmail.com.

Paolo Melchiorre

Jan 25, 2022, 8:01:55 AM
to django-d...@googlegroups.com
Hi Mariusz,

I agree with you on deprecating and then removing CI fields.

I would only suggest adding some examples of migrating from CI fields
to collations in the deprecation notes, to help users migrate
easily.

So +1 from me too.

Ciao,
Paolo



--
Paolo Melchiorre

https://www.paulox.net

Tom Carrick

Jan 25, 2022, 9:17:07 AM
to django-d...@googlegroups.com
Hi,

I'm not too sure about this.

While Postgres encourages using non-deterministic collations, they're not without their downsides. For example, you can't do a LIKE query on a field using a non-deterministic collation, but you can with citext (although I don't believe there's a way to index it). Say you want to store your users' emails case-insensitively: this is great for logging in, but it makes it impossible to search for users with a particular email domain, or to find someone by the username part.

On the other hand, we could just say: don't use either, put an index on UPPER() of the column, and make sure you always use iexact.

I don't have a very strong opinion.

Cheers,
Tom


Mariusz Felisiak

Jan 25, 2022, 2:25:00 PM
to Django developers (Contributions to Django itself)
On Tuesday, January 25, 2022 at 13:59:54 UTC+1 Adam Johnson wrote:
> My initial concern was around the minimum PostgreSQL version that Django 5.0 will support. According to https://en.wikipedia.org/wiki/PostgreSQL#Release_history, PostgreSQL 10 is supported until 2022-11-10, and version 11 until 2023-11-09. With Django 5.0 expected in 2024-01, it should be fine to deprecate the CI fields for removal in Django 5.0. Users on old PostgreSQL versions can manage the deprecation warning until they upgrade.

Yes, support for PostgreSQL 10 will be dropped in Django 4.2, and support for PostgreSQL 11 will be dropped in Django 5.0.
 

Mariusz Felisiak

Jul 28, 2022, 6:55:39 AM
to Django developers (Contributions to Django itself)
> Support for PostgreSQL 10 was dropped in Django 4.1, and support for PostgreSQL 11 was dropped in Django 4.2.

Since the deprecation of CICharField, CIEmailField, and CITextField starts in Django 4.2, they will be removed in Django 5.1.
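The 4.2-to-5.1 arithmetic follows Django's deprecation policy as I understand it. A deprecated feature keeps working in the release that deprecates it and in at least the following feature release. Removals then land in an A.0 or A.1 release, never in an A.2 LTS. The sketch below assumes the A.0 / A.1 / A.2 cadence Django has used since 2.0. `removal_version` is a hypothetical helper, not a Django API.

```python
def removal_version(deprecated_in: str) -> str:
    """Hypothetical helper: the Django feature release that removes a feature
    first deprecated in `deprecated_in` (e.g. "4.2").

    Assumes the A.0 / A.1 / A.2 (LTS) cadence: deprecations from A.0 and A.1
    are removed in (A+1).0, while deprecations from the A.2 LTS are removed
    in (A+1).1, so nothing is ever removed in an LTS release.
    """
    major_str, minor_str = deprecated_in.split(".")
    major, minor = int(major_str), int(minor_str)
    if minor not in (0, 1, 2):
        raise ValueError(f"not an A.0/A.1/A.2 feature release: {deprecated_in}")
    removal_minor = 1 if minor == 2 else 0
    return f"{major + 1}.{removal_minor}"


# Compare a 4.1 deprecation with the 4.2 deprecation that actually happened.
for deprecated in ("4.1", "4.2"):
    print(deprecated, "->", removal_version(deprecated))
```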

Mariusz Felisiak

Jul 28, 2022, 7:03:10 AM
to Django developers (Contributions to Django itself)

Silvio

Aug 6, 2022, 3:32:05 PM
to Django developers (Contributions to Django itself)
Since these are PostgreSQL-specific fields that are being deprecated, might it make sense to provide a hint as to what the CreateCollation() call should be?

I'm looking around and it's not immediately obvious. The CreateCollation() examples in Django are for a German phone book, whereas in reality I think 99% of cases are going to be case-insensitive emails.

The deprecation notice in the patch currently shows an ellipsis.
On Thursday, July 28, 2022 at 7:03:10 AM UTC-4 Mariusz Felisiak wrote:
> https://code.djangoproject.com/ticket/33872

Mariusz Felisiak

Aug 8, 2022, 5:49:25 AM
to Django developers (Contributions to Django itself)
As far as I'm aware, you can use:

     from django.contrib.postgres.operations import CreateCollation

     CreateCollation("case_insensitive", "und-u-ks-level2", provider="icu", deterministic=False)

to create a case insensitive collation on PostgreSQL.
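For reference, `und-u-ks-level2` is an ICU locale. `und` is the root ("undetermined") locale, and the `-u-ks-level2` extension sets the comparison strength to secondary. Case differences are ignored, but accent differences still count. PostgreSQL's own documentation uses this locale for its case-insensitive collation example.

Below is a rough, standard-library-only sketch of the SQL such an operation should emit, based on my reading of Django's implementation. `create_collation_sql` and `quote_name` are hypothetical helpers. Running `manage.py sqlmigrate` on a real migration shows the exact statement.

```python
def quote_name(identifier: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def create_collation_sql(name: str, locale: str, *, provider: str = "libc",
                         deterministic: bool = True) -> str:
    """Hypothetical helper approximating the SQL behind
    CreateCollation(name, locale, provider=..., deterministic=...)."""
    # PostgreSQL only supports non-deterministic collations with ICU.
    if not deterministic and provider != "icu":
        raise ValueError("Non-deterministic collations require ICU.")
    options = {"locale": quote_name(locale)}
    if provider != "libc":  # libc is PostgreSQL's default provider
        options["provider"] = quote_name(provider)
    if not deterministic:
        options["deterministic"] = "false"
    args = ", ".join(f"{key}={value}" for key, value in options.items())
    return f"CREATE COLLATION {quote_name(name)} ({args})"


print(create_collation_sql("case_insensitive", "und-u-ks-level2",
                           provider="icu", deterministic=False))
```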

Johannes Maron

Apr 12, 2023, 5:26:38 AM
to Django developers (Contributions to Django itself)
Hi there,

I am sorry that I missed this in the alpha. But to the best of my knowledge, CITEXT and non-deterministic collations are not the same. They don't support the same operations and their string comparison operations are similar, yet not identical.
Furthermore, PostgreSQL doesn't discourage the use of CITEXT, but hints towards a native alternative. That's maybe more than just a subtle difference.

99% of all use cases might be email, but even email LIKE queries would be affected (good for +-searches).

Unless we want to drop support for the CITEXT extension, collations might not be a sufficient replacement.

I'd advise reverting the deprecation and keeping support, unless we make an informed decision to drop CITEXT in favor of a third-party integration.

Best
Joe!

Mariusz Felisiak

Apr 12, 2023, 6:09:06 AM
to Django developers (Contributions to Django itself)
Hi

> Unless we want to drop support for the CITEXT extension, ...

What do you mean by that? As far as I know, we don't do anything special to support the CITEXT extension 🤔.

> I'd caution to revert the deprecation and keep support ...

I'm obviously biased as the author of this proposal and patch; however, IMO, small differences between using CI fields and collations don't justify maintaining 3 additional fields that were mostly untested. Also, they are deprecated in an LTS release, so folks still have 3 more years to update their code. In the worst case, someone can create a third-party package with them.

Unless something is fundamentally broken I'm against reverting.

Best,
Mariusz

Tom Carrick

Apr 13, 2023, 3:12:45 AM
to django-d...@googlegroups.com
Hi,

I wrote most of the code for collation support, and I also argued (softly) against deprecating citext support for the reasons you stated.

However, I've changed my mind on this now. As you can't index the citext column for LIKE queries, doing these types of searches on any real amount of data is going to be too slow in most cases. I actually think the best practice right now for having searchable case-insensitive emails is to do it old-school: have a regular EmailField with an index on UPPER("email"), and then make sure you always use iexact, istartswith, etc., which will properly use the indexes and result in a faster search.
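The sketch below illustrates the UPPER() index approach without a PostgreSQL server. It uses SQLite from the Python standard library as a stand-in. On PostgreSQL, Django's iexact lookup compares UPPER() of the column with UPPER() of the parameter, and an index on UPPER("email") can serve exactly that shape. The table, index name and data are made up for illustration.

Only equality is shown. On PostgreSQL, prefix LIKE lookups such as istartswith generally also need a suitable operator class (for example text_pattern_ops) when the database does not use the C locale.

```python
import sqlite3


def case_insensitive_lookup(email: str) -> tuple[list[str], str]:
    """Run an iexact-style lookup against an UPPER() expression index.

    SQLite stands in for PostgreSQL; the WHERE clause has the same shape as
    Django's iexact lookup: UPPER(column) = UPPER(param).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
        # The "old-school" index: on UPPER(email), not on email itself.
        conn.execute("CREATE INDEX users_email_upper ON users (UPPER(email))")
        conn.executemany(
            "INSERT INTO users (email) VALUES (?)",
            [("Alice@Example.com",), ("bob@example.org",)],
        )
        sql = "SELECT email FROM users WHERE UPPER(email) = UPPER(?)"
        rows = [row[0] for row in conn.execute(sql, (email,))]
        # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
        plan = " ".join(
            row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (email,))
        )
        return rows, plan
    finally:
        conn.close()


rows, plan = case_insensitive_lookup("alice@EXAMPLE.com")
print(rows)  # → ['Alice@Example.com']
print(plan)  # should mention users_email_upper, i.e. the index is used
```

The stored value keeps its original casing, and only the comparison is case-insensitive. SQLite's UPPER() folds ASCII letters only.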

So I see very few advantages now to keeping CITEXT at all, and they're quite easy to add as a third-party package, as Mariusz suggested, if anyone is so inclined.

Cheers,
Tom


Adam Johnson

Apr 14, 2023, 5:36:46 AM
to django-d...@googlegroups.com
Just to note, for anyone that finds it useful, that I wrote a blog post on migrating to collations: https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/

But yes, I have also been thinking like Tom that indexing UPPER("email") seems to be the path of least complexity...

Johannes Maron

Apr 18, 2023, 3:52:20 AM
to django-d...@googlegroups.com
Thanks Adam,

Of course, I read your well-written article before diving into this topic; thanks for sharing.

However, I don't agree about the index. The best solution is using the CITEXT db type, which is very much alive.
Should Django deprecate support for the db type, a third-party package seems the best choice to me.
That comes with the downside of me having to maintain yet another package, but I can understand if the Django project has no interest in maintaining it.

In any event, I opened a ticket: https://code.djangoproject.com/ticket/34501

Best Joe



fly.a...@gmail.com

Apr 19, 2023, 9:48:56 AM
to Django developers (Contributions to Django itself)
Hey everyone!

Thanks for the discussion.
And special thanks to @Adam for the great article; it helped us with the migration.

What I'm struggling with now is that whenever I specify `db_collation="case_insensitive"` on a field and that field is used in `ModelAdmin.search_fields`, Django simply breaks (as it uses the `icontains` lookup by default).
That is quite unfortunate for big projects, as I have to come up with a generic solution to something that was not broken before this deprecation (and the docs do not mention this case).
It's good that Adam covered it in the article, but I feel this could be handled at a lower level than it is now. Currently, we'd need to write a manual annotation for the admin queryset in almost every project that uses usernames or emails (which, I'd guess, are something you'd want to be case-insensitive at the database level).

I wonder how we could move forward (in case reverting this is not an option) and reduce the overall stress of the aftermath.
For example, in terms of documentation, we could add a note on `db_collation` to the `icontains` page:
https://docs.djangoproject.com/en/4.2/ref/models/querysets/#icontains

But I also feel that might not be enough.

Best,
Rust

Johannes Maron

May 6, 2023, 4:50:34 AM
to django-d...@googlegroups.com
Hello again,

I trust Mariusz' assessment regarding the maintainability. In that case, I presume a separate package from a 3rd party with commercial interest might be the best option going forward.

Thanks for all the considerations and explanations.

Cheers!
Joe

Adam Johnson

May 12, 2023, 7:37:05 AM
to django-d...@googlegroups.com

> What I am struggling now with is whenever I specify `db_collation="case_insensitive"` on the field and this field is used in `ModelAdmin.search_fields` - Django simply breaks (as it by default uses `icontains` lookup).
> That is quite unfortunate for the big projects, as I have to come up with some generic solution to something that was not broken before this feature deprecation (and the docs does not mention this case).
> Good that Adam covered it in the article, but I feel that this could be handled on a lower level than right now. Currently, we'd need to write a manual annotation for admin queryset in almost every project that uses usernames or emails (which my guess is something you'd want to be case-insensitive on a database level).

Yes, I discovered this too. It's what prompted me to write the parametrized admin tests covered in my later blog post: https://adamj.eu/tech/2023/03/17/django-parameterized-tests-model-admin-classes/

Annotating appropriately is what I found to work.

Tom Carrick wrote earlier:

> I actually think the best practice right now for having searchable case-insensitive emails is to do it old-school - have a regular EmailField with an index on UPPER("email") and then make sure you always use iexact, istartswith etc. and this will properly use the indexes and result in a faster search.

I also think this may be the better approach now, but I haven't tried it.

Django 5.0 will hopefully come with generated fields: https://github.com/django/django/pull/16417. We may then be able to store the user-provided email in "email_original_cased" (or whatever) and make "email" a GeneratedField(expression=Lower("email_original_cased")), with the lowercase collation and a unique constraint. We'll have to see...
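One rough way to try the shape of that idea today is to use SQLite from the Python standard library as a stand-in for PostgreSQL and the proposed GeneratedField. This needs SQLite 3.31 or newer for generated columns. The user-provided address is stored as typed, and a stored generated column holds its lower-cased form. A unique index on that column then rejects addresses that differ only in case. Table, column and index names are illustrative, and SQLite's lower() folds ASCII letters only.

```python
import sqlite3


def make_users_table() -> sqlite3.Connection:
    """Create an in-memory users table with a derived, lower-cased email."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            email_original_cased TEXT NOT NULL,
            -- Computed by the database; the application never writes it.
            email TEXT GENERATED ALWAYS AS (lower(email_original_cased)) STORED
        )
        """
    )
    # Uniqueness is enforced on the lower-cased value only.
    conn.execute("CREATE UNIQUE INDEX users_email_unique ON users (email)")
    return conn


def register(conn: sqlite3.Connection, address: str) -> bool:
    """Insert a user; return False if the address clashes case-insensitively."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email_original_cased) VALUES (?)", (address,)
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = make_users_table()
print(register(conn, "Alice@Example.com"))   # → True
print(register(conn, "ALICE@example.COM"))   # → False
print(conn.execute("SELECT email_original_cased, email FROM users").fetchall())
```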
 
> For example, in terms of documentation, we could add a note on `db_collation` to `icontains` page:
> https://docs.djangoproject.com/en/4.2/ref/models/querysets/#icontains

That sounds like too specific a note to add. Many different data types in many different databases fail to work with some existing lookups, such as several custom fields in Django-MySQL: https://django-mysql.readthedocs.io/en/latest/model_fields/index.html


Johannes Maron

May 12, 2023, 11:20:20 AM
to django-d...@googlegroups.com
Hi,

Yes, I hope Django will continue to expand expression support. I worked so hard on the SQL compiler to facilitate those kinds of features.
Anyhow, since db collations are not an adequate replacement for CI text, we will create an open-source backport of the CITEXT fields.
Once we are done, I will open a PR to alter the documentation to point towards this option. This should allow users to choose, and will probably ease migration to Django 5 for some.

But first, I gotta play Tears of the Kingdom…

Cheers!
Joe


Johannes Maron

Sep 6, 2023, 1:56:49 PM
to Johannes Maron, django-d...@googlegroups.com
Hi again,

We started creating a third-party django-citext package to ensure future support for PostgreSQL's CITEXT extension, backed by corporate funding.
You can find the project here: https://github.com/voiio/django-citext

While doing so, I noticed a couple of small things, where I'd love some clarification to know which way to go:
  1. Will the CITextExtension stay? It's currently not deprecated, and its superclass implements array support.
  2. The documentation currently can be misleading. Would you consider proposals for some changes:
    • There is a note about performance considerations, yet I couldn't find any. There are some limitations, which rightfully need to be considered when using the citext extension.
    • The deprecation note points towards collations. However, as previously explained, they are by no means an equal replacement. Would you accept a reference to a named or unnamed third-party solution for future support of the citext extension?
  3. Finally, Django's admin or rather lookups, don't play particularly nice with collations. Something to consider in the deprecation process.

I am happy to get some feedback, especially on the extension and array support, since we haven't implemented that yet.
If you have any other pointers, feel free to leave an issue report.

Thanks!
Joe

gw...@fusionbox.com

Dec 20, 2023, 12:58:57 PM
to Django developers (Contributions to Django itself)
This breaks search_fields. I can annotate a deterministic collation for simple fields. I don't understand why I have to do workarounds to get builtin stuff to work.

There's no workaround I can figure out across joins though. I have `search_fields = ['owners__email']`. Using an annotated field `owners__email_deterministic` fails: django.core.exceptions.FieldError: Unsupported lookup 'email_deterministic' for ForeignKey or join on the field not permitted.


> As you can't index the citext column for LIKE queries, doing these types of searches on any real amount of data is going to be too slow in most cases.

I have 100k users and want to search them in the admin. The unindexed query takes 100ms, which is completely fine for this purpose.

Also, you CAN index a citext column for LIKE queries with pg_trgm.
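A trigram index can help with LIKE '%...%' searches because any substring match implies a set of shared three-character chunks. An index over those chunks can therefore narrow the candidates before the real comparison is rechecked.

The plain-Python model below illustrates the idea. `show_trgm` only imitates pg_trgm's function of the same name as I understand it: lower-case, split into alphanumeric words, pad each word with two leading spaces and one trailing space. The ASCII-only word split and the candidate filter are my own simplifications, not pg_trgm's actual LIKE-pattern handling.

```python
import re


def show_trgm(text: str) -> set[str]:
    """Simplified imitation of pg_trgm's show_trgm(): lower-case the text,
    split it into alphanumeric words, pad each word with two leading spaces
    and one trailing space, and collect every 3-character window."""
    grams: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def required_trigrams(fragment: str) -> set[str]:
    """Trigrams any row containing `fragment` must have: the unpadded 3-grams
    of each alphanumeric run in the fragment (runs shorter than 3 add none)."""
    grams: set[str] = set()
    for run in re.findall(r"[0-9a-z]+", fragment.lower()):
        grams.update(run[i:i + 3] for i in range(len(run) - 2))
    return grams


def icontains_search(rows: list[str], fragment: str) -> list[str]:
    """Filter by trigrams first (the part an index would do), then recheck,
    so the filter only discards rows that cannot possibly match."""
    needed = required_trigrams(fragment)
    candidates = [row for row in rows if needed <= show_trgm(row)]
    return [row for row in candidates if fragment.lower() in row.lower()]


print(sorted(show_trgm("cat")))  # → ['  c', ' ca', 'at ', 'cat']
print(icontains_search(["alice@example.com", "bob@test.org"], "XAMPLE"))  # → ['alice@example.com']
```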
 

Matthew Graham

Dec 20, 2023, 1:16:01 PM
to django-d...@googlegroups.com
I only recently started trying to move to collations instead of citext, and it broke the regex validation, as a non-deterministic collation can't support the regex validator. Like, what? Why are we replacing something with an alternative that can't actually do the job?

Silvio

Feb 21, 2024, 7:58:51 PM
to Django developers (Contributions to Django itself)
Coming in again now that I've looked at upgrading.

@Adam: your post was useful. But can you actually say you prefer the new approach?

I'm going to be honest: this is a lot of hoops and gotchas. What did we actually gain by deprecating this? I'm seeing maybe 15-20 lines of code that will be removed? Maybe a touch of ideological purity?

Is it unheard of to cancel plans to deprecate this, since it's still in the code base?

99% of the use case is for CIEmailField, and 99% of people want this to be searchable, likely within nested models as Matthew is trying to do.

So we took something that worked really well and removed it. I just don't see the gain.

I hope we can change your minds. This is the first deprecation in 15 years of usage that I just can't get behind.

Johannes Maron

Feb 22, 2024, 3:13:40 AM
to django-d...@googlegroups.com
I'll just say it: the decision certainly had good intentions, but maybe wasn't fully informed. It happens.

But since we don't really have a process to revert a deprecation, I would recommend using the django-citext package. It's a drop-in replacement with the same license as Django and a corporate sponsor to ensure maintenance.

It takes one pip install and a sed command to replace some imports, and you grab a coffee and forget about it.

Silvio

Feb 23, 2024, 8:13:52 PM
to Django developers (Contributions to Django itself)
True, not the end of the world. Just ... another dependency. The NPM world has traumatized me. Many thanks for creating that.

If there's nothing that can be done, it's time to move on. But worth asking. 

(Interestingly, even with the deprecation, historical migration fields still need to be supported, so I think some code has to remain)
