Case-insensitive email as username

Aymeric Augustin

Nov 22, 2015, 1:56:37 PM
to django-d...@googlegroups.com
Hello,

I spent a good part of today implementing what must be the most common scenario for custom user models: case-insensitive email as username. (Yes. This horse has been beaten to death. Multiple times.)

Since it was the first time I implemented a custom user model from scratch by myself, I’d like to share my experience in case that’s useful to others. Do you think there’s a better solution? Do you have concrete ideas for improving Django in this area?

The main alternative I’m aware of is a custom email field based on PostgreSQL’s citext type. Perhaps I’ll try that next time. Anyway, here’s what I did this time.


1) The documentation is excellent

I know a lot of effort has been put into improving it and it shows. Congratulations to everyone involved.


2) Custom indexes would be convenient

Since I want to preserve emails as entered by the users, I cannot simply lowercase them. That would have been too easy.

I ended up with this migration to add the appropriate unique index on LOWER(email). See the comments for details.

    operations = [
        migrations.CreateModel(
            name='User',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                # … 
                # unique=True was removed from the autogenerated line; a unique index is created below.
                ('email', models.EmailField(error_messages={'unique': 'A user with that email already exists.'}, max_length=254, verbose_name='email address')),
                # … 
            ],
            # … 
        ),
        migrations.RunSQL(
            # Based on editor._create_index_sql(User, [User._meta.get_field('email')], '_lower')
            sql='CREATE UNIQUE INDEX "blabla_user_email_f86edd9d_lower" ON "blabla_user" (LOWER("email"))',
            reverse_sql='DROP INDEX "blabla_user_email_f86edd9d_lower"',
            state_operations=[
                migrations.AlterField(
                    model_name='user',
                    name='email',
                    field=models.EmailField(error_messages={'unique': 'A user with that email already exists.'}, max_length=254, unique=True, verbose_name='email address')
                ),
            ],
        ),
    ]

It took me some time to get there. At first I tried simply removing unique=True on the email field but that didn’t work well.

I know there’ve been discussions about custom indexes. They would make this use case much easier.
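
As a minimal sketch of what the unique index on LOWER(email) does, the snippet below creates the same kind of index and checks that it rejects an address differing only by case while keeping the first one exactly as entered. It rests on assumptions: it uses Python's sqlite3 module (SQLite 3.9 or later supports indexes on expressions) instead of PostgreSQL, and the index name is made up for the example. SQLite's LOWER() only folds ASCII letters, so non-ASCII addresses may behave differently than on PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "blabla_user" ('
    '"id" INTEGER PRIMARY KEY, '
    '"email" VARCHAR(254) NOT NULL)'
)
# Hypothetical index name. Uniqueness lives in the index, not on the column,
# so the column itself keeps whatever case the user typed.
conn.execute(
    'CREATE UNIQUE INDEX "blabla_user_email_lower" '
    'ON "blabla_user" (LOWER("email"))'
)

insert = 'INSERT INTO "blabla_user" ("email") VALUES (?)'
conn.execute(insert, ("Alice@Example.com",))

try:
    # Same address, different case: the expression index should refuse it.
    conn.execute(insert, ("alice@example.COM",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored_emails = [row[0] for row in conn.execute('SELECT "email" FROM "blabla_user"')]
print(duplicate_rejected, stored_emails)
```

The ORM knows nothing about this index, which is why the migration above still declares unique=True in state_operations.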


3) Redefining forms isn’t too bad

I was getting quite bored of copy-paste-tweaking snippets (custom model, custom manager, custom admin, …) when I got to defining custom forms. Fortunately, a small mixin was all I needed.

(Read on for why this code uses `User.objects.get_by_natural_key(email)`.)

from django.contrib.auth.forms import UserChangeForm as BaseUserChangeForm
from django.contrib.auth.forms import UserCreationForm as BaseUserCreationForm
from django.core.exceptions import ValidationError


class CaseInsensitiveUniqueEmailMixin:
    """
    ModelForm mixin that checks for email unicity, case-insensitively.

    """

    def clean_email(self):
        email = self.cleaned_data['email']
        User = self._meta.model
        field = User._meta.get_field('email')
        try:
            existing = User.objects.get_by_natural_key(email)
        except User.DoesNotExist:
            return email
        # When editing a user, the lookup finds that same user; that's not a conflict.
        if existing.pk == self.instance.pk:
            return email
        raise ValidationError(
            message=field.error_messages['unique'],
            code='unique',
        )


class UserChangeForm(CaseInsensitiveUniqueEmailMixin, BaseUserChangeForm):
    pass


class UserCreationForm(CaseInsensitiveUniqueEmailMixin, BaseUserCreationForm):
    pass


4) The ugly hack

My first idea was to write a custom authentication backend to look up users by email case-insensitively. But I was getting bored and I noticed that django.contrib.auth uses `UserModel._default_manager.get_by_natural_key` to look up users. So...

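from django.contrib.auth.models import BaseUserManager
from django.db.models.functions import Lower


class UserManager(BaseUserManager):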
    """
    Manager for the User class defined below.

    Quite similar to django.contrib.auth.models.UserManager.

    """

    # ...

    def get_by_natural_key(self, email):
        qs = self.annotate(email_lower=Lower('email'))
        return qs.get(email_lower=email.lower())

/!\ This is entirely dependent on implementation details of django.contrib.auth. It can break when you upgrade Django; don’t blame it on me. /!\

That said, the nice side effect of this implementation is that it makes the unicity check in createsuperuser work as expected. I’m not aware of any other way to fix it with the database schema I chose.

I suppose an implementation of custom unique indexes with support for checking unicity constraints would make that point moot.
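
For readers who want to see the lookup outside the ORM, here is a rough sketch of what the annotated query in get_by_natural_key amounts to: compare LOWER(email) against the lowercased input. It is an illustration under assumptions, not the code from my project: it runs on sqlite3 rather than PostgreSQL, the table layout and index name are invented, and the function is a free function rather than a manager method. Python's str.lower() handles Unicode while SQLite's LOWER() only folds ASCII, so the two sides only agree for ASCII addresses here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blabla_user (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)
# Hypothetical index name; same idea as the unique index on LOWER(email).
conn.execute(
    "CREATE UNIQUE INDEX blabla_user_email_lower ON blabla_user (LOWER(email))"
)
conn.executemany(
    "INSERT INTO blabla_user (email) VALUES (?)",
    [("Alice@Example.com",), ("bob@example.com",)],
)


def get_by_natural_key(conn, email):
    """Return the stored email that matches `email`, ignoring case."""
    rows = conn.execute(
        "SELECT email FROM blabla_user WHERE LOWER(email) = ?",
        (email.lower(),),
    ).fetchall()
    if not rows:
        raise LookupError("no user with email %r" % email)
    # The unique index guarantees at most one match.
    return rows[0][0]


found = get_by_natural_key(conn, "ALICE@EXAMPLE.COM")
print(found)
```

The value returned is the address as originally entered, whatever case was used for the lookup.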


Best regards,

-- 
Aymeric.



Carl Meyer

Nov 23, 2015, 5:53:36 PM
to django-d...@googlegroups.com
Hi Aymeric,

On 11/22/2015 11:56 AM, Aymeric Augustin wrote:
> I spent a good part of today implementing what must be the most common
> scenario for custom user models: case-insensitive email as username.
> (Yes. This horse has been beaten to death. Multiple times.)
>
> Since it was the first time I implemented a custom user model from
> scratch by myself, I’d like to share my experience in case that’s useful
> to others. Do you think there’s a better solution? Do you have concrete
> ideas for improving Django in this area?
>
> The main alternative I’m aware of is a custom email field based on
> PostgreSQL’s citext type. Perhaps I’ll try that next time. Anyway,
> here’s what I did this time.

I've implemented the CITEXT-based solution a couple times; I think for a
PostgreSQL-based project it's the preferable option overall.

The main complexity is just in adding a dependency on a Postgres
extension, which is unfortunately non-trivial in general -- you can't
easily just stick "CREATE EXTENSION" in a migration since creating the
extension requires database super-user privileges, which the Django db
user may not have. I usually just punt by documenting that the project
requires you to set up your database with the extension installed.

I suppose the best we could do to ease this would be to add a
CreateExtension migration operation in contrib.postgres that, if lacking
super-user permissions, simply errors out and tells you what SQL you
need to run manually as a super-user?

I also had to add a connection_created handler to register a conversion
to unicode for CITEXT fields, as psycopg2 otherwise returned them as str
(this was Python 2 - that issue may not exist in Python 3). This issue
would be easy for Django to fix if we want to bake in any level of
CITEXT support in contrib.postgres.

Once you clear those initial hurdles, the CITEXT solution is smooth
sailing. All the rest of the issues covered in your post go away
entirely, since the database handles all the case-management (original
case preserved but ignored in lookups and indexes) internally, and the
ORM doesn't need to know anything about it.
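
To illustrate "original case preserved but ignored in lookups and indexes" without a PostgreSQL server, here is a small sketch that uses SQLite's NOCASE collation as a stand-in for a CITEXT column. That substitution is an assumption made purely for the example: NOCASE only folds ASCII letters and is not the same feature as CITEXT, and the table name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE plays the role of the CITEXT column type in this sketch.
conn.execute(
    "CREATE TABLE blabla_user ("
    "id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL UNIQUE COLLATE NOCASE)"
)
conn.execute("INSERT INTO blabla_user (email) VALUES (?)", ("Alice@Example.com",))

# A plain equality lookup: no LOWER() and no special index needed.
row = conn.execute(
    "SELECT email FROM blabla_user WHERE email = ?", ("alice@example.com",)
).fetchone()

try:
    # The ordinary UNIQUE constraint already ignores case.
    conn.execute("INSERT INTO blabla_user (email) VALUES (?)", ("ALICE@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row[0], duplicate_rejected)
```

The point is that the query and the constraint are the ordinary ones, which is why nothing on the ORM side has to change.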

Carl

Carl Meyer

Nov 23, 2015, 6:18:35 PM
to django-d...@googlegroups.com
On 11/23/2015 03:52 PM, Carl Meyer wrote:
...
> I suppose the best we could do to ease this would be to add a
> CreateExtension migration operation in contrib.postgres that, if lacking
> super-user permissions, simply errors out and tells you what SQL you
> need to run manually as a super-user?

...and now that I look, it appears we already have the `CreateExtension`
operation :-) It doesn't appear to do anything special to handle the
permissions problem. Maybe there's not much useful that can be done.

Carl

Podrigal, Aron

Nov 23, 2015, 10:41:10 PM
to Django developers (Contributions to Django itself)

Why not create the index on LOWER(email) and do the lookup with LOWER as well?

Aymeric Augustin

Nov 24, 2015, 6:33:54 AM
to django-d...@googlegroups.com
2015-11-23 23:52 GMT+01:00 Carl Meyer <ca...@oddbird.net>:

> I've implemented the CITEXT-based solution a couple times; I think for a
> PostgreSQL-based project it's the preferable option overall.

Perhaps we should add native support in contrib.postgres?

I'm foreseeing a small difficulty in terms of API. This is a behavior I'd like
to "mix in" to some fields but I can't say if that will be easy to implement.

The general ideas would be:

# A mixin

class CITextField(TextField):
    # ...

# Case-insensitive versions of some built-in Django default fields
# (if we consider that makes sense)

class CIEmailField(CITextField, EmailField):
    pass

# The possibility for users to make custom fields case insensitive

class CITagField(CITextField, TagField):
    pass

--
Aymeric.

Chris Foresman

Nov 24, 2015, 11:30:51 AM
to Django developers (Contributions to Django itself)
We usually just handle this with a custom serializer (or form) field that converts all input to lowercase. That way we don't have to change any lookups or anything; all emails that come into the system are already lowercase. Of course, that doesn't preserve what users enter, but IME anything uppercase is just a fault of using the wrong text box on iOS or Android.

Carl Meyer

Nov 24, 2015, 2:47:26 PM
to django-d...@googlegroups.com
On 11/24/2015 04:33 AM, Aymeric Augustin wrote:
> 2015-11-23 23:52 GMT+01:00 Carl Meyer <ca...@oddbird.net>:
>
> I've implemented the CITEXT-based solution a couple times; I think for a
> PostgreSQL-based project it's the preferable option overall.
>
> Perhaps we should add native support in contrib.postgres?

I'm in favor. I think it's likely as common a need (if not more common)
as the other utilities provided there -- even if most people who need it
are currently probably using an ORM-level or form-level solution instead
of a db-level one.

> I'm foreseeing a small difficulty in terms of API. This is a behavior I'd
> like to "mix in" to some fields but I can't say if that will be easy to
> implement.
>
> The general ideas would be:
>
> # A mixin
>
> class CITextField(TextField):
>     # ...
>
> # Case-insensitive versions of some built-in Django default fields
> # (if we consider that makes sense)
>
> class CIEmailField(CITextField, EmailField):
>     pass
>
> # The possibility for users to make custom fields case insensitive
>
> class CITagField(CITextField, TagField):
>     pass

Here's the entire implementation of my CiEmailField:

class CiEmailField(models.EmailField):
    """An EmailField that uses the CITEXT Postgres column type."""
    def db_type(self, connection):
        return 'CITEXT'

So I think that's quite amenable to a mixin implementation.
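
A sketch of why the mixin shape works, using plain Python stand-ins instead of Django's field classes so it runs anywhere. Field, EmailField, and CITextMixin below are hypothetical, heavily simplified classes written for this example, not Django's API. The only thing being shown is method resolution order: the mixin has to come first in the bases so that its db_type() wins, while everything else (validation here) is still inherited from the concrete field.

```python
class Field:
    """Stand-in for a model field base class (hypothetical, simplified)."""

    def db_type(self, connection):
        return "text"


class EmailField(Field):
    """Stand-in for an email field: own column type plus validation."""

    def db_type(self, connection):
        return "varchar(254)"

    def validate(self, value):
        if "@" not in value:
            raise ValueError("Enter a valid email address.")


class CITextMixin:
    """Mixin that only overrides the column type, like CiEmailField above."""

    def db_type(self, connection):
        return "CITEXT"


class CIEmailField(CITextMixin, EmailField):
    """Mixin listed first: its db_type() takes precedence."""


class WrongOrderEmailField(EmailField, CITextMixin):
    """Mixin listed last: EmailField.db_type() shadows it."""


ci_type = CIEmailField().db_type(connection=None)
wrong_type = WrongOrderEmailField().db_type(connection=None)
print(ci_type, wrong_type)
```

With the bases in the wrong order the class still works but silently keeps the case-sensitive column type, so the ordering would be worth documenting.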

Carl
