#20824 - Email-based auth contrib module; some help required

376 views
Skip to first unread message

Russell Keith-Magee

unread,
Feb 26, 2014, 2:33:24 AM2/26/14
to Django Developers
Hi all,

tl;dr; I've been working on a last little feature for 1.7 - an email-based User model for contrib, but I need some help/guidance on the last step - getting the tests working.

The idea here is that we just ship a normal auth.User, but with email identification rather than username. This is essentially the 80% case for requests for custom User models anyway, and it provides an example of "how to do it" for anyone that wants to follow the lead.

I've also taken the opportunity to do a bit of refactoring of auth forms and the AbstractUser to make re-use of common parts a bit easier.

Code is available here. Other than bikeshed stuff (name of module, name of classes etc) it shouldn't be too controversial:


There are two things still pending on this patch:

 1) Documentation. There isn't much to document, but obviously this is required.

 2) Tests that work. 

It's the second point where I need some assistance (or at the very least, some suggestions of directions I can go with this).

There isn't that much that requires testing -- after all, it's a fairly vanilla User model. A bit of light testing of forms, some checks to make sure that the models can be created, and that they play well with ModelBackend, and so on.

However, there's a problem with including these tests as part of Django's own testing infrastructure, due to clashes with Django's default User model.

In a production app, you either want a normal Django User model, in which case you don't include contrib.emailauth in your app; or you include contrib.emailauth, and you set AUTH_USER_MODEL must be set (I haven't added a formal check for this, but I could). As a result, there's no confusion over which User model is active. I've done some manual testing, and the user model all works as expected.

While running Django's test suite, however, the suite installs both contrib.auth and contrib.emailauth, but *doesn't* set AUTH_USER_MODEL to emailauth -- it can't, because we need to test default User behaviours. 

Both auth.User and emailauth.User have relations with Group, and the names clash, so the test suite won't start. 

If you deliberately silence the model integrity system checks, you get other errors, caused by the fact that there's confusion over which AUTH_USER_MODEL is active at any given time. 

So - I'm looking for suggestions for how this can be handled in Django's test suite. 

I've thought about explicitly excluding contrib.emailauth from the test suite's INSTALLED_APPS, but then the tests in emailauth don't get discovered and executed. 

Even if you reorganise the tests to make them part of the "django" tests, rather than tests that are part of the contrib app, you get problems because you end up with two models with the same relation name to Group.

So - anyone got any ideas how to handle this? Is there some feature of the new app loading code that I can exploit here?

Yours,
Russ Magee %-)

Tilman Koschnick

unread,
Feb 26, 2014, 4:39:53 AM2/26/14
to django-d...@googlegroups.com
Hi Russell,

On Wed, 2014-02-26 at 15:33 +0800, Russell Keith-Magee wrote:

> The idea here is that we just ship a normal auth.User, but with email
> identification rather than username.

I have just implemented an email based User model as well, and was
wondering about case sensitivity in the local part of the address.

Most if not all final MTAs treat email addresses as case insensitive,
e.g. User.Name@... ist handled the same way as user.name@..., and some
users might not take care or notice of which form they used during
registration. A common recommendation is to preserve but subsequently
ignore case.

I am handling this by using iexact lookups for users:

def get_by_natural_key(self, username):
return self.get(email__iexact=username)

In the database, uniqueness is enforced by an expression index:

CREATE UNIQUE INDEX app_user_email_key ON app_user (upper(email))

I am not sure if DBMSs other than PostgreSQL support expression indexes;
either way, there is not Django support (yet).

I don't know if this is of only theoretical concern, or a real world
issue - maybe others using email based users could share their
experience (my project is not deployed yet).

Kind regards, Til


Wim Feijen

unread,
Feb 26, 2014, 8:06:55 AM2/26/14
to django-d...@googlegroups.com
Hi Tilman,

Thanks for bringing this up. I lowercase my e-mail addresses every time - and when I forget I am wading through a pile of shit. (excusez le mot)

Your solution looks neater, because it maintains user input. I have had users who used Stefan....@gmail.com once and later on: stefan....@gmail.com to log in. (not using his actual name)

In addition, we would need a proper form error when we try to create a user having a case-insensitive duplicate e-mail.

Wim 

Mark Lavin

unread,
Feb 26, 2014, 8:55:15 AM2/26/14
to django-d...@googlegroups.com
Welcome to the pain that third party app maintainers have already been dealing with for the past year. I ran into this problem as well when I added custom user support to one of the apps I maintain. As you noted you can't simply change the AUTH_USER_MODEL setting because once the test DB is created the FK references are set. The only solution I found was to use tox (which I was already using to test versus multiple Python/Django versions) and run the full suite once with the standard user model and again with a custom user model. Some tests are skipped when the default user is used and others are skipped with the custom user is in use. You can see it in action here: https://github.com/mlavin/django-all-access

I'd love to see a more general solution to this but I have a feeling anything that attempts to modify the DB tables between tests is destined to fail.

Best,

Mark

Aymeric Augustin

unread,
Feb 26, 2014, 9:33:00 AM2/26/14
to django-d...@googlegroups.com
2014-02-26 8:33 GMT+01:00 Russell Keith-Magee <rus...@keith-magee.com>:
Is there some feature of the new app loading code that I can exploit here?

Unfortunately not. One of the main differences between Arthur Koziel's GSoC and what I committed is that I removed the "reload()" feature for the app cache. That feature would have solved your problem.

It was implemented by removing models modules from sys.modules and re-importing them. Since imports can have side effects in Python, and considering the amount of pain duplicate imports have caused before Carl fixed the projet layout, I didn't want to reimport modules. I wasn't comfortable with hacking sys.modules either because smart hacks tend to backfire.

I've been thinking about creating a copy of auth.Group tied to EmailUser -- which would just mean executing of copy of the definition of the Group model while AUTH_USER_MODEL points to EmailUser -- and injecting it into the app registry by abusing private APIs. But that would require a copy of auth.Permission and any related model, recursively, which is impractical.

You could hack into auth.Group internals to remove and recreate its relations, but that's hardly setting a good precedent.

The current implementation of the ORM assumes that your models don't change at runtime and I'm not sure you can work around that sustainably.

--
Aymeric.

Russell Keith-Magee

unread,
Feb 26, 2014, 7:11:47 PM2/26/14
to Django Developers

This is a good point - I hadn't fully considered the issue of case in email addresses. 

I agree that a case-insensitive index would be ideal; however, that's not an option we have available across databases. 

Wim's suggestion to do case insensitive searches in forms, get_by_natural_key etc sounds like the only option that would be available to us. We could go the extra step to cleanse data on the way into the database (e.g., forcing lowercase on save) but we can't guarantee that data isn't finding it's way into the user table from other sources, so we shouldn't rely on the fact that the database contains a "known to be in lower case" representation.

Yours,
Russ Magee %-)

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/fab46e55-7dad-4d7d-be55-ea13b606431d%40googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

Curtis Maloney

unread,
Feb 26, 2014, 7:31:57 PM2/26/14
to django-d...@googlegroups.com
Doesn't the UserManager already have a "normalize_email" method which lower-cases the domain and leaves the mailbox name alone?

IMHO It's "proper" to leave it this way by default, and probably mention in the docs it's used so if you want to change it, that's the hook.

[We went through this recently with a client.  They opted for all lower case but only because of legacy data.]

--
Curtis



Russell Keith-Magee

unread,
Feb 26, 2014, 7:43:35 PM2/26/14
to Django Developers
On Thu, Feb 27, 2014 at 8:31 AM, Curtis Maloney <cur...@acommoncreative.com> wrote:
Doesn't the UserManager already have a "normalize_email" method which lower-cases the domain and leaves the mailbox name alone?

IMHO It's "proper" to leave it this way by default, and probably mention in the docs it's used so if you want to change it, that's the hook.

It does - assuming you use User.objects.create_user() to create all your users. However, the UserCreationForm doesn't use this (and hasn't ever used this); it also doesn't account for fixtures, or any other path into the database that might exist. 

So - while normalising case is probably a good idea, and should probably be added to the Create/Update User form, the searches will still need to be case insensitive.

Yours,
Russ Magee %-)

Atul Bhouraskar

unread,
Feb 26, 2014, 8:01:40 PM2/26/14
to django-d...@googlegroups.com
Won't normalize_email will allow two distinct users us...@example.com and Us...@example.com to be created? Case insensitive searches will return multiple users for a 'get'.

Perhaps the closest we can get is to ensure that any user created using Django functions is saved with a consistent case transformation and then perform an *exact* search after applying the same transformation to the input?

One idea could be to add a 'transform_email' or similar hook that by default calls normalize_email and ensure that it is applied both to data that is about to be saved and to search terms. Projects that wish to change the behaviour can simply override transform_email to perform for example lowercase conversion if so desired.

Atul


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Rafał Pitoń

unread,
Feb 26, 2014, 11:21:26 PM2/26/14
to django-d...@googlegroups.com
I've got plenty ideas for approaching issue of enforcing case insensitive uniqueness of email on DB level while still maintaining original case of name part. They differ in level of hackiness or amount of change needed, but if somebody has to make fool of himself proposing them, please accept my humble sacrifice. ;)

1. Create extra model unique field containing lowercase email or some hash of it - this looks very hacky thing to do that would give plenty of people "ahaaa!" moment when they run into problems... and they won't be pleased aftherwards.

2. Add ci_unique option to the ORM fields - this will need changes in the ORM's and migrations implementation, plus note in docs that it won't fly on certain DB's and other means would be used.

3. Add "case_sensitive=True" kwarg to create_unique for use in migrations that would make created unique constraint case insensitive. Then, add note to the user auth doc's about gotcha with case sensitivity and provide extra documentation with examples for solving this problem via that function and custom migrations. My favorite because it appears to be least invasive towards existing codebase, may be reauseable for other custom use-aces, but may be considered secret knowledge by people only just beginning with Django if not presented correctly in docs.

Camilo Torres

unread,
Feb 27, 2014, 6:50:38 PM2/27/14
to django-d...@googlegroups.com
normalize_email will indeed allow both us...@example.com and Us...@example.com to be different entities. From the user perspective, this is an error. Most probably the user enters some day their email in all upper case because he pressed the CapsLock key, or copy pasted a transcript of his email, etc., the user could not be able to figure out the difference and simply could not log in.

Is preferable to think, in this case, in protecting the user from their own mistake (from our perspective as programmers), and do as Atul Bhouraskar says: transform the email data in a consistent way before insert/update/search to the DB.

waylan

unread,
Feb 27, 2014, 9:25:31 PM2/27/14
to django-d...@googlegroups.com
On Thursday, February 27, 2014 6:50:38 PM UTC-5, Camilo Torres wrote:
normalize_email will indeed allow both us...@example.com and Us...@example.com to be different entities. From the user perspective, this is an error. Most probably the user enters some day their email in all upper case because he pressed the CapsLock key, or copy pasted a transcript of his email, etc., the user could not be able to figure out the difference and simply could not log in.

How about a more common real world situation -- on mobile devises? They will often "helpfully" assume the first letter of the first word in a field should be uppercase and do so for you. You have to be paying attention and specifically tap the shift key before typing to turn it off, then type your email address.
 
This is an annoyance of mine because native UIs will often recognize that you are typing in an email field and not do this (even giving you a custom keyboard with the @ symbol), but web pages will not. Perhaps on some devices <input type=email/> does the right thing -- but in my experience it isn't consistent (based on general observations only).
 
The point is that the user will inadvertently type User@example.com on their mobile device after registering with us...@example.com on their desktop/laptop. And they won't understand why they can't log in if it is case sensitive. Given that both addresses will deliver email to the same email account, it seems to me that both should be valid for the same account as an email based username on a sign-in form.
 
Waylan Limberg

schinckel

unread,
Feb 27, 2014, 11:11:26 PM2/27/14
to django-d...@googlegroups.com


On Friday, February 28, 2014 12:55:31 PM UTC+10:30, waylan wrote:
On Thursday, February 27, 2014 6:50:38 PM UTC-5, Camilo Torres wrote:
normalize_email will indeed allow both us...@example.com and Us...@example.com to be different entities. From the user perspective, this is an error. Most probably the user enters some day their email in all upper case because he pressed the CapsLock key, or copy pasted a transcript of his email, etc., the user could not be able to figure out the difference and simply could not log in.


[snip]
 
 
The point is that the user will inadvertently type User@example.com on their mobile device after registering with us...@example.com on their desktop/laptop. And they won't understand why they can't log in if it is case sensitive. Given that both addresses will deliver email to the same email account, it seems to me that both should be valid for the same account as an email based username on a sign-in form.

But there's the rub. Whilst for _most_ email servers, this will indeed send mail to the same account, there's nowhere that says this _must_ happen.


Matt. 

Tilman Koschnick

unread,
Feb 28, 2014, 4:48:50 AM2/28/14
to django-d...@googlegroups.com
On Thu, 2014-02-27 at 20:11 -0800, schinckel wrote:

> But there's the rub. Whilst for _most_ email servers, this will indeed
> send mail to the same account, there's nowhere that says this _must_
> happen.

I agree. Reading RFC 822, it says

The domain-dependent
string is uninterpreted, except by the final sub-domain; the rest
of the mail service merely transmits it as a literal string.

and

The local-part of an addr-spec in a mailbox specification
(i.e., the host's name for the mailbox) is understood to be
whatever the receiving mail protocol server allows.

So normalize_email() already does the right thing by normalising the
domain part, and leaving the local part alone. It is my understanding
that normalising the local part (by e.g. lowercasing) is never the right
thing.

There are three possible situations:

1. Receiving MTA does not care about case: Users can enter whatever form
they like, and expect to receive their mail. No problem with
normalisation in Django, although it might be good to preserve the
user's preferred form. User@... and user@... are actually the same
address on the MTA, so enforcing uniqueness with iexact lookup is no
problem, and acutally necessary to catch all variants the user might
enter on login.

2. Receiving MTA does care about case: Django must preserve case in
local part if we care about such users receiving email from us. We can
trust the users of such a MTA to enter the right case. Enforcing
uniqueness across case is not necessary, but not a problem either
unless...

3. ... there are actually distinct accounts with User@... and user@...
(and USER@... etc.) on that MTA, and more than one such user tries to
register with a Django app.

There is no way to support all three, and I think case 1. is by far the
most common, followed by 2., and I am willing to bet that 3. does not
happen in the wild.

My preferred solution would be to leave normalize_email() as is, and to
enforce case insensitive uniqueness by whatever means Django and/or the
DBMS offers.

Kind regards, Til


Tilman Koschnick

unread,
Feb 28, 2014, 5:05:18 AM2/28/14
to django-d...@googlegroups.com
On Thu, 2014-02-27 at 08:43 +0800, Russell Keith-Magee wrote:

> It does - assuming you use User.objects.create_user() to create all
> your users. However, the UserCreationForm doesn't use this (and hasn't
> ever used this); it also doesn't account for fixtures, or any other
> path into the database that might exist.

Would it be possible to additionally guard in the save() method of the
email User class against duplication by checking with "not
filter(email__iexact=...).exists()"?

The documentation could also explain how to manually add a constraint at
the database level where this is supported. (The current kickstarter
project for improved PostgreSQL support seems likely to add expression
indexes to Django proper one day.)

> So - while normalising case is probably a good idea, and should
> probably be added to the Create/Update User form, the searches will
> still need to be case insensitive.

While there is no way to enforce this, it could at least be added to the
documentation that get_by_natural_key() should be used for email based
users, or filter(email__iexact=...).

Kind regards, Til

Val Neekman

unread,
Feb 28, 2014, 11:24:40 AM2/28/14
to django-d...@googlegroups.com
The following may be a viable solution for the case-insensitive fields. (email, username, alias ... etc.)

Look at lines: 23, 66 and 159.


Val




--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

JJ Zolper

unread,
Sep 7, 2014, 2:26:49 PM9/7/14
to django-d...@googlegroups.com
Hey guys,

I can only imagine how busy everyone has been with Django 1.7 and getting that wrapped up with all the great new features it includes. I know I have been spending a good bit of time updating my work.

I know Russell said here that we were thinking it could be put out in 1.7 but I totally understand from all that has been going on that it didn't get released just yet. I looked through and from what I reviewed I didn't see it just yet which is totally fine. I thought it was fairly simple but I can see I certainly don't know enough as I see there are a lot of details that arose in this thread.

I'm by no means trying to come off as impatient, just if anyone could give me an update on where things are that would be great as I'm just really curious about this little app and what changes I would then make.

If it's not too much to ask and it is even doable based on how things are built I was hoping to put in my 2 cents below:

When Russell said "Both auth.User and emailauth.User have relations with Group, and the names clash, so the test suite won't start." I have a comment.

My best case scenario preference for the name of the app would be "django.contrib.auth.EmailUser". Would that then solve the issue of the names clashing as then it would be "auth.User" and "auth.EmailUser"? I also think it is cleaner to not put "emailauth" and put the home for both "User" and "EmailUser" in "auth".

If anyone has any input on whether this can be done that would be great! Keep up the great work as always, I have been thoroughly enjoying migrations!

JJ


On Wednesday, February 26, 2014 2:33:24 AM UTC-5, Russell Keith-Magee wrote:

Russell Keith-Magee

unread,
Sep 8, 2014, 8:34:12 PM9/8/14
to Django Developers
Hi JJ,

On Mon, Sep 8, 2014 at 2:26 AM, JJ Zolper <jzt...@gmail.com> wrote:
Hey guys,

I can only imagine how busy everyone has been with Django 1.7 and getting that wrapped up with all the great new features it includes. I know I have been spending a good bit of time updating my work.

I know Russell said here that we were thinking it could be put out in 1.7 but I totally understand from all that has been going on that it didn't get released just yet. I looked through and from what I reviewed I didn't see it just yet which is totally fine. I thought it was fairly simple but I can see I certainly don't know enough as I see there are a lot of details that arose in this thread.

I'm by no means trying to come off as impatient, just if anyone could give me an update on where things are that would be great as I'm just really curious about this little app and what changes I would then make.

It's been a while since I've looked at the code, but to the best of my recollection, the code itself was in pretty good shape (after all, there isn't *that* much that needs to be done) - but there were two issues.

The first was related to testing, which you've highlighted:
 
If it's not too much to ask and it is even doable based on how things are built I was hoping to put in my 2 cents below:

When Russell said "Both auth.User and emailauth.User have relations with Group, and the names clash, so the test suite won't start." I have a comment.

My best case scenario preference for the name of the app would be "django.contrib.auth.EmailUser". Would that then solve the issue of the names clashing as then it would be "auth.User" and "auth.EmailUser"? I also think it is cleaner to not put "emailauth" and put the home for both "User" and "EmailUser" in "auth".

Yes, this makes the problem go away. However, I'm not especially happy about this as an approach. 

On the bikeshed scale, I'm not especially happy about the name "emailauth.EmailUser" - repeating the "email" bit strikes me as a redundancy. 

The other, bigger issue is the problem that this reveals with swappable models in general. On some level, I consider contrib.auth to be a testbed for the broader feature of swappable models, and while we *could* work around this problem by holding our collective noses about the name, I'd like to take a stab at addressing the bigger problem (or at least establishing that the bigger problem *can't* be solved, so we know nose-holding is a required engineering approach :-)

The second problem revealed with a generic solution to email-based auth was the handling around case sensitivity in email usernames (raised by Tilman). Truly RFC-compliant case sensitivity in emails requires a whole bunch of logic that doesn't map well to Django's capabilities; slightly less RFC-compliant solutions are possible, but at the time I proposed the patch, I didn't have enough time to run through all the consequences of these changes.

Compounding all this is a lack of time on my part. I'm still keen to see a solution to this problem, and I'm willing to commit to some review and feedback, but I don't have enough cycles at the moment to dedicate to doing this myself.

I appreciate this is quite frustrating; it seems like such a little feature, and it keeps getting deferred. 

Yours,
Russ Magee %-)
Reply all
Reply to author
Forward
0 new messages