Issue with Django Password Normalization

Arun S

Apr 20, 2016, 6:44:49 AM
to Django users
Hi,

As part of a very large project for a company, we follow CSDL rules and use Django extensively.

Within Django, we would like to apply a specific normalisation process to all passwords during user login.
The Django documentation does not say whether Django already applies any particular normalisation process.

Could anyone please let me know whether Django already applies some normalisation process?
If it does, what kind of process is used, and at which point is it applied?

If no normalisation process is applied, can a middleware be added to support this?


Regards
Arun.

Arun S

Apr 20, 2016, 8:41:37 AM
to Django users
Basically, I would like to know whether the latest version of Django already supports any kind of normalization for login passwords.

Erik Cederstrand

Apr 20, 2016, 9:03:03 AM
to Django Users
> On 20 Apr 2016, at 14:41, Arun S <arun...@gmail.com> wrote:
>
> basically I would like to know if the latest version of django already supports any kind of normalization for the login passwords.

What exactly do you mean by "password normalization"? Do you want passwords to be case-insensitive? If so, you can subclass AuthenticationForm and override clean_password(), and set_password() on your user model, and put any transformations of the raw password there.

If you want to enforce certain password rules (length, must contain numbers and special chars, etc) then override AuthenticationForm.clean_password() and raise ValidationError() for your rules.

Erik

Michal Petrucha

Apr 20, 2016, 9:06:48 AM
to django...@googlegroups.com
On Wed, Apr 20, 2016 at 05:41:37AM -0700, Arun S wrote:
> basically I would like to know if the latest version of django
> already supports any kind of normalization for the login passwords.

Most of us probably have no idea what kind of normalization you are
talking about. At least I know I don't. I did try to search for
resources on password normalization, but I didn't find anything that
seemed relevant in the least bit; all I found was a ton of information
on database normalization, which has nothing to do with passwords.

Django does hash passwords using cryptographically secure password
hashing algorithms, if that's what you're asking. If you're interested
in anything else, then the question is, what is your use case, and is
there any reason why a custom hashing function would not do the job?

Regards,

Michal

Simon Charette

Apr 20, 2016, 9:14:39 AM
to Django users
Hi Arun,

I'm not sure this is what you are referring to but Django 1.9 ships
with password validation[1] which can be configured to:

  1. Prevent field similarity (e.g. password == username)
  2. Enforce minimum password length
  3. Prevent usage of common passwords (e.g. "password")
  4. Reject passwords that are entirely numeric

You could also define your own rules and add them to AUTH_PASSWORD_VALIDATORS.


Simon

[1] https://docs.djangoproject.com/en/1.9/topics/auth/passwords/#module-django.contrib.auth.password_validation
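
For reference, the four built-in validators map onto a settings entry along the following lines. The dotted paths are the ones given in the Django documentation linked above; the min_length of 9 is only an example value, not a recommendation. None of these validators performs Unicode normalization.

```python
# settings.py fragment (a sketch; the min_length of 9 is an arbitrary example value)
AUTH_PASSWORD_VALIDATORS = [
    {
        # Rejects passwords that are too similar to user attributes such as the username.
        "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator",
    },
    {
        # Rejects passwords shorter than min_length.
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        "OPTIONS": {"min_length": 9},
    },
    {
        # Rejects passwords found in a list of commonly used passwords.
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
    },
    {
        # Rejects passwords that consist only of digits.
        "NAME": "django.contrib.auth.password_validation.NumericPasswordValidator",
    },
]
```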

Arun S

Apr 20, 2016, 9:16:18 AM
to Django users
Let me try to clarify my question.

Please correct me if I am wrong.
Basically, what I am referring to is that a number of Unicode normalization forms already exist.
Reference

Unicode normalization forms: http://unicode.org/reports/tr15/#Norm_Forms

As I said, as part of company norms the project needs to follow certain CSDL standards. These state that all passwords shall be normalised according to the reference above and then converted to UTF-8, after which they go through the hashing process.

Since the major part of the project uses Django's frameworks, I believe the user authentication methods already apply the hashing algorithms.

What I could not figure out is:
1: Does Django apply any such normalization process to user passwords?
2: What is the difference between a password that is normalised and then hashed with Django's hashing algorithms, and a non-normalised password that is just saved after hashing?
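
To make the second question concrete, here is a minimal sketch of the normalise, encode as UTF-8, then hash pipeline using only the Python standard library. The helper hash_password is hypothetical, Django's own hashers are not involved, and NFKC is an assumed choice, since the thread does not say which form the standard mandates.

```python
import hashlib
import unicodedata


def hash_password(password, salt, normalize=True):
    """Hash a password with PBKDF2-HMAC-SHA256, optionally normalizing it first.

    Hypothetical helper for illustration; this is not Django's hashing code.
    """
    if normalize:
        # NFKC is an assumption; use whichever form your standard requires.
        password = unicodedata.normalize("NFKC", password)
    # 1000 iterations keeps the demo fast; real systems use far more.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1000).hex()


salt = b"fixed-salt-for-demo"   # a real system would use a random per-user salt
composed = "caf\u00e9"          # "café" with a precomposed é (U+00E9)
decomposed = "cafe\u0301"       # "café" as e + combining acute accent (U+0301)

# Without normalization the two spellings hash differently.
raw_match = (hash_password(composed, salt, normalize=False)
             == hash_password(decomposed, salt, normalize=False))

# With normalization both spellings produce the same hash.
normalized_match = hash_password(composed, salt) == hash_password(decomposed, salt)

print(raw_match, normalized_match)
```

The hash itself is equally strong either way; normalization only decides whether two visually identical inputs are treated as the same password.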


Michal Petrucha

Apr 20, 2016, 9:20:53 AM
to django...@googlegroups.com
On Wed, Apr 20, 2016 at 03:02:06PM +0200, Erik Cederstrand wrote:
> Do you want passwords to be case-insensitive? If so, you can
> subclass AuthenticationForm and override clean_password(), and
> set_password() on your user model, and put any transformations of
> the raw password there.

A more robust way to implement this would be to write a custom
password hasher, as described in the docs:
https://docs.djangoproject.com/en/1.9/topics/auth/passwords/#writing-your-own-hasher

That way, you wouldn't risk missing any code paths where you'd forget
to make the password transformation manually, and it would allow you
to use the built-in views and forms instead of having to roll your
own.

That being said, making passwords case-insensitive sounds like a bad
idea, and you should only ever do that if you know really well what
you are doing.

> If you want to enforce certain password rules (length, must contain
> numbers and special chars, etc) then override
> AuthenticationForm.clean_password() and raise ValidationError() for
> your rules.

Again, Django already provides a more robust mechanism for that, which
has been introduced in version 1.9:
https://docs.djangoproject.com/en/1.9/topics/auth/passwords/#module-django.contrib.auth.password_validation

Cheers,

Michal

Avraham Serour

Apr 20, 2016, 9:21:18 AM
to django-users
In summary: "Unicode Normalization Forms are formally defined normalizations of Unicode strings which make it possible to determine whether any two Unicode strings are equivalent to each other"

As I see it, this would be highly insecure for passwords; it is something like converting special characters to Latin characters, or forcing lower case only.



--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/69f70909-215e-4daa-a770-a10b3c2de63a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Arun S

Apr 20, 2016, 9:32:16 AM
to Django users
Does that mean that Unicode normalisation is a very weak and insecure approach for passwords?

In that case, what is the actual use of Unicode normalization?
Why exactly do we need something like Unicode normalization?

Of course, Django provides various ways to strengthen and validate passwords,
and those can be used.

But I also observed that the Django code applies Unicode normalization to usernames and email IDs using the NFKD normalisation form.

Michal Petrucha

Apr 20, 2016, 9:43:01 AM
to django...@googlegroups.com
> On Wed, Apr 20, 2016 at 4:16 PM, Arun S <arun...@gmail.com> wrote:
>
> > let me try to clear my question.
> >
> > please correct me if am wrong.
> > basically all I want to know is that there already exists a number of
> > Unicode normalization forms.
> > Reference
> >
> > Unicode normalization forms: http://unicode.org/reports/tr15/#Norm_Forms
> >
> > so as I said as a part of a company norms, the project needs to follow
> > certain csdl standards and according to that it states that all passwords
> > shall be normalised according to the ref mentioned and then convert then to
> > a utf8 which then follows thru the hashing process.
> >
> > so since the major part of the project uses djangos frameworks, I believe
> > that the user authentication methods used already applies the hashing
> > algorithms.
> >
> > but what I could not figure out is that
> > 1: does django apply any such normalization process for the user passwords.
> > 2: how is it different between a normalised password and then hashed with
> > djangos hashing algorithm s and a non normalised password just saved after
> > hashing.

Ah, unicode normalization, now your question makes a lot more sense.

A quick grep through the sources of Django didn't reveal any explicit
normalization of plaintext passwords. However, you can implement your
own hashing function that will use the unicodedata module from the
standard library to perform unicode normalization, and then call one
of the built-in password hashers to do the actual work.
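
A rough sketch of that idea follows, assuming Python 3. So that it runs without Django installed, ToyHasher stands in for a built-in hasher and only mimics the encode(password, salt) and verify(password, encoded) shape. All class names here are hypothetical, and NFKC is an assumed choice of form.

```python
import hashlib
import hmac
import unicodedata


class NormalizingHasherMixin:
    """Normalize the raw password, then delegate to the wrapped hasher."""

    normalization_form = "NFKC"  # assumed form; pick whatever your standard mandates

    def encode(self, password, salt, *args, **kwargs):
        password = unicodedata.normalize(self.normalization_form, password)
        return super().encode(password, salt, *args, **kwargs)

    def verify(self, password, encoded):
        password = unicodedata.normalize(self.normalization_form, password)
        return super().verify(password, encoded)


class ToyHasher:
    """Stand-in for a built-in Django hasher so the sketch runs without Django."""

    algorithm = "toy_pbkdf2"

    def encode(self, password, salt, iterations=1000):
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
        )
        return "%s$%d$%s$%s" % (self.algorithm, iterations, salt, digest.hex())

    def verify(self, password, encoded):
        _algorithm, iterations, salt, _digest = encoded.split("$", 3)
        # Re-encode with the stored parameters and compare in constant time.
        return hmac.compare_digest(self.encode(password, salt, int(iterations)), encoded)


class NormalizingToyHasher(NormalizingHasherMixin, ToyHasher):
    algorithm = "normalizing_toy_pbkdf2"


hasher = NormalizingToyHasher()
stored = hasher.encode("A\u030Angstro\u0308m", "somesalt")   # decomposed input
print(hasher.verify("\u00C5ngstr\u00F6m", stored))            # composed input
```

In a real project the mixin would presumably be combined with one of Django's hashers (for example PBKDF2PasswordHasher), given its own algorithm name, and listed in PASSWORD_HASHERS as the docs on writing a hasher describe. That combination is untested here.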

On Wed, Apr 20, 2016 at 04:20:35PM +0300, Avraham Serour wrote:
> in summary: "Unicode Normalization Forms are formally defined
> normalizations of Unicode strings which make it possible to determine
> whether any two Unicode strings are equivalent to each other"
>
> as I see this would be highly unsecure for passwords, this is something
> like converting special characters to latin characters, or forcing lower
> case only

I don't agree – when a user enters the letter Å, you almost never care
whether it is represented as U+00C5, or U+0041 U+030A – they represent
the same character, just in two different ways. Unicode normalization
is a tool to recognize those two representations as the same thing.

As a user, if I use a password containing non-ASCII characters, I
definitely wouldn't expect that with the same correct password, I
would be able to authenticate using a browser that sends the data in a
composed normal form, but not with another browser that sends the same
string in a decomposed normal form.
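
The two representations of Å can be compared directly with the standard library's unicodedata module. This is just an illustration of the point above; it says nothing about what any particular browser actually sends.

```python
import unicodedata

composed = "\u00C5"           # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# The two strings render identically but differ as sequences of code points.
same_raw = composed == decomposed
print(same_raw, len(composed), len(decomposed))

# NFC composes and NFD decomposes; the comparison succeeds once both
# sides have been brought into the same form.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed
print(same_nfc, same_nfd)
```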

Regards,

Michal

Avraham Serour

Apr 20, 2016, 9:45:54 AM
to django-users
Actually, upon further reading, the document seems to specify how to handle Unicode; it describes how Unicode strings should be stored.

If that's the case, then it is not a Django problem but a Python problem.

If you are on Python 3, then you are using Unicode strings; Python handles that for you.

If you are running on Python 2, I believe Django uses explicit unicode strings; you should double-check that.



Rick Leir

Apr 20, 2016, 10:22:27 AM
to Django users
There is also a new issue in Trac on this topic. I added two links to Stack Overflow discussions there.

The issue: suppose a password is mañana. Depending on the client you use, input methods can give you two different Unicode representations of ñ (and hence different UTF-8 byte sequences). As a first step, let's add a test case and check whether it fails.

My guess (though I am new to this) is that this is a Django issue, not a Python one.
Cheers-- Rick

Arun S

Apr 20, 2016, 10:42:26 AM
to Django users
For example, here is the Django code snippet for handling usernames in the login page:

default_username = unicodedata.normalize('NFKD', default_username)
So Django does normalize usernames using the NFKD form.
It then applies hashing algorithms to this.

But the same is never done for passwords.
Is this done on purpose, because the hashing algorithm takes care of whatever is required and normalization isn't really needed for this purpose?

Even the Django documentation doesn't talk about Unicode normalization of passwords, but you can still find it for other forms of text input.

Michal Petrucha

Apr 20, 2016, 2:02:56 PM
to django...@googlegroups.com
On Wed, Apr 20, 2016 at 07:42:26AM -0700, Arun S wrote:
> For ex, adding the Django Code Snippet for handling User names in the Login
> Page :
>
> default_username = (unicodedata.normalize('NFKD', default_username)
> So Django does follow Normalizing of Usernames usign NFKD Algorithm.
> Then applies Hashing Algorithms on this.

Not really – the line of code you quoted above is only used to
generate the default username in the createsuperuser management
command, based on the current system account. The purpose there is to
turn a string that potentially contains diacritics or other non-ASCII
characters into a stripped-down ASCII-only version.
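
That stripping step can be sketched as follows. The helper name to_ascii is hypothetical, and the code mirrors the kind of processing described above rather than quoting Django's source.

```python
import unicodedata


def to_ascii(text):
    """Decompose with NFKD, then drop every character that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Ren\u00e9e Mu\u00f1oz"))   # input is "Renée Muñoz"
print(to_ascii("Stra\u00dfe"))             # input is "Straße"
```

This is lossy on purpose: accented letters keep their base letter, while characters with no ASCII decomposition (such as ß) vanish entirely. That is acceptable for suggesting a default username but would not be appropriate for comparing passwords.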

The only other uses of Unicode normalization I found in Django are a
similar case when slugifying strings, code that truncates strings to a
certain length (where normalization is used to ensure that combining
marks do not count as separate characters), and the handling of the
decimal separator in decimal numbers.

So no, Django does not normalize usernames. Django does not normalize
anything, other than when stripping out all non-ASCII characters and
diacritic marks from strings.

> *But the same is never followed for Passwords.*
> Is this done on Purpose that the HASHING algorithm takes care of whatever
> required and Normalization isnt quite required for such purpose.
>
> Even the Django Documentation does'nt talk about Unicode Normalizing on
> Passwords but you can still find it for Other forms of Text inputs.

For the record, I personally think Unicode normalization is a
reasonable feature request for Django, if nothing else, then at least
because of the example with Unicode in passwords. However, I'm not
certain at the moment which layers of Django deal with bytestrings,
and which handle Unicode objects, and I have no idea where such
handling would belong.

I think it might be a good idea to bring this up on django-developers@
to see if other people think it is worth including in Django core or
not. I would recommend describing specific cases where normalization
is necessary. If this is just a hypothetical request, “just in case”
some client sends denormalized requests (or with unusual
normalization), but there are no actual existing client
implementations that would do that, it's probably not worth the
effort.

Regards,

Michal

Rick Leir

Apr 21, 2016, 10:30:07 AM
to Django users
Here are the Stackoverflow discussions I mentioned Ñ )oops I have the Espanol keyboard selected=

http://stackoverflow.com/questions/2798794/how-do-i-properly-implement-unicode-passwords

Maybe we should not permit unicode passwords: 
   http://stackoverflow.com/questions/1797777/should-i-support-unicode-in-passwords

One issue for passwords is that you might have different input methods when you use different browsers, making it more difficult to log in. Are input methods much different among browsers?
We only need to consider browsers, clearly, not other UIs. (Please correct me if there is any other, say a Qt GUI.)
The issue for usernames is that you could spoof someone else's username and appear to be (impersonate) another person. The attacker can easily enter a character which looks the same but has a different Unicode code point. Michal, as you say, we would want to normalize the characters. And as you say, it is a topic for the dev list.
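
As a small illustration of the username case, the sketch below derives a comparison key by applying NFKC and then case folding. The function username_key and the policy it encodes are assumptions made for the example, not something Django does.

```python
import unicodedata


def username_key(username):
    """Return a comparison key: NFKC-normalized, then case-folded (an assumed policy)."""
    return unicodedata.normalize("NFKC", username).casefold()


taken = {username_key("admin")}

lookalike = "\uff41\uff44\uff4d\uff49\uff4e"   # "admin" written in fullwidth letters
differs_raw = lookalike != "admin"             # different code points
collides = username_key(lookalike) in taken    # same key after NFKC

print(differs_raw, collides)
```

Note that NFKC only folds compatibility variants. A Cyrillic "а" (U+0430) in place of the Latin "a" survives normalization unchanged, so cross-script lookalikes would need a separate check.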

But how important is this issue? Yes, it is security related. But it is far from critical in my mind.

Michal Petrucha

Apr 21, 2016, 10:42:34 AM
to django...@googlegroups.com
On Thu, Apr 21, 2016 at 07:30:07AM -0700, Rick Leir wrote:
> Here are the Stackoverflow discussions I mentioned Ñ )oops I have the
> Espanol keyboard selected=
>
> http://stackoverflow.com/questions/16173328/what-unicode-normalization-and-other-processing-is-appropriate-for-passwords-w
> http://stackoverflow.com/questions/2798794/how-do-i-properly-implement-unicode-passwords
>
> Maybe we should not permit unicode passwords:
>
> http://stackoverflow.com/questions/1797777/should-i-support-unicode-in-passwords
>
> One issue for passwords is that you might have different Input Methods when
> you use different browsers, making it more difficult to login. Are Input
> Methods much different among browsers?
> We only need to consider browsers, clearly, not other UI's. (please
> correct me if there is any other, say Qt GUI)
>
> - Chrome: use input tools http://www.google.com/inputtools/ on Mac,
> Linux, and Windows
> - Mobile Android: long-press then slide to select a char
> - Mobile Ios:
> - I.E.: Microsoft has a few ways to enter Hex codes (unfriendly in my
> mind) https://en.wikipedia.org/wiki/Unicode_input#In_Microsoft_Windows
> - Firefox: there are 5 addons
> available https://addons.mozilla.org/en-US/firefox/tag/input%20method%20editor
> - Opera, Konqueror, .. .. ..
>
> The issue for usernames is that you could spoof someone else's username,
> and appear to be (impersonate) another person. The attacker can easily
> enter a character which looks the same but has a different Unicode point.
> Michal, as you say, we would want to normalize the chars. And as you say,
> it is a topic for the dev list.
>
> But how important is this issue? Yes, it is security related. But it is far
> from critical in my mind.

It's not important until this happens:
https://labs.spotify.com/2013/06/18/creative-usernames/

Question is whether this is something that Django should handle by
default, or it's up to each application developer to take care of it.

A quick and superficial search through the archives of
django-developers didn't yield much on this topic, I only found one
thread about this from way back in 2008, and as I skimmed through the
thread, it doesn't seem the security aspects were considered:
https://groups.google.com/d/topic/django-developers/WW28RIVyU3k/discussion

Michal

Arun S

Apr 21, 2016, 10:47:43 AM
to Django users
Thanks for the very useful information.

I did raise this in the dev forum, but it was not accepted there as a question for discussing whether this should be taken up.

I guess with all this input, it can be suggested, though.

Michal Petrucha

Apr 21, 2016, 11:09:44 AM
to django...@googlegroups.com
As far as I can see, you only created a ticket in Trac, not a new
thread on the django-developers@ mailing list, and the ticket was
pretty much a repost of your original question here.

What I suggest is raising the broader issue of Unicode normalization
of user input, and whether it is the job of Django to do that, or the
application developer's responsibility, and I believe the mailing list
is the right medium for that discussion.

Cheers,

Michal

Rick Leir

Apr 21, 2016, 11:11:27 AM
to Django users
username = models.CharField(
    _('username'),
    max_length=150,
    unique=True,
    help_text=_('Required. 150 characters or fewer. Letters, digits and @/./+/-/_ only.'),
    validators=[validators.RegexValidator(r'^[\w.@+-]+$',
It looks as if you could just clear the LOCALE and UNICODE flags, to restrict the allowable characters. 

I don't think you raised this in the dev mailing list https://groups.google.com/forum/#!searchin/django-developers/password
You raised an issue in Trac, which is different. I agree with Michal that this is worth looking at, and will pop it into a post in the dev list.
cheers -- Rick

django/contrib/auth/models.py line 308 or so

https://docs.python.org/2/library/re.html
\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
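
The quoted documentation describes Python 2. In Python 3 the default is reversed: str patterns are Unicode-aware unless re.ASCII is passed. A quick sketch of how the username pattern behaves either way (the sample names are made up):

```python
import re

# Python 3: str patterns are Unicode-aware by default, so \w matches "ñ".
unicode_pattern = re.compile(r"^[\w.@+-]+$")

# re.ASCII restricts \w to [a-zA-Z0-9_], comparable to Python 2 without the UNICODE flag.
ascii_pattern = re.compile(r"^[\w.@+-]+$", re.ASCII)

for name in ("rick.leir", "ma\u00f1ana"):
    print(name, bool(unicode_pattern.match(name)), bool(ascii_pattern.match(name)))
```

Restricting usernames to ASCII sidesteps the lookalike problem at the cost of rejecting legitimate non-ASCII names. It does nothing for passwords, which this validator does not touch.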

Avraham Serour

Apr 21, 2016, 11:45:27 AM
to django-users
So it seems it would only need to use unicodedata.normalize('NFKD', input) on usernames and passwords?


Rick Leir

Apr 21, 2016, 12:11:08 PM
to Django users


On Thursday, 21 April 2016 10:47:43 UTC-4, Arun S wrote: