Unicode usernames?

John Hensley

Jan 28, 2008, 7:34:08 PM
to django-d...@googlegroups.com
Usernames in django.contrib.auth are restricted to ASCII
alphanumerics. Allowing Unicode seems fairly simple: compile the
validator's regular expression with the re.UNICODE flag.
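
A minimal sketch of that idea, not Django's actual validator: the pattern and the names `ASCII_USERNAME`, `UNICODE_USERNAME` and `is_valid` are assumptions for illustration. It is written for Python 3, where `str` patterns already match Unicode word characters by default, so `re.ASCII` stands in for the restrictive behaviour and `re.UNICODE` is kept only to mirror the flag named above.

```python
import re

# Hypothetical patterns; the real validator in django.contrib.auth may differ.
ASCII_USERNAME = re.compile(r"\w+", re.ASCII)      # letters a-z/A-Z, digits, underscore
UNICODE_USERNAME = re.compile(r"\w+", re.UNICODE)  # any Unicode word character


def is_valid(pattern, username):
    """Return True if the whole username matches the compiled pattern."""
    return pattern.fullmatch(username) is not None


cyrillic_name = "\u0418\u0432\u0430\u043d"  # Cyrillic "Иван"

print(is_valid(ASCII_USERNAME, "john"))           # True
print(is_valid(ASCII_USERNAME, cyrillic_name))    # False
print(is_valid(UNICODE_USERNAME, cyrillic_name))  # True
```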

To a Midwesterner with hardly any language competency beyond English,
it seems like an obvious improvement -- surely everyone who builds
Django sites in Russian or Chinese or Japanese would love to let
people sign in with their real names. But I don't see much clamor
about it, so I'm guessing there are good reasons this hasn't been
done, or at least people have workarounds they're happy with.

There was a fairly pessimistic thread on the users list back in 2005,
the upshot of which was that people were used to being forced into
7-bit ASCII anyway, so you might as well not bother, or if you must,
create another field in a user profile or something.

What's the consensus on Unicode usernames? Is the current restriction
intentional, or just left over from before the Unicode overhaul? Are
the developers who could really use these (as opposed to those who
just want to be courteous and future-proof) interested, or do you have
other solutions you're happy with?

Collin Grady

Jan 28, 2008, 7:58:30 PM
to django-d...@googlegroups.com
John Hensley said the following:

> What's the consensus on Unicode usernames?

Personally, I'd love to see the username validation expanded - unicode,
email addresses, etc, are all fair game in my opinion :)

Registration forms could easily be limited if people wanted to restrict
allowed characters, but you can't really go the other direction :)

--
Collin Grady

Eren Türkay

Jan 29, 2008, 12:43:49 AM
to django-d...@googlegroups.com
On 29 Jan 2008 Tue 02:58:30 Collin Grady wrote:
> Personally, I'd love to see the username validation expanded - unicode,
> email addresses, etc, are all fair game in my opinion :)
>
> Registration forms could easily be limited if people wanted to restrict
> allowed characters, but you can't really go the other direction :)

I think making Unicode available would be great, because there are no obstacles
to controlling it. When we don't want Unicode characters in usernames, we can
just restrict them with newforms validation.

[~/tmp]> python
Python 2.4.4 (#1, Jan 4 2008, 00:58:13)
[GCC 3.4.6] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = "not unicode"
>>> bar = u"unicode iĞü"
>>> isinstance(foo, unicode)
False
>>> isinstance(bar, unicode)
True
>>>
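
The form-level restriction described above might look roughly like this sketch. It is plain Python 3 rather than a real newforms class, and `clean_username`, its `ascii_only` parameter and its use of `ValueError` are assumptions for illustration, not Django API.

```python
def clean_username(username, ascii_only=True):
    """Hypothetical form-level check: return the username or raise ValueError."""
    if not username:
        raise ValueError("Username is required.")
    if ascii_only:
        try:
            # Encoding fails as soon as one non-ASCII character is present.
            username.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError("Username must contain only ASCII characters.") from None
    return username


print(clean_username("eren"))                                # accepted as-is
print(clean_username("\u0130\u011e\u00fc", ascii_only=False))  # accepted when the site allows Unicode
```

A site that wants Unicode names would pass `ascii_only=False`; a site that does not keeps the default, which is the "limit it in the registration form" approach from earlier in the thread.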

Ivan Illarionov

Jan 29, 2008, 8:43:17 AM
to Django developers
No! Unicode usernames are a bad idea. Usernames can be used for URLs,
paths and other technical stuff. Unicode usernames are a source of
unlimited potential bugs. It might be a good idea to add a Unicode
'nickname' field that can be used instead of username in views, and
still use 'username' for URLs and paths. There's no Russian site that
would allow you to enter the login/username in Russian.

Ivan Illarionov

Jan 29, 2008, 9:09:40 AM
to Django developers
There are a lot of good reasons to keep usernames ASCII-only:
1. You may want to log in to your site from a non-national keyboard/OS
2. URLs with Unicode characters look ugly
3. Paths and filenames may be broken
4. It's not hard to transliterate your real name with ASCII-only
letters
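
To illustrate point 2, here is a small Python 3 sketch of what a non-ASCII username turns into once it is percent-encoded for a URL. The `/users/<name>/` path layout is a made-up example, not a Django convention.

```python
from urllib.parse import quote, unquote

username = "\u0418\u0432\u0430\u043d"  # Cyrillic "Иван"

# Hypothetical profile URL; each non-ASCII character is percent-encoded as UTF-8 bytes.
path = "/users/" + quote(username) + "/"
print(path)  # /users/%D0%98%D0%B2%D0%B0%D0%BD/

# The encoding is reversible, so the name is not lost, only harder to read.
print(unquote(path) == "/users/" + username + "/")  # True
```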

John Hensley

Jan 31, 2008, 4:15:17 PM
to django-d...@googlegroups.com
Add to that list:

http://en.wikipedia.org/wiki/Internationalized_domain_name#ASCII_Spoofing_and_squatting_concerns

It's not too hard to imagine other sorts of mischief that lookalike
characters could cause.

Thanks for the feedback, guys.

Eren Türkay

Feb 1, 2008, 1:32:15 AM
to django-d...@googlegroups.com
On 31 Jan 2008 Thu 23:15:17 John Hensley wrote:
> http://en.wikipedia.org/wiki/Internationalized_domain_name#ASCII_Spoofing_and_squatting_concerns

Hmm, I didn't think about it. It looks like a serious problem for django. Now,
I really prefer not to use unicode usernames :)

Ivan Illarionov

Feb 1, 2008, 2:13:00 AM
to Django developers
Yes, they look the same and here they are if anyone is curious:
>>> 'ETOPAHKXCBMeopaxc' == 'ЕТОРАНКХСВМеорахс'
False
>>> 'ETOPAHKXCBMeopaxc'
'ETOPAHKXCBMeopaxc'
>>> 'ЕТОРАНКХСВМеорахс'
'\xd0\x95\xd0\xa2\xd0\x9e\xd0\xa0\xd0\x90\xd0\x9d\xd0\x9a
\xd0\xa5\xd0\xa1\xd0\x92\xd0\x9c\xd0\xb5\xd0\xbe
\xd1\x80\xd0\xb0\xd1\x85\xd1\x81'
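
For anyone trying this on Python 3, where strings are Unicode rather than bytes, a rough equivalent of the session above is sketched below. The variable names are mine, and the Cyrillic string is spelled with escapes so that it cannot be confused with the Latin one in the source.

```python
import unicodedata

latin = "ETOPAHKXCBMeopaxc"
# The same 17 lookalike letters, taken from the Cyrillic block.
cyrillic = (
    "\u0415\u0422\u041e\u0420\u0410\u041d\u041a\u0425\u0421\u0412\u041c"
    "\u0435\u043e\u0440\u0430\u0445\u0441"
)

print(latin == cyrillic)              # False
print(unicodedata.name(latin[4]))     # LATIN CAPITAL LETTER A
print(unicodedata.name(cyrillic[4]))  # CYRILLIC CAPITAL LETTER A

# Unicode normalization does not merge the two scripts, so it is no defence here.
print(unicodedata.normalize("NFKC", cyrillic) == latin)  # False
```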

Eren Türkay

Feb 1, 2008, 2:25:51 AM
to django-d...@googlegroups.com
On 01 Feb 2008 Fri 09:13:00 Ivan Illarionov wrote:
> >>> 'ETOPAHKXCBMeopaxc' == 'ЕТОРАНКХСВМеорахс'
> False
> >>> 'ETOPAHKXCBMeopaxc'
> 'ETOPAHKXCBMeopaxc'
> >>> 'ЕТОРАНКХСВМеорахс'
> '\xd0\x95\xd0\xa2\xd0\x9e\xd0\xa0\xd0\x90\xd0\x9d\xd0\x9a
> \xd0\xa5\xd0\xa1\xd0\x92\xd0\x9c\xd0\xb5\xd0\xbe
> \xd1\x80\xd0\xb0\xd1\x85\xd1\x81'

It's really interesting and stunning :)

Alexander Chemeris

Feb 9, 2008, 3:57:39 AM
to django-d...@googlegroups.com
On 1/29/08, Ivan Illarionov <ivan.il...@gmail.com> wrote:
> No! Unicode usernames are a bad idea. Usernames can be used for URLs,
> paths and other technical stuff. Unicode usernames are a source of
> unlimited potential bugs. It might be a good idea to add a Unicode
> 'nickname' field that can be used instead of username in views, and
> still use 'username' for URLs and paths. There's no Russian site that
> would allow you to enter the login/username in Russian.

Don't speak for everyone. phpBB3 allows Unicode usernames, and it uses a fancy
way to get rid of usernames that look the same: it keeps a list of
confusables (identical-looking characters) and substitutes each of them with
one preferred character when comparing usernames. MediaWiki also allows
usernames to be in Unicode, and so does MoinMoin.
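
A rough Python 3 sketch of that confusables approach follows. It is not phpBB3's actual code: the table is a deliberately tiny, hypothetical one covering only the Cyrillic lookalikes from Ivan's example, and `CONFUSABLES`, `skeleton` and `collides` are names made up for illustration.

```python
# Hypothetical, tiny confusables table: Cyrillic letters mapped to the Latin
# letters they resemble. A real table would cover many more characters.
CONFUSABLES = str.maketrans(
    "\u0415\u0422\u041e\u0420\u0410\u041d\u041a\u0425\u0421\u0412\u041c"
    "\u0435\u043e\u0440\u0430\u0445\u0441",
    "ETOPAHKXCBMeopaxc",
)


def skeleton(username):
    """Replace every known confusable with its preferred character."""
    return username.translate(CONFUSABLES)


def collides(new_name, existing_name):
    """Treat two usernames as the same if their skeletons are equal."""
    return skeleton(new_name) == skeleton(existing_name)


# "\u0430dmin" starts with a Cyrillic "а" but looks identical to "admin".
print(collides("\u0430dmin", "admin"))  # True
print(collides("alice", "admin"))       # False
```

Comparing skeletons at registration time rejects the lookalike while still storing and displaying the name exactly as the user typed it.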

I'm running a site with phpBB3, MediaWiki and a homebrew Django
application, all integrated to authenticate against the phpBB3 user table,
and it works perfectly fine with Unicode usernames. Well, except for the
Django admin pages, which claim that the usernames are not valid.
But I'm not very concerned about this, as I use the admin very rarely.

So, here is another point of view. I think full support for Unicode usernames
would be very helpful, but a way to limit them to ASCII (or whatever else)
would be a good thing to have too.

--
Regards,
Alexander Chemeris.

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
