[Django] #34654: Post-normalization performed on the Username field leading to the bypass of the whitespace stripping

Django

Jun 14, 2023, 7:58:30 AM
to django-...@googlegroups.com
#34654: Post-normalization performed on the Username field leading to the bypass of
the whitespace stripping
----------------------------------------+------------------------
Reporter: Sim4n6 | Owner: nobody
Type: Bug | Status: new
Component: contrib.auth | Version: 4.1
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 1
UI/UX: 0 |
----------------------------------------+------------------------
**Summary**

In the Django codebase
[L74](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/contrib/auth/forms.py#L74),
when "strip" is enabled
[L265](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L265),
the username should not contain any leading or trailing whitespace.
However, normalization is performed only after that whitespace has been stripped
[L278..L279](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L278..L279).
That means the whitespace stripping can be bypassed in the username by using
Unicode characters that resemble whitespace, such as "\u2800".

Take the following case:

    result = unicodedata.normalize("NFKC", "\u2800\u2800sim4n6\u2800\u2800".strip())

If the string " sim4n6 " contains regular leading and trailing
whitespace, it is removed and everything works as intended.
If the string "\u2800\u2800sim4n6\u2800\u2800" instead contains Unicode
characters that only look like whitespace (they can be copied, for
instance, from https://emptycharacter.com), strip() leaves them in place.
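
The case above can be reproduced with the standard library alone. This is a minimal sketch that follows the strip-then-normalize order described in the summary; the variable names are illustrative and not taken from Django.

```python
import unicodedata

# U+2800 BRAILLE PATTERN BLANK renders as empty space but is not whitespace to Python.
BRAILLE_BLANK = "\u2800"
raw = BRAILLE_BLANK * 2 + "sim4n6" + BRAILLE_BLANK * 2

# Same order of operations as in the summary: strip first, then NFKC-normalize.
result = unicodedata.normalize("NFKC", raw.strip())

print(BRAILLE_BLANK.isspace())  # False, so strip() ignores it
print(result == raw)            # True: nothing was stripped or altered
print(len(result))              # 10: "sim4n6" plus four invisible characters
```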

**Impact**

Allowing whitespace in the username is a security concern, all the more
so because the username is attacker-controlled input.
The whitespace stripping on the username can be bypassed, so
a remote user could use unintended whitespace.


@Sim4n6

--
Ticket URL: <https://code.djangoproject.com/ticket/34654>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jun 14, 2023, 8:19:32 AM
to django-...@googlegroups.com
#34654: Post-normalization performed on the Username field leading to the bypass of
the whitespace stripping
------------------------------+------------------------------------

Reporter: Sim4n6 | Owner: nobody
Type: Bug | Status: new
Component: contrib.auth | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Natalia Bidart):

* version: 4.1 => dev
* easy: 1 => 0
* stage: Unreviewed => Accepted


Comment:

Accepting following the conversation with the security team. Though, do
note that the issue is not considered a security concern but could lead to
potential impersonation incidents.

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:1>

Django

Jun 14, 2023, 8:26:31 AM
to django-...@googlegroups.com

Comment (by Natalia Bidart):

There wasn't a concrete fix discussed, though some voices said:

* normalize_username() already applies NFKC normalization and Python
updates the [https://unicode.org/ucd/ Unicode Character Database] in each
release.
* It'd make sense to update the normalization to include these newly
identified empty characters.
* But, likely Django shouldn't duplicate the UCD and it should rely on
Python's.
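
As a small illustration of the point about relying on Python's copy of the UCD, the bundled database version can be inspected at runtime. This is only a sketch; the printed version depends on the interpreter in use.

```python
import sys
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
print(sys.version_info[:2], unicodedata.unidata_version)

# NFKC results come straight from that database; for example, U+00A0 NO-BREAK SPACE
# has a compatibility mapping to a regular space.
print(unicodedata.normalize("NFKC", "\u00a0") == " ")
```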

Personally, I was wondering if we could visually signal the beginning/end of
the username and, perhaps, show the username length. This could be a
small but straightforward improvement (of course this only makes sense in
the templates/forms that Django ships), and perhaps it could serve as a
guide for users to apply to their own forms.
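
One way to picture the suggestion is a helper that wraps the username in delimiters, escapes non-ASCII characters, and reports the length. This is a hypothetical sketch: `describe_username` is an invented name, not part of Django or of any proposed patch.

```python
def describe_username(username):
    """Return the username with visible delimiters, escapes, and its length."""
    # unicode_escape turns non-ASCII characters into \xXX or \uXXXX sequences,
    # which makes invisible or lookalike characters stand out.
    escaped = username.encode("unicode_escape").decode("ascii")
    return f"[{escaped}] ({len(username)} characters)"


print(describe_username("sim4n6"))
print(describe_username("sim4n6\u2800"))
```

With this rendering, a trailing invisible character shows up both as an escape sequence and as an unexpected length.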

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:2>

Django

Jun 14, 2023, 2:37:47 PM
to django-...@googlegroups.com

Comment (by Sim4n6):

By the way, I still believe this is perhaps a medium- to low-severity vulnerability.

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:3>

Django

Jun 14, 2023, 2:39:36 PM
to django-...@googlegroups.com

Comment (by Sim4n6):

One way to fix this is to normalize first and then perform the stripping.

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:4>

Django

Jun 16, 2023, 3:50:23 PM
to django-...@googlegroups.com

Comment (by Ashutosh singh):

Normalizing first and then performing the stripping doesn't seem to work.
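
A possible reason, sketched with the standard library only: for some code points NFKC yields a space followed by a combining mark, so a trailing space ends up shielded by the mark and strip() cannot reach it. The two helper names below are illustrative, not Django APIs.

```python
import unicodedata


def strip_then_normalize(value):
    # Current order described in the ticket.
    return unicodedata.normalize("NFKC", value.strip())


def normalize_then_strip(value):
    # Proposed order: normalize first, strip afterwards.
    return unicodedata.normalize("NFKC", value).strip()


# NFKC maps U+00A8 DIAERESIS to a regular space followed by
# U+0308 COMBINING DIAERESIS.
username = "\u00a8sim4n6\u00a8"

before = strip_then_normalize(username)
after = normalize_then_strip(username)

print(ascii(before))  # both spaces survive
print(ascii(after))   # the leading space is gone, the trailing one remains
print(" " in after)   # True
```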

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:5>

Django

Jun 17, 2023, 6:59:57 AM
to django-...@googlegroups.com

Comment (by Sim4n6):

Indeed. Here is my understanding.

According to the strip() documentation
https://docs.python.org/3/library/stdtypes.html?highlight=removing%20whitespace#str.strip
if the chars argument is omitted, the function removes whitespace.
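
A short illustration of that behaviour, using only built-in string methods (the variable names are illustrative):

```python
padded = " sim4n6 "
braille_padded = "\u2800sim4n6\u2800"

# With no argument, strip() removes leading and trailing whitespace.
print(repr(padded.strip()))                       # 'sim4n6'

# U+2800 is not whitespace to Python, so the default strip() leaves it alone.
print("\u2800".isspace())                         # False
print(braille_padded.strip() == braille_padded)   # True

# With an explicit chars argument, only the listed characters are removed.
print(repr(braille_padded.strip("\u2800")))       # 'sim4n6'
```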

The problem is that the stripping of the leading and trailing
whitespace is performed before the Unicode normalization.
This case fits the attack "Using Unicode Encoding to Bypass Validation
Logic": https://capec.mitre.org/data/definitions/71.html

I have done some initial fuzzing (nothing fancy). The Unicode character
BRAILLE PATTERN BLANK only partially fits the criteria:
it visually looks like a space, but it remains the same when
normalized.

The idea is that there may be Unicode characters that the strip()
function does not remove but that come back as whitespace after
normalization with the NFKC form.

If a user is able to set a username with whitespace, it looks a
bit "fishy": consider the two usernames "sim4n6" and "sim4n6 " (with a
trailing space).

Some further fuzzing turns up these code points as well:

Code Point: 0xa8 Fuzzed: [ ̈sim4n6 ̈].
Code Point: 0xaf Fuzzed: [ ̄sim4n6 ̄].
Code Point: 0xb4 Fuzzed: [ ́sim4n6 ́].
Code Point: 0xb8 Fuzzed: [ ̧sim4n6 ̧].
Code Point: 0x2d8 Fuzzed: [ ̆sim4n6 ̆].
Code Point: 0x2d9 Fuzzed: [ ̇sim4n6 ̇].
Code Point: 0x2da Fuzzed: [ ̊sim4n6 ̊].
Code Point: 0x2db Fuzzed: [ ̨sim4n6 ̨].
Code Point: 0x2dc Fuzzed: [ ̃sim4n6 ̃].
Code Point: 0x2dd Fuzzed: [ ̋sim4n6 ̋].
Code Point: 0x37a Fuzzed: [ ͅsim4n6 ͅ].
Code Point: 0x384 Fuzzed: [ ́sim4n6 ́].
Code Point: 0x385 Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1fbd Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fbf Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fc0 Fuzzed: [ ͂sim4n6 ͂].
Code Point: 0x1fc1 Fuzzed: [ ̈͂sim4n6 ̈͂].
Code Point: 0x1fcd Fuzzed: [ ̓̀sim4n6 ̓̀].
Code Point: 0x1fce Fuzzed: [ ̓́sim4n6 ̓́].
Code Point: 0x1fcf Fuzzed: [ ̓͂sim4n6 ̓͂].
Code Point: 0x1fdd Fuzzed: [ ̔̀sim4n6 ̔̀].
Code Point: 0x1fde Fuzzed: [ ̔́sim4n6 ̔́].
Code Point: 0x1fdf Fuzzed: [ ̔͂sim4n6 ̔͂].
Code Point: 0x1fed Fuzzed: [ ̈̀sim4n6 ̈̀].
Code Point: 0x1fee Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1ffd Fuzzed: [ ́sim4n6 ́].
Code Point: 0x1ffe Fuzzed: [ ̔sim4n6 ̔].
Code Point: 0x2017 Fuzzed: [ ̳sim4n6 ̳].
Code Point: 0x203e Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0x309b Fuzzed: [ ゙sim4n6 ゙].
Code Point: 0x309c Fuzzed: [ ゚sim4n6 ゚].
Code Point: 0xfc5e Fuzzed: [ ٌّsim4n6 ٌّ].
Code Point: 0xfc5f Fuzzed: [ ٍّsim4n6 ٍّ].
Code Point: 0xfc60 Fuzzed: [ َّsim4n6 َّ].
Code Point: 0xfc61 Fuzzed: [ ُّsim4n6 ُّ].
Code Point: 0xfc62 Fuzzed: [ ِّsim4n6 ِّ].
Code Point: 0xfc63 Fuzzed: [ ّٰsim4n6 ّٰ].
Code Point: 0xfe49 Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4a Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4b Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4c Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe70 Fuzzed: [ ًsim4n6 ً].
Code Point: 0xfe72 Fuzzed: [ ٌsim4n6 ٌ].
Code Point: 0xfe74 Fuzzed: [ ٍsim4n6 ٍ].
Code Point: 0xfe76 Fuzzed: [ َsim4n6 َ].
Code Point: 0xfe78 Fuzzed: [ ُsim4n6 ُ].
Code Point: 0xfe7a Fuzzed: [ ِsim4n6 ِ].
Code Point: 0xfe7c Fuzzed: [ ّsim4n6 ّ].
Code Point: 0xfe7e Fuzzed: [ ْsim4n6 ْ].
Code Point: 0xffe3 Fuzzed: [ ̄sim4n6 ̄].

Those code points can be used as leading and trailing characters in
usernames and result in whitespace after Unicode normalization the way
Django does it.

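    import sys
    import unicodedata

    def contains_whitespace(s):
        # Assumed helper; its definition was not shown in the original snippet
        return any(ch.isspace() for ch in s)

    # Assumed driver loop over all code points (not shown in the original)
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        # Username input with potential leading and trailing whitespace
        username = char + "sim4n6" + char
        # This is the way it is done by Django
        s = unicodedata.normalize("NFKC", username.strip())
        if contains_whitespace(s):
            # Print the code point and the normalized result for demonstration purposes
            print(f"Code Point: {hex(code_point)}\tResult: [{s}].")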

With those code points, the result is not just visually a space but a regular space character.
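
This can be checked against the Unicode Character Database for a few of the code points listed in the fuzzing output. The snippet is a sketch covering only four of them; the others are not verified here.

```python
import unicodedata

# A few of the code points from the fuzzing output.
for code_point in (0xA8, 0xAF, 0xB4, 0xB8):
    char = chr(code_point)
    normalized = unicodedata.normalize("NFKC", char)
    # The decomposition field starts with 0020, i.e. a regular space.
    print(
        f"U+{code_point:04X} {unicodedata.name(char)}: "
        f"{unicodedata.decomposition(char)} -> {ascii(normalized)}"
    )
```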

Voila

PS: I wrote about this once:
https://sim4n6.beehiiv.com/p/unicode-characters-bypass-security-checks

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:6>

Django

Jun 17, 2023, 8:00:49 AM
to django-...@googlegroups.com

Comment (by Ashutosh singh):

So what's the best solution here?

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:7>

Django

Jun 22, 2023, 3:34:55 AM
to django-...@googlegroups.com

Comment (by Sim4n6):

I suggest normalizing first and then performing the stripping, as
previously mentioned. That would reduce the odds of a bypass while still
handling the usual cases.

PS: I still insist that there is a security risk here.

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:8>

Django

Jun 26, 2023, 11:06:05 AM
to django-...@googlegroups.com

Comment (by SecondPort):

I think the correct solution to this security issue is to add a
validation step that rejects usernames containing certain characters. I hope
this is the right approach; if something else needs to be changed,
please let me know. Thank you very much.
PR: https://github.com/django/django/pull/17014

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:9>

Django

Jun 26, 2023, 11:24:39 AM
to django-...@googlegroups.com

Comment (by Sim4n6):

The `contains_unwanted_characters()` would fail more often than you
expect.

{{{
import unicodedata

def contains_unwanted_characters(username):
    """Check if the username contains characters that normalize to whitespace."""
    normalized_username = unicodedata.normalize("NFKC", username)
    return username != normalized_username.strip()

usernames = ["sim4n6", " ̈sim4n6 ̈", "℻", "Sim4n6","Sim4n6"]
for username in usernames:
    print(contains_unwanted_characters(username))

}}}
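
For comparison, a narrower check could flag only usernames whose NFKC form actually contains whitespace, so that inputs such as "℻" (which normalizes to "FAX") are not rejected. This is a hypothetical sketch, not the code from the PR; `normalizes_to_whitespace` is an invented name, and the check deliberately also rejects usernames with ordinary or internal spaces.

```python
import unicodedata


def normalizes_to_whitespace(username):
    """Return True if the NFKC form of the username contains any whitespace."""
    normalized = unicodedata.normalize("NFKC", username)
    return any(ch.isspace() for ch in normalized)


# "\u00a8" normalizes to a space plus a combining mark; "\u213b" normalizes to "FAX".
for candidate in ("sim4n6", "\u00a8sim4n6\u00a8", "\u213b"):
    print(ascii(candidate), normalizes_to_whitespace(candidate))
```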

By the way, I still believe there is a security concern here. Anyway, I have a
security background, and this issue has been classified as not
security-related, so this will be my last comment on it.

Regards,

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:10>

Django

Jul 10, 2024, 3:09:27 PM
to django-...@googlegroups.com
#34654: Post-normalization performed on the Username field leading to the bypass of
the whitespace stripping
------------------------------+--------------------------------------------
Reporter: Sim4n6 | Owner: George Kussumoto
Type: Bug | Status: assigned
Component: contrib.auth | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+--------------------------------------------
Changes (by George Kussumoto):

* owner: nobody => George Kussumoto
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:11>

Django

Jul 10, 2024, 3:56:04 PM
to django-...@googlegroups.com
Comment (by Sim4n6):

In case you change your mind regarding this issue, please keep me posted
on its outcome via https://hackerone.com/sim4n6?type=user

Thanks
--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:12>

Django

Jul 10, 2024, 5:59:23 PM
to django-...@googlegroups.com
Comment (by George Kussumoto):

Here are some ideas to discuss:

- To the main point of bypassing white space removal: while this is true
for the `UsernameField`, most forms using this field also have another
validation at the model level, such as `UnicodeUsernameValidator` and
`EmailValidator`. These validators run on post-normalized values, so
symbols like `BRAILLE PATTERN BLANK` or white space (regardless of
position) will raise errors.

- The `AuthenticationForm` might not invalidate such inputs, but to
successfully exploit this, one should have bypassed the creation form
first (I think). I'm not a security expert, but maybe there's an
additional condition of using non-unique usernames.

- The case for the password reset is already addressed in
https://www.djangoproject.com/weblog/2019/dec/18/security-releases/

- When using custom user models things are different, the application code
should include the appropriate validators. Depending on the
implementation, the normalization steps too.

- At first, I thought about adding `UnicodeUsernameValidator` to the
`UsernameField`. It was promising, all tests passed. Then I realized it
not only breaks compatibility, but we shouldn't make this assumption on
the username since the validator is particular about what it accepts,
compromising the flexibility to set `USERNAME_FIELD` in custom models. For
example, `UnicodeUsernameValidator` is stricter than `EmailValidator` (at
least at first glance).

- Another possibility was to add the validator in the form definition
instead of the form field (since some customization is expected when
using custom models). Possible candidates were `UserCreationForm` and
`UserChangeForm`
([https://docs.djangoproject.com/en/dev/topics/auth/customizing/#custom-users-and-the-built-in-auth-forms custom-users-and-built-in-auth-forms doc])
since they are more tied to the default user model. But this is
already accomplished by having the validator in the model.

- As mentioned before in the comments, re-running `.strip()` after
normalization or comparing string length doesn't seem to address the
problem reported.
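
The first point in the list can be illustrated without Django installed. The sketch below assumes the pattern `^[\w.@+-]+\Z`, which is what `UnicodeUsernameValidator` is understood to use; `is_valid_username` is a hypothetical stand-in for form cleaning followed by model validation, not a Django function.

```python
import re
import unicodedata

# Assumption: same pattern as django.contrib.auth.validators.UnicodeUsernameValidator.
USERNAME_RE = re.compile(r"^[\w.@+-]+\Z")


def is_valid_username(raw):
    """Hypothetical stand-in for form cleaning followed by model validation."""
    # Form field step: strip, then NFKC-normalize.
    normalized = unicodedata.normalize("NFKC", raw.strip())
    # Model validator step: runs on the post-normalized value.
    return USERNAME_RE.match(normalized) is not None


for candidate in ("sim4n6", "\u2800sim4n6\u2800", "\u00a8sim4n6\u00a8"):
    print(ascii(candidate), is_valid_username(candidate))
```

Under that assumption, both the BRAILLE PATTERN BLANK input and the input that normalizes to a space are rejected at the validator step.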

So far, I don't have a conclusion, but I lean slightly toward the view that no
code changes are required. I wanted to share my thoughts and hear from others.
I don't have a security background, so I might have overlooked or simplified
things a bit. Please keep that in mind.
--
Ticket URL: <https://code.djangoproject.com/ticket/34654#comment:13>