[Django] #32778: Compile alphanumeric regex for CSRF middleware at module level.

Django

May 24, 2021, 5:31:09 AM
to django-...@googlegroups.com
#32778: Compile alphanumeric regex for CSRF middleware at module level.
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: CSRF | Version: 3.2
Severity: Normal | Keywords: middleware, csrf
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I was looking into the source code of the CSRF middleware and saw that
the regular expression is compiled inside the function `_sanitize_token`
rather than at module level. I think compiling it at module level could
save some time, as `_sanitize_token` is called twice inside
`process_view` of the `CsrfViewMiddleware` class.

This is the intended patch.


{{{
diff --git a/django/middleware/csrf.py b/django/middleware/csrf.py
index f323ffb..deaf7d8 100644
--- a/django/middleware/csrf.py
+++ b/django/middleware/csrf.py
@@ -22,6 +22,8 @@ from django.utils.log import log_response
 
 logger = logging.getLogger('django.security.csrf')
 
+ASCII_ALPHANUMERIC_RE = re.compile('[^a-zA-Z0-9]')
+
 REASON_BAD_ORIGIN = "Origin checking failed - %s does not match any trusted origins."
 REASON_NO_REFERER = "Referer checking failed - no Referer."
 REASON_BAD_REFERER = "Referer checking failed - %s does not match any trusted origins."
@@ -107,7 +109,7 @@ def rotate_token(request):
 
 def _sanitize_token(token):
     # Allow only ASCII alphanumerics
-    if re.search('[^a-zA-Z0-9]', token):
+    if ASCII_ALPHANUMERIC_RE.search(token):
         return _get_new_csrf_token()
     elif len(token) == CSRF_TOKEN_LENGTH:
         return token
}}}

I'm not sure exactly how to profile this change. I tried the
[https://github.com/django/djangobench/ djangobench] package after some
tinkering with its source code. Since it also reported changes in the
query benchmarks, which this patch does not touch, I wasn't sure whether
to trust it. Any leads on this front would be great.

I would be happy to make the change, if this seems reasonable.

--
Ticket URL: <https://code.djangoproject.com/ticket/32778>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

May 24, 2021, 5:58:03 AM
to django-...@googlegroups.com
#32778: Compile alphanumeric regex for CSRF middleware at module level.
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Abhyudai):

For what it's worth, here are the results of the `default_middleware`
benchmark on my machine, comparing the patch branch to the main branch:
{{{
Control: Django 4.0.dev20210524043148 (in git branch main)
Experiment: Django 4.0.dev20210524043148 (in git branch feat/optimize-csrf)

Running 'default_middleware' benchmark ...
Min: -0.000020 -> -0.000111: 0.1772x faster
Avg: 0.000751 -> 0.000740: 1.0160x faster
Not significant
Stddev: 0.00529 -> 0.00522: 1.0144x smaller (N = 50)
}}}

However, I'm not sure this measures only the change to the CSRF
middleware, since the default middleware stack includes a lot more.
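
One way to isolate the regex cost from the rest of the middleware stack
would be a `timeit` micro-benchmark of the two lookups on their own. The
following is only a rough sketch under stated assumptions: the function
names and `SAMPLE_TOKEN` are hypothetical and are not part of Django or
djangobench. Note that `re.search()` with a string pattern goes through
the `re` module's internal cache of compiled patterns, so any difference
is likely to reflect the cache lookup rather than a full recompilation.

```python
import re
import timeit

# Module-level compiled pattern, as in the proposed patch.
ASCII_ALPHANUMERIC_RE = re.compile('[^a-zA-Z0-9]')

# Hypothetical sample token: 64 ASCII alphanumeric characters.
SAMPLE_TOKEN = 'aB3' * 21 + 'x'


def has_invalid_chars_inline(token):
    # Pattern passed as a string: the re module resolves it via its
    # internal cache of compiled patterns on every call.
    return re.search('[^a-zA-Z0-9]', token) is not None


def has_invalid_chars_precompiled(token):
    # Pattern object created once, at import time.
    return ASCII_ALPHANUMERIC_RE.search(token) is not None


if __name__ == '__main__':
    for func in (has_invalid_chars_inline, has_invalid_chars_precompiled):
        elapsed = timeit.timeit(lambda: func(SAMPLE_TOKEN), number=100_000)
        print(f'{func.__name__}: {elapsed:.4f}s')
```

Absolute timings will vary by machine and Python version, so only the
relative difference between the two lines is meaningful.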

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:1>

Django

May 24, 2021, 6:55:10 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
--------------------------------------+------------------------------------
Reporter: Abhyudai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Mariusz Felisiak):

* stage: Unreviewed => Accepted


Comment:

Thanks, sounds good. Would you like to prepare a patch? Please use
`_lazy_re_compile()` to avoid compiling the regex when the module is
imported.
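
To illustrate the idea behind lazy compilation, here is a minimal
standard-library sketch. It is not Django's implementation of
`_lazy_re_compile()` (which lives in `django.utils.regex_helper`); the
`LazyPattern` class and the `lazy_invalid_chars_re` name are hypothetical
stand-ins that only show the general technique of deferring `re.compile()`
until the pattern is first used.

```python
import re


class LazyPattern:
    """Hypothetical stand-in: compile the regex on first use, not at import."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # Nothing is compiled yet.

    def _get(self):
        # Compile once, on first access, then reuse the pattern object.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def search(self, string):
        return self._get().search(string)


# Creating the object at module level does not compile anything.
lazy_invalid_chars_re = LazyPattern('[^a-zA-Z0-9]')
```

With this shape, importing the module stays cheap, and the first call to
`lazy_invalid_chars_re.search(token)` pays the one-time compilation cost.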

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:2>

Django

May 24, 2021, 6:55:32 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
--------------------------------------+------------------------------------
Reporter: Abhyudai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Mariusz Felisiak):

* easy: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:3>

Django

May 24, 2021, 10:11:02 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
--------------------------------------+------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: assigned
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Abhyudai):

* owner: nobody => Abhyudai
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:4>

Django

May 24, 2021, 12:55:17 PM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
--------------------------------------+------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: assigned
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Abhyudai):

* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/14442 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:5>

Django

May 25, 2021, 4:00:01 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: assigned
Component: CSRF | Version: 3.2
Severity: Normal | Resolution:
Keywords: middleware, csrf | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:6>

Django

May 25, 2021, 4:29:23 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: closed
Component: CSRF | Version: 3.2
Severity: Normal | Resolution: fixed
Keywords: middleware, csrf | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"866dccb65075159c7e99e8d165e52761965f3625" 866dccb6]:
{{{
#!CommitTicketReference repository=""
revision="866dccb65075159c7e99e8d165e52761965f3625"
Fixed #32778 -- Avoided unnecessary recompilation of token regex in
_sanitize_token().
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:7>

Django

May 28, 2021, 7:19:27 PM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: closed
Component: CSRF | Version: 3.2
Severity: Normal | Resolution: fixed
Keywords: middleware, csrf | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Chris Jerdonek):

I opened a follow-up PR that renames `token_re`:
https://github.com/django/django/pull/14461

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:8>

Django

May 29, 2021, 6:53:59 AM
to django-...@googlegroups.com
#32778: Avoided unnecessary recompilation of token regex in _sanitize_token().
-------------------------------------+-------------------------------------
Reporter: Abhyudai | Owner: Abhyudai
Type: Cleanup/optimization | Status: closed
Component: CSRF | Version: 3.2
Severity: Normal | Resolution: fixed
Keywords: middleware, csrf | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by GitHub <noreply@…>):

In [changeset:"d270dd584e0af12fe6229fb712d0704c232dc7e5" d270dd58]:
{{{
#!CommitTicketReference repository=""
revision="d270dd584e0af12fe6229fb712d0704c232dc7e5"
Refs #32778 -- Improved the name of the regex object detecting invalid
CSRF token characters.

This also improves the comments near where the variable is used.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32778#comment:9>
