[Django] #36483: IntegerField will accept non-ASCII digits, which leads to the same page appearing at many URLs

19 views
Skip to first unread message

Django

unread,
Jun 27, 2025, 2:45:24 PM6/27/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-----------------------------+-----------------------------------------
Reporter: Morgan Wahl | Type: Bug
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+-----------------------------------------
Hello,

I was recently surprised to find that a simple detail view URL with a
model ID in it was also accessible at a URL using "full width" digit
characters. For example the page at "/pizza/123" could also be returned
from "/pizza/123". That's the Unicode characters U+FF11 U+FF12 U+FF13.
It turns out this is ultimately because the model `IntegerField` is using
`int` to get an integer from the string that was originally in the URL.
And I was surprised to find Python's `int` constructor uses
`unicodedata.decimal` (or some equivalent) to translate from characters in
a string to decimal digits.

That was a cool accidental feature to discovery, however now I'm concerned
about URL canonicalization. Python 3.13.3 accepts _68_ different
characters for each digit. This means the same content is hypothetically
accessible from many, many URLs. I've heard that can make a site look
spammy to search engines. And maybe this could be an element of a security
hole if something is assuming there is only one URL for a given page.

The SEO problem could be addressed by setting a `<link rel=canonical>` in
the page to point to `Pizza.objects.get(pk=id).get_absolute_url()` or some
similar logic, or you could address the problem as a whole by setting up
redirects or 404 responses, but all those approaches require a separate
implementation for every view, since the view code ultimately doesn't know
which parts of the URL are going to be treated as values of a
`IntegerField`.

Possible solutions I can think of are either:

1. make some mechanism to very easily canonicalize URLs, by allowing users
to somehow mark this situation explicitly in the URL conf, and then Django
can set a property on the request object with the "canonicalized" URL.
Then redirects or 404s or <link> tags could be implemented just once for
all such URLs. (Redirects and 404s in a middleware, <link> tags in a base
template.)
2. Don't just pass strings to `int` in the model `IntegerField`. Instead
only allow strings with ASCII digits to be used.
--
Ticket URL: <https://code.djangoproject.com/ticket/36483>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 27, 2025, 2:46:26 PM6/27/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------+--------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Old description:
New description:

Hello,

I was recently surprised to find that a simple detail view URL with a
model ID in it was also accessible at a URL using "full width" digit
characters. For example the page at "/pizza/123" could also be returned
from "/pizza/123". That's the Unicode characters U+FF11 U+FF12 U+FF13.
It turns out this is ultimately because the model `IntegerField` is using
`int` to get an integer from the string that was originally in the URL.
And I was surprised to find Python's `int` constructor uses
`unicodedata.decimal` (or some equivalent) to translate from characters in
a string to decimal digits.

That was a cool accidental feature to discover, however now I'm concerned
about URL canonicalization. Python 3.13.3 accepts _68_ different
characters for each digit. This means the same content is hypothetically
accessible from many, many URLs. I've heard that can make a site look
spammy to search engines. And maybe this could be an element of a security
hole if something is assuming there is only one URL for a given page.

The SEO problem could be addressed by setting a `<link rel=canonical>` in
the page to point to `Pizza.objects.get(pk=id).get_absolute_url()` or some
similar logic, or you could address the problem as a whole by setting up
redirects or 404 responses, but all those approaches require a separate
implementation for every view, since the view code ultimately doesn't know
which parts of the URL are going to be treated as values of a
`IntegerField`.

Possible solutions I can think of are either:

1. make some mechanism to very easily canonicalize URLs, by allowing users
to somehow mark this situation explicitly in the URL conf, and then Django
can set a property on the request object with the "canonicalized" URL.
Then redirects or 404s or <link> tags could be implemented just once for
all such URLs. (Redirects and 404s in a middleware, <link> tags in a base
template.)
2. Don't just pass strings to `int` in the model `IntegerField`. Instead
only allow strings with ASCII digits to be used.

--
Comment (by Morgan Wahl):

(Fix typo in description.)
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:1>

Django

unread,
Jun 27, 2025, 2:49:05 PM6/27/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------+--------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: Bug | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Old description:

> Hello,
>
> I was recently surprised to find that a simple detail view URL with a
> model ID in it was also accessible at a URL using "full width" digit
> characters. For example the page at "/pizza/123" could also be returned
> from "/pizza/123". That's the Unicode characters U+FF11 U+FF12 U+FF13.
> It turns out this is ultimately because the model `IntegerField` is using
> `int` to get an integer from the string that was originally in the URL.
> And I was surprised to find Python's `int` constructor uses
> `unicodedata.decimal` (or some equivalent) to translate from characters
> in a string to decimal digits.
>
implementation for every view, since Django's code ultimately doesn't know
which parts of the URL are going to be treated as values of a
`IntegerField`. (Django's `DetailView` could implement some logic for
this, however in reality there are lots of other situations where people
take a string from the URL use it to look up a record with an
`IntegerField`.)

Possible solutions I can think of are either:

1. make some mechanism to very easily canonicalize URLs, by allowing users
to somehow mark this situation explicitly in the URL conf, and then Django
can set a property on the request object with the "canonicalized" URL.
Then redirects or 404s or <link> tags could be implemented just once for
all such URLs. (Redirects and 404s in a middleware, <link> tags in a base
template.)
2. Don't just pass strings to `int` in the model `IntegerField`. Instead
only allow strings with ASCII digits to be used.

--
Comment (by Morgan Wahl):

(Clarified possible workarounds.)
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:2>

Django

unread,
Jun 30, 2025, 10:26:16 AM6/30/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------------+-------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Core (URLs) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* component: Uncategorized => Core (URLs)
* type: Bug => Cleanup/optimization

Comment:

Hello Morgan Wahl, thank you for your ticket.

The suggested option 2 would be backwards incompatible so I don't think
that's an option we want to pursue. As to whether this needs fixing at
all, I'm not entirely sure. I understand the potential cache and SEO
invalidation issues, so I have shared this report with the security team
to evaluate possible options.

I will update the ticket when I get more information.
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:3>

Django

unread,
Jun 30, 2025, 3:26:54 PM6/30/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------------+-------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Core (URLs) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

Morgan, could you please provide details of how you are defining your view
and urlconf? There was previous work in #28268 that would ensure that
modern usages of URL definitions would not allow for these fullwidth
unicode digits. So patterns like this:

* `path("something/<int:pk>/", something_detail)`
* `re_path(r"^something/(?P<pk>[0-9]+)/$", something_detail)`

properly raise 404 for non-ascii digits.
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:4>

Django

unread,
Jun 30, 2025, 4:37:17 PM6/30/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------------+-------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Core (URLs) | Version: 5.2
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Morgan Wahl):

* resolution: => wontfix
* status: new => closed

Comment:

Ah! That was a good question. In the case where I ran into this, I see
we're doing

` re_path(r'^(?P<pk>\d+)/$', browse_views.material,
name='browse.material'),`

This particular code was written sometime before 2013, so `path` was not
an option, and was originally running on Python 2, where `\d` only matches
ASCII unless you passed the unicode flag.

I see we just need to use `path` with `<int:...>` or limit the regex.

Thanks for the help and sorry to take up your time!
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:5>

Django

unread,
Jun 30, 2025, 7:19:54 PM6/30/25
to django-...@googlegroups.com
#36483: IntegerField will accept non-ASCII digits, which leads to the same page
appearing at many URLs
-------------------------------------+-------------------------------------
Reporter: Morgan Wahl | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Core (URLs) | Version: 5.2
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* resolution: wontfix => duplicate

Comment:

Thank you for your quick reply Morgan! I'll adjust the resolution
slightly to mark this as duplicate of #28268.
--
Ticket URL: <https://code.djangoproject.com/ticket/36483#comment:6>
Reply all
Reply to author
Forward
0 new messages