[Django] #36452: DomainNameValidator forbids digits in TLDs

Django

Jun 10, 2025, 2:36:31 PM
to django-...@googlegroups.com
#36452: DomainNameValidator forbids digits in TLDs
-------------------------------------+-------------------------------------
Reporter: Shai Berger | Type: Bug
Status: new | Component: Core (Other)
Version: dev | Severity: Normal
Keywords: validation domain | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
I think there's a small bug in the domain validator that has been lurking
quietly for years and is now biting me a little. The issue is digits in
top-level domains, e.g. `email.com1`. As far as I can read the
definition in [https://www.rfc-editor.org/rfc/rfc1035 RFC 1035 (page 8)],
this is a perfectly valid domain name, but
[https://github.com/django/django/blob/2714bc3f2c8675d32caae764c874ac381c836c7f/django/core/validators.py#L82
our regex, as I write this,] allows only letters. This is the regex for
i18n-supporting domains; there's an "ascii_only_tld" regex right next to
it, which does allow digits -- this makes me quite certain that it's a
bug.
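
To make the difference concrete, here is a simplified sketch. The two
patterns are hypothetical ASCII-only approximations, not Django's actual
regexes (the real i18n pattern also accepts Unicode letters and has a
separate `xn--` alternative, both omitted here); they differ only in
whether the TLD may contain digits.

```python
import re

# Hypothetical simplified patterns for illustration only; NOT Django's regexes.
# A hostname label: letters/digits, with hyphens allowed only in the middle.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"


def domain_pattern(tld_chars: str) -> re.Pattern:
    """Build 'label(.label)*.tld', where the TLD may use only tld_chars and '-'."""
    return re.compile(
        rf"{LABEL}(?:\.{LABEL})*"  # host and any intermediate labels
        rf"\.(?!-)[{tld_chars}-]{{2,63}}(?<!-)",  # TLD: no leading/trailing dash
        re.IGNORECASE,
    )


LETTERS_ONLY_TLD = domain_pattern("a-z")  # the behaviour described above
LETTERS_AND_DIGITS_TLD = domain_pattern("a-z0-9")  # like the ascii-only variant

for name in ("example.com", "email.com1"):
    print(
        name,
        bool(LETTERS_ONLY_TLD.fullmatch(name)),
        bool(LETTERS_AND_DIGITS_TLD.fullmatch(name)),
    )
```

With these patterns, `example.com` passes both, while `email.com1` is
rejected by the letters-only TLD pattern and accepted by the other one.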

Of note: the class `DomainNameValidator` is relatively new (it was only
added about a year ago), but it inherits the regex from the older
`URLValidator`, which, it seems, has forbidden digits in TLDs at least
since Django 2.x.
Since `EmailValidator` now also uses the regexes from
`DomainNameValidator`, it is also affected.
--
Ticket URL: <https://code.djangoproject.com/ticket/36452>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jun 10, 2025, 10:20:04 PM
to django-...@googlegroups.com
#36452: DomainNameValidator forbids digits in TLDs
-----------------------------------+--------------------------------------
Reporter: Shai Berger | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: dev
Severity: Normal | Resolution:
Keywords: validation domain | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-----------------------------------+--------------------------------------
Comment (by David Sanders):

I suppose it's technically incorrect; however, I didn't see any registered
TLDs with digits, and the folks who were involved in the recent update
were hesitant to touch the existing regex for fear of breaking something.

I'm just wondering "what would Carlton decide here" lmao
--
Ticket URL: <https://code.djangoproject.com/ticket/36452#comment:1>

Django

Jun 11, 2025, 3:40:34 AM
to django-...@googlegroups.com
#36452: DomainNameValidator forbids digits in TLDs
-----------------------------------+--------------------------------------
Reporter: Shai Berger | Owner: (none)
Type: Bug | Status: closed
Component: Core (Other) | Version: dev
Severity: Normal | Resolution: needsinfo
Keywords: validation domain | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-----------------------------------+--------------------------------------
Changes (by Sarah Boyce):

* cc: Claude Paroz, Mike Edmunds (added)
* resolution: => needsinfo
* status: new => closed

Comment:

I was trying to check if `email.com1` should be valid.

Looking at this list of top-level domains
(https://www.icann.org/en/contracted-parties/registry-operators/resources/list-of-top-level-domains),
the only top-level domains containing digits are prefixed with `XN--`.
This is allowed by our validator.
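
For reference, internationalized TLDs live in DNS as ASCII `xn--` labels
(Punycode), and that encoded form can contain digits even though the
Unicode name has none. Python's built-in `idna` codec (IDNA 2003) shows
this for the `.中国` TLD, which is delegated as `xn--fiqs8s`:

```python
# Encode the Unicode TLD "中国" (China) to its ASCII-compatible form
# using Python's built-in IDNA 2003 codec.
ace = "中国".encode("idna").decode("ascii")
print(ace)  # xn--fiqs8s

# The ASCII form contains a digit, but only after the "xn--" prefix.
print(ace.startswith("xn--"), any(ch.isdigit() for ch in ace))  # True True
```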

Looking at the RFC, I think I agree that the definition isn't this strict
and that digits are allowed without hyphens:
{{{
<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9
}}}
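
The `<label>` production above maps directly onto a regular expression.
A minimal sketch (the helper names are hypothetical), which also applies
RFC 1035's 63-character label limit from the same section, shows that
`com1` is a well-formed label under this grammar while `1com` is not,
since a label must start with a letter. This follows RFC 1035 alone;
later RFCs (for example RFC 1123 for host names) relax some of these
rules and are not modelled here.

```python
import re

# RFC 1035 <label>: a letter, then optionally [ldh-str] followed by a letter/digit.
# Equivalently: starts with a letter, ends with a letter or digit,
# and has only letters, digits, or hyphens in between.
RFC1035_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    # The same section also limits labels to 63 characters.
    return len(label) <= 63 and RFC1035_LABEL.fullmatch(label) is not None


def is_rfc1035_domain(name: str) -> bool:
    # <subdomain> ::= <label> | <subdomain> "." <label>
    return all(is_rfc1035_label(label) for label in name.split("."))


print(is_rfc1035_domain("email.com1"))  # True
print(is_rfc1035_label("1com"))  # False
```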

Before we continue, I think we should get confirmation that TLDs like
`com1` are valid.
--
Ticket URL: <https://code.djangoproject.com/ticket/36452#comment:2>

Django

Jun 11, 2025, 1:11:20 PM
to django-...@googlegroups.com
#36452: DomainNameValidator forbids digits in TLDs
-----------------------------------+--------------------------------------
Reporter: Shai Berger | Owner: (none)
Type: Bug | Status: closed
Component: Core (Other) | Version: dev
Severity: Normal | Resolution: invalid
Keywords: validation domain | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-----------------------------------+--------------------------------------
Changes (by Mike Edmunds):

* resolution: needsinfo => invalid

Comment:

Replying to [comment:2 Sarah Boyce]:
> Before we continue, I think we should get confirmation that tlds like
> `com1` are valid

I believe `com1` is
[https://stackoverflow.com/questions/9071279/number-in-the-top-level-domain/53875771#53875771:~:text=Specially%20note%20the%3A%20The%20ASCII%20label%20must%20consist%20entirely%20of%20letters%20(alphabetic%20characters%20a%2Dz)
not a valid TLD] under current ICANN rules. Since ICANN decides what's a
valid gTLD, their policies override whatever RFC 1035 may seem to allow.

There's a pretty thorough review here:
https://stackoverflow.com/questions/9071279/number-in-the-top-level-domain/53875771.

That said, I haven't personally reviewed RFC 1035 and all 29(!) RFCs that
modify it. The ICANN gTLD policies are from 2012; there's a new gTLD
policy in draft form now, and I haven't reviewed that either. So if
someone finds a newer policy that would allow digits in TLDs—or better
yet, real-world evidence of a (non-IDNA) TLD containing digits—then we
should revisit this.

The exception, as Sarah noted, is an IDNA-encoded TLD starting with
`xn--`. ICANN allows those, and so does Django's DomainNameValidator.
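
A rough approximation of that rule (ASCII TLDs must be letters only, with
`xn--` IDNA labels as the exception) could look like the hypothetical
helper below. It only checks the shape of the label: it does not consult
the actual list of delegated TLDs, and its IDNA check relies on Python's
built-in IDNA 2003 codec rather than a full IDNA 2008 implementation.

```python
def looks_like_public_tld(tld: str) -> bool:
    """Hypothetical shape check: letters-only ASCII TLD, or a decodable xn-- label."""
    tld = tld.lower()
    if tld.startswith("xn--"):
        # IDNA A-label: accept it only if it decodes back to Unicode.
        try:
            tld.encode("ascii").decode("idna")
        except UnicodeError:
            return False
        return True
    # Otherwise: ASCII letters a-z only, so digits and hyphens are rejected.
    return tld.isascii() and tld.isalpha()


for candidate in ("com", "com1", "XN--FIQS8S"):
    print(candidate, looks_like_public_tld(candidate))
```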

(Also, I suppose there could be ''internal-use-only'' TLDs containing
digits, which ''might'' be valid under the RFCs but wouldn't be usable on
the public Internet. That seems pretty niche, and anyone having that use
case could subclass Django's DomainNameValidator to cover it.)
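
As a rough illustration of that subclassing approach, here is a standalone
sketch. `SimpleDomainValidator` and `InternalDomainValidator` are
hypothetical stand-ins, not Django classes, and they raise `ValueError`
rather than Django's `ValidationError`; which attribute to override on the
real `DomainNameValidator` would need checking against its source.

```python
import re


class SimpleDomainValidator:
    """Hypothetical regex-based validator; NOT Django's DomainNameValidator."""

    label_re = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
    tld_re = r"[a-z]{2,63}"  # letters-only TLD, as discussed in this ticket

    def __init__(self) -> None:
        # Subclasses change behaviour by overriding the class-level patterns.
        self.regex = re.compile(
            rf"{self.label_re}(?:\.{self.label_re})*\.{self.tld_re}",
            re.IGNORECASE,
        )

    def __call__(self, value: str) -> None:
        if not self.regex.fullmatch(value):
            raise ValueError(f"Enter a valid domain name: {value!r}")


class InternalDomainValidator(SimpleDomainValidator):
    """Also accepts TLDs with digits, e.g. a private 'corp1' zone (hypothetical)."""

    # RFC 1035-style label: starts with a letter, ends with a letter or digit.
    tld_re = r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?"


InternalDomainValidator()("build.corp1")  # passes
try:
    SimpleDomainValidator()("build.corp1")
except ValueError as exc:
    print(exc)
```

Keeping the TLD pattern as a class attribute means the internal-use rule
stays in one subclass instead of loosening the default for everyone.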
--
Ticket URL: <https://code.djangoproject.com/ticket/36452#comment:3>