Discuss ticket 20264: URLValidator should allow underscores in local hostname

Pavel Savchenko

Mar 24, 2020, 8:46:29 AM
to Django developers (Contributions to Django itself)
Hi Folks,

I've just encountered this issue, and it seems Django's URLValidator regex for host is trying to abide by the RFC 1034 recommendation, when there are many sites in the wild that use underscore in their domain name.
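
For reference, the RFC 1034 "preferred name syntax" restricts each label to letters, digits, and hyphens, starting with a letter and ending with a letter or digit, with at most 63 characters per label. Here is a minimal standard-library sketch of that rule. The function name and the example hostnames are hypothetical, and this is not Django's actual regex (which, among other things, also accepts Unicode letters):

```python
import re

# One label in the RFC 1034 "preferred name syntax": starts with a letter,
# ends with a letter or digit, may contain hyphens in between, max 63 chars.
LABEL_RE = re.compile(r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def follows_preferred_syntax(hostname: str) -> bool:
    """Return True if every dot-separated label matches LABEL_RE."""
    return all(LABEL_RE.fullmatch(label) for label in hostname.split("."))


print(follows_preferred_syntax("my-site.example.com"))  # True
print(follows_preferred_syntax("my_site.example.com"))  # False
```

Under this reading, the underscore is the only thing that separates the second hostname from the first.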

Can we please discuss this issue here, so we can eventually decide to reopen the ticket (or not) and perhaps allow for a pull-request to fix it?

I found this Stack Overflow question helpful; it has many answers and comments with additional references: https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

Best regards,
Pavel

Adam Johnson

Mar 24, 2020, 9:36:33 AM
to django-d...@googlegroups.com
Hi Pavel

The ticket ( https://code.djangoproject.com/ticket/20264 ) doesn't mention any specific use cases, and nor have you. What has this behaviour blocked for you?

Thanks,

Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6982245f-2b5a-4a32-8fe5-a063c7459b7c%40googlegroups.com.


--
Adam

1337 Shadow Hacker

Mar 24, 2020, 10:14:00 AM
to django-d...@googlegroups.com
> when there are many sites in the wild that use underscore in their domain name.

Can you share some examples, please?

In general, we should abide by standards unless we have a really good reason not to.

In my experience, I have always had to replace underscores with dashes for one reason or another in hostnames that were set up by people who don't read RFCs anyway, so I'm not sure Django itself can make a big difference.

Nonetheless, can't you override the validation on your side?

Best

Pavel Savchenko

Mar 24, 2020, 12:41:52 PM
to Django developers (Contributions to Django itself)
Hey folks,

Sorry for not providing a more specific scenario before; I was short on time and just wanted to kick this off.

The most common scenario that I can think of (and the one most similar to our usage) would be a form field on a Django site that allows users to input a URL, which is saved and later displayed as a link to other users (e.g. in blogs, comments, CMSs, etc.).

Here's an example of such a site, though clearly not a very reputable one: http://online_casino_news.hundredpercentgambling.com/ . Note that Google Groups automatically converted this one to a link for me, and I was able to click and follow it in both Chrome and Firefox.

In the above use case, by validating the correctness of the URL, we protect the user from making a mistake, but we don't really care about adhering to standards beyond that; usability wins.

There are other use cases that might care about the RFC 952/1034 guidelines for hostnames, for example if we're building a hosting or name server management system, or maybe an SSL certificate vendor.
In such cases, it might actually benefit the user if the platform alerts on the validity of the hostname chosen by the user (at the very least to advise the users).

However, I would guess that the first use case, taking a URL to store and render as a link, is more common and thus more frequently requires overriding the class.

I can also propose a solution that would still work for both: (deprecate and) rename the current class to StrictURLValidator (or URLValidatorRFC1034), to still be easily used for the less common scenarios.

What do you think?

Best Regards,
Pavel



Adam Johnson

Mar 25, 2020, 2:27:58 PM
to django-d...@googlegroups.com
You're right, there are two use cases here. It does sound like the pragmatic approach is to allow underscores in URLs normally, but to preserve the existing behaviour for those with stricter use cases, as you say.

> I can also propose a solution that would still work for both: (deprecate and) rename the current class to StrictURLValidator (or URLValidatorRFC1034), to still be easily used for the less common scenarios.

This sounds reasonable to me. I'm not sure we'd need the deprecation period, given we'd only be adding one character to URLValidator. A release note is typically enough in this situation, but I normally defer to the fellows for this.

I think that would make Florian happy, although it *has* been seven years since his closing comment on the ticket.



--
Adam

Florian Apolloner

Mar 26, 2020, 5:48:06 AM
to Django developers (Contributions to Django itself)
Hi Adam,


On Wednesday, March 25, 2020 at 7:27:58 PM UTC+1, Adam Johnson wrote:
> I think that would make Florian happy, although it *has* been seven years since his closing comment on the ticket.

You should know me better :D No, this would not make Florian happy, and he is still against it. By all means add a lenient=False flag which can be turned to True to enable lenient parsing but the defaults should imo stay.
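
A rough sketch of what such an opt-in flag could look like, written as a standalone hostname checker rather than as a patch to Django. The class name HostnameValidator, the lenient keyword, and the simplified ASCII-only pattern are all assumptions made for illustration; none of this is Django's API:

```python
import re


class HostnameValidator:
    """Hypothetical validator: strict by default, underscores only on request."""

    def __init__(self, lenient: bool = False):
        # Characters allowed at the edges of a label; "_" is added only when
        # lenient. A hyphen is additionally allowed inside a label.
        chars = "a-z0-9_" if lenient else "a-z0-9"
        label = rf"[{chars}](?:[{chars}-]{{0,61}}[{chars}])?"
        self.regex = re.compile(rf"{label}(?:\.{label})*", re.IGNORECASE)

    def __call__(self, hostname: str) -> None:
        if not self.regex.fullmatch(hostname):
            raise ValueError(f"Invalid hostname: {hostname!r}")


strict = HostnameValidator()
strict("test-host.com")  # accepted
HostnameValidator(lenient=True)("test_host.com")  # accepted
try:
    strict("test_host.com")
except ValueError as exc:
    print(exc)  # Invalid hostname: 'test_host.com'
```

The point of the default is that callers who never pass the flag keep today's strict behaviour.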

It might be true that, for the sole purpose of __displaying__ URLs, an underscore will not hurt, but in the greater scheme of things it simply does not work:

 * java.net.URI will not parse it: new java.net.URI("http://test_host.com").getHost() -> null
 * While you may laugh about me mentioning Java, the more relevant argument is that we are heading towards an HTTPS world, and there you have to play by a different set of rules, namely the CA/Browser Forum Baseline Requirements. These require you to follow RFCs (especially RFC 5280), which in turn requires subjectAltNames to follow the preferred name syntax of RFC 1034, which finally disallows the use of underscores. For this reason CAs won't issue certs for those hostnames; you can only make them work via wildcard certs, which in turn only work for subdomains and not TLDs.

So this limits the usefulness of underscores in URLs mainly to HTTP-only sites or sites that jumped through extra hoops to get it working. In that sense I do not see a strong requirement to be lenient in parsing by default.
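
As a point of comparison with the java.net.URI behaviour mentioned above, Python's standard-library urllib.parse.urlsplit does not enforce hostname syntax and returns the underscore host unchanged. The small helper below is hypothetical and only shows how one might detect such hosts (for example to warn instead of reject):

```python
from urllib.parse import urlsplit


def host_has_underscore(url: str) -> bool:
    """Return True if the URL's hostname (not its path) contains "_"."""
    hostname = urlsplit(url).hostname or ""
    return "_" in hostname


print(urlsplit("http://test_host.com/path").hostname)  # test_host.com
print(host_has_underscore("http://test_host.com/path"))  # True
print(host_has_underscore("http://example.com/a_b"))  # False
```

So whether an underscore hostname "parses" depends on the library, which is part of why a validator has to pick a policy.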

Cheers,
Florian

Carlton Gibson

Mar 26, 2020, 5:58:09 AM
to Django developers (Contributions to Django itself)
> By all means add a lenient=False flag which can be turned to True to enable lenient parsing...

I don't think we should even allow this. The extra API surface area complicates the matter for all users, almost all of whom are never going to set the new flag to anything but the default. (Of those that do, how many wouldn't really have thought it through/mean it?)

Folks wanting this can subclass URLValidator.

C.

James Bennett

Mar 26, 2020, 1:29:18 PM
to django-d...@googlegroups.com
I'm also in the "I don't think this should be allowed" camp. People
who really need it can set up their own validator easily enough, and I
worry about the security implications of supporting non-standard
behavior in something as crucial as hostname validation -- Django's
been bitten by that sort of thing several times in the past.

Adam Johnson

Apr 16, 2020, 5:37:55 PM
to django-d...@googlegroups.com
> Folks wanting this can subclass URLValidator.

For anyone who does want this, the subclass is not much work. You can inherit the regex pieces from URLValidator and edit them to insert _ as a valid character:

    import re

    from django.core.validators import URLValidator


    class LenientURLValidator(URLValidator):
        # Add "_" to every character class in the inherited host patterns.
        hostname_re = URLValidator.hostname_re.replace('0-9]', '0-9_]').replace('0-9-]', '0-9-_]')
        domain_re = URLValidator.domain_re.replace('0-9-]', '0-9-_]')
        host_re = '(' + hostname_re + domain_re + URLValidator.tld_re + '|localhost)'

        # Rebuilt from the superclass so that it picks up the new host_re.
        regex = re.compile(
            r'^(?:[a-z0-9.+-]*)://'  # scheme is validated separately
            r'(?:[^\s:@/]+(?::[^\s:@/]*)?@)?'  # user:pass authentication
            r'(?:' + URLValidator.ipv4_re + '|' + URLValidator.ipv6_re + '|' + host_re + ')'
            r'(?::\d{2,5})?'  # port
            r'(?:[/?#][^\s]*)?'  # resource path
            r'\Z', re.IGNORECASE)


    LenientURLValidator()('http://online_casino_news.hundredpercentgambling.com/')  # no ValidationError

It's a little tricky in the re.compile step that's copied from the superclass, but it works.
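
To see what the two replace calls do in isolation, here is a standard-library-only sketch. STRICT_HOSTNAME_RE is a simplified, ASCII-only stand-in that I am assuming for illustration; the real URLValidator.hostname_re also allows a range of Unicode letters:

```python
import re

# Simplified stand-in for a strict hostname-label pattern (assumption, see above).
STRICT_HOSTNAME_RE = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"

# Same two-step edit as in the subclass: first the classes ending in "0-9]",
# then the one ending in "0-9-]", so that every character class gains "_".
LENIENT_HOSTNAME_RE = (
    STRICT_HOSTNAME_RE.replace("0-9]", "0-9_]").replace("0-9-]", "0-9-_]")
)

print(LENIENT_HOSTNAME_RE)
# [a-z0-9_](?:[a-z0-9-_]{0,61}[a-z0-9_])?

strict = re.compile(STRICT_HOSTNAME_RE, re.IGNORECASE)
lenient = re.compile(LENIENT_HOSTNAME_RE, re.IGNORECASE)

print(bool(strict.fullmatch("online_casino_news")))   # False
print(bool(lenient.fullmatch("online_casino_news")))  # True
```

Note that this edit also lets a label start or end with an underscore, which may or may not be what you want.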



--
Adam

Pavel Savchenko

Apr 16, 2020, 6:22:18 PM
to Django developers (Contributions to Django itself)
Thank you Adam,

This is more or less what I ended up doing, sans the replace call, very neat!

And thanks a lot for the expert advice, everyone!

For the time being at least, it seems we have an agreement on not allowing non-strict validation into Django and I have to agree it just makes sense to keep the stricter default.

Stay safe,
Pavel