#36131: URLValidator does not allow URLs without a top level domain (except for localhost)
-------------------------------+-----------------------------------------
Reporter: Ludwig Kraatz | Owner: Ludwig Kraatz
Type: Bug | Status: closed
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution: duplicate
Keywords: URL Validator | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------+-----------------------------------------
Changes (by Ludwig Kraatz):
* easy: 1 => 0
* type: New feature => Bug
Comment:
Thank you, Sarah, for putting in the effort and taking the time for this
ticket.
== Really a Bug || hope you reconsider || though I do see the point (to a degree)
I do not consider this a new feature - I consider this **correcting a
misleading/''outdated'' implementation** of URLValidator.
I hoped I had made clear that the way URLs are defined by RFC 3986 (and
the RFCs surrounding it) is not the same as the way Django's URLValidator
validates them.
https://github.com/django/django/blob/9cc3970eaaf603832c075618e61aea9ea430f719/docs/ref/validators.txt#L182
Consider even how the Django docs describe URLValidator.
They state it plainly ("in other words"): it does not validate URLs, but
values that Django considers relevant and "URL-looking".
{{{
A :class:`RegexValidator` subclass that ensures a value looks like a URL,
...
Values starting with ``file:///`` will not pass validation even
}}}
I do understand the downstream implications, which is why I opened this
ticket after my initial "pull request" (leaving aside the quality of that
request here) and the tests it broke.
I do see how one could argue that Django operates mostly in the web
domain, and that a URLField validating only that kind of URL therefore
seems reasonable.
But then again, hardcoding 'localhost' makes this argument somewhat
obsolete. Try {{{https://localhost}}} on a machine that is not set up to
serve that resource: it behaves exactly like {{{https://printer}}}. If the
machine is set up, it works; if not, it does not.
The only difference is that the exception makes things easier for
developers using Django.
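To make the localhost/printer point concrete, here is a minimal sketch using only the Python standard library. It assumes a simplified "dotted domain or the literal localhost" host rule; the regex `DOTTED_OR_LOCALHOST` and both helpers are hypothetical and are NOT Django's actual `host_re`. Syntactic parsing with `urllib.parse.urlsplit` treats both hosts identically, and only the hardcoded exception separates them.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified host rule for illustration only (NOT Django's
# actual host_re): a dotted domain ending in an alphabetic TLD, or "localhost".
DOTTED_OR_LOCALHOST = re.compile(
    r"^(?:localhost|(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63})$",
    re.IGNORECASE,
)


def rfc3986_parts(url):
    """Generic RFC 3986 components; purely syntactic, no DNS lookup."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


def host_allowed(url):
    """Apply the simplified 'dotted domain or localhost' rule to the host."""
    return bool(DOTTED_OR_LOCALHOST.match(urlsplit(url).hostname or ""))


for url in ("https://localhost/", "https://printer/", "https://example.com/"):
    print(url, rfc3986_parts(url), host_allowed(url))
# https://localhost/ ('https', 'localhost', '/') True
# https://printer/ ('https', 'printer', '/') False
# https://example.com/ ('https', 'example.com', '/') True
```

Both single-label URLs split into the same kind of components; whether either host is actually reachable is a property of the machine, not of the URL syntax.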
But that makes it superficial, no offense. I made that argument before;
here I will elaborate on why "superficial naming" is a **bug**.
Just to make the inconsistency clear:
{{{localhost}}} was reserved in RFC 6761.
mDNS, which Django's URLValidator (as one EXAMPLE) 'violates', was
specified in RFC 6762.
Both were published in February 2013.
URLValidator predates both; I'm pretty sure I used it back in 2010/11.
The field is not called "DNSURLField", "WebURLField" or
"CustomURLField", and as such, the implementation simply does not match
what its name claims.
That, in turn, leads to problems that could easily be avoided.
Oh my..
https://github.com/django/django/blob/9cc3970eaaf603832c075618e61aea9ea430f719/django/core/validators.py#L169
I just realized that Django "actually" implemented a
"CustomWeb2010"URLValidator and called it URLValidator...
I **really** want to emphasize that the name of a "thing" plays a crucial
role in what that thing should do, or what it will be expected to do. As
such, naming plays a crucial role quality-wise, especially for a
"framework for perfectionists (with deadlines)", if that is still how
Django is labeled, as it was back when I started.
== Why it matters || projects depend on Django being 'reliable' || a concrete example
The thing is, Django is a framework, not a "simple project".
Projects like authentik depend on the standards conformance of components
like Django's URLValidator:
https://github.com/goauthentik/authentik/blob/3daa39080a7866d83fad0fb3691e9e31397e0f6c/authentik/providers/saml/models.py#L43
We use authentik in our intranet as a SAML IdP, where it interacts with
other third-party tools. The intranet aspect, however, is actually
irrelevant here.
The SAML Service Provider dictates its ACS URL.
As such, using URLs of the kind URLValidator currently (wrongly) rejects
is mandatory for us.
One piece of software defines its ACS URL as "https://example/resource"
(different words, but that is the layout. BTW: notice how that URL is
accepted here as a URL. I did not declare it as some sort of link.)
And that works just fine, because the URL is only used to validate that a
SAML request is meant for the endpoint handling it. No URL is ever
"called".
It is a simple reference in the form of a URL: a reference to a resource
via its location.
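A minimal sketch of what "only used for validation" means in practice. The helper `acs_url_matches` is hypothetical (it is not authentik's code); it assumes the check boils down to comparing the configured ACS URL with the URL carried in an incoming SAML message, component by component. Nothing is resolved or contacted, so it does not matter whether "example" exists in (m)DNS.

```python
from urllib.parse import urlsplit


def acs_url_matches(configured, received):
    """Hypothetical check: is a SAML message addressed to our ACS URL?

    Purely a comparison of URL components; the host is never resolved or
    contacted. urlsplit() lower-cases the scheme, and .hostname is
    lower-cased, so host and scheme comparison is case-insensitive.
    """
    a, b = urlsplit(configured), urlsplit(received)
    return (a.scheme, a.hostname, a.port, a.path) == (
        b.scheme,
        b.hostname,
        b.port,
        b.path,
    )


print(acs_url_matches("https://example/resource", "https://EXAMPLE/resource"))  # True
print(acs_url_matches("https://example/resource", "https://example/other"))  # False
```

Default-port normalisation (treating an explicit `:443` as equal to no port for `https`) is deliberately left out to keep the sketch short.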
The location is locally scoped, sure. But we live in 2025+, not in the
pre-2010 days when the only things happening were on the web or in the
file system. (Stupid remark, I know. ''I am sorry''.)
And as this software handles all of that internally, it works without
anything being set up in (m)DNS, the host configuration or elsewhere.
It is just another running application, as if a normal user had installed
something from a software store (which might just as well run as
"https://localhost"... but in this case, it simply doesn't).
== Why it matters, even in standard configurations
Again, RFC 6762 specifies mDNS, a standard that (as part of the whole
zeroconf ecosystem) is gaining more and more relevance in everyday IT
usage.
It is natively supported on macOS (take Apple's Bonjour as an example
mDNS implementation) and ships by default in some Linux distributions.
file:/// is a URL. It might not be a "WebURL" or a "DNSURL", but it is a
**URL**.
Developers expecting Django to handle URLs might suffer headaches,
because... it simply "does not".
And they might be completely caught off guard, because this Django
URLField is so non-URL-y.
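As a small standard-library illustration: `urllib.parse.urlsplit` parses a `file:///` URL (the file scheme is specified in RFC 8089) into a scheme and a path with an empty authority. Since there is no host component at all, any rule that requires a hostname cannot accept it, even though the value is syntactically a valid URL.

```python
from urllib.parse import urlsplit

# A file URL has a scheme and a path, but an empty authority:
# there is no host component at all for a hostname rule to accept.
parts = urlsplit("file:///etc/hosts")
print(parts.scheme, repr(parts.netloc), parts.hostname, parts.path)
# file '' None /etc/hosts
```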
Naming things wrongly, and thereby creating a misconception of what they
do, hinders us and others in simply using a well-organized software
infrastructure.
Which, I would say, is the core goal of open source.
Which brings me to why this is a **bug**, **not a feature**.
Software that so obviously does that is **bug**gy.
I came here to fix that, or at least to make sure I did my part as well
as I can.
== Possible Proposition
This is how I would do it. I understand if something like
{{{URLField(mode_feature='raw')}}} is what you are leaning towards.
I simply would not do it that way, as I see this as an evolutionary
adaptation and correction, not just a feature.
1. Deprecation of URLField (as it never was what it was called)
2. Creation of
 - {{{RawURLField + RawURLValidator}}}
 - {{{WebURLField + WebURLValidator // CustomURLField/Validator}}}
   - subclassing Raw*,
   - adding the lazy scheme, other restrictions and defaults for
   backwards compatibility
(even though I would prefer something like
{{{URL_Web__Field/URL_Raw__Field}}}, or even better
{{{Field__URL_Web}}}/.., but that is totally irrelevant in a Django
context...)
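A rough, framework-free sketch of the proposed split. `RawURLValidator` and `WebURLValidator` are hypothetical names from this proposal and do not exist in Django, and `ValidationError` is a local stand-in rather than Django's exception. The raw check assumes nothing beyond a syntactically valid RFC 3986 scheme; the web subclass layers a scheme whitelist and a simplified "dotted domain or localhost" host rule on top.

```python
import re
from urllib.parse import urlsplit


class ValidationError(ValueError):
    """Local stand-in for django.core.exceptions.ValidationError."""


class RawURLValidator:
    """Hypothetical: require nothing beyond a syntactically valid scheme."""

    # RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    scheme_re = re.compile(r"^[a-z][a-z0-9+.-]*$", re.IGNORECASE)

    def __call__(self, value):
        parts = urlsplit(value)
        if not self.scheme_re.match(parts.scheme):
            raise ValidationError(f"{value!r} has no valid scheme")
        return parts


class WebURLValidator(RawURLValidator):
    """Hypothetical: web-oriented restrictions on top of the raw check."""

    schemes = ("http", "https", "ftp", "ftps")
    # Simplified stand-in for a "dotted domain or localhost" host rule.
    host_re = re.compile(
        r"^(?:localhost|(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63})$",
        re.IGNORECASE,
    )

    def __call__(self, value):
        parts = super().__call__(value)
        if parts.scheme not in self.schemes:
            raise ValidationError(f"scheme {parts.scheme!r} is not allowed")
        if not self.host_re.match(parts.hostname or ""):
            raise ValidationError(f"host {parts.hostname!r} is not allowed")
        return parts


for url in ("https://example.com/", "https://printer/", "file:///etc/hosts"):
    for validator in (RawURLValidator(), WebURLValidator()):
        try:
            validator(url)
            verdict = "accepted"
        except ValidationError as exc:
            verdict = f"rejected: {exc}"
        print(f"{type(validator).__name__:16} {url:22} {verdict}")
```

Here `RawURLValidator` accepts all three sample values, while `WebURLValidator` accepts only `https://example.com/`. Keeping the stricter rules in a subclass makes the restriction an explicit choice instead of something implied by a generic name.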
Benefits:
- it is very clear (at least clearer) what one gets
- one has to make a conscious decision about one's own use case
- it would be a deliberate change in development, instead of yet another
option on URLField/URLValidator that is probably ignored more often than
not
This way, sure, there are some hurdles when updating a project's Django
dependency.
But that is to be expected in a changing environment, or when an
implementation simply was not spot on.
At least everybody can consciously decide which one to go with.
And I do see why you would want to involve the forum on this topic, but I
simply don't.
I see a bug, I report it, I put in effort so that people can see and
understand it, and I offer my help fixing it.
But I don't "do people". Sorry, not sorry. And most certainly no offense (:
I have to look in the mirror from time to time too, so I'm no different
than the rest of us.
But I don't lobby or discuss options, especially not with the Django
community.
I tried that back then and was shut down, much like this ticket almost
was, only to see my suggestion implemented a short while later, because it
was the only thing that made sense....
It does not work "for/with me" if "you" have to consult your community on
"how you" want to approach this. I'm totally OK with that.
I just brought a bug to your attention and offered my help.
As this ticket remains closed, so does my involvement. (: Not unhappy;
it's just that my role no longer seems to be needed.
== Additional remark about this FieldTest || Test structuring suboptimal || Also requires some fixing IMHO
My initial misunderstanding of the failing test came from the following
situation:
1. I changed host_re to allow more versatile "localhost" variants
(roughly "localhost" <=> [a-z0-9-]{2,63}).
2. Besides other, understandably failing tests, one test failed because
{{{value='foo'}}} now passed, which broke the existing test.
3. This happened because the field's lazy-scheme feature kicked in and let
'foo' pass, lazily interpreting it as "https://foo".
=> this is an **ISSUE**!
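To make step 3 concrete, here is a simplified sketch of the lazy-scheme normalisation, loosely modelled on what Django's `forms.URLField.to_python` does when no scheme is given (the real code differs in detail). `lazy_scheme` and `SINGLE_LABEL` are hypothetical: `SINGLE_LABEL` stands in for the widened host rule from step 1, and `assume_scheme="https"` is assumed here to match the "https://foo" from step 3.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical widened host rule from step 1: any single 2-63 char label.
SINGLE_LABEL = re.compile(r"^[a-z0-9-]{2,63}$", re.IGNORECASE)


def lazy_scheme(value, assume_scheme="https"):
    """Simplified sketch of a form field's 'assume a scheme' normalisation."""
    scheme, netloc, path, query, fragment = urlsplit(value)
    if not scheme:
        scheme = assume_scheme
    if not netloc:
        # A bare value such as "foo" is parsed as a path; treat it as the host.
        netloc, path = path, ""
    return urlunsplit((scheme, netloc, path, query, fragment))


normalised = lazy_scheme("foo")
print(normalised)  # https://foo
print(bool(SINGLE_LABEL.match(urlsplit(normalised).hostname)))  # True
```

So 'foo' never reaches the host check as a bare word: the field has already turned it into a complete URL, which is why a change to the host rule surfaces as a failing field test.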
I never touched anything near the lazy-scheme feature, yet tests
influenced by it failed.
Usually, I would expect URLValidator to be tested rigorously on its own,
so that URLField's additional features can be tested cleanly and in a
focused way.
What I propose:
URLField validation should be tested for agreement with URLValidator,
except where the lazy-scheme feature is concerned, since that is where
URLField allows less strict input.
Had it been handled this way, the FieldTest case from point 2 "would not
have failed": either URLValidator would have rejected the value, just like
URLField without the lazy scheme, or, the other way round, if the value
was passed with a scheme, URLValidator would have accepted it, just like
URLField.
The lazy-scheme feature would then be tested on the field (where it
originates), in a way that does not produce false negatives as it
"somehow did" in this situation (because URLValidator was being tested
"VIA" URLField).
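The proposed test structure could look roughly like this `unittest` sketch. `HOST_RE`, `validate_url`, `normalise` and `clean_field` are hypothetical stand-ins, not Django's URLValidator or URLField. The point is the split: the validator is tested on its own, the field is checked for agreement with the validator on URLs that already carry a scheme, and the lazy-scheme tests derive their expectations from the validator instead of hardcoding them.

```python
import re
import unittest
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-ins, not Django's URLValidator/URLField.
HOST_RE = re.compile(
    r"^(?:localhost|(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63})$",
    re.IGNORECASE,
)


def validate_url(value):
    """Validator: checks a fully specified URL, no lazy scheme involved."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(
        HOST_RE.match(parts.hostname or "")
    )


def normalise(value, assume_scheme="https"):
    """Field-only feature: prepend a scheme when none is given."""
    scheme, netloc, path, query, fragment = urlsplit(value)
    if not scheme:
        scheme = assume_scheme
        if not netloc:
            netloc, path = path, ""
    return urlunsplit((scheme, netloc, path, query, fragment))


def clean_field(value):
    """Field: normalise first, then defer entirely to the validator."""
    cleaned = normalise(value)
    return cleaned if validate_url(cleaned) else None


class ValidatorTests(unittest.TestCase):
    """The validator is tested rigorously and on its own."""

    EXPECTED = {
        "https://example.com/": True,
        "https://localhost/": True,
        "https://printer/": False,
        "ftp://example.com/": False,
    }

    def test_expected_results(self):
        for url, ok in self.EXPECTED.items():
            with self.subTest(url=url):
                self.assertEqual(validate_url(url), ok)


class FieldAgreesWithValidatorTests(unittest.TestCase):
    """With an explicit scheme, the field must agree with the validator."""

    def test_agreement(self):
        for url in ValidatorTests.EXPECTED:
            with self.subTest(url=url):
                self.assertEqual(clean_field(url) is not None, validate_url(url))


class LazySchemeTests(unittest.TestCase):
    """Only the field's own feature: the assumed scheme."""

    def test_missing_scheme_is_assumed(self):
        self.assertEqual(normalise("example.com"), "https://example.com")
        self.assertEqual(normalise("foo"), "https://foo")

    def test_field_defers_to_validator_after_normalising(self):
        for raw in ("example.com", "foo"):
            with self.subTest(raw=raw):
                cleaned = normalise(raw)
                expected = cleaned if validate_url(cleaned) else None
                self.assertEqual(clean_field(raw), expected)
```

Run it with `python -m unittest`. With this split, widening the host rule changes the expectations in `ValidatorTests` only; the field tests keep passing because they ask the validator rather than hardcoding whether 'foo' is valid.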
--
Ticket URL: <https://code.djangoproject.com/ticket/36131#comment:13>