Invalid URLs passing validation by URLValidator

Tim Bell

Jun 21, 2018, 8:54:02 p.m.
to Django users
Hi,

I've come across some strings which I think aren't valid URLs but which nevertheless pass validation by django.core.validators.URLValidator in Django 2.0.6 and 1.11.13. I know URL validation is very tricky, but it seemed to me that these should obviously fail.

http://#FOO#/b...@example.com

I believe that this is passing validation because "#FOO#/bar" is being treated as a username, with "example.com" as the hostname. However, "#FOO#/bar" shouldn't be valid as a username because the "#" and "/" characters aren't percent-encoded.


Similarly, I think this passes validation not because "FOO" is being treated as a valid hostname, but because "FOO/bar" is considered a username, even though "/" isn't percent-encoded.
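
A minimal sketch of how this can happen, assuming a validator whose optional userinfo pattern accepts any run of non-whitespace characters before the "@". The patterns below are simplified stand-ins written for illustration, not copied from Django's source, and the two test strings are rebuilt from the username and hostname named above, so treat them as assumptions too. The strict variant rejects only raw ":", "@" and "/" in the userinfo; whether "#" should also be rejected is a separate question.

```python
import re
from urllib.parse import quote

# Simplified, hypothetical patterns for illustration only (not Django's).
HOST = r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"  # requires at least one dot
PERMISSIVE_USERINFO = r"(?:\S+(?::\S*)?@)?"  # any non-whitespace before "@"
STRICT_USERINFO = r"(?:[^\s:@/]+(?::[^\s:@/]*)?@)?"  # no raw ":", "@" or "/"

permissive = re.compile(r"^http://" + PERMISSIVE_USERINFO + HOST + r"$", re.IGNORECASE)
strict = re.compile(r"^http://" + STRICT_USERINFO + HOST + r"$", re.IGNORECASE)

# Assumed reconstructions of the two strings discussed in this post.
candidates = ["http://#FOO#/bar@example.com", "http://FOO/bar@example.com"]

results = {
    url: (bool(permissive.match(url)), bool(strict.match(url)))
    for url in candidates
}
for url, (loose, tight) in results.items():
    print(f"{url}: permissive={loose} strict={tight}")

# Percent-encoding the username gives a string the strict pattern accepts.
encoded_url = "http://" + quote("#FOO#/bar", safe="") + "@example.com"
print(encoded_url, bool(strict.match(encoded_url)))
```

With these stand-in patterns, both strings pass the permissive check only because everything before the "@" is swallowed as userinfo; neither "#FOO#" nor "FOO" would pass as a hostname on its own.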

Should this be considered a bug? (Sure, it's pretty obscure, but this has actually come up in my particular use case.)

Thanks,

Tim Bell

Jason

Jun 22, 2018, 6:53:54 a.m.
to Django users
Interesting find. The only time I've used that kind of URL convention is when connecting to Redis with the Python redis library. It also fits DB URL connection strings.

What's the actual use case for the URL schema?

You could also report this to the https://groups.google.com/forum/#!forum/django-developers group, which is the core framework dev group, or report it as a bug on the Django bug tracker.

Melvyn Sopacua

Jun 22, 2018, 7:51:01 a.m.
to django...@googlegroups.com

On Friday, 22 June 2018 02:50:08 CEST Tim Bell wrote:

> http://#FOO#/b...@example.com
>
> I believe that this is passing validation because "#FOO#/bar" is being
> treated as a username, with "example.com" as the hostname. However,
> "#FOO#/bar" shouldn't be valid as a username because the "#" and "/"
> characters aren't percent-encoded.

You are right about the slash, but not about the pound sign:

"The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded." - RFC 1738, section 3.1.

This is because the pound sign doesn't have a special meaning until after the hostname. However, officially, HTTP URLs do not allow for a username and password, as outlined in section 3.3:

 

An HTTP URL takes the form:


      http://<host>:<port>/<path>?<searchpart>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80.  No user name or password is
allowed.

 

So then, the parsing becomes:

scheme = http
host = foo
path = /b...@example.com/
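
For comparison, Python's standard urllib.parse.urlsplit reads a string of this shape the same way. A small sketch, using a hypothetical URL of the same form, since the original is truncated above:

```python
from urllib.parse import urlsplit

# Hypothetical URL of the same shape as the one discussed (an assumption).
parts = urlsplit("http://FOO/bar@example.com")

print(parts.scheme)    # http
print(parts.hostname)  # foo (urlsplit lowercases the hostname)
print(parts.path)      # /bar@example.com
print(parts.username)  # None: the "@" follows the first "/", so it is path
```

The authority component ends at the first "/", so the "@" never gets a chance to act as a userinfo delimiter.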

 

Which also brings us to the reserved character portion (section 2.2):

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded.  The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

 

Which means that in the http scheme, "@" is not reserved and as such does not have to be encoded.

 

That said, Django still validates the ftp variant as being correct, so the bug is still there. Nice catch!

--

Melvyn Sopacua

Tim Bell

Jun 27, 2018, 6:59:23 p.m.
to Django users

Just picking up on a few points...

On Friday, 22 June 2018 21:51:01 UTC+10, Melvyn Sopacua wrote:

  

> However, officially, HTTP URLs do not allow for a username and password, as outlined in section 3.3:
>
> An HTTP URL takes the form:
>
>       http://<host>:<port>/<path>?<searchpart>
>
> where <host> and <port> are as described in Section 3.1. If :<port>
> is omitted, the port defaults to 80.  No user name or password is
> allowed.

Except that Django has already decided to accept them: https://code.djangoproject.com/ticket/20003

 

> So then, the parsing becomes:
>
> scheme = http
> host = foo
> path = /b...@example.com/


But "foo" is not a valid host, as it's not fully-qualified. (That's how the validator treats it, anyway.)

> That said, Django still validates the ftp variant as being correct, so the bug is still there. Nice catch!


I've filed a bug (and created a pull request): https://code.djangoproject.com/ticket/29528

Cheers,

Tim Bell

Tim Bell

Jun 27, 2018, 7:02:45 p.m.
to Django users


On Friday, 22 June 2018 20:53:54 UTC+10, Jason wrote:
> Interesting find. The only time I've used that kind of URL convention is when connecting to Redis with the Python redis library. It also fits DB URL connection strings.
>
> What's the actual use case for the URL schema?

The use case is that I've extracted URLs like that from spam emails. (I work for an agency that regulates spam.) Clearly the software creating the emails had a bug that resulted in the invalid URLs, but nevertheless, I don't want the invalid URLs to break my system, so I check they're valid before further processing.
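
A rough sketch of that kind of pre-check using only the standard library. The helper name looks_processable, the allowed scheme set, and the sample strings are all hypothetical; this is a cheap structural filter under those assumptions, not a replacement for URLValidator and not what my system actually runs.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumed set; adjust as needed


def looks_processable(url: str) -> bool:
    """Cheap structural pre-check; not a full URL validator."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed bracketed IPv6 literal
        return False
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    # Treat hosts without a dot as not fully qualified.
    return "." in host


# Hypothetical sample input, shaped like the strings in this thread.
extracted = [
    "http://#FOO#/bar@example.com",
    "http://FOO/bar@example.com",
    "http://example.com/path",
]
usable = [u for u in extracted if looks_processable(u)]
print(usable)
```

Here the first string is dropped because urlsplit finds no host before the "#", and the second because "foo" has no dot.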

 

> You could also report this to the https://groups.google.com/forum/#!forum/django-developers group, which is the core framework dev group, or report it as a bug on the Django bug tracker.

Reported as a bug (with a fix): https://code.djangoproject.com/ticket/29528

Cheers,

Tim Bell