[Django] #32050: reverse() and request.get_full_path() escape URL paths differently

7 views
Skip to first unread message

Django

unread,
Sep 28, 2020, 11:43:16 AM9/28/20
to django-...@googlegroups.com
#32050: reverse() and request.get_full_path() escape URL paths differently
-------------------------------------+-------------------------------------
Reporter: Jack | Owner: nobody
Cushman |
Type: Bug | Status: new
Component: Core | Version: master
(URLs) | Keywords: reverse,
Severity: Normal | get_full_path, escape
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
`reverse()` and `request.get_full_path()` apply different escaping rules
to the path section of URLs. `reverse()` applies `iri_to_uri` and escapes
everything except `/#%[]=:;$&()+,!?*@'~`, while `get_full_path` applies
`escape_uri_path` and escapes everything except `/:@&+$,-_.!~*'()`. (In
other words, get_full_path() escapes some characters like semicolon that
reverse() does not.) I'm not sure which rules are correct, but it seems
like the two should escape URL paths the same way. In particular, maybe
reverse() ought to apply escape_uri_path to its path section?

Here's a one-file app that demonstrates the mismatch. Visiting
localhost:8000/1/wrongtitle will successfully redirect to the correct
title, but visiting localhost:8000/2/wrongtitle will start an infinite
loop printing `/2/Some%20semicolon;.pdf does not match
/2/Some%20semicolon%3B.pdf!` because the title contains a semicolon, which
is escaped by get_full_path() but not by reverse().

(I realize there are other ways to implement this demo program, but it
highlights the underlying issue that the paths produced by the two
functions should be escaped the same way, whichever way is correct.)

{{{
import sys

from django.conf import settings
from django.core.management import execute_from_command_line
from django.core.wsgi import get_wsgi_application
from django.http import HttpResponse, HttpResponseRedirect
from django.urls import path, reverse

settings.configure(ALLOWED_HOSTS=["*"], ROOT_URLCONF=__name__,
SECRET_KEY="foo", DEBUG=True)

files = {1: 'Some title.pdf', 2: 'Some semicolon;.pdf'}

def deliver_file(request, file_id, title):
correct_title = files[file_id]
correct_url = reverse('deliver_file', args=[file_id, correct_title])
if correct_url != request.get_full_path():
print(f"{correct_url} does not match {request.get_full_path()}!")
return HttpResponseRedirect(correct_url)
return HttpResponse(correct_title)


urlpatterns = [
path("<int:file_id>/<str:title>", deliver_file, name='deliver_file'),
]

app = get_wsgi_application()
execute_from_command_line(sys.argv)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32050>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 30, 2020, 5:45:37 AM9/30/20
to django-...@googlegroups.com
#32050: reverse() and request.get_full_path() escape URL paths differently
-------------------------------------+-------------------------------------
Reporter: Jack Cushman | Owner: nobody
Type: Bug | Status: closed
Component: Core (URLs) | Version: master
Severity: Normal | Resolution: duplicate
Keywords: reverse, | Triage Stage:
get_full_path, escape | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => duplicate


Comment:

Thanks for the report.

This is delicate, at least. Adjusting the behaviour here leads to various
test failures, so to that extent, current behaviour is at least expected.

Immediately, you can escape your URL parameters in the call to `reverse()`
to avoid the issue:

In your script:

{{{


correct_url = reverse('deliver_file', args=[file_id,

escape_uri_path(correct_title)])
}}}

I **think** that this is going to be the required way to go.

See particularly the discussion on #18456 — the difference in behaviour
between `get_full_path()` and `uri_to_iri()` is known, and it's not clear
that `get_full_path()` **can** be implemented to handle all cases.

See also #11522, #13260, and #22223, which touch on this.

I'm going to close this as a duplicate of #18456. If you want to
investigate fully and approach the DevelopersMailingList, if in conclusion
you think there's a way to improve the behaviour all things considered
then, that would be welcome if you're able to summarize the difficulties.

--
Ticket URL: <https://code.djangoproject.com/ticket/32050#comment:1>

Reply all
Reply to author
Forward
0 new messages