Potentially inconsistent behavior between test client and normal WSGI requests

Matt Hooks

Apr 24, 2015, 4:51:12 PM

to django-d...@googlegroups.com

Hi all,

As I was fixing an issue in our API related to URL encodings, I noticed a problem that should have been caught by a test that was somehow passing. (Remember, make sure your test can fail!)

If you had some path /some/path/Spam%20Ham, and a URL pattern to capture /some/path/(?P<foo>.+)$, it's not unreasonable to think your named capture would pick up "Spam Ham" (with an actual space) and send that to your view. And indeed it does exactly that when you make that request through the Django test client.

This is because there's an explicit call to unquote in django.test.client.RequestFactory before proceeding to build a WSGIRequest.

(The behavior of the development server is similar to the test client, although I haven't investigated for what exact reason.)

But when an actual WSGI server handles the same request, WSGIHandler doesn't make that call to unquote; it passes the path it was given straight through to WSGIRequest. This leads to a scenario where, in the above example, views will see "Spam Ham" as the value of foo in unit tests, but will see "Spam%20Ham" when run in production.
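To make the difference concrete, here is a minimal sketch using only the standard library rather than Django itself. It assumes the pattern is anchored at the start of the path, and the variable names are made up for illustration:

```python
import re
from urllib.parse import unquote

# The URL pattern from the example above, anchored at the start for illustration.
pattern = re.compile(r"^/some/path/(?P<foo>.+)$")

raw_path = "/some/path/Spam%20Ham"

# What the view would receive if the path arrives still percent-encoded.
quoted_foo = pattern.match(raw_path).group("foo")

# What the view would receive if the path is unquoted before URL resolution.
unquoted_foo = pattern.match(unquote(raw_path)).group("foo")

print(quoted_foo)    # Spam%20Ham
print(unquoted_foo)  # Spam Ham
```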

This strikes me as bug-worthy. I don't have particularly strong feelings on which is the correct behavior, but both behaviors should be the same.

I'd be willing to take on the patch once I get some input from others.

Thoughts?

Florian Apolloner

Apr 24, 2015, 5:07:59 PM

to django-d...@googlegroups.com

On Friday, April 24, 2015 at 10:51:12 PM UTC+2, Matt Hooks wrote:

(The behavior of the development server is similar to the test client, although I haven't investigated for what exact reason.)

To be honest, even on my production machines I have this behavior -- so the question is which "production" server you are using and which versions of $stuff.

Cheers,
Florian

Matt Hooks

Apr 25, 2015, 2:46:58 AM

to django-d...@googlegroups.com

Took me a few minutes to narrow down the right environment to recreate this.

I'm seeing this "issue" with Gunicorn (latest version, 19.3.0), and only when using the gaiohttp async worker (again with the latest version of aiohttp, 0.15.3). I've tracked it down to the troublesome line in aiohttp, but I imagine that isn't terribly relevant here.

I'm not really familiar with the design ideals of the Django devs or the WSGI spec. From what I can tell, the spec doesn't specify whether the URL should be unquoted before passing it to the application. This leaves us with the possibility of behavior changing when moving among different WSGI servers. While any decent developer should know there will be differences when they reconfigure their stack, it might make sense to ensure consistency for this particular detail. The only concern would be backwards compatibility, but from what I can tell, most of the gunicorn worker types already unquote the path**, along with uwsgi. I imagine the majority of people are using one of those, so I doubt any significant number are relying on receiving the still-encoded path, but I don't have much to back that up with. I haven't checked any other implementations, like mod_wsgi.

I suppose this boils down to whether or not Django should be normalizing the url path (PATH_INFO) from the WSGI server, or should just go with whatever it is provided.
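As a rough sketch of what that normalization could look like outside of Django, here is a small WSGI middleware. The name UnquotePathInfo is hypothetical and not part of Django or any server. It assumes the server hands over a still-encoded PATH_INFO; behind a server that already unquotes, it would decode a second time (turning a literal "%2520" into a space), so it only makes sense where the server is known to skip this step:

```python
from urllib.parse import unquote


class UnquotePathInfo:
    """Hypothetical WSGI middleware that percent-decodes PATH_INFO.

    Assumes the wrapped server passes PATH_INFO still percent-encoded.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # WSGI strings carry bytes as ISO-8859-1 code points, so decode the
        # escapes with that encoding instead of assuming UTF-8 here.
        environ["PATH_INFO"] = unquote(path, encoding="iso-8859-1")
        return self.app(environ, start_response)
```

Wrapping would then be a matter of something like application = UnquotePathInfo(application) in the WSGI entry point.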

(** All the ones I checked, specifically sync, eventlet and tornado. The gevent ones don't play nice with my Python 3 install.)

Florian Apolloner

Apr 25, 2015, 3:02:50 AM

to django-d...@googlegroups.com

On Saturday, April 25, 2015 at 8:46:58 AM UTC+2, Matt Hooks wrote:

I'm not really familiar with the design ideals of the Django devs or the WSGI spec. From what I can tell, the spec doesn't specify whether the url should be unquoted before passing it to the application.

I think the WSGI spec somewhat follows the CGI spec which says:
Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot contain path-segment parameters.

So to my understanding this seems to be a bug in whatever you use, no?

Cheers,
Florian

Matt Hooks

Apr 26, 2015, 11:50:57 PM

to django-d...@googlegroups.com

I've checked against the wsgiref module and you are most certainly correct: unquoting PATH_INFO is the job of the WSGI server.
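For anyone who wants to reproduce the check, a minimal sketch along these lines should do it. It serves a single request with wsgiref's simple server on a local port and records the PATH_INFO the application receives; the app function and the seen dict are just illustrative names:

```python
import http.client
import threading
from wsgiref.simple_server import make_server

seen = {}


def app(environ, start_response):
    # Record the path exactly as the server handed it to the application.
    seen["PATH_INFO"] = environ["PATH_INFO"]
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", "2")])
    return [b"ok"]


# Port 0 lets the OS pick a free port; serve exactly one request in a thread.
server = make_server("127.0.0.1", 0, app)
port = server.server_address[1]
worker = threading.Thread(target=server.handle_request)
worker.start()

conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/some/path/Spam%20Ham")
conn.getresponse().read()
conn.close()

worker.join()
server.server_close()

print(seen["PATH_INFO"])  # /some/path/Spam Ham
```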

I'll bring up the issue with the aiohttp folks. Thanks for your time.
