And the Rack spec should explicitly specify this behaviour, as the CGI
spec does:
http://hoohoo.ncsa.uiuc.edu/cgi/env.html
Cheers,
Sam
diff --git a/lib/rack/lint.rb b/lib/rack/lint.rb
index 7eb0543..66d252b 100644
--- a/lib/rack/lint.rb
+++ b/lib/rack/lint.rb
@@ -88,7 +88,9 @@ module Rack
## within the application. This may be an
## empty string, if the request URL targets
## the application root and does not have a
- ## trailing slash.
+ ## trailing slash. This information should be
+ ## decoded by the server if it comes from a
+ ## URL.
>> The same is not true for Thin and Mongrel so I think it's a bug. I
>> fixed the issue in a fork of the rack repository:
>> http://github.com/bahuvrihi/rack/commit/f88976c314dbab84a001610996e5f69f4dad25eb
>
> And the Rack spec should explicitly specify this behaviour, as the CGI
> spec does:
Both applied, thanks.
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
I think the implementation and spec are at odds now, no? The spec says
PATH_INFO should be decoded but the handlers all leave PATH_INFO
encoded. Am I reading this wrong?
Thanks,
Ryan
> Rack can specify whatever behaviour it likes, but the problem if we
> say that handlers should *not* decode PATH_INFO is that in some cases
> it may have already been done (e.g. when Rack is running as a CGI).
When would it be useful to have it not decoded?
--
stadik.net
The Rack spec adopts the definitions of CGI in terms of what it passes
in the request.
PATH_INFO comes from the CGI spec, and it says it should be decoded.
It also says a server MAY reject a request as invalid that has URL
encoded '/' characters, because (as you point out), it causes loss of
information.
The server MAY
impose restrictions and limitations on what values it permits for
PATH_INFO, and MAY reject the request with an error if it encounters
any values considered objectionable. That MAY include any requests
that would result in an encoded "/" being decoded into PATH_INFO, as
this might represent a loss of information to the script.
- http://www.ietf.org/rfc/rfc3875.txt, section 4.1.5
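To make the "loss of information" concrete, here is a minimal Ruby sketch. The helper name percent_decode and the example paths are made up for illustration; they are not part of Rack or of any handler.

```ruby
# Hypothetical helper: percent-decode a path. '+' is left alone because
# this is a path, not form data.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { [$1].pack('H2') }.force_encoding(Encoding::UTF_8)
end

one_component  = '/ics/a%2Fb' # a single component containing an encoded "/"
two_components = '/ics/a/b'   # two separate components

# After decoding, the script can no longer tell the two requests apart.
puts percent_decode(one_component)
puts percent_decode(two_components)
puts percent_decode(one_component) == percent_decode(two_components)
```

Both requests end up with the same PATH_INFO, which is why the RFC allows a server to reject the first one.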
Maybe PATH_INFO should obey the CGI spec, but there should be a
rack-specific env variable ("rack.path_info") that doesn't
url-decode the path?
Might be worth looking at wsapi to see what they do, probably a wealth
of experience there.
Cheers,
Sam
Except there's this: "/foo%252Fbar".
Thanks,
Ryan
> Except there's this: "/foo%252Fbar".
Escaping is hell on earth.
Probably, though it might depend on why the middleware is modifying
the env. I'd think doing so would generally be a bad idea. There is a
lot of redundancy in the env as passed by Apache, anyway. Middleware
doesn't have a good chance of meaningfully rewriting it all.
> So the only characters which cannot appear unencoded are / ? # [ ]
And %.
> So actually, it's safe to unencode everything *except* %2F. This could
> be achieved by:
And %25, which used to arrive in the PATH_INFO decoded, so this seems
to be an attempt to make handling / in path components unambiguously
possible, at the expense of making % harder.
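A rough Ruby sketch of why %25 has to stay encoded as well under such a scheme. The helper partial_decode and its keep list are hypothetical, written only to illustrate the point, and are not taken from the proposed patch.

```ruby
# Hypothetical helper: decode every percent-escape except the hex codes
# listed in `keep` (compared case-insensitively).
def partial_decode(path, keep)
  path.b.gsub(/%(\h\h)/) do |escape|
    keep.include?($1.upcase) ? escape : [$1].pack('H2')
  end.force_encoding(Encoding::UTF_8)
end

literal_percent = '/foo%252Fbar' # the text "%2F" inside one component
encoded_slash   = '/foo%2Fbar'   # a "/" inside one component

# Keeping only %2F encoded: both inputs collapse to the same string.
puts partial_decode(literal_percent, ['2F']) == partial_decode(encoded_slash, ['2F'])

# Keeping %25 encoded too: the two inputs stay distinct.
puts partial_decode(literal_percent, %w[2F 25]) == partial_decode(encoded_slash, %w[2F 25])
```

The first comparison is true and the second is false, so unambiguous "/" handling is bought by leaving "%" encoded.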
Also, how would you reconstruct the original URL from such a
"partially encoded" PATH_INFO? This would break:
http://www.python.org/dev/peps/pep-0333/#url-reconstruction
If rack just follows the CGI spec for CGI vars, and tries to present
the original undecoded data elsewhere we have standard conformance and
non-loss of data.
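For reference, a minimal Ruby sketch of the URL reconstruction described in PEP 333, adapted to a Rack env. The helpers path_quote and reconstruct_url are made up for this example (Rack does not ship them under these names), and the set of characters left unquoted is an assumption modelled on Python's quote.

```ruby
# Hypothetical helper: percent-encode everything except unreserved
# characters and "/", roughly like Python's urllib quote().
def path_quote(str)
  str.b.gsub(%r{[^A-Za-z0-9_.\-~/]}) { |byte| format('%%%02X', byte.ord) }
end

# Rebuild the request URL from CGI-style variables, following the steps
# in PEP 333's "URL Reconstruction" section.
def reconstruct_url(env)
  scheme = env['rack.url_scheme']
  url = "#{scheme}://"
  if env['HTTP_HOST']
    url << env['HTTP_HOST']
  else
    url << env['SERVER_NAME']
    default_port = scheme == 'https' ? '443' : '80'
    url << ":#{env['SERVER_PORT']}" unless env['SERVER_PORT'] == default_port
  end
  url << path_quote(env['SCRIPT_NAME'].to_s)
  url << path_quote(env['PATH_INFO'].to_s)
  url << "?#{env['QUERY_STRING']}" unless env['QUERY_STRING'].to_s.empty?
  url
end

env = {
  'rack.url_scheme' => 'http',
  'HTTP_HOST'       => 'example.com',
  'SCRIPT_NAME'     => '',
  'PATH_INFO'       => '/ics/a b', # fully decoded, as the CGI spec says
  'QUERY_STRING'    => 'x=1'
}
puts reconstruct_url(env)
```

This only round-trips when PATH_INFO is fully decoded: a partially encoded value such as "/foo%2Fbar" would be re-quoted to "/foo%252Fbar".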
I totally sympathize with your goal of making the rack spec allow
stuff you can theoretically do with HTTP, but I don't think partially
encoded PATH_INFO will really help.
The app I'm working on relies on URL reconstruction. It also would
benefit very much from being able to use a full URL as a
path-component... but even though HTTP's escaping rules would allow
that, it's pretty clear that its chance of working with actually
deployed code is low.
I wanted to do:
http://example.com/ics/http:%2f%2fsome.site.com%2fcalendars%2fevents.ics/atom
But since I will never (famous last words) have more than a single URL
in my path, anyway, I just dump it after the ? as the query info,
which works fine:
http://example.com/ics/atom?http://some.site.com/calendars/events.ics
And ends up easier to construct, anyway.
This, btw, is how I found that the query info was being injected into
the ARGV... I was getting server 500 errors and rackup complaining
that "http://some.site.com/calendars/events.ics" was not a valid
configuration, because it was ARGV[0], and rackup was trying to open
it as a config file.
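The ARGV behaviour looks consistent with the "indexed query" rule in RFC 3875, section 4.4: for a GET or HEAD request whose query string contains no unencoded "=", the server should split the query on "+", decode each word, and pass the words as command-line arguments. A rough Ruby sketch of that rule follows; the function names are invented for illustration and this is only one reading of the RFC, not what any particular server does.

```ruby
# Hypothetical helper: percent-decode one search word.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { [$1].pack('H2') }.force_encoding(Encoding::UTF_8)
end

# Return the arguments a CGI server would put on the script's command
# line for this request, or [] when the query is not an indexed query.
def indexed_query_args(request_method, query_string)
  return [] unless %w[GET HEAD].include?(request_method)
  return [] if query_string.empty? || query_string.include?('=')

  query_string.split('+').map { |word| percent_decode(word) }
end

p indexed_query_args('GET', 'http://some.site.com/calendars/events.ics')
p indexed_query_args('GET', 'url=http://some.site.com/calendars/events.ics')
```

Under this reading, giving the query a name (so that it contains an "=") keeps it off the command line.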
Sam
> I just came across a practical example of this.
Since most webservers leave it with escapes and we have a patch to fix
webrick to make it escaped as well, I reverted 7a3d21f4b469d5ce; web
frameworks now have to unescape for themselves.
I clarified the SPEC accordingly.
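With PATH_INFO left escaped, a framework can split on "/" first and unescape each component afterwards, so an encoded "/" survives inside its component. A minimal sketch in Ruby; path_segments and percent_decode are hypothetical names used only for this example, not Rack API.

```ruby
# Hypothetical helper: percent-decode one path component.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { [$1].pack('H2') }.force_encoding(Encoding::UTF_8)
end

# Split the still-escaped PATH_INFO into components, then decode each
# one, so "%2F" becomes "/" inside a component rather than a separator.
def path_segments(path_info)
  path_info.split('/').reject(&:empty?).map { |segment| percent_decode(segment) }
end

p path_segments('/ics/http:%2f%2fsome.site.com%2fcalendars%2fevents.ics/atom')
```

For the URL from earlier in the thread this yields three components, with the full calendar URL intact as the middle one.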