Bug: WEBrick handler does not set the unescaped PATH_INFO
Simon Chiang  
From: Simon Chiang <simon.a.chi...@gmail.com>
Date: Sat, 7 Mar 2009 23:15:36 -0800 (PST)
Local: Sun, Mar 8 2009 3:15 am
Subject: Bug: WEBrick handler does not set the unescaped PATH_INFO
For instance with this:

  require 'rack'
  app = lambda {|env| [200, {}, [env['PATH_INFO']]] }
  Rack::Handler::WEBrick.run(app, :Port => 8080)

A request to 'http://localhost:8080/percent%3Aencoding' returns:

  percent:encoding

Rather than:

  percent%3Aencoding

The same is not true for Thin and Mongrel so I think it's a bug.  I
fixed the issue in a fork of the rack repository:
http://github.com/bahuvrihi/rack/commit/f88976c314dbab84a001610996e5f...


 
Sam Roberts
From: Sam Roberts <vieuxt...@gmail.com>
Date: Sun, 8 Mar 2009 10:52:22 -0700
Local: Sun, Mar 8 2009 1:52 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

On Sun, Mar 8, 2009 at 12:15 AM, Simon Chiang <simon.a.chi...@gmail.com> wrote:
> A request to 'http://localhost:8080/percent%3Aencoding' returns:

>  percent:encoding

> Rather than:

>  percent%3Aencoding

> The same is not true for Thin and Mongrel so I think it's a bug.  I
> fixed the issue in a fork of the rack repository:
> http://github.com/bahuvrihi/rack/commit/f88976c314dbab84a001610996e5f...

And the Rack spec should explicitly specify this behaviour, as the CGI
spec does:

http://hoohoo.ncsa.uiuc.edu/cgi/env.html

Cheers,
Sam

diff --git a/lib/rack/lint.rb b/lib/rack/lint.rb
index 7eb0543..66d252b 100644
--- a/lib/rack/lint.rb
+++ b/lib/rack/lint.rb
@@ -88,7 +88,9 @@ module Rack
       ##                      within the application. This may be an
       ##                      empty string, if the request URL targets
       ##                      the application root and does not have a
-      ##                      trailing slash.
+      ##                      trailing slash. This information should be
+      ##                      decoded by the server if it comes from a
+      ##                      URL.


 
Christian Neukirchen
From: Christian Neukirchen <chneukirc...@gmail.com>
Date: Sun, 08 Mar 2009 22:37:44 +0100
Local: Sun, Mar 8 2009 5:37 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

Sam Roberts <vieuxt...@gmail.com> writes:
>> The same is not true for Thin and Mongrel so I think it's a bug.  I
>> fixed the issue in a fork of the rack repository:
>> http://github.com/bahuvrihi/rack/commit/f88976c314dbab84a001610996e5f...

> And the Rack spec should explicitly specify this behaviour, as the CGI
> spec does:

Both applied, thanks.

--
Christian Neukirchen  <chneukirc...@gmail.com>  http://chneukirchen.org


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Tue, 10 Mar 2009 07:23:14 -0700 (PDT)
Local: Tues, Mar 10 2009 10:23 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
In the context of Rack, would it be clearer to say "should be decoded
by the application" rather than "should be decoded by the server"?

 
Ryan Tomayko
From: Ryan Tomayko <r...@tomayko.com>
Date: Tue, 10 Mar 2009 16:01:19 -0700
Local: Tues, Mar 10 2009 7:01 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
On Sun, Mar 8, 2009 at 2:37 PM, Christian Neukirchen <chneukirc...@gmail.com> wrote:

> Sam Roberts <vieuxt...@gmail.com> writes:

>>> The same is not true for Thin and Mongrel so I think it's a bug.  I
>>> fixed the issue in a fork of the rack repository:
>>> http://github.com/bahuvrihi/rack/commit/f88976c314dbab84a001610996e5f...

>> And the Rack spec should explicitly specify this behaviour, as the CGI
>> spec does:

> Both applied, thanks.

I think the implementation and spec are at odds now, no? The spec says
PATH_INFO should be decoded but the handlers all leave PATH_INFO
encoded. Am I reading this wrong?

Thanks,
Ryan


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Wed, 11 Mar 2009 04:52:22 -0700 (PDT)
Local: Wed, Mar 11 2009 7:52 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

> I think the implementation and spec are at odds now, no? The spec says
> PATH_INFO should be decoded but the handlers all leave PATH_INFO
> encoded. Am I reading this wrong?

The current implementation is what makes sense to me. Without it, an
application wouldn't be able to tell the difference between /foo%2Fbar/
and /foo/bar/ (which are semantically different).
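
A minimal sketch of that ambiguity in plain Ruby. The `percent_decode` helper is a hypothetical stand-in for whatever decoding a server or handler might apply; it is not part of Rack, and it only handles ASCII escapes.

```ruby
# Hypothetical helper (not a Rack API): decode every %XX escape
# in an ASCII path string.
def percent_decode(path)
  path.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

encoded_slash = "/foo%2Fbar/"   # one segment whose name contains a slash
plain_slash   = "/foo/bar/"     # two segments, "foo" and "bar"

# After full decoding the two requests can no longer be told apart.
same_after_decoding = percent_decode(encoded_slash) == percent_decode(plain_slash)
puts same_after_decoding
```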

However this may differ from CGI practice. Let me just test this with
an old-skool CGI under apache 2.2.8 (Ubuntu Hardy):

#!/usr/bin/ruby
puts "Content-Type: text/plain"
puts
puts "PATH_INFO = #{ENV['PATH_INFO'].inspect}"

Hmm, strange.
    http://localhost/cgi-bin/test-cgi
    http://localhost/cgi-bin/test-cgi/foo
    http://localhost/cgi-bin/test-cgi/foo/bar
all work as expected. But
    http://localhost/cgi-bin/test-cgi/foo%2Fbar
gives a 404 error!

    http://localhost/cgi-bin/test.cgi/foo%2Abar
does work, and gives a result of
    PATH_INFO = "/foo*bar"

Interestingly, with this test, Firefox updated its URL bar to
.../foo*bar as well. However Apache logs show that the request was
received using %2A, and a 200 response was sent, not a redirect.

So it seems a bit of a mess.

Rack can specify whatever behaviour it likes, but the problem if we
say that handlers should *not* decode PATH_INFO is that in some cases
it may have already been done (e.g. when Rack is running as a CGI).

B.


 
Christian Neukirchen
From: Christian Neukirchen <chneukirc...@gmail.com>
Date: Wed, 11 Mar 2009 13:49:49 +0100
Local: Wed, Mar 11 2009 8:49 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

candlerb <b.cand...@pobox.com> writes:
> Rack can specify whatever behaviour it likes, but the problem if we
> say that handlers should *not* decode PATH_INFO is that in some cases
> it may have already been done (e.g. when Rack is running as a CGI).

When would it be useful to have it not decoded?

--
Christian Neukirchen  <chneukirc...@gmail.com>  http://chneukirchen.org


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Wed, 11 Mar 2009 12:55:03 -0700 (PDT)
Local: Wed, Mar 11 2009 3:55 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
On Mar 11, 12:49 pm, Christian Neukirchen <chneukirc...@gmail.com>
wrote:

> candlerb <b.cand...@pobox.com> writes:
> > Rack can specify whatever behaviour it likes, but the problem if we
> > say that handlers should *not* decode PATH_INFO is that in some cases
> > it may have already been done (e.g. when Rack is running as a CGI).

> When would it be useful to have it not decoded?

/invoices/2009%2F1234/print

From RFC 3986:

  "The purpose of reserved characters is to provide a set of delimiting
   characters that are distinguishable from other data within a URI.
   URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.
   Percent-encoding a reserved character, or decoding a percent-encoded
   octet that corresponds to a reserved character, will change how the
   URI is interpreted by most applications."

Or consider this:

helpers do
  def build_path(*path_components)
    path_components.map { |c| escape(c) }.join("/")
  end

  # If the path has already been decoded, we cannot
  # implement the inverse function accurately:
  def split_path(path)
    path.split("/").map { |c| unescape(c) }
  end
end
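
The round trip can be tried outside a framework. The sketch below is self-contained; its `escape` and `unescape` are hypothetical ASCII-only stand-ins for the helpers the snippet above assumes, not the real framework methods.

```ruby
# Hypothetical stand-ins for the escape/unescape helpers assumed above.
def escape(component)
  component.gsub(/[^A-Za-z0-9\-._~]/) do |ch|
    ch.bytes.map { |b| format("%%%02X", b) }.join
  end
end

def unescape(component)
  component.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

def build_path(*path_components)
  path_components.map { |c| escape(c) }.join("/")
end

def split_path(path)
  path.split("/").map { |c| unescape(c) }
end

components = ["invoices", "2009/1234", "print"]
path = build_path(*components)

# Splitting the still-encoded path recovers the original components.
round_trip = split_path(path)

# If the path was decoded before the application saw it, the slash
# inside "2009/1234" is treated as a separator and the split is wrong.
after_decoding = split_path(unescape(path))

puts path
puts round_trip.inspect
puts after_decoding.inspect
```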

However there are sufficiently many broken HTTP implementations around
that can't parse this properly, that it would be unsurprising if Rack
were similarly broken. So I won't push too hard for it.


 
Scytrin dai Kinthra
From: Scytrin dai Kinthra <scyt...@gmail.com>
Date: Thu, 12 Mar 2009 11:19:44 -0700
Local: Thurs, Mar 12 2009 2:19 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
I'd argue for leaving it encoded here, so that it isn't broken by
default.
Following "standard convention" is nice, but I'd rather follow the standards.

--
stadik.net

 
Sam Roberts
From: Sam Roberts <vieuxt...@gmail.com>
Date: Thu, 12 Mar 2009 21:09:36 -0700
Local: Fri, Mar 13 2009 12:09 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
On Thu, Mar 12, 2009 at 11:19 AM, Scytrin dai Kinthra <scyt...@gmail.com> wrote:

> I'd argue for leaving it encoded here, so that it isn't broken by
> default.
> Following "standard convention" is nice, but I'd rather follow the standards.

The Rack spec adopts the definitions of CGI in terms of what it passes
in the request.

PATH_INFO comes from the CGI spec, and it says it should be decoded.

It also says a server MAY reject a request as invalid that has URL
encoded '/' characters, because (as you point out), it causes loss of
information.

   The server MAY
   impose restrictions and limitations on what values it permits for
   PATH_INFO, and MAY reject the request with an error if it encounters
   any values considered objectionable.  That MAY include any requests
   that would result in an encoded "/" being decoded into PATH_INFO, as
   this might represent a loss of information to the script.

   - http://www.ietf.org/rfc/rfc3875.txt, section 4.1.5

Maybe the PATH_INFO should obey the CGI spec, but there should be a
rack-specific env variable ("rack.path_info") that doesn't
url-decode the path?

Might be worth looking at wsapi to see what they do, probably a wealth
of experience there.

Cheers,
Sam


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Fri, 13 Mar 2009 02:30:32 -0700 (PDT)
Local: Fri, Mar 13 2009 5:30 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

> Maybe the PATH_INFO should obey the CGI spec, but there should be a
> rack-specific env variable ("rack.path_info") that doesn't
> url-decode the path?

In that scenario, any middleware which alters PATH_INFO will also have
to be careful to make corresponding changes to rack.path_info. Another
option would be to have rack.path_info how we want it, and have a CGI
compat middleware which you can stick on the top of the stack just
below the application.

But let's reconsider the CGI definition, given that we are talking
only about the PATH_INFO portion. RFC 3986 allows non-reserved
characters to be unescaped. Also, the definition of path in section
3.3 allows sub-delims and : and @ to appear unencoded.

So the only characters which cannot appear unencoded are / ? # [ ]

The server will already have dealt with ? and # by trimming off the
query string and anchor.

As for [ and ]

  "A host identified by an Internet Protocol literal address, version 6
   [RFC3513] or later, is distinguished by enclosing the IP literal
   within square brackets ("[" and "]").  This is the only place where
   square bracket characters are allowed in the URI syntax."

So actually, it's safe to unencode everything *except* %2F. This could
be achieved by:

  path.split(/%2F/i).map { |p| unencode(p) }.join("%2F")

When I say "safe" here I mean "unambiguous". If you were to use the
PATH_INFO to construct another URI, e.g. when proxying to another
server, you would have to remember that certain characters seen in
plain form in PATH_INFO must actually have been encoded in the
original request and therefore need re-encoding.
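
A runnable sketch of the one-liner above, under stated assumptions: `unencode` is a hypothetical ASCII-only percent-decoder (the name comes from the one-liner, it is not a Ruby or Rack method), and the `-1` limit passed to `split` is an addition so that a trailing %2F is not silently dropped.

```ruby
# Hypothetical decoder standing in for the "unencode" call above.
def unencode(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# Decode every escape except an encoded slash. The -1 limit keeps
# trailing empty pieces, so "/a%2F" does not lose its final %2F.
def decode_except_slash(path)
  path.split(/%2F/i, -1).map { |piece| unencode(piece) }.join("%2F")
end

puts decode_except_slash("/foo%2Fbar%3Abaz")
puts decode_except_slash("/a%2f")
```

Note that a lowercase %2f comes back normalized to %2F, since the pieces are rejoined with the uppercase form.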


 
Ryan Tomayko
From: Ryan Tomayko <r...@tomayko.com>
Date: Fri, 13 Mar 2009 04:38:10 -0700
Local: Fri, Mar 13 2009 7:38 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

Except there's this: "/foo%252Fbar".

Thanks,
Ryan


 
Christian Neukirchen
From: Christian Neukirchen <chneukirc...@gmail.com>
Date: Fri, 13 Mar 2009 14:58:16 +0100
Local: Fri, Mar 13 2009 9:58 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

Ryan Tomayko <r...@tomayko.com> writes:
> Except there's this: "/foo%252Fbar".

Escaping is hell on earth.

--
Christian Neukirchen  <chneukirc...@gmail.com>  http://chneukirchen.org


 
Sam Roberts
From: Sam Roberts <vieuxt...@gmail.com>
Date: Fri, 13 Mar 2009 11:13:43 -0700
Local: Fri, Mar 13 2009 2:13 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

On Fri, Mar 13, 2009 at 2:30 AM, candlerb <b.cand...@pobox.com> wrote:
>> Maybe the PATH_INFO should obey the CGI spec, but there should be a
>> rack-specific env variable ("rack.path_info") that doesn't
>> url-decode the path?

> In that scenario, any middleware which alters PATH_INFO will also have
> to be careful to make corresponding changes to rack.path_info. Another

Probably, though it might depend on why the middleware is modifying
the env. I'd think doing so would generally be a bad idea. There is a
lot of redundancy in the env as passed by Apache, anyway. Middleware
doesn't have a good chance of meaningfully rewriting it all.

> So the only characters which cannot appear unencoded are / ? # [ ]

And %.

> So actually, it's safe to unencode everything *except* %2F. This could
> be achieved by:

And %25, which used to arrive in the PATH_INFO decoded, so this seems
to be an attempt to make handling / in path components unambiguously
possible, at the expense of making % harder.
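
The %25 problem can be shown with a small sketch. It restates the split/map/join one-liner proposed earlier in the thread; `unencode` and `decode_except_slash` are hypothetical names, and the decoder is ASCII-only.

```ruby
# Hypothetical ASCII-only percent-decoder.
def unencode(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# The "decode everything except %2F" proposal.
def decode_except_slash(path)
  path.split(/%2F/i, -1).map { |piece| unencode(piece) }.join("%2F")
end

# A segment containing the literal text "%2F" is sent as %252F.
literal_percent = decode_except_slash("/foo%252Fbar")
# A segment containing a slash is sent as %2F.
encoded_slash   = decode_except_slash("/foo%2Fbar")

# Both collapse to the same partially decoded string, so the
# application cannot tell which request it received.
collision = literal_percent == encoded_slash
puts literal_percent
puts collision
```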

Also, how would you reconstruct the original URL from such a
"partially encoded" PATH_INFO? This would break:

http://www.python.org/dev/peps/pep-0333/#url-reconstruction

If rack just follows the CGI spec for CGI vars, and tries to present
the original undecoded data elsewhere we have standard conformance and
non-loss of data.
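
For comparison, a rough Ruby sketch of URL reconstruction from a Rack-style env hash. It assumes SCRIPT_NAME and PATH_INFO arrive still percent-encoded, so they can be concatenated as-is; if they had been decoded, they would need re-encoding and an original %2F could not be recovered. `reconstruct_url` is a hypothetical helper, not part of Rack.

```ruby
# Hypothetical helper: rebuild the request URL from an env hash,
# assuming SCRIPT_NAME and PATH_INFO are still percent-encoded.
def reconstruct_url(env)
  scheme = env["rack.url_scheme"]

  # Prefer the Host header; otherwise fall back to SERVER_NAME and
  # SERVER_PORT, omitting the port when it is the scheme's default.
  host = env["HTTP_HOST"] || begin
    default_port = scheme == "https" ? "443" : "80"
    port = env["SERVER_PORT"].to_s
    port == default_port ? env["SERVER_NAME"] : "#{env["SERVER_NAME"]}:#{port}"
  end

  url = "#{scheme}://#{host}#{env["SCRIPT_NAME"]}#{env["PATH_INFO"]}"
  query = env["QUERY_STRING"].to_s
  query.empty? ? url : "#{url}?#{query}"
end

env = {
  "rack.url_scheme" => "http",
  "HTTP_HOST"       => "localhost:8080",
  "SCRIPT_NAME"     => "",
  "PATH_INFO"       => "/percent%3Aencoding",
  "QUERY_STRING"    => ""
}
puts reconstruct_url(env)
```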

I totally sympathize with your goal of making the rack spec allow
stuff you can theoretically do with HTTP, but I don't think partially
encoded PATH_INFO will really help.

The app I'm working on relies on URL reconstruction. It also would
benefit very much from being able to use a full URL as a
path-component... but even though HTTP's escaping rules would allow
that, it's pretty clear that its chance of working with actually
deployed code is low.

I wanted to do:

  http://example.com/ics/http:%2f%2fsome.site.com%2fcalendars%2fevents....

But since I will never (famous last words) have more than a single URL
in my path, anyway, I just dump it after the ? as the query info,
which works fine:

  http://example.com/ics/atom?http://some.site.com/calendars/events.ics

And ends up easier to construct, anyway.

This, btw, is how I found that the query info was being injected into
ARGV... I was getting server 500 errors and rackup complaining
that "http://some.site.com/calendars/events.ics" was not a valid
configuration, because it was ARGV[0], and rackup was trying to open
it as a config file.

Sam


 
Magnus Holm
From: Magnus Holm <judo...@gmail.com>
Date: Fri, 13 Mar 2009 19:17:17 +0100
Local: Fri, Mar 13 2009 2:17 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

Indeed. Does anyone know how the frameworks handle this? Do they just unescape
the whole PATH_INFO (Camping does at least) or do they do anything fancier?

//Magnus Holm

On Fri, Mar 13, 2009 at 14:58, Christian Neukirchen
<chneukirc...@gmail.com> wrote:


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Sun, 15 Mar 2009 13:44:28 -0700 (PDT)
Local: Sun, Mar 15 2009 4:44 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
On Mar 13, 6:13 pm, Sam Roberts <vieuxt...@gmail.com> wrote:

> > In that scenario, any middleware which alters PATH_INFO will also have
> > to be careful to make corresponding changes to rack.path_info. Another

> Probably, though it might depend on why the middleware is modifying
> the env. I'd think doing so would generally be a bad idea.

Rack::URLMap is the canonical example.

> > So the only characters which cannot appear unencoded are / ? # [ ]

> And %.

You (and Ryan and Christian) are right of course. It really has to be
one thing or the other.

Aside: since this is Ruby we're talking about, we're not limited to
just strings. For example, PATH_INFO could be defined to be an array
of path components. Probably doesn't make life easier for anyone
though, compared with just having the original path available.

Regards,

Brian.


 
candlerb
From: candlerb <b.cand...@pobox.com>
Date: Fri, 20 Mar 2009 09:37:30 -0700 (PDT)
Local: Fri, Mar 20 2009 12:37 pm
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO
On Mar 11, 12:49 pm, Christian Neukirchen <chneukirc...@gmail.com>
wrote:

> candlerb <b.cand...@pobox.com> writes:
> > Rack can specify whatever behaviour it likes, but the problem if we
> > say that handlers should *not* decode PATH_INFO is that in some cases
> > it may have already been done (e.g. when Rack is running as a CGI).

> When would it be useful to have it not decoded?

I just came across a practical example of this.

Apache Couchdb <http://couchdb.apache.org/> provides a HTTP API. The
first component of the path is the database name. You are allowed to
specify a database name which includes slashes, but they must be
encoded as %2F. e.g.

  http://127.0.0.1:5984/dev%2Fcustomers/...etc

If you do this, then it places the database file on disk under a
subdirectory hierarchy matching the database name, e.g.

 /usr/local/var/lib/couchdb/dev/customers.couch
                            ^^^^^^^^^^^^^
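
A sketch of how an application could pull such a database name out of a still-encoded PATH_INFO: split on "/" first, then decode only the component of interest. `percent_decode` and `database_name` are hypothetical names, and the `some_doc` path tail is made up for illustration.

```ruby
# Hypothetical ASCII-only percent-decoder.
def percent_decode(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# Take the first non-empty path component and decode it afterwards,
# so an encoded slash stays inside the component.
def database_name(path_info)
  first = path_info.split("/").reject(&:empty?).first
  first && percent_decode(first)
end

# With the raw path the full name survives.
puts database_name("/dev%2Fcustomers/some_doc")

# If the path was decoded upstream, only "dev" is left.
puts database_name("/dev/customers/some_doc")
```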


 
Christian Neukirchen
From: Christian Neukirchen <chneukirc...@gmail.com>
Date: Wed, 25 Mar 2009 14:26:49 +0100
Local: Wed, Mar 25 2009 9:26 am
Subject: Re: Bug: WEBrick handler does not set the unescaped PATH_INFO

candlerb <b.cand...@pobox.com> writes:
> I just came across a practical example of this.

Since most webservers leave it with escapes and we have a patch to fix
WEBrick to make it escaped as well, I reverted 7a3d21f4b469d5ce; web
frameworks now have to unescape it themselves.

I clarified the SPEC accordingly.

--
Christian Neukirchen  <chneukirc...@gmail.com>  http://chneukirchen.org


 