Does the Rack SPEC state that PATH_INFO must be cut after ";" (semicolon)?


Iñaki Baz Castillo

Nov 24, 2009, 1:22:12 PM
to Rack Development
Hi, I've reported an issue for Thin HTTP server since it cuts PATH_INFO after
finding a ";":

http://github.com/macournoyer/thin/issues/closed/#issue/6

---------------------------
If Thin receives a request with request line:

GET /level1/level2;user=alice/level3/doc.txt HTTP/1.1

then Thin cuts the URI path and leaves:
env["PATH_INFO"] => "/level1/level2"

Why? I've looked for such behavior in RFC 3986 and RFC 2616 but I don't find a
reason to remove all the path content after ";".
---------------------------
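
For comparison, Ruby's standard URI library keeps the semicolon in the path. The following is only a minimal sketch: it places the request target from the report on a hypothetical host (example.com) so that URI.parse can handle it, and it says nothing about how Thin's parser works internally.

```ruby
require "uri"

# Request target from the report above, on a hypothetical host.
uri = URI.parse("http://example.com/level1/level2;user=alice/level3/doc.txt")

path  = uri.path   # the full path, including ";user=alice/level3/doc.txt"
query = uri.query  # nil, since the target has no "?" part

puts path
```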


A Thin developer closed the bug with the argument:

----------------------------
the env object is constructed according to Rack specs:
http://rack.rubyforge.org/doc/SPEC.html
If you want the full URI look into env["REQUEST_URI"]
----------------------------
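
The REQUEST_URI suggestion can be turned into a stopgap on an affected server. The sketch below is an assumption-laden illustration, not anything provided by Rack or Thin: RestoreSemicolonPath is a hypothetical middleware name, and it assumes the server sets env["REQUEST_URI"] (which the Rack SPEC does not require) and that SCRIPT_NAME is a plain prefix of the request path.

```ruby
# Hypothetical middleware: rebuild PATH_INFO from REQUEST_URI when the
# server has truncated it at ";". Not part of Rack or Thin.
class RestoreSemicolonPath
  def initialize(app)
    @app = app
  end

  def call(env)
    request_uri = env["REQUEST_URI"] # server-specific, not in the Rack SPEC
    # Only handle origin-form targets ("/path?query"); skip "*" and absolute URIs.
    if request_uri && request_uri.start_with?("/")
      full_path   = request_uri.split("?", 2).first
      script_name = env["SCRIPT_NAME"].to_s
      if full_path.start_with?(script_name)
        # Work on a copy so the caller's env hash is left untouched.
        env = env.merge("PATH_INFO" => full_path[script_name.length..-1])
      end
    end
    @app.call(env)
  end
end
```

In a config.ru this would be enabled with `use RestoreSemicolonPath` ahead of the application.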


However, I can't find where the SPEC says that PATH_INFO must be cut before
the first semicolon.

I don't think that behavior is correct: if ";" appears in the path (I mean
before the query), then it's a valid character for the path.

In fact this issue is breaking my application since Rack receives requests
containing ";" in the path like:

GET /app/user%25b@id=%22...@domain.org;nat=yes%22%2d

which after hex-unescaping becomes:

GET /app/user[@id="i...@domain.org;nat=yes"]

Thin (and Mongrel) creates PATH_INFO as follows:

/app/user%25b@id=%22...@domain.org


A workaround is to force the client to escape the semicolon, but this violates
the application specs, as ";" is not required to be encoded in the HTTP URI (it
may be encoded or not, but encoding is not required).


IMHO it's a bug either in the Rack SPEC (even though I can't find the relevant
passage) or in the Thin/Mongrel implementation.

Thanks a lot for any comment.


--
Iñaki Baz Castillo <i...@aliax.net>

Iñaki Baz Castillo

Nov 30, 2009, 11:27:06 AM
to Rack Development
Hi, could I get some reply to this mail please? I strongly think it could be a
bug in Rack so I can't understand why it is ignored.

In fact, I've re-checked the HTTP and URI BNF grammars, and the semicolon ";" is
an allowed character in URI path segments.

Regards.

Hongli Lai

Nov 30, 2009, 6:31:00 PM
to Rack Development
On Nov 30, 5:27 pm, Iñaki Baz Castillo <i...@aliax.net> wrote:
> Hi, could I get some reply to this mail please? I strongly think it could be a
> bug in Rack so I can't understand why it is ignored.
>
> In fact, I've re-checked the HTTP and URI BNF grammars, and the semicolon ";" is
> an allowed character in URI path segments.

As far as I know the Rack spec doesn't mention anything about ";".
Neither do the CGI and WSGI specs for that matter:
http://hoohoo.ncsa.illinois.edu/cgi/env.html
http://www.python.org/dev/peps/pep-0333/

Iñaki Baz Castillo

Nov 30, 2009, 7:42:46 PM
to rack-...@googlegroups.com
2009/12/1 Hongli Lai <hon...@phusion.nl>:
Yes, as I said, I inspected the BNF grammar sections of both the URI and
HTTP specifications, and ";" is an allowed character in a URI segment.
But the fact is that Thin and Mongrel cut PATH_INFO when they find a ";".

Unfortunately, the author of Thin closed the bug:
http://github.com/macournoyer/thin/issues/closed/#issue/6

Scytrin dai Kinthra

Dec 5, 2009, 5:54:03 PM
to rack-...@googlegroups.com
After going through the various handlers and routing middlewares, I am
unaware of any apps that modify the PATH_INFO in the way described.
The split of PATH_INFO at ';' seems to occur before the request hits
rack.
--
stadik.net

Iñaki Baz Castillo

Dec 5, 2009, 6:15:07 PM
to rack-...@googlegroups.com
On Saturday, December 5, 2009, Scytrin dai Kinthra wrote:
> After going through the various handlers and routing middlewares, I am
> unaware of any apps that modify the PATH_INFO in the way described.
> The split of PATH_INFO at ';' seems to occur before the request hits
> rack.

I'm really sorry, I forgot to update this thread. Thin's developer has already
confirmed to me that this is a bug in the Mongrel parser:

http://github.com/macournoyer/thin/issues#issue/7

With WEBrick the issue doesn't happen. It occurs with Mongrel itself and with
Thin, as both use the Mongrel HTTP parser.

Eric Wong

Dec 6, 2009, 9:32:46 PM
to rack-...@googlegroups.com, Marc-Andre Cournoyer
Iñaki Baz Castillo <i...@aliax.net> wrote:
> On Saturday, December 5, 2009, Scytrin dai Kinthra wrote:
> > After going through the various handlers and routing middlewares, I am
> > unaware of any apps that modify the PATH_INFO in the way described.
> > The split of PATH_INFO at ';' seems to occur before the request hits
> > rack.
>
> I'm really sorry, I forgot to update this thread. Thin's developer has already
> confirmed to me that this is a bug in the Mongrel parser:
>
> http://github.com/macournoyer/thin/issues#issue/7
>
> With WEBrick the issue doesn't happen. It occurs with Mongrel itself and with
> Thin, as both use the Mongrel HTTP parser.

Hi,

This should be a one line fix for Thin, I just pushed out
a similar fix for unicorn.git (with tests).

From e8dd3e13b9a9f548a3138debd09e87fbb69e3998 Mon Sep 17 00:00:00 2001
From: Eric Wong <normal...@yhbt.net>
Date: Mon, 7 Dec 2009 02:20:18 +0000
Subject: [PATCH] http: PATH_INFO/REQUEST_PATH includes semi-colons

This is allowed according to RFC 2396, section 3.3 and matches
the behavior of URI.parse, as well.
---
ext/unicorn_http/unicorn_http_common.rl | 2 +-
lib/unicorn/app/old_rails/static.rb | 6 +---
test/unit/test_http_parser_ng.rb | 46 +++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 5d46087..041dfec 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -33,7 +33,7 @@
query = ( uchar | reserved )* %query_string ;
param = ( pchar | "/" )* ;
params = ( param ( ";" param )* ) ;
- rel_path = ( path? %request_path (";" params)? ) ("?" %start_query query)?;
+ rel_path = (path? (";" params)? %request_path) ("?" %start_query query)?;
absolute_path = ( "/"+ rel_path );
path_uri = absolute_path > mark %request_uri;
Absolute_URI = (scheme "://" host_with_port path_uri);
diff --git a/lib/unicorn/app/old_rails/static.rb b/lib/unicorn/app/old_rails/static.rb
index 82f8aa5..13a435e 100644
--- a/lib/unicorn/app/old_rails/static.rb
+++ b/lib/unicorn/app/old_rails/static.rb
@@ -46,11 +46,7 @@ class Unicorn::App::OldRails::Static < Struct.new(:app, :root, :file_server)
end

# then try the cached version:
-
- # grab the semi-colon REST operator used by old versions of Rails
- # this is the reason we didn't just copy the new Rails::Rack::Static
- env[REQUEST_URI] =~ /^#{Regexp.escape(path_info)}(;[^\?]+)/
- path_info << "#$1#{ActionController::Base.page_cache_extension}"
+ path_info << ActionController::Base.page_cache_extension

if File.file?("#{root}/#{::Rack::Utils.unescape(path_info)}")
env[PATH_INFO] = path_info
diff --git a/test/unit/test_http_parser_ng.rb b/test/unit/test_http_parser_ng.rb
index e84c765..bb61e7f 100644
--- a/test/unit/test_http_parser_ng.rb
+++ b/test/unit/test_http_parser_ng.rb
@@ -371,4 +371,50 @@ class HttpParserNgTest < Test::Unit::TestCase
assert ! parser.headers?
end

+ def test_path_info_semicolon
+ qs = "QUERY_STRING"
+ pi = "PATH_INFO"
+ req = {}
+ str = "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n"
+ {
+ "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" },
+ "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" },
+ "/1;a=b" => { qs => "", pi => "/1;a=b" },
+ "/1;a=b?" => { qs => "", pi => "/1;a=b" },
+ "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" },
+ "*" => { qs => "", pi => "" },
+ }.each do |uri,expect|
+ assert_equal req, @parser.headers(req.clear, str % [ uri ])
+ @parser.reset
+ assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch"
+ assert_equal expect[qs], req[qs], "#{qs} mismatch"
+ assert_equal expect[pi], req[pi], "#{pi} mismatch"
+ next if uri == "*"
+ uri = URI.parse("http://example.com#{uri}")
+ assert_equal uri.query.to_s, req[qs], "#{qs} mismatch URI.parse disagrees"
+ assert_equal uri.path, req[pi], "#{pi} mismatch URI.parse disagrees"
+ end
+ end
+
+ def test_path_info_semicolon_absolute
+ qs = "QUERY_STRING"
+ pi = "PATH_INFO"
+ req = {}
+ str = "GET http://example.com%s HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
+ {
+ "/1;a=b?c=d&e=f" => { qs => "c=d&e=f", pi => "/1;a=b" },
+ "/1?c=d&e=f" => { qs => "c=d&e=f", pi => "/1" },
+ "/1;a=b" => { qs => "", pi => "/1;a=b" },
+ "/1;a=b?" => { qs => "", pi => "/1;a=b" },
+ "/1?a=b;c=d&e=f" => { qs => "a=b;c=d&e=f", pi => "/1" },
+ }.each do |uri,expect|
+ assert_equal req, @parser.headers(req.clear, str % [ uri ])
+ @parser.reset
+ assert_equal uri, req["REQUEST_URI"], "REQUEST_URI mismatch"
+ assert_equal "example.com", req["HTTP_HOST"], "Host: mismatch"
+ assert_equal expect[qs], req[qs], "#{qs} mismatch"
+ assert_equal expect[pi], req[pi], "#{pi} mismatch"
+ end
+ end
+
end
--
Eric Wong
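
To see what moving %request_path in the grammar changes, here is a rough Ruby model of the two rel_path rules. This is only a sketch: the regexes OLD_REL_PATH and NEW_REL_PATH and the helper split_target are hypothetical stand-ins written for illustration, far looser than the real Ragel grammar, and they are not part of unicorn.

```ruby
# Simplified stand-ins for the two versions of the rel_path rule.
# Old rule: request_path is captured before the optional ";params" part.
OLD_REL_PATH = /\A(?<request_path>[^;?]*)(?:;[^?]*)?(?:\?(?<query>.*))?\z/
# New rule: request_path is captured after the optional ";params" part.
NEW_REL_PATH = /\A(?<request_path>[^;?]*(?:;[^?]*)?)(?:\?(?<query>.*))?\z/

# Hypothetical helper: returns [request_path, query_string] for a target.
def split_target(pattern, target)
  m = pattern.match(target)
  [m[:request_path], m[:query].to_s]
end

target = "/1;a=b?c=d&e=f"
old_result = split_target(OLD_REL_PATH, target) # path stops at ";"
new_result = split_target(NEW_REL_PATH, target) # path keeps ";a=b"

puts old_result.inspect
puts new_result.inspect
```

Under this model the query string is the same in both versions; only the point where the path capture ends moves.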

Iñaki Baz Castillo

Dec 7, 2009, 4:53:14 AM
to rack-...@googlegroups.com
On Monday, December 7, 2009, Eric Wong wrote:

> This should be a one line fix for Thin, I just pushed out
> a similar fix for unicorn.git (with tests).

> diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
> index 5d46087..041dfec 100644
> --- a/ext/unicorn_http/unicorn_http_common.rl
> +++ b/ext/unicorn_http/unicorn_http_common.rl
> @@ -33,7 +33,7 @@
> query = ( uchar | reserved )* %query_string ;
> param = ( pchar | "/" )* ;
> params = ( param ( ";" param )* ) ;
> - rel_path = ( path? %request_path (";" params)? ) ("?" %start_query query)?;
> + rel_path = (path? (";" params)? %request_path) ("?" %start_query query)?;
> absolute_path = ( "/"+ rel_path );

Thanks, I've reported it to the Mongrel and Thin developers.

masayoshi takahashi

Dec 7, 2009, 6:08:37 AM
to rack-...@googlegroups.com
FYI:
PSGI (Perl Web Server Gateway Interface Specification;
it's the Perl version of Rack or WSGI) has no explicit definition
of a PATH_INFO separator, and Plack (the PSGI reference implementation)
keeps ";" in PATH_INFO.

cf.
http://search.cpan.org/~miyagawa/PSGI-1.03/PSGI.pod
http://github.com/miyagawa/Plack
http://bulknews.typepad.com/blog/2009/10/request_uri-will-be-in-mojo-support.html

Thanks,

Masayoshi Takahashi


2009/12/7 masayoshi takahashi <takah...@gmail.com>:
> Hi,
>
> 2009/12/1 Iñaki Baz Castillo <i...@aliax.net>:
>> Hi, could I get some reply to this mail please? I strongly think it could be a
>> bug in Rack so I can't understand why it is ignored.
>>
>> In fact, I've re-checked the HTTP and URI BNF grammars, and the semicolon ";" is
>> an allowed character in URI path segments.
>
> I've read RFC3986 "Uniform Resource Identifier (URI): Generic Syntax".
>
> * "3.3. Path" in RFC3986:
>
>   Aside from dot-segments in hierarchical paths, a path segment is
>   considered opaque by the generic syntax.  URI producing applications
>   often use the reserved characters allowed in a segment to delimit
>   scheme-specific or dereference-handler-specific subcomponents.  For
>   example, the semicolon (";") and equals ("=") reserved characters are
>   often used to delimit parameters and parameter values applicable to
>   that segment.  The comma (",") reserved character is often used for
>   similar purposes.  For example, one URI producer might use a segment
>   such as "name;v=1.1" to indicate a reference to version 1.1 of
>   "name", whereas another might use a segment such as "name,1.1" to
>   indicate the same.  Parameter types may be defined by scheme-specific
>   semantics, but in most cases the syntax of a parameter is specific to
>   the implementation of the URI's dereferencing algorithm.
>
> * "5.4.  Reference Resolution Examples" and "5.4.1.  Normal Examples"
> in RFC3986:
>
>   Within a representation with a well defined base URI of
>
>      http://a/b/c/d;p?q
>
>   a relative reference is transformed to its target URI as follows.
> (snip)
>      "g:h"           =  "g:h"
>      "g"             =  "http://a/b/c/g"
>      "./g"           =  "http://a/b/c/g"
>      "g/"            =  "http://a/b/c/g/"
>      "/g"            =  "http://a/g"
>      "//g"           =  "http://g"
>      "?y"            =  "http://a/b/c/d;p?y"
>      "g?y"           =  "http://a/b/c/g?y"
>      "#s"            =  "http://a/b/c/d;p?q#s"
>      "g#s"           =  "http://a/b/c/g#s"
>      "g?y#s"         =  "http://a/b/c/g?y#s"
>      ";x"            =  "http://a/b/c/;x"
>      "g;x"           =  "http://a/b/c/g;x"
>      "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
> (snip)
>
> I think ";" is used as a separator, like "?", in this document.
>
> And "2.3.  Specific Schemes and their Syntactic Categories" in RFC 1808:
>
>   NOTE: Section 5 of RFC 1738 specifies that the question-mark
>         character ("?") is allowed in an ftp or file path segment.
>         However, this is not true in practice and is believed to be an
>         error in the RFC.  Similarly, RFC 1738 allows the reserved
>         character semicolon (";") within an http path segment, but does
>         not define its semantics; the correct semantics are as defined
>         by this document for <params>.
>
> (note: RFC 1808 has been obsoleted by RFC 3986)
>
> I think the handling of ";" is confusing, so the Rack SPEC should not
> mention ";". But in real applications such as Thin, it's not bad for ";"
> to be treated as an ordinary character rather than a path separator.
>
> Hope this helps,
>
> Masayoshi Takahashi
>
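
The semicolon rows of the RFC 3986 resolution table quoted above can be checked with Ruby's standard URI library. This is a small sketch that assumes URI#merge is an acceptable stand-in for the RFC's reference resolution algorithm for these particular inputs; it only demonstrates that ";p" is dropped together with the last path segment "d;p", and that ";x" is kept as part of the new segment.

```ruby
require "uri"

# Base URI from RFC 3986, section 5.4.
base = URI.parse("http://a/b/c/d;p?q")

# Relative references taken from the "Normal Examples" table quoted above.
refs = ["g", ";x", "g;x", "g;x?y#s"]

# Resolve each reference against the base and collect the target URIs.
resolved = refs.map { |ref| base.merge(ref).to_s }

refs.zip(resolved).each { |ref, target| puts "#{ref.inspect} = #{target.inspect}" }
```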