> There is no current rack spec describing what REQUEST_URI should be.
The trouble is that there is absolutely no specification of what
REQUEST_URI even means. Try four different webservers, get four
different results.
I think a not-to-be-changed copy of the entire request path could be
useful, but let's not call it REQUEST_URI.
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
I am in favor of providing the ability to 1) see a virtual mount point
and 2) know the full original information about the incoming request.
Making it part of the spec would enable middleware and app cooperation
without one-off ENV hacks.
Jon Crosby
http://joncrosby.me
> What do you think about making PATH_INFO rewritable, making it a potentially
> true virtual location?
This may be specified awkwardly, but already is supposed to work.
My concern with rewriting PATH_INFO to something completely different
is that the rack end point will not know what the original request URI
was. I'm fine with not using REQUEST_URI as the key, but let's pick a
name.
REQUEST_PATH, ORIGINAL_PATH, FULL_PATH, any other ideas?
Carl Lerche
Engine Yard
-----Original Message-----
From: Yehuda Katz <wyc...@gmail.com>
Subj: Re: Making PATH_INFO possibly unrelated to the request URI and defining REQUEST_URI
Date: Tue Apr 7, 2009 6:06 pm
Size: 1K
To: rack-...@googlegroups.com
REQUEST_PATH would probably not include the host, scheme, etc.
what about CLIENT_URI?
-- Yehuda
On Tue, Apr 7, 2009 at 4:00 PM, Daniel N <has...@gmail.com> wrote:
>
>
> On Wed, Apr 8, 2009 at 7:03 AM, Carl Lerche <carl....@gmail.com> wrote:
>
>>
>> My concern with rewriting PATH_INFO to something completely different
>> is that the rack end point will not know what the original request URI
>> was. I'm fine with not using REQUEST_URI as the key, but let's pick a
>> name.
>>
>> REQUEST_PATH, ORIGINAL_PATH, FULL_PATH, any other ideas?
>>
>> Carl Lerche
>> Engine Yard
>>
>
> +1 for REQUEST_PATH as the original immutable version
>
>
>>
>> On Apr 7, 1:17 pm, Christian Neukirchen <chneukirc...@gmail.com>
>> wrote:
>> > Yehuda Katz <wyc...@gmail.com> writes:
>> > > What do you think about making PATH_INFO rewritable, making it a
>> potentially
>> > > true virtual location?
>> >
>> > This may be specified awkwardly, but already is supposed to work.
>> >
>> > --
>> > Christian Neukirchen <chneukirc...@gmail.com> http://chneukirchen.org
>
> REQUEST_PATH, ORIGINAL_PATH, FULL_PATH, any other ideas?
rack.original_path
Do we need to keep the SERVER_NAME etc as well? Then maybe rather
rack.original_uri
Either way, it needs to be specced well.
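A minimal sketch of how such a key could be populated by middleware,
assuming nothing beyond plain Rack calling conventions. The class name
OriginalPath and the "rack.original_path" key are hypothetical; neither
is part of any Rack spec, and a real definition would have to come from
the handlers rather than from middleware.

```ruby
# Hypothetical middleware: the class name and the "rack.original_path"
# key are illustrative only and are not defined by any Rack spec.
class OriginalPath
  KEY = "rack.original_path"

  def initialize(app)
    @app = app
  end

  def call(env)
    # Record the path only once, so the outermost instance wins even if
    # later middleware rewrites SCRIPT_NAME or PATH_INFO.
    env[KEY] ||= (env["SCRIPT_NAME"].to_s + env["PATH_INFO"].to_s).freeze
    @app.call(env)
  end
end

# Usage: an inner app that rewrites PATH_INFO can still see the original.
inner = lambda do |env|
  env["PATH_INFO"] = "/rewritten"
  [200, { "Content-Type" => "text/plain" }, [env[OriginalPath::KEY]]]
end

app = OriginalPath.new(inner)
_status, _headers, body = app.call({ "SCRIPT_NAME" => "/blog", "PATH_INFO" => "/posts/1" })
puts body.first
```

The copy is frozen so that nothing downstream can edit it in place,
though any middleware could still replace the env entry itself.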
I agree.
I had trouble finding the url to my app's rack mount point in a way
that worked with cgi and mongrel, and worked with .htaccess url
rewriting.
Basically, I have a Sinatra app that can be mounted at arbitrary
points using rack, and that needs to know its location to return HTML
that points back at subpaths within it. Also, I need to strip the
request paths of query parameters.
I ended up with this:
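    require "uri"

    # Complete path, as requested by the client. Take care about CGI
    # path rewriting.
    def request_path
      # Using .to_s because rack/request.rb does, though I think the Rack
      # spec requires these to be strings already.
      script_uri = env["SCRIPT_URI"].to_s
      # URI.parse("") does not raise, so test for a missing SCRIPT_URI
      # explicitly instead of relying on the rescue alone.
      return env["SCRIPT_NAME"].to_s + env["PATH_INFO"].to_s if script_uri.empty?
      URI.parse(script_uri).path
    rescue URI::InvalidURIError
      env["SCRIPT_NAME"].to_s + env["PATH_INFO"].to_s
    end

    # Complete path, as requested by the client, without the env's PATH_INFO.
    # This is the path to whatever is "handling" the request.
    #
    # Recent discussions on how PATH_INFO must be decoded lead me to think
    # this might not work if the path had any URL encoded characters in it.
    # Regexp.escape only keeps regex metacharacters in PATH_INFO from
    # being interpreted; it does not fix an encoded/decoded mismatch.
    def script_path
      request_path.sub(/#{Regexp.escape(env["PATH_INFO"].to_s)}\z/, "")
    end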
It is particularly difficult to find the original paths in the face of
url rewriting. It would be nice if the rack spec forced the handlers
to gather this information in a coherent and well-defined way from the
servers, and pass it through as "rack." env variables.
Cheers,
Sam
+100!
I like that rack uses a CGI-like env as its base, and doesn't hack it.
But there is information that isn't available in a standard fashion in
the CGI environment, and non-CGI adapters are different, anyhow.
Knowing where you are, and in particular, having the information
available in a URL-encoded form (so it isn't damaged, and it's
reversible) would be really handy. Note the recent problems with
PATH_INFO, and questions about what its form is (encoded vs decoded).
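To illustrate why the encoded form matters, here is a small sketch
using only Ruby's standard uri library. The request path is a made-up
example; the point is that decoding the whole path before splitting it
loses the difference between a literal "/" and an encoded one.

```ruby
require "uri"

# Hypothetical request path; %2F is an encoded "/" inside one segment.
encoded = "/files/a%2Fb/report"

# Decoding the whole path first merges the encoded slash with the real
# separators, so the original segmentation cannot be recovered.
# (decode_www_form_component also turns "+" into a space, which may not
# be wanted for paths; it is used here only for brevity.)
decoded = URI.decode_www_form_component(encoded)

segments_from_encoded = encoded.split("/")
segments_from_decoded = decoded.split("/")

# Splitting first and decoding each segment keeps "a/b" as one segment.
safe_segments = segments_from_encoded.map { |s| URI.decode_www_form_component(s) }

p segments_from_encoded
p segments_from_decoded
p safe_segments
```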
I've only worked with 2 (or 3?) adapters, and I had to run them all in
debug mode, examine the env, and develop a strategy for finding my
app's location. It shouldn't be necessary to run every rack adapter to
write a rack-based app!
SCRIPT_NAME isn't what you want when deploying under CGI, and you
shouldn't have to know that, or test with every rack adapter.
> Running an application framework like this as a CGI - even a
> comparatively small one like Sinatra+Rack - is going to have a pretty
> painful startup overhead per request, and so supporting this wouldn't
> be my number one priority. Perhaps FastCGI or SCGI are more important
> to support. However, I don't know how mod_rewrite and SCRIPT_NAME
> interact for those.
Important to who? Rack isn't all about web apps with small low-latency
requests, I hope!
Whether CGI overhead is "painful" depends on how much work the script
does; if it performs a high-latency task, the overhead is
unnoticeable.
Note that CGI is:
- trivial to deploy, often just involving copying the exe into cgi-bin/
- trivially parallelizable even with ruby 1.8, in the sense that
every request is its own process, so if your http server runs CGI
scripts in parallel, you get true parallelism, no need for event
driven co-ordination through rack to the server. This is particularly
nice when it takes a long time to service requests and you don't want
the whole server blocked - note that this is the case where CGI
overhead is irrelevant. For long-latency service points in a SOA
architecture, this sidesteps various blocking issues.
- naturally in "development mode", since it always does reloading
by its nature, so all the difficulties getting sinatra apps to reload
disappear (that whole rack middleware thing where you use a non-CGI
server, but then fork and run everything in another ruby instance is
basically convoluted CGI).
I'm not trying to convince anybody to use CGI; running merb or rails
under it would probably classify as criminally incompetent. But rack
can be agnostic as to the kinds of apps built on it.
> However, if you are using mod_proxy and rewriting the URL to a
> different path, then this will have the same problem. I can't see any
> option other than explicitly configuring the application with its
> mountpoint, because the proxied HTTP request will not carry the
> 'original' URL anyway.
I'm surprised the original external-facing request paths aren't passed
onwards with proxies, though. Too bad.
Anyhow, it's not the same thing; it's the difference between possible
and impossible. If proxying/http forwarding masks the external URL,
then you do indeed need configuration.
Apache's CGI implementation provides external-facing URL info both
pre-rewrite and post-rewrite, so configuration wasn't necessary
for me running under the two rack handlers I tested.
Sam
Try with mod_rewrite to hide the cgi-bin/script.rb
Magnus Holm just described the behaviour under those conditions, also
see my original code.
> $ time curl http://localhost/cgi-bin/env.rb/foo/bar
> SCRIPT_NAME="/cgi-bin/env.rb"; PATH_INFO="/foo/bar"
> real 0m0.431s
> user 0m0.032s
> sys 0m0.020s
>
> Using rubygems instead of explicit paths to rack and sinatra takes it
> to over 1.1 seconds.
Weird, my garbage rack playground (which is currently dying and
triggering the exception middleware) uses gems, and returns in 1/10 of
a second. I'm in Vancouver, webfaction is in Texas, AFAIK.
time curl http://hello.octetcloud.com/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 51720  100 51720    0     0  76770      0 --:--:-- --:--:-- --:--:--  146k
curl http://hello.octetcloud.com/ > /dev/null  0.01s user 0.00s system 1% cpu 0.681 total
Cheers,
Sam
> SCRIPT_NAME isn't what you want when deploying under CGI, and you
> shouldn't have to know that, or test with every rack adapter.
When you use mod_rewrite or something, *you* need to ensure that
SCRIPT_NAME is the part before PATH_INFO when the app finally gets
called, e.g. with an overriding middleware, or something like LighttpdFix.
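A rough sketch of what such an overriding middleware might look like.
The class name ScriptNameOverride and its mount_point argument are
hypothetical, and the sketch assumes the deployer knows the external
mount point and that PATH_INFO already holds the part after it.

```ruby
# Hypothetical middleware: forces SCRIPT_NAME to the externally visible
# mount point when URL rewriting hides the real script location.
class ScriptNameOverride
  def initialize(app, mount_point)
    @app = app
    # Rack expects "" rather than "/" for an app mounted at the root.
    @mount_point = mount_point.chomp("/")
  end

  def call(env)
    env["SCRIPT_NAME"] = @mount_point
    @app.call(env)
  end
end

# Usage: the server reports the hidden CGI script, the app sees the root.
echo = lambda { |env| [200, {}, [env["SCRIPT_NAME"] + env["PATH_INFO"]]] }
app = ScriptNameOverride.new(echo, "/")
_status, _headers, body = app.call({ "SCRIPT_NAME" => "/cgi-bin/env.rb", "PATH_INFO" => "/foo/bar" })
puts body.first
```

This is configuration rather than detection: it only works because the
mount point is supplied by whoever wrote the rewrite rule.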
REQUEST_URI = the whole part after the hostname
REQUEST_PATH = PATH_INFO = everything up to the first "?"
check out http://zelhory.schkocr.cz/phpinfo.php?foo=bar
Also, check out the URI syntax on http://www.w3.org/Protocols/rfc1945/rfc1945
Section 3.2.1.
relativeURI = net_path | abs_path | rel_path
rel_path = [ path ] [ ";" params ] [ "?" query ]
And the thin parser shows how the parts are added to the ENV.
http://github.com/macournoyer/thin/blob/master/ext/thin_parser/common.rl
rel_path = ( path? %request_path (";" params)? ) ("?" %start_query query)?;
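As a plain-Ruby illustration of the split described above (not of how
thin's parser works), the following sketch cuts a request target at the
first "?". The helper name split_request_target is made up, and the env
keys simply follow the naming used in this message. Note that it keeps
any ";params" in the path, whereas the thin rule above appears to
record the request path before the params.

```ruby
# Hypothetical helper: splits the part after the hostname at the first "?".
def split_request_target(target)
  path, _separator, query = target.partition("?")
  {
    "REQUEST_URI"  => target, # the whole part after the hostname
    "REQUEST_PATH" => path,   # everything up to the first "?"
    "QUERY_STRING" => query   # "" when there is no "?"
  }
end

parts = split_request_target("/phpinfo.php?foo=bar")
puts parts["REQUEST_PATH"]
puts parts["QUERY_STRING"]
```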
Ciao,
Tim
> REQUEST_URI = the whole part after the hostname
Which would not be a URI at all.
Please let's not use REQUEST_URI for anything, it is severely
unspecified. Cook up a new name for what it should be.
I would have no problem with placing an immutable definition of the
script-URI (as in the context of the CGI 1.1 spec) in some
rack-specific variable such as 'rack.request_uri'. It would be awesome
if this were an instance of URI; then we could simply locate
PATH_INFO or SCRIPT_NAME within env['rack.request_uri'].path, and
REQUEST_URI could stay subject to willy-nilly interpretation.
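A sketch of what building such a value could look like, using only the
standard uri library. The helper name build_request_uri is hypothetical,
'rack.request_uri' is only the name proposed above, and the sketch
assumes SCRIPT_NAME and PATH_INFO are still in URL-encoded form
(URI::HTTP.build rejects unencoded characters such as spaces).

```ruby
require "uri"

# Hypothetical helper: assembles a frozen URI from the CGI-style env keys.
def build_request_uri(env)
  query = env["QUERY_STRING"].to_s
  klass = env["rack.url_scheme"] == "https" ? URI::HTTPS : URI::HTTP
  klass.build(
    host:  env["SERVER_NAME"],
    port:  env["SERVER_PORT"].to_i,
    path:  env["SCRIPT_NAME"].to_s + env["PATH_INFO"].to_s,
    query: query.empty? ? nil : query
  ).freeze
end

env = {
  "rack.url_scheme" => "http",
  "SERVER_NAME"     => "example.org",
  "SERVER_PORT"     => "80",
  "SCRIPT_NAME"     => "/app",
  "PATH_INFO"       => "/foo",
  "QUERY_STRING"    => "x=1"
}
env["rack.request_uri"] = build_request_uri(env)
puts env["rack.request_uri"].path
```

With this in place, SCRIPT_NAME can be checked as a prefix of
env["rack.request_uri"].path even after PATH_INFO has been rewritten.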
I am opposed to altering the definition of these meta-variables away
from the CGI spec. It's already a bit mangled in terms of the rack
routing with middlewares like URLMap, but it still provides a nice
definition: SCRIPT_NAME is something I want to keep in any URLs that
point back to this specific application, and PATH_INFO is malleable
within my application or provides additional information.
--
stadik.net
> Tim Carey-Smith <g...@spork.in> writes:
>
>> REQUEST_URI = the whole part after the hostname
>
> Which would not be a URI at all.
>
> Please let's not use REQUEST_URI for anything, it is severely
> unspecified. Cook up a new name for what it should be.
I believe the place where someone might have decided REQUEST_URI was
useful was RFC 1945 [1], and you are correct that it isn't really a URI.
In this instance, the Request-URI (section 5.1.2) is describing either
the "abs_path" (/foo/bar;params?query=string) or the "absoluteURI" for
proxy requests.
It does seem to have some ambiguity in what it specifies, if indeed it
specifies anything!
It seems that most HTTP parsers extract the parts of the
"Request-Line" into their own values, and that in itself is reasonably
standard (aside from nginx-passenger :).
Request-Line = HTTP_METHOD " " PATH_INFO QUERY_STRING " " HTTP_VERSION "\r\n"
I think I'm verging on mindless rambling now,
Tim