This is off the original topic but have you considered how to deal
with Rack::Request
subclasses now that Rack::Request.new does memoization? I suppose
we'll need to define
RequestSubclass.new to have Object.new semantics again. Has anyone
else ran into this?
I was thinking it might be better to only perform memoization when
self == Rack::Request:
class Request
def self.new(env)
if self == Rack::Request
env["rack.request"] ||= super
else
super
end
end
end
Thoughts?
Ryan
> On Fri, Dec 26, 2008 at 8:55 AM, Michael Fellinger
> <m.fel...@gmail.com> wrote:
>> I'm using a subclass of Rack::Request for the things you mention
>> here...
>
> This is off the original topic but have you considered how to deal
> with Rack::Request
> subclasses now that Rack::Request.new does memoization? I suppose
> we'll need to define
> RequestSubclass.new to have Object.new semantics again. Has anyone
> else ran into this?
I'd prefer if people didn't subclass Rack::Request, but aggregated it...
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
You mean have everyone open / add methods directly to Rack::Request? That's the
approach we took with Sinatra initially but it led to some weirdness and I've
since erred on the side of subclassing in most cases.
For instance, at one point we implemented the equivalent of
Rack::MethodOverride directly in Rack::Request#request_method. This caused
the request body to be read and parsed into params the first time
#request_method
was called, which broke an assumption in some other piece of Rack code. The
backtrace was jarring because you could see where Sinatra had injected itself
into core rack logic that it had no need or intention of mucking with.
Rack::Cache has some interesting examples in this area as well. The Request
and Response objects have a very shared-cache oriented view of themselves. For
instance, the Response#max_age method returns the value of the s-maxage
cache-control directive before falling back on the max-age directive.
IMO, having
a subclass has some utility in these cases.
It makes sense to try to get as much shared convenience code in Rack core as
possible but I think subclassing also has its place here. It should be supported
without too much syntactic vinegar, IMO.
Thanks,
Ryan
> You mean have everyone open / add methods directly to Rack::Request?
No, I mean has-a over is-a. Are there that many reasons to extend
Rack::Request?
On Sat, Dec 27, 2008 at 12:31 AM, Yehuda Katz <wyc...@gmail.com> wrote:
> Is there a specific reason to ever *not* do memoization? It seems to me that
> subclasses should have identical semantics unless specifically overridden.
If I'm reading the code correctly, the current implementation of
Rack::Request.new makes it hard/weird to use a subclass of Rack::Request once
any other Rack::Request instance has been memoized into an env:
class FooMiddleware < Struct.new(:app)
def call(env)
request = Rack::Request.new(env) # first time. memoized.
...
end
end
class BarMiddleware < Struct.new(:app)
class MySpecialRequest < Rack::Request
end
def call(env)
request = MySpecialRequest.new(env) # looked up in env.
fail "this request isn't special!" if request.class != MySpecialRequest
end
end
use FooMiddleware
use BarMiddleware
run lambda { |env| [200, {}, ['Hello World']] }
With FooMiddleware in place, BarMiddleware gets an instance of
Rack::Request. Without FooMiddleware, BarMiddleware gets an instance of
MySpecialRequest... naughty, IMO.
One workaround is to override :new in subclasses and memoize to a different
key and/or have the subclass return to Object.new semantics:
class MySpecialRequest < Rack::Request
def self.new(env)
env['bar.myspecialrequest'] ||= # this is probably useless
begin
inst = allocate
inst.send :intialize, env
inst
end
end
end
Ewww.
I can think of other approaches, like passing an empty hash to super and then
merging in the env provided, but it sure makes subclasses jump through hoops
to create an actual new instance when one is needed. A better option at this
point is to simply not use a subclass of Rack::Request and try to place the
logic elsewhere. Which may be fine, I just wanted to see if anyone else was
having to consider these issues now that new returns memoized instances without
regard to the class "new" was called on.
But now that I think about it, I'm not sure I see a huge benefit to memoizing
here anyway. Has anyone actually profiled this and found it to be a hot spot?
Rack::Request#initialize set a single instance variable to reference the env
object. A memoized lookup saves a handful of Class.allocate and
instance variable
set invocations per request. They're both basically constant time with
the memoized
version using a bit less space. It seems rather insignificant to me. What did
we actually accomplish here?
Thanks,
Ryan
The memoizing is primarily a bonus to the web frameworks crowd which
may create a new Request on each phase of handling. At least that's
where I recall the majority of the pro argument came.
Honestly, the idea of a mixin with a #robust_params makes the most
sense to me. Memoizing the superduper params hash in a different
location in the env for later retrieval similar to the way params
does.
--
stadik.net
Ryan
Here's a quick benchmark of memoized vs. normal object creation (1M iterations):
Rehearsal --------------------------------------------
memoized 0.860000 0.000000 0.860000 ( 0.874840)
normal 1.700000 0.020000 1.720000 ( 1.715816)
----------------------------------- total: 2.580000sec
user system total real
memoized 0.870000 0.000000 0.870000 ( 0.881126)
normal 1.700000 0.020000 1.720000 ( 1.728066)
I don't want to be a PITA here but Rack has a really excellent track record when
it comes to keeping the codebase simple and straightforward. I'm not against
micro-optimizations like this but I'd very much like to see what we're buying
for the added complexity. In this case, I'd say it's really very
premature (are we
seriously looking for fraction of a usec / request speedups at this point?),
though it obviously has some benefit over a large number of requests.
Thanks,
Ryan
--
stadik.net
class Merb::Request
def initialize env
@rr = Rack::Request.new(env)
end
def method_missing m, *a, &b
return @rr.__send__ m, *a, &b if @rr respond_to? m
super
end
def params
# lol, more thread leakage.
end
end
--
stadik.net
> Merb::Request.new(Rack::Request.new(env)) in each piece of middleware?
Merb::Request.new(env), and it can instantiate Rack::Request if it needs to?
> On Dec 27, 11:24 pm, "Yehuda Katz" <wyc...@gmail.com> wrote:
>> The win comes from being able to lazy-load things like parameter parsing.
>> The alternative is a middleware that does the parsing every time, which
>> isn't ideal since it can be expensive. I laid out the full case for this
>> solution in my first post on this topic. In short, it's decidely not a
>> micro-optimization :P
>
> Rack::Request has always lazy loaded and memoized things like
> parameter parsing
> into the env hash without eager middleware or hacks. I don't see how
> memoizing
> the Request instance buys you anything new here other than the time/
> space win.
We can now encapsulate these memoized values in the Request object
itself, as instance variables.
The idea is that you may have a middleware that instantiates a rack
request and maybe uses request.params triggering the params parsing.
Now in your later middlewares you want to use rack request.params
again without triggering parsing again.
It is not just to save on the request initialization as that is not
worth memoizing. It is about saving the request between middlewares so
you don't have to reparse params and other memoized instance vars in
the request object.
Does that make more sense?
Cheers-
-Ezra
Well, yeah, but this has never actually been an issue AFAIK. The parsed
params and other intensive actions have always been memoized into the
env by the Request class the first time they're accessed. Other instances
of Request use the same parsed params Hash and do not perform the same
processing because they share the env. That was one of the major reasons
the environ was modifiable in WSGI and it seemed to be put to good use in
Rack as well.
Christian explained that the reason for this is so that we can memoize to
instance variables in Request instead of memoizing to the env hash. I'd
prefer memoizing to the env, personally, since that keeps all state in a
single place and allows different request instances to provide different
views or processing models on top of the env. I saw this as a feature, not
a flaw. In fact, when we were spec'ing WSGI, a simple dictionary was chosen
for storing request state precisely because it's a simple and dumb data
structure with behavior that's well defined and unlikely to be modified.
Each component along the request pipeline could add features by wrapping
the env in some higher level object (like Rack::Request), _but_ that code
applied only to the specific component.
With these changes I fear we're moving to a situation where an
instance of Rack::Request replaces the env Hash as "the place where
request state is stored." I prefer the current design because I think
it's much more simple, increases separation between components, and
makes it easier for components to easily customize the Request
implementation by subclassing (as opposed to delegating to a Request
instance). And there's very little downside from a performance perspective
that I'm aware of. I'm worried that we're forcing the env behind an
object because we're under the assumption that the env, being a simple
Hash, was under-designed or not well thought through. Nothing could be
further from the truth, IMO.
With all that being said, I'm really sorry to keep slogging away at
this. I do think there's some merit in memoizing the request for performance
and being able to use instance variables for some types of state may
be useful (although I can't think of a case where I'd use an ivar over
the env). I just want to make sure we're doing it for the right reasons
and not because we're under bad assumptions about the current design and
capabilities.
Thanks,
Ryan
-Ezra
Ezra Zygmuntowicz
e...@engineyard.com
> I'm on Ryans side here.
>
> Memoizing the Request object is actually a step backwards and makes it
> harder to implement extensions. The "env" hash should be the request
> state, not some other special request object. I'd rather see specs for
> conventions like "rack.forms.vars".
I'm against that, here's why: this would make people write things like
env["rack.forms.vars"] || Rack::Request.new(env).params
which breaks encapsulation and leaks implementation detail.
> It would really suck if you needed
> to create a Rack::Request object anytime you wanted to do anything
> interesting with the env hash.
No need to, but when you want to parse parameters with Rack::Request,
you need an Rack::Request instance of course. What we are talking
about is how to make Rack::Request not parse the parameters twice,
which is just an implementation detail.
Why would you first check for "rack.form.vars"? Wouldn't you just:
Rack::Request.new(env).params
>> It would really suck if you needed
>> to create a Rack::Request object anytime you wanted to do anything
>> interesting with the env hash.
>
> No need to, but when you want to parse parameters with Rack::Request,
> you need an Rack::Request instance of course. What we are talking
> about is how to make Rack::Request not parse the parameters twice,
> which is just an implementation detail.
Right. And Rack::Request never parses parameters twice today, right? The
only difference I can see with memoizing to instance variables is that you
end up parsing params for each instance of Rack::Request instead of once
per env instance.
Thanks,
Ryan
Me too. All other things (e.g., simplicitly, correctness, completeness,
consistency) being considered, of course. Is there something specific
you're referring to here?
Per the original thread, you seem to be under the impression that Rack::Request
did not, or was not capable of, memoizing lengthy operations like param parsing,
which is simply not true. We've always avoided extra allocations and processing
there. The original issue was a non-issue. I've tried to make a case that the
existing model of memoizing to env has significant benefits over memoizing to
instance variables on Request with equivalent performance characteristics.
I've also already said that memoizing Request instances on Request.new in order
to reduce the number of allocations is fine (if extremely premature).
So where exactly was an "but the overhead of FOO is small" argument made? I'd
very much like to retract or defend it.
Thanks,
Ryan
Then we're in agreement. Perfect.
As a side note, if you really want to find lots of unnecessary allocations and
extra processing, take a look at Rack::Response and consider that most of that
stuff will typically happen multiple times per request :)
Thanks,
Ryan
The only I've need to create a Rack::Request object in middleware is
to get the parameters. Most of the time you can get the other things
like the request method simply with env["REQUEST_METHOD"]. The real
issue is that it is hard to get the parameters without making a
request object. Why not make it a utility?
# Returns params and also caches it in rack.form.hash etc
Rack::Utils.extract_params!(env)
A light weight utility like this would be more useful in middleware situations.
--
Joshua Peek