Many web frameworks and ORM tools have the need to propagate data
depending on some or other context within which a request is dealt with.
Passing it all via parameters to every nook of your code is cumbersome.
A lot of the frameworks use a thread local context to solve this
problem. I'm assuming these are based on threading.local.
(See, for example:
http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
Such usage assumes that one request is served per thread.
This is not necessarily the case. (Twisted would perhaps be an example,
but I have not checked how the twisted people deal with the issue.)
The bottom line for me is that if you build a WSGI app, you'd not want
to restrict it to being able to run in a one request-per-thread setup.
So I've been playing with the idea to use something that creates a
context local to the current call stack instead. I.e. a context (dict)
which is inserted into the call stack at some point, and can be accessed
by any method/function deeper in the stack.
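A minimal sketch of what such a call-stack-local context might look like, assuming CPython's frame introspection via sys._getframe. The names run_with_context, current_context and the marker variable are made up for illustration; this is not the code attached to this post.

```python
import sys

# Hypothetical marker: the name of the local variable that holds the context.
_MARKER = "_call_stack_context"


def run_with_context(context, func, *args, **kwargs):
    """Call func with `context` visible to everything deeper in the stack."""
    _call_stack_context = context  # lives only in this frame's locals
    return func(*args, **kwargs)


def current_context():
    """Walk up the call stack and return the nearest context dict."""
    frame = sys._getframe(1)
    while frame is not None:
        if _MARKER in frame.f_locals:
            return frame.f_locals[_MARKER]
        frame = frame.f_back
    raise LookupError("no context found on the call stack")


def deep():
    # Nothing was passed down to this function as a parameter.
    return current_context()["user"]


def middle():
    return deep()


print(run_with_context({"user": "iwan"}, middle))
```

The context disappears as soon as run_with_context returns, so nothing has to be cleaned up explicitly. The cost is a walk up the stack on every lookup, and a dependency on an interpreter-specific function.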
The normal use case for this is to propagate a database connection. But
it can also be used to propagate other things, such as information about
the user who is currently logged in, etc.
Since this is one way of creating objects that are global to a context
(the call stack), I'm sure it is in some ways evil and can be abused.
But that criticism can be levelled against the thread-local solution
too...
I attach some code to illustrate - and would appreciate some feedback on
the idea and its implementation.
-i
2008/7/4 Iwan Vosloo <iw...@reahl.org>:
[snip]
> A lot of the frameworks use a thread local context to solve this
> problem. I'm assuming these are based on threading.local.
>
> (See, for example:
> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
scoped_session is actually, I think, a bad example, as SQLAlchemy uses
the thread id to scope things per session, not threading.local. As
long as there's a way to uniquely identify the "context" that doesn't
need any non-global parameters, scoped_session could also be scoped
differently.
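To make the point concrete, here is a rough sketch of a registry that hands out one object per scope, where the scope is whatever a parameterless function says it is. This is an illustration only and not SQLAlchemy's implementation; ScopedRegistry, scopefunc and the state dict are hypothetical names.

```python
import threading


class ScopedRegistry:
    """Hand out one object per scope; `scopefunc` names the current scope."""

    def __init__(self, factory, scopefunc=threading.get_ident):
        self._factory = factory
        self._scopefunc = scopefunc
        self._objects = {}

    def __call__(self):
        key = self._scopefunc()
        if key not in self._objects:
            self._objects[key] = self._factory()
        return self._objects[key]

    def remove(self):
        """Forget the object for the current scope, e.g. at end of request."""
        self._objects.pop(self._scopefunc(), None)


# Scope by an arbitrary "current request id" instead of the thread id.
state = {"request_id": None}
sessions = ScopedRegistry(dict, scopefunc=lambda: state["request_id"])

state["request_id"] = "req-1"
first = sessions()
state["request_id"] = "req-2"
second = sessions()
state["request_id"] = "req-1"
print(sessions() is first, second is first)
```

With the default scopefunc this behaves like a per-thread registry; swapping the function changes what "context" means without touching the callers.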
Zope 3 may be a better example, as it does use thread locals to scope
things per thread (I believe this requirement by Zope was actually one
of the reasons this feature was moved into Python). There may also be
other parts of SQLAlchemy that indeed use thread local variables.
Regards,
Martijn
_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/python-web-sig-garchive-9074%40googlegroups.com
On Fri, 2008-07-04 at 13:31 +0200, Martijn Faassen wrote:
> scoped_session is actually, I think, a bad example, as SQLAlchemy uses
> the thread id to scope things per session, not threading.local. As
> long as there's a way to uniquely identify "context", scoped_session
> could also be scoped differently, as long as it has a way to identify the
> context that doesn't need any non-global parameters.
>
> Zope 3 may be a better example, as it does use thread locals to scope
> things per thread (I believe this requirement by Zope was actually one
> of the reasons this feature was moved into Python). There may also be
> other parts of SQLAlchemy that indeed use thread local variables.
Point taken, I'm not familiar with the implementation of scoped_session.
But still, it is the same idea as that implemented in threading.local,
isn't it?
-i
The natural solution with WSGI is to store objects in the environ
dictionary.
In fact, in my web applications I always pass the environ dictionary
explicitly to every function.
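A small sketch of that style, under the assumption that a middleware opens the connection and stores it under an application-chosen key. The key "myapp.db" and the function names are invented for this example, and a plain dict stands in for a real database connection.

```python
def with_database(app, connect):
    """Middleware: open a connection per request and store it in environ."""
    def middleware(environ, start_response):
        environ["myapp.db"] = connect()  # "myapp.db" is a made-up key
        return app(environ, start_response)
    return middleware


def current_user_name(environ):
    # Helpers receive environ explicitly and read what they need from it.
    login = environ.get("REMOTE_USER", "anonymous")
    return environ["myapp.db"]["users"][login]


def application(environ, start_response):
    body = ("Hello, %s\n" % current_user_name(environ)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fake_connect():
    # Stand-in for opening a real database connection.
    return {"users": {"iwan": "Iwan", "anonymous": "guest"}}


app = with_database(application, fake_connect)
```

Because everything hangs off environ, this works the same way whether the server uses threads, processes or an event loop.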
> [...]
Manlio Perillo
But this passing of the environ dictionary to every function in your web
app is exactly what I'd want to avoid.
-i
On Fri, Jul 4, 2008 at 1:37 PM, Iwan Vosloo <iw...@reahl.org> wrote:
> On Fri, 2008-07-04 at 13:31 +0200, Martijn Faassen wrote:
>> scoped_session is actually, I think, a bad example, as SQLAlchemy uses
>> the thread id to scope things per session, not threading.local. As
>> long as there's a way to uniquely identify "context", scoped_session
>> could also be scoped differently, as long as it has a way to identify the
>> context that doesn't need any non-global parameters.
>>
>> Zope 3 may be a better example, as it does use thread locals to scope
>> things per thread (I believe this requirement by Zope was actually one
>> of the reasons this feature was moved into Python). There may also be
>> other parts of SQLAlchemy that indeed use thread local variables.
>
> Point taken, I'm not familiar with the implementation of scoped_session.
> But still, it is the same idea as that implemented in threading.local,
> isn't it?
Yes, I think so, except that scoped_session is more flexible than that
and could actually be convinced to use your technique for
identification of scope as well.
Regards,
Martijn
Yes, but you only need to pass the environ dictionary and not N parameters.
I think this is a good compromise.
Using thread-local storage is not the solution to every problem (as you
have noted, it cannot be used when the server handles more than one
request per thread).
> -i
>
Manlio Perillo
You're correct that Twisted Web does not allocate a thread per request.
All requests are handled by an event loop in the main thread.
However, the WSGI request handled in Twisted does actually spawn a
thread to run the WSGI application because most WSGI applications are
blocking.
>
> The bottom line for me is that if you build a WSGI app, you'd not want
> to restrict it to being able to run in a one request-per-thread setup.
>
> So I've been playing with the idea to use something that creates a
> context local to the current call stack instead. I.e. a context (dict)
> which is inserted into the call stack at some point, and can be accessed
> by any method/function deeper in the stack.
In Twisted, the call stack tends to get fragmented during a sequence of
asynchronous calls because of its callback mechanism. Basically, you're
hopping in and out of the Twisted reactor (the event mainloop) all the
time. Leaving something in the call stack would not work at all.
The ideal solution is, of course, to pass everything around to whatever
needs it. However, that's really tedious at times.
Whatever the architecture of the web server there is always a request
or, in case of WSGI, an env dict. Therefore, request-scope objects
should be associated with the request.
>
> The normal use case for this is to propagate a database connection. But
> it can also be used to propagate other things, such as information about
> the user who is currently logged in, etc.
>
> Since this is one way of creating objects that are global to a context
> (the call stack), I'm sure it is in some ways evil and can be abused.
> But that criticism can be levelled against the thread-local solution
> too...
Yep, thread and call stack locals are both bad. Think in terms of
request locals instead and things start getting better.
>
> I attach some code to illustrate - and would appreciate some feedback on
> the idea and its implementation.
>
> -i
--
Matt Goodall
Technical Director, Pollenation Internet Ltd
Registered Number: 4382123
Registered Office: 237 Lidgett Lane, Leeds, West Yorkshire, LS17 6QR
A member of the Brunswick MCL Group of Companies
w: http://www.pollenation.net/
e: ma...@pollenation.net
t: +44 (0) 113 2252500
This message may be confidential and the views expressed may not reflect
the views of my employers. Please read http://eudaimon-group.com/email
if you are uncertain what this means.
> In Twisted, the call stack tends to get fragmented during a sequence of
> asynchronous calls because of its callback mechanism. Basically, you're
> hopping in and out of the Twisted reactor (the event mainloop) all the
> time. Leaving something in the call stack would not work at all.
Couldn't you put something in the call stack each time in the main loop,
before calling a callback (which will be popped again when that callback
returns to the main loop)?
> The ideal solution is, of course, to pass everything around to whatever
> needs it. However, that's really tedious at times.
>
> Whatever the architecture of the web server there is always a request
> or, in case of WSGI, an env dict. Therefore, request-scope objects
> should be associated with the request.
True, but even passing a request or env dict around to everyone gets
tedious don't you think?
-i
It can. Zope 3 makes a pretty good compromise here. The "top level"
object involved in handling the request -- a view -- gets the request
object explicitly passed as a parameter. If the view wants to pass the
request to function calls or other objects, then it's free to do so.
But, if at some point you find yourself without a reference to the
current request and really need it, you can get it "out of thin air" by
calling (essentially) get_request().
The Zope 3 publisher processes requests using a thread pool, so
get_request() is implemented by stashing the request object in
thread-local storage prior to processing the request and digging it back
out if requested.
Other implementations could store the request somewhere else, but the
idea is the same.
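A bare-bones sketch of that arrangement, assuming one request per thread. It is not Zope 3 code; publish, set_request, clear_request and get_request are illustrative names.

```python
import threading

_state = threading.local()


def set_request(request):
    _state.request = request


def clear_request():
    try:
        del _state.request
    except AttributeError:
        pass


def get_request():
    """Return the request being processed by the current thread."""
    try:
        return _state.request
    except AttributeError:
        raise LookupError("no request is being processed in this thread") from None


def publish(request, view):
    """Stash the request, call the view with it explicitly, then clean up."""
    set_request(request)
    try:
        return view(request)
    finally:
        clear_request()


def helper():
    # Somewhere deep down, without a reference to the request.
    return get_request()["path"]


def view(request):
    return helper()


print(publish({"path": "/index"}, view))
```

The view still receives the request as a normal parameter; get_request() is only the fallback for code that was never handed one.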
--
Benji York
CherryPy does something similar. The "top level" object involved in
handling the request -- cherrypy.serving -- gets the request and response
objects set as attributes. But instead of calling get_request() as in
Zope 3, there are proxy objects sitting at cherrypy.request and
cherrypy.response which shuttle getattr and setattr to
cherrypy.serving.request/response. That allows app code to just "import
cherrypy" and have access everywhere.
Now, cherrypy.serving _is_ a threadlocal object. But I don't imagine it
would be difficult for a non-threaded HTTP server to replace
cherrypy.serving with some other-context-local if they liked.
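For readers who have not seen the pattern, a simplified sketch of such a proxy follows. It is not CherryPy's actual code; AttributeProxy, the module-level serving object and the Request class are stand-ins written for this example.

```python
import threading

serving = threading.local()  # stand-in for the per-request holder


class AttributeProxy:
    """Forward attribute access to whatever `serving.<name>` currently is."""

    def __init__(self, name):
        # Bypass our own __setattr__ so the name lands on the proxy itself.
        object.__setattr__(self, "_name", name)

    def _target(self):
        return getattr(serving, object.__getattribute__(self, "_name"))

    def __getattr__(self, attr):
        # Only called when normal lookup on the proxy fails.
        return getattr(self._target(), attr)

    def __setattr__(self, attr, value):
        setattr(self._target(), attr, value)


class Request:
    pass


request = AttributeProxy("request")  # importable from anywhere

serving.request = Request()
request.path = "/index"          # really sets serving.request.path
print(serving.request.path)
```

Replacing the threading.local holder with some other context-local object would change the scoping without changing any code that uses the proxy.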
Robert Brewer
fuma...@aminus.org
The Spawning server
(http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html) would
indeed get things mixed up this way, as it uses greenlets to make (at least
some) blocking calls async. So it would encounter this problem full-force.
To throw another wrench in things, with the Paste/WebError evalexception
interactive exception handler, it restores this thread-local context so
you can later execute expressions in the same context.
--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org
Yes, that's probably achievable by subclassing Deferred (the callback
class) and using a closure to reinstate the context before the callback
function is called. Perhaps I'll give it a go out of interest.
However, I'm not convinced it's a good idea and I suspect the Twisted
developers would sooner pluck out their eyeballs (or worse still, mine!)
than allow it into Twisted core ;-).
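Purely as an illustration of the closure idea, and without touching Twisted at all, the following sketch captures the context when a callback is scheduled and reinstates it when the callback runs. Every name here is hypothetical, and a plain list stands in for the reactor's queue.

```python
_state = {"context": None}  # the context visible while a callback is running


def get_context():
    return _state["context"]


def call_with_context(context, func, *args, **kwargs):
    """Run func with `context` installed, then restore the previous one."""
    previous = _state["context"]
    _state["context"] = context
    try:
        return func(*args, **kwargs)
    finally:
        _state["context"] = previous


def bind_context(callback):
    """Capture the context now and reinstate it whenever callback runs later."""
    captured = _state["context"]

    def wrapper(*args, **kwargs):
        return call_with_context(captured, callback, *args, **kwargs)

    return wrapper


pending = []  # toy stand-in for the event loop's queue of callbacks
results = []


def record_user():
    results.append(get_context()["user"])


def handle_request():
    # The callback is only scheduled here; it runs after we have returned.
    pending.append(bind_context(record_user))


call_with_context({"user": "alice"}, handle_request)
call_with_context({"user": "bob"}, handle_request)
for callback in pending:  # the "main loop"
    callback()
print(results)
```

A Deferred subclass would presumably apply something like bind_context to every callback added to it; whether that is a good idea is a separate question.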
>
>> The ideal solution is, of course, to pass everything around to whatever
>> needs it. However, that's really tedious at times.
>>
>> Whatever the architecture of the web server there is always a request
>> or, in case of WSGI, an env dict. Therefore, request-scope objects
>> should be associated with the request.
>
> True, but even passing a request or env dict around to everyone gets
> tedious don't you think?
Yes, it can be tedious but I believe explicit arg passing is necessary
to make code readable, testable and reusable.
If it's web-related code then give it the request, it will almost
certainly need it. Otherwise, don't.
I would even advocate extracting request-scope objects, e.g. a database
connection, the current user, etc, as early as possible and passing them
around explicitly (along with the request, if necessary).
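A short sketch of what that can look like in a WSGI application. The environ key "myapp.db" is invented for the example and assumes some middleware put a connection-like object there; a dict plays that role here.

```python
def greeting(user, db):
    """Plain function: no request, no environ, easy to test and reuse."""
    return "Hello, %s" % db["names"].get(user, "stranger")


def application(environ, start_response):
    # Pull the request-scoped objects out once, at the top ...
    user = environ.get("REMOTE_USER")
    db = environ["myapp.db"]  # made-up key; assumed to be set by middleware
    # ... and pass them on explicitly.
    body = greeting(user, db).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


# greeting() can be exercised without any web machinery at all.
print(greeting("matt", {"names": {"matt": "Matt"}}))
```

Only the top-level callable knows about environ; everything below it declares exactly what it needs in its signature.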
I've made the mistake of relying on magic contexts in the past. I'm
still trying to fix things.
- Matt
--
Matt Goodall
I understand the arguments for explicit passing. However, if you pass a
particular argument to _each and every_ little method,
readability/testability/reusability are adversely affected too. And
sometimes you need to pass, say the request, to the strangest little
methods just because one of them somewhere needs to do something with
the request which you did not anticipate. It may not even be logically
related to what that method does. Or, worse, it may be that you sit with
a bunch of polymorphic methods, and one of their implementations needs
to have a request - forcing you to add a request parameter to all of
them.
Bottom line for me is that if you add, say "request" to a method
signature, it must make sense from a caller's perspective to have to
give a request. (And I want to add: to give a request to a method named
xxx.) Otherwise the interface of the method contains illogical bits
necessitated by its implementation.
Isn't that bad too?
-i
This is exactly what I too have realized!
I'm developing a WSGI framework with all these (and other) ideas:
http://hg.mperillo.ath.cx/wsgix
It's still not documented, so I have not yet made an official announcement.
The main design goal is to keep the level of the interface as low level
as possible.
I don't like additional interface objects (like Request and Response)
around the WSGI dictionary, and I don't like frameworks like Django that
completely hide the WSGI interface.
> [...]
Manlio Perillo
I'm adding web-sig in Cc.
> [...]
>> I'm developing a WSGI framework with all these (and other) ideas:
>> http://hg.mperillo.ath.cx/wsgix
>>
>> It's still not documented, so I have not yet made an official
>> announcement.
>>
>> The main design goal is to keep the level of the interface as low
>> level as possible.
>>
>> I don't like additional interface objects (like Request and Response)
>> around the WSGI dictionary, and I don't like frameworks like Django
>> that completely hide the WSGI interface.
>
> Have you tried webob? My first run as Paste avoided wrappers around
> those objects, but an object interface has been very helpful.
>
I have not tried it, but I have read the code (as I have read the code
of Paste).
In principle I'm against using additional interfaces, and one of the
reasons I wrote wsgix is to have a proof of concept, to try to
understand whether it is feasible to write a WSGI application using an
alternative framework.
wsgix (+ mod_wsgi for Nginx) has the same role as Paste, but I have
decided to use a rather different approach.
As an example, in Paste you have chosen to use a config dictionary for
middleware configuration, that is, you have middleware factories.
In wsgix it is very different.
As an example:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/messages.py
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/error_page.py
There are no factories.
The configuration is read (and globally cached) at request time from the
environ dictionary.
With Nginx, configuration parameters can be defined in the server
configuration.
There is a helper class:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/options.py
that helps with the parsing.
There is also a middleware:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/conf/middleware.py
that reads the configuration from a YAML file and merges it into the
environ dictionary.
Of course it's all a matter of personal taste :).
The goal is to have the possibility to write "truly" reusable
middlewares, that are easy to "plug" inside any WSGI server (almost all
of configuration parameters have default values).
I think this is a red herring. WebOb specifically doesn't do anything
related to configuration or the setup of the stack. What it does do is
stuff like:
expires = http.format_time(0)
http.generate_cookie(
    environ, headers, name, '', expires=expires,
    domain=cookie_domain(environ), path=path,
    max_age=0)
which would be resp.delete_cookie(name) (well, cookie_domain seems to be
derived from a setting, but that's mostly unrelated). This isn't a
particularly substantial difference, but these small conveniences add up.
--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org
Can you elaborate?
Robert Brewer
fuma...@aminus.org
As I have said, this is a matter of personal taste: I don't like the
"architecture" used by WebOb and prefer to use the environ dictionary
directly, without introducing other abstractions.
This is possible, I'm writing a "not simple" application using wsgix.
I'm still evaluating if I can reuse WebOb parsing functions (and this
would be a great thing: I think that we *really* need a package with
*only* low *level* parsing functions for the HTTP protocol).
From what I can see, WebOb does *not* offer a low level interface for
the parsers: you *have* to use the Request object.
I really like multilevel architectures, instead.
Manlio Perillo
This was the deliberate approach of Paste, and it does have several
functions for doing things similar to how you describe. As I said, I
went down exactly this path, but I think WebOb solves the problem
better. You can think of WebOb as a way of currying functions. All the
request functions take an environ argument, curried through
instantiation of webob.Request. All response functions take
status/headers/app_iter, curried through webob.Response. State is never
held outside the environment or the status/headers/app_iter of the response.
So think of webob.Request as the module of request-parsing routines, and
webob.Response as the module of response-parsing routines. (There are
underlying functions for things like parsing dates, but they are only
exposed through those classes.)
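The currying analogy can be sketched in a few lines. This is not WebOb's code: the Request class below is a hypothetical two-property wrapper, shown next to the equivalent functions that take environ directly.

```python
def get_path(environ):
    return environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")


def get_method(environ):
    return environ.get("REQUEST_METHOD", "GET")


class Request:
    """Hypothetical wrapper: binds environ once and holds no other state."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def path(self):
        return get_path(self.environ)

    @property
    def method(self):
        return get_method(self.environ)


environ = {"REQUEST_METHOD": "POST", "SCRIPT_NAME": "/app", "PATH_INFO": "/items"}
req = Request(environ)
print(req.method, req.path)

# Nothing is cached on the wrapper, so a change to environ shows through.
environ["PATH_INFO"] = "/other"
print(req.path)
```

Since the wrapper keeps no state of its own, any number of Request objects can be created around the same environ and they will always agree with each other and with the plain functions.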
--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org
Well, it's really call-local, i.e., dynamic scoping. Another option
would be something like attaching this dynamic scoping to the frame
objects themselves, in a way that evalexception could be aware
(restoring them when trying to execute code in the context of some
frame) and potentially greenlets could do the same thing.
It could be done in a WSGI-specific way, and that might be useful, but
the general issue is applicable to more than WSGI.
Generally the problems we are talking about only occur when some kind of
(semi-)transparent concurrency other than threads is used. This
includes greenlets, restoring a frame like in evalexception, and
potentially generators with the app_iter.
> Iwan Vosloo wrote:
>> Many web frameworks and ORM tools have the need to propagate data
>> depending on some or other context within which a request is dealt with.
>> Passing it all via parameters to every nook of your code is cumbersome.
>> A lot of the frameworks use a thread local context to solve this
>> problem. I'm assuming these are based on threading.local.
>> (See, for example:
>> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
>> Such usage assumes that one request is served per thread.
>> This is not necessarily the case. (Twisted would perhaps be an example,
>> but I have not checked how the twisted people deal with the issue.)
>
> The Spawning server
> (http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html) would
> indeed get things mixed up this way, as it uses greenlets to make (at
> least some) blocking calls async. So it would encounter this problem
> full-force.
With the latest version of Spawning
(http://pypi.python.org/pypi/Spawning/0.6) this is only true if
specifically configured to do so (by passing --threads=0 or including
num_threads = 0 in the ini file). In this case
Spawning monkey-patches the threadlocal module with a version that
stores things in greenlet-local storage. This makes Pylons
applications and other applications that use thread-local storage work
as long as the application does not do any blocking database operations.
However, by default Spawning now uses a threadpool to execute wsgi
applications, since the vast majority of wsgi applications probably
block. This makes it functionally identical to the Twisted server
which executes the actual wsgi application in a threadpool.
> To throw another wrench in things, with the Paste/WebError
> evalexception interactive exception handler, it restores this
> thread-local context so you can later execute expressions in the same
> context.
It seems to me that what is really needed here is an extension of wsgi
that specifies how to get, set, and list request local storage, and
for people to use that instead of the threadlocal module. Of course,
for threaded servers, they will just use the threadlocal module, but
for Spawning running in single-threaded cooperative mode it would use
a greenlet-local implementation, and for a hypothetical Twisted server
running a hypothetical asynchronous wsgi application it would just use
a random request id.
Donovan
I don't follow why you wouldn't just put that in the environ. (If
you need it to be carried back from the application, use mutable
objects in the environ.)
There seems to be something that I don't understand: why not just store
the values inside the WSGI environ dictionary?
It is a per request dictionary, so it is really what you want.
> [...]
Manlio Perillo
> At 02:12 PM 7/7/2008 -0700, Donovan Preston wrote:
>> It seems to me that what is really needed here is an extension of wsgi
>> that specifies how to get, set, and list request local storage, and
>> for people to use that instead of the threadlocal module.
>
> I don't follow why you wouldn't just put that in the environ. (If
> you need it to be carried back from the application, use mutable
> objects in the environ.)
Yes, the logical place to store it is in the environ, but this whole
thread is about having an api for doing request-local storage that
doesn't involve passing the request everywhere.
Here's what I am imagining:
There's just a module, called requestlocal or something. It has an API
just like threading.local(), except the implementation can be changed
by the wsgi server.
I personally don't like the idea of having magical context, but I
think this is a practicality versus purity issue. Obviously plenty of
people have a desire to have a place to store request-local data
without passing the environment everywhere. Using threading.local is a
good way to do that, unless the server is not using one thread per
request. Giving people an interface to write to that doesn't
specifically mention threads and is customizable by the wsgi server is
what I am suggesting.
Donovan
Using greenlets, there is always a current greenlet, so you can use this
for local storage.
A library function can check if there is an active greenlet, and use it
as data key; otherwise it will use the current thread id.
However this will not work if you have an asynchronous server that does
not make use of greenlets.
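A minimal sketch of such a library function, assuming the third-party greenlet package may or may not be installed. The function names and the module-level storage dict are made up for this example.

```python
import threading

try:
    from greenlet import getcurrent  # third-party; optional here
except ImportError:
    getcurrent = None

_storage = {}


def scope_key():
    """Return the current greenlet if greenlets are available, else the thread id."""
    if getcurrent is not None:
        return getcurrent()
    return threading.get_ident()


def local_data():
    """Return the dict of values belonging to the current scope."""
    return _storage.setdefault(scope_key(), {})


def discard_local_data():
    """Drop the current scope's values, e.g. when a request finishes."""
    _storage.pop(scope_key(), None)


local_data()["user"] = "manlio"
print(local_data()["user"])
```

Calling discard_local_data() at the end of each request matters in this sketch: thread ids can be reused by later threads, and greenlet objects used as keys would otherwise be kept alive by the dict.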
> [...]
Manlio Perillo
> Using greenlets, there is always a current greenlet, so you can use
> this for local storage.
>
> A library function can check if there is an active greenlet, and use
> it as data key; otherwise it will use the current thread id.
Yes, this is exactly what I did in the
wrap_threading_local_with_coro_local here:
http://donovanpreston.com:8888/eventlet/file/b6f9627e88df/eventlet/util.py
> However this will not work if you have an asynchronous server that
> does not make use of greenlets.
Exactly, which is why I am proposing just standardizing something that
does exactly what people use threading.local for, but whose
implementation is pluggable by the wsgi server.
Donovan
Ok.
>> However this will not work if you have an asynchronous server that
>> does not make use of greenlets.
>
> Exactly, which is why I am proposing just standardizing something that
> does exactly what people use threading.local for, but whose
> implementation is pluggable by the wsgi server.
>
But this will not be easy to implement, especially if it should go in a
separate module.
Maybe it's better to have something like:
wsgiorg.local_scope
a function that returns the current request id.
The function itself is not bound to the current request, so it can be
safely stored.
Maybe this would be easier to implement, I'm not sure.
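One possible reading of this proposal, sketched for a threaded server. The environ key wsgiorg.local_scope is the one suggested above; everything else (add_local_scope, RequestStore, the request id counter) is a guess at how the two sides could fit together, not an existing specification.

```python
import itertools
import threading

_ids = itertools.count(1)
_active = threading.local()


def local_scope():
    """Return an id for the request being handled right now, or None."""
    return getattr(_active, "request_id", None)


def add_local_scope(app):
    """Server-side wrapper for a threaded server (one request per thread)."""
    def wrapper(environ, start_response):
        _active.request_id = next(_ids)
        environ["wsgiorg.local_scope"] = local_scope
        try:
            return app(environ, start_response)
        finally:
            _active.request_id = None
    return wrapper


class RequestStore:
    """Library-side storage keyed by whatever the scope function returns."""

    def __init__(self):
        self._scope = None
        self._data = {}

    def bind(self, environ):
        # The function is not tied to one request, so it can be kept.
        self._scope = environ["wsgiorg.local_scope"]

    def values(self):
        return self._data.setdefault(self._scope(), {})

    def clear(self):
        self._data.pop(self._scope(), None)
```

A library only needs environ once, to pick up the function; afterwards any code can call store.values() without being handed the request. An asynchronous server would supply a different local_scope implementation under the same key.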
Manlio Perillo
Yes... and the practicality of simply storing things in the environ wins. :)
Don't get me wrong: I use "magical" contexts in my libraries, both
thread-local and otherwise. Indeed, I've got one that solves the
sort of problems you guys are talking about here, at least insofar as
being able to handle Twisted or greenlets' context-swapping needs.
But for stuff you could just put in a WSGI environ, it seems like
ludicrous overkill to me.
> Obviously plenty of
> people have a desire to have a place to store request-local data
> without passing the environment everywhere. Using threading.local is a
> good way to do that, unless the server is not using one thread per
> request. Giving people an interface to write to that doesn't
> specifically mention threads and is customizable by the wsgi server is
> what I am suggesting.
Er, and how do you propose people *access* that interface rather than
a specific implementation of it? Wouldn't we need to pass it in the
environ, thereby rendering the whole thing even more obviously moot? :)
> At 11:35 AM 7/8/2008 -0700, Donovan Preston wrote:
>> Obviously plenty of
>> people have a desire to have a place to store request-local data
>> without passing the environment everywhere. Using threading.local is a
>> good way to do that, unless the server is not using one thread per
>> request. Giving people an interface to write to that doesn't
>> specifically mention threads and is customizable by the wsgi server is
>> what I am suggesting.
>
> Er, and how do you propose people *access* that interface rather
> than a specific implementation of it? Wouldn't we need to pass it
> in the environ, thereby rendering the whole thing even more
> obviously moot? :)
You're right. A standard specific implementation is what I am
suggesting. Here, code should help:
## requestlocal.py

## use thread-local storage as the default
from threading import local

def set_local_implementation(imp):
    global local
    local = imp
If a wsgi server wants to implement request-local storage by using the
environ, it would call set_local_implementation with an imp function
that closes over the environ for each request.
Donovan
And what package does requestlocal.py live in?
Robert Brewer
fuma...@aminus.org
I can't decide what the question is here. You mean, how can a greenlet
request-local provider indicate that they are providing a way of getting
the current request? Or, how can a consumer get access, given that it
can live in any module, and the consumer presumably doesn't have an environ?
I imagine from what Donovan says that there would actually be one
module, requestlocal, and one implementation, and that implementation
would be awesome and support greenlets and threads, and whatever else
comes along (which luckily is not much else), and I guess maybe has a
middleware that would register the request on entry and deregister it on
exit, and consumers would do:
import requestlocal
def whatever():
    environ = requestlocal.get_request()
and we'd just all agree on this singular implementation, because I don't
see any way around that.
--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org