Please give your feedback, I want to know if I've explained things
properly and will use your questions / corrections to improve the
docs.
Cross-posted to paste-users and better-python. Would this also be
appropriate for other mailing lists?
http://groups.google.com/group/paste-users
http://groups.google.com/group/better-python
Today I want to describe a very small but EXTREMELY (yes, all caps)
useful utility from pasteob, but I'll have to start with the problem
description and a summary of its solutions I've encountered over the
years.
The problem is to make webapps decomposable, non-monolithic while
still passing data around. I'll explain this using an example of a
contact-form app. If we are implementing the entire thing as one big
function we would have something like this:
@webob_wrap
def cform_app(req):
    name = req.params.get('name', req.cookies.get('name'))
    email = ...
    subject = ...
    message = ...
    return_to = req.params.get('return_to') or req.referrer
    if message:
        send_message(name, .....)
        return Response("Thanks, go to %s" % return_to)
    form = form_tmpl % {'name': name, ....}
    return Response(form)
What happens is that we extract some data from request, and if we have
enough data to send a message, we assemble that message and send it.
Otherwise we prepopulate the form w/ that data and return the form.
The important thing is that the same set of data (not trivially
derived from the request) can be used for two different things.
And at some point we will decide we would rather have something like this:
@webob_wrap
def cform_app(req):
    name = req.params.get('name', req.cookies.get('name'))
    ...
    if message:
        return send_message_app
    else:
        return the_form_app
The problem, of course, is that now those apps have to extract all of
that request data again. How do we pass that data in?
Solution #1: Servlets
We can create a class to be instantiated once per request and
implement all those apps and subapps as methods, ending up with
something like this:
class CFormApp(object):
    def __init__(self, req):
        self.req = req
        self.name = ....
        self.email = ....
        ...

    def send_message_app(self):
        send_message(self.name, .....)
        return Response("Thanks, go to %s" % self.return_to)
That's not bad, but while we are now able to partially override
functionality in subclasses, the code is kinda ugly and not
decomposable. It gets worse with more complex apps and especially when
the data to be reused cannot be confined to a specific servlet.
Sometimes this data needs to be available to dispatch the request and
then the same data might be needed in the resulting app, which could
be a different servlet or, in fact, not a servlet at all.
Servlets are apparently The Java Way.
Ian mentioned he has webob.dec.wsgify.instantiate to make such classes
more easily exposed as regular WSGI apps, but I currently don't see it
in the webob repo.
http://blog.ianbicking.org/2009/05/22/webob-decorator/comment-page-1/#comment-109344
Solution #2: environ keys
As WSGI environ gets passed around anyway, we could store additional data there:
@webob_wrap
def cform_app(req):
    name = req.environ['cform.name'] = ...
    ...
This is ugly, it doesn't scale if you need to pass around more data
and the decomposition is possible but way too fragile.
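To make the fragility concrete, here is a minimal sketch that uses bare WSGI callables and the standard library's wsgiref helpers rather than webob. The `cform.name` key comes from the example above; `extract_app`, `call` and the query-string parsing are hypothetical stand-ins, not anything from pasteob or webob.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def extract_app(environ, start_response):
    # Upstream app: parse the query string once and stash the result.
    params = parse_qs(environ.get('QUERY_STRING', ''))
    environ['cform.name'] = params.get('name', [''])[0]
    return send_message_app(environ, start_response)


def send_message_app(environ, start_response):
    # Downstream app: relies on a key that somebody else must have set.
    name = environ['cform.name']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Thanks, %s' % name).encode('utf-8')]


def call(app, query):
    # Tiny harness (an assumption of this sketch): build a default
    # environ and collect the response body.
    environ = {'QUERY_STRING': query}
    setup_testing_defaults(environ)
    return b''.join(app(environ, lambda status, headers: None))
```

Calling `call(extract_app, 'name=Bob')` works, while `call(send_message_app, 'name=Bob')` raises KeyError because nothing set `cform.name` first.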
Solution #3: environ['webob.adhoc_attrs']
See http://bitbucket.org/ianb/webob/src/tip/webob/request.py#cl-1025
@wsgify
def cform_app(req):
    req.cform_name = ...
    if req.message:
        return send_message_app
    ....

@wsgify
def send_message_app(req):
    send_message(req.cform_name, ..)
This is an improvement over solution #2 because it's certainly tidier.
Accessing req attrs sure beats extracting data from environ, but most
of the issues are still there: you share the namespace for this
additional data with every other app that uses this system, you should
worry about collisions with standard attrs, and the decomposition is
still weak -- unless the request passed through the app that sets
certain attrs, those attrs will not be present.
Solution #4: Request subclasses
We could create a Request subclass like this:
class CFormReq(BaseRequest):
    def __init__(self, environ):
        super(CFormReq, self).__init__(environ)
        self.name = self.params.get('name', self.cookies.get('name'))
        ....
And then use it somewhat like this:
@wsgify(RequestClass=CFormReq)
def cform_app(req):
    if req.message:
        return send_message_app

@wsgify(RequestClass=CFormReq)
def send_message_app(req):
    send_message(req.name, ..)
This is much better as far as namespaces go, but still seems like a
bit too much for such a simple need. Also, the data is extracted every
time the request is instantiated, which is inefficient. And sometimes
we might want to edit some of that data after it was extracted and
with this solution those changes will not persist once we leave this
specific app -- another app will recreate the request object and the
changes will not be there (unless we subclass Request and not
BaseRequest, which is not optimal either).
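The two drawbacks (repeated extraction and lost edits) can be shown with a small sketch. `MiniRequest` is a hypothetical stand-in for a request base class that keeps no state in environ; it is not webob's BaseRequest, and the `demo.name` key is made up for the illustration.

```python
class MiniRequest(object):
    # Hypothetical stand-in for a request class; not webob's BaseRequest.
    def __init__(self, environ):
        self.environ = environ


class CFormReq(MiniRequest):
    extractions = 0  # counts how often the data is extracted

    def __init__(self, environ):
        super(CFormReq, self).__init__(environ)
        CFormReq.extractions += 1
        self.name = environ.get('demo.name', '')


environ = {'demo.name': 'Bob'}

first = CFormReq(environ)    # e.g. created by the dispatching app
first.name = 'Robert'        # an edit made on that instance...
second = CFormReq(environ)   # ...is gone when the next app wraps environ
```

After this runs, the data has been extracted twice and `second.name` is back to the original value.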
Solution #5: Passing data around as arguments
This is something that does not work w/ my @webob_wrap, but does w/
Ian's @wsgify (same applies to request cls overriding from previous
solution as well).
@wsgify
def cform_app(req):
    name = ...
    if req.message:
        return send_message_app(req, name, ...)
    ....

@wsgify
def send_message_app(req, name, ...):
    send_message(name, ..)
While this seems like a nice solution when you only need to pass an
argument or two, the decomposability is compromised (or outright gone)
and this approach doesn't scale -- at some point you will be passing
around so many arguments you'll start forgetting to pass some, and to
address that you will probably create a dedicated class to hold all that
data and pass around an instance of that class instead.
This still has the problem we had with solution #3 -- that class has
to be instantiated at some point and then passed around. The apps
expecting to get that object are not really pure WSGI anymore either
-- if they don't get that argument, they fail (this is a problem w/
most of the solutions above).
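A rough illustration of the "not really pure WSGI anymore" point, written against the bare WSGI calling convention rather than @wsgify (whose exact behaviour with extra arguments is not reproduced here). The function names are hypothetical.

```python
def send_message_app(environ, start_response, name):
    # 'name' is the extra argument a cooperating caller must supply.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Sent for %s' % name).encode('utf-8')]


def plain_wsgi_caller(app, environ):
    # Calls the app the way any WSGI server or middleware would:
    # with exactly (environ, start_response).
    return b''.join(app(environ, lambda status, headers: None))


try:
    plain_wsgi_caller(send_message_app, {})
    outcome = 'ok'
except TypeError:
    # The caller knows nothing about 'name', so the call fails.
    outcome = 'TypeError'
```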
Other solutions
There are probably a few more approaches I didn't mention. Surely
there are plenty of framework-specific ones and some obscure ones (for
example, this can be accomplished w/ Contextual), but unless I
reinvented something that already exists, nothing comes close to the
convenience and simplicity of pasteob.ReqAddon.
Let's summarize what the ideal solution should be like:
* Should be namespace-capable
* Should require no additional wrappers -- usable in any WSGI app, anywhere
* Should only extract data from request once
* Changes to extracted data should be persistent
* Should not create order dependencies in user code (unlike argument
passing for ex.)
* As little code as possible
* Should scale to as many fields as needed
* Should be capable of extracting the data lazily (on first access)
* Ideally, the namespaces should be programmatically creatable, so
that you could, for example, reuse most of the code to handle data
from TWO contact forms on the page.
[to be continued]
--
Best Regards,
Sergey Schetinin
http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter
I believe it is possible and claim that it is far easier to do than it
appears to be. This expectation of complexity, I believe, is also one
of the primary reasons frameworks are so popular -- the only way to
claim they save time is to claim that the alternative is even more
complex, which one discovers not to be true.
So I want to stress that I am not advocating for more complexity in
the name of better decomposition. I'm just saying simple solutions
usually work better than complex ones. Also, approaching the problems
one at a time leads to better solutions as well. At some point one
might come up with a more general one, but the correct way to go is to
start with the specific cases and generalize once that comes
naturally.
As to how separate the apps are anyway, sometimes they are not that
*independent*, but still need to be *separable*. For example, let's
say we have apps A and B that depend on each other. At some point we
(or someone reusing our code) discover a need for another layer of
dispatching between them. Now, you cannot expect that dispatching
layer to keep passing your arguments around, so some of the solutions
will be able to pass the data across this layer and some will not. I
hope you agree this is a reasonable use-case.
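As a sketch of that use-case: a dispatching layer written by someone else forwards only (environ, start_response), so whatever travels with environ crosses it and extra positional arguments do not. `make_dispatcher`, `app_a`, `app_b` and the `demo.name` key are hypothetical names for this illustration.

```python
def make_dispatcher(routes):
    # Hypothetical third-party dispatching layer: it knows nothing about
    # our data and forwards only (environ, start_response).
    def dispatcher(environ, start_response):
        app = routes[environ.get('PATH_INFO', '/')]
        return app(environ, start_response)
    return dispatcher


def app_b(environ, start_response):
    # Anything attached to environ survives the trip through the layer.
    name = environ.get('demo.name', 'nobody')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [name.encode('utf-8')]


def app_a(environ, start_response):
    environ['demo.name'] = 'Bob'
    inner = make_dispatcher({'/send': app_b})
    return inner(environ, start_response)


body = b''.join(app_a({'PATH_INFO': '/send'}, lambda status, headers: None))
```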
Another example: let's say we decide to add a spellchecker to the form, so
that submissions with incorrect grammar can't go through (this is a
bit of a stretch, but I don't want to come up with a completely new
example). We still want to keep our old cform_app, but add another
one, checked_cform_app that goes something like this:
@wsgify
def checked_cform_app(req):
    name = ...
    message = ...
    if message and spellcheck(message):
        return send_message_app(req, name, ...)
send_message_app is still "bound", but you see the problem -- we are
forced to have a single point of entry or we'll have to repeat
ourselves or invent something to extract things just once and we're
back where we started.
Also consider a case when we want to expose send_message_app directly,
say as a way of accepting form submissions as a web-service, why not
just let it be functional without the additional args?
> @wsgify does support this specific use case at the scales I'm confident it
> is valid. At other scales... well, I suspect we need new and bigger
> metaphors, that could be built on top of WSGI, but wouldn't be WSGI
> themselves.
>
>>
>> Other solutions
>>
>> There are probably a few more approaches I didn't mention, surely
>> there are plenty framework-specific ones, some obscure ones (for
>> example this can be accomplished w/ Contextual), but unless I
>> reinvented something that already exists, nothing comes close to
>> convenience and simplicity of pasteob.ReqAddon.
>>
>
> Chris McDonough and some other Repoze people have been playing around with a
> framework/idea called Marco, that includes something similar to Contextual
> as well as a small number of other pieces. You might want to copy them on
> these emails too, as the basic motivations are the same.
OK, I'll CC this to Chris as well.
Thanks for the suggestion, I'll have a look at it. (I'm assuming this
is the main Marco repo: http://bitbucket.org/chrism/marco/)
Also, Crosscuts by PJE is kinda related: http://svn.eby-sarna.com/Crosscuts/
ReqAddon is way less magic though.
A lot of that linking goes on in the template, so depending on how one
does templating, this can be less of a problem. For the cases when the
app needs more knowledge about where some other apps are located I
usually solve it by having the apps implemented as classes with this
kind of configuration passed in at creation time.
For example I have a licensing web-service that needs to query an
order-tracking webservice -- what I have is a LicensingApp that is
created as LicensingApp(order_tracking_ws_url). There's no implicit
configuration and it works great. And if the app needs not to query
but *link* somewhere, everything is exactly the same -- just pass some
urls around. It's also nice because once one needs to scale beyond one
server everything is ready for it.
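A minimal sketch of that pattern as a bare WSGI class. The real LicensingApp is not shown in this thread, so the `order_url` helper, the URL layout and the example host are all assumptions made for illustration.

```python
class LicensingApp(object):
    # Sketch only: the real LicensingApp is not shown in this thread.
    def __init__(self, order_tracking_ws_url):
        # Explicit configuration, supplied once at creation time.
        self.order_tracking_ws_url = order_tracking_ws_url

    def order_url(self, order_id):
        # Hypothetical URL layout for the order-tracking service.
        base = self.order_tracking_ws_url.rstrip('/')
        return '%s/orders/%s' % (base, order_id)

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        order_id = environ.get('QUERY_STRING', '')
        return [self.order_url(order_id).encode('utf-8')]


app = LicensingApp('http://orders.example.com/ws/')
body = b''.join(app({'QUERY_STRING': '42'}, lambda status, headers: None))
```

Pointing the app at a different server is then a one-argument change at creation time.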
When the links can have different formats, the most straightforward
solution works just fine -- define an interface (just document, not
declare w/ ZCA or something) to generate those links, implement that
interface and pass that implementation to the app that needs it. Given
it's well-defined, this "replaceability" will end up being used more
often than not.
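For instance, a documented (not declared) interface could be as small as "an object with link_to(name) returning a URL". The class and method names below are hypothetical and only illustrate passing an implementation to the app that needs it.

```python
class PrefixLinks(object):
    # One implementation of the documented interface: link_to(name) -> url.
    def __init__(self, prefix):
        self.prefix = prefix.rstrip('/')

    def link_to(self, name):
        return '%s/%s' % (self.prefix, name)


class QueryLinks(object):
    # A different link format behind the same interface.
    def __init__(self, base):
        self.base = base

    def link_to(self, name):
        return '%s?page=%s' % (self.base, name)


class ThanksPage(object):
    # Hypothetical app that only needs "something with link_to()".
    def __init__(self, links):
        self.links = links

    def render(self):
        return 'Thanks, go to %s' % self.links.link_to('home')
```

`ThanksPage(PrefixLinks('/site/'))` and `ThanksPage(QueryLinks('/index'))` render the same page with different link formats, without ThanksPage knowing which one it got.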
Sometimes the same thing needs to be passed to a lot of instances and
for cases like that I've used Contextual a couple times. There are a
few nice things about it for something like this. First is that you
get interface checking pretty much for free. Second is that it still
can be customizable on a per-instance basis. To do this, one should
not access the service directly, but still accept arguments to
constructors and use that. But the constructor can define the default
value of the argument to be *the service itself*, that is: the class
delegate. That way, if you don't pass any custom objects, the service
will end up being acquired from context when used, but you still can
pass specific instances during configuration. And when you do that,
you don't even need the contextual state management.
One specific case that I use this for is "Sendmail" service -- the
apps just want to send some mail and don't care if it goes through
SMTP, /bin/sendmail or GAE APIs. So I configure it only once and it
just works for everything in that process. I would not recommend using
this approach for more app-specific cases.
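The "default argument is the service itself" idea can be approximated without Contextual. The sketch below is a hand-rolled stand-in and does NOT use Contextual's real API; `Sendmail.install`, `ListMailer` and `NotifyApp` are invented for the illustration.

```python
class Sendmail(object):
    # Hand-rolled stand-in for a Contextual service, NOT its real API:
    # the class itself forwards send() to whichever instance is current.
    _current = None

    @classmethod
    def install(cls, impl):
        cls._current = impl

    @classmethod
    def send(cls, to, text):
        return cls._current.send(to, text)


class ListMailer(object):
    # Test-friendly implementation that just records messages.
    def __init__(self):
        self.sent = []

    def send(self, to, text):
        self.sent.append((to, text))


class NotifyApp(object):
    # Default argument is the service class itself (the "class delegate").
    def __init__(self, sendmail=Sendmail):
        self.sendmail = sendmail

    def notify(self, to):
        self.sendmail.send(to, 'hello')


process_wide = ListMailer()
Sendmail.install(process_wide)
NotifyApp().notify('a@example.com')          # resolved via the service

custom = ListMailer()
NotifyApp(custom).notify('b@example.com')    # explicit per-instance override
```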
> It would improve testing though, as you can test the components more easily
> with less mocking. Chris has noted that a large portion of what people use
> the ZCA for is this kind of testing fixture.
>
>>
>> Another example: lets say we decide add a spellchecker to the form, so
>> that submissions with incorrect grammar can't go through (this is a
>> bit of a stretch, but I don't want to come up with a completely new
>> example). We still want to keep our old cform_app, but add another
>> one, checked_cform_app that goes something like this:
>>
>> @wsgify
>> def checked_cform_app(req):
>> name = ...
>> message = ...
>> if message and spellcheck(message):
>> return send_message_app(req, name, ...)
>>
>> send_message_app is still "bound", but you see the problem -- we are
>> forced to have a single point of entry or we'll have to repeat
>> ourselves or invent something to extract things just once and we're
>> back where we started.
>
> OK, so in this case you are thinking about decoration. And maybe...
> splitting code in the middle. Like, separate out how name and message are
> extracted, and then putting some code between that and how name and message
> are used? I don't think I'm really following the purpose now.
Yes, what you've just said, and most things mentioned before as well.
The purpose is having an elegant solution for extracting some data
that may be used in multiple places. Being able to persistently edit
that data. Essentially having something as convenient and unfussy as
local namespace, which is not *that* local.
What ReqAddon does is attach a namespace to a request (actually to
environ). Here's what this means:
>>> from pasteob import *
>>> req = Request.blank('/')
>>> ReqAddon(req) is ReqAddon(req)
True
It works exactly the same when called with environ:
>>> ReqAddon(req) is ReqAddon(req.environ)
True
See, no matter how many times we call it, we get the same instance.
This fulfills some of our requirements:
* Should require no additional wrappers -- usable in any WSGI app, anywhere
* Changes to extracted data should be persistent
We could just set attributes on those instances, but that would make
it identical to "webob.adhoc_attrs". That is NOT the intended way of
using it. Instead you should subclass ReqAddon to get your own
namespace.
>>> class TestAddon1(ReqAddon):
...     pass
...
>>> TestAddon1(req) is TestAddon1(req)
True
>>> TestAddon1(req) is ReqAddon(req)
False
This gets us:
* Should be namespace-capable
Now, let's do some data extraction. By default ReqAddon gets .env and
.req attributes storing the environment it's bound to and the
corresponding BaseRequest. If you don't want the request object, you
can use EnvAddon baseclass instead. To extract some data from request
when the addon is first created, redefine the init method. (Not
__init__ simply because this way you are not required to accept
environ / request argument. It is better this way for many reasons,
mostly because it doesn't make things appear something they are not.)
class CFormData(ReqAddon):
    def init(self):
        self.req.charset = 'utf8'
        req = self.req
        p = req.params
        self.name = p.get('name', req.cookies.get('name'))
        ...

@webob_wrap
def cform_app(req):
    if CFormData(req).message:
        return send_message_app
    else:
        return the_form_app

@webob_wrap
def send_message_app(req):
    form = CFormData(req)
    send_message(form.name, ..)
    ...
Or even something like this:
@webob_wrap
def send_message_app(req):
    CFormData(req).send_to(...)
    ...
Note that init gets called only once.
>>> class PrintAddon(ReqAddon):
...     def init(self):
...         print 'created', self
...
>>> PrintAddon(req) is PrintAddon(req)
created <PrintAddon object at ...>
True
Addons persist even if you don't keep a reference:
>>> pa = PrintAddon(req); del pa
This wins us:
* Should only extract data from request once
* Should not create order dependencies in user code (unlike argument
passing for ex.)
* Should scale to as many fields as needed
You can also define methods and descriptors to extract more data on
demand, cache it in the object etc.
* Should be capable of extracting the data lazily (on first access)
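One way lazy extraction might look in current Python, using functools.cached_property (Python 3.8+) as the descriptor. The class below is a stand-in that takes environ directly so it runs without pasteob; the `demo.message` key and the `parses` counter are made up for the example.

```python
from functools import cached_property


class CFormData(object):
    # Stand-in for a ReqAddon subclass; takes environ directly so the
    # sketch runs without pasteob.
    parses = 0

    def __init__(self, environ):
        self.env = environ

    @cached_property
    def message(self):
        # Runs on first access only; the result is stored on the instance.
        CFormData.parses += 1
        return self.env.get('demo.message', '').strip()


data = CFormData({'demo.message': '  hi there '})
before = CFormData.parses   # nothing extracted yet
first = data.message        # triggers the extraction
second = data.message       # served from the cached value
```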
But that is not all, you can pass additional arguments to the subclasses:
>>> ReqAddon(req, 1)
Traceback (most recent call last):
...
TypeError: init() takes exactly 1 argument (2 given)
>>> class TestAddon2(ReqAddon):
...     def init(self, arg):
...         self.arg = arg
...
>>> TestAddon2(req, 1)
<TestAddon2 object at ...>
>>> TestAddon2(req, 1).arg
1
>>> TestAddon2(req, 1) is TestAddon2(req, 1)
True
>>> TestAddon2(req, 1) is TestAddon2(req, 2)
False
Changes are persistent:
>>> TestAddon2(req, 1).arg = 100
>>> TestAddon2(req, 1).arg
100
(There are quirks with default parameter values, check out the doctest
for more on that.)
With this we also have:
* Ideally, the namespaces should be programmatically creatable, so
that you could, for example, reuse most of the code to handle data
from TWO contact forms on the page.
All that is left is:
* As little code as possible
I don't think you can claim the user code could be any shorter, and
neither could the implementation. See for
yourself: http://bitbucket.org/mlk/pasteob/src/tip/pasteob/__init__.py#cl-400
Other
http://bitbucket.org/mlk/pasteob/src/tip/tests/ReqAddon.txt doctest
http://pypi.python.org/pypi/AddOns
AddOns is the package that gave me the idea. I highly recommend
checking it out, but note that it does not work as smoothly with req /
environ.
Also note that ReqAddon instances create a circular reference to the
environ. If you really don't want this, override __init__ and don't
store references to env/req in attrs.
4. maybe EnvAddon is unnecessary?
5. is it better to handle default arguments or should the docs just
say "don't do it"?
So now I think maybe it should go like this:
class ReqAddon(object):
    class __metaclass__(type):
        def __call__(cls, env, *args):
            if type(env) is not dict:
                env = env.environ
            key = (cls, args)
            if key in env:
                r = env[key]
                if r._env_id == id(env):
                    return r
                # this is a copied environ, we want a new addon instance
                del env[key]
            req = Request(env)
            inst = type.__call__(cls, req, *args)
            inst._env_id = id(env)
            return env.setdefault(key, inst)

    def __init__(self, req):
        pass
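The proposal above relies on Python 2's nested __metaclass__. A rough Python 3 rendering of the same idea might look like the sketch below. It assumes a tiny stand-in `Request` class instead of webob's, and `FormData` with its `prefix` argument is a hypothetical addon added to show per-argument namespaces (for example, two forms on one page).

```python
class Request(object):
    # Minimal stand-in for webob's request class.
    def __init__(self, environ):
        self.environ = environ


class AddonMeta(type):
    def __call__(cls, env, *args):
        if not isinstance(env, dict):
            env = env.environ          # accept a request as well as an environ
        key = (cls, args)
        inst = env.get(key)
        if inst is not None and inst._env_id == id(env):
            return inst
        # Either no addon yet, or the key came along in a copied environ.
        inst = super().__call__(Request(env), *args)
        inst._env_id = id(env)
        env[key] = inst
        return inst


class ReqAddon(metaclass=AddonMeta):
    def __init__(self, req):
        pass


class FormData(ReqAddon):
    # Hypothetical addon; 'prefix' selects which form's fields to read.
    def __init__(self, req, prefix):
        self.name = req.environ.get(prefix + '.name', '')


environ = {'form1.name': 'Bob', 'form2.name': 'Alice'}
a = FormData(environ, 'form1')
b = FormData(Request(environ), 'form1')   # same environ, same addon
c = FormData(environ, 'form2')            # different args, different addon
clone = dict(environ)                     # copied environ gets its own addon
d = FormData(clone, 'form1')
```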