pasteob.ReqAddon

Sergey Schetinin

Mar 12, 2010, 12:47:34 AM

to Paste Users, better-python

I'll do a series of emails describing some parts of pasteob that I
believe you should know about. I'll later use these to create some
docs / a website for pasteob.

Please give your feedback, I want to know if I've explained things
properly and will use your questions / corrections to improve the
docs.

Cross-posted to paste-users and better-python. Would this also be
appropriate for other mailing lists?
http://groups.google.com/group/paste-users
http://groups.google.com/group/better-python

Today I want to describe a very small but EXTREMELY (yes, all caps)
useful utility from pasteob, but I'll have to start with the problem
description and a summary of its solutions I've encountered over the
years.

The problem is to make webapps decomposable, non-monolithic while
still passing data around. I'll explain this using an example of a
contact-form app. If we are implementing the entire thing as one big
function we would have something like this:

@webob_wrap
def cform_app(req):
    name = req.params.get('name', req.cookies.get('name'))
    email = ...
    subject = ...
    message = ...
    return_to = req.params.get('return_to') or req.referrer
    if message:
        send_message(name, .....)
        return Response("Thanks, go to %s" % return_to)
    form = form_tmpl % {'name': name, ....}
    return Response(form)

What happens is that we extract some data from request, and if we have
enough data to send a message, we assemble that message and send it.
Otherwise we prepopulate the form w/ that data and return the form.
The important thing is that the same set of data (not trivially
derived from the request) can be used for two different things.

And at some point we will decide we would rather have something like this:

@webob_wrap
def cform_app(req):
    name = req.params.get('name', req.cookies.get('name'))
    ...
    if message:
        return send_message_app
    else:
        return the_form_app

The problem, of course, is that now those apps have to extract all of
that request data again. How do we pass that data in?

Solution #1: Servlets

We can create a class to be instantiated once per request and
implement all those apps and subapps as methods, ending up with
something like this:

class CFormApp(object):
    def __init__(self, req):
        self.req = req
        self.name = ....
        self.email = ....

    ...

    def send_message_app(self):
        send_message(self.name, .....)
        return Response("Thanks, go to %s" % self.return_to)

That's not bad, but while we are now able to partially override
functionality in subclasses, the code is kinda ugly and not
decomposable. It gets worse with more complex apps and especially when
the data to be reused cannot be confined to a specific servlet.
Sometimes this data needs to be available to dispatch the request and
then the same data might be needed in the resulting app, which could
be a different servlet or, in fact, not a servlet at all.

Servlets are apparently The Java Way.

Ian mentioned he has webob.dec.wsgify.instantiate to make such classes
more easily exposed as regular WSGI apps, but I currently don't see it
in the webob repo.
http://blog.ianbicking.org/2009/05/22/webob-decorator/comment-page-1/#comment-109344

Solution #2: environ keys

As WSGI environ gets passed around anyway, we could store additional data there:

@webob_wrap
def cform_app(req):
    name = req.environ['cform.name'] = ...
    ...

This is ugly, it doesn't scale if you need to pass around more data
and the decomposition is possible but way too fragile.
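
A minimal sketch of why this is fragile, assuming plain WSGI callables on Python 3 with no WebOb. The `call` helper and the placeholder value 'Alice' are illustrative only and not part of pasteob or WebOb. The downstream app works only when the upstream app has already populated the key:

```python
def cform_app(environ, start_response):
    # The dispatching app extracts the data and stashes it in environ.
    environ['cform.name'] = 'Alice'  # placeholder for the real extraction
    return send_message_app(environ, start_response)


def send_message_app(environ, start_response):
    # Works only if cform_app ran first; otherwise this raises KeyError.
    name = environ['cform.name']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Thanks, %s' % name).encode('utf-8')]


def call(app, environ):
    """Illustrative helper: call a WSGI app and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body
```

Calling `call(cform_app, {})` succeeds, while `call(send_message_app, {})` raises KeyError, so send_message_app cannot be exposed on its own.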

Solution #3: environ['webob.adhoc_attrs']

See http://bitbucket.org/ianb/webob/src/tip/webob/request.py#cl-1025

@wsgify
def cform_app(req):
    req.cform_name = ...
    if req.message:
        return send_message_app
    ....

@wsgify
def send_message_app(req):
    send_message(req.cform_name, ..)

This is an improvement over solution #2 because it's certainly tidier.
Accessing req attrs sure beats extracting data from environ, but most
of the issues are still there: you share the namespace for this
additional data with every other app that uses this system, you should
worry about collisions with standard attrs, and the decomposition is
still weak -- unless the request passed through the app that sets
certain attrs, those attrs will not be present.

Solution #4: Request subclasses

We could create a Request subclass like this:

class CFormReq(BaseRequest):
    def __init__(self, environ):
        super(CFormReq, self).__init__(environ)
        self.name = self.params.get('name', self.cookies.get('name'))
        ....

And then use it somewhat like this:

@wsgify(RequestClass=CFormReq)
def cform_app(req):
    if req.message:
        return send_message_app

@wsgify(RequestClass=CFormReq)
def send_message_app(req):
    send_message(req.name, ..)

This is much better as far as namespaces go, but still seems like a
bit too much for such a simple need. Also, the data is extracted every
time the request is instantiated, which is inefficient. And sometimes
we might want to edit some of that data after it was extracted and
with this solution those changes will not persist once we leave this
specific app -- another app will recreate the request object and the
changes will not be there (unless we subclass Request and not
BaseRequest, which is not optimal either).
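
The persistence issue can be illustrated with a small sketch on Python 3. `FakeRequest` is a hypothetical stand-in for a request class that keeps its attributes on the instance (it is not WebOb's BaseRequest), and reading the name from QUERY_STRING is a simplification:

```python
class FakeRequest(object):
    """Hypothetical stand-in for a request class that wraps a WSGI environ."""
    def __init__(self, environ):
        self.environ = environ


class CFormReq(FakeRequest):
    def __init__(self, environ):
        super(CFormReq, self).__init__(environ)
        # Extraction runs again on every instantiation.
        self.name = environ.get('QUERY_STRING', '')


environ = {'QUERY_STRING': 'alice'}

first = CFormReq(environ)
first.name = 'edited'        # an edit made inside one app

second = CFormReq(environ)   # another app wraps the same environ
```

Here `first.name` is 'edited' but `second.name` is back to 'alice': the edit lived on the request instance, not on the environ that both apps share.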

Solution #5: Passing data around as arguments

This is something that does not work w/ my @webob_wrap, but does w/
Ian's @wsgify (same applies to request cls overriding from previous
solution as well).

@wsgify
def cform_app(req):
    name = ...
    if req.message:
        return send_message_app(req, name, ...)
    ....

@wsgify
def send_message_app(req, name, ...):
    send_message(name, ..)

While this seems like a nice solution when you only need to pass an
argument or two, the decomposability is compromised (or outright gone)
and this approach doesn't scale -- at some point you will be passing
around so many arguments you'll start forgetting to pass some and to
address that will probably create a dedicated class to hold all that
data and pass around an instance of that class instead.

This still has the problem we had with solution #3 -- that class has
to be instantiated at some point and then passed around. The apps
expecting to get that object are not really pure WSGI anymore either
-- if they don't get that argument, they fail (this is a problem w/
most of the solutions above).

Ian Bicking

Mar 12, 2010, 2:40:08 PM

to Sergey Schetinin, Paste Users, better-python

On Thu, Mar 11, 2010 at 11:47 PM, Sergey Schetinin <mal...@gmail.com> wrote:

> Servlets are apparently The Java Way.
>
> Ian mentioned he has webob.dec.wsgify.instantiate to make such classes
> more easily exposed as regular WSGI apps, but I currently don't see it
> in the webob repo.
> http://blog.ianbicking.org/2009/05/22/webob-decorator/comment-page-1/#comment-109344

I finished it up on a plane ride, then found the result so complicated compared to the problem it was solving that I just gave up on it.

> Solution #5: Passing data around as arguments
>
> This is something that does not work w/ my @webob_wrap, but does w/
> Ian's @wsgify (same applies to request cls overriding from previous
> solution as well).
>
> @wsgify
> def cform_app(req):
>     name = ...
>     if req.message:
>         return send_message_app(req, name, ...)
>     ....
>
> @wsgify
> def send_message_app(req, name, ...):
>     send_message(name, ..)
>
> While this seems like a nice solution when you only need to pass an
> argument or two, the decomposability is compromised (or outright gone)
> and this approach doesn't scale -- at some point you will be passing
> around so many arguments you'll start forgetting to pass some and to
> address that will probably create a dedicated class to hold all that
> data and pass around an instance of that class instead.
>
> This still has the problem we had with solution #3 -- that class has
> to be instantiated at some point and then passed around. The apps
> expecting to get that object are not really pure WSGI anymore either
> -- if they don't get that argument, they fail (this is a problem w/
> most of the solutions above).

Well, I'm a little unsure if you can really consider the apps that separate anyway. These shared variables (name etc) are a kind of binding, at which point I'm not sure the WSGI abstraction will make them any less bound. I'm not confident that unbound apps at this level are possible, except perhaps with fancy composition frameworks like the Zope Component Architecture (which I think was created with just these kinds of use cases in mind, though I don't know if it ever was really *used* in this way). With the ZCA an "integrator" would be writing ZCML to wire these pieces together, and could manage some of the mismatches at that point.

@wsgify does support this specific use case at the scales I'm confident it is valid. At other scales... well, I suspect we need new and bigger metaphors, that could be built on top of WSGI, but wouldn't be WSGI themselves.

> Other solutions
>
> There are probably a few more approaches I didn't mention, surely
> there are plenty framework-specific ones, some obscure ones (for
> example this can be accomplished w/ Contextual), but unless I
> reinvented something that already exists, nothing comes close to
> convenience and simplicity of pasteob.ReqAddon.

Chris McDonough and some other Repoze people have been playing around with a framework/idea called Marco, that includes something similar to Contextual as well as a small number of other pieces. You might want to copy them on these emails too, as the basic motivations are the same.

--
Ian Bicking | http://blog.ianbicking.org | http://twitter.com/ianbicking

Sergey Schetinin

Mar 12, 2010, 5:23:37 PM

to Ian Bicking, Paste Users, better-python, Chris McDonough

I believe it is possible and claim that it is far easier to do than
it appears to be. This expectation of complexity, I believe, is also one
of the primary reasons frameworks are that popular -- the only way to
claim they save time is to claim that the alternative is even more
complex, which one discovers to be not true.

So I want to stress that I am not advocating for more complexity in
the name of better decomposition. I'm just saying simple solutions
usually work better than complex ones. Also, approaching the problems
one at a time leads to better solutions as well. At some point one
might come up with a more general one, but the correct way to go is to
start with the specific cases and generalize once that comes
naturally.

As to how separate the apps are anyway, sometimes they are not that
*independent*, but still need to be *separable*. For example, let's
say we have apps A and B that depend on each other. At some point we
(or someone reusing our code) discovers a need for another layer of
dispatching between them. Now, you cannot expect that dispatching
layer to keep passing your arguments around, so some of the solutions
will be able to pass the data across this layer and some will not. I
hope you agree this is a reasonable use-case.

Another example: let's say we decide to add a spellchecker to the form, so
that submissions with incorrect grammar can't go through (this is a
bit of a stretch, but I don't want to come up with a completely new
example). We still want to keep our old cform_app, but add another
one, checked_cform_app that goes something like this:

@wsgify
def checked_cform_app(req):
    name = ...
    message = ...
    if message and spellcheck(message):
        return send_message_app(req, name, ...)

send_message_app is still "bound", but you see the problem -- we are
forced to have a single point of entry or we'll have to repeat
ourselves or invent something to extract things just once and we're
back where we started.

Also consider a case when we want to expose send_message_app directly,
say as a way of accepting form submissions as a web-service, why not
just let it be functional without the additional args?

> @wsgify does support this specific use case at the scales I'm confident it
> is valid. At other scales... well, I suspect we need new and bigger
> metaphors, that could be built on top of WSGI, but wouldn't be WSGI
> themselves.
>
>>
>> Other solutions
>>
>> There are probably a few more approaches I didn't mention, surely
>> there are plenty framework-specific ones, some obscure ones (for
>> example this can be accomplished w/ Contextual), but unless I
>> reinvented something that already exists, nothing comes close to
>> convenience and simplicity of pasteob.ReqAddon.
>>
>
> Chris McDonough and some other Repoze people have been playing around with a
> framework/idea called Marco, that includes something similar to Contextual
> as well as a small number of other pieces. You might want to copy them on
> these emails too, as the basic motivations are the same.

OK, I'll CC this to Chris as well.

Thanks for the suggestion, I'll have a look at it. (I'm assuming this
is the main Marco repo: http://bitbucket.org/chrism/marco/)

Also, Crosscuts by PJE is kinda related: http://svn.eby-sarna.com/Crosscuts/

ReqAddon is way less magic though.

Ian Bicking

Mar 12, 2010, 6:14:38 PM

to Sergey Schetinin, Paste Users, better-python, Chris McDonough

Sure... I can't really feel confident one way or the other until I see it in action. You should write a CMS to prove the point ;)

> So I want to stress that I am not advocating for more complexity in
> the name of better decomposition. I'm just saying simple solutions
> usually work better than complex ones. Also, approaching the problems
> one at a time leads to better solutions as well. At some point one
> might come up with a more general one, but the correct way to go is to
> start with the specific cases and generalize once that comes
> naturally.
>
> As to how separate the apps are anyway, sometimes they are not that
> *independent*, but still need to be *separable*. For example, let's
> say we have apps A and B that depend on each other. At some point we
> (or someone reusing our code) discovers a need for another layer of
> dispatching between them. Now, you cannot expect that dispatching
> layer to keep passing your arguments around, so some of the solutions
> will be able to pass the data across this layer and some will not. I
> hope you agree this is a reasonable use-case.

I'm not sure that dispatching is a good enough justification. Changing dispatching seems generally hard, because everything tends to link to everything else. Even if you use link generation, you have to generate the links based on *something* -- substituting controller names and variables for URLs doesn't feel very different to me.

It would improve testing though, as you can test the components more easily with less mocking. Chris has noted that a large portion of what people use the ZCA for is this kind of testing fixture.

> Another example: let's say we decide to add a spellchecker to the form, so
> that submissions with incorrect grammar can't go through (this is a
> bit of a stretch, but I don't want to come up with a completely new
> example). We still want to keep our old cform_app, but add another
> one, checked_cform_app that goes something like this:
>
> @wsgify
> def checked_cform_app(req):
>     name = ...
>     message = ...
>     if message and spellcheck(message):
>         return send_message_app(req, name, ...)
>
> send_message_app is still "bound", but you see the problem -- we are
> forced to have a single point of entry or we'll have to repeat
> ourselves or invent something to extract things just once and we're
> back where we started.

OK, so in this case you are thinking about decoration. And maybe... splitting code in the middle. Like, separate out how name and message are extracted, and then putting some code between that and how name and message are used? I don't think I'm really following the purpose now.

> Also consider a case when we want to expose send_message_app directly,
> say as a way of accepting form submissions as a web-service, why not
> just let it be functional without the additional args?

[...]

> Thanks for the suggestion, I'll have a look at it. (I'm assuming this
> is the main Marco repo: http://bitbucket.org/chrism/marco/)

Yes. I'd think of it as an exploration. I think the three main parts are:

* Events

* Context (without forcing everything into the request context, but otherwise working very similarly)

* Application configuration (more along the lines of stuff like dispatch, not deployment configuration)

Sergey Schetinin

Mar 12, 2010, 6:55:07 PM

to Ian Bicking, Paste Users, better-python, Chris McDonough

On 13 March 2010 01:14, Ian Bicking <ia...@colorstudy.com> wrote:
> On Fri, Mar 12, 2010 at 4:23 PM, Sergey Schetinin <mal...@gmail.com> wrote:
>> As to how separate the apps are anyway, sometimes they are not that
>> *independent*, but still need to be *separable*. For example, let's
>> say we have apps A and B that depend on each other. At some point we
>> (or someone reusing our code) discovers a need for another layer of
>> dispatching between them. Now, you cannot expect that dispatching
>> layer to keep passing your arguments around, so some of the solutions
>> will be able to pass the data across this layer and some will not. I
>> hope you agree this is a reasonable use-case.
>
> I'm not sure that dispatching is a good enough justification. Changing
> dispatching seems generally hard, because everything tends to link to
> everything else. Even if you use link generation, you have to generate the
> links based on *something* -- substituting controller names and variables
> for URLs doesn't feel very different to me.

A lot of that linking goes on in the template, so depending on how one
does templating, this can be less of a problem. For the cases when the
app needs more knowledge about where some other apps are located I
usually solve it by having the apps implemented as classes with this
kind of configuration passed in at creation time.

For example I have a licensing web-service that needs to query an
order-tracking webservice -- what I have is a LicensingApp that is
created as LicensingApp(order_tracking_ws_url). There's no implicit
configuration and it works great. And if the app needs not to query
but *link* somewhere, everything is exactly the same -- just pass some
urls around. It's also nice because once one needs to scale beyond one
server everything is ready for it.
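
A rough sketch of that arrangement on Python 3, using only the standard library. The class body, the response text and the example URL are made up for illustration; only the name LicensingApp and the idea of passing the URL in at creation time come from the description above:

```python
class LicensingApp(object):
    """WSGI app that receives its configuration explicitly at creation time."""

    def __init__(self, order_tracking_ws_url):
        # No implicit configuration: the app knows only what it was given.
        self.order_tracking_ws_url = order_tracking_ws_url

    def __call__(self, environ, start_response):
        # A real app would query the order-tracking service here.
        body = ('orders at %s' % self.order_tracking_ws_url).encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]


# Hypothetical URL; moving the service to another server only changes this line.
app = LicensingApp('http://orders.example/ws')
```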

When the links can have different formats, the most straightforward
solution works just fine -- define an interface (just document, not
declare w/ ZCA or something) to generate those links, implement that
interface and pass that implementation to the app that needs it. Given
it's well-defined, this "replaceability" will end up being used more
often than not.
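
As a sketch of such a documented (not declared) interface on Python 3: every name below is hypothetical. The only "interface" is the convention that a link generator offers an `order_url(order_id)` method:

```python
class PrefixLinks(object):
    """Builds path-style links under a configurable prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def order_url(self, order_id):
        return '%s/orders/%s' % (self.prefix, order_id)


class QueryLinks(object):
    """Builds query-string style links; same method, different format."""

    def order_url(self, order_id):
        return '/order?id=%s' % order_id


class ReceiptApp(object):
    """Takes any object with an order_url(order_id) method."""

    def __init__(self, links):
        self.links = links

    def render(self, order_id):
        return 'See %s' % self.links.order_url(order_id)
```

Swapping the link format then means passing a different object to ReceiptApp; the app itself does not change.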

Sometimes the same thing needs to be passed to a lot of instances and
for cases like that I've used Contextual a couple times. There are a
few nice things about it for something like this. First is that you
get interface checking pretty much for free. Second is that it still
can be customizable on a per-instance basis. To do this, one should
not access the service directly, but still accept arguments to
constructors and use that. But the constructor can define the default
value of the argument to be *the service itself*, that is: the class
delegate. That way, if you don't pass any custom objects, the service
will end up being acquired from context when used, but you still can
pass specific instances during configuration. And when you do that,
you don't even need the contextual state management.

One specific case that I use this for is "Sendmail" service -- the
apps just want to send some mail and don't care if it goes through
SMTP, /bin/sendmail or GAE APIs. So I configure it only once and it
just works for everything in that process. I would not recommend using
this approach for more app-specific cases.
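
The "default argument is the service itself" pattern can be approximated without Contextual. The sketch below (Python 3) is a plain-Python stand-in and does not use or mirror Contextual's API; `DefaultMailer`, `ListMailer` and `ContactApp` are hypothetical names:

```python
class ListMailer(object):
    """Test-friendly mailer that just records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class DefaultMailer(object):
    """Delegate: forwards to whichever mailer is configured for the process."""
    current = None

    def send(self, to, body):
        if DefaultMailer.current is None:
            raise RuntimeError('no mailer configured')
        return DefaultMailer.current.send(to, body)


default_mailer = DefaultMailer()


class ContactApp(object):
    # The default is the delegate, so the real mailer is looked up at use time.
    def __init__(self, mailer=default_mailer):
        self.mailer = mailer

    def submit(self, to, body):
        self.mailer.send(to, body)
```

Configured once via `DefaultMailer.current`, every ContactApp() picks up the same mailer, while ContactApp(some_mailer) bypasses the process-wide setting entirely.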

> It would improve testing though, as you can test the components more easily
> with less mocking. Chris has noted that a large portion of what people use
> the ZCA for is this kind of testing fixture.
>
>>
>> Another example: let's say we decide to add a spellchecker to the form, so
>> that submissions with incorrect grammar can't go through (this is a
>> bit of a stretch, but I don't want to come up with a completely new
>> example). We still want to keep our old cform_app, but add another
>> one, checked_cform_app that goes something like this:
>>
>> @wsgify
>> def checked_cform_app(req):
>>     name = ...
>>     message = ...
>>     if message and spellcheck(message):
>>         return send_message_app(req, name, ...)
>>
>> send_message_app is still "bound", but you see the problem -- we are
>> forced to have a single point of entry or we'll have to repeat
>> ourselves or invent something to extract things just once and we're
>> back where we started.
>
> OK, so in this case you are thinking about decoration. And maybe...
> splitting code in the middle. Like, separate out how name and message are
> extracted, and then putting some code between that and how name and message
> are used? I don't think I'm really following the purpose now.

Yes, what you've just said, and most things mentioned before as well.
The purpose is having an elegant solution for extracting some data
that may be used in multiple places. Being able to persistently edit
that data. Essentially having something as convenient and unfussy as
local namespace, which is not *that* local.

Sergey Schetinin

Mar 13, 2010, 12:02:52 AM

to Ian Bicking, Paste Users, better-python, Chris McDonough

So, let's pick up where we left off. (Here I'm repurposing some of the
examples from a doctest)

What ReqAddon does is attach a namespace to a request (actually to
environ). Here's what this means:

>>> from pasteob import *
>>> req = Request.blank('/')
>>> ReqAddon(req) is ReqAddon(req)
True

It works exactly the same when called with environ:

>>> ReqAddon(req) is ReqAddon(req.environ)
True

See, no matter how many times we call it, we get the same instance.
This fulfills some of our requirements:

* Should require no additional wrappers -- usable in any WSGI app, anywhere

* Changes to extracted data should be persistent

We could just set attributes on those instances, but that would make
it identical to "webob.adhoc_attrs". That is NOT the intended way of
using it. Instead you should subclass ReqAddon to get your own
namespace.

>>> class TestAddon1(ReqAddon):
...     pass
...
>>> TestAddon1(req) is TestAddon1(req)
True
>>> TestAddon1(req) is ReqAddon(req)
False

This gets us:

* Should be namespace-capable

Now, let's do some data extraction. By default ReqAddon gets .env and
.req attributes storing the environment it's bound to and the
corresponding BaseRequest. If you don't want the request object, you
can use EnvAddon baseclass instead. To extract some data from request
when the addon is first created, redefine the init method. (Not
__init__ simply because this way you are not required to accept
environ / request argument. It is better this way for many reasons,
mostly because it doesn't make things appear to be something they are not.)

class CFormData(ReqAddon):
    def init(self):
        self.req.charset = 'utf8'
        req = self.req
        p = req.params
        self.name = p.get('name', req.cookies.get('name'))
        ...

@webob_wrap
def cform_app(req):
    if CFormData(req).message:
        return send_message_app
    else:
        return the_form_app

@webob_wrap
def send_message_app(req):
    form = CFormData(req)
    send_message(form.name, ..)
    ...

Or even something like this:

@webob_wrap
def send_message_app(req):
    CFormData(req).send_to(...)
    ...

Note that init gets called only once.

>>> class PrintAddon(ReqAddon):
...     def init(self):
...         print 'created', self
...
>>> PrintAddon(req) is PrintAddon(req)
created <PrintAddon object at ...>
True

Addons persist even if you don't keep a reference:

>>> pa = PrintAddon(req); del pa

This wins us:

* Should only extract data from request once

* Should not create order dependencies in user code (unlike argument
passing for ex.)

* Should scale to as many fields as needed

You can also define methods and descriptors to extract more data on
request, cache it in the object etc.

* Should be capable of extracting the data lazily (on first access)
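
Lazy extraction might look like the following sketch on Python 3. It does not use pasteob: `AddonMeta` and `Addon` are simplified, hypothetical stand-ins that cache one instance per (class, environ), and stripping QUERY_STRING stands in for the real extraction:

```python
class AddonMeta(type):
    """Hypothetical metaclass: one cached instance per (class, environ)."""

    def __call__(cls, environ):
        if cls not in environ:
            inst = cls.__new__(cls)
            inst.env = environ
            inst.init()
            environ[cls] = inst
        return environ[cls]


class Addon(metaclass=AddonMeta):
    def init(self):
        pass


class CFormData(Addon):
    extractions = 0  # counts how often the extraction step actually runs

    @property
    def message(self):
        # Extract on first access, then reuse the value cached on the instance.
        if '_message' not in self.__dict__:
            CFormData.extractions += 1
            self._message = self.env.get('QUERY_STRING', '').strip()
        return self._message


environ = {'QUERY_STRING': ' hello '}
```

Creating CFormData(environ) does no extraction at all; the first read of `.message` does the work once, and later reads from any app sharing the environ hit the cache.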

But that is not all, you can pass additional arguments to the subclasses:

>>> ReqAddon(req, 1)
Traceback (most recent call last):
...
TypeError: init() takes exactly 1 argument (2 given)

>>> class TestAddon2(ReqAddon):
...     def init(self, arg):
...         self.arg = arg
...
>>> TestAddon2(req, 1)
<TestAddon2 object at ...>
>>> TestAddon2(req, 1).arg
1
>>> TestAddon2(req, 1) is TestAddon2(req, 1)
True
>>> TestAddon2(req, 1) is TestAddon2(req, 2)
False

Changes are persistent:

>>> TestAddon2(req, 1).arg = 100
>>> TestAddon2(req, 1).arg
100

(There are quirks with default parameter values, check out the doctest
for more on that.)

With this we also have:

* Ideally, the namespaces should be programmatically creatable, so
that you could, for example, reuse most of the code to handle data
from TWO contact forms on the page.
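
A sketch of the two-forms case on Python 3, again with simplified, hypothetical stand-ins rather than pasteob itself. The extra argument becomes part of the cache key, so each field prefix gets its own namespace; the 'demo.fields' environ key is invented for the example:

```python
class AddonMeta(type):
    """Hypothetical metaclass: one instance per (class, args) in each environ."""

    def __call__(cls, environ, *args):
        key = (cls, args)
        if key not in environ:
            inst = cls.__new__(cls)
            inst.env = environ
            inst.init(*args)
            environ[key] = inst
        return environ[key]


class Addon(metaclass=AddonMeta):
    def init(self):
        pass


class CFormData(Addon):
    def init(self, prefix):
        # `prefix` selects which form's fields to read.
        fields = self.env.get('demo.fields', {})
        self.name = fields.get(prefix + 'name')


environ = {'demo.fields': {'a_name': 'Alice', 'b_name': 'Bob'}}
form_a = CFormData(environ, 'a_')
form_b = CFormData(environ, 'b_')
```

The same class serves both forms: `form_a.name` is 'Alice', `form_b.name` is 'Bob', and repeating either call returns the already-created instance.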

All that is left is:

* As little code as possible

I don't think you can claim the user code could be any shorter and
neither could the implementation. See for yourself:
http://bitbucket.org/mlk/pasteob/src/tip/pasteob/__init__.py#cl-400

Other

http://bitbucket.org/mlk/pasteob/src/tip/tests/ReqAddon.txt doctest

http://pypi.python.org/pypi/AddOns
AddOns is a package that gave me the idea. I highly recommend checking
it out, but note that it does not work as smoothly with req /
environ.

Also note that ReqAddon instances create a circular reference to the
environ. If you really don't want this, override __init__ and don't
store references to env/req in attrs.

Sergey Schetinin

Mar 13, 2010, 2:22:47 AM

to Paste Users, better-python, Chris McDonough

I was thinking about the quirky cases for ReqAddon:
1. when the request / env is copied, existing addons are carried, so
we can either accept this or handle this case specially (currently
there's a check for this).
2. most of the time the circular reference is not necessary -- most
of the time env is only needed during init
3. we can store a weakref to req, and no direct reference to env,
which will break the circular reference as well.

4. maybe EnvAddon is unnecessary?
5. is it better to handle default arguments or should the docs just
say "don't do it"?

So now I think maybe it should go like this:

class ReqAddon(object):
    class __metaclass__(type):
        def __call__(cls, env, *args):
            if type(env) is not dict:
                env = env.environ
            key = (cls, args)
            if key in env:
                r = env[key]
                if r._env_id == id(env):
                    return r
                # this is a copied environ, we want a new addon instance
                del env[key]
            req = Request(env)
            inst = type.__call__(cls, req, *args)
            inst._env_id = id(env)
            return env.setdefault(key, inst)

    def __init__(self, req):
        pass
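
For readers who want to try the copied-environ check, here is a rough Python 3 adaptation of the proposal above. It is a sketch, not the code that ended up in pasteob: `SimpleRequest` is a hypothetical stand-in for the WebOb Request, and the Python 3 `metaclass=` syntax replaces the nested `__metaclass__` class:

```python
class SimpleRequest(object):
    """Hypothetical stand-in for a request object wrapping a WSGI environ."""

    def __init__(self, environ):
        self.environ = environ


class AddonMeta(type):
    def __call__(cls, env, *args):
        if type(env) is not dict:
            env = env.environ          # accept a request as well as an environ
        key = (cls, args)
        if key in env:
            inst = env[key]
            if inst._env_id == id(env):
                return inst
            # The addon was carried over by an environ copy; discard it.
            del env[key]
        inst = super().__call__(SimpleRequest(env), *args)
        inst._env_id = id(env)
        return env.setdefault(key, inst)


class ReqAddon(metaclass=AddonMeta):
    def __init__(self, req):
        pass


env = {}
original = ReqAddon(env)
copied_env = dict(env)            # the copy still contains `original`
fresh = ReqAddon(copied_env)      # ...but gets its own addon instance
```

Because the addon remembers the id of the environ it was created for, a copied environ is detected and receives a fresh instance, while the original environ keeps returning the original one.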

Sergey Schetinin

Mar 13, 2010, 3:31:28 AM

to Paste Users, better-python, Chris McDonough

Done: http://bitbucket.org/mlk/pasteob/changeset/93f43634ce22/
