So, it's about time that WebOb came to 1.0. For 1.0 I'd like to
settle the API as much as possible. But I'd also like to move further
toward getting WebOb used by more frameworks. I don't expect that to
happen before 1.0, but if there are API changes that will make that
easier later, then maybe we can get those in.
While I haven't tracked ongoing changes to frameworks, I did put
together the differences I am aware of in APIs here:
http://pythonpaste.org/webob/differences.html
Some of them are fairly trivial, and could be managed through
subclassing (e.g., req.raw_post_data vs. req.body -- semantically
identical, just different names).
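
As a rough illustration of the subclassing approach for a pure naming difference, here is a sketch that assumes a stand-in Request class rather than WebOb's real one. The DjangoStyleRequest name is hypothetical.

```python
class Request(object):
    """Stand-in for a WebOb-style request; only `body` is modeled here."""

    def __init__(self, body=b""):
        self.body = body


class DjangoStyleRequest(Request):
    """Hypothetical subclass that exposes `body` under a second name."""

    @property
    def raw_post_data(self):
        return self.body

    @raw_post_data.setter
    def raw_post_data(self, value):
        self.body = value


req = DjangoStyleRequest(b"a=1&b=2")
print(req.raw_post_data)
```

Both names read and write the same underlying value, so framework code written against either spelling keeps working.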
Are there API changes that would help people consider WebOb for other
frameworks? The main one I can think of is req.FILES, separating out
file uploads from other POST fields. Also then there's the issue of
what kind of object represents files. The finer details of individual
objects are also important, things like the API of req.GET/req.POST
(which are views on ordered dictionaries, and are represented somewhat
differently in different frameworks).
Also I'm planning on introducing a BaseRequest (and *maybe*
BaseResponse) class, that removes some functionality. Specifically
for Repoze they'd like to remove __getattr__ and __setattr__ (which
has some performance implications), and maybe other things are
possible (though removing writers is infeasible, IMHO, as read and
write access are not easily separated, and it would require too much
code duplication).
(Incidentally WebOb is now on bitbucket: http://bitbucket.org/ianb/webob/)
--
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
FTR, after thinking about it, I'm not even sure BaseRequest is necessary for
this purpose. This seems to work too (at least it gets previously visible
setattr/getattr stuff out of the profiling info):
class Request(WebobRequest):
    __setattr__ = object.__setattr__
    __getattr__ = object.__getattribute__
    __delattr__ = object.__delattr__
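
To make the effect of that override concrete, here is a self-contained sketch. The WebobRequest below is only a stand-in that routes unknown attributes through a dict in environ (the "adhoc" key is made up for this example); it is not WebOb's actual implementation.

```python
class WebobRequest(object):
    """Stand-in base class with custom attribute hooks (illustrative only)."""

    def __init__(self, environ):
        # Bypass our own __setattr__ so `environ` is a normal attribute.
        object.__setattr__(self, "environ", environ)

    def __getattr__(self, name):
        # Only called when normal lookup fails.
        try:
            return self.environ["adhoc"][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self.environ.setdefault("adhoc", {})[name] = value


class Request(WebobRequest):
    # Restore plain object behaviour, skipping the hooks above.
    __setattr__ = object.__setattr__
    __getattr__ = object.__getattribute__
    __delattr__ = object.__delattr__


slow = WebobRequest({})
slow.user = "alice"   # goes through the hook, lands in environ["adhoc"]

fast = Request({})
fast.user = "bob"     # plain attribute, lands in the instance __dict__
```

With the override in place, attribute writes never touch environ, which is why the hooks drop out of the profile.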
I'd like to see webob/__init__.py split into a few modules (at least
request, response). And not because it would change how it's used in
any way but to make reading and editing source easier. I think
__init__.py is just way too big right now.
Some other minor changes I'd like to discuss are
* remove / move to a subclass support for req/resp cross-referencing.
* ubody alias for unicode_body
* upath_info for decoded path_info and similar
--
Best Regards,
Sergey Schetinin
http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter
On Thu, Oct 29, 2009 at 4:22 PM, Sergey Schetinin <mal...@gmail.com> wrote:
> I'd like to see webob/__init__.py split into a few modules (at least
> request, response). And not because it would change how it's used in
> any way but to make reading and editing source easier. I think
> __init__.py is just way too big right now.
I agree, it's way too large, and these can be imported into __init__
so it's not a visible change. Probably the descriptor stuff (the
first 500ish lines, including the converters) can go in a separate
module, along with other ones for request and response. That seems
like a reasonable partitioning, and covers most of what is in
__init__.py now.
> Some other minor changes I'd like to discuss are
> * remove / move to a subclass support for req/resp cross-referencing.
> * ubody alias for unicode_body
> * upath_info for decoded path_info and similar
So:
* I'm not sure what you mean about req/resp cross-referencing. You
mean req.response and resp.request? I'm not sure how this would
particularly go into a subclass. Also, they are pretty optional. Is
there a particular issue you have with them?
* ubody is fine.
* I agree on upath_info (and uscript_name). If you want to put those
in, please do. I assume they should always use UTF8, and use the
unicode_errors attribute for handling errors.
I'd split it into descriptors, datetime_utils, request and response.
Do you mind if I do this part?
>> Some other minor changes I'd like to discuss are
>> * remove / move to a subclass support for req/resp cross-referencing.
>> * ubody alias for unicode_body
>> * upath_info for decoded path_info and similar
>
> So:
>
> * I'm not sure what you mean about req/resp cross-referencing. You
> mean req.response and resp.request? I'm not sure how this would
> particularly go into a subclass. Also, they are pretty optional. Is
> there a particular issue you have with them?
I often reuse Response objects (it works fine as long as app_iter is
reusable, and that can be ensured by touching .body). Sometimes I
reuse Responses returned by req.get_response. That means as long as I
keep that response I'm also keeping a request that was used to
generate it. That's not too big a deal, but feels sloppy. WebOb is
quite low-level generally and keeping a reference to request is quite
"frameworkey" IMO, so, given how easy it is to do that in a subclass,
I think it would be a good change.
req.response however is only used in webob.dec, so as long as normal
get_response doesn't add that reference (which is circular) I have no
objections to that one.
> * ubody is fine.
> * I agree on upath_info (and uscript_name). If you want to put those
> in, please do. I assume they should always use UTF8, and use the
> unicode_errors attribute for handling errors.
OK, I'll add them. On them being UTF8 I'll try that first and check
how it plays with referring pages in different encodings. If needed I
think a bit more flexibility might be required. For example a
path_charset attr (initially set to 'UTF8') which is used for
encoding/decoding path. If set to None path encode/decode would fall
back to .charset.
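
A minimal sketch of that fallback rule, written as a free function so it runs on its own. The function name is hypothetical; path_charset, charset and unicode_errors mirror the attribute names used in this thread, and the default values are assumptions.

```python
def decode_path(raw_path, path_charset="UTF8", charset="latin-1",
                unicode_errors="strict"):
    """Decode a raw PATH_INFO byte string.

    If `path_charset` is None, fall back to the general `charset`.
    """
    encoding = path_charset if path_charset is not None else charset
    return raw_path.decode(encoding, unicode_errors)


print(decode_path(b"/caf\xc3\xa9"))                     # UTF8 path
print(decode_path(b"/caf\xe9", path_charset=None))      # falls back to charset
```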
One more thing I remembered: the conditional_response switch doesn't
feel right to me. I'd rather have a separate ConditionalResponse
subclass and add Response.as_conditional() method returning
ConditionalResponse instance. That way default_conditional_response
goes away and its purpose is instead accomplished by using
ConditionalResponse as a base class. I think that would be a great
cleanup. I understand that would require backwards compatibility (with
warnings I suppose and eventual deprecation) and is exactly the thing
better done in a fork.
I'd like to see something to differentiate one-shot / non-oneshot
responses too, but I'm not sure what and how it should be done. Maybe
just a .oneshot bool property that tries to guess if app_iter is
reusable (if it's a list or not, essentially).
BTW, all 'defect' tickets are closed now: http://bit.ly/webob-tickets
Sure.
>>> Some other minor changes I'd like to discuss are
>>> * remove / move to a subclass support for req/resp cross-referencing.
>>> * ubody alias for unicode_body
>>> * upath_info for decoded path_info and similar
>>
>> So:
>>
>> * I'm not sure what you mean about req/resp cross-referencing. You
>> mean req.response and resp.request? I'm not sure how this would
>> particularly go into a subclass. Also, they are pretty optional. Is
>> there a particular issue you have with them?
>
> I often reuse Response objects (it works fine as long as app_iter is
> reusable, and that can be ensured by touching .body). Sometimes I
> reuse Responses returned by req.get_response. That means as long as I
> keep that response I'm also keeping a request that was used to
> generate it. That's not too big a deal, but feels sloppy. WebOb is
> quite low-level generally and keeping a reference to request is quite
> "frameworkey" IMO, so, given how easy it is to do that in a subclass,
> I think it would be a good change.
>
> req.response however is only used in webob.dec, so as long as normal
> get_response doesn't add that reference (which is circular) I have no
> objections to that one.
I don't like deep class hierarchies, so I'd really rather keep this
functionality in the main class as much as possible. WebOb itself
shouldn't try to make use of req.response or resp.request, though
there are a few cases (really most specifically the setting of
.location on the response, a rather obscure case where it makes the
link absolute using the given request).
>> * ubody is fine.
>> * I agree on upath_info (and uscript_name). If you want to put those
>> in, please do. I assume they should always use UTF8, and use the
>> unicode_errors attribute for handling errors.
>
> OK, I'll add them. On them being UTF8 I'll try that first and check
> how it plays with referring pages in different encodings. If needed I
> think a bit more flexibility might be required. For example a
> path_charset attr (initially set to 'UTF8') which is used for
> encoding/decoding path. If set to None path encode/decode would fall
> back to .charset.
I did some testing, and on Firefox it is UTF8 regardless of the page
encoding (meaning QUERY_STRING could have a different encoding). This
also fits the IRI specification. Of course it's easy to manually
create a link that is no particular encoding at all, e.g., /foo%c3bar,
but it's hard to know what encoding it is at all if it isn't UTF8.
Latin1 might be a reasonable fallback, simply because it can be undone
and never fails, unlike UTF8.
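
The "try UTF8, fall back to Latin1" idea can be sketched as below. The helper name is hypothetical and this is only one possible policy, not something WebOb is known to implement.

```python
from urllib.parse import unquote_to_bytes


def decode_url_path(quoted_path):
    """Percent-decode a path, then decode as UTF-8 with a Latin-1 fallback."""
    raw = unquote_to_bytes(quoted_path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail
        # and can be undone with .encode("latin-1").
        return raw.decode("latin-1")


text = decode_url_path("/foo%c3bar")        # not valid UTF-8
original_bytes = text.encode("latin-1")     # round-trips to the raw bytes
```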
> One more thing I remembered: the conditional_response switch doesn't
> feel right to me. I'd rather have a separate ConditionalResponse
> subclass and add Response.as_conditional() method returning
> ConditionalResponse instance. That way default_conditional_response
> goes away and its purpose is instead accomplished by using
> ConditionalResponse as a base class. I think that would be a great
> cleanup. I understand that would require backwards compatibility (with
> warnings I suppose and eventual deprecation) and is exactly the thing
> better done in a fork.
Again I really don't like subclassing like this. In part because it
makes it harder for it to be further subclassed -- you have to decide
if you want to subclass ConditionalResponse, Response, BaseResponse,
etc. So I want to keep conditional_response in.
> I'd like to see something to differentiate one-shot / non-oneshot
> responses too, but I'm not sure what and how it should be done. Maybe
> just a .oneshot bool property that tries to guess if app_iter is
> reusable (if it's a list or not, essentially).
Potentially you could test if "iter(app_iter) is app_iter" -- if it
is, then it's probably one-shot (because it's an iterator, not an
iterable).
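
Spelled out, that test might look like the sketch below. The function name is made up, and the check is a heuristic: an object could return itself from iter() and still be safe to reuse.

```python
def is_probably_oneshot(app_iter):
    """Guess whether `app_iter` can only be consumed once.

    Iterators return themselves from iter(); containers such as lists
    return a fresh iterator each time.
    """
    return iter(app_iter) is app_iter


def body_chunks():
    yield b"hello "
    yield b"world"


print(is_probably_oneshot([b"hello ", b"world"]))
print(is_probably_oneshot(body_chunks()))
```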
I understand your objection, but the 'location' case could be resolved
by making it absolute using environ when calling the response. And
anyway, if location is not absolute and .request is not set (or set
after .location) it will not be absolute. So it both looks not that
strictly enforced in the first place and it seems that doing it in
__call__ would be correct in more cases too.
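
For illustration, resolving a relative location against the WSGI environ at call time could look roughly like this. It is a sketch using only the standard library; absolute_location is a hypothetical helper, not an existing WebOb function.

```python
from urllib.parse import urljoin
from wsgiref.util import request_uri


def absolute_location(location, environ):
    """Make a Location value absolute relative to the current request URL."""
    base = request_uri(environ, include_query=False)
    return urljoin(base, location)


environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/a/b",
    "QUERY_STRING": "x=1",
}
print(absolute_location("/login", environ))
```

Because the environ is only consulted when the response is called, the result does not depend on whether .request was set before or after .location.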
There's no doubt class hierarchies can get unnecessarily deep, but in
this case I'd argue it's worth it simply because it actually makes
both API and subclassing simpler. Let me explain why I think so.
* A couple of things get removed: conditional_response,
default_conditional_response.
* Response cannot *become* conditional if it wasn't -- that change
would require creation of a new instance (via resp.as_conditional() or
req.get_response(resp) in the other direction). (This again is
important for reusing responses).
* as_conditional() would work fine for subclassed Responses returning
regular, non-subclassed ConditionalResponse instance.
* The choice of subclassing Response or ConditionalResponse is no
different than choosing to set default_conditional_response. And it's
a much more natural way to do it too IMO.
* And Response class is extremely big, conditional response code is an
obvious candidate for separating into a separate class.
At least I think it's worth exploring, so I'll see how it goes in a fork.
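
A very rough sketch of the shape being proposed, using stand-in classes rather than WebOb's. The respond() method and the exact-match ETag comparison are simplifications invented for this example.

```python
class Response(object):
    """Stand-in response: status, body and an optional ETag only."""

    def __init__(self, body=b"", etag=None, status="200 OK"):
        self.body = body
        self.etag = etag
        self.status = status

    def as_conditional(self):
        # Always returns a new, plain ConditionalResponse; the original
        # object is left untouched and stays reusable.
        return ConditionalResponse(self.body, self.etag, self.status)

    def respond(self, environ):
        return self.status, self.body


class ConditionalResponse(Response):
    def respond(self, environ):
        wanted = environ.get("HTTP_IF_NONE_MATCH")
        if self.etag is not None and wanted == '"%s"' % self.etag:
            return "304 Not Modified", b""
        return Response.respond(self, environ)


resp = Response(b"hi", etag="v1")
cond = resp.as_conditional()
```

The point of the sketch is only that a response never changes its conditional behaviour in place; getting the other behaviour means getting another instance.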
>> I'd like to see something to differentiate one-shot / non-oneshot
>> responses too, but I'm not sure what and how it should be done. Maybe
>> just a .oneshot bool property that tries to guess if app_iter is
>> reusable (if it's a list or not, essentially).
>
> Potentially you could test if "iter(app_iter) is app_iter" -- if it
> is, then it's probably one-shot (because it's an iterator, not an
> iterable).
Thanks for the suggestion, I didn't think of that.
I'm not really aware of what those fixes are, or if they apply to
WebOb (I suspect they don't). If there are specific API differences
where they could be unified, well... we can discuss them. Talking
with Armin, his biggest concerns have been around handling the request
body (which is tricky at best; Werkzeug is more naive but less likely
to be unperformant; rather it just won't work in these difficult
situations). If there are nasty situations, I hope they can be fixed
in WebOb, though I'd like to make everything Work, even in cases that
aren't a priority for Werkzeug (mostly related to contention for the
request body, as with middleware).
Another thing Armin specifically mentioned is that he doesn't like
that all WebOb requests are mutable. He prefers the Werkzeug setup
where the base request object is basically read-only, and there is a
subclass that can be written to. First, that would be rather hard to
do with WebOb (maybe a read-only flag would be possible, but the
functionality would be there regardless, only explicitly disabled via
a flag), and secondly I don't particularly agree with him on this
matter, and I don't think there's a strong justification for removing
this functionality (I am of a mind that if you don't want to modify
the request, then don't do it).
If there are other things, I'm not aware of them; you'll have to list
them more specifically.
One thing I'd like to see in WebOb is "unicode everywhere".
Having res.body and res.unicode_body seems odd to me.
--
Jon
-1. Making request.body return encoded values is maximally backwards
compatible, even if it makes the API slightly less pretty. Having some global
knob that makes request.body and friends always return unicode would probably
just confuse things, as well (as opposed to req.ubody or however it was
proposed to be spelled).
- C
Request and response bodies aren't necessarily text, so forcing them
to always be unicode doesn't really work.
On Mon, Nov 2, 2009 at 10:10 AM, Ian Bicking <ianbi...@gmail.com> wrote:
> On Mon, Nov 2, 2009 at 6:31 AM, Jon Nelson <jne...@jamponi.net> wrote:
>> One thing I'd like to see in WebOb is "unicode everywhere".
>> Having res.body and res.unicode_body seems odd to me.
>
> Request and response bodies aren't necessarily text, so forcing them
> to always be unicode doesn't really work.
--
Jon
This isn't really the case for GAE -- while it *may* only run one
request in a process, it usually keeps a process alive for longer than
that, so you are just deferring the hit.
> Would it be possible to import webob in a more lazy manner? eg. Import
> Request without importing the code for Response and status codes
> etc.
The "API" of WebOb includes:
from webob import Request, Response
So those imports really have to keep working. There are some tricks
to sometimes make this stuff work (Zope has experimented with it a
lot) but they might only slow things down for the most common case
(when you are using all of it), and they might not work on GAE anyway.
And they are a mess to maintain.
In my experiments, it didn't seem like it was always imported
up-front, though I wasn't thorough in figuring it out (maybe it was
only the SDK where it wasn't preimported).
> There's a problem however with trying to use webob newer than the one
> provided by Google (0.9 IIRC) -- you can't just include it with your
> app, you also need to make sure to eject all webob-related modules
> from sys.modules before importing webob again. That's a PITA and makes
> one take the penalty for the huge import, which is one of the reasons
> I asked Ian about 1.0 release. I hope that way it's more likely that
> Google will upgrade their copy.
It is a nuisance, yes. There's some code here that handles it:
http://bitbucket.org/sluggo/appengine-homedir/src/tip/homedir-runner.py
Deleting the imported modules and keeping them from being imported is
slightly different in production and the SDK, which adds a bit of
complexity to it.
I think it's always preloaded when deployed and never is in the SDK.
(I can't be sure about the SDK, because I use a modified local
environment, not the vanilla SDK.)
>> There's a problem however with trying to use webob newer than the one
>> provided by Google (0.9 IIRC) -- you can't just include it with your
>> app, you also need to make sure to eject all webob-related modules
>> from sys.modules before importing webob again. That's a PITA and makes
>> one take the penalty for the huge import, which is one of the reasons
>> I asked Ian about 1.0 release. I hope that way it's more likely that
>> Google will upgrade their copy.
>
> It is a nuisance, yes. There's some code here that handles it:
>
> http://bitbucket.org/sluggo/appengine-homedir/src/tip/homedir-runner.py
>
> Deleting the imported modules and keeping them from being imported is
> slightly different in production and the SDK, which adds a bit of
> complexity to it.
I have the following code in one of the bootstrapping modules that are
imported very early. Seems to work well for me.
import sys
from os.path import dirname, realpath

# Put the application directory first on sys.path so bundled packages win.
mod_main = sys.modules['__main__']
main_file = mod_main.__file__
app_dir = realpath(dirname(main_file))
if app_dir in sys.path:
    sys.path.remove(app_dir)
sys.path.insert(0, app_dir)

deployed = (sys.executable == '/base/')
if deployed:
    # Evict the preloaded webob so the bundled copy gets imported instead.
    for name in list(sys.modules.keys()):
        if name.startswith('webob'):
            del sys.modules[name]

import webob
if not webob.__file__.startswith(app_dir):
    raise ValueError("%s / %s" % (webob.__file__, app_dir))
Google probably won't due to fears about backward compatibility. When
I asked a developer whether they were going to switch to
webob.Response (they're only using Request now), he said they were
reluctant for this reason. The App Engine API has versioning, so
eventually a "Python #2" API might appear, but probably only when it
can be bundled with much more significant updates.
--
Mike Orr <slugg...@gmail.com>
> I'm not really aware of what those fixes are, or if they apply to
> WebOb (I suspect they don't). If there are specific API differences
> where they could be unified, well... we can discuss them. Talking
> with Armin, his biggest concerns have been around handling the request
> body (which is tricky at best; Werkzeug is more naive but less likely
> to be unperformant; rather it just won't work in these difficult
> situations). If there are nasty situations, I hope they can be fixed
> in WebOb, though I'd like to make everything Work, even in cases that
> aren't a priority for Werkzeug (mostly related to contention for the
> request body, as with middleware).
The fixes I was aware of that Armin has worked around in Werkzeug:
- Multipart parsing that doesn't suck, better file upload handling
- Fixed bug in Python stdlib regarding handling of 'bad' cookies. I.e., if Python is parsing 4 cookies and the first one is 'invalid', Python *stops parsing* the rest! This is bad because several webapp systems use a character Python doesn't like, so having one on the same domain as a Python app (not Werkzeug) means cookies just disappear, since Python stops parsing them.
I think there were one or two other things related to having a cgi.FieldStorage that doesn't suck, and some other header parsing that Werkzeug might handle better. But these are what I'd consider critical fixes for getting into WebOb.
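
To show what "keep going past a bad cookie" means in practice, here is a deliberately simple sketch. It is neither Werkzeug's parser nor a complete implementation of the cookie specs, and the function name is made up.

```python
def parse_cookies_leniently(header):
    """Parse a Cookie header, skipping pieces that are not name=value."""
    cookies = {}
    for piece in header.split(";"):
        name, sep, value = piece.strip().partition("=")
        if not sep or not name:
            continue  # malformed piece: skip it and keep parsing the rest
        cookies[name] = value.strip('"')
    return cookies


parsed = parse_cookies_leniently('bad:name=1; session=abc; theme="dark"; junk')
```

Each piece is handled independently, so one odd cookie set by another application on the same domain cannot hide the ones that follow it.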
- Ben
On Nov 1, 2009, at 5:39 PM, Ian Bicking wrote:
>
There are at least some good arguments for immutability:
o Request object becomes much simpler with way less code (see Werkzeug)
o Becomes easier to cache attributes and avoid property overhead (see Werkzeug)
o You can still change environ you just have to do it manually
o One step towards Werkzeug possibly adopting WebOb (given that the aforementioned goal was adoption by more frameworks)
The drawbacks are we lose that sometimes handy functionality and that WebTest relies on it. How much does WebTest really rely on it though (it doesn't really seem like much)?
And just to clarify, the Werkzeug wrappers aren't totally immutable, you can still modify their attributes and add your own attributes. Its exposed data structures (like MultiDicts) are immutable and it doesn't write to environ.
A middle ground might be what Werkzeug does now but allowing writes to data structures like MultiDicts -- just not propagating the changes to environ.
> If there are other things I'm not aware of them; you'll have to list
> them more specifically.
Werkzeug also has some pieces of functionality separated out via mixins; this mainly makes the code cleaner. It doesn't actually have a mutability mixin; that's just a proposal by Armin to find a middle ground with WebOb.
One big difference is it stores file uploads in a different container than the main request.POST multidict, and more importantly doesn't use cgi.FieldStorage for those file objects or for parsing the form. With that alternative form parser you can also easily limit the size of form posts/file uploads.
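
Independent of any particular form parser, a size limit can be as simple as checking CONTENT_LENGTH before reading wsgi.input. The sketch below uses made-up names (read_limited_body, RequestTooLarge); it is not Werkzeug or WebOb API, and it trusts the declared length rather than guarding against a body that is longer than advertised.

```python
import io


class RequestTooLarge(Exception):
    """Raised when the declared body size exceeds the allowed maximum."""


def read_limited_body(environ, max_bytes):
    """Read the request body, refusing bodies larger than `max_bytes`."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    if length > max_bytes:
        raise RequestTooLarge("%d bytes > limit of %d" % (length, max_bytes))
    # Read only as many bytes as the client declared.
    return environ["wsgi.input"].read(length)


environ = {"CONTENT_LENGTH": "7", "wsgi.input": io.BytesIO(b"a=1&b=2")}
body = read_limited_body(environ, max_bytes=1024)
```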
The other API differences are probably mostly naming of attributes, existence of more or less shortcuts, things we probably don't differ very much on. Armin also claimed WebOb doesn't handle invalid cookies as well as Werkzeug.