Far too many open source projects are forever stuck < 1.0, and for
product adoption it's hard to pitch a product that has yet to see non-
beta status.
> For 1.0 I'd like to settle the API as much as possible. But I'd
> also like to move further to getting WebOb used for more frameworks.
WebOb and Paste are the foundation of YAPWF (http://www.yapwf.org/), a
new light-weight framework I'm working on that has yet to receive a
good name.
> Are there API changes that would help people consider WebOb for
> other frameworks? The main one I can think of is req.FILES,
> separating out file uploads from other POST fields.
+1. Additionally, as post data isn't restricted to HTTP POST requests
(as mentioned in the source comments), some alternate naming should be
made available. Also, is all-caps necessary? It matches PHP's naming
style for superglobals, but I wouldn't consider PHP to be a shining
beacon of hope. Something like generic "data" (get+post merged),
files, args (post), and params (get).
> Also then there's the issue of what kind of object represents files.
One thing Python's internal file objects have always missed, to me, is
metadata. The filename, path, size, mimetype, etc. I'd like a file-
like object with properties for these additional details.
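As a rough illustration of the request above, here is a minimal sketch of a file-like wrapper that carries metadata. The class name `UploadedFile` and its attributes are hypothetical and are not part of WebOb; the sketch is written for Python 3 and only assumes the wrapped object supports `tell()` and `seek()`.

```python
import io
import mimetypes


class UploadedFile:
    """Hypothetical file-like wrapper that carries upload metadata."""

    def __init__(self, fileobj, filename, mimetype=None):
        self._file = fileobj
        self.filename = filename
        # Fall back to guessing from the extension when no type is supplied.
        self.mimetype = mimetype or mimetypes.guess_type(filename)[0]

    @property
    def size(self):
        """Size in bytes, measured by seeking; the read position is restored."""
        pos = self._file.tell()
        self._file.seek(0, io.SEEK_END)
        end = self._file.tell()
        self._file.seek(pos)
        return end

    def __getattr__(self, name):
        # Delegate read(), seek(), etc. to the wrapped file object.
        return getattr(self._file, name)


upload = UploadedFile(io.BytesIO(b"hello world"), "notes.txt",
                      mimetype="text/plain")
```

Because unknown attributes are delegated, `upload.read()` behaves like the underlying file while `upload.filename`, `upload.mimetype`, and `upload.size` expose the extra details.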
> I'd like to see webob/__init__.py split into a few modules (at least
> request, response). And not because it would change how it's used in
> any way but to make reading and editing source easier. I think
> __init__.py is just way too big right now.
+2. Reading through __init__.py to figure out why Content-Type was set
to zero for HEAD requests was impossible for me. (Found an existing
issue in Trac later, though.)
> One thing I'd like to see in WebOb is "unicode everywhere". Having
> res.body and res.unicode_body seems odd to me.
+1. I've never quite understood the reason behind the body/
unicode_body; it's trivial to isinstance(body, unicode) to determine
which it is and use isinstance(body, basestring) to enforce str/
unicode for the attribute as a whole on assignment. In this day and
age, everything from front to back should be unicode, AFAIK.
> …the Werkzeug setup where the base request object is basically read-
> only, and there is a subclass that can be written to…
-1. YAPWF modifies the request iteratively as the object dispatch
mechanism does its work. (e.g. progressively moving path elements
from path_info to script_name).
My 2¢.
— Alice.
This is already done by the way.
>> One thing I'd like to see in WebOb is "unicode everywhere". Having
>> res.body and res.unicode_body seems odd to me.
>
> +1. I've never quite understood the reason behind the body/
> unicode_body; it's trivial to isinstance(body, unicode) to determine
> which it is and use isinstance(body, basestring) to enforce str/
> unicode for the attribute as a whole on assignment. In this day and
> age, everything from front to back should be unicode, AFAIK.
Except for non-text data that should be represented by str/bytes and
which is the reason body and unicode_body are separate.
--
Best Regards,
Sergey Schetinin
http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter
Maybe to avoid confusion (and promote better unicode awareness) there
should be body and body_bytes or body_raw or something like that -
returning non-textual data is far less common than returning textual
data, so optimize for the common case.
--
Jon
That sounds like a good idea, but it's simply too late for that --
every webob app out there is expecting body to be str. By the way,
when you set body when instantiating Response, the first argument can
be str or unicode -- webob will handle both cases properly. So
Response('x') and Response(u'x') both work as you'd expect.
BTW, another way to clarify things would be to add a `text` alias for
`unicode_body`, but I don't really see why there is any confusion in
the first place -- there's body and unicode_body/ubody and it's clear
which one is which.
So? WebOb isn't 1.0 yet so if something should be fixed then do so
*before* the API is considered stable. Retain backwards compatibility,
if possible, and perhaps with a warning. I don't see any reason to
keep unicode body myself, actually. Just keep 'body' and be done with
it. If it's of type str, then leave it alone. If it's unicode, then
encode as appropriate. This allows the API to remain clean and
obvious, while still retaining the ability to return both non-ASCII
*text* in the appropriate encoding *and* binary data like images.
Maybe I'm missing the utility of unicode_body but it seems like a
kluge to me, and what's worse it's entirely unclear when body should
be used versus unicode body - it's not something that people should
even have to think about.
> By the way,
> when you set body when instantiating Response, the first argument can
> be str or unicode -- webob will handle both cases properly. So
> Response('x') and Response(u'x') both work as you'd expect.
> BTW, another way to clarify things would be to add `text` alias to
> `unicode_body`, but I kind of don't see why is there a confusion in
> the first place -- there's body and unicode_body/ubody and it's clear
> which one is which.
I think that's even worse. You've listed *three* ways (body,
unicode_body, ubody) to set the response content. To me, perhaps the
best way to deal with this would be to *not* have unicode body/ubody
and have just body. Orthogonality is very important.
--
Jon
The language and the libraries should make working with unicode text
transparent, but that doesn't mean that the programmer shouldn't know the
difference or think about it. Some operations require unicode text and
some require an 8-bit representation; having separate attributes for
these views is the proper way to make both available. You might think
that having just one attribute for both would be better but you're
wrong.
>> By the way,
>> when you set body when instantiating Response, the first argument can
>> be str or unicode -- webob will handle both cases properly. So
>> Response('x') and Response(u'x') both work as you'd expect.
>
>> BTW, another way to clarify things would be to add `text` alias to
>> `unicode_body`, but I kind of don't see why is there a confusion in
>> the first place -- there's body and unicode_body/ubody and it's clear
>> which one is which.
>
> I think that's even worse. You've listed *three* ways (body,
> unicode_body, ubody) to set the response content. To me, perhaps the
> best way to deal with this would be to *not* have unicode body/ubody
> and have just body. Orthogonality is very important.
I can't see the benefit of having to do manual conditional conversions
when you need a specific representation of response body and I think
you're just focusing on some specific use-case and missing how much of
a headache such a change would really be.
WebOb originally had two different names, but I converted to GET/POST
to be more similar to several other frameworks. I know it's a
misnomer, but I decided conformity was more important.
>> Also then there's the issue of what kind of object represents files.
>
> One thing Python's internal file objects have always missed, to me, is
> metadata. The filename, path, size, mimetype, etc. I'd like a file-
> like object with properties for these additional details.
Indeed; but it also requires some specific API, which I haven't
figured out ;) Just copy Django?
And probably a few helper methods, at least something to move the file
to non-temporary location. But I think it'll be best to keep these
minimal.
>> One thing I'd like to see in WebOb is "unicode everywhere". Having
>> res.body and res.unicode_body seems odd to me.
>
>
> +1. I've never quite understood the reason behind the body/
> unicode_body; it's trivial to isinstance(body, unicode) to determine
> which it is and use isinstance(body, basestring) to enforce str/
> unicode for the attribute as a whole on assignment. In this day and
> age, everything from front to back should be unicode, AFAIK.
I feel strongly that request bodies are bytes and that is the most
natural representation (i.e., resp.body being the most natural name,
it is bytes).
I'm not terribly opposed to the idea of being able to set resp.body to
a unicode string, though it will always return bytes, so it's kind of
inelegant. You can use resp.write() to write bytes or unicode
already.
--
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
On 2009-11-08, at 3:30 PM, Ian Bicking wrote:
> I'm not terribly opposed to the idea of being able to set resp.body to
> a unicode string, though it will always return bytes, so it's kind of
> inelegant. You can use resp.write() to write bytes or unicode
> already.
Writing is by far the more common case for Response objects, so here's
a small patch that would work nicely, as you were already testing for
unicode assignment.
http://bitbucket.org/gothalice/webob/changeset/0a1fcba7b844/
diff -r eec515fceb5b webob/response.py
--- a/webob/response.py Sun Nov 08 08:45:31 2009 +0200
+++ b/webob/response.py Sun Nov 08 16:19:37 2009 -0800
@@ -344,8 +344,9 @@
     def _body__set(self, value):
         if isinstance(value, unicode):
-            raise TypeError(
-                "You cannot set Response.body to a unicode object (use Response.unicode_body)")
+            self.unicode_body = value
+            return
+
         if not isinstance(value, str):
             raise TypeError(
                 "You can only set the body to a str (not %s)"
— Alice.
I'm not sure I agree, but I'm willing to be convinced. I guess I've
always thought of a response as *always* being an opaque chunk of bits,
and if you wanted to make use of unicode in your application then you
should encode the body *before* setting or adding it to body. I'm all
for programmer responsibility here but having two attributes seems
like a poor choice. Since there are two attributes, what happens if I
make use of both of them? Which one "wins" if I did use both? Having
*one* attribute that leaves str alone and encodes unicode /at
assignment time/ seems to make more sense to me - however, as I said
before, I'm willing to be convinced that I'm wrong. Let's try to keep
it civil, though.
>>> By the way,
>>> when you set body when instantiating Response, the first argument can
>>> be str or unicode -- webob will handle both cases properly. So
>>> Response('x') and Response(u'x') both work as you'd expect.
>>
>>> BTW, another way to clarify things would be to add `text` alias to
>>> `unicode_body`, but I kind of don't see why is there a confusion in
>>> the first place -- there's body and unicode_body/ubody and it's clear
>>> which one is which.
>>
>> I think that's even worse. You've listed *three* ways (body,
>> unicode_body, ubody) to set the response content. To me, perhaps the
>> best way to deal with this would be to *not* have unicode body/ubody
>> and have just body. Orthogonality is very important.
>
> I can't see the benefit of having to do manual conditional conversions
> when you need a specific representation of response body and I think
> you're just focusing on some specific use-case and missing how much of
> a headache such a change would really be.
Let's talk about use cases, then. With a single .body attribute that
checks to see if the value is of type unicode or not we get this
behavior:
req.body = some_ascii_or_iso8859_1_text
# do nothing
req.body = some_binary_bits_such_as_an_image # type is str
# do nothing
req.body = some_unicode
# encode to the currently-set encoding. raise errors as appropriate.
Are there other use cases?
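The three cases above can be sketched as a single setter. This is not WebOb code: `SingleBodyResponse` is a hypothetical class, and it is written for Python 3, where `bytes` plays the role of the thread's `str` and `str` plays the role of `unicode`.

```python
class SingleBodyResponse:
    """Hypothetical response with one body attribute, as proposed above."""

    def __init__(self, charset="utf-8"):
        self.charset = charset
        self._body = b""

    @property
    def body(self):
        # Reads always return bytes, whatever type was assigned.
        return self._body

    @body.setter
    def body(self, value):
        if isinstance(value, str):
            # Text is encoded at assignment time; encoding errors surface here.
            value = value.encode(self.charset)
        elif not isinstance(value, bytes):
            raise TypeError(
                "body must be bytes or str, not %s" % type(value).__name__)
        self._body = value


res = SingleBodyResponse(charset="utf-8")
res.body = b"\x89PNG"      # binary data is stored untouched
binary = res.body
res.body = "caf\u00e9"     # text is encoded with the current charset
encoded = res.body
```

Note that the value read back is always bytes, so the attribute is asymmetric for text: what goes in is not what comes out.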
--
Jon
These attributes are views on the same data, so when you think about
them that way I don't think the question of which one "wins" comes up
at all. And don't forget there are .app_iter and .body_file too. They
all are the same thing in different forms.
Sure: reading and modifying the body. Sometimes you'd want to parse it
and need binary body, sometimes you want to process it as text.
Middleware and stuff.
You seem to only consider setting .body.
Also, as far as clarity goes, unicode_body interacts with .charset in
a clear way (it requires it to be set and encodes the body with it).
If .body accepted unicode, then setting .body might or might not
require a charset, depending on the body type. I imagine that in the
cases where the library/app setting the body isn't sure what type of
body it is setting (otherwise, why require a single attr for both),
that would often result either in errors when trying to set the body
to unicode or in setting the charset "just in case", which is just as
bad. And if one knows the type of the body when setting it, then
having a telling name for the attr really helps when reading the code.
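To make the "views on the same data" argument concrete, here is a minimal sketch, not WebOb's implementation. `TwoViewResponse` is a hypothetical class written for Python 3 (`bytes` for the thread's `str`, `str` for `unicode`), and raising `ValueError` when no charset is set is a choice made for this sketch only.

```python
class TwoViewResponse:
    """Hypothetical response exposing bytes and text views of one body."""

    def __init__(self, charset=None):
        self.charset = charset
        self._body = b""

    @property
    def body(self):
        return self._body

    @body.setter
    def body(self, value):
        # The bytes view never needs a charset.
        if not isinstance(value, bytes):
            raise TypeError("body must be bytes; use unicode_body for text")
        self._body = value

    def _require_charset(self):
        if not self.charset:
            raise ValueError("set charset before using unicode_body")
        return self.charset

    @property
    def unicode_body(self):
        # The text view always goes through the charset.
        return self._body.decode(self._require_charset())

    @unicode_body.setter
    def unicode_body(self, value):
        if not isinstance(value, str):
            raise TypeError("unicode_body must be text")
        self._body = value.encode(self._require_charset())


res = TwoViewResponse(charset="utf-8")
res.unicode_body = "caf\u00e9"
raw = res.body             # bytes view
text = res.unicode_body    # text view of the same stored data
```

Both attributes read and write the same stored bytes, so neither one "wins"; the attribute name tells the reader whether a charset is involved.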
I'm still not convinced, but I see your reasoning. Thanks for being patient.
--
Jon
if __debug__:
    from webob.rfc_references import _process_docstrings
    _process_docstrings()
which would use a mapping of attributes to RFC sections and go through
those attributes and instrument their docstrings with references to
the RFC. I think that would clean up the code a little.
In fact that's the idea -- to collect all references in one place,
that way it's easier to see if any RFC sections are not implemented or
not referenced, and the modules themselves have quite a lot going on,
so removing the references in descriptor declarations could be a good
thing.
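One way such a helper might look is sketched below. The `webob.rfc_references` module is only proposed in this thread, so everything here is an assumption: the stand-in `Response` class, the `RFC_SECTIONS` mapping, and the `process_docstrings` function are illustrative names. The section numbers are those of the Content-Length and Content-Type headers in RFC 2616.

```python
# Hypothetical central mapping of attribute names to RFC 2616 sections.
RFC_SECTIONS = {
    "content_length": "14.13",
    "content_type": "14.17",
}


class Response:
    """Stand-in class with two documented properties."""

    @property
    def content_length(self):
        """The Content-Length header as an int."""
        return None

    @property
    def content_type(self):
        """The Content-Type header without parameters."""
        return None


def process_docstrings(cls, sections):
    """Append an RFC reference to the docstring of each mapped property."""
    for name, section in sections.items():
        prop = getattr(cls, name)
        # Docstrings may be None (e.g. under python -OO).
        doc = (prop.__doc__ or "").rstrip()
        doc = (doc + "\n\nSee RFC 2616, section %s." % section).lstrip()
        # Rebuild the property so the new docstring is attached.
        setattr(cls, name, property(prop.fget, prop.fset, prop.fdel, doc))


if __debug__:
    process_docstrings(Response, RFC_SECTIONS)
```

Keeping the mapping in one dictionary also makes it straightforward to list which sections have no attribute pointing at them.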
``.params`` is as good a name as any for the combined parameters, and
changing it would break many applications.
I would like a separate attribute for files, with a more robust API
that does basic security checks (e.g., is file too large, sanitizing
the filename, etc). There's no reason for every application to
reinvent the wheel here. I think Quixote has a smart upload object
that can serve as an example.
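A minimal sketch of the two checks mentioned above, written for Python 3. The function names, the 5 MB limit, and the character whitelist are assumptions made for illustration; they are not taken from WebOb or Quixote.

```python
import re

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical limit


def sanitize_filename(filename):
    """Reduce a client-supplied name to a conservative basename."""
    # Clients may send full paths; keep only the last component,
    # treating both "/" and "\" as separators.
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a small whitelist of characters.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Avoid hidden files and names made only of dots.
    name = name.lstrip(".")
    return name or "upload"


def check_size(size, limit=MAX_UPLOAD_BYTES):
    """Return size unchanged, or raise if it exceeds the limit."""
    if size > limit:
        raise ValueError(
            "upload of %d bytes exceeds limit of %d" % (size, limit))
    return size
```

For example, `sanitize_filename("../../etc/passwd")` yields `"passwd"`, and a Windows-style path such as `C:\Users\me\my file.txt` yields `"my_file.txt"`.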
--
Mike Orr <slugg...@gmail.com>