RE: [cherrypy-devel] Re: Separation of get/post vars


Robert Brewer

Oct 18, 2005, 4:07:13 PM
to cherryp...@googlegroups.com
polaar wrote:
> It's not really a question of security, it's more a
> question of HTTP semantics.

Why, yes, it is. But you have to read the whole spec. ;)

> When doing a GET request, you are retrieving a resource
> identified by a URI...the difference between GET and POST
> can indeed be very important. This also means that params
> in the POSTed message body are data that you "send",
> while params in the URI are actually just part of an
> identifier. It just happens to be that they both use the
> same format (application/x-www-form-urlencoded).

That's not really true. Read sec 3.2.2:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.2

The "http" scheme is used to locate network resources
via the HTTP protocol. This section defines the scheme-
specific syntax and semantics for http URLs.

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

If the port is empty or not given, port 80 is assumed.
The semantics are that the identified resource is located
at the server listening for TCP connections on that port
of that host, and the Request-URI for the resource is abs_path
(section 5.1.2).

In other words, the "query" portion (the params included after the "?")
is *not* part of the Request-URI, and is not used to identify the
resource.

> This is because the people who invented HTML
> (that's where this format is specified) needed a format
> for form data... (see HTML spec:
> http://www.w3.org/TR/html4/interact/forms.html#submit-format)

To be painfully explicit, CherryPy is not an HTML server, it is an HTTP
application server. The HTML spec has precious little to contribute to
the design of CherryPy.

> "http://www.example.com?param=foo" is the identifier of a resource,

Nope. "http://www.example.com" is the identifier. "?param=foo" is the
query portion of the request, and is an instance of what you're calling
"data".

> and param=bar is the data that is POSTed to this resource.

Rather, "param=bar" is data that is POSTed to the
"http://www.example.com" resource.

> Well, to end this post (which has become longer than I intended): I
> hope you get the meaning, and that I've made clear why the distinction
> might be more important than it seems. Contrary to common belief, it's
> just not about two equivalent ways of sending form data...

Sorry, it really is just about two equivalent ways.

> ...params from the POST body seem to
> have complete precedence over URI params. This means
> that the above example would produce {'param':'bar'}.
> HTML forms (and the corresponding format) allow for
> multiple occurrences of the same param, so I would
> expect {'param':['foo','bar']}, which is what you
> would get in other circumstances where the same
> param is provided twice. Is this a bug?

Yes, it's a bug. The new ticket on that subject should fix it. Thanks
for reporting it! :)
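
For illustration, the merge that polaar expected can be sketched as follows. This is a minimal sketch in modern Python 3 using only the standard library; merge_params is a hypothetical helper, not CherryPy code, and collapsing single values to a bare string is an assumption about how the resulting dict should look.

```python
from urllib.parse import parse_qs


def merge_params(query_string, body):
    """Merge form-encoded params from the query string and the POST body.

    Values are collected in order (query string first), so a key that
    appears in both places keeps both values instead of the body
    silently overriding the query string.
    """
    merged = {}
    for source in (query_string, body):
        for key, values in parse_qs(source, keep_blank_values=True).items():
            merged.setdefault(key, []).extend(values)
    # Assumption: a key with a single value is exposed as a bare string.
    return {k: v[0] if len(v) == 1 else v for k, v in merged.items()}


print(merge_params("param=foo", "param=bar"))  # {'param': ['foo', 'bar']}
```

With this approach the precedence question disappears: neither source wins, and the handler sees every value that was sent.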


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org

polaar

Oct 18, 2005, 7:18:10 PM
to cherrypy-devel
Robert Brewer wrote:
> polaar wrote:
> > It's not really a question of security, it's more a
> > question of HTTP semantics.
>
> Why, yes, it is. But you have to read the whole spec. ;)

Well, I did ;-) I don't claim to know or understand it completely,
but I still beg to differ on a few points below. And as it turns
out, you have to read even more than the whole spec.

> > When doing a GET request, you are retrieving a resource
> > identified by a URI...the difference between GET and POST
> > can indeed be very important. This also means that params
> > in the POSTed message body are data that you "send",
> > while params in the URI are actually just part of an
> > identifier. It just happens to be that they both use the
> > same format (application/x-www-form-urlencoded).
>
> That's not really true. Read sec 3.2.2:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.2
>
> The "http" scheme is used to locate network resources
> via the HTTP protocol. This section defines the scheme-
> specific syntax and semantics for http URLs.
>
> http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
>
> If the port is empty or not given, port 80 is assumed.
> The semantics are that the identified resource is located
> at the server listening for TCP connections on that port
> of that host, and the Request-URI for the resource is abs_path
> (section 5.1.2).
>
> In other words, the "query" portion (the params included after the "?")
> is *not* part of the Request-URI, and is not used to identify the
> resource.

Phew, this took me a long while ;-). I couldn't figure out how this
could play together with several things in section 5 (which seemed to
contradict themselves too), until I found this:
http://purl.org/NET/http-errata (via http://www.w3.org/Protocols/).
It says that:
The definition of Request-URI should be:

Request-URI = "*" | absoluteURI | abs_path [ "?" query ] | authority

In other words...
(You had me really confused, I must admit...)

> > This is because the people who invented HTML
> > (that's where this format is specified) needed a format
> > for form data... (see HTML spec:
> > http://www.w3.org/TR/html4/interact/forms.html#submit-format)
>
> To be painfully explicit, CherryPy is not an HTML server, it is an HTTP
> application server. The HTML spec has precious little to contribute to
> the design of CherryPy.

I didn't mean to suggest that cherrypy is an HTML server, I was just
providing a little historic background: it was for html (and in the
html spec) that the format was defined, originally. But I will also
agree that this format has grown beyond HTML, although you will find it
nowhere in the HTTP spec. HTTP does define a query part for urls, but
it never says what the query format should look like.

> > "http://www.example.com?param=foo" is the identifier of a resource,
>
> Nope. "http://www.example.com" is the identifier. "?param=foo" is the
> query portion of the request, and is an instance of what you're calling
> "data".
>
> > and param=bar is the data that is POSTed to this resource.
>
> Rather, "param=bar" is data that is POSTed to the
> "http://www.example.com" resource.

As I understand it (see above), the querystring IS part of (both the
absolute and) the request URI. So it plays a role in identifying the
resource. I don't know if you are convinced yet, but if you aren't:
feel free to give me some more counter-arguments. It could still be I'm
wrong, but it's just how I understand it.

But a more important issue than HTTP technicalities is perhaps what you
as a developer consider to be a resource. Eg: it is easy to see that
the URI for an article stored somewhere with an id "foo" could be
implemented with "articles?id=foo" or with "articles/foo" (and how you
would consider "id=foo" to be part of the identifier). Now add a
parameter "display" (which could have values like "fullcontent" and
"summary") and it becomes a different story. Although (in my opinion
;-)) this parameter would in HTTP terms be part of an identifier, I can
understand you would want to use these as "options" given to a
resource, not as part of the ID.

> > Well, to end this post (which has become longer than I intended): I
> > hope you get the meaning, and that I've made clear why the distinction
> > might be more important than it seems. Contrary to common belief, it's
> > just not about two equivalent ways of sending form data...
>
> Sorry, it really is just about two equivalent ways.
>
> > ...params from the POST body seem to
> > have complete precedence over URI params. This means
> > that the above example would produce {'param':'bar'}.
> > HTML forms (and the corresponding format) allow for
> > multiple occurrences of the same param, so I would
> > expect {'param':['foo','bar']}, which is what you
> > would get in other circumstances where the same
> > param is provided twice. Is this a bug?
>
> Yes, it's a bug. The new ticket on that subject should fix it. Thanks
> for reporting it! :)

You're welcome!

By the way, I have made a post to cherrypy-users which touches on this
subject: I was trying out a decorator for method dispatching (based on
my - hopefully correct ;-) - assumption that the complete URI
determines the resource on which the method is performed), and I ran
into trouble with... separation of querystring/POST params ;-) I didn't
do it on purpose, I swear!
(Note that I still haven't asked to change the cherrypy way of handling
the whole querystring/post data issue, it's just that I hoped there
would still be a way to access the original POST body)

Steven

Robert Brewer

Oct 18, 2005, 8:47:56 PM
to cherryp...@googlegroups.com
[polaar]
> When doing a GET request, you are retrieving a resource
> identified by a URI...the difference between GET and POST
> can indeed be very important. This also means that params
> in the POSTed message body are data that you "send",
> while params in the URI are actually just part of an
> identifier. It just happens to be that they both use the
> same format (application/x-www-form-urlencoded).

[fumanchu]
> That's not really true. Read sec 3.2.2:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.2
>
> The "http" scheme is used to locate network resources
> via the HTTP protocol. This section defines the scheme-
> specific syntax and semantics for http URLs.
>
> http_URL = "http:" "//" host [ ":" port ] [ abs_path [
> "?" query ]]
>
> If the port is empty or not given, port 80 is assumed.
> The semantics are that the identified resource is located
> at the server listening for TCP connections on that port
> of that host, and the Request-URI for the resource is abs_path
> (section 5.1.2).
>
> In other words, the "query" portion (the params included
> after the "?") is *not* part of the Request-URI, and is
> not used to identify the resource.

[polaar]
> Phew, this took me a long while ;-). I couldn't figure out how this
> could play together with several things in section 5 (which seemed to
> contradict themselves too), until I found this:
> http://purl.org/NET/http-errata (via http://www.w3.org/Protocols/).
> It says that:
> The definition of Request-URI should be:
>
> Request-URI = "*" | absoluteURI | abs_path [ "?" query ] |
> authority

I'm not sure what the status of the errata is in general:
http://purl.org/NET/http-errata

New drafts of both of the above are being prepared now,
incorporating all of the following corrections, in prep-
aration for requesting that they be advanced to full
Standard status. If you have an issue with any of these
resolutions, or if you think that you've found another,
you should post it to the HTTP Working Group list and
get the issue discussed there as soon as possible.

Nor am I sure how to treat this one in particular, which seems to be
based on a single email which nobody else ever responded to... hmm...
Although section 5.1.2 is corrected in the errata, section 3.2.2 is not.
Perhaps someone should mention that on the working-group mailing list?
;)

Regardless, until such time as the errata become codified, I think it's
worth sticking to the accepted standard, not only for pedantic reasons,
but also because much of CherryPy's interface is predicated on using
only path info to identify a resource (that is, objects on the
cherrypy.root tree should loosely map to "resources", and currently do
so using path portions of the URI only).

[polaar]
> I didn't mean to suggest that cherrypy is an HTML server, I was just
> providing a little historic background: it was for html (and in the
> html spec) that the format was defined, originally. But I will also
> agree that this format has grown beyond HTML, although you will find
> it nowhere in the HTTP spec. HTTP does define a query part for urls,
> but it never says what the query format should look like.

It didn't need to, because that format was already "defined" by RFC
2396. From RFC 2616:

3.2.1 General Syntax

URIs in HTTP can be represented in absolute form or relative to some
known base URI [11], depending upon the context of their use. The two
forms are differentiated by the fact that absolute URIs always begin
with a scheme name followed by a colon. For definitive information on
URL syntax and semantics, see "Uniform Resource Identifiers (URI):
Generic Syntax and Semantics," RFC 2396 [42] (which replaces RFCs
1738 [4] and RFC 1808 [11]). This specification adopts the
definitions of "URI-reference", "absoluteURI", "relativeURI", "port",
"host","abs_path", "rel_path", and "authority" from that
specification.

That spec (RFC 2396) says that, apart from some reserved characters, the
format is essentially arbitrary. CherryPy, for example, recognizes the
"application/x-www-form-urlencoded" format as a concession to HTML 2+,
but also recognizes the server-side image map (ismap) format. Other MIME
types may be supported in the future, regardless of whether they are
blessed by an HTML spec or not. ;) Perhaps CP needs a new flag,
"processQuerystring" (similar to "processRequestBody"), which can
disable the conversion of the querystring into the paramMap via
known/assumed MIME types.
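
As a rough illustration of what such a flag could control, here is a minimal sketch in modern Python 3. It is not CherryPy's implementation: parse_query and its "process" argument are hypothetical names standing in for the proposed "processQuerystring" flag, and mapping ismap coordinates to the keys 'x' and 'y' is an assumption.

```python
import re
from urllib.parse import parse_qs

# Server-side image maps send the click position as "?x,y".
ISMAP_RE = re.compile(r"^(\d+),(\d+)$")


def parse_query(query_string, process=True):
    """Turn a raw query string into a parameter dict.

    With process=False the query string is left alone and no
    parameters are derived from it.
    """
    if not process:
        return {}
    match = ISMAP_RE.match(query_string)
    if match:
        # Assumption: expose the coordinates under the keys 'x' and 'y'.
        return {"x": int(match.group(1)), "y": int(match.group(2))}
    # Otherwise treat it as application/x-www-form-urlencoded.
    parsed = parse_qs(query_string, keep_blank_values=True)
    return {k: v[0] if len(v) == 1 else v for k, v in parsed.items()}


print(parse_query("10,20"))        # {'x': 10, 'y': 20}
print(parse_query("a=1&a=2&b=3"))  # {'a': ['1', '2'], 'b': '3'}
```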

Parts of RFC 2396, by the way, seem to indicate that the query component
should not be used to identify the resource:

3.4. Query Component

The query component is a string of information to be interpreted by
the resource.


5.2. Resolving Relative References to Absolute Form

This section describes an example algorithm for resolving URI
references that might be relative to a given base URI.

The base URI is established according to the rules of Section 5.1 and
parsed into the four main components as described in Section 3. Note
that only the scheme component is required to be present in the base
URI; the other components may be empty or undefined. A component is
undefined if its preceding separator does not appear in the URI
reference; the path component is never undefined, though it may be
empty. ***The base URI's query component is not used by the
resolution algorithm and may be discarded.***

Of course, no part of the URI ever _defines_ a resource identity:
http://lists.w3.org/Archives/Public/www-tag/2002Nov/0012.html

When *used* as the target of a hypertext link, an "http" URI
identifies both an information resource and an access mechanism for
obtaining representations of that resource from its HTTP origin
server.
Neither one defines the identity of the resource. HTTP never defines
identity -- it hides it, always, and on purpose.

[polaar]
> But a more important issue than HTTP technicalities is
> perhaps what you as a developer consider to be a resource.

I disagree, and so does Tim Berners-Lee in a generic way (oddly, in the
same thread I already mentioned above):
http://lists.w3.org/Archives/Public/www-tag/2002Nov/0016.html

I disagree. It is the model for natural language but not for
specs. W3C and IETF (etc) specs determine what identifiers
identify and languages mean. if you allow arguments that misuse
changes the meaning then you open the whole stack to destruction

[polaar]
> By the way, I have made a post to cherrypy-users which touches on
> this subject: I was trying out a decorator for method dispatching
> (based on my - hopefully correct ;-) - assumption that the complete
> URI determines the resource on which the method is performed),
> and I ran into trouble with...separation of querystring/POST
> params ;-) I didn't do it on purpose, I swear! (Note that I still
> haven't asked to change the cherrypy way of handling the whole
> querystring/post data issue, it's just that I hoped there would
> still be a way to access the original POST body)

If there is a querystring, it will be available as
cherrypy.request.querystring. If you want to process the request body
yourself (rather than having CherryPy do it), set the
"request.processRequestBody" attribute to False in a beforeRequestBody
filter method.

bon...@gmail.com

Oct 19, 2005, 8:37:22 AM
to cherrypy-devel
Alternatively, just build a dict out of the querystring and then "diff"
it (with some form of set operation) against the returned kwargs, should
one really need to distinguish them. But I believe most apps would at
most need to know whether the request is a POST or a GET.
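
A minimal sketch of that "diff", in modern Python 3 with only the standard library. split_kwargs is a hypothetical helper; it assumes the handler received the merged params as a plain dict of keyword arguments.

```python
from urllib.parse import parse_qs


def split_kwargs(query_string, kwargs):
    """Split merged handler kwargs into (from_query, from_body).

    A key counts as coming from the query string if it appears there.
    """
    query_keys = set(parse_qs(query_string, keep_blank_values=True))
    from_query = {k: v for k, v in kwargs.items() if k in query_keys}
    from_body = {k: v for k, v in kwargs.items() if k not in query_keys}
    return from_query, from_body


print(split_kwargs("id=foo", {"id": "foo", "title": "Hello"}))
# ({'id': 'foo'}, {'title': 'Hello'})
```

One limitation of the approach: when the same key is sent in both the query string and the body, a key-based diff cannot tell which value came from where.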

polaar

Oct 19, 2005, 8:56:57 AM
to cherrypy-devel

I agree that it's not really clear. And I have no problem with the way
cherrypy only uses path portions for the mapping. However, I wouldn't
go as far as saying Request-URI == abs_path is the accepted standard (I
assume you don't mean "cherrypy standard"). If you inspect actual
Request-Lines issued by HTTP user agents, you will notice that the
contained Request-URI will always contain the query. (you can check
request.requestLine in cherrypy)
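
To make that concrete, here is a minimal sketch in modern Python 3 that takes a Request-Line apart. The sample line and the parse_request_line helper are made up for illustration; nothing here is CherryPy API.

```python
from urllib.parse import urlsplit


def parse_request_line(request_line):
    """Split 'METHOD Request-URI HTTP-Version' into its parts."""
    method, request_uri, version = request_line.split(" ", 2)
    parts = urlsplit(request_uri)
    return method, request_uri, parts.path, parts.query, version


print(parse_request_line("POST /articles?id=foo HTTP/1.1"))
# ('POST', '/articles?id=foo', '/articles', 'id=foo', 'HTTP/1.1')
```

The Request-URI as sent on the wire carries the query; the abs_path and the query only become separate once the server splits them.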

This is what I meant (but I may not have formulated it clearly enough).
As for the new flag: I don't think it is necessary, because the
original unparsed querystring is still available in
request.queryString.

> Parts of RFC 2396, by the way, seem to indicate that the query component
> should not be used to identify the resource:
>
> 3.4. Query Component
>
> The query component is a string of information to be interpreted by
> the resource.

This is indeed a counter-argument. It seems to be a grey zone. The best
answer to me is what I have found in rfc 1630
(http://www.w3.org/Addressing/rfc1630.txt). (As is stated in
http://www.w3.org/Addressing/ it "documents the designer's intent,
before it was revised by the standards process. It was written by Tim
Berners-Lee, but has only informational status in the IETF") It says
about query strings:

The question mark ("?", ASCII 3F hex) is used to delimit the
boundary between the URI of a queryable object, and a set of words
used to express a query on that object. When this form is used,
the combined URI stands for the object which results from the
query being applied to the original object.

> 5.2. Resolving Relative References to Absolute Form
> This section describes an example algorithm for resolving URI
> references that might be relative to a given base URI.
>
> The base URI is established according to the rules of Section 5.1 and
> parsed into the four main components as described in Section 3. Note
> that only the scheme component is required to be present in the base
> URI; the other components may be empty or undefined. A component is
> undefined if its preceding separator does not appear in the URI
> reference; the path component is never undefined, though it may be
> empty. ***The base URI's query component is not used by the
> resolution algorithm and may be discarded.***

But a little later on, it says that the querystring has to be
re-appended to the result?
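
Both halves of that can be seen with the resolver in Python 3's standard library (urllib.parse.urljoin). This is only an illustration of the resolution rules under made-up example URIs, not of anything CherryPy does.

```python
from urllib.parse import urljoin

base = "http://www.example.com/articles/list?display=summary"

# The reference has no query: the base URI's query is dropped.
plain = urljoin(base, "foo")

# The reference carries its own query: that query is kept in the result.
with_query = urljoin(base, "foo?display=fullcontent")

print(plain)       # http://www.example.com/articles/foo
print(with_query)  # http://www.example.com/articles/foo?display=fullcontent
```

So the query that gets re-appended is the one from the relative reference, while the base URI's own query is the one that is discarded.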

> [polaar]
> > But a more important issue than HTTP technicalities is
> > perhaps what you as a developer consider to be a resource.
>
> I disagree, and so does Tim Berners-Lee in a generic way (oddly, in the
> same thread I already mentioned above):
> http://lists.w3.org/Archives/Public/www-tag/2002Nov/0016.html
>
> I disagree. It is the model for natural language but not for
> specs. W3C and IETF (etc) specs determine what identifiers
> identify and languages mean. if you allow arguments that misuse
> changes the meaning then you open the whole stack to destruction

Again, I may not have expressed myself clearly. You're absolutely right
(although the Tim Berners-Lee quote is actually about something else),
the developer shouldn't redefine HTTP or URI. What I meant is that - as
a cherrypy (or other framework) application developer - it is you who
choose to work with the "queryable resource", or with the resource
resulting from the query (using the terms of rfc 1630). In practical
terms: you choose which one you'd like to turn into an object.

What this means for a framework like cherrypy: it knows that the
abs_path has a hierarchical form, and is therefore suitable for object
mapping. It knows nothing about what the querystring is to be used for,
other than that it should be provided to the object resulting from the
previous step (and in the case of the application/x-www-form-urlencoded
format: that it constitutes a key-value mapping). Conclusion: the
cherrypy approach is a very good one ;-) (I'm even in favour of the
concept of throwing querystring and POST params together to use as
keyword arguments, it's just that I think the ability to access the
original querystring vs. POST params can be important too, and I'm not
sure whether cherrypy doesn't lack something in that aspect)
This means that maybe this discussion is getting a little pointless
(although informative, for me anyway), I hope I'm not wasting your time
;-)

> [polaar]
> > By the way, I have made a post to cherrypy-users which touches on
> > this subject: I was trying out a decorator for method dispatching
> > (based on my - hopefully correct ;-) - assumption that the complete
> > URI determines the resource on which the method is performed),
> > and I ran into trouble with...separation of querystring/POST
> > params ;-) I didn't do it on purpose, I swear! (Note that I still
> > haven't asked to change the cherrypy way of handling the whole
> > querystring/post data issue, it's just that I hoped there would
> > still be a way to access the original POST body)
>
> If there is a querystring, it will be available as
> cherrypy.request.querystring. If you want to process the request body
> yourself (rather than having CherryPy do it), set the
> "request.processRequestBody" attribute to False in a beforeRequestBody
> filter method.
>

Well, the problem is that I'm trying to do everything in the decorator.
Filters don't seem like the right approach for this (too low-level).
What's more: it seems there is no way to set the filter from the
decorator (or is there?), and as filters are inherited down the tree,
it would imply that the filter is also applied for non-decorated
methods, which doesn't sound like a good idea.

Another thing: I've tried experimenting with a filter like you said,
and request.processRequestBody is indeed set to False, but when I want
to access request body, I get "AttributeError: 'thread._local' object
has no attribute 'body'" (even for a very simple example without the
decorator etc)

Steven

Sylvain Hellegouarch

Oct 19, 2005, 9:07:40 AM
to cherryp...@googlegroups.com

> Another thing: I've tried experimenting with a filter like you said,
> and request.processRequestBody is indeed set to False, but when I want
> to access request body, I get "AttributeError: 'thread._local' object
> has no attribute 'body'" (even for a very simple example without the
> decorator etc)

Please have a look at http://www.cherrypy.org/wiki/XForm

There is an example in the attached ZIP file about a filter setting
request.processRequestBody to False.

It might help.

- Sylvain


Sylvain Hellegouarch

Oct 19, 2005, 9:39:14 AM
to cherryp...@googlegroups.com
For people wondering what fumanchu and polaar are still debating,
here is an interesting link:

http://www.cs.tut.fi/~jkorpela/forms/methods.html

It really looks as though there is no clear and official answer to the question.

- Sylvain

Quoting Sylvain Hellegouarch <s...@defuze.org>:

polaar

Oct 19, 2005, 10:35:41 AM
to cherrypy-devel

Hmmm, I see: request.body is not set IF the body is processed (and if
the content-type is not application/x-www-form-urlencoded). That wasn't
clear, although now I know it (from looking at _cphttptools.py), it
seems to be clearly stated in the api docs ;-)

It turns out you have to read from request.rfile instead (for which
purpose you need the content-length), eg:
    length = int(cherrypy.request.headerMap['Content-Length'])
    body = cherrypy.request.rfile.read(length)

I'm not sure the above would work in all cases (e.g. if
Content-Length is not available), but in principle, this would solve at
least the AttributeError problem.

I was wondering, would it be a good idea to:
- make request.rfile.read() by default use the content-length for the
size? (or at least taking care of this in some way instead of leaving
it to the application developer)
- make accessing a request.body of None call the above (if applicable,
maybe returning an empty string if not)
- make an attempt to read from an already consumed rfile read from the
body
This would seem more developer-friendly, as you could just use the
rfile like you would a normal file object, and you would not have to
care about whether the rfile has been consumed into the body or not
etc. (it could be that this would lead to numerous problems, I don't
know, it's just a suggestion)
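
A rough sketch of what such a wrapper might look like, in modern Python 3. RequestBody is a hypothetical class, not part of CherryPy; it assumes the headers are available as a plain dict and that a missing Content-Length means there is no body to read.

```python
import io


class RequestBody:
    """File-like wrapper that reads at most Content-Length bytes, once.

    The first read() consumes the underlying rfile and caches the
    result; later calls return the cached bytes instead of blocking
    on an already consumed stream.
    """

    def __init__(self, rfile, headers):
        self._rfile = rfile
        # Assumption: no Content-Length header means an empty body.
        self._length = int(headers.get("Content-Length", 0))
        self._body = None

    def read(self):
        if self._body is None:
            if self._length:
                self._body = self._rfile.read(self._length)
            else:
                self._body = b""
        return self._body


rfile = io.BytesIO(b"param=bar<bytes of the next request>")
body = RequestBody(rfile, {"Content-Length": "9"})
print(body.read())  # b'param=bar'
print(body.read())  # b'param=bar' (served from the cache)
```

Bounding the read by Content-Length matters on a real socket, where an unbounded read() would wait for the client to close the connection.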

Steven

Sylvain Hellegouarch

Oct 19, 2005, 10:52:53 AM
to cherryp...@googlegroups.com

> Hmmm, I see: request.body is not set IF the body is processed (and if
> the content-type is not application/x-www-form-urlencoded). That wasn't
> clear, although now I know it (from looking at _cphttptools.py), it
> seems to be clearly stated in the api docs ;-)

I agree it wasn't clear. The filter tries to handle two cases of form POSTing
by FormFaces (an XForm implementation).

>
> It turns out you have to read from request.rfile instead (for which
> purpose you need the content-length), eg:
> length = int(cherrypy.request.headerMap['Content-Length'])
> body = cherrypy.request.rfile.read(length)

RFC 2616 says:

The presence of a message-body in a request is signaled by the inclusion of a
Content-Length or Transfer-Encoding header field in the request's
message-headers.

I agree it might not be safe to assume that all user agents follow that rule.

>
> I'm not sure the above would work in all occasions (ie. if
> Content-Length is not available), but in principle, this would solve at
> least the AttributeError problem.

RFC 2616 gives some hints on that occasion:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

You can also raise a 411 error:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.12
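
A minimal sketch of that approach in modern Python 3, for a request that is expected to carry a body (a POST, say). LengthRequired and read_post_body are hypothetical names, the headers are assumed to be a plain dict with canonical header names, and chunked transfer coding is deliberately left out.

```python
import io


class LengthRequired(Exception):
    """Signals that the server should answer 411 Length Required."""
    status = 411


def read_post_body(rfile, headers):
    """Read the entity body of a request that is expected to have one."""
    if "Transfer-Encoding" in headers:
        # Decoding chunked bodies is out of scope for this sketch.
        raise NotImplementedError("Transfer-Encoding is not handled here")
    raw_length = headers.get("Content-Length")
    if raw_length is None:
        # No way to know how much to read: refuse rather than block.
        raise LengthRequired("411 Length Required")
    return rfile.read(int(raw_length))


print(read_post_body(io.BytesIO(b"param=bar"), {"Content-Length": "9"}))
# b'param=bar'
```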

>
> I was wondering, would it be a good idea to:
> - make request.rfile.read() by default use the content-length for the
> size? (or at least taking care of this in some way instead of leaving
> it to the application developer)
> - make accessing a request.body of None call the above (if applicable,
> maybe returning an empty string if not)
> - make an attempt to read from an already consumed rfile read from the
> body
> This would seem more developer-friendly, as you could just use the
> rfile like you would a normal file object, and you would not have to
> care about whether the rfile has been consumed into the body or not
> etc. (it could be that this would lead to numerous problems, I don't
> know, it's just a suggestion)

True, and it might already be possible without my being aware of it.

My filter is definitely a test case and not THE example to follow but it might
give you some hints on how to go with your ideas.

Robert Brewer

Oct 19, 2005, 11:31:07 AM
to cherryp...@googlegroups.com
Sylvain wrote:
> For people still wondering what fumanchu and polaar are still
> debating about, here is an interesting link:
> http://www.cs.tut.fi/~jkorpela/forms/methods.html

Yes, but our discussion goes a bit further, addressing the case when the
METHOD is POST but the URI also contains a query string.

Sylvain Hellegouarch

Oct 19, 2005, 11:37:07 AM
to cherryp...@googlegroups.com
Yeap :)

However the link I've provided is a nice summary of the base of the discussion
:)

But yes, your discussion goes further.

Quoting Robert Brewer <fuma...@amor.org>:

Robert Brewer

Oct 19, 2005, 6:14:53 PM
to cherryp...@googlegroups.com
polaar wrote:
> I agree that it's not really clear. And I have no problem
> with the way cherrypy only uses path portions for the mapping.
> However, I wouldn't go as far as saying Request-URI == abs_path
> is the accepted standard (I assume you don't mean "cherrypy
> standard"). If you inspect actual Request-Lines issued by
> HTTP user agents, you will notice that the contained
> Request-URI will always contain the query. (you can check
> request.requestLine in cherrypy)

Yes, but we're starting to mix terms here, or at least talk past each
other. I'm interested in how the "Request-URI" identifies a "resource".
It seems to me that, when RFC 2616 (HTTP/1.1) and 2396 (URI) were
written, the identification of a "resource" did not include the "query"
component of a "Request-URI". However, the most recent RFC 3986
explicitly states that the query component is part of that
identification:
http://www.ietf.org/rfc/rfc3986.txt

3.4. Query

The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any).

In addition, Roy Fielding explicitly answered this topic on
comp.web.services.rest back in 2002:
http://article.gmane.org/gmane.comp.web.services.rest/319

BTW, for server-constructed URI, there is no effective difference
between query and path info -- both distinguish a resource. It is
only when the URI is client-constructed (as in GET forms or ISINDEX
or the old server-side image maps) that the client has any way to
distinguish the non-query path as a resource, and even then the
various URI with the query info tagged-onto the end identify separate
resources.

A resource is defined not by its implementation, but by a user's
expectation of results from future actions on that resource.
If two URI are expected to respond to GET differently, then they
must identify different resources.

All of which is to say, I wanted the REST concept of "resource" to map
more or less to the CherryPy concept of "exposed callable", but I can
see now that it doesn't and can't. It *does* map to the (callable,
request.querystring) tuple. I think what CP needs is:

1. A definitive term for CP callables besides "exposed callables",
"things on the cherrypy.root tree" or "page handlers", all of which
terms fall short of the actual role they play.

2. An explicit explanation, in the book, of the relationship between a
REST "resource", a REST "representation", and CP callables and
cherrypy.request attributes. In particular, it should say that the
standard CherryPy dispatching mechanism (mapPathToObject + paramMap) is
only one way of mapping requests to resources/representations, and that
this process can be circumvented (via filters) or ignored (in the page
handlers) if desired.

3. We should probably set cherrypy.request.body to the entity body
regardless of its Content-Type (at the moment, it's only set if the body
is not application/x-www-form-urlencoded).
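
To illustrate the kind of mapping being discussed, here is a toy dispatcher in modern Python 3. It is a sketch of the general idea only (walk the path segments down an object tree, then pass the merged params as keyword arguments) and not CherryPy's mapPathToObject; Root, Articles and dispatch are hypothetical names, and only the "exposed" attribute follows the CherryPy convention.

```python
from urllib.parse import parse_qs, urlsplit


class Articles:
    def index(self, id=None, display="fullcontent"):
        return "article %s (%s)" % (id, display)
    index.exposed = True


class Root:
    articles = Articles()


def dispatch(root, request_uri, body=""):
    """Map the path to a handler, then call it with the merged params."""
    parts = urlsplit(request_uri)
    node = root
    for segment in filter(None, parts.path.split("/")):
        # Refuse private names and unknown path segments.
        if segment.startswith("_") or not hasattr(node, segment):
            raise LookupError(request_uri)
        node = getattr(node, segment)
    handler = node if callable(node) else getattr(node, "index", None)
    if not getattr(handler, "exposed", False):
        raise LookupError(request_uri)
    # Only the path chose the handler; query and body become arguments.
    params = {}
    for source in (parts.query, body):
        for key, values in parse_qs(source).items():
            params.setdefault(key, []).extend(values)
    kwargs = {k: v[0] if len(v) == 1 else v for k, v in params.items()}
    return handler(**kwargs)


print(dispatch(Root(), "/articles?id=foo&display=summary"))
# article foo (summary)
```

In this sketch the same handler serves every query string, which is why the handler alone cannot stand for a REST resource: the (handler, query string) pair is what distinguishes one from another.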

> As for the new flag: I don't think it is necessary, because the
> original unparsed querystring is still available in
> request.queryString.

Perhaps. I guess if a filter is going to use the querystring as part of
the resource identifier, then it's probably also going to override the
standard CP dispatch mechanism via a beforeMain method.

> The question mark ("?", ASCII 3F hex) is used to delimit the
> boundary between the URI of a queryable object, and a set of
> words used to express a query on that object. When this form
> is used, the combined URI stands for the object which results
> from the query being applied to the original object.

The problem is that "object" is ambiguous. In REST terms, it is being
used to mean both "handler" and "representation". :/

> What I meant is that - as a cherrypy (or other framework)
> application developer - it is you who choose to work with
> the "queryable resource", or with the resource resulting
> from the query (using the terms of rfc 1630). In practical
> terms: you choose which one you'd like to turn into an object.

Yes; the problem is that "the terms of rfc 1630" really aren't useful
anymore. HTTP/1.1 came about in part to address the content-negotiation
aspects which were missing from HTTP/1.0, and to bring HTTP more in line
with the REST ideal: that resources are never exchanged (only
representations of those resources), and that resources are not
"objects", but mappings. From
http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluation.htm:

An origin server maintains a mapping from resource identifiers
to the set of representations corresponding to each resource.

and

The resource is a conceptual mapping -- the server receives
the identifier (which identifies the mapping) and applies it
to its current mapping implementation (usually a combination
of collection-specific deep tree traversal and/or hash tables)
to find the currently responsible handler implementation and
the handler implementation then selects the appropriate
action+response based on the request content.

> What this means for a framework like cherrypy: it knows that the
> abs_path has an hierarchical form, and is therefore suitable
> for object mapping. It knows nothing about what the querystring
> is to be used for, other than that it should be provided to the
> object resulting from the previous step (and in the case of the
> application/x-www-form-urlencoded format: that it constitutes
> a key-value mapping). Conclusion: the cherrypy approach is a
> very good one ;-) (I'm even in favour of the concept of throwing
> querystring and POST params together to use as keyword arguments,
> it's just that I think the ability to access the original
> querystring vs. POST params can be important too, and I'm not
> sure whether cherrypy doesn't lack something in that aspect)
> This means that maybe this discussion is getting a little pointless
> (although informative, for me anyway), I hope I'm not wasting
> your time ;-)

Dead on. And not a waste at all. :)

> > [polaar]
> > > By the way, I have made a post to cherrypy-users which touches on
> > > this subject: I was trying out a decorator for method dispatching
> > > (based on my - hopefully correct ;-) - assumption that the
> > > complete URI determines the resource on which the method is
> > > performed), and I ran into trouble with...separation of
> > > querystring/POST params ;-) I didn't do it on purpose, I swear!
> >
> > If there is a querystring, it will be available as
> > cherrypy.request.querystring. If you want to process the
> > request body yourself (rather than having CherryPy do it),
> > set the "request.processRequestBody" attribute to False
> > in a beforeRequestBody filter method.
>
> Well, the problem is that I'm trying to do everything in the
> decorator.
> Filters don't seem like the right approach for this (too low-level).
> What's more: it seems there is no way to set the filter from the
> decorator (or is there?), and as filters are inherited down the tree,
> it would imply that the filter is also applied for non-decorated
> methods, which doesn't sound like a good idea.

Anytime you're overriding the dispatch mechanism, you should probably
use a filter, since only they are able to circumvent the built-in
dispatch mechanism (_cphttptools.main). It's unfortunate that they
inherit in your case. One of my dreams for CP 2.2 has been to better
support various dispatch mechanisms without using filters. That is, CP
would massage the request data into usable structures, which various
dispatchers would then use to identify, call, supply args to, and
translate the output of, the correct handler.

> Another thing: I've tried experimenting with a filter like you said,
> and request.processRequestBody is indeed set to False, but when I want
> to access request body, I get "AttributeError: 'thread._local' object
> has no attribute 'body'" (even for a very simple example without the
> decorator etc)

Right. Currently, request.body is only set if the entity Content-Type is
*not* "application/x-www-form-urlencoded". That should probably be
changed as I noted above, in order to facilitate your use case. I'll see
if we can sneak that into 2.1 final before it's released.

Istvan Albert

Oct 20, 2005, 8:59:11 AM10/20/05
to cherrypy-devel
> Yes, but our discussion goes a bit further, addressing the case when the
> METHOD is POST but the URI also contains a query string.

What's kind of odd about this whole discussion is that it was brought
on by a completely misunderstood use case (post parameters are
"safer"). The matter is further confounded by the fact that one could
not use a GET method with a URI that contains a query since it would
result in a URI that contains two ? symbols:

http://foo.com?foo=1?foo=2

which is not a valid URI anymore. Then one must ask, why would you guys
want to support one peculiar case where the method was POST and the URI
already contains a query string?

Seems like a feature that is needed only in cases where one is not
doing the right thing...
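
For reference, here is a minimal sketch of how a generic parser reads the doubled-"?" URL above. It uses Python's standard urllib.parse (an assumption of this note, not something CherryPy is said to use) and only shows where the parser cuts the string; it does not settle whether the URI is valid.

```python
from urllib.parse import urlsplit, parse_qs

# The URL from the example above, with two '?' characters.
url = "http://foo.com?foo=1?foo=2"

parts = urlsplit(url)

# The parser splits at the first '?'; everything after it is the query.
query = parts.query

# The second '?' therefore ends up inside the value of the first 'foo'.
params = parse_qs(query)

print(parts.netloc, repr(parts.path), query, params)
```

So the second "foo=2" is not seen as a separate parameter at all, which is one more reason a browser cannot simply stack a new querystring on top of an old one.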

Istvan.

Sylvain Hellegouarch

Oct 20, 2005, 9:21:40 AM10/20/05
to cherryp...@googlegroups.com
Istvan,

Interestingly the sentence you quoted specifies clearly the case where you POST
a request which already contains a query string in its URI. We haven't really
talked about doing a GET in that case AFAIK.

So although your point is valid and does make sense, I don't see where this
discussion was confusing to you.

The discussion started because Steven wanted to know whether or not CP should
make a distinction between POST and GET parameters :)

Anyway, your point makes sense of course.

- Sylvain

Quoting Istvan Albert <istvan...@gmail.com>:

polaar

Oct 20, 2005, 9:42:47 AM10/20/05
to cherrypy-devel
Istvan Albert wrote:
> > Yes, but our discussion goes a bit further, addressing the case when the
> > METHOD is POST but the URI also contains a query string.
>
> What's kind of odd about this whole discussion is that it was brought
> on by a completely misunderstood use case (post parameters are
> "safer").

Well, I guess we just got carried away... ;-)

> The matter is further confounded by the fact that one could
> not use a GET method with a URI that contains a query since it would
> result in a URI that contains two ? symbols:
>
> http://foo.com?foo=1?foo=2
>
> which is not a valid URI anymore.

Hmmm... I'm assuming you're talking about using the GET method in an
HTML form, as it is theoretically always possible to append things to
an existing querystring (just not with an extra '?'). If you take the
strict wording of the HTML spec, a browser should indeed do exactly
what you stated. Kind of a surprise for me, I must say, I'd assumed the
browser was supposed to add the querystring to the URI, with either a
'?' or a '&', depending on the presence of an existing querystring. But
it literally says they should add a '?' and the querystring, which
should result in an invalid URI, as you say.

Now, as it turns out, this is not what browsers do (at least not
Firefox and IE): they throw away the existing querystring, and add the
new one. This would suggest that (at least for this purpose), they
don't consider the querystring to be part of the URI! Grmmbl! Just when
it seemed that we were getting to agree that the querystring IS part of
the URI. What is somewhat comforting, is that the HTML spec explicitly
considers the result of appending the querystring to the URI of the
action attribute... a URI!
see: http://www.w3.org/TR/html401/interact/forms.html#submit-format
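
The three behaviours discussed above can be sketched as follows. This is only an illustration using Python's urllib.parse; the function names and the example action URL are made up here, and real browsers handle more cases (fragments, encodings) than this does.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def literal_append(action, form_data):
    """Strict reading of the HTML wording: action + '?' + form data set."""
    return action + "?" + urlencode(form_data)

def replace_query(action, form_data):
    """What Firefox and IE are reported to do above: drop the old query."""
    p = urlsplit(action)
    # Any fragment on the action URL is dropped in this sketch.
    return urlunsplit((p.scheme, p.netloc, p.path, urlencode(form_data), ""))

def merge_query(action, form_data):
    """The behaviour I had assumed: extend the existing query with '&'."""
    p = urlsplit(action)
    extra = urlencode(form_data)
    query = p.query + "&" + extra if p.query else extra
    return urlunsplit((p.scheme, p.netloc, p.path, query, ""))

# Hypothetical form: <form method="get" action="...search?lang=en">
action = "http://www.example.com/search?lang=en"
form = {"q": "cherrypy"}

for build in (literal_append, replace_query, merge_query):
    print(build.__name__, build(action, form))
```

Only the first variant produces the doubled "?"; the second silently loses "lang=en", which is the behaviour that surprised me.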

> Then one must ask, why would you guys
> want to support one peculiar case where the method was POST and the URI
> already contains a query string?
> Seems like a feature that is needed only in cases where one is not
> doing the right thing...
>
> Istvan.

Wait a minute, you were talking about problems using GET (form method)
with URI's that already contain a querystring. Now you're talking about
POST? I'm afraid I don't see what you mean. (as the POST case could
never result in an invalid URI)

Steven

polaar

Oct 20, 2005, 10:17:56 AM10/20/05
to cherrypy-devel
So you're turning away from your original standpoint ;-)? (which was
more or less: querystring does not play a role in identifying the
resource, and both querystring and POST params are just data provided
to the resource)

Robert Brewer wrote:
> All of which is to say, I wanted the REST concept of "resource" to map
> more or less to the CherryPy concept of "exposed callable", but I can
> see now that it doesn't and can't. It *does* map to the (callable,
> request.querystring) tuple. I think what CP needs is:
>
> 1. A definitive term for CP callables besides "exposed callables",
> "things on the cherrypy.root tree" or "page handlers", all of which
> terms fall short of the actual role they play.

Oh, so that's what "page handler" means ;-) I don't really have
anything against "exposed callables", as long as it's sufficiently
defined what "exposed" means, but maybe a better term can be found. It
would be useful to always use the same term though.

> 2. In the book, to explicitly spell out the relationship between a REST
> "resource", a REST "representation", and CP callables and
> cherrypy.request attributes. In particular, the standard CherryPy
> dispatching mechanism (mapPathToObject + paramMap) is only one way of
> mapping requests to resources/representations, and that this process can
> be circumvented (via filters) or ignored (in the page handlers) if
> desired.

If you want the book to explain about the way REST terms relate to CP
terms, it might be useful to explain that to be really "RESTful", you
shouldn't map using POST params (which are included in paramMap).
Again, I think mapPathToObject + paramMap is a great solution for 9 out
of 10 use cases. But if you're explicitly spelling out the
relationship, this should be mentioned.

> 3. We should probably set cherrypy.request.body to the entity body
> regardless of its Content-Type (at the moment, it's only set if the body
> is not application/x-www-form-urlencoded).

This would be nice (and consistent with request.querystring, which is
also kept in original form, even though it is parsed into
request.paramMap). Maybe something like request.queryStringParamMap and
request.postParamMap (perhaps with nicer names) would be useful too.
(even if just to avoid the developer having to use cgi.parse_qs
himself)
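
A rough sketch of what I mean by separate maps, written against plain strings rather than the real cherrypy.request object. The function name split_params is hypothetical, and it uses the standard library parser from current Python (urllib.parse.parse_qs) in place of cgi.parse_qs.

```python
from urllib.parse import parse_qs

FORM_TYPE = "application/x-www-form-urlencoded"

def split_params(query_string, body, content_type):
    """Return (query_params, post_params) without merging the two.

    query_string: the raw part after '?' in the request URI
    body:         the raw entity body, as text
    content_type: the request's Content-Type header value
    """
    query_params = parse_qs(query_string, keep_blank_values=True)
    # Only treat the body as parameters when it is urlencoded form data.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == FORM_TYPE:
        post_params = parse_qs(body, keep_blank_values=True)
    else:
        post_params = {}
    return query_params, post_params

# A POST to /item?id=7 whose form body also carries an 'id' field.
query_params, post_params = split_params("id=7", "id=8&name=x", FORM_TYPE)
print(query_params, post_params)
```

With a merged paramMap you could not tell which "id" came from the URI; with two maps the handler can decide for itself.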

> > As for the new flag: I don't think it is necessary, because the
> > original unparsed querystring is still available in
> > request.queryString.
>
> Perhaps. I guess if a filter is going to use the querystring as part of
> the resource identifier, then it's probably also going to override the
> standard CP dispatch mechanism via a beforeMain method.

I'm not too sure about using filters for this purpose. I was under the
impression that they were mainly meant for making (low-impact) changes
to request and response, not to override the dispatching mechanism.
Most included filters seem to follow this rule. One of the problems
when using filters for such changes to the CP mechanism is that it can
be hard to know if somewhere higher up the tree, a filter is being
applied which completely changes the way everything works. Other things
like decorators, subclassing or simply further dispatching yourself do
not have this problem (or not to the same extent).

> > The question mark ("?", ASCII 3F hex) is used to delimit the
> > boundary between the URI of a queryable object, and a set of
> > words used to express a query on that object. When this form
> > is used, the combined URI stands for the object which results
> > from the query being applied to the original object.
>
> The problem is that "object" is ambiguous. In REST terms, it is being
> used to mean both "handler" and "representation". :/

Well, I guess that if this text would have been written more recently,
"object" would have been replaced by "resource".

> > What I meant is that - as a cherrypy (or other framework)
> > application developer - it is you who choose to work with
> > the "queryable resource", or with the resource resulting
> > from the query (using the terms of rfc 1630). In practical
> > terms: you choose which one you'd like to turn into an object.
>
> Yes; the problem is that "the terms of rfc 1630" really aren't useful
> anymore. HTTP/1.1 came about in part to address the content-negotiation
> aspects which were missing from HTTP/1.0, and to bring HTTP more in line
> with the REST ideal: that resources are never exchanged (only
> representations of those resources), and that resources are not
> "objects", but mappings. From
> http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluation.htm:
>
> An origin server maintains a mapping from resource identifiers
> to the set of representations corresponding to each resource.
>
> and
>
> The resource is a conceptual mapping -- the server receives
> the identifier (which identifies the mapping) and applies it
> to its current mapping implementation (usually a combination
> of collection-specific deep tree traversal and/or hash tables)
> to find the currently responsible handler implementation and
> the handler implementation then selects the appropriate
> action+response based on the request content.

Indeed, and this means that the actual mapping implementation is (to
the client) irrelevant. Applied to my above "developer choice" comment:
you can choose how far you want to go in explicitly mapping identifiers
to objects (I mean in this case "python objects", as the term "object"
is getting rather ambiguous). You could:
- use pathToObject mapping, and return responses from there, thereby -
implicitly - implementing a new mapping (and thus a resource), because
you're returning a representation (this is actually only done on a GET,
but I hope you get my meaning) of a different conceptual resource for
different URI's. Even if you're not actually creating corresponding
python objects for each URI, you're implementing a new mapping because
you're reacting differently depending on the contents of the
querystring. This is what you would normally do in CP (with some allowances
for things like the - extremely useful - default() method), and it
works well for the majority of use cases.
- go one step further and explicitly create such a corresponding
object. This is more or less what I was trying to do in my @restful.expose
decorator recipe, meanwhile adding HTTP method dispatching. (although
it still allows for reacting on the querystring params during execution
of the GET() method for example)
- take it all the way and automatically map to objects based on the
whole URI (don't know how this should be done though)
- probably use lots of other solutions

As I've said above: I'm not too sure about filters for this purpose.
Furthermore, I was actually trying to find something that didn't need
to override the built-in dispatch mechanism. Just adding a little
dispatching afterwards. To put it simply, it's actually just about not
having to write things like:

    @cherrypy.expose
    def foo(self, **kwargs):
        kwargs = cgi.parse_qs(cherrypy.request.queryString)
        resource = Resource(**kwargs)
        if cherrypy.request.method == 'GET':
            return resource.GET()
        # elif other methods here if needed
        else:
            cherrypy.response.headerMap['Allow'] = 'GET'
            raise cherrypy.HTTPStatusError('405')

(well, you could do a better dispatching than using if, elif... but the
point was to have the decorator take care of that)
but instead writing:

    @restful.expose
    def foo(self, **kwargs):
        return Resource(**kwargs)

I don't think it's really "overriding" the dispatch mechanism, do you?
And it seems strange having to use filters for something like that...
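
To make the idea concrete, here is a framework-free sketch of such a decorator. It is not the actual recipe posted to cherrypy-users: restful_expose, MethodNotAllowed and the get_method callable (a stand-in for reading cherrypy.request.method) are all names invented for this illustration, and it leaves out the part that rebuilds kwargs from the querystring alone.

```python
import functools

HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE")

class MethodNotAllowed(Exception):
    """Stand-in for a 405 response; 'allowed' would feed the Allow header."""
    def __init__(self, allowed):
        super().__init__("405 Method Not Allowed")
        self.allowed = allowed

def restful_expose(get_method):
    """Wrap a resource factory so the call is forwarded to resource.<METHOD>()."""
    def decorator(factory):
        @functools.wraps(factory)
        def handler(*args, **kwargs):
            resource = factory(*args, **kwargs)
            method = get_method()
            action = getattr(resource, method, None) if method in HTTP_METHODS else None
            if action is None:
                allowed = [m for m in HTTP_METHODS if hasattr(resource, m)]
                raise MethodNotAllowed(allowed)
            return action()
        handler.exposed = True  # same marker attribute cherrypy.expose sets
        return handler
    return decorator

class Resource:
    def __init__(self, **kwargs):
        self.params = kwargs
    def GET(self):
        return "GET with %r" % (self.params,)

current = {"method": "GET"}  # pretend request state for the demo

@restful_expose(lambda: current["method"])
def foo(**kwargs):
    return Resource(**kwargs)

print(foo(id="7"))
```

The handler body stays a one-liner, and the if/elif chain on the request method lives in one place.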

> > Another thing: I've tried experimenting with a filter like you said,
> > and request.processRequestBody is indeed set to False, but when I want
> > to access request body, I get "AttributeError: 'thread._local' object
> > has no attribute 'body'" (even for a very simple example without the
> > decorator etc)
>
> Right. Currently, request.body is only set if the entity Content-Type is
> *not* "application/x-www-form-urlencoded". That should probably be
> changed as I noted above, in order to facilitate your use case. I'll see
> if we can sneak that into 2.1 final before it's released.

Well, it's not that I really need it now, so you don't have to sneak it
in just for me. But I think it would be a good idea.

Steven

Istvan Albert

Oct 20, 2005, 10:29:15 AM10/20/05
to cherrypy-devel
> Wait a minute, you were talking about problems using GET (form method)
> with URI's that already contain a querystring. Now you're talking about
> POST? I'm afraid I don't see what you mean.

what I was trying to say there was that this whole issue of submitting
a form to a URI that already contains a query element is ill defined,
and it seems that it cannot be solved universally for both form
submission methods.

Why have machinery to support one case, POST, if you can't do GET? All
it will do is sow confusion about how and when it actually works; it's a
source of complexity and bugs, seemingly without a clear payoff in the
end.

(Personally I think that is just an oversight in the spec, and one
should not be able to submit a form to a URI that already has a query
string in it. As demonstrated in this thread, it raises way too many
questions without a good answer to them)

Istvan.

Sylvain Hellegouarch

Oct 20, 2005, 10:37:04 AM10/20/05
to cherryp...@googlegroups.com
I quite agree to be honest. But then the problem is what should CP do about such
corner cases?

Maybe raising 400?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1

Although it would be a strong response in terms of semantics. I'm not sure it'd
be the right one.

Quoting Istvan Albert <istvan...@gmail.com>:

polaar

Oct 20, 2005, 11:51:13 AM10/20/05
to cherrypy-devel
I don't really agree to be honest ;-). The only thing that is ill
defined is the desired behaviour of user agents for HTML form
submission if the URI contains a querystring AND IF the method is GET.
No problems at all for POST. (browsers do not throw away the query part
for POST forms, by the way)
To be exact: the problem never even reaches the server. It's the HTML
user agent that has to deal with the following conflict:
- The URI format doesn't allow stacking querystrings on top of each
other
- yet this is exactly what it's supposed to do according to the HTML
spec
- what to do?

It basically has four choices:
- refusing to resolve the conflict, and thus not submitting the form
(hopefully informing the user)
- changing the method to POST, which would also be against the spec
(and unacceptable because POST is an 'unsafe' method. Since GET has no
side effects, it's best to keep the method)
- deciding that what the HTML spec really meant was adding the form
dataset to the existing querystring (as opposed to adding a second '?'
and querystring), which was my first thought
- deciding that:
a) what the HTML spec meant was that the GET method for forms should
not be used with URI's containing a querystring (Istvan's idea),
concluding that it must be an author error, and throwing away the
querystring of the action URI.
b) the HTML spec may be wrong, but the author should be aware of its
limitations; it is therefore still an author error, and the user agent
proceeds as in a)

And it seems like the consensus was to take the last solution. Oh
well...
In any case, it certainly is an oversight in the HTML spec, but this
only concerns HTML form submission by the user agent! (and authoring
HTML forms of course)

As for not supporting form submission to URI's containing querystrings:
- you cannot 'disable' this for GET on the server, as you cannot know
whether the request URI resulted from a form or something else (and
which parts came from the form, and which from the action URI)
- so why disable it for the other method, to paraphrase Istvan? (and
for which content-types, only for application/x-www-form-urlencoded?
This makes little sense, as it has nothing to do with the URI, the
request body just happens to be in a similar 'urlencoded' format) I'm
all for promoting "RESTful URI's", and resorting to querystrings only
when necessary (say for search terms for 'real' query result pages),
but this goes very far... (well, I guess you could also argue that
it's just about enforcing "good practices" on CP application
developers...)

(And again: GET and POST are not just "form submission methods", HTML
forms just happen to support both these methods, using a similar form
dataset->string conversion method)

About 400: I've always been a bit uncertain about that. Does "malformed
syntax" mean "malformed according to the http spec" (which seems most
likely), or does it mean "malformed according to the application
developer". In the first case, 400 would be wrong? On the other hand:
400 (Bad Request) is also the "generic" status code for client errors
(which officially only means that unrecognized 4xx errors should be
treated as 400, but it suggests that 400 can be used for more generic
purposes?) 403 Forbidden? ("the server understood the request, but is
refusing to fulfill it")

Anyway, I wouldn't do it ;-)

Robert Brewer

Oct 20, 2005, 1:40:35 PM10/20/05
to cherryp...@googlegroups.com
polaar wrote:
> So you're turning away from your original standpoint ;-)? (which was
> more or less: querystring does not play a role in identifying the
> resource, and both querystring and POST params are just data provided
> to the resource)

Yup.

> > 1. A definitive term for CP callables besides "exposed callables",
> > "things on the cherrypy.root tree" or "page handlers", all of which
> > terms fall short of the actual role they play.
>
> Oh, so that's what "page handler" means ;-) I don't really have
> anything against "exposed callables", as long as it's sufficiently
> defined what "exposed" means, but maybe a better term can be found. It
> would be useful to always use the same term though.

In the book, I just now used the term "handler", since that's what
Fielding used in his dissertation. I think it's fine as long as it's not
expanded to "page handler", since not every request results in a "page"
or page-like response.

> > 2. In the book, to explicitly spell out the relationship
> between a REST
> > "resource", a REST "representation", and CP callables and
> > cherrypy.request attributes. In particular, the standard CherryPy
> > dispatching mechanism (mapPathToObject + paramMap) is only
> one way of
> > mapping requests to resources/representations, and that
> this process can
> > be circumvented (via filters) or ignored (in the page handlers) if
> > desired.
>
> If you want the book to explain about the way REST terms relate to CP
> terms, it might be useful to explain that to be really "RESTful", you
> shouldn't map using POST params (which are included in paramMap).
> Again, I think mapPathToObject + paramMap is a great solution
> for 9 out
> of 10 use cases. But if you're explicitly spelling out the
> relationship, this should be mentioned.

I chose not to mention paramMap at all.

> > 3. We should probably set cherrypy.request.body to the entity body
> > regardless of its Content-Type (at the moment, it's only
> set if the body
> > is not application/x-www-form-urlencoded).
>
> This would be nice (and consistent with request.querystring, which is
> also kept in original form, even though it is parsed into
> request.paramMap). Maybe something like
> request.queryStringParamMap and
> request.postParamMap (perhaps with nicer names) would be useful too.
> (even if just to avoid the developer having to use cgi.parse_qs
> himself)

I'm not so sure. Dispatch based on the querystring is designed to be
opaque; that's why the query is considered to be part of the
Request-URI. By being opaque, proxies and gateways, for example, can
handle querystrings as part of the URI without regard for their
semantics. Therefore, I don't think it's really appropriate for CP to
set those request attributes. What it should do instead, is provide a
bare function in lib/cptools which does the parsing for you:

    def parse_qs(qs, keep_blank_values=True):
        if re.match(r"[0-9]+,[0-9]+", qs):
            # Server-side image map. Map the coords to 'x' and 'y'
            # (like CGI::Request does).
            pm = qs.split(",")
            pm = {'x': int(pm[0]), 'y': int(pm[1])}
        else:
            pm = cgi.parse_qs(qs, keep_blank_values)
            for key, val in pm.items():
                if len(val) == 1:
                    pm[key] = val[0]
        return pm

Then _cphttptools.Request.processRequestHeaders would call that (since I
stole the function body from it).
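
For anyone trying this on a current Python, where cgi.parse_qs is no longer available, a rough port might look like the following. The name parse_query is made up for this sketch, it swaps in urllib.parse.parse_qs, and it uses re.fullmatch, which is slightly stricter than the re.match in the function above because it also anchors the end of the string.

```python
import re
from urllib.parse import parse_qs as _parse_qs

def parse_query(qs, keep_blank_values=True):
    """Parse a raw querystring into a dict, collapsing single values."""
    if re.fullmatch(r"[0-9]+,[0-9]+", qs):
        # Server-side image map: "x,y" coordinates rather than key=value pairs.
        x, y = qs.split(",")
        return {"x": int(x), "y": int(y)}
    params = _parse_qs(qs, keep_blank_values=keep_blank_values)
    # One value -> plain string; repeated keys keep their list.
    return {key: (val[0] if len(val) == 1 else val)
            for key, val in params.items()}

print(parse_query("10,20"))
print(parse_query("a=1&b=2&b=3&c="))
```

A handler or filter that wants the querystring parameters on their own could then call this on the raw querystring instead of relying on the merged map.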

> > > As for the new flag: I don't think it is necessary, because the
> > > original unparsed querystring is still available in
> > > request.queryString.
> >
> > Perhaps. I guess if a filter is going to use the
> querystring as part of
> > the resource identifier, then it's probably also going to
> override the
> > standard CP dispatch mechanism via a beforeMain method.
>
> I'm not too sure about using filters for this purpose. I was under the
> impression that they were mainly meant for making (low-impact) changes
> to request and response, not to override the dispatching mechanism.
> Most included filters seem to follow this rule. One of the problems
> when using filters for such changes to the CP mechanism is that it can
> be hard to know if somewhere higher up the tree, a filter is being
> applied which completely changes the way everything works.
> Other things
> like decorators, subclassing or simply further dispatching yourself do
> not have this problem (or not to the same extent).

In CP 2.1, filters are the only semi-elegant way to override handler
dispatch, because only they can act outside of the handler call itself.
To do dispatch any other way means that CP is doing a lot of work which
then just gets thrown away (overwritten). In a future version, perhaps,
this could change in order to get around the inheritance issues.

> > > The question mark ("?", ASCII 3F hex) is used to delimit the
> > > boundary between the URI of a queryable object, and a set of
> > > words used to express a query on that object. When this form
> > > is used, the combined URI stands for the object which results
> > > from the query being applied to the original object.
> >
> > The problem is that "object" is ambiguous. In REST terms,
> it is being
> > used to mean both "handler" and "representation". :/
>
> Well, I guess that if this text would have been written more recently,
> "object" would have been replaced by "resource".

Somewhat. It should be:

The question mark ("?", ASCII 3F hex) is used to delimit the
boundary between the URI of resource A, and a set of
words used to express a query on that resource. When this form
is used, the combined URI stands for resource B which results
from the query being applied to resource A.

Yes, but to be pedantic: If you use mapPathToObject, there is no
provision for implementing mappings based on the query. It's not enough
to "react differently" inside your handler--that's not a new mapping,
it's just a different "action+response".
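
As a loose illustration of that distinction (not CherryPy code; every name below is hypothetical), a mapping that really is keyed on the query would select the handler from the path and the query together, before any handler runs:

```python
from urllib.parse import parse_qsl

def resource_key(path, query_string):
    """Identify a resource by its path plus its order-independent query pairs."""
    pairs = sorted(parse_qsl(query_string, keep_blank_values=True))
    return (path, tuple(pairs))

def report_2004():
    return "report for 2004"

def report_2005():
    return "report for 2005"

# The mapping itself: each (path, query) identifier has its own handler.
HANDLERS = {
    resource_key("/report", "year=2004"): report_2004,
    resource_key("/report", "year=2005"): report_2005,
}

def dispatch(path, query_string):
    handler = HANDLERS.get(resource_key(path, query_string))
    if handler is None:
        raise LookupError("no resource for %s?%s" % (path, query_string))
    return handler()

print(dispatch("/report", "year=2005"))
```

A single "/report" handler that branches on kwargs['year'] would return the same text, but in that case the lookup stops at the path and the branch is part of the handler's "action+response".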

> As I've said above: I'm not too sure about filters for this purpose.
> Furthermore, I was actually trying to find something that didn't need
> to override the built-in dispatch mechanism. Just adding a little
> dispatching afterwards. To put it simply, it's actually just about not
> having to write things like:
>
>     @cherrypy.expose
>     def foo(self, **kwargs):
>         kwargs = cgi.parse_qs(cherrypy.request.queryString)
>         resource = Resource(**kwargs)
>         if cherrypy.request.method == 'GET':
>             return resource.GET()
>         # elif other methods here if needed
>         else:
>             cherrypy.response.headerMap['Allow'] = 'GET'
>             raise cherrypy.HTTPStatusError('405')
>
> (well, you could do a better dispatching than using if,
> elif... but the
> point was to have the decorator take care of that)
> but instead writing:
>
>     @restful.expose
>     def foo(self, **kwargs):
>         return Resource(**kwargs)
>
> I don't think it's really "overriding" the dispatch mechanism, do you?

Yes, I do. :) And I look at the above as wasteful: CP does its dispatch,
then you do your own on top of that; CP forms its arg map, then you
throw that away and form your own.

polaar

Oct 20, 2005, 3:37:04 PM10/20/05
to cherrypy-devel
Robert Brewer wrote:
> > > 2. In the book, to explicitly spell out the relationship
> > between a REST
> > > "resource", a REST "representation", and CP callables and
> > > cherrypy.request attributes. In particular, the standard CherryPy
> > > dispatching mechanism (mapPathToObject + paramMap) is only
> > one way of
> > > mapping requests to resources/representations, and that
> > this process can
> > > be circumvented (via filters) or ignored (in the page handlers) if
> > > desired.
> >
> > If you want the book to explain about the way REST terms relate to CP
> > terms, it might be useful to explain that to be really "RESTful", you
> > shouldn't map using POST params (which are included in paramMap).
> > Again, I think mapPathToObject + paramMap is a great solution
> > for 9 out
> > of 10 use cases. But if you're explicitly spelling out the
> > relationship, this should be mentioned.
>
> I chose not to mention paramMap at all.

But you did mention it? Or do you mean in the book? But if you don't
mention paramMap, I assume you do mention the keyword arguments?
Anyway, never mind ;-)

Agree about the opacity rule. But what you say after that sounds like
CP shouldn't create a paramMap at all, and shouldn't do the keyword
argument thing, instead leaving it to the developer. Which seems
rather... er... "uncherrypic"? Do you really want to go that far, or
did you mean something else?
Note that the querystring doesn't have to be opaque to the server, as
Tim Berners-Lee states here:
http://www.w3.org/DesignIssues/Axioms.html#Query
(of course: "server" is a very general term. In this case, it means
more or less CP + application, but CP could encourage - even enforce -
the use of this particular format of querystring to its application
developers. At this moment, CP clearly encourages the use of this
format by assuming it by default to provide the keyword arguments.
That's a choice. You could choose to be more neutral about it I guess.
But whether CP application developers are going to like this?)

> > > > The question mark ("?", ASCII 3F hex) is used to delimit the
> > > > boundary between the URI of a queryable object, and a set of
> > > > words used to express a query on that object. When this form
> > > > is used, the combined URI stands for the object which results
> > > > from the query being applied to the original object.
> > >
> > > The problem is that "object" is ambiguous. In REST terms,
> > it is being
> > > used to mean both "handler" and "representation". :/
> >
> > Well, I guess that if this text would have been written more recently,
> > "object" would have been replaced by "resource".
>
> Somewhat. It should be:
>
> The question mark ("?", ASCII 3F hex) is used to delimit the
> boundary between the URI of resource A, and a set of
> words used to express a query on that resource. When this form
> is used, the combined URI stands for resource B which results
> from the query being applied to resource A.

I can live with that ;-)

In CP terms, you are indeed not making a new "mapping". But I think the
"conceptual mapping" is something different, and that is what counts,
not how it is implemented on the server (at least that is what *I*
think Fielding means): as long as you perform the correct action/return
the correct response based on the request, you are making this
conceptual mapping. An identifier identifies "something" (let's say
your homepage), the mapping between the identifier and this
"something", and the fact that the server implements this mapping makes
it (your homepage) a resource. It doesn't matter how it is stored, it
doesn't matter what software serves it, it doesn't matter how that
software works. As long as the URI always identifies your homepage, as
long as every GET request to that URI for example always returns a
representation of your homepage (possibly representations in a
different format, a different language... but always a representation
of your homepage, of this "something"), then there is such a
"conceptual mapping". (well, that was many words just to explain what I
meant by "implicitly implementing a mapping").

"Extending", perhaps? ;-)
As for wasteful:
- adding the dispatch: looking at the amount of code, I'd consider the
first example to be more wasteful... And the first example also does
"dispatching", even if it is with a simple "if". And any time you want
to do something based on the request method (which would be impossible
not to do in a real REST service), you're going to have to look at
request.method and decide your actions depending on the value, which
would also constitute "doing dispatching on top of CP's". That's
inevitable. And I prefer dispatching "on top of", to completely
replacing CP's.
- throwing away the arg map: yes, that's wasteful, but if you don't
want the arg map to be possibly "polluted" with POST data, there is (as
far as I could find) no option but to create your own. (and at least,
the decorator would take care of that so the developer needn't worry
about it and do it again, which would also be wasteful)
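
For readers who want to see the two points together, here is a rough, self-contained sketch of dispatching on the request method while keeping query params and POSTed data in separate maps. It is not the decorator discussed in this thread and it does not use CherryPy's API: FakeRequest, by_method, show and update are all hypothetical names, and the parsing relies on the standard library's urllib.parse.parse_qs.

```python
from urllib.parse import parse_qs

class FakeRequest:
    """Hypothetical stand-in for a framework request object."""
    def __init__(self, method, query_string="", body=""):
        self.method = method
        self.query_string = query_string
        self.body = body

def by_method(**handlers):
    """Build a handler that dispatches on request.method.

    Each per-method handler receives two separate dicts, so query
    params are never mixed ("polluted") with POSTed form data.
    """
    def dispatch(request):
        handler = handlers.get(request.method)
        if handler is None:
            return "405 Method Not Allowed"
        query = parse_qs(request.query_string)
        form = parse_qs(request.body) if request.method == "POST" else {}
        return handler(query, form)
    return dispatch

def show(query, form):
    return "showing %s" % query.get("id", ["?"])[0]

def update(query, form):
    return "updating %s with %s" % (
        query.get("id", ["?"])[0], form.get("name", [""])[0])

page = by_method(GET=show, POST=update)
print(page(FakeRequest("GET", "id=7")))               # showing 7
print(page(FakeRequest("POST", "id=7", "name=bob")))  # updating 7 with bob
```

The per-method handlers still amount to "dispatching on top of" the framework's own dispatch, which is the trade-off described above.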

But well, I was just trying something out, hoping that it could perhaps
be a useful approach to the method dispatching question. I know there
is a ticket 102, which is postponed to CP 2.2, but I don't know exactly
how that would work, and it seems to be not without problems, whatever
they may be. I was just trying to make something "very simple", and
seeing if it would work/be useful. It seemed like a nice approach to
me. But it's OK if you don't like it ;-) Or if you think it doesn't fit
CP (which I was hoping it did).

Steven

Robert Brewer

Oct 20, 2005, 4:06:02 PM
to cherryp...@googlegroups.com
polaar wrote:
> > Then _cphttptools.Request.processRequestHeaders would call
> > that (since I stole the function body from it).
>
> Agree about the opacity rule. But what you say after that sounds like
> CP shouldn't create a paramMap at all, and shouldn't do the keyword
> argument thing, instead leaving it to the developer. Which seems
> rather... er... "uncherrypic"? Do you really want to go that far, or
> did you mean something else?

I meant this: CherryPy uses a mechanism to turn a querystring into a
paramMap. Developers who wish to override CherryPy should have access to
that mechanism. I've felt for a few months now that CherryPy should
itself be an application built on top of useful library functions;
parse_qs would be one of those functions.
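
As an illustration of the kind of library function meant here, the following sketch turns a querystring into a flat map of keyword arguments. It is only an assumption about what such a helper could look like: it uses the standard library's urllib.parse.parse_qs as a stand-in, query_to_param_map is a hypothetical name, and the unwrapping of single values is a design choice made for the example.

```python
from urllib.parse import parse_qs

def query_to_param_map(query_string):
    """Turn a querystring into a dict usable as keyword arguments.

    Keys that appear once map to a single string; keys that are
    repeated keep all of their values as a list.
    """
    param_map = {}
    parsed = parse_qs(query_string, keep_blank_values=True)
    for key, values in parsed.items():
        param_map[key] = values[0] if len(values) == 1 else values
    return param_map

print(query_to_param_map("a=1&b=2&b=3"))  # {'a': '1', 'b': ['2', '3']}
```

A developer who wants to override the default behavior could call such a function directly, or not call it at all.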

> Note that the querystring doesn't have to be opaque to the server, as
> Tim Berners-Lee states here:
> http://www.w3.org/DesignIssues/Axioms.html#Query
> (of course: "server" is a very general term. In this case, it means
> more or less CP + application, but CP could encourage - even enforce -
> the use of this particular format of querystring to its application
> developers. At this moment, CP clearly encourages the use of this
> format by assuming it by default to provide the keyword arguments.
> That's a choice. You could choose to be more neutral about it I guess.
> But whether CP application developers are going to like this?)

Looking at CP-as-a-framework, it's not really CP's place to enforce a
given format. It already encourages simply by having a default behavior,
as you say. I think CP should be more neutral about it in its
architecture, not in its default implementation. That is, 99% of CP app
developers will use the default, without knowing there are options. The
1% will have to do a little work to use the options, but that amount of
work can be minimized by being more "neutral" in CP internals.
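
One way to picture "neutral in architecture, opinionated by default" is a request class whose query parser is a replaceable attribute. This is only a sketch under that assumption: SketchRequest, default_query_parser and OpaqueQueryRequest are hypothetical names, not CherryPy internals.

```python
from urllib.parse import parse_qs

def default_query_parser(query_string):
    """Default behavior: form-style parsing into keyword arguments."""
    return {k: v[0] if len(v) == 1 else v
            for k, v in parse_qs(query_string).items()}

class SketchRequest:
    # The 99% case: nobody touches this attribute.
    query_parser = staticmethod(default_query_parser)

    def __init__(self, query_string):
        self.query_string = query_string

    def params(self):
        return self.query_parser(self.query_string)

class OpaqueQueryRequest(SketchRequest):
    # The 1% case: treat the querystring as one opaque value.
    query_parser = staticmethod(lambda qs: {"query": qs})

print(SketchRequest("a=1").params())       # {'a': '1'}
print(OpaqueQueryRequest("a=1").params())  # {'query': 'a=1'}
```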

> But well, I was just trying something out, hoping that it could perhaps
> be a useful approach to the method dispatching question. I know there
> is a ticket 102, which is postponed to CP 2.2, but I don't know exactly
> how that would work, and it seems to be not without problems, whatever
> they may be. I was just trying to make something "very simple", and
> seeing if it would work/be useful. It seemed like a nice approach to
> me. But it's OK if you don't like it ;-) Or if you think it doesn't fit
> CP (which I was hoping it did).

I think it works. It doesn't "fit" CP 2.1, but CP 2.2 might enable a
better fit.

polaar

Oct 20, 2005, 4:49:47 PM
to cherrypy-devel
Well, this one is easier, at last...

Robert Brewer wrote:
> polaar wrote:
> > > Then _cphttptools.Request.processRequestHeaders would call
> > > that (since I stole the function body from it).
> >
> > Agree about the opacity rule. But what you say after that sounds like
> > CP shouldn't create a paramMap at all, and shouldn't do the keyword
> > argument thing, instead leaving it to the developer. Which seems
> > rather... er... "uncherrypic"? Do you really want to go that far, or
> > did you mean something else?
>
> I meant this: CherryPy uses a mechanism to turn a querystring into a
> paramMap. Developers who wish to override CherryPy should have access to
> that mechanism. I've felt for a few months now that CherryPy should
> itself be an application built on top of useful library functions;
> parse_qs would be one of those functions.

I see now. (I don't know much about CherryPy internals, which is why I
didn't get it, I think. You might wonder what I am doing here in
cherrypy-devel ;-))
But it sounds like a good idea.

> > Note that the querystring doesn't have to be opaque to the server, as
> > Tim Berners-Lee states here:
> > http://www.w3.org/DesignIssues/Axioms.html#Query
> > (of course: "server" is a very general term. In this case, it means
> > more or less CP + application, but CP could encourage - even enforce -
> > the use of this particular format of querystring to its application
> > developers. At this moment, CP clearly encourages the use of this
> > format by assuming it by default to provide the keyword arguments.
> > That's a choice. You could choose to be more neutral about it I guess.
> > But whether CP application developers are going to like this?)
>
> Looking at CP-as-a-framework, it's not really CP's place to enforce a
> given format. It already encourages simply by having a default behavior,
> as you say. I think CP should be more neutral about it in its
> architecture, not in its default implementation. That is, 99% of CP app
> developers will use the default, without knowing there are options. The
> 1% will have to do a little work to use the options, but that amount of
> work can be minimized by being more "neutral" in CP internals.

I totally agree.

> > But well, I was just trying something out, hoping that it could perhaps
> > be a useful approach to the method dispatching question. I know there
> > is a ticket 102, which is postponed to CP 2.2, but I don't know exactly
> > how that would work, and it seems to be not without problems, whatever
> > they may be. I was just trying to make something "very simple", and
> > seeing if it would work/be useful. It seemed like a nice approach to
> > me. But it's OK if you don't like it ;-) Or if you think it doesn't fit
> > CP (which I was hoping it did).
>
> I think it works. It doesn't "fit" CP 2.1, but CP 2.2 might enable a
> better fit.

Looking forward to CP 2.2 then. (maybe I'll play around a little more
with the idea, and if it turns into something working/useful, I might
post it as a recipe)

And by the way: thanks for the interesting discussion. (Well, you never
know, it might not be finished yet... But I'll say it now anyway.)

Steven
