[Web-SIG] Future of WSGI


Malthe Borch

Nov 24, 2009, 3:58:24 AM
to web...@python.org
I disagree that the current 1.x track of the WSGI specification [1]
supports Python 3 in any reasonable way. Recently I suggested the
following rule as a guideline [2]:

Strings should be strings, chunks should be bytes.

What this really suggests is that everything that looks and feels like a
human-readable string (almost everything in HTTP except the input
content and the output response) should be a (unicode) string. As I read
the proposed 1.1 revision, this is not the case.

However, there is another fish to fry here too, and I'd like to propose
a new 2.x track altogether. At the outset, this would pertain to Python
3 only.

Instead of passing ``environ`` and violating its contract by adding
'wsgi.*' entries, we must pass in an object which actually represents
the HTTP request, e.g.

Request = namedtuple("Request", "environ input")

There could be other properties of this request-object. I haven't
considered the details.

To consider for this track is also the possibility of changing the
application call signature (I heard this proposal from Daniel Holth, but
it's probably been suggested before):

def __call__(self, request):
    return status, headers, app_iter

I don't mind ``start_response`` terribly, but it's worth discussing.
Certainly returning this triple makes things easier.
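
A minimal sketch of what the triple-returning signature could look like, and how it might be bridged to a PEP 333 server. The names ``hello_app`` and ``as_wsgi1`` are hypothetical, and the "request" here is simply the environ dict, since the thread has not settled on a request object:

```python
def hello_app(request):
    """Hypothetical 2.x-style application: takes a request, returns a triple."""
    body = b"Hello, world!\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    return "200 OK", headers, [body]


def as_wsgi1(app):
    """Wrap a triple-returning app so a PEP 333 server can call it."""
    def wsgi_app(environ, start_response):
        # Assumption: the plain environ dict stands in for the request.
        status, headers, app_iter = app(environ)
        start_response(status, headers)
        return app_iter
    return wsgi_app
```

The adapter is only a few lines, which suggests the two calling conventions carry the same information for applications that do not stream through a write callable.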

\malthe

[1] http://bitbucket.org/ianb/wsgi-peps/src/tip/pep-0333.txt
[2] http://mockit.blogspot.com/2009/11/dont-look-back-in-anger.html

_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/python-web-sig-garchive-9074%40googlegroups.com

Ian Bicking

Nov 24, 2009, 12:44:56 PM
to Malthe Borch, web...@python.org
Have you read the threads on WSGI 2?  These issues are discussed at length, though they haven't been put into a spec.

The proposal that seemed to work best was to keep the environ as str (i.e., unicode in Python 3), and eliminate the problematic SCRIPT_NAME and PATH_INFO, replacing them with url-encoded values.  Also I think everyone is okay with removing start_response.  All text would be decoded as latin1 on Python 3 (which allows for transcoding; also most text is not unicode).  The request and response body would remain bytes.
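
The "allows for transcoding" point can be illustrated with a short sketch. It assumes a server that decodes raw header bytes as latin-1 and an application that happens to know the bytes were really UTF-8; the example path is made up:

```python
# Raw bytes as they might arrive on the wire: a UTF-8 encoded path.
raw = "/café".encode("utf-8")

# A server following the proposal would decode every byte as latin-1.
# This never fails, because latin-1 maps each byte 0-255 to one code point.
as_str = raw.decode("latin-1")

# An application that knows the real encoding can recover the text by
# reversing the latin-1 step and decoding again.
recovered = as_str.encode("latin-1").decode("utf-8")
```

The intermediate ``as_str`` is mojibake, but no information is lost, which is why latin-1 works as a neutral carrier for bytes of unknown encoding.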

Malthe Borch

Nov 24, 2009, 4:28:31 PM
to Ian Bicking, public-web-sig-+ZN...@plane.gmane.org, Malthe Borch

On 11/24/09 6:44 PM, Ian Bicking wrote:
> Have you read the threads on WSGI 2? These issues are discussed at
> length, though they haven't been put into a spec.

Okay, sounds good. I have tried to follow up on the discussion, but
there's just too much noise to find out what the consensus is.

> The proposal that seemed to work best was to keep the environ as str
> (i.e., unicode in Python 3), and eliminate the problematic SCRIPT_NAME
> and PATH_INFO, replacing them with url-encoded values. Also I think
> everyone is okay with removing start_response. All text would be
> decoded as latin1 on Python 3 (which allows for transcoding; also most
> text is not unicode). The request and response body would remain bytes.

I assume with "all text" you mean all header text, e.g. all header values.

Can we talk briefly then about wsgi.*? I think we should eliminate them
and in their place put a real request object, something very basic that
has only what's absolutely necessary to communicate the essential data
from the low-level HTTP request.

There is no way that the environment can express an HTTP request. This
was a mistake in my view and we should rectify it either in 1.1 or 2.0.

\malthe

Ian Bicking

Nov 24, 2009, 4:35:11 PM
to Malthe Borch, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch
On Tue, Nov 24, 2009 at 3:28 PM, Malthe Borch <mbo...@gmail.com> wrote:

>> The proposal that seemed to work best was to keep the environ as str
>> (i.e., unicode in Python 3), and eliminate the problematic SCRIPT_NAME
>> and PATH_INFO, replacing them with url-encoded values.  Also I think
>> everyone is okay with removing start_response.  All text would be
>> decoded as latin1 on Python 3 (which allows for transcoding; also most
>> text is not unicode).  The request and response body would remain bytes.
>
> I assume with "all text" you mean all header text, e.g. all header values.

All the things that are specified to be str, would stay str in Python 3.  This includes all keys, headers, and stuff like wsgi.url_scheme.
 
> Can we talk briefly then about wsgi.*? I think we should eliminate them and in their place put a real request object, something very basic that has only what's absolutely necessary to communicate the essential data from the low-level HTTP request.
>
> There is no way that the environment can express an HTTP request. This was a mistake in my view and we should rectify it either in 1.1 or 2.0.

I'm not aware of any problems with representing the request with a dictionary.  Can you give examples?

 

Malthe Borch

Nov 24, 2009, 4:40:03 PM
to Ian Bicking, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch

2009/11/24 Ian Bicking <ia...@colorstudy.com>:


> All the things that are specified to be str, would stay str in Python 3.

Gotcha.

> I'm not aware of any problems with representing the request with a
> dictionary.  Can you give examples?

The body stream is not part of the HTTP environment. It's an abuse and
it has the very negative effect of luring developers into further
abuse.

Ian Bicking

Nov 24, 2009, 4:43:46 PM
to Malthe Borch, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch
On Tue, Nov 24, 2009 at 3:40 PM, Malthe Borch <mbo...@gmail.com> wrote:
>> I'm not aware of any problems with representing the request with a
>> dictionary.  Can you give examples?
>
> The body stream is not part of the HTTP environment. It's an abuse and
> it has the very negative effect of luring developers into further
> abuse.
 
You mean specifically environ['wsgi.input'] ?  While the file-like interface is difficult, other possible interfaces aren't so great either.  As to putting the request body in the environment, I don't know what the problem is?  Or are you just concerned that people put arbitrary things in the environ?  There's far too many important use cases that are satisfied by the extensible nature of the environ to give it up just because some people believe it is overused.

Malthe Borch

Nov 24, 2009, 4:50:00 PM
to Ian Bicking, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch

2009/11/24 Ian Bicking <ia...@colorstudy.com>:


> You mean specifically environ['wsgi.input'] ?  While the file-like interface
> is difficult, other possible interfaces aren't so great either.  As to
> putting the request body in the environment, I don't know what the problem
> is?  Or are you just concerned that people put arbitrary things in the
> environ?  There's far too many important use cases that are satisfied by the
> extensible nature of the environ to give it up just because some people
> believe it is overused.

How people use or abuse software is not our concern; but the standard
library should not itself abuse its own abstractions.

The file-like (stream) interface is fine, but it must not live in the
HTTP environment. I don't know of any other languages that mix the two
(Perl's CGI.pm does, but that's another matter).

Rather, what we need is a request object. Don't think WebOb or
ZPublisher. This is just a decoder for the socket response.

It's quite symmetric:

Request = namedtuple("Request", "environ body")
Response = namedtuple("Response", "status headers iterable")

Iterable might be "body" or "chunks" or some other term.

Ian Bicking

Nov 24, 2009, 4:51:43 PM
to Malthe Borch, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch
On Tue, Nov 24, 2009 at 3:50 PM, Malthe Borch <mbo...@gmail.com> wrote:
> 2009/11/24 Ian Bicking <ia...@colorstudy.com>:
>> You mean specifically environ['wsgi.input'] ?  While the file-like interface
>> is difficult, other possible interfaces aren't so great either.  As to
>> putting the request body in the environment, I don't know what the problem
>> is?  Or are you just concerned that people put arbitrary things in the
>> environ?  There's far too many important use cases that are satisfied by the
>> extensible nature of the environ to give it up just because some people
>> believe it is overused.
>
> How people use or abuse software is not our concern; but the standard
> library should not itself abuse its own abstractions.
>
> The file-like (stream) interface is fine, but it must not live in the
> HTTP environment. I don't know of any other languages that mix the two
> (Perl's CGI.pm does, but that's another matter).

Why does this matter?

Ian Bicking

Nov 24, 2009, 5:30:20 PM
to Sylvain Hellegouarch, web...@python.org
On Tue, Nov 24, 2009 at 4:16 PM, Sylvain Hellegouarch <s...@defuze.org> wrote:
>> I'm not aware of any problems with representing the request with a dictionary.  Can you give examples?
>
> Though it shouldn't be considered as a problem, the fact that probably no existing framework actually use the raw dictionary (there is, in almost all cases, a wrapping into a friendlier object), one might wonder why keeping such a low level interface rather than directly provide a higher level interface is a good idea. After all creating those dictionaries for no good reason aside from sending them to the next layer which will map them into a WebOb, a yaro, a cherrypy request, or zope request, etc. seems slightly pointless (I'm not versed into Python internals, but doesn't it have also a cost of creating rather useless objects repeatedly like that?) I know WSGI tries hard not to force into one implementation but still...

Well, that's hardly a trivial requirement, nor a trivial accomplishment.  Also the dictionary is a complete and inspectable representation of the environment, divorced from any possible trickery on the part of frameworks.  It's a common gateway between servers and frameworks, and can be used as a gateway between middleware and applications.  And it's really fairly common for middleware to use the raw dictionary without any object involved.

Sylvain Hellegouarch

Nov 24, 2009, 5:16:05 PM
to Ian Bicking, web...@python.org

>
> I'm not aware of any problems with representing the request with a
> dictionary. Can you give examples?

Though it shouldn't be considered a problem, probably no existing
framework actually uses the raw dictionary (there is, in almost all
cases, a wrapping into a friendlier object), so one might wonder why
keeping such a low-level interface, rather than directly providing a
higher-level one, is a good idea. After all, creating those
dictionaries for no good reason aside from sending them to the next
layer, which will map them into a WebOb, a yaro, a cherrypy request, or
zope request, etc., seems slightly pointless (I'm not versed in Python
internals, but doesn't it also have a cost to create rather useless
objects repeatedly like that?). I know WSGI tries hard not to force
one implementation, but still...

- Sylvain

Henry Precheur

Nov 24, 2009, 5:47:53 PM
to Malthe Borch, web...@python.org
On Tue, Nov 24, 2009 at 11:36:57PM +0100, Malthe Borch wrote:
> 2009/11/24 Henry Precheur <he...@precheur.org>:
> > (See http://tools.ietf.org/html/rfc2616#section-5)
> >
> > The request body, the request method (GET, POST, ...), the request URL,
> > the HTTP version are all in `environ`.
>
> That reference does not mention the environment. It's not an official
> term.

Are you talking about PEP-333 or RFC 2616?

> > namedtuple is Python 2.6+: WSGI can't use it. WSGI must work w/ older
> > versions of Python.
>
> It was meant as illustration, but sure.

Then what? Your proposal doesn't work. So let's forget about it and
stick to dict?

--
Henry Prêcheur

Malthe Borch

Nov 24, 2009, 5:57:08 PM
to Henry Precheur, web...@python.org
2009/11/24 Henry Precheur <he...@precheur.org>:

> Are you talking about PEP-333 or RFC 2616?

RFC 2616, which you linked to.

> Then what? Your proposal doesn't work. So let's forget about it and
> stick to dict?

class Request(object):
    ...

class Response(object):
    ...

Now, what do you mean by "let's forget about it"? Maybe what you want
to say is: I'll forget about it and stick to dict – because that's
what you know how to do? I mean this congenially; but please don't
patronize.

\malthe

Henry Precheur

Nov 24, 2009, 6:00:06 PM
to Sylvain Hellegouarch, web...@python.org
On Tue, Nov 24, 2009 at 11:16:05PM +0100, Sylvain Hellegouarch wrote:
> Though it shouldn't be considered as a problem, the fact that probably
> no existing framework actually use the raw dictionary (there is, in
> almost all cases, a wrapping into a friendlier object), one might wonder
> why keeping such a low level interface rather than directly provide a
> higher level interface is a good idea. After all creating those
> dictionaries for no good reason aside from sending them to the next
> layer which will map them into a WebOb, a yaro, a cherrypy request, or
> zope request, etc. seems slightly pointless

1. Would you say that POSIX is useless because there are lots of
libraries and applications built on top of it? Why not implement
those libraries and applications directly without using POSIX?

2. Guess what: WebOb, Werkzeug, Yaro, Django, CherryPy, and the others
have different interfaces for their Request/Response objects.
Because for Request/Response there's hardly a one-size-fits-all.
There's certainly some common ground, but every framework has
different needs.

> (I'm not versed into Python internals, but doesn't it have also a cost
> of creating rather useless objects repeatedly like that?)

The dictionary is passed as a reference like every Python object. So it
doesn't cost anything to use it instead of an object.

--
Henry Prêcheur

Henry Precheur

Nov 24, 2009, 6:18:39 PM
to Malthe Borch, web...@python.org
On Tue, Nov 24, 2009 at 11:57:08PM +0100, Malthe Borch wrote:
> 2009/11/24 Henry Precheur <he...@precheur.org>:
> > Are you talking about PEP-333 or RFC 2616?
>
> RFC 2616, which you linked to.

Environment is not an 'official' term in RFC 2616, because it's about
HTTP, not WSGI.

> > Then what? Your proposal doesn't work. So let's forget about it and
> > stick to dict?
>
> class Request(object):
> ...
>
> class Response(object):
> ...

Please replace '...' with actual code or at least some description of
what it's doing. Lots of people have been trying to define a nice
interface for these objects for YEARS. People who know a great deal
about HTTP, and Python. And yet there's not a single implementation
that's widely accepted as the "best of breed".

--
Henry Prêcheur

Malthe Borch

Nov 24, 2009, 6:29:08 PM
to Henry Precheur, web...@python.org
2009/11/25 Henry Precheur <he...@precheur.org>:

> Please replace '...' with actual code or at least some description of
> what it's doing. Lots of people have been trying to define a nice
> interface for these objects for YEARS. People who know a great deal
> about HTTP, and Python. And yet there's not a single implementation
> that's widely accepted as the "best of breed".

class Request(object):
    def __init__(self, stream):
        self.environ = read_headers_until_crlf(stream)
        self.stream = stream

These headers are then "general-header", "request-header",
"entity-header". The stream is what remains.

Ian argues that the stream is part of the environment since
``CONTENT_LENGTH`` is there. However, it is not always there. It is to
be understood as a hint.

Why is this a good separation? For two reasons:

1) Everybody else does it;
2) This stream should be handled carefully throughout the WSGI
pipeline. Keeping it as a separate property helps to make this point
clear.

As an alternative to a trivial request class, I propose:

(environ, stream, [start_response])

(It seems ``start_response`` might go out altogether in a revised
specification in favor of returning a response tuple over an app
iterable).
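
To make the headers/stream split concrete, here is one possible way to fill in ``read_headers_until_crlf``. This is only a sketch under stated assumptions: the request line has already been consumed, header names are lowercased, repeated headers are not merged, and the sample bytes are invented for illustration:

```python
import io


def read_headers_until_crlf(stream):
    """Read header lines up to the blank line; leave the body unread."""
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line (or EOF) ends the header section
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


class Request(object):
    def __init__(self, stream):
        self.environ = read_headers_until_crlf(stream)
        self.stream = stream  # positioned at the start of the body


# Usage: whatever follows the blank line is still waiting in the stream.
request = Request(io.BytesIO(
    b"Host: example.org\r\nContent-Length: 5\r\n\r\nhello"))
body = request.stream.read()
```

After construction, the header data and the body are reachable only through separate attributes, which is the separation argued for above.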

\malthe

Sylvain Hellegouarch

Nov 25, 2009, 2:51:22 AM
to Henry Precheur, web...@python.org
Henry Precheur a écrit :

> On Tue, Nov 24, 2009 at 11:16:05PM +0100, Sylvain Hellegouarch wrote:
>
>> Though it shouldn't be considered as a problem, the fact that probably
>> no existing framework actually use the raw dictionary (there is, in
>> almost all cases, a wrapping into a friendlier object), one might wonder
>> why keeping such a low level interface rather than directly provide a
>> higher level interface is a good idea. After all creating those
>> dictionaries for no good reason aside from sending them to the next
>> layer which will map them into a WebOb, a yaro, a cherrypy request, or
>> zope request, etc. seems slightly pointless
>>
>
> 1. Would you say that POSIX is useless because there are lots of
> libraries and applications build on top of it? Why not implement
> those libraries and applications directly without using POSIX?
>

If I'm not mistaken that's what people do when they want performance
rather than portability. But point taken.

> 2. Guess what: WebOb, Werkzeug, Yaro, Django, CherryPy, and the others
> have a different interfaces for their Request/Response objects.
> Because for Request/Response there's hardly one-size fits all.
> There's certainly some common ground, but every framework has
> different needs.
>

Well thank you for the reminder but I kind of knew that ;)
That doesn't mean it's either elegant or efficient to create such a
low-level object.

>
>> (I'm not versed into Python internals, but doesn't it have also a cost
>> of creating rather useless objects repeatedly like that?)
>>
>
> The dictionary is passed as a reference like every Python objects. So it
> doesn't cost anything to use it instead of an object.
>
>

I talked about object creation not object passing.

- Sylvain

Sylvain Hellegouarch

Nov 25, 2009, 2:56:43 AM
to Henry Precheur, web...@python.org

>
> Please replace '...' with actual code or at least some description of
> what it's doing. Lots of people have been trying to define a nice
> interface for these objects for YEARS. People who know a great deal
> about HTTP, and Python. And yet there's not a single implementation
> that's widely accepted as the "best of breed".
>
>
Mostly because no such discussion ever took place. Everyone follows their
own recipe, and yet most request interfaces actually offer a very similar API.

If it wasn't for WSGI, most frameworks wouldn't even talk to each other
yet. But since it's the time of "what could be improved", it seemed
right to suggest to do better there too. Now I don't have a proposal so
I wouldn't be upset if the community simply says no (I can understand
Ian's point in an earlier response) but the question is rather valid
nonetheless.

- Sylvain

Sylvain Hellegouarch

Nov 25, 2009, 3:40:11 AM
to Malthe Borch, web...@python.org

> 2009/11/25 Henry Precheur <he...@precheur.org>:
>> Please replace '...' with actual code or at least some description of
>> what it's doing. Lots of people have been trying to define a nice
>> interface for these objects for YEARS. People who know a great deal
>> about HTTP, and Python. And yet there's not a single implementation
>> that's widely accepted as the "best of breed".
>
> class Request(object):
> def __init__(self, stream):
> self.environ = read_headers_until_crlf(stream)
> self.stream = stream
>
> These headers are then "general-header", "request-header",
> "entity-header". The stream is what remains.
>

Personally, I would favor the idea that WSGI2 specifies the way headers
should be mapped to object attributes (e.g. Content-Type would become
content_type) and then let duck typing magic happen rather than specifying
a class from which to inherit for instance.

Instead of a dictionary, you'd provide an object that maps headers and a
few other attributes accordingly.

But again, it's just wishful thinking ;)

- Sylvain
--
Sylvain Hellegouarch
http://www.defuze.org

Chris Dent

Nov 25, 2009, 6:42:15 AM
to web...@python.org
On Nov 24, 2009, at 10:16 PM, Sylvain Hellegouarch wrote:

> Though it shouldn't be considered as a problem, the fact that
> probably no existing framework actually use the raw dictionary
> (there is, in almost all cases, a wrapping into a friendlier
> object), one might wonder why keeping such a low level interface
> rather than directly provide a higher level interface is a good
> idea. After all creating those dictionaries for no good reason aside
> from sending them to the next layer which will map them into a
> WebOb, a yaro, a cherrypy request, or zope request, etc. seems
> slightly pointless (I'm not versed into Python internals, but
> doesn't it have also a cost of creating rather useless objects
> repeatedly like that?) I know WSGI tries hard not to force into one
> implementation but still...

I use the raw dictionary in TiddlyWeb <http://tiddlyweb.com/>, and I
like it that way. I've resisted using existing frameworks exactly
because they _do_ package the WSGI stuff up in what I perceive to be
complex classes that obscure the overall flexibility and transparency
presented by the WSGI dict.

I prefer middleware that just uses the dict that it gets, maybe making
a few tweaks or additions and then passes it along to the next layer.
This behavior is _exactly_ what makes WSGI great and useful.
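
As an illustration of that style, here is a small sketch of middleware that works only on the raw dict. The function name ``prefix_middleware`` and the prefix-shifting behavior are made up for the example; they are not taken from TiddlyWeb:

```python
def prefix_middleware(app, prefix):
    """Move a fixed URL prefix from PATH_INFO to SCRIPT_NAME, then delegate."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefix):
            # Tweak two keys in place; everything else passes through as-is.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
        return app(environ, start_response)
    return middleware
```

No wrapper class is involved: the next layer receives the same dict, with two values changed.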

Beyond that I just have a knee jerk reaction to the creation of
gratuitous classes for something that is essentially a (very flexible)
data structure.

I can (barely) relate to some of the complaints that start_response is
a pain in the ass, but environ, to me, is not broken.

On start_response, I find that being able to mess with it (replacing it
with something else, usually) before I go deeper into a stack of WSGI
applications is sometimes useful, so I would be disappointed if I lost
that option.

P.J. Eby

Nov 25, 2009, 10:41:24 AM
to Chris Dent, web...@python.org
At 11:42 AM 11/25/2009 +0000, Chris Dent wrote:
>On start_response, I find that I can mess with it (replacing it with
>something else, usually) before I go deeper into a stack of WSGI
>applications is sometimes useful, so I would be disappointed if I lost
>that option.

Note that in the WSGI 2 calling protocol, you would simply modify
your return values, rather than needing to create a function and pass
it down the call chain.
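
A side-by-side sketch of the two approaches, using header injection as the example. The second function assumes a return-value protocol of the form ``app(environ) -> (status, headers, body)``, which is only the shape discussed in this thread and not a published specification; both function names are hypothetical:

```python
def add_header_wsgi1(app, name, value):
    """PEP 333 style: wrap start_response and pass the wrapper down."""
    def middleware(environ, start_response):
        def wrapped_start_response(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, wrapped_start_response)
    return middleware


def add_header_wsgi2(app, name, value):
    """Return-value style: call the app, then edit what came back."""
    def middleware(environ):
        status, headers, body = app(environ)
        return status, headers + [(name, value)], body
    return middleware
```

In the second form the middleware sees the status and headers as ordinary values, so there is no callback to intercept.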

Robert Brewer

Nov 25, 2009, 11:54:05 AM
to s...@defuze.org, Malthe Borch, web...@python.org
Sylvain Hellegouarch wrote:
> Personally, I would favor the idea that WSGI2 specifies the way headers
> should be mapped to object attributes (e.g. Content-Type would become
> content_type) and then let duck typing magic happen rather than
> specifying a class from which to inherit for instance.

How would you handle HTTP extension headers like
X-MyEnterprise-Metadata?

Cook [1] might be appropriate to read here: "...abstract data types
facilitate adding new operations, while [objects] facilitate adding new
representations... Abstract data types define operations that collect
together the behaviors for a given action. Objects organize the matrix
the other way, collecting together all the actions associated with a
given representation. It is easier to add new operations in an ADT, and
new representations using objects."

IMO, it's quite appropriate that we essentially use an ADT (a dict) at
the lowest level, precisely because it constrains the representation.
This is the essence of The Zen of CherryPy #8 "Subclassed builtins are
better than custom types" (really, custom _classes_) and #9 "But builtin
types are even better". People can then objectify those ADTs to their
representational taste.


Robert Brewer
fuma...@aminus.org

[1] http://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf
[2] http://www.cherrypy.org/wiki/ZenOfCherryPy#a8.Subclassedbuiltinsarebetterthancustomtypes.

Sylvain Hellegouarch

Nov 25, 2009, 12:30:18 PM
to Robert Brewer, web...@python.org
Robert Brewer a écrit :

> Sylvain Hellegouarch wrote:
>
>> Personally, I would favor the idea that WSGI2 specifies the way headers
>> should be mapped to object attributes (e.g. Content-Type would become
>> content_type) and then let duck typing magic happen rather than
>> specifying a class from which to inherit for instance.
>>
>
> How would you handle HTTP extension headers like
> X-MyEnterprise-Metadata?
>

x_myenterprise_metadata

Now I get this only makes sense where the header is valid as a Python
identifier so more limited than a dict key for sure.
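
The limitation can be sketched with a small helper. ``header_to_attr`` is a hypothetical name, and the mapping rule (lowercase, hyphens to underscores) is just the one suggested in this thread:

```python
import keyword


def header_to_attr(name):
    """Map an HTTP header name to an attribute name, or None if impossible."""
    attr = name.lower().replace("-", "_")
    # Reject names that cannot be written as ``obj.attr`` in Python source.
    if attr.isidentifier() and not keyword.iskeyword(attr):
        return attr
    return None
```

With this rule, ``Content-Type`` and ``X-MyEnterprise-Metadata`` map cleanly, but the standard ``From`` header does not, because ``from`` is a Python keyword; a dict key has no such restriction.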

> Cook [1] might be appropriate to read here: "...abstract data types
> facilitate adding new operations, while [objects] facilitate adding new
> representations... Abstract data types define operations that collect
> together the behaviors for a given action. Objects organize the matrix
> the other way, collecting together all the actions associated with a
> given representation. It is easier to add new operations in an ADT, and
> new representations using objects."
>

But that's my point, we discuss request representation, not behavior.

> IMO, it's quite appropriate that we essentially use an ADT (a dict) at
> the lowest level, precisely because it constrains the representation.
> This is the essence of The Zen of CherryPy #8 "Subclassed builtins are
> better than custom types" (really, custom _classes_) and #9 "But builtin
> types are even better". People can then objectify those ADTs to their
> representational taste.
>

That's fine but looks redundant eventually in my opinion.

- Sylvain

Aaron Watters

Nov 25, 2009, 2:50:22 PM
to web...@python.org, Chris Dent

--- On Wed, 11/25/09, Chris Dent <chris...@gmail.com> wrote:

> From: Chris Dent <chris...@gmail.com>
> I can (barely) relate to some of the complaints that
> start_response is a pain in the ass, but environ, to me, is
> not broken.

I agree. It maps nicely onto the underlying protocol
and WSGI is supposed to be low level right?

The biggest problem with start_response is that after
you evaluate

iterable = application(env, start_response)

Sometimes the start_response has been called and sometimes
it hasn't, and this can break middlewares when they haven't
been tested both ways (repoze.who for example seems to
assume it has been called).
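
A small sketch that reproduces both cases. The two toy applications and the helper ``called_on_return`` are hypothetical; the point is only that a generator-based application does not run any of its body until it is iterated:

```python
def eager_app(environ, start_response):
    # Calls start_response before returning, as most simple apps do.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"eager\n"]


def lazy_app(environ, start_response):
    # A generator: nothing in this body runs until the first next().
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"lazy\n"


def called_on_return(app):
    """Report whether start_response was called by return time and by the end."""
    calls = []

    def start_response(status, headers, exc_info=None):
        calls.append(status)

    iterable = app({}, start_response)
    called_early = bool(calls)
    list(iterable)  # draining forces the generator body to run
    return called_early, bool(calls)
```

Middleware that inspects the status or headers right after calling the application works with ``eager_app`` but sees nothing yet with ``lazy_app``.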

By the way, I created a little interface for archiving
notes about wsgi2 here

http://listtree.appspot.com/wsgi2

To add to it you need to fill in a captcha and use the
password "wsgi". I thought I announced this to web-sig
yesterday, but apparently I messed up my reply-to.

If you like, please add something there. I'd be delighted
if you did. I think it might be an interface that is easier
to "scan" than a million emails.
-- Aaron Watters

Malthe Borch

Nov 25, 2009, 3:00:58 PM
to Aaron Watters, web...@python.org
2009/11/25 Aaron Watters <arw...@yahoo.com>:

>> From: Chris Dent <chris...@gmail.com>
>> I can (barely) relate to some of the complaints that
>> start_response is a pain in the ass, but environ, to me, is
>> not broken.
>
> I agree.  It maps nicely onto the underlying protocol
> and WSGI is supposed to be low level right?

It's not ``environ`` which is broken, it is the "special" entries like
wsgi.input and wsgi.multithread.

That's because I equate ``environ`` with the request headers. It may
be wrong. But if ``environ`` reflects the entire request and not just
the headers, why is it then not called ``request``.

Can we really talk about the request stream as the "environment" in
which the request must be answered? Or is it in fact, the "payload".

\malthe

Tres Seaver

Nov 25, 2009, 3:03:26 PM
to web...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Watters wrote:
>
> --- On Wed, 11/25/09, Chris Dent <chris...@gmail.com> wrote:
>
>> From: Chris Dent <chris...@gmail.com>
>> I can (barely) relate to some of the complaints that
>> start_response is a pain in the ass, but environ, to me, is
>> not broken.
>
> I agree. It maps nicely onto the underlying protocol
> and WSGI is supposed to be low level right?
>
> The biggest problem with start_response is that after
> you evaluate
>
> iterable = application(env, start_response)
>
> Sometimes the start_response has been called and sometimes
> it hasn't, and this can break middlewares when they haven't
> been tested both ways (repose.who for example seems to
> assume it has been called).

Since version 1.0.13 (2009-04-24), repoze.who's middleware is very
careful to dance around the fact that an application is not required to
have called 'start_response' on return, but *must* call it before
returning the first chunk from its iterator. That bit of flexibility in
PEP 333 is likely there to support *some* use case, but it makes
'start_response' a *big* pain to work with in middleware which needs to
do "egress" processing of headers.


Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tse...@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksNjYkACgkQ+gerLs4ltQ6M4ACgj8Ist6sCLUgJ/BrAlXP0QlN4
OEMAnjuWY0NEK+3IKc8igdaJx0wrlNqy
=ncqc
-----END PGP SIGNATURE-----

Tres Seaver

Nov 25, 2009, 3:08:40 PM
to web...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Malthe Borch wrote:
> 2009/11/25 Aaron Watters <arw...@yahoo.com>:
>>> From: Chris Dent <chris...@gmail.com>
>>> I can (barely) relate to some of the complaints that
>>> start_response is a pain in the ass, but environ, to me, is
>>> not broken.
>> I agree. It maps nicely onto the underlying protocol
>> and WSGI is supposed to be low level right?
>
> It's not ``environ`` which is broken, it is the "special" entries like
> wsgi.input and wsgi.multithread.

How about 'PATH_INFO', 'SCRIPT_NAME', etc: none of those are headers.
Please re-read PEP 333[1] for the rationale.

> That's because I equate ``environ`` with the request headers. It may
> be wrong.

You are: the environ is modeled on the CGI environment, which has lots
more stuff in it than headers.

> But if ``environ`` reflects the entire request and not just
> the headers, why is it then not called ``request``.

Because it is a dictionary like the one passed to CGI applications, not a
"request object". The looseness of a dict is part of why WSGI works for
interoperability.


[1] http://www.python.org/dev/peps/pep-0333/#id17


Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tse...@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksNjscACgkQ+gerLs4ltQ5SXQCfQeRgnX6OUL+2d3vU7LQmqRoK
fS0AoK+fPxXi9BYEqQw+UI9y7/OK3trV
=mHsV
-----END PGP SIGNATURE-----

Ian Bicking

Nov 25, 2009, 3:48:30 PM
to Tres Seaver, web...@python.org
On Wed, Nov 25, 2009 at 2:03 PM, Tres Seaver <tse...@palladion.com> wrote:
> Aaron Watters wrote:
> >
> > --- On Wed, 11/25/09, Chris Dent <chris...@gmail.com> wrote:
> >
> >> From: Chris Dent <chris...@gmail.com>
> >> I can (barely) relate to some of the complaints that
> >> start_response is a pain in the ass, but environ, to me, is
> >> not broken.
> >
> > I agree. It maps nicely onto the underlying protocol
> > and WSGI is supposed to be low level right?
> >
> > The biggest problem with start_response is that after
> > you evaluate
> >
> >     iterable = application(env, start_response)
> >
> > Sometimes the start_response has been called and sometimes
> > it hasn't, and this can break middlewares when they haven't
> > been tested both ways (repoze.who for example seems to
> > assume it has been called).
>
> Since version 1.0.13 (2009-04-24), repoze.who's middleware is very
> careful to dance around the fact that an application is not required to
> have called 'start_response' on return, but *must* call it before
> returning the first chunk from its iterator. That bit of flexibility in
> PEP 333 is likely there to support *some* use case, but it makes
> 'start_response' a *big* pain to work with in middleware which needs to
> do "egress" processing of headers.
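
The timing problem described above can be sketched as follows. This is an illustrative middleware, not repoze.who's actual code: it records the `start_response` call, pulls the first chunk so that a lazy application gets the chance to call it, and only then forwards the (modified) headers. All names here (`header_middleware`, `eager_app`, `lazy_app`, `call`) are hypothetical.

```python
def header_middleware(app, extra_header):
    """Append one header on the way out (sketch, not repoze.who's code)."""
    def wrapper(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Record the call instead of forwarding it right away.
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # write() is ignored in this sketch

        result = app(environ, capture)
        chunks = iter(result)
        # start_response may not have been called yet; PEP 333 only
        # guarantees it by the time the first chunk is yielded.
        first = next(chunks, None)
        start_response(captured["status"],
                       captured["headers"] + [extra_header])

        def body():
            try:
                if first is not None:
                    yield first
                for chunk in chunks:
                    yield chunk
            finally:
                if hasattr(result, "close"):
                    result.close()
        return body()
    return wrapper


def eager_app(environ, start_response):
    # Calls start_response before returning.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"eager"]


def lazy_app(environ, start_response):
    # Generator: nothing runs until the first chunk is requested.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"lazy"


def call(app):
    """Toy server: return (status, headers, body) for one request."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    body = b"".join(app({}, start_response))
    return seen["status"], seen["headers"], body


for app in (eager_app, lazy_app):
    print(call(header_middleware(app, ("X-Demo", "1"))))
```

A middleware that reads `captured` immediately after `app(environ, capture)` returns would work with `eager_app` and fail with `lazy_app`, which is the "tested both ways" trap.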

Just in terms of history, I think I'm to blame on this one, as I argued quite vigorously for start_response.  The reason being that at the time frameworks that had a concept of "streaming" usually did it by writing to the response.  While the names were different depending on the framework, this was the common way to do streaming:

import mimetypes

def file_app(req):
    filename = ...
    # guess_type() expects the file name (or URL), not just the extension
    req.response.setHeader('Content-Type', mimetypes.guess_type(filename)[0])
    # I believe most did not stream by default...
    req.response.stream()
    with open(filename, 'rb') as fp:
        while True:
            chunk = fp.read(4096)
            if not chunk:
                break
            req.response.write(chunk)

To support that style of streaming, start_response was added. I think PJE also had some notion of Comet-style interactions, and maybe something related to async, leading to the specific restrictions on how written content should be handled. I still don't entirely understand the use case underlying that. But anyway, that's some of the motivation. start_response is still useful for retrofitting support for frameworks from time to time, but all the modern frameworks work differently these days, which makes start_response seem less necessary.
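
For readers who have not used it, the push style survives in WSGI as the `write()` callable returned by `start_response`. The sketch below contrasts it with the iterable style; the `run` harness is a toy stand-in for a server, written only for this illustration.

```python
def streaming_app(environ, start_response):
    """Old push style: send chunks through the write() callable."""
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    for chunk in (b"one ", b"two "):
        write(chunk)
    return []  # nothing left to iterate over


def iterating_app(environ, start_response):
    """Pull style: hand the server an iterable of chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"one ", b"two "]


def run(app):
    """Toy server (hypothetical): collect everything the app sends."""
    sent = []

    def start_response(status, headers, exc_info=None):
        return sent.append  # doubles as the write() callable

    for chunk in app({}, start_response):
        sent.append(chunk)
    return b"".join(sent)


print(run(streaming_app) == run(iterating_app))
```

Both applications produce the same bytes; the difference is only in who drives the output loop.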

And Clover

Nov 26, 2009, 8:06:32 AM
to Web SIG
Ian Bicking wrote:

> The proposal that seemed to work best was to keep the environ as str (i.e.,
> unicode in Python 3), and eliminate the problematic SCRIPT_NAME and
> PATH_INFO, replacing them with url-encoded values.

Ah, OK, if that's where we got to I'm happy with that - as long as the
application/framework can tell the difference between (a) old-school
WSGI 1.0 decoded PATH_INFO, (b) new verbatim PATH_INFO, and (c) a new
verbatim PATH_INFO that has been created from an old PATH_INFO by a WSGI
handler unfortunate enough to be running under CGI or IIS, potentially
including mangled characters. I would prefer to avoid the latter completely.

This could be achieved by giving the new variables a different name and
only including them if they're safe (leaving the application to fall
back to the old variables where unavailable), or by having a flag to
specify they're verbatim and leaving it unset when unmangled verbatim is
unavailable.
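
A minimal sketch of the first option (a separate, optional variable with a fallback) might look like this. The key name `x.raw_path_info` is purely hypothetical and is not part of any WSGI specification. The fallback assumes that, on Python 3, `PATH_INFO` holds bytes decoded as latin-1, which is the convention PEP 3333 later adopted.

```python
from urllib.parse import quote

RAW_KEY = "x.raw_path_info"  # hypothetical name, not part of any WSGI spec


def raw_path_info(environ):
    """Return (path, verbatim): the encoded path and whether it is exact."""
    if RAW_KEY in environ:
        # The server was able to supply the untouched, percent-encoded path.
        return environ[RAW_KEY], True
    # Fallback: re-encode the decoded PATH_INFO. This is lossy, for
    # example an original %2F can no longer be told apart from "/".
    decoded = environ.get("PATH_INFO", "")
    return quote(decoded.encode("latin-1")), False


print(raw_path_info({RAW_KEY: "/a%2Fb", "PATH_INFO": "/a/b"}))
print(raw_path_info({"PATH_INFO": "/a/b"}))
```

The boolean lets a framework tell case (b) from case (c) above without guessing.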

> Also I think everyone is okay with removing start_response.

+0.5: very much happy to see it gone, but if it causes any more delay to
a WSGI update I'm also not unhappy if it stays. My primary concern is
that a Python-3-compatible WSGI is available as soon as possible; every
long argument in here seems to lead to no resolution. I want to release
Python 3 web code, and cannot whilst WSGI remains in flux.

Whilst in principle I kind of agree with Malthe that keeping the
CGI-derived environ separate from items like wsgi.input would be
appropriate, in practice I don't give a stuff about it: the merged
dictionary causes no practical problems, and changing it would be an
enormous upheaval for all WSGI users.

WSGI doesn't need to be pretty, it needs to be widely-compatible.
Authors who want pretty can use frameworks, which will be happy to
deliver elegant Request and Response objects.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

Malthe Borch

Nov 26, 2009, 9:54:51 AM
to And Clover, Web SIG
2009/11/26 And Clover <and...@doxdesk.com>:

> Whilst in principle I kind of agree with Malthe that keeping the CGI-derived
> environ separate from items like wsgi.input would be appropriate, in
> practice I don't give a stuff about it: the merged dictionary causes no
> practical problems, and changing it would be an enormous upheaval for all
> WSGI users.

It will anyway (moving to Python 3).

> WSGI doesn't need to be pretty, it needs to be widely-compatible. Authors
> who want pretty can use frameworks, which will be happy to deliver elegant
> Request and Response objects.

It's no more about pretty than PEP 8 is about pretty (hint: it's not).

\malthe

Graham Dumpleton

Nov 26, 2009, 4:59:50 PM
to And Clover, Web SIG
2009/11/27 And Clover <and...@doxdesk.com>:

> Ian Bicking wrote:
>
>> The proposal that seemed to work best was to keep the environ as str
>> (i.e.,
>> unicode in Python 3), and eliminate the problematic SCRIPT_NAME and
>> PATH_INFO, replacing them with url-encoded values.
>
> Ah, OK, if that's where we got to I'm happy with that - as long as the
> application/framework can tell the difference between (a) old-school WSGI
> 1.0 decoded PATH_INFO, (b) new verbatim PATH_INFO, and (c) a new verbatim
> PATH_INFO that has been created from an old PATH_INFO by a WSGI handler
> unfortunate enough to be running under CGI or IIS, potentially including
> mangled characters. I would prefer to avoid the latter completely.

I was determined to stay out of this conversation, as I don't
particularly care anymore, but want to set the record straight.

What Ian was describing was just one of a few proposals which were put
up about additional changes to WSGI on top of what is already the
de facto WSGI 1.X definition for Python 3.X as defined by existing
practice in the form of wsgiref in Python 3.1 and mod_wsgi 3.0 (as
implemented for more than a year, and recently released officially due
to being fed up with waiting). One of the other major WSGI servers
also is implementing that de facto WSGI 1.X definition for Python 3.X.
That WSGI server hasn't as far as I know been officially released, so
I will leave it up to its author to comment on whether they are still
intending to release it that way.

Anyway, Ian's proposal just so happened to be the last one which was
discussed. Just like the other proposals there were issues with it and
not everyone necessarily agreed. Note that lack of response doesn't
mean consent, and frankly various people were quite tired of the
discussion at that point and various people whose opinions would be
important to know had dropped out of the discussion.

I would be even further disappointed in Python WEB-SIG if that last
proposal now simply got rubber stamped purely because it was the last
proposal anyone remembered, without some critical study done on
whether it is even practical to implement in a reliable way across
hosting mechanisms which don't have direct access to the actual
processing of request headers and in particular the decoding of the
original REQUEST_URI into SCRIPT_NAME and PATH_INFO. Specifically, such
hosting mechanisms may find it difficult or impossible to extract from
REQUEST_URI the original parts which mapped to the final SCRIPT_NAME
and PATH_INFO.
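
Two small cases illustrate why recovering the original parts is hard. This is a sketch with made-up paths, not a description of any particular server: decoding is lossy, and the decoded SCRIPT_NAME does not tell you how many raw characters it covered in REQUEST_URI.

```python
from urllib.parse import unquote

# Two different request paths as they might appear in REQUEST_URI.
raw_a = "/app/docs%2Fintro"   # one segment containing an encoded slash
raw_b = "/app/docs/intro"     # two separate segments

# After decoding, which is what SCRIPT_NAME + PATH_INFO carry,
# the two are indistinguishable.
same_after_decoding = unquote(raw_a) == unquote(raw_b)
print(same_after_decoding)

# Splitting the raw URI by the length of the decoded SCRIPT_NAME goes
# wrong as soon as the script part itself contains an escape.
raw_uri = "/%61pp/docs"       # "%61" is an encoded "a"
script_name = "/app"          # decoded value handed over by the server
naive_prefix = raw_uri[:len(script_name)]
print(naive_prefix)           # not the raw form of "/app"
```

A gateway that only receives the decoded values therefore has to guess, which is the practicality concern raised above.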

No matter what you all end up deciding to do, and whether or not
start_response() is dropped, any new specification will have to be
called WSGI 2.0 due to the existence of the de facto WSGI 1.X
definition for Python 3.X.

Graham

P.J. Eby

Nov 26, 2009, 8:48:06 PM
to Graham Dumpleton, And Clover, Web SIG
At 08:59 AM 11/27/2009 +1100, Graham Dumpleton wrote:
>Just like the other proposals there were issues with it and
>not everyone necessarily agreed.

True, but it also (mostly) reflected the discussions prior to that
point, striking me as a fairly good compromise. I wasn't totally
keen on the special URI handling, but was willing to accept it in
order to just get something done. While it might be meaningful to
have some critical study on the URI handling, there is also the issue
that this might sit around for another year or two with no action.

Sometimes, the time when everyone is tired of arguing is *precisely*
the time to push forward and actually get something done. ;-)

Malthe Borch

Nov 24, 2009, 5:08:41 PM
to Ian Bicking, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch

2009/11/24 Ian Bicking <ia...@colorstudy.com>:
> Why does this matter?

It's all convention, but the CGI interpretation was to read the HTTP
request line by line until a blank line came and that was the
environment. Everything after that is the body.

If you want to obtain a shorter call signature – e.g. (environ,
start_response) instead of (environ, body, start_response), that's
fine; but maybe this should be a decorator.

You could take this argument further and do this in two steps (leaving
out ``start_response`` in the following):

(stream,) => (cgi_environ, body) => (hybrid_environ,)

That would preserve all information.
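
The decorator idea can be sketched like this, as one possible reading of the suggestion rather than a worked-out proposal. The decorator name `hybrid` and the `echo` application are hypothetical. The application is written against `(environ, body, start_response)` and the decorator exposes it with the usual two-argument WSGI signature.

```python
import io


def hybrid(app):
    """Adapt app(environ, body, start_response) to plain WSGI (sketch)."""
    def wsgi_app(environ, start_response):
        cgi_environ = dict(environ)            # leave the caller's dict alone
        body = cgi_environ.pop("wsgi.input")   # split the stream back out
        return app(cgi_environ, body, start_response)
    return wsgi_app


@hybrid
def echo(environ, body, start_response):
    # The body arrives as its own argument, not as an environ entry.
    data = body.read(int(environ.get("CONTENT_LENGTH") or 0))
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [data]


environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello")}
result = echo(environ, lambda status, headers: None)
print(result)
```

Servers and middleware keep seeing the hybrid environ, while the application itself works with the separated form.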

Why does it matter? This is the single most difficult question to
answer in software design because it's a matter of balance. On the one
hand we strive to find the best abstractions to express our problems
which will eventually be serialized into one or more tracks of
assembler code. On the other, we must be pragmatic and stop our quest
in time to still get things done in reasonable time.

I'm not sure the balance is in favor of the hybrid model; when you
google "environ http" you don't see a lot of body input stream in
there. You don't see "multi-threading" in there either; however, in
the WSGI environment, you do! We just put it there, because we don't
know where else to put it! Unable to find or respect the abstractions,
we are lucky to have Python's versatile dictionary. The downside:
bitrot.

Henry Precheur

Nov 24, 2009, 5:25:07 PM
to Malthe Borch, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch


On Tue, Nov 24, 2009 at 10:50:00PM +0100, Malthe Borch wrote:
> How people use or abuse software is not our concern; but the standard
> library should not itself abuse its own abstractions.

Your assumption is that `environ` == HTTP headers. That's simply NOT the
case. A request is:
- A request line
- Some headers
- A body

(See http://tools.ietf.org/html/rfc2616#section-5)

The request body, the request method (GET, POST, ...), the request URL,
the HTTP version are all in `environ`.

If you really want to separate the headers from the rest you would put
another dictionary containing the headers inside `environ`. Instead WSGI
puts the headers prefixed with HTTP_ in `environ`, because that's what
CGI is doing. It might not be 100% clean, or logical, but it's SIMPLER,
there's no need to deal with nested dictionaries or other more complex
structure, and it's extensible.
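
As an illustration of the flat layout, the headers can be pulled back out of `environ` with a few lines. The helper name `request_headers` is made up for this sketch. Note that CGI stores Content-Type and Content-Length without the `HTTP_` prefix, and that the original capitalisation of header names is not preserved.

```python
def request_headers(environ):
    """Rebuild header names from CGI-style keys (hypothetical helper)."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            name = key[5:]
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            name = key  # CGI keeps these two without the HTTP_ prefix
        else:
            continue    # request line data, wsgi.* entries, and so on
        headers[name.replace("_", "-").title()] = value
    return headers


sample = {
    "REQUEST_METHOD": "POST",
    "PATH_INFO": "/submit",
    "HTTP_HOST": "example.org",
    "HTTP_USER_AGENT": "demo",
    "CONTENT_TYPE": "text/plain",
    "wsgi.url_scheme": "http",
}
print(request_headers(sample))
```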

> Request = namedtuple("Request", "environ body")
> Response = namedtuple("Response", "status headers iterable")
>
> Iterable might be "body" or "chunks" or some other term.

namedtuple is Python 2.6+: WSGI can't use it. WSGI must work w/ older
versions of Python.

--
Henry Prêcheur

Malthe Borch

Nov 24, 2009, 5:36:57 PM
to Henry Precheur, public-web-sig-+ZN...@plane.gmane.org, Ian Bicking, Malthe Borch

2009/11/24 Henry Precheur <he...@precheur.org>:


> (See http://tools.ietf.org/html/rfc2616#section-5)
>
> The request body, the request method (GET, POST, ...), the request URL,
> the HTTP version are all in `environ`.

That reference does not mention the environment. It's not an official term.

> If you really want to separate the headers from the rest you would put
> another dictionary containing the headers inside `environ`. Instead WSGI
> puts the headers prefixed with HTTP_ in `environ`, because that's what
> CGI is doing. It might not be 100% clean, or logic, but it's SIMPLER,
> there's no need to deal with nested dictionaries or other more complex
> structure, and it's extensible.

I don't mind those. CGI does it too, like you say.

> namedtuple is Python 2.6+: WSGI can't use it. WSGI must work w/ older
> versions of Python.

It was meant as illustration, but sure.
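
Taken purely as illustration, the quoted namedtuple idea could be layered on top of the existing callable like this. The adapter name `as_wsgi` and the `hello` handler are hypothetical, and the sketch assumes a Python version that has `collections.namedtuple`.

```python
from collections import namedtuple

Request = namedtuple("Request", "environ body")
Response = namedtuple("Response", "status headers iterable")


def as_wsgi(handler):
    """Expose handler(request) -> Response as a WSGI 1.x app (sketch)."""
    def wsgi_app(environ, start_response):
        request = Request(environ, environ.get("wsgi.input"))
        response = handler(request)
        start_response(response.status, response.headers)
        return response.iterable
    return wsgi_app


@as_wsgi
def hello(request):
    path = request.environ.get("PATH_INFO", "/")
    body = ("Hello from %s\n" % path).encode("utf-8")
    return Response("200 OK", [("Content-Type", "text/plain")], [body])


calls = []
chunks = hello({"PATH_INFO": "/x"}, lambda s, h: calls.append((s, h)))
print(calls, b"".join(chunks))
```

Because the adapter is ordinary application-side code, nothing in the server-facing interface has to change to try this style out.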

\malthe

Tres Seaver

Dec 27, 2009, 9:26:14 AM
to web...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Malthe Borch wrote:
>
> 2009/11/24 Ian Bicking <ia...@colorstudy.com>:
>> Why does this matter?
>
> It's all convention, but the CGI interpretation was to read the HTTP
> request line by line until a blank line came and that was the
> environment. Everything after that is the body.

"Headers", not environment: the CGI environment is literally the
os.environ set up by the CGI parent process before forking and execing
the script.
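
A rough sketch of that mechanism, using the standard `subprocess` module in place of a real web server: the parent builds an environment containing CGI variables and starts a child process, which reads them from `os.environ`. The variable values are invented for the example.

```python
import os
import subprocess
import sys

# Child "script": reads its request data from the process environment.
child_code = (
    "import os; "
    "print(os.environ['REQUEST_METHOD'], os.environ['PATH_INFO'])"
)

# Parent plays the web server: copy its own environment, add CGI variables.
env = dict(os.environ)
env.update({"REQUEST_METHOD": "GET", "PATH_INFO": "/hello"})

output = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()
print(output)
```

WSGI's `environ` dict takes over the role of that process environment, without requiring one process per request.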

Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tse...@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAks3boYACgkQ+gerLs4ltQ5coACg0ijXgG1wy1BdNnPzN2Jm2FLG
1R0Anj0/o6zwjtatFERoQ2HS3BOgyVEA
=RhAH
-----END PGP SIGNATURE-----
