Re: [Web-SIG] Are you going to convert Pylons code into Python 3000?


Martijn Faassen

Mar 4, 2008, 9:13:19 PM
to Graham Dumpleton, pylons-discuss, Web SIG
Hey,

On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
[snip]
> In the case of code which directly talks to the interface defined by
> the WSGI specification I very much doubt the py2to3 script will help.
> This is because for WSGI to work with Python 3.0 there needs to be a
> change from use of string type objects to byte string type objects. I
> would suspect that py2to3 will only help in any sort of automated way
> with the fact that a string object becomes unicode-aware, not where,
> as with WSGI, the code would have to change to use and deal with a
> completely different type of object. The implications of this change
> to a byte string type object are going to be much more complicated.

I have no idea what the capabilities of this script are. I would
*imagine* it would convert classic strings into the bytes type, and
unicode strings into the new string type.

> What I fear is that if Python 3.0 isn't used as a trigger to push out
> WSGI 2.0, we will end up being stuck with WSGI 1.0 forever and there
> will never be any momentum for updating it, even though a range of
> deficiencies and shortcomings have been identified in the
> specification: in the way it is drafted, in the functionality it
> provides, and in how that functionality is described as needing to be
> implemented.
[snip XML-RPC example]

That argument doesn't work for me. You're implying that if Python 3.0
did not exist, there would be no way to come out with a new version of
the specification to fix its shortcomings? We can't fix APIs unless we
have the momentum given by a language change? You'd better never have
any ideas about WSGI 3.0 then, as it's unlikely you'll have another
such opportunity.

> With Python 3.0 people are going to have to change their code anyway
> and so it is an ideal time to push out a new version of the WSGI
> specification which fixes its warts and eliminates the oddities it
> needed in order to support certain legacy systems, something which is
> no longer seen as necessary.

"With Python 3.0 people are going to have their change their code
anyway as the language changes, so we're going to make it harder for
them by breaking their libraries too"

Having one thing change is hard enough on people. It's then nice to be
able to run your tests and have some indication it works. It's also
nice to be able to continue releasing for Python 2.x for a while, and
release the converted code using the conversion script. I'm not making
up this plan, that's the official plan. Changing libraries will break
this plan.

[WSGI is hidden, so it will be a low-impact change]

This may be true. I still don't see a reason to connect it to the
language change. Anyway, I'll stop on this now. I just think it's a
worrying trend.

> As much as I'd like to see everything move to a better WSGI 2.0, if
> there are components which people don't want to update, then a WSGI
> 2.0 to 1.0 bridging middleware can be used to adapt them.

Yes, that would help people using Python 2.x, but would WSGI 1.0 even
be available in Python 3.0 given your plan?

Regards,

Martijn

Guido van Rossum

Mar 4, 2008, 9:25:21 PM
to Martijn Faassen, pylons-discuss, Web SIG, Graham Dumpleton
On Tue, Mar 4, 2008 at 6:13 PM, Martijn Faassen <faa...@startifact.com> wrote:
> Hey,
>
> On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton
> <graham.d...@gmail.com> wrote:
> [snip]
>
> > In the case of code which directly talks to the interface defined by
> > the WSGI specification I very much doubt the py2to3 script will help.
> > This is because for WSGI to work with Python 3.0 there needs to be a
> > change from use of string type objects to byte string type objects. I
> > would suspect that py2to3 will only help in any sort of automated way
> > with the fact that a string object becomes unicode-aware, not where,
> > as with WSGI, the code would have to change to use and deal with a
> > completely different type of object. The implications of this change
> > to a byte string type object are going to be much more complicated.
>
> I have no idea what the capabilities of this script are. I would
> *imagine* it would convert classic strings into the bytes type, and
> unicode strings into the new string type.

It does nothing of the kind. It leaves 'xxx' literals alone and
translates u'xxx' to 'xxx'. That's because (in many apps) both are
used primarily for text.
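To make that concrete, here is roughly what the conversion looks like
on a couple of literals (an illustration of the behaviour Guido
describes; exact fixer output can vary by 2to3 version):

# Python 2 source:
status = '200 OK'   # plain literal: left alone by 2to3
title = u'Welcome'  # unicode literal: rewritten

# After 2to3 (Python 3 source):
status = '200 OK'   # still a str literal, now unicode text
title = 'Welcome'   # u prefix dropped; also unicode text

Neither literal is turned into bytes, which is why WSGI-level code
that really wants byte strings needs manual attention.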

BTW I suggest that you play with it at least a little bit (run it on
its own example.py file) before diving into this discussion...


--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Ian Bicking

Mar 4, 2008, 10:53:49 PM
to Graham Dumpleton, pylons-discuss, Web SIG
Graham Dumpleton wrote:
> Personally I believe that WSGI 1.0 should die along with Python 2.X. I
> believe that WSGI 2.0 should be developed to replace it and the
> introduction of Python 3.0 would be a great time to do that given that
> people are going to have to change their code anyway and that code
> isn't then likely to be backward compatible with Python 2.X.

I don't believe it should just *die*. But I agree that this is a good
time to revisit the specification. Especially since I have no idea how
the change to unicode text would affect the WSGI environment. Having
the environment hold bytes seems weird, but having it hold unicode is a
substantial change.

I don't think it will be as bad as Martijn thinks, because the libraries
people use will probably have relatively few interface changes. Pylons
and WebOb for instance should maintain largely the same interface (and
they already expose unicode when possible). None of the changes
proposed for WSGI 2 would change this.

If I'm maintaining two versions of a library (one for Python 2, one for
Python 3), then at least I'd like to get a little benefit out of it, and
a revised WSGI would give some benefit.

I think we might still need some kind of WSGI 1.1 to clarify what WSGI
1(-like) semantics mean in a Python 3.0 environment. Creating adapters
from WSGI 1 to WSGI 2 should be easy enough that we could still offer
some support for minimally-translated WSGI code.
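To give a feel for why such adapters should be easy, here is a minimal
sketch of wrapping a WSGI 1.0 application for the tuple-returning
calling convention discussed in this thread (hedged: the WSGI 2.0
shape is still a draft, and this ignores the write() callable and
delayed start_response cases):

def wsgi1_to_wsgi2(wsgi1_app):
    def wsgi2_app(environ):
        captured = {}
        def start_response(status, headers, exc_info=None):
            captured['status'] = status
            captured['headers'] = headers
        body = wsgi1_app(environ, start_response)
        # Assumes the app called start_response before returning.
        return captured['status'], captured['headers'], body
    return wsgi2_app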

Ian

Graham Dumpleton

Mar 4, 2008, 6:05:56 PM
to pylons-discuss, Web SIG
Jose Galvez wrote:
> this is an interesting issue, because I would suspect that all our pylons
> applications will have to be converted as well as the pylons base code. I
> know that there is going to be a program which will automate the
> translation, but not having used it I don't know what issues the
> translation will cause. The other big question is eggs: will they be
> able to tell the difference between Python 2.x and 3.x, since the code
> will be different?
> Jose
>
> On Tue, Mar 4, 2008 at 3:17 AM, Leo <Mor...@gmail.com> wrote:
>
> >
> > Subj.
> > Is Python 3000 migration planned?

There is more to it than just that. One problem is that the WSGI 1.0
specification is incompatible with Python 3.0. There were some
preliminary discussions about how the specification would need to be
changed, but no real final outcome. The discussions also probably
didn't cover everything that would need to be changed in the
specification. For example, wsgi.file_wrapper and how it would have to
be changed wasn't discussed.

The main issues were captured in:

http://www.wsgi.org/wsgi/Amendments_1.0

Note though that that page is merely a collection of points discussed
and is itself not any sort of official set of amendments to the WSGI
specification.

Personally I believe that WSGI 1.0 should die along with Python 2.X. I
believe that WSGI 2.0 should be developed to replace it and the
introduction of Python 3.0 would be a great time to do that given that
people are going to have to change their code anyway and that code
isn't then likely to be backward compatible with Python 2.X.

Graham

Martijn Faassen

Mar 4, 2008, 6:17:40 PM
to Graham Dumpleton, pylons-discuss, Web SIG
Hey,

On Wed, Mar 5, 2008 at 12:05 AM, Graham Dumpleton
<Graham.D...@gmail.com> wrote:
[snip]


> Personally I believe that WSGI 1.0 should die along with Python 2.X. I
> believe that WSGI 2.0 should be developed to replace it and the
> introduction of Python 3.0 would be a great time to do that given that
> people are going to have to change their code anyway and that code
> isn't then likely to be backward compatible with Python 2.X.

I think lots of Python projects reason this way: the Python 3
transition is the right time to break backwards compatibility in our
library/framework. It's understandable.

Unfortunately this means that for people adjusting their code, they
won't just have to deal with the large Python 3 transition, but also
with lots of their frameworks and libraries making
backwards-incompatible changes. That's unfortunate, as that means any
automatic conversion strategy using the py2to3 script won't be
possible, and there won't be any way to keep libraries in transition
working in both Python 2 and 3 for a while (which is Guido's plan), as
their dependencies don't support it.

Regards,

Martijn

Graham Dumpleton

Mar 4, 2008, 7:48:44 PM
to Martijn Faassen, pylons-discuss, Web SIG
On 05/03/2008, Martijn Faassen <faa...@startifact.com> wrote:
> Hey,
>
> On Wed, Mar 5, 2008 at 12:05 AM, Graham Dumpleton
> <Graham.D...@gmail.com> wrote:
> [snip]
>
> > Personally I believe that WSGI 1.0 should die along with Python 2.X. I
> > believe that WSGI 2.0 should be developed to replace it and the
> > introduction of Python 3.0 would be a great time to do that given that
> > people are going to have to change their code anyway and that code
> > isn't then likely to be backward compatible with Python 2.X.
>
> I think lots of Python projects reason this way: the Python 3
> transition is the right time to break backwards compatibility in our
> library/framework. It's understandable.
>
> Unfortunately this means that for people adjusting their code, they
> won't just have to deal with the large Python 3 transition, but also
> with lots of their frameworks and libraries making
> backwards-incompatible changes. That's unfortunate, as that means any
> automatic conversion strategy using the py2to3 script won't be
> possible, and there won't be any way to keep libraries in transition
> working in both Python 2 and 3 for a while (which is Guido's plan), as
> their dependencies don't support it.

In the case of code which directly talks to the interface defined by
the WSGI specification I very much doubt the py2to3 script will help.
This is because for WSGI to work with Python 3.0 there needs to be a
change from use of string type objects to byte string type objects. I
would suspect that py2to3 will only help in any sort of automated way
with the fact that a string object becomes unicode-aware, not where,
as with WSGI, the code would have to change to use and deal with a
completely different type of object. The implications of this change
to a byte string type object are going to be much more complicated.

What I fear is that if Python 3.0 isn't used as a trigger to push out
WSGI 2.0, we will end up being stuck with WSGI 1.0 forever and there
will never be any momentum for updating it, even though a range of
deficiencies and shortcomings have been identified in the
specification: in the way it is drafted, in the functionality it
provides, and in how that functionality is described as needing to be
implemented.

I'd rather not see another XML-RPC: in practice it was a good first
attempt, and a little bit of tweaking would have made it so much
better while still keeping its simplicity. And no, I don't mean SOAP;
that went too far. The problem with XML-RPC, from what I saw at the
time, is that the original author had a lot invested in software that
used the original XML-RPC and he wasn't going to budge, as he didn't
want to have to change his own systems based on it.

With Python 3.0 people are going to have to change their code anyway
and so it is an ideal time to push out a new version of the WSGI
specification which fixes its warts and eliminates the oddities it
needed in order to support certain legacy systems, something which is
no longer seen as necessary.

Also, for most systems that use WSGI the impact would be quite
minimal, as they often use it merely as a bridge to some existing web
server interface. Thus changes would be very localised. Even something
like Paste/Pylons hides a lot of what is WSGI behind its own veneer,
for example WebOb and its predecessor, and so higher layers may not
even be affected much.

As much as I'd like to see everything move to a better WSGI 2.0, if
there are components which people don't want to update, then a WSGI
2.0 to 1.0 bridging middleware can be used to adapt them.

Graham

Manlio Perillo

Mar 5, 2008, 3:39:26 AM
to Graham Dumpleton, pylons-discuss, Web SIG
Graham Dumpleton wrote:
> [...]

>
> Personally I believe that WSGI 1.0 should die along with Python 2.X. I
> believe that WSGI 2.0 should be developed to replace it and the
> introduction of Python 3.0 would be a great time to do that given that
> people are going to have to change their code anyway and that code
> isn't then likely to be backward compatible with Python 2.X.
>

Fine with me, but there is a *big* problem.

WSGI 2.0 "breaks" support for asynchronous applications (since you can
no longer send headers in the app iter).

I have finally implemented an extension for Nginx's WSGI module that
gives support to asynchronous applications.

I *need* it because in an application I'm developing I have to talk to
a web service on the Internet, and not using an asynchronous HTTP
client (I'm using pycurl) would be suicide.


> Graham


Manlio Perillo

Martijn Faassen

Mar 5, 2008, 3:40:23 AM
to Guido van Rossum, pylons-discuss, Web SIG, Graham Dumpleton
Hey,

On Wed, Mar 5, 2008 at 3:25 AM, Guido van Rossum <gu...@python.org> wrote:
> On Tue, Mar 4, 2008 at 6:13 PM, Martijn Faassen <faa...@startifact.com> wrote:
> > Hey,
> >
> > On Wed, Mar 5, 2008 at 1:48 AM, Graham Dumpleton
> > <graham.d...@gmail.com> wrote:
> > [snip]
> >
> > > In the case of code which directly talks to the interface defined by
> > > the WSGI specification I very much doubt the py2to3 script will help.
> > > This is because for WSGI to work with Python 3.0 there needs to be a
> > > change from use of string type objects to byte string type objects. I
> > > would suspect that py2to3 will only help in any sort of automated way
> > > with the fact that a string object becomes unicode-aware, not where,
> > > as with WSGI, the code would have to change to use and deal with a
> > > completely different type of object. The implications of this change
> > > to a byte string type object are going to be much more complicated.
> >
> > I have no idea what the capabilities of this script are. I would
> > *imagine* it would convert classic strings into the bytes type, and
> > unicode strings into the new string type.
>
> It does nothing of the kind. It leaves 'xxx' literals alone and
> translates u'xxx' to 'xxx'. That's because (in many apps) both are
> used primarily for text.

> BTW I suggest that you play with it at least a little bit (run it on
> its own example.py file) before diving into this discussion...

I accurately described my lack of knowledge of the script, then. :)
Sure, I need to play with the script. I guess the best route would be
to introduce bytes in your code in Python 2.x and have the script
leave that alone. If WSGI 2.0 then makes it into Python 2.x as well,
then there's no problem with API breakage.

Playing with the script will happen sometime, but I think it's quite
clear the script will be of no help if important library APIs also
break because people take their chances during the transition (and the
script doesn't take care of that, which it can't for third-party
APIs).

WSGI is probably not the best example given the string issue and its
inclusion in the Python core, though: as Graham expressed, it's
probably going to have problems no matter what. I also think any new
version could be developed on Python 2.6 first, as this will support
the bytes type as far as I understand. And yes, I need to try the
Python 2.6 alpha interpreter first too. :)
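For what it's worth, a sketch of what that route could look like,
assuming (as I understand it) that Python 2.6 accepts b'...' literals
as an alias for str and that 2to3 leaves them untouched:

# Runs on Python 2.6 and, unchanged by 2to3, on Python 3.0, where
# the same literals become real bytes objects.
status = b'200 OK'
headers = [(b'Content-Type', b'text/plain')]
body = [b'hello world\n']

That way WSGI-level code could be made bytes-explicit before the 3.0
switch.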

Regards,

Martijn

Martijn Faassen

Mar 5, 2008, 3:43:41 AM
to Ian Bicking, pylons-discuss, Web SIG
Hey,

On Wed, Mar 5, 2008 at 4:53 AM, Ian Bicking <ia...@colorstudy.com> wrote:
> Graham Dumpleton wrote:
> > Personally I believe that WSGI 1.0 should die along with Python 2.X. I
> > believe that WSGI 2.0 should be developed to replace it and the
> > introduction of Python 3.0 would be a great time to do that given that
> > people are going to have to change their code anyway and that code
> > isn't then likely to be backward compatible with Python 2.X.
>
> I don't believe it should just *die*. But I agree that this is a good
> time to revisit the specification. Especially since I have no idea how
> the change to unicode text would affect the WSGI environment. Having
> the environment hold bytes seems weird, but having it hold unicode is a
> substantial change.

> I don't think it will be as bad as Martijn thinks, because the libraries
> people use will probably have relatively few interface changes. Pylons
> and WebOb for instance should maintain largely the same interface (and
> they already expose unicode when possible). None of the changes
> proposed for WSGI 2 would change this.

That's probably true. WSGI is likely not the best example for this
case, just the trigger that caused me to speak out. The WSGI spec is
not the only place where people will take the opportunity to break
APIs. Unfortunately, as with WSGI, API breakage may in many cases be
unavoidable...

I would like to encourage the adoption of any new such standard in the
Python 2.6 environment already, if at all possible. This way it's not
an extra step for people to be burdened with when they move to Python
3, but something they can prepare for gradually.

Regards,

Martijn

Manlio Perillo

Mar 5, 2008, 11:37:48 AM
to Web SIG
Brian Smith wrote:

> Manlio Perillo wrote:
>> Fine with me, but there is a *big* problem.
>>
>> WSGI 2.0 "breaks" support for asynchronous applications
>> (since you can no longer send headers in the app iter).
>
> WSGI 1.0 doesn't guarantee that all asynchronous applications will work
> either, because it allows the WSGI gateway to wait for and buffer all
> the input from the client before even calling the application callable.
> And, it doesn't provide a way to read an indefinite stream of input from
> the client, which is also problematic.
>
> Anyway, please post a small example of a program that fails to work
> because of these proposed changes for WSGI 2.0.
>
> Thanks,
> Brian
>


Attached are two working examples (I have not committed them yet,
because I'm still testing - there are some problems that I need to
solve).


The `curl_client` module is a high-level interface to pycurl.

The `nginx-poll-proxy.py` script is an asynchronous WSGI application
that implements an HTTP proxy.

The `nginx-poll-sleep.py` script is a simple asynchronous WSGI
application that gets the content of an HTTP resource, using poll just
to "sleep" (suspend execution) for a fixed amount of time.


NOTE: I have also added a `ngx.sleep` extension, but I'm going to remove
it since the same behaviour can be obtained with ngx.poll.


An explanation of the interfaces
--------------------------------

The ngx.poll extension is based on the Python stdlib select.poll interface.

There are two constants: `ngx.WSGI_POLLIN` and `ngx.WSGI_POLLOUT`.
These are defined in the WSGI environment, but their values are known
(`0x01` and `0x04`) and can be used for bit masking.

The `ngx.connection_wrapper(fd)` function takes as input a file
descriptor (as an integer) and returns a Connection wrapper object, to
be used for later operations.


The Connection wrapper object has the following methods:
- fileno():
  return the associated socket descriptor
- register(flags):
  register the connection with the server "reactor";
  flags is a bit mask of ngx.WSGI_POLLIN and ngx.WSGI_POLLOUT
- deregister(flags=None):
  deregister the connection from the server "reactor"
- close():
  close the connection object, deregistering it from the server
  "reactor" if still active.
  XXX it can also close the socket, but this should be done by the
  client

The last function is `ngx.poll(timeout)`.
After calling it, the user *should* yield an empty string (yielding a
non-empty string will result in undefined behaviour).

The WSGI application iteration will be suspended until a connection is
ready for reading or writing, or the timeout expires.

The `ngx.poll` function returns a callable that, when called, returns
a tuple with the connection object that is "ready" (or None if the
call timed out) and a flag indicating whether the connection is ready
for reading or writing.

NOTE: due to the internal architecture of the Nginx event module (it
      has to support several different event systems), mod_wsgi for
      Nginx will only return ngx.WSGI_POLLIN or ngx.WSGI_POLLOUT,
      *never* ngx.WSGI_POLLIN | ngx.WSGI_POLLOUT.

Also, no error status is reported.

That's all.
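Pieced together from the description above, an application using the
extension might look something like this (a hedged sketch: I am
assuming the functions are exposed through environ keys, and the
helper producing a non-blocking socket is hypothetical):

def application(environ, start_response):
    poll = environ['ngx.poll']                  # assumed environ key
    connection_wrapper = environ['ngx.connection_wrapper']
    POLLIN = environ['ngx.WSGI_POLLIN']

    sock = make_nonblocking_client_socket()     # hypothetical helper
    conn = connection_wrapper(sock.fileno())
    conn.register(POLLIN)

    start_response('200 OK', [('Content-Type', 'text/plain')])
    while True:
        result = poll(1000)   # suspend for up to one second
        yield ''              # must yield an empty string here
        ready, flags = result()
        if ready is None:     # timed out
            break
        data = sock.recv(4096)
        if not data:
            break
        yield data
    conn.close()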

An asynchronous application is simply impossible to develop with the
current draft of WSGI 2.0, since I need to send the headers after some
steps in the application iterator.


So, please, don't "ruin" the WSGI specification just to make it easier
to implement and to use.
For me asynchronous support is very important.


P.S.: I have chosen to implement this interface, instead of
`wsgi.pause_output`, because IMHO it is very easy to implement for
"normal" servers.

Moreover it is also simpler to use, with a very "natural" interface,
and it avoids the use of callbacks and a tighter interaction with the
server "reactor".


Regards Manlio Perillo

curl_client.py
nginx-poll-proxy.py
nginx-poll-sleep.py

Graham Dumpleton

Mar 5, 2008, 5:37:45 PM
to Manlio Perillo, Web SIG
Let me get this right. You are complaining that WSGI 2.0 would break
your non-standard extension, which was never a part of the WSGI 1.0
specification to begin with.

I also find it interesting that in the very early days you were
pushing very, very hard for WSGI 2.0 to be specified and you had no
intention of even supporting the WSGI 1.0 style interface. Now things
seem to be the complete opposite.

Anyway, your complaint seems to revolve around:

"""An asynchronous application is simply impossible to develope with the
current draft of WSGI 2.0, since I need to send the headers after some
steps in the application iterator."""

You probably need to explain the second half of that sentence a bit
better. From memory, the WSGI 1.0 specification says that for an
iterable, the headers should be sent upon the generation of the first
non-empty string being yielded. How does what you are doing relate to
that; are you not doing that? Why would WSGI 2.0 necessarily be any
different and cause a problem?

Graham


Phillip J. Eby

Mar 5, 2008, 9:05:38 PM
to Graham Dumpleton, Manlio Perillo, Web SIG
At 09:37 AM 3/6/2008 +1100, Graham Dumpleton wrote:
>You probably need to explain the second half of that sentence a bit
>better. From memory, the WSGI 1.0 specification says that for an
>iterable, the headers should be sent upon the generation of the first
>non-empty string being yielded. How does what you are doing relate to
>that; are you not doing that? Why would WSGI 2.0 necessarily be any
>different and cause a problem?

Because (in concept anyway) WSGI 2.0 is synchronous with respect to
headers -- you don't get to yield empty strings and *then* return the headers.

Personally, I see truly-async web apps as a niche, because in order
to write a useful async app, you need *other* async APIs besides your
incoming HTTP one. Which means you're going to have to write to
Twisted or some other library's API, or else roll your own. At which
point, connecting your app to a web server is the least of your
concerns. (Since it has to be a web server that's compatible with
the API you're using, which means you might as well use its native API.)

That having been said, I don't see a problem with having a Web Server
Asynchronous Interface (WSAI?) for folks who want that sort of
thing. Ideally, such a thing would be the CPS (continuation-passing
style) mirror of WSGI 2.0. Where in WSGI 2.0 you return a 3-tuple,
in WSAI you'd essentially use start_response() and write().
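In rough outline, such a CPS-style application might look like this
(purely illustrative: no WSAI spec exists, and the finish() callback
marking the end of the response is my own hypothetical addition):

def wsai_app(environ, start_response, write, finish):
    # Headers go out via a callback instead of a returned tuple...
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # ...and the body is pushed piece by piece as it becomes ready.
    write('hello, ')
    write('async world\n')
    finish()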

In essence, you might say that WSGI 1.0 is a broken-down version of a
hideous crossbreeding of pure WSGI and pure WSAI. It would probably
be better to split them and have bridges. A truly-async system like
Twisted has to (effectively) do WSAI-WSGI bridging right now, but if
we had a WSAI standard, then there could perhaps be third-party bridges.

Even so, it's quite a niche: Twisted, nginx, and...? I know there
are a handful of async frameworks, and how many of those have web
servers included?

Manlio Perillo

Mar 6, 2008, 5:12:49 AM
to Graham Dumpleton, Web SIG
Graham Dumpleton wrote:

> Let me get this right. You are complaining that WSGI 2.0 would break
> your non-standard extension, which was never a part of the WSGI 1.0
> specification to begin with.
>

No, you are wrong.
WSGI *allows* an implementation to develop extensions.

I'm complaining that WSGI 2.0 will break support for truly-async web
apps.

> I also find it interesting that in the very early days you were
> pushing very, very hard for WSGI 2.0 to be specified and you had no
> intention of even supporting the WSGI 1.0 style interface. Now things
> seem to be the complete opposite.
>

First of all, in the early days I had very little experience with WSGI
and Nginx internals.

Moreover, as far as I can remember, I have never said that I was not
going to support WSGI 1.0.

I started with an implementation of WSGI 2.0 because it was easier to
implement and it allowed me (with little experience at that time) to
have a working implementation as soon as possible.


> Anyway, your complaint seems to revolve around:
>
> """An asynchronous application is simply impossible to develop with the
> current draft of WSGI 2.0, since I need to send the headers after some
> steps in the application iterator."""
>

Right.

> You probably need to explain the second half of that sentence a bit
> better. From memory, the WSGI 1.0 specification says that for an
> iterable, the headers should be sent upon the generation of the first
> non-empty string being yielded. How does what you are doing relate to
> that; are you not doing that? Why would WSGI 2.0 necessarily be any
> different and cause a problem?
>

See the response from Phillip J. Eby.

> Graham
>


Manlio Perillo

Manlio Perillo

Mar 6, 2008, 5:34:54 AM
to Phillip J. Eby, Web SIG, Graham Dumpleton
Phillip J. Eby wrote:

> At 09:37 AM 3/6/2008 +1100, Graham Dumpleton wrote:
>> You probably need to explain the second half of that sentence a bit
>> better. From memory, the WSGI 1.0 specification says that for an
>> iterable, the headers should be sent upon the generation of the first
>> non-empty string being yielded. How does what you are doing relate to
>> that; are you not doing that? Why would WSGI 2.0 necessarily be any
>> different and cause a problem?
>
> Because (in concept anyway) WSGI 2.0 is synchronous with respect to
> headers -- you don't get to yield empty strings and *then* return the
> headers.
>
> Personally, I see truly-async web apps as a niche, because in order to
> write a useful async app, you need *other* async APIs besides your
> incoming HTTP one.


Yes, this is true.
But I have to say that:

1) the asynchronous model is the "right" model to use to develop
robust and scalable applications (especially in Python).

The fact that it is a niche does not mean that it should not be
supported and promoted.

> Which means you're going to have to write to Twisted
> or some other library's API, or else roll your own.

This is true, but there are already some working(?) asynchronous
clients: pycurl and psycopg2.

You don't need to use the web server's "private" API.

An HTTP client and a database client are usually all you need in a web
application (well, you usually also need an SMTP client, but since a
server probably has a local SMTP daemon running, this should not be a
problem).


> At which point,
> connecting your app to a web server is the least of your concerns.

This is not always true.

> (Since it has to be a web server that's compatible with the API you're
> using, which means you might as well use its native API.)
>

No, this is not correct.
The ngx.poll extension should be easy to implement in a "standard"
server (I would like to write a reference implementation for wsgiref).

Moreover it is not impossible to write a pure async WSGI
implementation in Twisted Web, and then have it support the poll
extension.

Then, a portable application can just use pycurl or psycopg2 plus the
poll extension.

Of course many WSGI implementations will not implement an "optimized"
version of the poll extension, but isn't the same true for
wsgi.file_wrapper?

> That having been said, I don't see a problem with having a Web Server
> Asynchronous Interface (WSAI?) for folks who want that sort of thing.
> Ideally, such a thing would be the CPS (continuation-passing style)
> mirror of WSGI 2.0. Where in WSGI 2.0 you return a 3-tuple, in WSAI
> you'd essentially use start_response() and write().
>

Why write()?
It only causes problems.

An asynchronous application should just use a generator.
This solves some problems, like the producer-consumer problem.

Moreover it is also more convenient to use (IMHO).

> In essence, you might say that WSGI 1.0 is a broken-down version of a
> hideous crossbreeding of pure WSGI and pure WSAI. It would probably be
> better to split them and have bridges. A truly-async system like
> Twisted has to (effectively) do WSAI-WSGI bridging right now, but if we
> had a WSAI standard, then there could perhaps be third-party bridges.
>
> Even so, it's quite a niche: Twisted, nginx, and...? I know there are a
> handful of async frameworks, and how many of those have web servers
> included?
>

Yes, this is a problem.
But what makes WSGI 1.0 great is that it is able to support this
niche.


Thanks Manlio Perillo

Lawrence Oluyede

Mar 6, 2008, 6:03:48 AM
to Manlio Perillo, Web SIG, Graham Dumpleton
> No, you are wrong.
> WSGI *allows* an implementation to develop extensions.
>
> I'm complaining that WSGI 2.0 will break support for truly-async web apps.

Correct me if I'm wrong. WSGI is great on paper and almost great in
daily use. One of its peculiarities is the "middleware extension
pattern", which is meant to foster the reuse and spread of middleware
doing (I hope) one thing and doing it right. AFAIK most of the
middleware out there is not written with async in mind at all. I don't
see Twisted developers crying out loud begging people to write async
middleware and never block.

Don't take it the wrong way, but what's the point in fighting so hard
for WSGI when there's plenty of ways to just ignore it?

I know that my statement will upset someone, but I think the idea of
two separate web standards is great. It's too late to force async into
the WSGI world, and you, with your Twisted expertise, should know that
writing async code is hard and asking everyone to not block is even
harder (that's why even Twisted Matrix has callInThread and things
like that).

Manlio Perillo

Mar 6, 2008, 6:44:30 AM
to Lawrence Oluyede, Web SIG, Graham Dumpleton
Lawrence Oluyede wrote:

>> No, you are wrong.
>> WSGI *allows* an implementation to develop extensions.
>>
>> I'm complaining that WSGI 2.0 will break support for truly-async web apps.
>
> Correct me if I'm wrong. WSGI is great on paper and almost great in
> daily use. One of its peculiarities is the "middleware extension
> pattern", which is meant to foster the reuse and spread of middleware
> doing (I hope) one thing and doing it right. AFAIK most of the
> middleware out there is not written with async in mind at all. I don't
> see Twisted developers crying out loud begging people to write async
> middleware and never block.
>
> Don't take it the wrong way, but what's the point in fighting so hard
> for WSGI when there's plenty of ways to just ignore it?
>

Because I don't care if people implement WSGI in the wrong way :).

If there is a good middleware but it is not async-friendly, I simply
will not use it, and I will try to rewrite it if that is feasible.

I'm fighting so hard because I think that it is wrong to try to
simplify the WSGI spec so much that it becomes unusable for writing
pure async applications.

> I know that my statement will upset someone, but I think the idea of
> two separate web standards is great. It's too late to force async into
> the WSGI world, and you, with your Twisted expertise, should know that
> writing async code is hard and asking everyone to not block is even
> harder (that's why even Twisted Matrix has callInThread and things
> like that).
>

I'm not asking everyone to not block! That is not practical.

And yes, a sync application is very different from an async
application.

As an example, an HTTP client:

def application(environ, start_response):
    c = Connection(...)
    r = c.request(...)
    # asynchronous: iterating the request yields control, letting the
    # gateway suspend this application until data is ready
    for block in r:
        yield block

    data = r.get_response()

VS

def application(environ, start_response):
    c = Connection(...)
    r = c.request(...)

    # synchronous: blocks the whole worker until the response arrives
    data = r.get_response()


I'm not sure that having two standards is the best solution, since it
will complicate the implementation of a WSGI middleware.

Right now, the WSGI module for Nginx can serve both sync and async
applications without problems.


Manlio Perillo

Manlio Perillo

Mar 6, 2008, 7:11:14 AM
to Manlio Perillo, Web SIG, Graham Dumpleton
Manlio Perillo wrote:
> [...]

>
> I'm not sure that having two standards is the best solution, since it
> will complicate the implementation of a WSGI middleware.

A correction: it should be WSGI gateway and not WSGI middleware.

Brian Smith

Mar 6, 2008, 10:08:18 AM
to Web SIG
Manlio Perillo wrote:
> Brian Smith wrote:
> > Manlio Perillo wrote:
> >> Fine with me, but there is a *big* problem.
> >>
> >> WSGI 2.0 "breaks" support for asynchronous applications (since you
> >> can no longer send headers in the app iter).
> >
> > WSGI 1.0 doesn't guarantee that all asynchronous applications will
> > work either, because it allows the WSGI gateway to wait for and
> > buffer all the input from the client before even calling the
> > application callable.
> > And, it doesn't provide a way to read an indefinite stream of input
> > from the client, which is also problematic.
> >
> > Anyway, please post a small example of a program that fails to work
> > because of these proposed changes for WSGI 2.0.
> >
> > Thanks,
> > Brian
> >
>
>
> Attached are two working examples (I have not committed them yet,
> because I'm still testing - there are some problems that I need to
> solve)

I looked at your examples and now I understand better what you are
trying to do. I think what you are trying to do is reasonable but it
isn't something that is supported even by WSGI 1.0. It happens to work
efficiently for your particular gateway, but that isn't what WSGI is
about. In fact, any WSGI application that doesn't run correctly with an
arbitrary WSGI gateway (assuming no bugs in any gateway) isn't a WSGI
application at all.

It seems that the problem with your examples is not that they won't work
with WSGI 2.0. Rather, the problem is that the applications block too
long. The application will still work correctly, but will not be
efficient when run in nginx's mod_wsgi. However, that isn't a problem
with the specification or with the application; it is a problem with
nginx's mod_wsgi. I hate reading about the "Pythonic way" of doing
things, but writing a WSGI application so that it doesn't block too much
or too long is simply not Pythonic. The WSGI gateway needs to abstract
away those concerns so that they aren't an issue. Otherwise, the gateway
will only be useful for specialized applications designed to run well on
that particular gateway. Such specialized applications might as well use
specialized (gateway-specific) APIs, if they have to be designed
specifically for a particular gateway anyway.

Further, it is impossible to write a good HTTP proxy with WSGI. The
control over threading, blocking, I/O, and buffer management is just not
there in WSGI. In order to support efficient implementations of such
things, WSGI would have to become so low-level that it would become
pointless--it would be exposing an interface that is so low-level that
it wouldn't even be cross-platform. It wouldn't abstract away anything.

At the same time, the current WSGI 2.0 proposal abstracts too much. It
is good for applications that are written directly on top of the
gateway, and for simple middleware. But, it is not appropriate for a
serious framework to be built on. It is wrong to think that the same
interface is suitable for frameworks, middleware developers, and
application developers. I would rather see WSGI 2.0 become a much
lower-level framework that works at the buffer level (not strings), with
the ability to do non-blocking reads from wsgi.input, and the ability to
let the WSGI gateway do buffering in a sane and efficient manner
(there's no reason for the application to do a bunch of string joins
when the gateway could just send all the pieces in a single writev()).
Some control over blocking, HTTP chunked encoding, etc. could be
included as well. The current suggestions for WSGI 2.0 would then just
be a sample framework layered on top of this low-level interface, for
developers that don't want to use a big framework like Django or Pylons.
But, the big frameworks and middleware would use the low-level interface
to run efficiently.

- Brian

Phillip J. Eby

Mar 6, 2008, 11:59:57 AM
to Manlio Perillo, Web SIG, Graham Dumpleton
At 01:11 PM 3/6/2008 +0100, Manlio Perillo wrote:
>Manlio Perillo wrote:
> > [...]
> >
> > I'm not sure that having two standards is the best solution, since it
> > will complicate the implementation of a WSGI middleware.
>
>A correction: it should be WSGI gateway and not WSGI middleware.

On the contrary, it will simplify gateway implementation, if bridges
are available. Async gateways would implement WSAI, synchronous
gateways would implement WSGI.

The wsgiref library could include a standard bridge or two to go in
each direction (WSGI->WSAI and WSAI->WSGI), and the gateway would
provide some support for spawning, pooling, or queueing of threads,
where threads are needed to make the conversion from WSAI to WSGI
(since in the other direction, you can simply block waiting for a
callback). The APIs could be provided through some standardized
environ keys defined in the WSAI spec.
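As a rough illustration of the blocking direction, using the same
hypothetical WSAI shape sketched earlier in the thread:

import threading

def wsai_to_wsgi(wsai_app):
    # Present a WSAI (CPS-style) app through the plain synchronous
    # WSGI 1.0 interface by blocking until its callbacks fire.
    def wsgi_app(environ, start_response):
        done = threading.Event()
        chunks = []
        def write(data):
            chunks.append(data)
        def finish():
            done.set()
        wsai_app(environ, start_response, write, finish)
        done.wait()   # block this worker until the async app finishes
        return chunks
    return wsgi_app

The reverse direction (an async gateway hosting a blocking WSGI app)
is where the spawning/pooling/queueing support mentioned above would
come in.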

Graham Dumpleton

Mar 6, 2008, 12:16:23 PM
to Brian Smith, Web SIG

In part adding to what Brian is saying, you (Manlio) speak as if WSGI
2.0 is already somehow set in stone, and that because you can't do
what you want, it is no good and we should keep the WSGI 1.0 way of
doing things.

Just as Brian is starting to think about what else WSGI 2.0 could be
so as to allow other ways of doing things, why don't you try the same
thing and think about how you could do what you want in a similar
style to WSGI 2.0, but adapting the WSGI 2.0 interface in some way? If
the changes make sense and don't deviate too far from where we have
been going, maybe people might accept it.

This following idea may not make much sense, but the baby is keeping
me up, it's 4am and I am probably not going to get back to sleep until
I get this idea out of my head now.

Anyway, WSGI 2.0 currently talks about returning a single tuple
containing status, headers and iterable. What if it actually
optionally allowed the response to itself be an iterable, such that
you could do:

yield ('102 Processing', [], None)
...
yield ('102 Processing', [], None)
...
yield ('200 OK', [...], [...])

I'll admit that I am not totally across what the HTTP 102 status code
is meant to be used for, and am sort of presuming that this might make
sense. I am sure though that Brian, who understands this level better
than me, will set me straight.

That said, could the return of 102 like this allow the same result as
what you are currently doing with yielding empty strings prior to
setting up headers?

Going a bit further with this, would it make sense for an application
to also be able to return a 100 to force the server layer to tell the
client to start sending data if a 100-continue Expect header was sent?

Could it also be used in some way to allow better control over output
chunking by allowing:

yield ('200 OK', [...], [...])
...
yield (None, None, [...])

In other words the application could effectively yield up multiple
iterables related to the actual response content.

Not that all HTTP servers support it, but could this be a way of
allowing an application, when using output chunking, to specify
trailer headers to go after the last response content chunk?

yield ('200 OK', [...], [...])
...
yield (None, None, [...])
...
yield (None, [...], None)

The important thing though is that I am not suggesting this be the
default way of doing responses, but that it be an optionally available
lower-level layer for doing it. An application could still just return
a single tuple as per WSGI 2.0 now. A good server adapter may
optionally also allow this more low-level interface, which allows some
better measure of control. Support of this low-level interface could
be optional, with the WSGI environment used to indicate whether the
server supports it or not.
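Something like this, purely as a sketch of that negotiation (the
environ key name is made up for illustration):

def application(environ):
    response = ('200 OK', [('Content-Type', 'text/plain')], ['hello\n'])
    if environ.get('wsgi.multi_response'):   # hypothetical capability key
        def responder():
            yield ('102 Processing', [], None)
            yield response
        return responder()
    # Fall back to the plain single-tuple WSGI 2.0 style.
    return response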

Now, this doesn't deal with request content and an alternative to the
current wsgi.input, so that one could do a non-blocking read to get
back just what was available, i.e. the next chunk, but surely we can
come up with solutions for that as well. Thus I don't see it as
impossible to also handle chunked input content. We just need to stop
thinking that what has been proposed for WSGI 2.0 so far is the full
and complete interface.

Okay, I feel I can go back to sleep now. You can all start laughing
now if this insomnia-driven idea is plain stupid. :-)

Graham

Manlio Perillo

Mar 6, 2008, 2:20:19 PM
to Phillip J. Eby, Web SIG, Graham Dumpleton
Phillip J. Eby wrote:
> At 01:11 PM 3/6/2008 +0100, Manlio Perillo wrote:
>> Manlio Perillo wrote:
>> > [...]
>> >
>> > I'm not sure that having two standards is the best solution, since it
>> > will complicate the implementation of a WSGI middleware.
>>
>> A correction: it should be WSGI gateway and not WSGI middleware.
>
> On the contrary, it will simplify gateway implementation, if bridges are
> available.

I can confirm that implementing WSGI 2.0 is far simpler; however:
1) This is not an issue, since we already have many implementations
of WSGI 1.0: wsgiref, Twisted, Apache, Nginx, flup, ...

2) If you need to implement some extensions (like file_wrapper), then
the implementation is going to become more complex anyway.

> Async gateways would implement WSAI, synchronous gateways
> would implement WSGI.
>

Ok.
But I see no need to "invent" a new term (WSAI): the current
specification of WSGI is already good for async gateways/applications.

Is it really the best solution to split WSGI 1.0 into two separate
specifications?


> The wsgiref library could include a standard bridge or two to go in each
> direction (WSGI->WSAI and WSAI->WSGI), and the gateway would provide
> some support for spawning, pooling, or queueing of threads, where
> threads are needed to make the conversion from WSAI to WSGI (since in
> the other direction, you can simply block waiting for a callback).

If a specification explicitly requires the use of threads, then there
is something bad in it :).

Simply speaking: I want to avoid using threads in Nginx.
They are not supported by the server.

> The
> APIs could be provided through some standardized environ keys defined in
> the WSAI spec.
>


Can you give an example? Thanks.
I'm not sure I understand your architecture.

Manlio Perillo

Brian Smith

Mar 6, 2008, 2:42:43 PM
to Web SIG
Graham Dumpleton wrote:
> This following idea may not make much sense, but the baby is keeping
> me up, it's 4am and I am probably not going to get back to
> sleep until I get this idea out of my head now.

:) I think you need to have a serious discussion with the baby. Maybe if
she got a job she wouldn't sleep all day, and she would sleep through
the night. I had such a talk with my roommate a few years ago, and we
got along much better after that.

> Anyway, WSGI 2.0 currently talks about returning a single
> tuple containing status, headers and iterable. What if it
> actually optionally allowed the response to itself be an
> iterable, such that you could do:
>
> yield ('102 Processing', [], None)
> ...
> yield ('102 Processing', [], None)
> ...
> yield ('200 OK', [...], [...])
>
> I'll admit that I am not totally across what the HTTP 102
> status code is meant to be used for and am sort of presuming
> that this might make sense. Am sure though that Brian who
> understands this sort of level better than me will set me straight.

The application should definitely be able to send as many 1xx status
lines as it wants. However, I expect any yielded status line to be sent
to the client, and there should be no need to include other headers or a
body. I will write more about this below.

That idea doesn't really benefit Manlio's programs. Manlio's program is
trying to say "use my thread for some other processing until some
(external) event happens." We already have standard mechanisms for doing
something similar in WSGI: multi-threaded and multi-process WSGI
gateways that let applications block indefinitely while letting other
applications run. A polling interface like Manlio proposes does help for
applications that are doing I/O via TCP(-like) protocols. But, it
doesn't provide a way to wait for a database query to finish, or for any
other kind of IPC to complete, unless everything is rebuilt around that
polling mechanism. It isn't a general enough interface to become a part
of WSGI. I think it is safe to say that multi-threaded or multi-process
execution is something that is virtually required for WSGI.

> Going a bit further with this, would it make sense for an
> application to also be able to return a 100 to force the server
> layer to tell the client to start sending data if a 100-continue
> Expect header was sent?

The handling of 1xx status codes is inhibited by the current state of
CGI and FastCGI gateways. In particular, most CGI and FastCGI modules do
not provide useful support for "Expect: 100-continue"; they always send
the "100 Continue" even when you don't want them to. As long as CGI and
FastCGI have to be supported as gateways, the design of WSGI will not be
able to change substantially from WSGI 1.0 (the current proposed changes
for WSGI 2.0 are really just cosmetic except for the removal of
start_response.write()).

Consequently, support for 1xx status lines must be optional, so it might
as well be done as a WSGI 1.0-compatible extension like this:

def application(environ, start_response):
    def ignore(x):
        pass
    send_provisional_response = environ.get(
        "wsgi.send_provisional_response",
        ignore)
    ...
    send_provisional_response("102 Processing")
    ...
    send_provisional_response("102 Processing")

> Could it also be used in some way to allow better control
> over output chunking by allowing:
>
> yield ('200 OK', [...], [...])
> ...
> yield (None, None, [...])
>
> In other words the application could effectively yield up
> multiple iterables related to the actual response content.

Again, I like the simplification that WSGI 2.0 applications are always
functions or function-like callables, and never iterables. It would be
easy to create a WSGI-1.0-compatible interface for efficient batching of
output strings, which could also then support buffer objects instead of
just (byte)strings:

def application(environ, start_response):
    def join_buffers(buffers):
        return "".join([str(b) for b in buffers])
    vectorize = environ.get("wsgi.vectorize", join_buffers)
    ...  # build the list of response pieces in 'buffers'
    return vectorize(buffers)

> Not that all HTTP servers support it, but could this be a way of
> allowing an application, when using output chunking, to specify
> trailer headers to go after the last response content chunk?

The trailers feature is something I haven't thought a lot about. Again,
that is something that CGI doesn't support (I don't think FastCGI
supports it either). So, that is something that has to also be done in a
way similar to the above:

def application(environ, start_response):
    headers = [...]
    trailers = environ.get("wsgi.trailers")
    if trailers is None:
        # inefficiently calculate the trailer fields
        # in advance
        headers.append(("Header-A", ...))
        headers.append(("Header-B", ...))
    ...
    start_response("200 OK", headers)
    ...
    while ...:
        if trailers is not None:
            # calculate trailer fields as we yield
            # output
            ...
        yield output

    if trailers is not None:
        trailers.append(("Header-A", ...))
        trailers.append(("Header-B", ...))

It would be nice if the specification for the trailers extension
specified that the trailers list is included in the WSGI environment
if and only if (1) we are talking HTTP/1.1, and (2) the gateway and
web server support trailers.

> Important thing though is I am not suggesting this be the
> default way of doing responses, but that be an optionally
> available lower level layer for doing it. An application
> could still just return a single tuple as per WSGI 2.0 now. A
> good server adapter may optionally also allow this more low
> level interface which allows some better measure of control.
> Support of this low level interface could be optional, with
> WSGI environment used to indicate if server supports it or not.

Right, but if these features are all optional, then they can be spec'd
to work with WSGI 1.0.

> Now, this doesn't deal with request content and an alternative to
> the current wsgi.input, so that one could do a non-blocking read to
> get back just what was available, i.e. the next chunk, but surely we
> can come up with solutions for that as well. Thus I don't see it as
> impossible to also handle chunked input content. We just need to stop
> thinking that what has been proposed for WSGI 2.0 so far is the full
> and complete interface.

We can just say that WSGI-2.0-style applications must support chunked
request bodies, but gateways are not required to support them.
WSGI-2.0-style applications would have to check for CONTENT_LENGTH,
and if that is missing, check to see if
environ['HTTP_TRANSFER_ENCODING'] includes the "chunked" token.
wsgi.input.read() would have to stop at the end of the request;
applications would not be restricted from attempting to read more than
CONTENT_LENGTH bytes.

WSGI gateways would have to support an additional (keyword?) argument to
wsgi.input.read() that controls whether it is blocking or non-blocking.
It seems pretty simple.
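A sketch of how an application might use that, under the stated
assumptions (read() stopping at the end of the request, and a made-up
'blocking' keyword standing in for whatever argument the spec would
choose):

def application(environ, start_response):
    length = environ.get("CONTENT_LENGTH")
    chunked = (length is None and "chunked" in
               environ.get("HTTP_TRANSFER_ENCODING", ""))
    body = []
    if chunked:
        while True:
            # read whatever is available; '' signals end of request
            data = environ["wsgi.input"].read(8192, blocking=True)
            if not data:
                break
            body.append(data)
    elif length is not None:
        body.append(environ["wsgi.input"].read(int(length)))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["received %d bytes\n" % sum(len(b) for b in body)]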

Notice that all of this can be done even with WSGI 1.0, if these
additional features were broken out into their own PEP(s).

- Brian

Manlio Perillo

Mar 6, 2008, 4:06:29 PM
to Brian Smith, Web SIG
Brian Smith wrote:
>
> [...]
>
> That idea doesn't really benefit Manlio's programs. Manlio's program is
> trying to say "use my thread for some other processing until some
> (external) event happens."

Right.

> We already have standard mechanisms for doing
> something similar in WSGI: multi-threaded and multi-process WSGI
> gateways that let applications block indefinitely while letting other
> applications run.

Ok, but this is not the best solution to the problem!

> A polling interface like Manlio proposes does help for
> applications that are doing I/O via TCP(-like) protocols.

This is true only on Windows.

> But, it
> doesn't provide a way to wait for a database query to finish, or for any
> other kind of IPC to complete, unless everything is rebuilt around that
> polling mechanism.

This is not generally true.

> It isn't a general enough interface to become a part
> of WSGI.

I'm not proposing that it become part of WSGI (since it should not
live there), but part of the wsgiorg "namespace", or an official
asynchronous extensions interface.

> I think it is safe to say that multi-threaded or multi-process
> execution is something that is virtually required for WSGI.
>

But only if the application is synchronous and heavily I/O-bound.

Note that Nginx is multi-process, but it only executes a fixed number
of worker processes, so if an I/O request can block for a significant
amount of time, you cannot afford to let it block.


Moreover, with an asynchronous gateway it is possible to implement a
"middleware" that can execute an application inside a thread.

This is possible by creating a pipe, starting a new thread, having the
main thread poll the pipe, and having the worker thread write some
data to the pipe to "wake" the main thread when it finishes its job.

I'm going to write a sample implementation when I find some time.
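As a rough, gateway-independent illustration of the idea (not Manlio's
implementation; plain select.poll stands in for the server's reactor):

import os
import select
import threading

def run_in_thread(func, *args):
    r, w = os.pipe()
    result = {}

    def worker():
        result['value'] = func(*args)
        os.write(w, 'x')   # one byte on the pipe wakes the poller
        os.close(w)

    threading.Thread(target=worker).start()
    return r, result

def wait_for(r, result):
    # The main thread (or the server's reactor) polls the read end
    # and only blocks in poll(), never in the job itself.
    poller = select.poll()
    poller.register(r, select.POLLIN)
    poller.poll()
    os.read(r, 1)
    os.close(r)
    return result['value']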


Yes, we need to use a thread, but this can be done in pure Python
code alone (although I'm not sure if this can have side effects on
Nginx).


> [...]


>
> Again, I like the simplification that WSGI 2.0 applications are always
> functions or function-like callables, and never iterables.

Where is the simplification?

This is an interesting idea.

Unfortunately right now Nginx does not support trailer headers, and I
don't know if common browsers support them.

> [...]

>> Now, this doesn't deal with request content and an
>> alternative to current wsgi.input so that one could do the
>> non blocking read to get back just what was available, ie.
>> next chunk, but surely we can come up with solutions for that
>> as well. Thus I don't see it as impossible to also handle
>> input chunked content as well. We just need to stop thinking
>> that what has been proposed for WSGI 2.0 so far is the full
>> and complete interface.
>
> We can just say that WSGI-2.0-style applications must support chunked
> request bodies, but gateways are not required to support them.
> WSGi-2.0-style applications would have to check for CONTENT_LENGTH, and
> if that is missing, check to see if environ['HTTP_TRANSFER_ENCODING']
> includes the "chunked" token. wsgi_input.read() would have to stop at
> the end of the request; applications would not restricted from
> attempting to read more than CONTENT_LENGTH bytes.
>
> WSGI gateways would have to support an additional (keyword?) argument to
> wsgi.input.read() that controls whether it is blocking or non-blocking.
> It seems pretty simple.
>

How should an application be written to use this feature?

> Notice that all of this can be done even with WSGI 1.0, if these
> additional features were broken out into their own PEP(s).
>
> - Brian
>


Manlio Perillo

Graham Dumpleton

Mar 6, 2008, 5:46:12 PM
to Brian Smith, Web SIG
On 07/03/2008, Brian Smith <br...@briansmith.org> wrote:
> Graham Dumpleton wrote:
> > Anyway, WSGI 2.0 currently talks about returning a single
> > tuple containing status, headers and iterable. What if it
> > actually optionally allowed the response to itself be an
> > iterable, such that you could do:
> >
> > yield ('102 Processing', [], None)
> > ...
> > yield ('102 Processing', [], None)
> > ...
> > yield ('200 OK', [...], [...])
> >
> > I'll admit that I am not totally across what the HTTP 102
> > status code is meant to be used for and am sort of presuming
> > that this might make sense. Am sure though that Brian who
> > understands this sort of level better than me will set me straight.
>
> That idea doesn't really benefit Manlio's programs. Manlio's program is
> trying to say "use my thread for some other processing until some
> (external) event happens".

Okay, like some of our protocol discussions before, you possibly don't
see what I see in how what I suggested could be used. ;-)

Anyway, I am not the one battling for this, so will not try and
explain it further. Or I'll leave it to my sleep deprived hours in the
middle of the night. :-)

Graham

Graham Dumpleton

unread,
Mar 6, 2008, 6:13:46 PM3/6/08
to Manlio Perillo, Web SIG
On 06/03/2008, Manlio Perillo <manlio_...@libero.it> wrote:
> But I have to say that:
>
> 1) the asynchronous model is the "right" model to use to develop
> robust and scalable applications (especially in Python).

No it isn't. It is one model, it is not necessarily the 'right' model.

The asynchronous model actually has worse drawbacks than the GIL
problem when multithreading is used and you have a multi core or multi
process system. This is because in an asynchronous system with only a
single thread, it is theoretically impossible to use more than one
processor at a time. Even with the Python GIL as a contention point,
threads in C extension modules can at least release the GIL and
perform work in parallel and so theoretically the process can consume
the resources of more than one core or processor at a time.

The whole nature of web applications where requests perform small
amounts of work and then complete actually simplifies the use of
multithreading. This is because unlike complex applications where
there are long running activities occurring in different threads there
is no real need for the threads handling different requests to
communicate with each other. Thus the main problem is merely
protecting concurrent access to shared resources. Even that is not so
bad as each request handler is mostly operating on data specific to
the context of that request rather than shared data.

Thus, whether one uses multithreading or an event driven system, one
can't avoid the use of multiple processes to build a really good
scalable system. This is where nginx limits itself a bit as the number
of worker processes is fixed, whereas with Apache it can create
additional worker processes if demand requires and reap them when no
longer required. You can therefore with Apache factor in some slack to
cope with bursts in demand and it will scale up the number of
processes as necessary. With nginx you have to have a really good idea
in advance of what sort of maximum load you will need to handle as you
need to fix the number of worker processes. For static file serving
the use of an event driven system may make this easier, but factor in
a Python web application where each request has a much greater
overhead and possibility of blocking and it becomes a much trickier
proposition to plan how many worker processes you may need.

No matter what technology one uses there will be such trade offs and
they will vary depending on what you are doing. Thus it is going to be
very rare that one technology is always the "right" technology. Also,
as much as people like to focus on raw performance of the web server
for hosting Python web applications, in general the actual performance
matters very little in the greater scheme of things (unless you're
stupid enough to use CGI). This is because that isn't where the
bottlenecks are generally going to be. Thus, the fact that one hosting
solution may be three times faster than another for a hello world
program means absolutely nothing if that ends up translating to less
than 1
percent throughput when someone loads and runs their mega Python
application. This is especially the case when the volume of traffic
the application receives never goes anywhere near fully utilising the
actual resources available. For large systems, you would never even
depend on one machine anyway and load balance across a cluster. Thus
the focus by many on raw speed in many cases is just plain ridiculous
as there is a lot more to it than that.

Graham

Graham Dumpleton

unread,
Mar 6, 2008, 6:25:08 PM3/6/08
to Manlio Perillo, Web SIG
On 07/03/2008, Manlio Perillo <manlio_...@libero.it> wrote:
> Moreover with an asynchronous gateway it is possible to implement a
> "middleware" that can execute an application inside a thread.
>
> This is possible by creating a pipe, starting a new thread, having the
> main thread poll the pipe, and having the thread write some data into
> the pipe to "wake" the main thread when it finishes its job.
>
> I'm going to write a sample implementation when I find some time.
>
> Yes, we need to use a thread, but this can be done in pure Python code
> only (although I'm not sure if this can have side effects on Nginx).

So you do understand this technique of using a socketpair() pipe as a
way of communicating between code which is thread safe and other code
which is potentially non thread safe. This makes moot your prior point
that they (threads) are not supported by the server and thus you want
to avoid using them.

In other words, as I have pointed out previously, in practice it would
be possible to implement a thread pool mechanism on top of nginx such
that you could avoid this whole problem of the asynchronous model at
the WSGI level.

I still don't understand why you are so resistant to going this path
given that for Python web applications, the event driven model doesn't
necessarily provide any benefits when one looks at the bigger picture
and perhaps just makes it harder to implement application code.

If you want to pursue an event driven model because you find it an
interesting area to work in then fine, but you shouldn't expect
everyone else to try and accommodate that way of thinking when people
are happy with the alternative.

Graham

Brian Smith

unread,
Mar 6, 2008, 7:29:07 PM3/6/08
to Web SIG
Manlio Perillo wrote:
> Brian Smith wrote:
> > We already have standard mechanisms for doing something
> > similar in WSGI: multi-threaded and multi-process WSGI
> > gateways that let applications block indefinitely while
> > letting other applications run.
>
> Ok, but this is not the best solution to the problem!

Why not?

> > I think it is safe to say that multi-threaded or multi-process
> > execution is something that is virtually required for WSGI.
>
> But only if the application is synchronous and heavily I/O bound.

Isn't that almost every WSGI application?

> Note that Nginx is multi-process, but it only executes a
> fixed number of worker processes, so if an I/O request can
> block for a significant amount of time, you cannot afford
> to let it block.

Can't you just increase the number of processes?

> Moreover with an asynchronous gateway it is possible to
> implement a "middleware" that can execute an application
> inside a thread.
>
> This is possible by creating a pipe, starting a new thread,
> having the main thread poll the pipe, and having the
> thread write some data into the pipe to "wake" the main thread
> when it finishes its job.

Right. This is exactly what I was saying. By using
multiprocessing/multithreading, each application can block as much as it
wants.

> > Again, I like the simplification that WSGI 2.0 applications
> > are always functions or function-like callables, and never
> > iterables.
>
> Where is the simplification?

My understanding is that the application callable never returns an
iterator (it never yields, it only returns). This is simpler to explain
to people who are new to WSGI. It also simplifies the language in the
specification. The difference is basically immaterial to WSGI gateway
implementers, but that is because the WSGI specification is biased
towards making gateways simple to implement.

> Unfortunately right now Nginx does not supports trailing
> headers, and I don't know if common browsers support them.

Right, trailers are not really that useful right now. Too many
applications expect to get all header fields first, and most people
don't even know about trailers in the first place.

> > We can just say that WSGI-2.0-style applications must
> > support chunked request bodies, but gateways are not
> > required to support them.
> > WSGI-2.0-style applications would have to check for
> > CONTENT_LENGTH, and if that is missing, check to see if
> > environ['HTTP_TRANSFER_ENCODING'] includes the "chunked"
> > token. wsgi_input.read() would have to stop at the end
> > of the request; applications would not be restricted from
> > attempting to read more than CONTENT_LENGTH bytes.
> >
> > WSGI gateways would have to support an additional
> > (keyword?) argument to wsgi.input.read() that
> > controls whether it is blocking or non-blocking.
> > It seems pretty simple.
>
> How should an application be written to use this feature?

For chunked request bodies: instead of reading until exactly
CONTENT_LENGTH bytes have been read, keep reading until
environ["wsgi.input"].read(chunk_size) returns "".

For "non-blocking reads", given environ["wsgi.input"].read(64000,
min=8000):

1. If more than 64000 bytes are available without blocking, 8192 bytes
are returned.
2. If less than 8000 bytes are available without blocking, then the
gateway blocks until at least 1024 bytes are available.
3. When 8000-63999 bytes are available, then all those bytes are
returned.

The non-blocking behavior is useful when the application can process
arbitrary chunks of input without having all the input available. For
example, if you are transcoding a POSTed video, you probably can
transcode the video with arbitrarily-sized chunks of input. If you
already have 32K of input available, you don't really need to wait
around for 32K more input before you start processing. But, if you have
64K of input ready to process, then you might as well process all of it
at once.
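
A sketch of how such a transcoder might use the proposed min= argument
(the argument itself is only a proposal here, and transcode_chunk is a
hypothetical callback):

    def transcode_upload(environ, transcode_chunk):
        # Ask for up to 64000 bytes, but accept anything over 8000
        # bytes that is already buffered rather than waiting.
        while True:
            data = environ['wsgi.input'].read(64000, min=8000)
            if data == '':
                break              # end of the request body
            transcode_chunk(data)  # process whatever arrived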

My understanding is that nginx completely buffers all input, so that all
reads from wsgi.input are basically non-blocking.

- Brian

Manlio Perillo

unread,
Mar 7, 2008, 4:16:39 AM3/7/08
to Graham Dumpleton, Web SIG
Graham Dumpleton wrote:

> On 06/03/2008, Manlio Perillo <manlio_...@libero.it> wrote:
>> But I have to say that:
>>
>> 1) the asynchronous model is the "right" model to use to develop
>> robust and scalable applications (especially in Python).
>
> No it isn't. It is one model, it is not necessarily the 'right' model.
>

Ok.

> The asynchronous model actually has worse drawbacks than the GIL
> problem when multithreading is used and you have a multi core or multi
> process system. This is because in an asynchronous system with only a
> single thread, it is theoretically impossible to use more than one
> processor at a time.

This is the reason why I'm using Nginx instead of Twisted.

> Even with the Python GIL as a contention point,
> threads in C extension modules can at least release the GIL and
> perform work in parallel and so theoretically the process can consume
> the resources of more than one core or processor at a time.
>
> The whole nature of web applications where requests perform small
> amounts of work and then complete actually simplifies the use of
> multithreading.


Yes, this is true most of the time.

But the reason I have finally added the poll extension in my WSGI
implementation for Nginx is that I have some requests that *do not* take
small amounts of work to be served.

Database queries, as an example, are not a problem if executed
synchronously, since Nginx has multiple worker processes, and the
environment is "controlled" (that is, I can optimize the query/database,
the connection is on the localhost or on a LAN, and so on).

> This is because unlike complex applications where
> there are long running activities occurring in different threads there
> is no real need for the threads handling different requests to
> communicate with each other. Thus the main problem is merely
> protecting concurrent access to shared resources. Even that is not so
> bad as each request handler is mostly operating on data specific to
> the context of that request rather than shared data.
>

Again, this is true.
However the problem is that multithreaded servers usually do not
scale as well as asynchronous ones.
http://blog.emmettshear.com/post/2008/03/03/Dont-use-Pound-for-load-balancing

Of course this is a special case: a server that is mostly I/O bound.

> Thus, whether one uses multithreading or an event driven system, one
> can't avoid the use of multiple processes to build a really good
> scalable system. This is where nginx limits itself a bit as the number
> of worker processes is fixed, whereas with Apache it can create
> additional worker processes if demand requires and reap them when no
> longer required.

Right.
But this is a subject that needs more discussion (and I suspect that we
are going off topic).

It is true that Apache can spawn additional processes, but (again, when
the request is mainly I/O bound) each process does very little work
*while* using a non-trivial amount of system resources.

Nginx instead uses a fixed (and small) number of processes, but each
process is used at 100%.


The Apache model is great when you need to run generic embedded applications.

I think that Nginx is great for serving static content, proxying, and
serving embedded applications that are written with the asynchronous
nature of Nginx in mind.

> You can therefore with Apache factor in some slack to
> cope with bursts in demand and it will scale up the number of
> processes as necessary. With nginx you have to have a really good idea
> in advance of what sort of maximum load you will need to handle as you
> need to fix the number of worker processes.

Right.

> For static file serving
> the use of an event driven system may make this easier,


By the way, I know there is an event based worker in Apache.
Do you have experience with it?

> but factor in
> a Python web application where each request has a much greater
> overhead and possibility of blocking and it becomes a much trickier
> proposition to plan how many worker processes you may need.
>

Right.

> No matter what technology one uses there will be such trade offs and
> they will vary depending on what you are doing. Thus it is going to be
> very rare that one technology is always the "right" technology. Also,
> as much as people like to focus on raw performance of the web server
> for hosting Python web applications, in general the actual performance
> matters very little in the greater scheme of things (unless you're
> stupid enough to use CGI). This is because that isn't where the
> bottlenecks are generally going to be. Thus, the fact that one hosting
> solution may be three times faster than another for a hello world
> program means absolutely nothing if that ends up translating to less
> than 1
> percent throughput when someone loads and runs their mega Python
> application. This is especially the case when the volume of traffic
> the application receives never goes anywhere near fully utilising the
> actual resources available. For large systems, you would never even
> depend on one machine anyway and load balance across a cluster. Thus
> the focus by many on raw speed in many cases is just plain ridiculous
> as there is a lot more to it than that.
>


There is not only the problem of raw speed.
There is also a problem of server resource usage.

As an example, an Italian hosting company imposes strict limits on
resource usage for each client.

They do not use Apache, since they fear that serving embedded
applications limits their control (but, if I'm not wrong, you have
implemented a solution for this problem in mod_wsgi).

Using Nginx + the wsgi module has the benefit of requiring fewer system
resources than flup (as an example) and, probably, Apache.

> Graham
>


Manlio Perillo

Manlio Perillo

unread,
Mar 7, 2008, 4:48:27 AM3/7/08
to Brian Smith, Web SIG


No, this is not true.
First of all, this extension should be easy to implement in any WSGI
implementation (maybe even as a middleware? I have to check).

Lastly, true portability is a complex topic.

The WSGI spec allows the implementation of extensions.
Of course if an application uses an extension it is no longer portable;
it should perhaps check for the presence of the extension and execute
alternative code if it is not available.

This is possible in the example I have posted.

I like to think about WSGI the same way as OpenGL or SQL.

There are well established standards, but an application *should* be
allowed to use specialized extensions.


> It seems that the problem with your examples is not that they won't work
> with WSGI 2.0. Rather, the problem is that the applications block too
> long. The application will still work correctly, but will not be
> efficient when run in nginx's mod_wsgi. However, that isn't a problem
> with the specification or with the application; it is a problem with
> nginx's mod_wsgi.

No.
It's a problem for every server, even for Apache.
Apache, as an example, can spawn additional processes; but there is a limit.

What happens if 500+ concurrent requests run the blocking code?
If you do not set a limit on child processes in Apache, the system will
very probably "die".

If you set a limit, then some requests will have to wait.

Writing an asynchronous client is really the most sensible solution to
this problem.

Of course, again, the application should work with any WSGI implementation.

But the solution *is not* to write "generic" code.
The solution is to write specialized code, and to write a version of the
code for each of the possible server architectures
(multithreaded/multiprocess/CGI, asynchronous).

This is where it is important to standardize an interface for
asynchronous extensions.

If a new asynchronous WSGI implementation is developed in the future, I
would like it to use the same interface, so I will not have to write yet
another specialized version of my code.


> I hate reading about the "Pythonic way" of doing
> things, but writing a WSGI application so that it doesn't block too much
> or too long is simply not Pythonic.


Sorry, but this is absurd ;-).
I need to talk to a web service on the Internet: I have *no* control
over this.

The solution would be to not use a web service at all, but this, again,
is not under my control.


In general, however, I agree.
A web application should be written in the most efficient way;
this is the reason why I try to avoid using object relational mappers,
as an example.


> The WSGI gateway needs to abstract
> away those concerns so that they aren't an issue.

What concerns?

> Otherwise, the gateway
> will only be useful for specialized applications designed to run well on
> that particular gateway. Such specialized applications might as well use
> specialized (gateway-specific) APIs, if they have to be designed
> specifically for a particular gateway anyway.
>
> Further, it is impossible to write a good HTTP proxy with WSGI. The
> control over threading, blocking, I/O, and buffer management is just not
> there in WSGI.

No.
With WSGI this is possible.

> In order to support efficient implementations of such
> things, WSGI would have to become so low-level that it would become
> pointless--it would be exposing an interface that is so low-level that
> it wouldn't even be cross-platform. It wouldn't abstract away anything.
>
> At the same time, the current WSGI 2.0 proposal abstracts too much. It
> is good for applications that are written directly on top of the
> gateway, and for simple middleware. But, it is not appropriate for a
> serious framework to be built on. It is wrong to think that the same
> interface is suitable for frameworks, middleware developers, and
> application developers. I would rather see WSGI 2.0 become a much
> lower-level framework that works at the buffer level (not strings), with
> the ability to do non-blocking reads from wsgi.input, and the ability to
> let the WSGI gateway do buffering in a sane and efficient manner
> (there's no reason for the application to do a bunch of string joins
> when the gateway could just send all the pieces in a single writev()).

I agree.
The WSGI 1.0 spec disallows a WSGI implementation from doing buffering,
but I think that it should allow it.

The WSGI implementation for Nginx already does this when an option is
enabled (disabled by default).

Nginx will use writev.


> Some control over blocking, HTTP chunked encoding, etc. could be
> included as well. The current suggestions for WSGI 2.0 would then just
> be a sample framework layered on top of this low-level interface, for
> developers who don't want to use a big framework like Django or Pylons.

+1.

This is what I would like to have.
A WSGI 1.1 spec, based on WSGI 1.0 with some corrections.

And a simplified interface for people who want to use it.


> But, the big frameworks and middleware would use the low-level interface
> to run efficiently.
>
> - Brian
>

Manlio Perillo

Manlio Perillo

unread,
Mar 7, 2008, 4:54:48 AM3/7/08
to Graham Dumpleton, Web SIG
Graham Dumpleton wrote:
> [...]
>
> In part adding to what Brian is saying, you (Manlio) speak as if WSGI
> 2.0 is already somehow set in stone


Well, Philip J. Eby explicitly said that WSGI 2.0 exists only for
removing the use of start_response...

So I assume that it is already set in stone.

> and because you can't do what you
> want, then it is no good and we should keep the WSGI 1.0 way of doing
> things.
>
> Like Brian is starting to think about what else WSGI 2.0 could be so
> as to allow other ways of doing things, why don't you try the same
> thing and think about how you could do what you want in a similar
> style to WSGI 2.0, but adapting the WSGI 2.0 interface in some way. If
> the changes make sense and don't deviate too far from where we have
> been going, maybe people might accept it.
>

I have tried to figure out how to implement an asynchronous application
with WSGI 2.0, but the results are not good:


def application(environ):
    def app_iter():
        # a hypothetical asynchronous HTTP client
        c = Connection(...)
        r = c.request(...)
        for block in r:
            yield block

        data = r.get_response()

        # the real status and headers are only known here, after the
        # asynchronous request has completed, so they cannot be part
        # of the WSGI 2.0 return tuple
        environ['start_response'](
            '200 OK', [('Content-Type', 'text/plain')])

        yield data

    return '', [], app_iter()

> [...]

Manlio Perillo

Manlio Perillo

unread,
Mar 7, 2008, 5:11:01 AM3/7/08
to Graham Dumpleton, Web SIG
Graham Dumpleton wrote:

> On 07/03/2008, Manlio Perillo <manlio_...@libero.it> wrote:
>> Moreover with an asynchronous gateway it is possible to implement a
>> "middleware" that can execute an application inside a thread.
>>
>> This is possible by creating a pipe, starting a new thread, having the
>> main thread poll the pipe, and having the thread write some data into
>> the pipe to "wake" the main thread when it finishes its job.
>>
>> I'm going to write a sample implementation when I find some time.
>>
>> Yes, we need to use a thread, but this can be done in pure Python code
>> only (although I'm not sure if this can have side effects on Nginx).
>
> So you do understand this technique of using a socketpair() pipe as a
> way of communicating between code which is thread safe and other code
> which is potentially non thread safe.

Right.

> This makes moot your prior point
> that they (threads) are not supported by the server and thus you want
> to avoid using them.
>

Not really true ;-).

Threads are still not supported by Nginx.
This means that using threads in an application embedded in Nginx can
cause who knows what problems (ok, probably it will *not* cause any
problems).

Moreover, I'm not sure that such a *middleware* would be a fully WSGI
1.0 conforming middleware.

> In other words, as I have pointed out previously, in practice it would
> be possible to implement a thread pool mechanism on top of nginx such
> that you could avoid this whole problem of the asynchronous model at
> the WSGI level.
>

No, this does not solve the problem.

The number of threads I can create is limited, so I can serve only a
limited number of concurrent requests.

The asynchronous solution is more efficient.

> I still don't understand why you are so resistant to going this path
> given that for Python web applications, the event driven model doesn't
> necessarily provide any benefits when one looks at the bigger picture
> and perhaps just makes it harder to implement application code.
>

The event driven model *does* provide benefits for my problem.

And I'm not looking at the bigger picture here.
I'm looking at an HTTP resource that needs to execute an HTTP request to
an external web application.

> If you want to pursue an event driven model because you find it an
> interesting area to work in then fine, but you shouldn't expect
> everyone else to try and accommodate that way of thinking when people
> are happy with the alternative.
>

The problem here is that I'm just pointing out that the *current* WSGI
1.0 *supports* asynchronous applications.

Until now nobody else has implemented asynchronous applications on top
of WSGI, and so Philip J. Eby has decided to get rid of the
asynchronous support, for the sake of having a simplified implementation.

I'm only saying: "hey, wait. Actually WSGI 1.0 *can* really be used for
writing asynchronous applications; here is a *working*, not purely
academic, example".

> Graham
>


Manlio Perillo

Graham Dumpleton

unread,
Mar 7, 2008, 5:15:59 AM3/7/08
to Manlio Perillo, Web SIG
On 07/03/2008, Manlio Perillo <manlio_...@libero.it> wrote:
> It is true that Apache can spawn additional processes,

Yes, for prefork and worker MPM, but not winnt on Windows. See for
example details for worker MPM in:

http://httpd.apache.org/docs/2.2/mod/worker.html

> By the way, I know there is an event based worker in Apache.
> Do you have experience with it?

No, haven't used it. It isn't an event driven system as you know it.
It still uses threads like worker MPM. The difference as I understand
it is that it dedicates a single thread to managing client socket
connections maintained due to keep alive, rather than a whole thread
being tied up for each such connection. So, it is just an improvement
over worker and does not implement a full event driven system.

> > No matter what technology one uses there will be such trade offs and
> > they will vary depending on what you are doing. Thus it is going to be
> > very rare that one technology is always the "right" technology. Also,
> > as much as people like to focus on raw performance of the web server
> > for hosting Python web applications, in general the actual performance
> > matters very little in the greater scheme of things (unless you're
> > stupid enough to use CGI). This is because that isn't where the
> > bottlenecks are generally going to be. Thus, the fact that one hosting
> > solution may be three times faster than another for a hello world
> > program means absolutely nothing if that ends up translating to less
> > than 1
> > percent throughput when someone loads and runs their mega Python
> > application. This is especially the case when the volume of traffic
> > the application receives never goes anywhere near fully utilising the
> > actual resources available. For large systems, you would never even
> > depend on one machine anyway and load balance across a cluster. Thus
> > the focus by many on raw speed in many cases is just plain ridiculous
> > as there is a lot more to it than that.
>
> There is not only the problem on raw speed.
> There is also a problem of server resources usage.
>
> As an example, an Italian hosting company imposes strict limits on
> resource usage for each client.

As would any sane web hosting company.

> They do not use Apache, since they fear that serving embedded
> applications limits their control

If they believe that embedded solutions like mod_python are the only
things available for Apache, then I can understand that. There are
other solutions though such as fastcgi and mod_wsgi daemon mode, so it
isn't necessarily as unmanageable as they may believe. They perhaps
just don't know what options are available, don't understand the
technology well or how to manage it. I do admit though it would be
harder when it isn't your own application and you are hosting stuff
written by a third party.

> Using Nginx + the wsgi module has the benefit of requiring fewer system
> resources than flup (as an example) and, probably, Apache.

Memory usage is also relative, just like network performance.
Configure Apache correctly and don't load modules you don't need and
the base overhead of Apache can be reduced quite a lot. For a big
system heavy on media using a separate media server such as nginx or
lighttpd can be sensible. One can then turn off keep alive on Apache
for the dynamic Python web application since keep alive doesn't
necessarily help there and will cause the sorts of issues the event
MPM attempts to solve. So, it's manageable and there are known steps
one can take.

The real memory usage comes when someone loads up a Python web
application which requires 80-100MB per process at the outset before
much has even happened. Just because you are using another web hosting
solution, be it nginx or even a Python based web server, this will not
change that the Python web application is chewing up so much memory.

The one area where memory usage can be a problem with Python web
applications and which is not necessarily understood well by most
people, is the risk of concurrent requests causing a sudden burst in
memory usage. Imagine a specific URL which needs a large amount of
transient memory, for example something which is generating PDFs using
reportlab and PIL. All is okay if the URL only gets hit by one request
at a time, but if multiple requests hit at the same time, then your
memory blows out considerably as each request needs the large amount
of transient memory at the same time and once allocated it will be
retained by the process.

So, if one was using worker MPM to keep down the number of overall
processes and memory usage, you run the risk of this sort of problem
occurring. One could stop it occurring by implementing throttling in
the application, that is, putting locking on specific URLs which consume
lots of transient memory to restrict the number of concurrent requests,
but frankly I have never heard of anyone actually doing it.
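
A hypothetical sketch of such throttling as a WSGI 1.0 middleware (the
class name, path prefix, and limit are made up; it also assumes the
application does its memory-heavy work inside the call itself rather
than in a lazily consumed response iterable):

    import threading

    class ThrottleMiddleware(object):
        # Limit the number of concurrent requests to memory-hungry
        # URLs, e.g. PDF-generating ones, with a bounded semaphore.
        def __init__(self, app, path_prefix='/pdf', max_concurrent=2):
            self.app = app
            self.path_prefix = path_prefix
            self.semaphore = threading.BoundedSemaphore(max_concurrent)

        def __call__(self, environ, start_response):
            if not environ.get('PATH_INFO', '').startswith(self.path_prefix):
                return self.app(environ, start_response)
            self.semaphore.acquire()
            try:
                return self.app(environ, start_response)
            finally:
                self.semaphore.release()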

The alternative is to use prefork MPM, or similar model, such that
there can only be one active request in the process at a time. But
then you need more processes to handle the same number of requests, so
overall memory usage is high again. For large sites however, which can
afford lots of memory, using prefork would be the better way to go as
it will at least limit the possibilities of individual processes
spiking memory usage unexpectedly, with memory usage being more
predictable.

That all said, not using threads and handling concurrency with an
event driven approach will not necessarily isolate you from this
specific problem.
problem. If your web application demands are relatively simple then it
may never be an issue, but people are trying to do more and more
within the web application itself, rather than delegating it to
separate back end systems or programs. At the same time they want to
use cheap memory constrained VPS systems. So, lots of fun. :-)

Graham

Manlio Perillo

unread,
Mar 7, 2008, 5:30:29 AM3/7/08
to Brian Smith, Web SIG
Brian Smith wrote:

> Manlio Perillo wrote:
>> Brian Smith wrote:
>>> We already have standard mechanisms for doing something
>>> similar in WSGI: multi-threaded and multi-process WSGI
>>> gateways that let applications block indefinitely while
>>> letting other applications run.
>> Ok, but this is not the best solution to the problem!
>
> Why not?
>
>>> I think it is safe to say that multi-threaded or multi-process
>>> execution is something that is virtually required for WSGI.
>> But only if the application is synchronous and heavily I/O bound.
>
> Isn't that almost every WSGI application?
>

I'm not sure that a generic application that uses a database can be
considered *heavily* I/O bound.

Compare, as an example, a query to a database that can take up to 0.2
seconds with an HTTP request to a web service that can take up to 2 seconds.


>> Note that Nginx is multi-process, but it only executes a
>> fixed number of worker processes, so if an I/O request can
>> block for a significant amount of time, you cannot afford
>> to let it block.
>
> Can't you just increase the number of processes?
>

Yes, but you should agree with me that the asynchronous solution is
more efficient.

Moreover my application needs to run on shared hosting, where there is
a limit on the number of processes a user can execute.

I can not run too many worker processes.


>> Moreover with an asynchronous gateway it is possible to
>> implement a "middleware" that can execute an application
>> inside a thread.
>>
>> This is possible by creating a pipe, starting a new thread,
>> having the main thread poll the pipe, and having the
>> thread write some data into the pipe to "wake" the main thread
>> when it finishes its job.
>
> Right. This is exactly what I was saying. By using
> multiprocessing/multithreading, each application can block as much as it
> wants.
>

Ok, but the middleware *needs* the poll extension :).

So the best solution, IMHO, is to implement the WSGI 1.0 spec for Nginx,
and then implement a pure Python middleware/adapter that will execute a
WSGI 2.0 application in a thread.

However if some corrections are going to be implemented in WSGI 2.0, I
would like to have them "backported" to WSGI 1.1, as an example.

>>> Again, I like the simplification that WSGI 2.0 applications
>>> are always functions or function-like callables, and never
>>> iterables.
>> Where is the simplification?
>
> My understanding is that the application callable never returns an
> iterator (it never yields, it only returns). This is simpler to explain
> to people who are new to WSGI.

This is indeed true.
I too found some problems when I first read the WSGI specification.

*However* now it seems to me the most natural API.
It only needs some practice.

> It also simplifies the language in the
> specification. The difference is basically immaterial to WSGI gateway
> implementers, but that is because the WSGI specification is biased
> towards making gateways simple to implement.
>

No, it also makes it simpler to implement.

> [...]

Ok.

> [...]
>

> My understanding is that nginx completely buffers all input, so that all
> reads from wsgi.input are basically non-blocking.
>

Right.
This makes my life easier, since I can just use a cStringIO or File
object :).

However, in the future the Nginx author is planning to add support for
input filters and chunked request bodies.

At that time, I will implement an extension that will allow non-blocking
(asynchronous) reads from wsgi.input.


> - Brian
>


Manlio Perillo

br...@tmbx.com

unread,
Mar 7, 2008, 8:56:16 AM3/7/08
to web...@python.org
>Manlio Perillo wrote:
>> Brian Smith wrote:
>>
>> For "non-blocking reads", given environ["wsgi.input"].read(64000,
>> min=8000):
>>
>> 1. If more than 64000 bytes are available without blocking, [64000]
>> bytes are returned.
>> 2. If less than 8000 bytes are available without blocking, then the
>> gateway blocks until at least [8000] bytes are available.
>> 3. When 8000-63999 bytes are available, then all those bytes are
>> returned.

I made some typos when I was editing the above description. I have
corrected them by replacing the typos with the corrected text in
[brackets]. I hope it makes more sense now.

Additionally:

4. If the "min" parameter is absent, or less than zero, then it defaults
to being equal to the first argument (i.e. the current always-blocking
behavior).

5. There is no way to distinguish "no input available yet" from EOF when
min=0, using just read(). Instead, some other mechanism must be used to
detect EOF if true non-blocking reads (min=0) are used. The vast majority
of the time, keeping a count of the bytes read and comparing to
CONTENT_LENGTH will be enough.
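
A sketch of that byte-counting loop (again assuming the proposed min=
argument; handle_chunk is a hypothetical callback):

    def read_all_nonblocking(environ, handle_chunk):
        # With min=0, "" can mean "nothing buffered yet" rather than
        # EOF, so count bytes against CONTENT_LENGTH to detect the end.
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
        while remaining > 0:
            data = environ['wsgi.input'].read(65536, min=0)
            if data == '':
                # Nothing available yet; a real application would
                # return control to the event loop here, not spin.
                continue
            remaining -= len(data)
            handle_chunk(data)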

Note also that calling this "non-blocking" is really not precise, because
sometimes it does block due to rule #2 above.

- Brian
