[Python-ideas] asyncore: included batteries don't fit


chrysn

Sep 22, 2012, 12:31:06 PM
to python...@python.org
hello python-ideas,

i'd like to start discussion about the state of asyncore/asynchat's
adoption in the python standard library, with the intention of finding a
roadmap for how to improve things, and of kicking off and coordinating
implementations.

here's the problem (as previously described in [issue15978] and
redirected here, with some additions):

the asyncore module would be much more useful if it were well integrated
in the standard library. in particular, it should be supported by:

* subprocess

* BaseHTTPServer / http.server (and thus, socketserver)

* urllib2 / urllib, http.client

* probably many other network libraries except smtpd, which already uses
asyncore

* third party libraries (if stdlib leads the way, the ecosystem will
follow; eg pyserial)

without widespread asyncore support, it is not possible to easily
integrate different servers and services with each other; with asyncore
support, it's just a matter of creating the objects and entering the
main loop. (eg, a http server for controlling a serial device, with a
telnet-like debugging interface).

some examples of the changes required:

* the socketserver documents that it would like to have such a
framework ("Future work: [...] Standard framework for select-based
multiplexing"). due to the nature of socketserver based
implementations (blocking reads), we can't just "add glue so it
works", but there could be extensions so that implementations can be
ported to asynchronous socketservers. i've done it for a particular
case (ported SimpleHTTPServer, but it's a mess of monkey-patching and
intermediate StringIOs).

* for subprocess, there's a bunch of recipes at [1].

* pyserial (not standard library, but might as well become part of it)
can be ported quite easily [2].


this touches several modules whose implementations can be handled
independently from each other; i'd implement some of them myself.
terry.reedy redirected me from the issue tracker to this list, hoping
for controversy and alternatives. if you'd like to discuss, throw in
questions, and we'll find a solution. if you think talk is cheap, i
can try to work out first sketches.


python already has batteries for nonblocking operation included, and i
say it's doing it right -- let's just make sure the batteries fit in the
other gadgets!


yours truly
chrysn


[1] http://code.activestate.com/recipes/576957-asynchronous-subprocess-using-asyncore/
[2] http://sourceforge.net/tracker/?func=detail&aid=3559321&group_id=46487&atid=446305
[issue15978] http://bugs.python.org/issue15978


--
Es ist nicht deine Schuld, dass die Welt ist, wie sie ist -- es wär' nur
deine Schuld, wenn sie so bleibt.
(You are not to blame for the state of the world, but you would be if
that state persisted.)
-- Die Ärzte

Oleg Broytman

Sep 22, 2012, 12:52:53 PM
to python...@python.org
Hi!

On Sat, Sep 22, 2012 at 06:31:06PM +0200, chrysn <chr...@fsfe.org> wrote:
> the asyncore module would be much more useful if it were well integrated
> in the standard library. in particular, it should be supported by:
>
> * BaseHTTPServer / http.server (and thus, socketserver)
>
> * urllib2 / urllib, http.client
>
> * probably many other network libraries except smtpd, which already uses
> asyncore

It seems you want Twisted, no?

Oleg.
--
Oleg Broytman http://phdru.name/ p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

chrysn

Sep 22, 2012, 2:27:10 PM
to python...@python.org
On Sat, Sep 22, 2012 at 08:52:53PM +0400, Oleg Broytman wrote:
> On Sat, Sep 22, 2012 at 06:31:06PM +0200, chrysn <chr...@fsfe.org> wrote:
> > the asyncore module would be much more useful if it were well integrated
> > in the standard library. in particular, it should be supported by:
> >
> > [...]
>
> It seems you want Twisted, no?

if these considerations end in twisted being consecrated as the new
asyncore, i'd consider that a valid solution too. then, again,
subprocess and the onboard servers should work well with *that* out of
the box.

best regards
chrysn

--
To use raw power is to make yourself infinitely vulnerable to greater powers.
-- Bene Gesserit axiom

Antoine Pitrou

Sep 22, 2012, 2:46:06 PM
to python...@python.org
On Sat, 22 Sep 2012 18:31:06 +0200
chrysn <chr...@fsfe.org> wrote:
> hello python-ideas,
>
> i'd like to start discussion about the state of asyncore/asynchat's
> adoption in the python standard library, with the intention of finding a
> roadmap for how to improve things, and of kicking off and coordinating
> implementations.
>
> here's the problem (as previously described in [issue15978] and
> redirected here, with some additions):
>
> the asyncore module would be much more useful if it were well integrated
> in the standard library. in particular, it should be supported by:

SSL support is also lacking:
http://bugs.python.org/issue10084

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net

Oleg Broytman

Sep 22, 2012, 2:52:10 PM
to python...@python.org
On Sat, Sep 22, 2012 at 08:27:10PM +0200, chrysn <chr...@fsfe.org> wrote:
> On Sat, Sep 22, 2012 at 08:52:53PM +0400, Oleg Broytman wrote:
> > On Sat, Sep 22, 2012 at 06:31:06PM +0200, chrysn <chr...@fsfe.org> wrote:
> > > the asyncore module would be much more useful if it were well integrated
> > > in the standard library. in particular, it should be supported by:
> > >
> > > [...]
> >
> > It seems you want Twisted, no?
>
> if these considerations end in twisted being consecrated as the new
> asyncore, i'd consider that a valid solution too.

If you mean that Twisted will be included in the standard library --
then no, I'm sure it will not. Python comes with batteries included, but
Twisted is not a battery, it's rather a power plant. I am sure it will
always be developed and distributed separately.
And developing asyncore to the level of Twisted would be a
duplication of effort.

> then, again,
> subprocess and the onboard servers should work well with *that* out of
> the box.

If you want subprocess and Twisted to work together -- you know where
to send patches.

PS. In my not so humble opinion what the standard library really lacks
in this area is a way to combine a few asynchronous libraries with
different mainloops. Think about wxPython+Twisted in one program. But I
haven't the slightest idea how to approach the problem.

Amaury Forgeot d'Arc

Sep 22, 2012, 3:50:39 PM
to python...@python.org
2012/9/22 Oleg Broytman <p...@phdru.name>:
> PS. In my not so humble opinion what the standard library really lacks
> in this area is a way to combine a few asynchronous libraries with
> different mainloops. Think about wxPython+Twisted in one program. But I
> haven't the slightest idea how to approach the problem.

Twisted proposes a wxreactor, of course.

--
Amaury Forgeot d'Arc

Oleg Broytman

Sep 22, 2012, 4:05:42 PM
to python...@python.org
On Sat, Sep 22, 2012 at 09:50:39PM +0200, Amaury Forgeot d'Arc <amau...@gmail.com> wrote:
> 2012/9/22 Oleg Broytman <p...@phdru.name>:
> > PS. In my not so humble opinion what the standard library really lacks
> > in this area is a way to combine a few asynchronous libraries with
> > different mainloops. Think about wxPython+Twisted in one program. But I
> > haven't the slightest idea how to approach the problem.
>
> Twisted proposes a wxreactor, of course.

And wxPython has a means to extend its main loop. But these are
only partial solutions. There are many more libraries with mainloops,
e.g. D-Bus/GLib.

Oleg.
--
Oleg Broytman http://phdru.name/ p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.

chrysn

Sep 22, 2012, 4:16:53 PM
to Oleg Broytman
On Sat, Sep 22, 2012 at 10:52:10PM +0400, Oleg Broytman wrote:
> If you mean that Twisted will be included in the standard library --
> then no, I'm sure it will not. Python comes with batteries included, but
> Twisted is not a battery, it's rather a power plant. I am sure it will
> always be developed and distributed separately.
> And developing asyncore to the level of Twisted would be a
> duplication of effort.

> PS. In my not so humble opinion what the standard library really lacks
> in this area is a way to combine a few asynchronous libraries with
> different mainloops. Think about wxPython+Twisted in one program. But I
> haven't the slightest idea how to approach the problem.

well, what about python including a battery and a battery plug, then?
asyncore could be the battery, and an interface between asynchronous
libraries the battery plug. users could start developing with batteries,
and when the project grows, just plug it into a power plant.

less analogy, more technical: the asyncore dispatcher to main loop
interface is pretty thin -- there is a (global or explicitly passed)
"map" (a dictionary), mapping file descriptors to objects that can be
readable or writable (or acceptable, not sure if that detail is really
needed that far down). a dispatcher registers to a map, and then the
main loop select()s for events on all files and dispatches them
accordingly.
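
(A minimal sketch of the map-and-dispatch pattern described above, using only the select and socket modules rather than asyncore itself. The names Collector and loop_once are made up for illustration; the readable/writable/handle_read method names merely mirror the asyncore convention.)

```python
import select
import socket


class Collector:
    """Hypothetical dispatcher: stores whatever arrives on its socket."""

    def __init__(self, sock, fd_map):
        self.sock = sock
        self.received = []
        # "Registering" just means adding ourselves to the map under our fileno.
        fd_map[sock.fileno()] = self

    def readable(self):
        return True

    def writable(self):
        return False

    def handle_read(self):
        self.received.append(self.sock.recv(4096))


def loop_once(fd_map, timeout=1.0):
    """One pass of the main loop: select() on the map, then dispatch."""
    readers = [fd for fd, obj in fd_map.items() if obj.readable()]
    writers = [fd for fd, obj in fd_map.items() if obj.writable()]
    ready_r, ready_w, _ = select.select(readers, writers, [], timeout)
    for fd in ready_r:
        fd_map[fd].handle_read()
    for fd in ready_w:
        fd_map[fd].handle_write()


fd_map = {}                      # file descriptor -> dispatcher object
left, right = socket.socketpair()
collector = Collector(right, fd_map)
left.sendall(b"ping")
loop_once(fd_map)
left.close()
right.close()
```

The point of the sketch is how little the loop needs to know: a dictionary and a handful of method names are the whole contract between dispatcher and main loop.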

it won't be as easy as just taking that interface, eg because it lacks
timeouts, but i think it can be the "way to combine a few asynchronous
libraries". (to avoid asyncore becoming a powerplant itself, it could
choose not to implement some features for simplicity. for example, if
asyncore chose to still not implement timeouts, registering timeouts to
an asyncore-based main loop would just result in a NotImplementedError
telling the user to get a more powerful main loop.)
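
(A rough sketch of that idea, with hypothetical names throughout: MinimalLoop, TimerLoop, register_fd, call_later and start_keepalive are illustrations, not an existing API. Library code is written against one interface; the small loop simply declines the feature it does not implement.)

```python
class MinimalLoop:
    """Hypothetical 'battery' main loop: file descriptors only, no timers."""

    def __init__(self):
        self.handlers = {}

    def register_fd(self, fd, handler):
        self.handlers[fd] = handler

    def call_later(self, delay, callback):
        # Deliberately unsupported, to keep the small loop small.
        raise NotImplementedError(
            "this main loop has no timeout support; use a more powerful one"
        )


class TimerLoop(MinimalLoop):
    """Hypothetical 'power plant' loop that also accepts timed callbacks."""

    def __init__(self):
        super().__init__()
        self.timers = []

    def call_later(self, delay, callback):
        self.timers.append((delay, callback))


def start_keepalive(loop):
    """Library code that only knows the shared interface."""
    loop.call_later(30.0, lambda: None)
```

Calling start_keepalive(MinimalLoop()) fails loudly at registration time, while the same call with TimerLoop() succeeds, so the user learns early that the project has outgrown the small loop.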

i don't want to claim i know how that could work in detail or even if it
could work at all, but if this is interesting for enough people that it
will be used, i'd like to find out.


> > then, again, subprocess and the onboard servers should work well
> > with *that* out of the box.
>
> If you want subprocess and Twisted to work together -- you know where
> to send patches.

no, actually -- for now, it'd be a patch to twisted (who'd reply with
"we already have a way of dealing with it"). if asyncore's interface
becomes the battery plug, it'd be a patch to subprocess.


thanks for sharing your ideas

Terry Reedy

Sep 22, 2012, 7:04:44 PM
to python...@python.org
On 9/22/2012 2:46 PM, Antoine Pitrou wrote:
> On Sat, 22 Sep 2012 18:31:06 +0200
> chrysn <chr...@fsfe.org> wrote:
>> hello python-ideas,
>>
>> i'd like to start discussion about the state of asyncore/asynchat's
>> adoption in the python standard library, with the intention of finding a
>> roadmap for how to improve things, and of kicking off and coordinating
>> implementations.
>>
>> here's the problem (as previously described in [issue15978] and
>> redirected here, with some additions):
>>
>> the asyncore module would be much more useful if it were well integrated
>> in the standard library. in particular, it should be supported by:
>
> SSL support is also lacking:
> http://bugs.python.org/issue10084

chrysn: The issue needs a patch that incorporates Antoine's review. I am
sure there are other asyncore issues that could use help too.

--
Terry Jan Reedy

Giampaolo Rodolà

Sep 24, 2012, 6:31:37 PM
to chrysn, python...@python.org

I still think this proposal is too vaguely defined, and that any effort towards adding async IO support to existing batteries is premature for several reasons, the first of which is the inadequacy of asyncore as the base async framework for the task you're proposing.

asyncore is so old and difficult to fix/enhance without breaking backward compatibility (see for example http://bugs.python.org/issue11273#msg156439) that relying on it for any modern work is inevitably a bad idea.

From a chronological standpoint I still think the best thing to do in order to fix the "python async problem" once and for all is to first define and possibly implement an "async WSGI interface" describing what a standard async IO loop/reactor should look like (in terms of API) and how to integrate with it, see:
http://mail.python.org/pipermail/python-ideas/2012-May/015223.html
http://mail.python.org/pipermail/python-ideas/2012-May/015235.html

From there the python stdlib *might* grow a new module implementing the "async WSGI interface" (let's call it asyncore2) and some of the stdlib batteries such as socketserver can possibly use it.

In my mind this is the ideal long-term scenario but even managing to define an "async WSGI interface" alone would be a big step forward.

Again, at this point in time what you're proposing looks too vague, ambitious and premature to me.


--- Giampaolo
 



Josiah Carlson

Sep 24, 2012, 8:02:08 PM
to Giampaolo Rodolà, python...@python.org
Temporarily un-lurking to reply to this thread (which I'll actually be reading).

Giampaolo and I talked about this for a bit over the weekend, and I
have to say that I agree with his perspective.

In particular, to get something better than asyncore, there must be
something minimally better to build upon. I don't really have an
opinion on what that minimally better thing should be named, but I do
agree that having a simple reactor API that has predictable behavior
over the variety of handlers (select, poll, epoll, kqueue, WSAEvent in
Windows, etc.) is necessary.

Now, let's get to brass tacks...

1. Whatever reactors are available, you need to be able to instantiate
multiple of different types of reactors and multiple instances of the
same type of reactor simultaneously (to support multiple threads
handling different groups of reactors, or different reactors for
different types of objects on certain platforms). While this allows
for insanity in the worst-case, we're all consenting adults here, so
shouldn't be limited by reactor singletons. There should be a default
reactor class, which is defined on module/package import (use the
"best" one for the platform).

2. The API must be simple. I am not sure that it can get easier than
Idea #3 from:
http://mail.python.org/pipermail/python-ideas/2012-May/015245.html
I personally like it because it offers a simple upgrade path for
asyncore users (create your asyncore-derived classes, pass it into the
new reactor), while simultaneously defining a relatively easy API for
any 3rd party to integrate with. By offering an easy-to-integrate
method for 3rd parties (that is also sane), there is the added bonus
that 3rd parties are more likely to integrate, rather than replace,
which means more use in the "real world", better bug reports, etc. To
simplify integration further, make the API register(fd, handler,
events=singleton). Passing no events from the caller means "register
me for all events", which will help 3rd parties that aren't great with
handling read/write registration.

3. I don't have a 3rd tack, you can hang things on the wall with 2 ;)
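
To make the two points above concrete, here is a minimal sketch, not Giampaolo's implementation. It leans on the selectors module, which was added to the standard library in Python 3.4, after this thread. The Reactor class, its method names, and the handler signature handler(fileobj, mask) are assumptions made for illustration, and a bitmask default stands in for the "events=singleton" sentinel.

```python
import selectors
import socket

READ = selectors.EVENT_READ
WRITE = selectors.EVENT_WRITE
ALL_EVENTS = READ | WRITE


class Reactor:
    """Hypothetical reactor wrapping one selector instance (no singleton)."""

    def __init__(self, selector_class=selectors.DefaultSelector):
        # DefaultSelector is the "best" mechanism for the platform.
        self._selector = selector_class()

    def register(self, fd, handler, events=ALL_EVENTS):
        # Passing no events means "register me for all events".
        self._selector.register(fd, events, handler)

    def unregister(self, fd):
        self._selector.unregister(fd)

    def poll(self, timeout=None):
        for key, mask in self._selector.select(timeout):
            key.data(key.fileobj, mask)

    def close(self):
        self._selector.close()


seen = []
a, b = socket.socketpair()

# Two reactors of different types living side by side.
fast = Reactor()
portable = Reactor(selectors.SelectSelector)

fast.register(a, lambda sock, mask: seen.append(("fast", mask)))
portable.register(b, lambda sock, mask: seen.append(("portable", mask)),
                  events=READ)

a.sendall(b"x")
fast.poll(timeout=1.0)       # a is writable, nothing to read on it
portable.poll(timeout=1.0)   # b has one byte waiting

fast.close()
portable.close()
a.close()
b.close()
```

Each reactor could just as well be driven from its own thread; nothing in the sketch relies on module-level state.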

Regards,
- Josiah

chrysn

Sep 26, 2012, 4:17:18 AM
to Josiah Carlson, Giampaolo Rodolà, python...@python.org
On Mon, Sep 24, 2012 at 03:31:37PM -0700, Giampaolo Rodolà wrote:
> From a chronological standpoint I still think the best thing to do in order
> to fix the "python async problem" once and for all is to first define and
> possibly implement an "async WSGI interface" describing what a standard
> async IO loop/reactor should look like (in terms of API) and how to
> integrate with it, see:
> http://mail.python.org/pipermail/python-ideas/2012-May/015223.html
> http://mail.python.org/pipermail/python-ideas/2012-May/015235.html

i wasn't aware that pep 3153 exists. given that, my original intention
of this thread should be re-worded into "let's get pep3153 along!".

i'm not convinced by the api suggested in the first mail, as it sounds
very unix centric (poll, read/write/error). i rather imagined leaving
the details of the callbackable/mainloop interaction to be platform
details. (a win32evtlog event source just couldn't possibly register
with a select() based main loop). i'd prefer to keep the part that
registers with a main loop concentrated to a very lowlevel common
denominator. for unix, that'd mean that there is a basic callbackable
for "things that receive events because they have a fileno". everything
above that, eg the distinction whether a "w" event means that we can
write() or that we must accept() could happen above that and wouldn't
have to be concerned with the main loop integration any more.

in case (pseudo)code gets the idea over better:

class UnixFilehandle(object):
    def __init__(self, fileno):
        self._fileno = fileno

    def register_with_main_loop(self, mainloop):
        # it might happen that the main loop doesn't support unix
        # filenos. tough luck, in that case -- the developer should
        # select a more suitable main loop.
        mainloop.register_unix_fileno(self._fileno, self)

    def handle_r_event(self):
        raise NotImplementedError("Not configured to receive that sort of event")

    # if you're sure you'd never receive any anyway, you can
    # not-register them by setting them None in the subclass
    handle_w_event = handle_e_event = handle_r_event

class SocketServer(UnixFilehandle):
    def __init__(self, socket):
        self._socket = socket
        UnixFilehandle.__init__(self, socket.fileno())

    def handle_w_event(self):
        # handle_accept_event is left to subclasses
        self.handle_accept_event(self._socket.accept())
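
(For illustration, the main-loop side of that pseudocode could look roughly like the sketch below. SelectMainLoop, run_once and LineReader are hypothetical names; the loop honours the "set the handler to None to not-register" convention, and UnixFilehandle is repeated in shortened form so the example runs on its own.)

```python
import select
import socket


class UnixFilehandle:
    def __init__(self, fileno):
        self._fileno = fileno

    def register_with_main_loop(self, mainloop):
        mainloop.register_unix_fileno(self._fileno, self)

    def handle_r_event(self):
        raise NotImplementedError("Not configured to receive that sort of event")

    handle_w_event = handle_e_event = handle_r_event


class SelectMainLoop:
    """Hypothetical select()-based loop accepting UnixFilehandle objects."""

    def __init__(self):
        self._handles = {}

    def register_unix_fileno(self, fileno, handle):
        self._handles[fileno] = handle

    def _interested(self, name):
        # A handler set to None means "do not register me for this event".
        return [fd for fd, handle in self._handles.items()
                if getattr(handle, name) is not None]

    def run_once(self, timeout=1.0):
        r, w, e = select.select(self._interested("handle_r_event"),
                                self._interested("handle_w_event"),
                                self._interested("handle_e_event"),
                                timeout)
        for fd in r:
            self._handles[fd].handle_r_event()
        for fd in w:
            self._handles[fd].handle_w_event()
        for fd in e:
            self._handles[fd].handle_e_event()


class LineReader(UnixFilehandle):
    # Only read events are wanted; opt out of the other two.
    handle_w_event = None
    handle_e_event = None

    def __init__(self, sock):
        super().__init__(sock.fileno())
        self._sock = sock
        self.data = b""

    def handle_r_event(self):
        self.data += self._sock.recv(4096)


sender, receiver = socket.socketpair()
loop = SelectMainLoop()
reader = LineReader(receiver)
reader.register_with_main_loop(loop)
sender.sendall(b"hello\n")
loop.run_once()
sender.close()
receiver.close()
```

Everything above the r/w/e level (accepting, protocol parsing) stays in the subclass, which is the separation the pseudocode is after.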

other interfaces parallel to the file handle interface would, for
example, handle unix signals. (built atop of that, like the
accept-handling socket server, could be an interface that deals with child
processes.) the interface for android might look different again,
because there is no main loop and select never gets called by the
application.

> From there the python stdlib *might* grow a new module implementing the
> "async WSGI interface" (let's call it asyncore2) and some of the stdlib
> batteries such as socketserver can possibly use it.
>
> In my mind this is the ideal long-term scenario but even managing to define
> an "async WSGI interface" alone would be a big step forward.

i'd welcome such an interface. if asyncore can then be retrofitted to
accept that interface too w/o breaking compatibility, it'd be nice, but
if not, it's asyncore2, then.

> Again, at this point in time what you're proposing looks too vague,
> ambitious and premature to me.

please don't get me wrong -- i'm not proposing anything for immediate
action, i just want to start a thinking process towards a better
integrated stdlib.


On Mon, Sep 24, 2012 at 05:02:08PM -0700, Josiah Carlson wrote:
> 1. Whatever reactors are available, you need to be able to instantiate
> multiple of different types of reactors and multiple instances of the
> same type of reactor simultaneously (to support multiple threads
> handling different groups of reactors, or different reactors for
> different types of objects on certain platforms). While this allows
> for insanity in the worst-case, we're all consenting adults here, so
> shouldn't be limited by reactor singletons. There should be a default
> reactor class, which is defined on module/package import (use the
> "best" one for the platform).

i think that's already common. with asyncore, you can have different
maps (just one is installed globally as default). with the gtk main
loop, it's a little tricky (the gtk.main() function doesn't simply take
an argument), but the underlying glib can do that afaict.

> 2. The API must be simple. I am not sure that it can get easier than
> Idea #3 from:
> http://mail.python.org/pipermail/python-ideas/2012-May/015245.html

it's good that the necessities of call_later and call_every are
mentioned here, i'd have forgotten about them.


we've talked about many things we'd need in a python asynchronous
interface (not implementation), so what are the things we *don't* need?
(so we won't start building a framework like twisted). i'll start:

* high-level protocol handling (can be extra modules atop of it)
* ssl
* something like the twisted delayed framework (not sure about that, i
guess the twisted people will have good reason to use it, but i don't
see compelling reasons for such a thing in a minimal interface from my
limited pov)
* explicit connection handling (retries, timeouts -- would be up to the
user as well, eg urllib might want to set up a timeout and retries for
asynchronous url requests)

best regards

Josiah Carlson

Sep 26, 2012, 1:02:24 PM
to chrysn, python...@python.org
On Wed, Sep 26, 2012 at 1:17 AM, chrysn <chr...@fsfe.org> wrote:
> On Mon, Sep 24, 2012 at 03:31:37PM -0700, Giampaolo Rodolà wrote:
>> From a chronological standpoint I still think the best thing to do in order
>> to fix the "python async problem" once and for all is to first define and
>> possibly implement an "async WSGI interface" describing what a standard
>> async IO loop/reactor should look like (in terms of API) and how to
>> integrate with it, see:
>> http://mail.python.org/pipermail/python-ideas/2012-May/015223.html
>> http://mail.python.org/pipermail/python-ideas/2012-May/015235.html
>
> i wasn't aware that pep 3153 exists. given that, my original intention
> of this thread should be re-worded into "let's get pep3153 along!".

Go ahead and read PEP 3153, we will wait.

A careful reading of PEP 3153 will tell you that the intent is to make
a "light" version of Twisted built into Python. There isn't any
discussion as to *why* this is a good idea, it just lays out the plan
of action. Its ideas were gathered from the experience of the Twisted
folks.

Their experience is substantial, but in the intervening 1.5+ years
since Pycon 2011, only the barest of abstract interfaces has been
defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py),
and no discussion has taken place as to forward migration of the
(fairly large) body of existing asyncore code.

> i'm not convinced by the api suggested in the first mail, as it sounds
> very unix centric (poll, read/write/error). i rather imagined leaving
> the details of the callbackable/mainloop interaction to be platform
> details. (a win32evtlog event source just couldn't possibly register
> with a select() based main loop). i'd prefer to keep the part that

Of course not, but then again no one would attempt to do as much. They
would use a WSAEvent reactor, because that's the only thing that it
would work with. That said, WSAEvent should arguably be the default on
Windows, so this issue shouldn't even come up there. Also, worrying
about platform-specific details like "what if someone uses a source
that is relatively uncommon on the platform" is a red herring; get the
interface/api right, build it, and start using it.

To the point, Giampaolo already has a reactor that implements the
interface (more or less "idea #3" from his earlier message), and it's
been used in production (under staggering ftp(s) load). Even better,
it offers effectively transparent replacement of the existing asyncore
loop, and supports existing asyncore-derived classes. It is available:
https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py
That is, incidentally, what Giampaolo has implemented already. I
encourage you to read the source I linked above.

>> From there the python stdlib *might* grow a new module implementing the
>> "async WSGI interface" (let's call it asyncore2) and some of the stdlib
>> batteries such as socketserver can possibly use it.
>>
>> In my mind this is the ideal long-term scenario but even managing to define
>> an "async WSGI interface" alone would be a big step forward.
>
> i'd welcome such an interface. if asyncore can then be retrofitted to
> accept that interface too w/o breaking compatibility, it'd be nice, but
> if not, it's asyncore2, then.

Easily done, because it's already been done ;)

>> Again, at this point in time what you're proposing looks too vague,
>> ambitious and premature to me.
>
> please don't get me wrong -- i'm not proposing anything for immediate
> action, i just want to start a thinking process towards a better
> integrated stdlib.

I am curious as to what you mean by "a better integrated stdlib". A
new interface that doesn't allow people to easily migrate from an
existing (and long-lived, though flawed) standard library is not
better integration. Better integration requires allowing previous
users to migrate, while encouraging new users to join in with any
later development. That's what Giampaolo's suggested interface offers
on the lowest level; something to handle file-handle reactors,
combined with a scheduler.

From there; whether layers like Twisted are evolved, or more shallow
layers (like many existing asyncore-derived classes) is yet to be
determined by actual people using it.

> On Mon, Sep 24, 2012 at 05:02:08PM -0700, Josiah Carlson wrote:
>> 1. Whatever reactors are available, you need to be able to instantiate
>> multiple of different types of reactors and multiple instances of the
>> same type of reactor simultaneously (to support multiple threads
>> handling different groups of reactors, or different reactors for
>> different types of objects on certain platforms). While this allows
>> for insanity in the worst-case, we're all consenting adults here, so
>> shouldn't be limited by reactor singletons. There should be a default
>> reactor class, which is defined on module/package import (use the
>> "best" one for the platform).
>
> i think that's already common. with asyncore, you can have different
> maps (just one is installed globally as default). with the gtk main
> loop, it's a little tricky (the gtk.main() function doesn't simply take
> an argument), but the underlying glib can do that afaict.

Remember that a reactor isn't just a dictionary of file handles to do
stuff on, it's the thing that determines what underlying platform
mechanics will be used to multiplex across channels. But that level of
detail will be generally unused by most people, as most people will
only use one at a time. The point of offering multiple reactors is to
allow people to be flexible if they choose (or to pick from the
different reactors if they know that one is faster for their number of
expected handles).
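
As a small illustration of "pick the platform mechanism, but let people override it", the sketch below uses the selectors module (standard library since Python 3.4, so later than this discussion). The helper names available_selector_classes and make_reactor_backend are hypothetical.

```python
import selectors

# Class names the selectors module may define, depending on the platform.
_CANDIDATES = (
    "EpollSelector",
    "KqueueSelector",
    "DevpollSelector",
    "PollSelector",
    "SelectSelector",
)


def available_selector_classes():
    """Return the multiplexing mechanisms this platform offers."""
    return [getattr(selectors, name)
            for name in _CANDIDATES if hasattr(selectors, name)]


def make_reactor_backend(preferred=None):
    """Return a selector instance: the caller's choice, else the default."""
    cls = preferred if preferred is not None else selectors.DefaultSelector
    return cls()


default_backend = make_reactor_backend()
explicit_backend = make_reactor_backend(selectors.SelectSelector)
default_backend.close()
explicit_backend.close()
```

Most callers would never pass preferred; the option exists for those who know that one mechanism suits their expected number of handles better.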

>> 2. The API must be simple. I am not sure that it can get easier than
>> Idea #3 from:
>> http://mail.python.org/pipermail/python-ideas/2012-May/015245.html
>
> it's good that the necessities of call_later and call_every are
> mentioned here, i'd have forgotten about them.
>
> we've talked about many things we'd need in a python asynchronous
> interface (not implementation), so what are the things we *don't* need?
> (so we won't start building a framework like twisted). i'll start:
>
> * high-level protocol handling (can be extra modules atop of it)
> * ssl
> * something like the twisted delayed framework (not sure about that, i
> guess the twisted people will have good reason to use it, but i don't
> see compelling reasons for such a thing in a minimal interface from my
> limited pov)
> * explicit connection handling (retries, timeouts -- would be up to the
> user as well, eg urllib might want to set up a timeout and retries for
> asynchronous url requests)

I disagree with the last 3. If you have an IO loop, more often than
not you want an opportunity to do something later in the same context.
This is commonly the case for bandwidth limiting, connection timeouts,
etc., which are otherwise *very* difficult to do at a higher level
(which are the reasons why schedulers are built into IO loops).
Further, SSL in async can be tricky to get right. Having the 20-line
SSL layer as an available class is a good idea, and will save people
time by not having them re-invent it (poorly or incorrectly) every
time.
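
To show why the scheduler tends to live inside the IO loop, here is a minimal sketch under stated assumptions: SchedulingLoop, add_reader, call_later and run_once are hypothetical names, and the design (a heap of deadlines that bounds the select() timeout) is one common approach, not a description of pyftpdlib's ioloop.

```python
import heapq
import itertools
import select
import socket
import time


class SchedulingLoop:
    """Hypothetical IO loop with a built-in timer heap."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._timers = []                  # heap of (deadline, seq, callback)
        self._counter = itertools.count()  # tie-breaker for equal deadlines
        self._readers = {}                 # fd -> callback

    def add_reader(self, sock, callback):
        self._readers[sock.fileno()] = callback

    def call_later(self, delay, callback):
        deadline = self._clock() + delay
        heapq.heappush(self._timers, (deadline, next(self._counter), callback))

    def run_once(self, max_wait=1.0):
        # Never sleep in select() past the moment the next timer is due.
        timeout = max_wait
        if self._timers:
            until_next = self._timers[0][0] - self._clock()
            timeout = min(max_wait, max(0.0, until_next))
        if self._readers:
            ready, _, _ = select.select(list(self._readers), [], [], timeout)
        else:
            time.sleep(timeout)
            ready = []
        for fd in ready:
            self._readers[fd]()
        # Run every timer whose deadline has passed.
        now = self._clock()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback = heapq.heappop(self._timers)
            callback()


events = []
a, b = socket.socketpair()
loop = SchedulingLoop()
loop.add_reader(b, lambda: events.append(b.recv(4096)))
loop.call_later(0.0, lambda: events.append("timeout check"))
a.sendall(b"data")
loop.run_once()
a.close()
b.close()
```

Because the loop owns both the timers and the select() call, a connection timeout or a bandwidth-limiting delay runs in the same context as the IO handlers, which is hard to arrange from a layer above the loop.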

Regards,
- Josiah

chrysn

Oct 3, 2012, 10:43:20 AM
to Josiah Carlson, python...@python.org
On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
> Go ahead and read PEP 3153, we will wait.
>
> A careful reading of PEP 3153 will tell you that the intent is to make
> a "light" version of Twisted built into Python. There isn't any
> discussion as to *why* this is a good idea, it just lays out the plan
> of action. Its ideas were gathered from the experience of the Twisted
> folks.
>
> Their experience is substantial, but in the intervening 1.5+ years
> since Pycon 2011, only the barest of abstract interfaces has been
> defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py),
> and no discussion has taken place as to forward migration of the
> (fairly large) body of existing asyncore code.

it doesn't look like twisted-light to me, more like an interface
suggestion for a small subset of twisted. in particular, it doesn't talk
about main loops / reactors / registration-in-the-first-place.

you mention interaction with the twisted people. is there willingness,
from the twisted side, to use a standard python middle layer, once it
exists and has sufficiently high quality?


> To the point, Giampaolo already has a reactor that implements the
> interface (more or less "idea #3" from his earlier message), and it's
> been used in production (under staggering ftp(s) load). Even better,
> it offers effectively transparent replacement of the existing asyncore
> loop, and supports existing asyncore-derived classes. It is available:
> https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py

i've had a look at it, but honestly can't say more than that it's good
to have a well-tested asyncore compatible main loop with scheduling
support, and i'll try it out for my own projects.

> >> Again, at this point in time what you're proposing looks too vague,
> >> ambitious and premature to me.
> >
> > please don't get me wrong -- i'm not proposing anything for immediate
> > action, i just want to start a thinking process towards a better
> > integrated stdlib.
>
> I am curious as to what you mean by "a better integrated stdlib". A
> new interface that doesn't allow people to easily migrate from an
> existing (and long-lived, though flawed) standard library is not
> better integration. Better integration requires allowing previous
> users to migrate, while encouraging new users to join in with any
> later development. That's what Giampaolo's suggested interface offers
> on the lowest level; something to handle file-handle reactors,
> combined with a scheduler.

a new interface won't make integration automatically happen, but it's
something the standard library components can evolve on. whether, for
example urllib2 will then automatically work asynchronously in that
framework or whether we'll wait for urllib3, we'll see when we have it.

@migrate from an existing standard library: is there a big user base for
the current asyncore framework? my impression is that it is not
very well known among python users, and most that could use it use
twisted.

> > we've talked about many things we'd need in a python asynchronous
> > interface (not implementation), so what are the things we *don't* need?
> > (so we won't start building a framework like twisted). i'll start:
> >
> > * high-level protocol handling (can be extra modules atop of it)
> > * ssl
> > * something like the twisted delayed framework (not sure about that, i
> > guess the twisted people will have good reason to use it, but i don't
> > see compelling reasons for such a thing in a minimal interface from my
> > limited pov)
> > * explicit connection handling (retries, timeouts -- would be up to the
> > user as well, eg urllib might want to set up a timeout and retries for
> > asynchronous url requests)
>
> I disagree with the last 3. If you have an IO loop, more often than
> not you want an opportunity to do something later in the same context.
> This is commonly the case for bandwidth limiting, connection timeouts,
> etc., which are otherwise *very* difficult to do at a higher level
> (which are the reasons why schedulers are built into IO loops).
> Further, SSL in async can be tricky to get right. Having the 20-line
> SSL layer as an available class is a good idea, and will save people
> time by not having them re-invent it (poorly or incorrectly) every
> time.

i see; those should be provided, then.
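
for reference, a minimal sketch of why a scheduler tends to live inside the IO loop: the time left until the next delayed call simply becomes the select timeout, so timeouts and bandwidth limiting need no second mechanism. `TinyLoop` and its method names are hypothetical, chosen only for illustration; they are not asyncore's API or any proposal from this thread.

```python
import heapq
import itertools
import selectors
import time


class TinyLoop:
    """Hypothetical loop: file-descriptor callbacks plus delayed calls."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._timers = []              # heap of (deadline, seq, func, args)
        self._seq = itertools.count()  # tie-breaker so funcs are never compared

    def add_reader(self, fileobj, callback):
        # callback(fileobj) is invoked whenever fileobj becomes readable
        self._selector.register(fileobj, selectors.EVENT_READ, callback)

    def remove_reader(self, fileobj):
        self._selector.unregister(fileobj)

    def call_later(self, delay, func, *args):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), func, args))

    def run_once(self):
        # Wait in select() no longer than the time left until the next timer.
        timeout = None
        if self._timers:
            timeout = max(0.0, self._timers[0][0] - time.monotonic())
        for key, _events in self._selector.select(timeout):
            key.data(key.fileobj)
        # Run every delayed call whose deadline has passed.
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _deadline, _seq, func, args = heapq.heappop(self._timers)
            func(*args)
```

a connection timeout is then just `loop.call_later(30, close_if_idle, sock)`, where `close_if_idle` is whatever the user supplies.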


i'm afraid i don't completely get the point you're making, sorry for
that, maybe i've missed important statements or lack sufficiently deep
knowledge of topics affected and got lost in details.

what is your opinion on the state of asynchronous operations in python,
and what would you like it to be?

thanks for staying with this topic

Josiah Carlson

Oct 5, 2012, 2:51:21 PM10/5/12
to chrysn, python...@python.org
Things don't "automatically work" without work. You can't just make
urllib2 work asynchronously unless you do the sorts of greenlet-style
stack switching that lies to you about what is going on, or unless you
redesign it from scratch to do such. That's not to say that greenlets
are bad, they are great. But expecting that a standard library
implementing an updated async spec will all of a sudden hook itself
into a synchronous socket client? I think that expectation is
unreasonable.

> @migrate from an existing standard library: is there a big user base for
> the current asyncore framework? my impression is that it is not
> very well known among python users, and most that could use it use
> twisted.

"Well known" is an interesting claim. I believe it is actually known of
by quite a large part of the community, but due to a (perhaps
deserved) reputation (that may or may not still be the case), isn't
used as often as Twisted.

But along those lines, there are a few questions that should be asked:
1. Is it desirable to offer users the chance to transition from
asyncore-derived stuff to some new thing?
2. If so, what is necessary for an upgrade/replacement for
asyncore/asynchat in the long term?
3. Would 3rd parties use this as a basis for their libraries?
4. What are the short, mid, and long-term goals?

For my answers:
1. I think it is important to offer people who are using a standard
library module to continue using a standard library module if
possible.
2. A transition should offer either an adapter or similar-enough API
equivalency between the old and new.
3. I think that if it offers a reasonable API, good functionality, and
examples are provided - both as part of the stdlib and outside the
stdlib, people will see the advantages of maintaining less of their
own custom code. To the point: would Twisted use *whatever* was in the
stdlib? I don't know the answer, but unless the API is effectively
identical to Twisted, that transition may be delayed significantly.
4. Short: get current asyncore people transitioned to something
demonstrably better, that 3rd parties might also use. Mid: pull
parsers/logic out of cores of methods and make them available for
sync/async/3rd party parsing/protocol handling (get the best protocol
parsers into the stdlib, separated from the transport). Long: everyone
contributes/updates the stdlib modules because it has the best parsers
for protocols/formats, that can be used from *anywhere* (sync or
async).

My long-term dream (which has been the case for 6+ years, since I
proposed doing it myself on the python-dev mailing list and was told
"no") is that whether someone uses urllib2, httplib2, smtpd, requests,
ftplib, etc., they all have access to high-quality protocol-level
protocol parsers. So that once one person writes the bit that handles
http 30X redirects, everyone can use it. So that when one person
writes the gzip + chunked transfer encoding/decoding, everyone can use
it.

I think it is functional, but flawed. I also think that every 3rd
party library that does network-level protocols is a different mix of
functional and flawed. I think that there is a repeated and
often-times wasted effort where folks are writing different and
invariably crappy (to some extent) protocol parsers and network
handlers. I think that whenever possible, that should stop, and the
highest-quality protocol parsing functions/methods should be available
in the Python standard library, available to be called from any
library, whether sync, async, stdlib, or 3rd party.
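
As a minimal sketch of what "separated from the transport" could look like, here is an incremental decoder for HTTP chunked transfer encoding that never touches a socket: the caller feeds it bytes from wherever they came from (a blocking recv(), an async read callback, a test string). The class name `ChunkedDecoder` and its `feed()` method are hypothetical, and the sketch ignores trailers and does not validate the CRLF after each chunk.

```python
class ChunkedDecoder:
    """Incremental chunked-body decoder with no socket or event loop inside."""

    def __init__(self):
        self._buffer = b""
        self._remaining = None  # bytes left in the current chunk; None = need a size line
        self.done = False

    def feed(self, data):
        """Accept raw bytes and return whatever body bytes they complete."""
        self._buffer += data
        out = []
        while not self.done:
            if self._remaining is None:
                line, sep, rest = self._buffer.partition(b"\r\n")
                if not sep:
                    break  # size line not complete yet
                size = int(line.split(b";", 1)[0], 16)  # drop chunk extensions
                self._buffer = rest
                if size == 0:
                    self.done = True  # last chunk; trailers ignored in this sketch
                    break
                self._remaining = size
            elif self._remaining > 0:
                if not self._buffer:
                    break  # need more data for this chunk
                take = self._buffer[:self._remaining]
                out.append(take)
                self._buffer = self._buffer[len(take):]
                self._remaining -= len(take)
            else:
                if len(self._buffer) < 2:
                    break  # CRLF after the chunk data not complete yet
                self._buffer = self._buffer[2:]
                self._remaining = None
        return b"".join(out)
```

A synchronous client would call `feed()` in a `recv()` loop; an asynchronous one would call it from its read handler. Neither needs its own copy of the parsing logic.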

Now, my discussions in the context of asyncore-related upgrades may
seem like a strange leap, but some of these lesser-quality parsing
routines exist in asyncore-derived classes, as well as
non-asyncore-derived classes. But if we make an effort on the asyncore
side of things, under the auspices of improving one stdlib module and
offering additional functionality, the need for protocol-level parsers
shared among sync/async should become obvious to *everyone* (I suspect
it isn't obvious now because the communities don't spend a lot of time
cross-pollinating, because people like writing parsers - I do too ;) -
or because the sync folks end up going the greenlet route if/when
threading bites them on the ass).

Antoine Pitrou

Oct 5, 2012, 4:09:54 PM10/5/12
to python...@python.org
On Fri, 5 Oct 2012 11:51:21 -0700
Josiah Carlson <josiah....@gmail.com>
wrote:
>
> My long-term dream (which has been the case for 6+ years, since I
> proposed doing it myself on the python-dev mailing list and was told
> "no") is that whether someone uses urllib2, httplib2, smtpd, requests,
> ftplib, etc., they all have access to high-quality protocol-level
> protocol parsers.

I'm not sure what you're talking about: what were you told "no" about,
specifically? Your proposal sounds reasonable and (ideally) desirable to
me.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Oct 6, 2012, 6:00:54 PM10/6/12
to Josiah Carlson, python...@python.org
This is an incredibly important discussion.

I would like to contribute despite my limited experience with the
various popular options. My own async explorations are limited to the
constraints of the App Engine runtime environment, where a rather
unique type of reactor is required. I am developing some ideas around
separating reactors, futures, and yield-based coroutines, but they
take more thinking and probably some experimental coding before I'm
ready to write it up in any detail. For a hint on what I'm after, you
might read up on monocle (https://github.com/saucelabs/monocle) and my
approach to building coroutines on top of Futures
(http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349).

In the meantime I'd like to bring up a few higher-order issues:

(1) How important is it to offer a compatibility path for asyncore? I
would have thought that offering an integration path forward for
Twisted and Tornado would be more important.

(2) We're at a fork in the road here. On the one hand, we could choose
to deeply integrate greenlets/gevents into the standard library. (It's
not monkey-patching if it's integrated, after all. :-) I'm not sure
how this would work for other implementations than CPython, or even
how to address CPython on non-x86 architectures. But users seem to
like the programming model: write synchronous code, get async
operation for free. It's easy to write protocol parsers that way. On
the other hand, we could reject this approach: the integration would
never be completely smooth, there's the issue of other implementations
and architectures, it probably would never work smoothly even for
CPython/x86 when 3rd party extension modules are involved.
Callback-based APIs don't have these downsides, but they are harder to
program; however we can make programming them easier by using
yield-based coroutines. Even Twisted offers those (inline callbacks).
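
A minimal sketch of that last point, assuming a hypothetical callback-style API (`fetch_cb`) and a plain list standing in for an event loop's ready queue: the coroutine reads top to bottom like synchronous code, and a small driver turns each `yield` into a callback registration. None of these names come from Twisted, monocle or NDB.

```python
def run_coroutine(gen, on_done):
    """Drive a generator that yields 'start(callback)' functions."""
    def step(value=None):
        try:
            start = gen.send(value)   # run until the next yield
        except StopIteration as stop:
            on_done(stop.value)       # the generator's return value
            return
        start(step)                   # resume when the operation calls back
    step()


pending = []  # stand-in for an event loop's ready queue


def fetch_cb(key, callback):
    """Hypothetical callback-style API: delivers its result later."""
    pending.append(lambda: callback(key.upper()))


def fetch(key):
    """Adapter: what the coroutine yields instead of passing a callback."""
    return lambda callback: fetch_cb(key, callback)


def handler():
    a = yield fetch("spam")   # looks sequential, runs via callbacks
    b = yield fetch("eggs")
    return a + b


results = []
run_coroutine(handler(), results.append)
while pending:                # the "event loop"
    pending.pop(0)()
```

Every point where control can switch is marked by a `yield`, which is the visibility that implicit stack switching does not give.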

Before I invest much more time in these ideas I'd like to at least
have (2) sorted out.

--
--Guido van Rossum (python.org/~guido)

Massimo DiPierro

Oct 6, 2012, 6:10:09 PM10/6/12
to Guido van Rossum, python...@python.org
I would strongly support integrating gevents into the standard library.
That would finally make me switch to Python 3. :-)

Antoine Pitrou

Oct 6, 2012, 6:24:02 PM10/6/12
to python...@python.org
On Sat, 6 Oct 2012 15:00:54 -0700
Guido van Rossum <gu...@python.org> wrote:
>
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).

greenlets/gevents only get you half the advantages of single-threaded
"async" programming: they get you scalability in the face of a high
number of concurrent connections, but they don't get you the robustness
of cooperative multithreading (because it's not obvious when reading
the code where the possible thread-switching points are).

(I don't actually understand the attraction of gevent, except for
extreme situations; threads should be cheap on a decent OS)

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Josiah Carlson

Oct 6, 2012, 6:44:05 PM10/6/12
to Antoine Pitrou, python...@python.org
On Fri, Oct 5, 2012 at 1:09 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Fri, 5 Oct 2012 11:51:21 -0700
> Josiah Carlson <josiah....@gmail.com>
> wrote:
>>
>> My long-term dream (which has been the case for 6+ years, since I
>> proposed doing it myself on the python-dev mailing list and was told
>> "no") is that whether someone uses urllib2, httplib2, smtpd, requests,
>> ftplib, etc., they all have access to high-quality protocol-level
>> protocol parsers.
>
> I'm not sure what you're talking about: what were you told "no" about,
> specifically? Your proposal sounds reasonable and (ideally) desirable to
> me.

I've managed to find the email where I half-way proposed it (though
not as pointed as what I posted above):
http://mail.python.org/pipermail/python-dev/2004-November/049827.html

Phillip J. Eby said in a reply that policy would kill it. My
experience at the time told me that policy was a tough nut to crack,
and my 24-year old self wasn't confident enough to keep pushing (even
though I had the time). Now, my 32-year old self has the confidence
and the knowledge to do it (or advise how to do it), but not the time
(I'm finishing up my first book, doing a conference tour, running a
startup, and preparing for my first child).

One of the big reasons why I like and am pushing Giampaolo's ideas
(and existing code) is my faith that he *can* and *will* do it, if he
says he will.

Regards,
- Josiah

Guido van Rossum

Oct 6, 2012, 8:23:48 PM10/6/12
to Antoine Pitrou, python...@python.org
On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Sat, 6 Oct 2012 15:00:54 -0700
> Guido van Rossum <gu...@python.org> wrote:
>>
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library. (It's
>> not monkey-patching if it's integrated, after all. :-) I'm not sure
>> how this would work for other implementations than CPython, or even
>> how to address CPython on non-x86 architectures. But users seem to
>> like the programming model: write synchronous code, get async
>> operation for free. It's easy to write protocol parsers that way. On
>> the other hand, we could reject this approach: the integration would
>> never be completely smooth, there's the issue of other implementations
>> and architectures, it probably would never work smoothly even for
>> CPython/x86 when 3rd party extension modules are involved.
>> Callback-based APIs don't have these downsides, but they are harder to
>> program; however we can make programming them easier by using
>> yield-based coroutines. Even Twisted offers those (inline callbacks).
>
> greenlets/gevents only get you half the advantages of single-threaded
> "async" programming: they get you scalability in the face of a high
> number of concurrent connections, but they don't get you the robustness
> of cooperative multithreading (because it's not obvious when reading
> the code where the possible thread-switching points are).

I used to think that too, long ago, until I discovered that as you add
abstraction layers, cooperative multithreading is untenable -- sooner
or later you will lose track of where the threads are switched.

> (I don't actually understand the attraction of gevent, except for
> extreme situations; threads should be cheap on a decent OS)

I think it's the observation that the number of sockets you can
realistically have open in a single process or machine is always 1-2
orders of magnitude larger than the number of threads you can have --
and this makes sense since the total amount of memory (kernel and
user) to represent a socket is just much smaller than needed for a
thread. Just check the configuration limits of your typical Linux
kernel if you don't believe me. :-)

--
--Guido van Rossum (python.org/~guido)

Carlo Pires

Oct 6, 2012, 8:45:59 PM10/6/12
to python...@python.org
+1000

Can we dream of gevent integrated into standard CPython? This would be a fantastic path for 3.4 :)

And I definitely should move to 3.x.

Because for web programming, I just can't think of another way to program using Python. I'm seeing some people going to other languages where async is easier, like Go (some are trying Erlang). Async is a MUST HAVE for web programming these days...

In my experience, the "robustness of cooperative multithreading" comes at the price of code that is difficult to maintain. And in a single thread it never reaches the benefits of SMP with ease. That's why Erlang shines... it abstracts away the hard work of keeping the switching under control. Gevent walks the same line: it makes the programmer's life easier.

--
  Carlo Pires



Josiah Carlson

Oct 6, 2012, 10:22:26 PM10/6/12
to Guido van Rossum, python...@python.org
On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <gu...@python.org> wrote:
> This is an incredibly important discussion.
>
> I would like to contribute despite my limited experience with the
> various popular options. My own async explorations are limited to the
> constraints of the App Engine runtime environment, where a rather
> unique type of reactor is required. I am developing some ideas around
> separating reactors, futures, and yield-based coroutines, but they
> take more thinking and probably some experimental coding before I'm
> ready to write it up in any detail. For a hint on what I'm after, you
> might read up on monocle (https://github.com/saucelabs/monocle) and my
> approach to building coroutines on top of Futures
> (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349).

Yield-based coroutines like monocle are the simplest way to do
multi-paradigm in the same code. Whether you have an async-style
reactor, greenlet-style stack switching, cooperatively scheduled
generator trampolines, or just plain blocking threaded sockets; that
style works with all of them (the futures and wrapper around
everything just looks a little different).

That said, it forces everyone to drink the same coroutine-styled
kool-aid. That doesn't bother me. But I understand it, and have built
similar systems before. I don't have an intuition about whether 3rd
parties will like it or will migrate to it. Someone want to ping the
Twisted and Tornado folks about it?

> In the meantime I'd like to bring up a few higher-order issues:
>
> (1) How important is it to offer a compatibility path for asyncore? I
> would have thought that offering an integration path forward for
> Twisted and Tornado would be more important.
>
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).
>
> Before I invest much more time in these ideas I'd like to at least
> have (2) sorted out.

Combining your responses to #1 and now this, are you proposing a path
forward for Twisted/Tornado to be greenlets? That's an interesting
approach to the problem, though I can see the draw. ;)

I have been hesitant on the Twisted side of things for an arbitrarily
selfish reason. After 2-3 hours of reading over a codebase (which I've
done 5 or 6 times in the last 8 years), I ask myself whether I believe
I understand 80+% of how things work; how data flows, how
callbacks/layers are invoked, and whether I could add a piece of
arbitrary functionality to one layer or another (or to determine the
proper layer in which to add the functionality). If my answer is "no",
then my gut says "this is probably a bad idea". But if I start
figuring out the layers before I've finished my 2-3 hours, and I start
finding bugs? Well, then I think it's a much better idea, even if the
implementation is buggy.

Maybe something like Monocle would be better (considering your favor
for that style, it obviously has a leg-up on the competition). I don't
know. But if something like Monocle can merge it all together, then
maybe I'd be happy. Incidentally, I can think of a few different
styles of wrappers that would actually let people using
asyncore-derived stuff use something like Monocle. So maybe that's
really the right answer?

Regards,
- Josiah

P.S. Thank you for weighing in on this Guido. Even if it doesn't end
up the way I had originally hoped, at least now there's discussion.

Guido van Rossum

Oct 7, 2012, 12:05:13 AM10/7/12
to Josiah Carlson, python...@python.org
On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson <josiah....@gmail.com> wrote:
> On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <gu...@python.org> wrote:
>> This is an incredibly important discussion.
>>
>> I would like to contribute despite my limited experience with the
>> various popular options. My own async explorations are limited to the
>> constraints of the App Engine runtime environment, where a rather
>> unique type of reactor is required. I am developing some ideas around
>> separating reactors, futures, and yield-based coroutines, but they
>> take more thinking and probably some experimental coding before I'm
>> ready to write it up in any detail. For a hint on what I'm after, you
>> might read up on monocle (https://github.com/saucelabs/monocle) and my
>> approach to building coroutines on top of Futures
>> (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349).
>
> Yield-based coroutines like monocle are the simplest way to do
> multi-paradigm in the same code. Whether you have an async-style
> reactor, greenlet-style stack switching, cooperatively scheduled
> generator trampolines, or just plain blocking threaded sockets; that
> style works with all of them (the futures and wrapper around
> everything just looks a little different).

Glad I'm not completely crazy here. :-)

> That said, it forces everyone to drink the same coroutine-styled
> kool-aid. That doesn't bother me. But I understand it, and have built
> similar systems before. I don't have an intuition about whether 3rd
> parties will like it or will migrate to it. Someone want to ping the
> Twisted and Tornado folks about it?

They should be reading this. Or maybe we should bring it up on
python-dev before too long.

>> In the meantime I'd like to bring up a few higher-order issues:
>>
>> (1) How important is it to offer a compatibility path for asyncore? I
>> would have thought that offering an integration path forward for
>> Twisted and Tornado would be more important.
>>
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library. (It's
>> not monkey-patching if it's integrated, after all. :-) I'm not sure
>> how this would work for other implementations than CPython, or even
>> how to address CPython on non-x86 architectures. But users seem to
>> like the programming model: write synchronous code, get async
>> operation for free. It's easy to write protocol parsers that way. On
>> the other hand, we could reject this approach: the integration would
>> never be completely smooth, there's the issue of other implementations
>> and architectures, it probably would never work smoothly even for
>> CPython/x86 when 3rd party extension modules are involved.
>> Callback-based APIs don't have these downsides, but they are harder to
>> program; however we can make programming them easier by using
>> yield-based coroutines. Even Twisted offers those (inline callbacks).
>>
>> Before I invest much more time in these ideas I'd like to at least
>> have (2) sorted out.
>
> Combining your responses to #1 and now this, are you proposing a path
> forward for Twisted/Tornado to be greenlets? That's an interesting
> approach to the problem, though I can see the draw. ;)

Can't tell whether you're serious, but that's not what I meant. Surely
it will never fly for Twisted. Tornado apparently already works with
greenlets (though maybe through a third party hack). But personally
I'd be leaning towards rejecting greenlets, for the same reasons I've
kept the doors tightly shut for Stackless -- I like it fine as a
library, but not as a language feature, because I don't see how it can
be supported on all platforms where Python must be supported.

However I figured that if we define the interfaces well enough, it
might be possible to use (a superficially modified version of)
Twisted's reactors instead of the standard ones, and, orthogonally,
Twisted's Deferreds could be wrapped in the standard Futures (or the
other way around?) when used with a non-Twisted reactor. Which would
hopefully open the door for migrating some of their more useful
protocol parsers into the stdlib.
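
A minimal sketch of the wrapping direction, under loud assumptions: `MiniDeferred` is a hypothetical stand-in with a result path and an error path, not Twisted's actual Deferred class, and `deferred_to_future` is an invented helper name. Only `concurrent.futures.Future` and its `set_result` / `set_exception` methods are real stdlib API.

```python
from concurrent.futures import Future


class MiniDeferred:
    """Hypothetical stand-in for a deferred result; not Twisted's class."""

    def __init__(self):
        self._callbacks = []

    def add_callbacks(self, on_result, on_error):
        self._callbacks.append((on_result, on_error))

    def callback(self, result):
        # Fire the success path of every registered pair.
        for on_result, _on_error in self._callbacks:
            on_result(result)

    def errback(self, exc):
        # Fire the failure path of every registered pair.
        for _on_result, on_error in self._callbacks:
            on_error(exc)


def deferred_to_future(deferred):
    """Return a Future that completes when the deferred fires."""
    future = Future()
    deferred.add_callbacks(future.set_result, future.set_exception)
    return future
```

The reverse adapter would register a done-callback on the Future and fire the deferred from it; which direction is the more natural one is exactly the open question above.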

> I have been hesitant on the Twisted side of things for an arbitrarily
> selfish reason. After 2-3 hours of reading over a codebase (which I've
> done 5 or 6 times in the last 8 years), I ask myself whether I believe
> I understand 80+% of how things work; how data flows, how
> callbacks/layers are invoked, and whether I could add a piece of
> arbitrary functionality to one layer or another (or to determine the
> proper layer in which to add the functionality). If my answer is "no",
> then my gut says "this is probably a bad idea". But if I start
> figuring out the layers before I've finished my 2-3 hours, and I start
> finding bugs? Well, then I think it's a much better idea, even if the
> implementation is buggy.

Can't figure what you're implying here. On which side does Twisted fall for you?

> Maybe something like Monocle would be better (considering your favor
> for that style, it obviously has a leg-up on the competition). I don't
> know. But if something like Monocle can merge it all together, then
> maybe I'd be happy.

My worry is that monocle is too simple and does not cater for advanced
needs. It doesn't seem to have caught on much outside the company
where it originated.

> Incidentally, I can think of a few different
> styles of wrappers that would actually let people using
> asyncore-derived stuff use something like Monocle. So maybe that's
> really the right answer?

I still don't really think asyncore is going to be a problem. It can
easily be separated into a reactor and callbacks.

> Regards,
> - Josiah
>
> P.S. Thank you for weighing in on this Guido. Even if it doesn't end
> up the way I had originally hoped, at least now there's discussion.

Hm, there seemed to be plenty of discussion before...

--
--Guido van Rossum (python.org/~guido)

Duncan McGreggor

Oct 7, 2012, 12:17:23 AM10/7/12
to Guido van Rossum, python...@python.org

Yup, we are. I've pinged others in the Twisted cabal on this matter, so hopefully you'll be hearing from one or more of us soon...

d
 

Devin Jeanpierre

Oct 7, 2012, 12:23:43 AM10/7/12
to Guido van Rossum, python...@python.org
On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <gu...@python.org> wrote:
> However I figured that if we define the interfaces well enough, it
> might be possible to use (a superficially modified version of)
> Twisted's reactors instead of the standard ones, and, orthogonally,
> Twisted's deferred's could be wrapped in the standard Futures (or the
> other way around?) when used with a non-Twisted reactor. Which would
> hopefully open the door for migrating some of their more useful
> protocol parsers into the stdlib.

I thought futures were meant for thread and process pools? The
blocking methods make them a bad fit for an asynchronous networking
toolset.

The Twisted folks have discussed integrating futures and Twisted (see
also the reply, which has some corrections):

http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html

-- Devin

Guido van Rossum

Oct 7, 2012, 12:35:42 AM10/7/12
to Devin Jeanpierre, python...@python.org


On Saturday, October 6, 2012, Devin Jeanpierre wrote:
> On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <gu...@python.org> wrote:
> > However I figured that if we define the interfaces well enough, it
> > might be possible to use (a superficially modified version of)
> > Twisted's reactors instead of the standard ones, and, orthogonally,
> > Twisted's deferred's could be wrapped in the standard Futures (or the
> > other way around?) when used with a non-Twisted reactor. Which would
> > hopefully open the door for migrating some of their more useful
> > protocol parsers into the stdlib.
>
> I thought futures were meant for thread and process pools? The
> blocking methods make them a bad fit for an asynchronous networking
> toolset.

The specific Future implementation in the py3k stdlib uses threads and is indeed meant for thread and process pools.

But the *concept* of futures works fine in event-based systems, see the link I posted into the NDB sources. I'm not keen on cancellation and threadpools FWIW.

> The Twisted folks have discussed integrating futures and Twisted (see
> also the reply, which has some corrections):
>
> http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html
>
> -- Devin


Antoine Pitrou

Oct 7, 2012, 6:09:31 AM10/7/12
to python...@python.org
On Sat, 6 Oct 2012 17:23:48 -0700
Even with an explicit notation like "yield" / "yield from"?

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Oct 7, 2012, 11:04:30 AM10/7/12
to Antoine Pitrou, python...@python.org
On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Sat, 6 Oct 2012 17:23:48 -0700
> Guido van Rossum <gu...@python.org> wrote:
>> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>> > greenlets/gevents only get you half the advantages of single-threaded
>> > "async" programming: they get you scalability in the face of a high
>> > number of concurrent connections, but they don't get you the robustness
>> > of cooperative multithreading (because it's not obvious when reading
>> > the code where the possible thread-switching points are).
>>
>> I used to think that too, long ago, until I discovered that as you add
>> abstraction layers, cooperative multithreading is untenable -- sooner
>> or later you will lose track of where the threads are switched.
>
> Even with an explicit notation like "yield" / "yield from"?

If you strictly adhere to using those you should be safe (though
distinguishing between the two may prove challenging) -- but in
practice it's hard to get everyone and every API to use this style. So
you'll have some blocking API calls hidden deep inside what looks like
a perfectly innocent call to some helper function.

IIUC in Go this is solved by mixing threads and lighter-weight
constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the
rest of the system continues to make progress by spawning another
thread.

My own experience with NDB is that it's just too hard to make everyone
use the async APIs all the time -- so I gave up and made async APIs an
optional feature, offering a blocking and an async version of every
API. I didn't start out that way, but once I started writing
documentation aimed at unsophisticated users, I realized that it was
just too much of an uphill battle to bother.

So I think it's better to accept this and deal with it, possibly
adding locking primitives into the mix that work well with the rest of
the framework. Building a lock out of a tasklet-based (i.e.
non-threading) Future class is easy enough.
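
A minimal sketch of such a lock, assuming a hypothetical single-threaded `SimpleFuture` (a result slot plus callbacks, no threads, no blocking) rather than NDB's actual Future class. `acquire()` never blocks; it returns a future that a tasklet would yield on.

```python
from collections import deque


class SimpleFuture:
    """Hypothetical single-threaded future: a result slot plus callbacks."""

    def __init__(self):
        self.done = False
        self.result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self.done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, result):
        self.done, self.result = True, result
        for fn in self._callbacks:
            fn(self)


class FutureLock:
    """Lock for tasklets: acquire() returns a future instead of blocking."""

    def __init__(self):
        self._locked = False
        self._waiters = deque()

    def acquire(self):
        future = SimpleFuture()
        if self._locked:
            self._waiters.append(future)  # resolved later by release()
        else:
            self._locked = True
            future.set_result(True)       # lock was free: granted immediately
        return future

    def release(self):
        if not self._locked:
            raise RuntimeError("release of an unlocked FutureLock")
        if self._waiters:
            # Hand the lock straight to the longest-waiting tasklet.
            self._waiters.popleft().set_result(True)
        else:
            self._locked = False
```

Inside a tasklet the usage would be `yield lock.acquire()`, then the critical section, then `lock.release()`.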

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 7, 2012, 8:52:41 PM10/7/12
to Duncan M. McGreggor, python...@python.org
On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor
<duncan.m...@gmail.com> wrote:
> We're here ;-)
>
> I'm forwarding this to the rest of the Twisted cabal...

Quick question. I'd like to see how Twisted typically implements a
protocol parser. Where would be a good place to start reading example
code?

--
--Guido van Rossum (python.org/~guido)

Ben Darnell

Oct 7, 2012, 9:41:52 PM10/7/12
to python...@python.org
Hi python-ideas,

I'm jumping in to this thread on behalf of Tornado. I think there are
actually two separate issues here and it's important to keep them
distinct: at a low level, there is a need for a standardized event
loop, while at a higher level there is a question of what asynchronous
code should look like.

This thread so far has been more about the latter, but the need for
standardization is more acute for the core event loop. I've written a
bridge between Tornado and Twisted so libraries written for both event
loops can coexist, but obviously that wouldn't scale if there were a
proliferation of event loop implementations out there. I'd be in
favor of a simple event loop interface in the standard library, with
reference implementation(s) (select, epoll, kqueue, iocp) and some
means of configuring the global (or thread-local) singleton. My
preference is to keep the interface fairly low-level and close to the
underlying mechanisms (i.e. like IReactorFDSet instead of
IReactor{TCP,UDP,SSL,etc}), so that different interfaces like
Tornado's IOStream or Twisted's protocols can be built on top of it.
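
A rough sketch of what such a low-level interface plus a thread-local singleton might look like. Every name here (`EventLoop`, `add_reader`, `add_writer`, `remove`, `run`, `get_event_loop`, `set_event_loop`) is hypothetical and chosen for illustration; this is not Tornado's IOLoop or Twisted's IReactorFDSet.

```python
import abc
import threading


class EventLoop(abc.ABC):
    """Hypothetical fd-level interface; protocols are built on top of it."""

    @abc.abstractmethod
    def add_reader(self, fd, callback):
        """Call callback() whenever fd is readable."""

    @abc.abstractmethod
    def add_writer(self, fd, callback):
        """Call callback() whenever fd is writable."""

    @abc.abstractmethod
    def remove(self, fd):
        """Stop watching fd."""

    @abc.abstractmethod
    def run(self):
        """Dispatch events until stopped."""


_local = threading.local()  # one configured loop per thread


def set_event_loop(loop):
    """Install the loop implementation (select, epoll, kqueue, ...) for this thread."""
    _local.loop = loop


def get_event_loop():
    """Return this thread's loop; libraries call this instead of creating their own."""
    try:
        return _local.loop
    except AttributeError:
        raise RuntimeError("no event loop configured for this thread") from None
```

With something of this shape, a library only ever calls `get_event_loop()`, and the application decides once which implementation sits behind it.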

As for the higher-level question of what asynchronous code should look
like, there's a lot more room for spirited debate, and I don't think
there's enough consensus to declare a One True Way. Personally, I'm
-1 on greenlets as a general solution (what if you have to call
MySQLdb or getaddrinfo?), although they can be useful in particular
cases to convert well-behaved synchronous code into async (as in
Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/).
I like Futures, though, and I find that they work well in
asynchronous code. The use of the result() method to encapsulate both
successful responses and exceptions is especially nice with generator
coroutines.
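As a small illustration of that point, using the stdlib concurrent.futures.Future constructed by hand (the describe helper is hypothetical), result() either returns the value or re-raises the stored exception, so one call site covers both outcomes:

```python
from concurrent.futures import Future

ok = Future()
ok.set_result(42)

failed = Future()
failed.set_exception(ValueError("boom"))


def describe(future):
    # result() returns the value or re-raises the stored exception.
    try:
        return "result: %r" % (future.result(),)
    except ValueError as exc:
        return "error: %s" % exc


ok_text = describe(ok)
failed_text = describe(failed)
```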

FWIW, here's the interface I'm moving towards for async code. From
the caller's perspective, asynchronous functions return a Future (the
future has to be constructed by hand since there is no Executor
involved), and also take an optional callback argument (mainly for
consistency with currently-prevailing patterns for async code; if the
callback is given it is simply added to the Future with
add_done_callback). In Tornado the Future is created by a decorator
and hidden from the asynchronous function (it just sees the callback),
although this relies on some Tornado-specific magic for exception
handling. In a coroutine, the decorator recognizes Futures and
resumes execution when the future is done. With these decorators
asynchronous code looks almost like synchronous code, except for the
"yield" keyword before each asynchronous call.
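A rough sketch of such a decorator, under the assumption that the coroutine yields concurrent.futures.Future objects and finishes with a plain return. The name coroutine is hypothetical; this is not tornado.gen.engine, and it leaves out Tornado's exception-handling magic entirely.

```python
import functools
from concurrent.futures import Future


def coroutine(func):
    """Hypothetical decorator: drive a generator that yields Futures."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outer = Future()

        def step(send_value=None, error=None):
            try:
                if error is not None:
                    yielded = gen.throw(error)
                else:
                    yielded = gen.send(send_value)
            except StopIteration as stop:
                outer.set_result(stop.value)
                return
            except Exception as exc:
                outer.set_exception(exc)
                return
            # Resume the generator once the yielded Future completes.
            yielded.add_done_callback(resume)

        def resume(done):
            exc = done.exception()
            if exc is not None:
                step(error=exc)
            else:
                step(send_value=done.result())

        step()
        return outer

    return wrapper


pending = Future()


@coroutine
def double_later():
    value = yield pending  # looks synchronous apart from the yield
    return value * 2


task = double_later()
done_before = task.done()
pending.set_result(21)
final = task.result()
```

The caller only ever sees the outer Future; the generator is suspended at each yield until the yielded Future's done-callback fires.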

-Ben

Guido van Rossum

Oct 7, 2012, 10:01:42 PM
to Ben Darnell, python...@python.org
On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <b...@bendarnell.com> wrote:
> Hi python-ideas,
>
> I'm jumping in to this thread on behalf of Tornado.

Welcome!

> I think there are
> actually two separate issues here and it's important to keep them
> distinct: at a low level, there is a need for a standardized event
> loop, while at a higher level there is a question of what asynchronous
> code should look like.

Yes, yes. I tried to bring up this distinction. I'm glad I didn't
completely fail.

> This thread so far has been more about the latter, but the need for
> standardization is more acute for the core event loop. I've written a
> bridge between Tornado and Twisted so libraries written for both event
> loops can coexist, but obviously that wouldn't scale if there were a
> proliferation of event loop implementations out there. I'd be in
> favor of a simple event loop interface in the standard library, with
> reference implementation(s) (select, epoll, kqueue, iocp) and some
> means of configuring the global (or thread-local) singleton. My
> preference is to keep the interface fairly low-level and close to the
> underlying mechanisms (i.e. like IReactorFDSet instead of
> IReactor{TCP,UDP,SSL,etc}), so that different interfaces like
> Tornado's IOStream or Twisted's protocols can be built on top of it.

As long as it's not so low-level that other people shy away from it.

I also have a feeling that one way or another this will require
cooperation between the Twisted and Tornado developers in order to
come up with a compromise that both are willing to conform to in a
meaningful way. (Unfortunately I don't know how to define "meaningful
way" more precisely here. I guess the idea is that almost all things
*using* an event loop use the standardized abstract API without caring
whether underneath it's Tornado, Twisted, or some simpler thing in the
stdlib.)

> As for the higher-level question of what asynchronous code should look
> like, there's a lot more room for spirited debate, and I don't think
> there's enough consensus to declare a One True Way. Personally, I'm
> -1 on greenlets as a general solution (what if you have to call
> MySQLdb or getaddrinfo?), although they can be useful in particular
> cases to convert well-behaved synchronous code into async (as in
> Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/).

Agreed on both counts.

> I like Futures, though, and I find that they work well in
> asynchronous code. The use of the result() method to encapsulate both
> successful responses and exceptions is especially nice with generator
> coroutines.

Yay!

> FWIW, here's the interface I'm moving towards for async code. From
> the caller's perspective, asynchronous functions return a Future (the
> future has to be constructed by hand since there is no Executor
> involved),

Ditto for NDB (though there's a decorator that often takes care of the
future construction).

> and also take an optional callback argument (mainly for
> consistency with currently-prevailing patterns for async code; if the
> callback is given it is simply added to the Future with
> add_done_callback).

That's interesting. I haven't found the need for this yet. Is it
really so common that you can't write this as a Future() constructor
plus a call to add_done_callback()? Or is there some subtle semantic
difference?

> In Tornado the Future is created by a decorator
> and hidden from the asynchronous function (it just sees the callback),

Hm, interesting. NDB goes the other way, the callbacks are mostly used
to make Futures work, and most code (including large swaths of
internal code) uses Futures. I think NDB is similar to monocle here.
In NDB, you can do

f = <some function returning a Future>
r = yield f

where "yield f" is mostly equivalent to f.result(), except it gives
better opportunity for concurrency.

> although this relies on some Tornado-specific magic for exception
> handling. In a coroutine, the decorator recognizes Futures and
> resumes execution when the future is done. With these decorators
> asynchronous code looks almost like synchronous code, except for the
> "yield" keyword before each asynchronous call.

Yes! Same here.

I am currently trying to understand if using "yield from" (and
returning a value from a generator) will simplify things. For example
maybe the need for a special decorator might go away. But I keep
getting headaches -- perhaps there's a Monad involved. :-)
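For reference, a tiny sketch of the two Python 3.3 features in question, with no event loop or decorator involved (read_header and handler are made-up names): a generator can return a value, and "yield from" hands that value to the delegating generator.

```python
def read_header():
    line = yield "need-line"
    return line.upper()  # the return value travels in StopIteration.value


def handler():
    # "yield from" forwards sends to read_header() and evaluates to
    # whatever read_header() returns.
    header = yield from read_header()
    body = yield "need-body"
    return (header, body)


gen = handler()
requests = [next(gen)]
requests.append(gen.send("content-length: 2"))
result = None
try:
    gen.send("hi")
except StopIteration as stop:
    result = stop.value
```

Whatever drives the outermost generator only has to catch StopIteration once; the inner call needs no wrapper of its own.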

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 7, 2012, 10:49:29 PM
to Duncan McGreggor, python...@python.org
On Sun, Oct 7, 2012 at 7:16 PM, Duncan McGreggor
<duncan.m...@gmail.com> wrote:
>
>
> On Sun, Oct 7, 2012 at 5:52 PM, Guido van Rossum <gu...@python.org> wrote:
>>
>> On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor
>> <duncan.m...@gmail.com> wrote:
>> > We're here ;-)
>> >
>> > I'm forwarding this to the rest of the Twisted cabal...
>>
>> Quick question. I'd like to see how Twisted typically implements a
>> protocol parser. Where would be a good place to start reading example
>> code?
>
>
> I'm not exactly sure what you're looking for (e.g., I'm not sure what your
> exact definition of a protocol parser is), but this might be getting close
> to what you want:
>
> * https://github.com/twisted/twisted/blob/master/twisted/mail/pop3.py
> * https://github.com/twisted/twisted/blob/master/twisted/protocols/basic.py
>
> The POP3 protocol implementation in Twisted is a pretty good example of how
> one should create a protocol. It's a subclass of the
> twisted.protocols.basic.LineOnlyReceiver, and I'm guessing when you said
> "parsing" you're wanting to look at what's in the dataReceived method of
> that class.
>
> Hopefully that's what you were after...

Yes, those are perfect. The term I used came from one of Josiah's
previous messages in this thread, but I think he really meant protocol
handler.

My current goal is to see if it would be possible to come up with an
abstraction that makes it possible to write protocol handlers that are
independent from the rest of the infrastructure (e.g. transport,
reactor). I honestly have no idea if this is a sane idea but I'm going
to look into it anyway; if it works it would be cool to be able to
reuse the same POP3 logic in different environments (e.g. synchronous
thread-based, Twisted) without having to pull in all of Twisted. I.e.
Twisted could contribute the code to the stdlib and the stdlib could
make it work with SocketServer but Twisted could still use it
(assuming Twisted ever gets ported to Py3k :-).

Ben Darnell

Oct 8, 2012, 12:44:27 AM
to Guido van Rossum, python...@python.org
On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <gu...@python.org> wrote:
> As long as it's not so low-level that other people shy away from it.

That depends on the target audience. The low-level IOLoop and Reactor
are pretty similar -- you can implement one in terms of the other --
but as you move up the stack cross-compatibility becomes harder. For
example, if I wanted to implement tornado's IOStreams in twisted, I
wouldn't start with the analogous class in twisted (Protocol?), I'd go
down to the Reactor and build from there, so putting something like
IOStream or Protocol in asyncore2 wouldn't do much to unify the two
worlds. (it would help people build async stuff with the stdlib
alone, but at that point it becomes more like a peer or competitor to
tornado and twisted instead of a bridge between them)

>
> I also have a feeling that one way or another this will require
> cooperation between the Twisted and Tornado developers in order to
> come up with a compromise that both are willing to conform to in a
> meaningful way. (Unfortunately I don't know how to define "meaningful
> way" more precisely here. I guess the idea is that almost all things
> *using* an event loop use the standardized abstract API without caring
> whether underneath it's Tornado, Twisted, or some simpler thing in the
> stdlib.)

I'd phrase the goal as being able to run both Tornado and Twisted in
the same thread without any piece of code needing to know about both
systems. I think that's achievable as far as core functionality goes.
I expect both sides have some lesser-used functionality that might
not make it into the stdlib version, but as long as it's possible to
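plug in a "real" IOLoop or Reactor when needed it should be OK.

> That's interesting. I haven't found the need for this yet. Is it
> really so common that you can't write this as a Future() constructor
> plus a call to add_done_callback()? Or is there some subtle semantic
> difference?

It's a Future constructor, a (conditional) add_done_callback, plus the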
calls to set_result or set_exception and the with statement for error
handling. In full:

def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper
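For readers without Tornado at hand, here is a simplified, self-contained variant of the wrapper above. It is only a sketch: ExceptionStackContext is swapped for a plain try/except, so it catches exceptions raised synchronously by the wrapped function but not ones raised later from other callbacks. The functions add_async and fail_async are invented for the example.

```python
import functools
from concurrent.futures import Future


def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        # The wrapped function only ever sees a callback.
        kwargs['callback'] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:  # stand-in for ExceptionStackContext
            future.set_exception(exc)
        return future
    return wrapper


@future_wrap
def add_async(a, b, callback):
    callback(a + b)


@future_wrap
def fail_async(callback):
    raise RuntimeError("no result")


seen = []
future = add_async(1, 2, callback=seen.append)
failed = fail_async()
```

Note that a caller-supplied callback goes through add_done_callback, so it is called with the Future rather than with the raw result.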



>
>> In Tornado the Future is created by a decorator
>> and hidden from the asynchronous function (it just sees the callback),
>
> Hm, interesting. NDB goes the other way, the callbacks are mostly used
> to make Futures work, and most code (including large swaths of
> internal code) uses Futures. I think NDB is similar to monocle here.
> In NDB, you can do
>
> f = <some function returning a Future>
> r = yield f
>
> where "yield f" is mostly equivalent to f.result(), except it gives
> better opportunity for concurrency.

Yes, tornado's gen.engine does the same thing here. However, the
stakes are higher than "better opportunity for concurrency" - in an
event loop if you call future.result() without yielding, you'll
deadlock if that Future's task needs to run on the same event loop.

>
>> although this relies on some Tornado-specific magic for exception
>> handling. In a coroutine, the decorator recognizes Futures and
>> resumes execution when the future is done. With these decorators
>> asynchronous code looks almost like synchronous code, except for the
>> "yield" keyword before each asynchronous call.
>
> Yes! Same here.
>
> I am currently trying to understand if using "yield from" (and
> returning a value from a generator) will simplify things. For example
> maybe the need for a special decorator might go away. But I keep
> getting headaches -- perhaps there's a Monad involved. :-)

I think if you build generator handling directly into the event loop
and use "yield from" for calls from one async function to another then
you can get by without any decorators. But I'm not sure if you can do
that and maintain any compatibility with existing non-generator async
code.

I think the ability to return from a generator is actually a bigger
deal than "yield from" (and I only learned about it from another
python-ideas thread today). The only reason a generator decorated
with @tornado.gen.engine needs a callback passed in to it is to act as
a pseudo-return, and a real return would prevent the common mistake of
running the callback then falling through to the rest of the function.

For concreteness, here's a crude sketch of what the APIs I'm talking
about would look like in use (in a hypothetical future version of
tornado).

@future_wrap
@gen.engine
def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
    addrinfo = yield g_thread_pool.submit(socket.getaddrinfo,
        parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect((addrinfo[0][-1]))
    stream.write('GET %s HTTP/1.0' % parsed_url.path)
    header_data = yield stream.read_until('\r\n\r\n')
    headers = parse_headers(header_data)
    body_data = yield stream.read_bytes(int(headers['Content-Length']))
    stream.close()
    callback(body_data)

# another function to demonstrate composability
@future_wrap
@gen.engine
def fetch_some_urls(url1, url2, url3, callback):
    body1 = yield async_http_client(url1)
    # yield a list of futures for concurrency
    future2 = yield async_http_client(url2)
    future3 = yield async_http_client(url3)
    body2, body3 = yield [future2, future3]
    callback((body1, body2, body3))

One hole in this design is how to deal with callbacks that are run
multiple times. For example, the IOStream read methods take both a
regular callback and an optional streaming_callback (which is called
with each chunk of data as it arrives). I think this needs to be
modeled as something like an iterator of Futures, but I haven't worked
out the details yet.

-Ben

Floris Bruynooghe

Oct 8, 2012, 7:10:05 AM
to python...@python.org
On 8 October 2012 03:49, Guido van Rossum <gu...@python.org> wrote:
> My current goal is to see if it would be possible to come up with an
> abstraction that makes it possible to write protocol handlers that are
> independent from the rest of the infrastructure (e.g. transport,
> reactor).

This would be my ideal situation too and I think this is what PEP 3153
was trying to achieve. While I am a greenlet (eventlet) user I agree
with the sentiment that it is not ideal to include it into the stdlib
itself and instead work to a solution where we can share protocol
implementations while having the freedom to run on a twisted reactor,
tornado, something greenlet based or something in the stdlib depending
on the preference of the developer.

FWIW I have implemented the AgentX protocol based on PEP 3153, and the
PEP isn't complete yet (I had to go outside of what it defines). It is
also rather heavy handed and I'm not sure how one could migrate the
stdlib to something like this. So hopefully there are better
solutions possible.


Regards,
Floris

Christian Heimes

Oct 8, 2012, 8:39:14 AM
to Python-Ideas
Hi Ben,

Am 08.10.2012 03:41, schrieb Ben Darnell:
> This thread so far has been more about the latter, but the need for
> standardization is more acute for the core event loop. I've written a
> bridge between Tornado and Twisted so libraries written for both event
> loops can coexist, but obviously that wouldn't scale if there were a
> proliferation of event loop implementations out there. I'd be in
> favor of a simple event loop interface in the standard library, with
> reference implementation(s) (select, epoll, kqueue, iocp) and some
> means of configuring the global (or thread-local) singleton.
[...]

Python's standard library doesn't contain an interface to I/O Completion
Ports. I think a common event loop system is a good reason to add IOCP
if somebody is up for the challenge.

Would you prefer an IOCP wrapper in the stdlib or your own version?
Twisted has its own Cython based wrapper, some other libraries use a
libevent-based solution.

Christian

Joachim König

Oct 8, 2012, 9:34:52 AM
to python...@python.org
On 08/10/2012 03:41 Ben Darnell wrote:
> As for the higher-level question of what asynchronous code should look
> like, there's a lot more room for spirited debate, and I don't think
> there's enough consensus to declare a One True Way. Personally, I'm
> -1 on greenlets as a general solution (what if you have to call
> MySQLdb or getaddrinfo?)

The caller of such a potentially blocking function could:

* spawn a new thread for the call
* call the function inside the thread and collect return value or exception
* register the thread (id) to inform the event loop (scheduler) it's
waiting for its completion
* yield (aka "switch" in greenlet) to the event loop / scheduler
* upon continuation either continue with the result or reraise the
exception that happened in the thread

Unfortunately on Unix systems select/poll/kqueue cannot specify threads as
event resources, so an additional pipe descriptor would be needed for the
scheduler to detect thread completions without blocking (threads would
write to the pipe upon completion), not elegant but doable.
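The steps above can be sketched roughly as follows. This is an illustration only: call_in_thread and slow_double are made-up names, a socket pair stands in for the pipe so the sketch is not tied to Unix, and a single select() call plays the role of the scheduler.

```python
import select
import socket
import threading
import time


def call_in_thread(func, *args):
    """Run func in a thread; return (outcome dict, socket readable when done)."""
    wake_r, wake_w = socket.socketpair()
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:
            outcome["error"] = exc
        finally:
            wake_w.send(b"x")  # wake up the event loop
            wake_w.close()

    threading.Thread(target=worker, daemon=True).start()
    return outcome, wake_r


def slow_double(x):
    time.sleep(0.05)  # stands in for a blocking call such as getaddrinfo
    return x * 2


outcome, wake_r = call_in_thread(slow_double, 21)
# A real loop would multiplex wake_r together with its other descriptors.
ready, _, _ = select.select([wake_r], [], [], 5)
if ready:
    wake_r.recv(1)
wake_r.close()
if "error" in outcome:
    raise outcome["error"]  # re-raise what happened in the thread
result = outcome.get("value")
```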

Joachim

Guido van Rossum

Oct 8, 2012, 11:30:12 AM
to Ben Darnell, python...@python.org
On Sun, Oct 7, 2012 at 9:44 PM, Ben Darnell <b...@bendarnell.com> wrote:
> On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <gu...@python.org> wrote:
>> As long as it's not so low-level that other people shy away from it.
>
> That depends on the target audience. The low-level IOLoop and Reactor
> are pretty similar -- you can implement one in terms of the other --
> but as you move up the stack cross-compatibility becomes harder. For
> example, if I wanted to implement tornado's IOStreams in twisted, I
> wouldn't start with the analogous class in twisted (Protocol?), I'd go
> down to the Reactor and build from there, so putting something
> IOStream or Protocol in asyncore2 wouldn't do much to unify the two
> worlds. (it would help people build async stuff with the stdlib
> alone, but at that point it becomes more like a peer or competitor to
> tornado and twisted instead of a bridge between them)

Sure. And of course we can't expect Twisted and Tornado to just merge
projects. They each have different strengths and weaknesses and they
each have strong opinions on how things should be done. I do get your
point that none of that is incompatible with a shared reactor
specification.

>> I also have a feeling that one way or another this will require
>> cooperation between the Twisted and Tornado developers in order to
>> come up with a compromise that both are willing to conform to in a
>> meaningful way. (Unfortunately I don't know how to define "meaningful
>> way" more precisely here. I guess the idea is that almost all things
>> *using* an event loop use the standardized abstract API without caring
>> whether underneath it's Tornado, Twisted, or some simpler thing in the
>> stdlib.)
>
> I'd phrase the goal as being able to run both Tornado and Twisted in
> the same thread without any piece of code needing to know about both
> systems. I think that's achievable as far as core functionality goes.
> I expect both sides have some lesser-used functionality that might
> not make it into the stdlib version, but as long as it's possible to
> plug in a "real" IOLoop or Reactor when needed it should be OK.

Sounds good. I think a reactor is always going to be an extension of
the shared spec.

[...]
>> That's interesting. I haven't found the need for this yet. Is it
>> really so common that you can't write this as a Future() constructor
>> plus a call to add_done_callback()? Or is there some subtle semantic
>> difference?
>
> It's a Future constructor, a (conditional) add_done_callback, plus the
> calls to set_result or set_exception and the with statement for error
> handling. In full:
>
> def future_wrap(f):
>     @functools.wraps(f)
>     def wrapper(*args, **kwargs):
>         future = Future()
>         if kwargs.get('callback') is not None:
>             future.add_done_callback(kwargs.pop('callback'))
>         kwargs['callback'] = future.set_result
>         def handle_error(typ, value, tb):
>             future.set_exception(value)
>             return True
>         with ExceptionStackContext(handle_error):
>             f(*args, **kwargs)
>         return future
>     return wrapper

Hmm... I *think* it automatically adds a special keyword 'callback' to
the *call* site so that you can do things like

fut = some_wrapped_func(blah, callback=my_callback)

and then instead of using yield to wait for the callback, put the
continuation of your code in the my_callback() function. But it also
seems like it passes callback=future.set_result as the callback to the
wrapped function, which looks to me like that function was apparently
written before Futures were widely used. This seems pretty impure to
me and I'd like to propose a "future" where such functions would either
be given the Future where the result is expected, or (more commonly) the
function would create the Future itself.

Unless I'm totally missing the programming model here.

PS. I'd like to learn more about ExceptionStackContext() -- I've
struggled somewhat with getting decent tracebacks in NDB.

>>> In Tornado the Future is created by a decorator
>>> and hidden from the asynchronous function (it just sees the callback),
>>
>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>> to make Futures work, and most code (including large swaths of
>> internal code) uses Futures. I think NDB is similar to monocle here.
>> In NDB, you can do
>>
>> f = <some function returning a Future>
>> r = yield f
>>
>> where "yield f" is mostly equivalent to f.result(), except it gives
>> better opportunity for concurrency.
>
> Yes, tornado's gen.engine does the same thing here. However, the
> stakes are higher than "better opportunity for concurrency" - in an
> event loop if you call future.result() without yielding, you'll
> deadlock if that Future's task needs to run on the same event loop.

That would depend on the semantics of the event loop implementation.
In NDB's event loop, such a .result() call would just recursively
enter the event loop, and you'd only deadlock if you actually have two
pieces of code waiting for each other's completion.

[...]
>> I am currently trying to understand if using "yield from" (and
>> returning a value from a generator) will simplify things. For example
>> maybe the need for a special decorator might go away. But I keep
>> getting headaches -- perhaps there's a Monad involved. :-)
>
> I think if you build generator handling directly into the event loop
> and use "yield from" for calls from one async function to another then
> you can get by without any decorators. But I'm not sure if you can do
> that and maintain any compatibility with existing non-generator async
> code.
>
> I think the ability to return from a generator is actually a bigger
> deal than "yield from" (and I only learned about it from another
> python-ideas thread today). The only reason a generator decorated
> with @tornado.gen.engine needs a callback passed in to it is to act as
> a pseudo-return, and a real return would prevent the common mistake of
> running the callback then falling through to the rest of the function.

Ah, so you didn't come up with the clever hack of raising an exception
to signify the return value. In NDB, you raise StopIteration (though
it is given the alias 'Return' for clarity) with an argument, and the
wrapper code that is responsible for the Future takes the value from
the StopIteration exception and passes it to the Future's
set_result().
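A rough sketch of that exception-as-return-value pattern, not NDB's actual code. Two assumptions are worth flagging: Return is a custom Exception subclass here rather than an alias for StopIteration, because Python 3.7 and later turn a StopIteration raised inside a generator into RuntimeError (PEP 479); and the driver run_to_future only copes with Futures that are already complete.

```python
from concurrent.futures import Future


class Return(Exception):
    """Hypothetical stand-in for the 'Return' alias described above."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def run_to_future(gen):
    """Drive a generator whose yielded Futures are already complete."""
    future = Future()
    send_value = None
    try:
        while True:
            yielded = gen.send(send_value)
            send_value = yielded.result()
    except Return as ret:
        future.set_result(ret.value)  # the "return value" of the generator
    except StopIteration as stop:
        future.set_result(stop.value)  # plain return also works on 3.3+
    except Exception as exc:
        future.set_exception(exc)
    return future


def completed(value):
    f = Future()
    f.set_result(value)
    return f


def add_one(x):
    y = yield completed(x)
    raise Return(y + 1)


answer = run_to_future(add_one(41)).result()
```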

> For concreteness, here's a crude sketch of what the APIs I'm talking
> about would look like in use (in a hypothetical future version of
> tornado).
>
> @future_wrap
> @gen.engine
> def async_http_client(url, callback):
>     parsed_url = urlparse.urlsplit(url)
>     # works the same whether the future comes from a thread pool or @future_wrap

And you need the thread pool because there's no async version of
getaddrinfo(), right?

>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>     stream = IOStream(socket.socket())
>     yield stream.connect((addrinfo[0][-1]))
>     stream.write('GET %s HTTP/1.0' % parsed_url.path)

Why no yield in front of the write() call?

>     header_data = yield stream.read_until('\r\n\r\n')
>     headers = parse_headers(header_data)
>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>     stream.close()
>     callback(body_data)
>
> # another function to demonstrate composability
> @future_wrap
> @gen.engine
> def fetch_some_urls(url1, url2, url3, callback):
>     body1 = yield async_http_client(url1)
>     # yield a list of futures for concurrency
>     future2 = yield async_http_client(url2)
>     future3 = yield async_http_client(url3)
>     body2, body3 = yield [future2, future3]
>     callback((body1, body2, body3))

This second one is nearly identical to the way it's done in NDB.
However I think you have a typo -- I doubt that there should be yields
on the lines creating future2 and future3.
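To spell out the pattern being corrected here: create the futures without yielding, then yield them together. The sketch below is neither NDB nor Tornado; it uses a thread pool for the futures and a deliberately naive blocking driver, and the names drive, fake_fetch and fetch_some_urls are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def drive(gen):
    """Naive blocking driver: accepts a Future or a list of Futures."""
    send_value = None
    try:
        while True:
            yielded = gen.send(send_value)
            if isinstance(yielded, list):
                send_value = [f.result() for f in yielded]
            else:
                send_value = yielded.result()
    except StopIteration as stop:
        return stop.value


def fake_fetch(url):
    return "body of " + url


def fetch_some_urls(pool, url1, url2, url3):
    body1 = yield pool.submit(fake_fetch, url1)
    # No yield on these two lines: they only start the work.
    future2 = pool.submit(fake_fetch, url2)
    future3 = pool.submit(fake_fetch, url3)
    # Both requests are in flight before we wait for either.
    body2, body3 = yield [future2, future3]
    return (body1, body2, body3)


with ThreadPoolExecutor(max_workers=2) as pool:
    bodies = drive(fetch_some_urls(pool, "a", "b", "c"))
```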

> One hole in this design is how to deal with callbacks that are run
> multiple times. For example, the IOStream read methods take both a
> regular callback and an optional streaming_callback (which is called
> with each chunk of data as it arrives). I think this needs to be
> modeled as something like an iterator of Futures, but I haven't worked
> out the details yet.

Ah. Yes, that's a completely different kind of thing, and probably
needs to be handled in a totally different way. I think it probably
needs to be modeled more like an infinite loop where at the blocking
point (e.g. a low-level read() or accept() call) you yield a Future.
Although I can see that this doesn't work well with the IOLoop's
concept of file descriptor (or other event source) registration.

Guido van Rossum

Oct 8, 2012, 11:34:29 AM
to Floris Bruynooghe, python...@python.org
On Mon, Oct 8, 2012 at 4:10 AM, Floris Bruynooghe <fl...@devork.be> wrote:
> On 8 October 2012 03:49, Guido van Rossum <gu...@python.org> wrote:
>> My current goal is to see if it would be possible to come up with an
>> abstraction that makes it possible to write protocol handlers that are
>> independent from the rest of the infrastructure (e.g. transport,
>> reactor).
>
> This would be my ideal situation too and I think this is what PEP 3153
> was trying to achieve. While I am a greenlet (eventlet) user I agree
> with the sentiment that it is not ideal to include it into the stdlib
> itself and instead work to a solution where we can share protocol
> implementations while having the freedom to run on a twisted reactor,
> tornado, something greenlet based or something in the stdlib depending
> on the preference of the developer.
>
> FWIW I have implemented the AgentX protocol based on PEP-3153 and it
> isn't complete yet (I had to go outside of what it defines). It is
> also rather heavy handed and I'm not sure how one could migrate the
> stdlib to something like this. So hopefully there are better
> solutions possible.

The more I think about this the more I think it will be really hard to
accomplish. I think we ought to try and go for goals that are easier
to obtain (and still useful) first, such as a common reactor/ioloop
specification and a "best practice" implementation (which may choose a
different polling mechanism depending on the platform OS) in the
stdlib. 3rd party code could then hook into this mechanism and offer
alternate reactors, e.g. integrated with a 3rd party GUI library such
as Wx, Gtk, Qt -- maybe we can offer Tk integration in the stdlib. 3rd
party reactors could also offer additional functionality, e.g.
advanced scheduling, threadpool integration, or whatever (my
imagination isn't very good here).
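For readers on a later Python: the stdlib has since gained a selectors module (Python 3.4) whose DefaultSelector picks the most efficient polling mechanism available on the platform, which is close to the "best practice implementation" idea above. A small sketch follows; which selector class is chosen depends on the platform, so nothing here assumes a particular one.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
mechanism = type(sel).__name__  # varies by platform, e.g. an epoll or kqueue selector

left, right = socket.socketpair()
sel.register(right, selectors.EVENT_READ, data="right side")
left.sendall(b"x")

# select() returns (key, event mask) pairs for the ready objects.
events = sel.select(timeout=5)
labels = [key.data for key, mask in events]

sel.unregister(right)
sel.close()
left.close()
right.close()
```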

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 11:35:08 AM
to Christian Heimes, Python-Ideas
On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
> Python's standard library doesn't contain an interface to I/O Completion
> Ports. I think a common event loop system is a good reason to add IOCP
> if somebody is up for the challenge.
>
> Would you prefer an IOCP wrapper in the stdlib or your own version?
> Twisted has its own Cython based wrapper, some other libraries use a
> libevent-based solution.

What's an IOCP?

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 11:37:28 AM
to Joachim König, python...@python.org
On Mon, Oct 8, 2012 at 6:34 AM, Joachim König <h...@online.de> wrote:
> On 08/10/2012 03:41 Ben Darnell wrote:
>>
>> As for the higher-level question of what asynchronous code should look
>> like, there's a lot more room for spirited debate, and I don't think
>> there's enough consensus to declare a One True Way. Personally, I'm
>> -1 on greenlets as a general solution (what if you have to call
>> MySQLdb or getaddrinfo?)
>
>
> The caller of such a potentially blocking function could:
>
> * spawn a new thread for the call
> * call the function inside the thread and collect return value or exception
> * register the thread (id) to inform the event loop (scheduler) it's waiting for its completion
> * yield (aka "switch" in greenlet) to the event loop / scheduler
> * upon continuation either continue with the result or reraise the exception that happened in the thread

Ben just posted an example of how to do exactly that for getaddrinfo().

> Unfortunately on Unix systems select/poll/kqueue cannot specify threads as
> event resources, so an additional pipe descriptor would be needed for the scheduler
> to detect thread completions without blocking (threads would write to the pipe upon
> completion), not elegant but doable.

However it must be done, this seems a useful thing to solve once and
for all in a standard reactor specification and stdlib implementation.
(Ditto for signal handlers BTW.)

--
--Guido van Rossum (python.org/~guido)

Barry Warsaw

Oct 8, 2012, 12:45:34 PM
to python...@python.org
On Oct 06, 2012, at 03:00 PM, Guido van Rossum wrote:

>This is an incredibly important discussion.

Indeed. If Python gets it right, it could be yet another killer reason for
upgrading to Python 3, at least for the growing subset of event-driven
applications.

>(1) How important is it to offer a compatibility path for asyncore?

I've written and continue to use async-based code. I don't personally care
much about compatibility. I've used async because it was the simplest and most
stdlibby of the options for the Python versions I can use, but I have no love
for it. If there were a better, more readable and comprehensible way to do
it, I'd ditch the async-based versions as soon as possible.

>I would have thought that offering an integration path forward for Twisted
>and Tornado would be more important.

Agreed. I share the same dream as someone else in this thread mentioned. It
would be really fantastic if the experts in a particular protocol could write
support for that protocol Just Once and have it as widely shared as possible.
Maybe this is an unrealistic dream, but now's the time to have them anyway.

Even something like the email package could benefit from this. The FeedParser
is our attempt to support asynchronous reading of email data for parsing. I'm
not so sure that the asynchronous part of that is very useful.

-Barry

Mike Graham

Oct 8, 2012, 1:04:00 PM
to Guido van Rossum, Python-Ideas
On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
>> Python's standard library doesn't contain an interface to I/O Completion
>> Ports. I think a common event loop system is a good reason to add IOCP
>> if somebody is up for the challenge.
>>
>> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> Twisted has its own Cython based wrapper, some other libraries use a
>> libevent-based solution.
>
> What's an IOCP?

It's the non-crappy select equivalent on Windows.

Mike

Paul Moore

Oct 8, 2012, 1:07:25 PM
to mikeg...@gmail.com, Python-Ideas
On 8 October 2012 18:04, Mike Graham <mikeg...@gmail.com> wrote:
>> What's an IOCP?
>
> It's the non-crappy select equivalent on Windows.

I/O Completion port, just for clarity :-)
Paul.

Antoine Pitrou

Oct 8, 2012, 2:36:37 PM
to python...@python.org
On Mon, 8 Oct 2012 13:04:00 -0400
Mike Graham <mikeg...@gmail.com> wrote:
> On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <gu...@python.org> wrote:
> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
> >> Python's standard library doesn't contain an interface to I/O Completion
> >> Ports. I think a common event loop system is a good reason to add IOCP
> >> if somebody is up for the challenge.
> >>
> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
> >> Twisted has its own Cython based wrapper, some other libraries use a
> >> libevent-based solution.
> >
> > What's an IOCP?
>
> It's the non-crappy select equivalent on Windows.

Except that it's not exactly an equivalent, it's a whole different
programming model ;)

(but I understand what you mean: it lets you do non-blocking I/O on an
arbitrary number of objects in parallel)
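
To make the "different programming model" point concrete, here is a minimal sketch of the readiness model that select() stands for. It uses only the stdlib select and socket modules; the socketpair is just a stand-in for a real connection. With a completion model such as IOCP you would instead hand the OS a buffer up front and be told when the read has *finished*, which is why the two don't map onto each other one-to-one.

```python
import select
import socket

# Readiness model: ask the OS which sockets are readable, then do the read yourself.
a, b = socket.socketpair()
b.sendall(b"ping")

# Blocks until `a` is readable or one second has passed.
readable, _, _ = select.select([a], [], [], 1.0)

# The read itself only happens after the readiness notification.
data = readable[0].recv(4) if readable else b""

a.close()
b.close()
```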

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Oct 8, 2012, 2:40:28 PM
to Antoine Pitrou, python...@python.org
On Mon, Oct 8, 2012 at 11:36 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Mon, 8 Oct 2012 13:04:00 -0400
> Mike Graham <mikeg...@gmail.com> wrote:
>> On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <gu...@python.org> wrote:
>> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
>> >> Python's standard library doesn't contain an interface to I/O Completion
>> >> Ports. I think a common event loop system is a good reason to add IOCP
>> >> if somebody is up for the challenge.
>> >>
>> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> >> Twisted has its own Cython based wrapper, some other libraries use a
>> >> libevent-based solution.
>> >
>> > What's an IOCP?
>>
>> It's the non-crappy select equivalent on Windows.
>
> Except that it's not exactly an equivalent, it's a whole different
> programming model ;)
>
> (but I understand what you mean: it lets you do non-blocking I/O on an
> arbitrary number of objects in parallel)

Now that I know what it is, I think that (a) the abstract reactor design
should support IOCP, and (b) the stdlib should enable IOCP by default
when on Windows.

--
--Guido van Rossum (python.org/~guido)

Mark Adam

Oct 8, 2012, 3:20:57 PM
to Guido van Rossum, python...@python.org
On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum <gu...@python.org> wrote:
> On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <b...@bendarnell.com> wrote:
>> I think there are
>> actually two separate issues here and it's important to keep them
>> distinct: at a low level, there is a need for a standardized event
>> loop, while at a higher level there is a question of what asynchronous
>> code should look like.
>
> Yes, yes. I tried to bring up this distinction. I'm glad I didn't
> completely fail.

Perhaps this is obvious to others, but (as hinted at above) there
seem to be two primary issues with event handlers:

1) event handlers for the machine-program interface (ex. network I/O)
2) event handlers for the program-user interface (ex. mouse I/O)

While similar, my gut tells me they have to be handled in completely
different ways in order to preserve order (i.e. sanity).

This issue, for me, has come up with wanting to make a p2p network
application with VPython.

MarkJ

Guido van Rossum

Oct 8, 2012, 4:00:45 PM
to Mark Adam, python...@python.org
On Mon, Oct 8, 2012 at 12:20 PM, Mark Adam <dreamin...@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum <gu...@python.org> wrote:
>> On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <b...@bendarnell.com> wrote:
>>> I think there are
>>> actually two separate issues here and it's important to keep them
>>> distinct: at a low level, there is a need for a standardized event
>>> loop, while at a higher level there is a question of what asynchronous
>>> code should look like.
>>
>> Yes, yes. I tried to bring up this distinction. I'm glad I didn't
>> completely fail.
>
> Perhaps this is obvious to others, but (as hinted at above) there
> seem to be two primary issues with event handlers:
>
> 1) event handlers for the machine-program interface (ex. network I/O)
> 2) event handlers for the program-user interface (ex. mouse I/O)
>
> While similar, my gut tells me they have to be handled in completely
> different ways in order to preserve order (i.e. sanity).
>
> This issue, for me, has come up with wanting to make a p2p network
> application with VPython.

Interesting. I agree that these are different in nature, but I think
it would still be useful to have a single event loop ("reactor") that
can multiplex them together. I think where the paths diverge is when
it comes to the signature of the callback; for GUI events there is
a certain standard structure that must be passed to the callback and
which isn't readily available when you *specify* the callback. OTOH
for your typical socket event the callback can just call the
appropriate method on the socket once it knows the socket is ready.

But still, in many cases I would like to see these all serialized in
the same thread and multiplexed according to some kind of assigned or
implied priorities, and IIRC, GUI events often are "collapsed" (e.g.
multiple redraw events for the same window, or multiple mouse motion
events).

I also imagine the typical GUI event loop has hooks for integrating
file descriptor polling, or perhaps it gives you a file descriptor to
add to your select/poll/etc. map.
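
A rough sketch of what I have in mind, under heavy assumptions: TinyReactor and its method names are made up for illustration and are not any toolkit's API. Socket callbacks get just the socket, GUI handlers get a (kind, target, payload) structure, and repeated GUI events of the same kind for the same target collapse into the latest one. Running GUI events before I/O is one arbitrary choice of "implied priority".

```python
import select
import socket


class TinyReactor:
    """Hypothetical single-threaded loop mixing socket readiness and GUI-style events."""

    def __init__(self):
        self.readers = {}      # socket -> callback(sock)
        self.gui_events = {}   # (kind, target) -> (latest payload, handler)

    def add_reader(self, sock, callback):
        self.readers[sock] = callback

    def post_gui_event(self, kind, target, payload, handler):
        # A later event of the same kind for the same target replaces the earlier one.
        self.gui_events[(kind, target)] = (payload, handler)

    def run_once(self, timeout=0.1):
        # GUI events first, then whatever I/O is ready.
        pending, self.gui_events = self.gui_events, {}
        for (kind, target), (payload, handler) in pending.items():
            handler(kind, target, payload)
        if self.readers:
            ready, _, _ = select.select(list(self.readers), [], [], timeout)
            for sock in ready:
                self.readers[sock](sock)


log = []
reactor = TinyReactor()
a, b = socket.socketpair()
reactor.add_reader(a, lambda s: log.append(("net", s.recv(16))))
b.sendall(b"hello")
for x in (1, 2, 3):  # three motion events collapse into one
    reactor.post_gui_event("motion", "win1", x,
                           lambda k, t, p: log.append((k, t, p)))
reactor.run_once()
a.close()
b.close()
```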

Also, doesn't the Windows IOCP unify the two?

--
--Guido van Rossum (python.org/~guido)

Christian Heimes

Oct 8, 2012, 8:13:03 PM
to Guido van Rossum, Python-Ideas
Am 08.10.2012 17:35, schrieb Guido van Rossum:
> On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
>> Python's standard library doesn't contain an interface to I/O Completion
>> Ports. I think a common event loop system is a good reason to add IOCP
>> if somebody is up for the challenge.
>>
>> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> Twisted has its own Cython based wrapper, some other libraries use a
>> libevent-based solution.
>
> What's an IOCP?

I/O Completion Ports, http://en.wikipedia.org/wiki/IOCP

It's a Windows (and apparently also Solaris) API for async IO that can
handle multiple threads.

Christian

Ben Darnell

Oct 9, 2012, 1:12:51 AM
to Guido van Rossum, python...@python.org
On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <gu...@python.org> wrote:
>> It's a Future constructor, a (conditional) add_done_callback, plus the
>> calls to set_result or set_exception and the with statement for error
>> handling. In full:
>>
>> def future_wrap(f):
>>     @functools.wraps(f)
>>     def wrapper(*args, **kwargs):
>>         future = Future()
>>         if kwargs.get('callback') is not None:
>>             future.add_done_callback(kwargs.pop('callback'))
>>         kwargs['callback'] = future.set_result
>>         def handle_error(typ, value, tb):
>>             future.set_exception(value)
>>             return True
>>         with ExceptionStackContext(handle_error):
>>             f(*args, **kwargs)
>>         return future
>>     return wrapper
>
> Hmm... I *think* it automatically adds a special keyword 'callback' to
> the *call* site so that you can do things like
>
> fut = some_wrapped_func(blah, callback=my_callback)
>
> and then instead of using yield to wait for the callback, put the
> continuation of your code in the my_callback() function.

Yes. Note that if you're passing in a callback you're probably going
to just ignore the return value. The callback argument and the future
return value are essentially two alternative interfaces; it probably
doesn't make sense to use both at once (but as a library author it's
useful to provide both).
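
To show the two interfaces side by side, here is a self-contained sketch of the same decorator. It is an approximation, not Tornado code: it uses concurrent.futures.Future (which the docs say is normally created by an executor, but it is fine for a demo), and it replaces ExceptionStackContext with a plain try/except, so only errors raised synchronously are captured. add_async is a made-up example function.

```python
import functools
from concurrent.futures import Future


def future_wrap(f):
    """Let a callback-style function be called Future-style or callback-style."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        user_callback = kwargs.pop('callback', None)
        if user_callback is not None:
            # Note: a done-callback receives the Future, not the bare result.
            future.add_done_callback(user_callback)
        kwargs['callback'] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:   # stand-in for ExceptionStackContext
            future.set_exception(exc)
        return future
    return wrapper


@future_wrap
def add_async(x, y, callback):
    # Hypothetical callback-style function; a real one would finish later.
    callback(x + y)


fut = add_async(1, 2)              # Future-style call
seen = []
add_async(3, 4, callback=lambda f: seen.append(f.result()))  # callback-style call
```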

> But it also
> seems like it passes callback=future.set_result as the callback to the
> wrapped function, which looks to me like that function was apparently
> written before Futures were widely used. This seems pretty impure to
> me and I'd like to propose a "future" where such functions either be
> given the Future where the result is expected, or (more commonly) the
> function would create the Future itself.

Yes, it's impure and based on pre-Future patterns. The caller's
callback argument and the inner function's callback are not really related
any more (they were the same in pre-Future async code of course).
They should probably have different names, although if the inner
function's return value were passed via exception (StopIteration or
return) the inner callback argument can just go away.

>
> Unless I'm totally missing the programming model here.
>
> PS. I'd like to learn more about ExceptionStackContext() -- I've
> struggled somewhat with getting decent tracebacks in NDB.

StackContext doesn't quite give you better tracebacks, although I
think it could be adapted to do that. ExceptionStackContext is
essentially a try/except block that follows you around across
asynchronous operations - on entry it sets a thread-local state, and
all the tornado asynchronous functions know to save this state when
they are passed a callback, and restore it when they execute it. This
has proven to be extremely helpful in ensuring that all exceptions get
caught by something that knows how to do the appropriate cleanup (i.e.
an asynchronous web page serves an error instead of just spinning
forever), although it has turned out to be a little more intrusive and
magical than I had originally anticipated.

https://github.com/facebook/tornado/blob/master/tornado/stack_context.py

>
>>>> In Tornado the Future is created by a decorator
>>>> and hidden from the asynchronous function (it just sees the callback),
>>>
>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>> to make Futures work, and most code (including large swaths of
>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>> In NDB, you can do
>>>
>>> f = <some function returning a Future>
>>> r = yield f
>>>
>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>> better opportunity for concurrency.
>>
>> Yes, tornado's gen.engine does the same thing here. However, the
>> stakes are higher than "better opportunity for concurrency" - in an
>> event loop if you call future.result() without yielding, you'll
>> deadlock if that Future's task needs to run on the same event loop.
>
> That would depend on the semantics of the event loop implementation.
> In NDB's event loop, such a .result() call would just recursively
> enter the event loop, and you'd only deadlock if you actually have two
> pieces of code waiting for each other's completion.

Hmm, I think I'd rather deadlock. :) If the event loop is reentrant
then the application code has to be coded defensively as if it were
preemptively multithreaded, which introduces the possibility of
deadlock or (probably) more subtle/less frequent errors. Reentrancy
has been a significant problem in my experience, so I've been moving
towards a policy where methods in Tornado that take a callback never
run it immediately; callbacks are always scheduled on the next
iteration of the IOLoop with IOLoop.add_callback.
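
As an illustration of that policy (MiniLoop and fetch_cached are hypothetical stand-ins, not Tornado classes): even when the result is already available, the callback is queued rather than invoked, so the caller always finishes its own work before the callback runs.

```python
from collections import deque


class MiniLoop:
    """Hypothetical stand-in for an IOLoop: callbacks only ever run from run()."""

    def __init__(self):
        self._callbacks = deque()

    def add_callback(self, fn, *args):
        self._callbacks.append((fn, args))

    def run(self):
        while self._callbacks:
            fn, args = self._callbacks.popleft()
            fn(*args)


loop = MiniLoop()
order = []


def fetch_cached(key, callback):
    # The value is already known, but the callback is still deferred,
    # so the caller never sees it run re-entrantly.
    loop.add_callback(callback, "value-for-" + key)


fetch_cached("a", order.append)
order.append("caller finished")
loop.run()
```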

>
> [...]
>>> I am currently trying to understand if using "yield from" (and
>>> returning a value from a generator) will simplify things. For example
>>> maybe the need for a special decorator might go away. But I keep
>>> getting headaches -- perhaps there's a Monad involved. :-)
>>
>> I think if you build generator handling directly into the event loop
>> and use "yield from" for calls from one async function to another then
>> you can get by without any decorators. But I'm not sure if you can do
>> that and maintain any compatibility with existing non-generator async
>> code.
>>
>> I think the ability to return from a generator is actually a bigger
>> deal than "yield from" (and I only learned about it from another
>> python-ideas thread today). The only reason a generator decorated
>> with @tornado.gen.engine needs a callback passed in to it is to act as
>> a pseudo-return, and a real return would prevent the common mistake of
>> running the callback then falling through to the rest of the function.
>
> Ah, so you didn't come up with the clever hack of raising an exception
> to signify the return value. In NDB, you raise StopIteration (though
> it is given the alias 'Return' for clarity) with an argument, and the
> wrapper code that is responsible for the Future takes the value from
> the StopIteration exception and passes it to the Future's
> set_result().

I think I may have thought about "raise Return(x)" and dismissed it as
too weird. But then, I'm abnormally comfortable with asynchronous
code that passes callbacks around.
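
For reference, the pattern looks roughly like the sketch below. Two assumptions to flag: Return is a separate exception class here rather than an alias of StopIteration, because on current Pythons (PEP 479) a StopIteration raised inside a generator is turned into RuntimeError; and the driver is deliberately naive, requiring every yielded Future to be complete already.

```python
from concurrent.futures import Future


class Return(Exception):
    """Carries a coroutine's result out of the generator."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def run_to_future(gen):
    """Drive `gen` synchronously; each yielded Future must already be done."""
    result_future = Future()
    try:
        value = None
        while True:
            yielded = gen.send(value)
            value = yielded.result()
    except Return as r:
        result_future.set_result(r.value)   # the "return value"
    except StopIteration:
        result_future.set_result(None)      # generator fell off the end
    except Exception as exc:
        result_future.set_exception(exc)
    return result_future


def done(value):
    """Helper: a Future that is already resolved."""
    f = Future()
    f.set_result(value)
    return f


def add_one(x):
    y = yield done(x)
    raise Return(y + 1)


fut = run_to_future(add_one(41))
```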

>
>> For concreteness, here's a crude sketch of what the APIs I'm talking
>> about would look like in use (in a hypothetical future version of
>> tornado).
>>
>> @future_wrap
>> @gen.engine
>> def async_http_client(url, callback):
>>     parsed_url = urlparse.urlsplit(url)
>>     # works the same whether the future comes from a thread pool or @future_wrap
>
> And you need the thread pool because there's no async version of
> getaddrinfo(), right?

Right.

>
>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>     stream = IOStream(socket.socket())
>>     yield stream.connect((addrinfo[0][-1]))
>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>
> Why no yield in front of the write() call?

Because we don't need to wait for the write to complete before we
continue to the next statement. write() doesn't return anything; it
just succeeds or fails, and if it fails the next read_until will fail
too. (although in this case it wouldn't hurt to have the yield either)

>
>>     header_data = yield stream.read_until('\r\n\r\n')
>>     headers = parse_headers(header_data)
>>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>>     stream.close()
>>     callback(body_data)
>>
>> # another function to demonstrate composability
>> @future_wrap
>> @gen.engine
>> def fetch_some_urls(url1, url2, url3, callback):
>>     body1 = yield async_http_client(url1)
>>     # yield a list of futures for concurrency
>>     future2 = yield async_http_client(url2)
>>     future3 = yield async_http_client(url3)
>>     body2, body3 = yield [future2, future3]
>>     callback((body1, body2, body3))
>
> This second one is nearly identical to the way it's done in NDB.
> However I think you have a typo -- I doubt that there should be yields
> on the lines creating future2 and future3.

Right.

>
>> One hole in this design is how to deal with callbacks that are run
>> multiple times. For example, the IOStream read methods take both a
>> regular callback and an optional streaming_callback (which is called
>> with each chunk of data as it arrives). I think this needs to be
>> modeled as something like an iterator of Futures, but I haven't worked
>> out the details yet.
>
> Ah. Yes, that's a completely different kind of thing, and probably
> needs to be handled in a totally different way. I think it probably
> needs to be modeled more like an infinite loop where at the blocking
> point (e.g. a low-level read() or accept() call) you yield a Future.
> Although I can see that this doesn't work well with the IOLoop's
> concept of file descriptor (or other event source) registration.

It works just fine at the IOLoop level: you call
IOLoop.add_handler(fd, func, READ), and you'll get read events
whenever there's new data until you call remove_handler(fd) (or
update_handler). If you're passing callbacks around explicitly it's
pretty straightforward (as much as anything ever is in that style) to
allow for those callbacks to be run more than once. The problem is
that generators more or less require that each callback be run exactly
once. That's a generally desirable property, but the mismatch between
the two layers can be difficult to deal with.
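
One possible way to bridge the two layers, offered only as a sketch of the "iterator of Futures" idea and not as a worked-out design: the multi-shot side calls feed() as often as it likes, and the generator asks for a fresh one-shot Future per chunk. ChunkSource, reader and drive are invented names, and this toy version silently drops chunks that arrive while nobody is waiting, which a real design would have to buffer.

```python
from concurrent.futures import Future


class ChunkSource:
    """Hypothetical event source that is fed once per chunk."""

    def __init__(self):
        self._waiting = None

    def next_chunk(self):
        # Generator side: each call hands out a fresh one-shot Future.
        self._waiting = Future()
        return self._waiting

    def feed(self, chunk):
        # Callback side: may be called any number of times.
        if self._waiting is not None:
            waiting, self._waiting = self._waiting, None
            waiting.set_result(chunk)


def reader(source, out):
    while True:
        chunk = yield source.next_chunk()
        if chunk is None:          # None marks end of stream in this sketch
            return
        out.append(chunk)


def drive(gen, chunks, source):
    """Toy driver: feed one chunk, then resume the generator with it."""
    fut = next(gen)
    for chunk in chunks:
        source.feed(chunk)
        try:
            fut = gen.send(fut.result())
        except StopIteration:
            break


source = ChunkSource()
out = []
drive(reader(source, out), [b"a", b"b", None], source)
```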

-Ben

Greg Ewing

Oct 9, 2012, 1:56:15 AM
to python...@python.org
Mark Adam wrote:
> 1) event handlers for the machine-program interface (ex. network I/O)
> 2) event handlers for the program-user interface (ex. mouse I/O)
>
> While similar, my gut tells me they have to be handled in completely
> different ways in order to preserve order (i.e. sanity).

They can't be *completely* different, because deep down there
has to be a single event loop that can handle all kinds of
asynchronous events.

Upper layers can provide different APIs for them, but there
has to be some commonality in the lowest layers.

--
Greg

Ben Darnell

Oct 9, 2012, 2:53:11 AM
to Greg Ewing, python...@python.org
On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Mark Adam wrote:
>>
>> 1) event handlers for the machine-program interface (ex. network I/O)
>> 2) event handlers for the program-user interface (ex. mouse I/O)
>>
>> While similar, my gut tells me they have to be handled in completely
>> different ways in order to preserve order (i.e. sanity).
>
>
> They can't be *completely* different, because deep down there
> has to be a single event loop that can handle all kinds of
> asynchronous events.

There doesn't *have* to be - you could run a network event loop in one
thread and a GUI event loop in another and pass control back and forth
via methods like IOLoop.add_callback or Reactor.callFromThread.
However, Twisted has Reactor implementations that are integrated with
several different GUI toolkits' event loops, and while I haven't
worked with such a beast my gut instinct is that in most cases a
single shared event loop is the way to go.

-Ben

Greg Ewing

Oct 9, 2012, 5:11:43 AM
to python...@python.org
Ben Darnell wrote:

> StackContext doesn't quite give you better tracebacks, although I
> think it could be adapted to do that. ExceptionStackContext is
> essentially a try/except block that follows you around across
> asynchronous operations - on entry it sets a thread-local state, and
> all the tornado asynchronous functions know to save this state when
> they are passed a callback, and restore it when they execute it.

This is something that generator-based coroutines using
yield-from ought to handle a lot more cleanly. You should
be able to just use an ordinary try-except block in your
generator code and have it do the right thing.

I hope that the new async core will be designed so that
generator-based coroutines can be plugged into it directly
and efficiently, without the need for a lot of decorators,
callbacks, Futures, etc. in between.
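
A small sketch of what I mean, assuming Python 3.3+ for yield-from. The names read_line, handler and run are made up, and the "scheduler" just resumes the task until it finishes. The error is raised two generators down, yet an ordinary except clause in the outer one catches it, with no context objects involved.

```python
def read_line(source):
    yield                          # bare yield: "suspend until the I/O is ready"
    if not source:
        raise IOError("connection closed")
    return source.pop(0)


def handler(source, log):
    try:
        while True:
            line = yield from read_line(source)
            log.append(line)
    except IOError as exc:
        # Plain try/except sees the error raised inside read_line().
        log.append("error: %s" % exc)


def run(task):
    # Trivial scheduler: keep resuming the task until it finishes.
    for _ in task:
        pass


log = []
run(handler(["spam", "eggs"], log))
```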

--
Greg

Christian Heimes

Oct 9, 2012, 12:11:08 PM
to Python-Ideas
Am 08.10.2012 20:40, schrieb Guido van Rossum:
> Now that I know what it is, I think that (a) the abstract reactor design
> should support IOCP, and (b) the stdlib should enable IOCP by default
> when on Windows.

I've created a ticket for the topic: http://bugs.python.org/issue16175

Christian

Guido van Rossum

Oct 9, 2012, 1:05:12 PM
to Greg Ewing, python...@python.org
On Tue, Oct 9, 2012 at 2:11 AM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Ben Darnell wrote:
>
>> StackContext doesn't quite give you better tracebacks, although I
>> think it could be adapted to do that. ExceptionStackContext is
>> essentially a try/except block that follows you around across
>> asynchronous operations - on entry it sets a thread-local state, and
>> all the tornado asynchronous functions know to save this state when
>> they are passed a callback, and restore it when they execute it.

> This is something that generator-based coroutines using
> yield-from ought to handle a lot more cleanly. You should
> be able to just use an ordinary try-except block in your
> generator code and have it do the right thing.

Indeed, in NDB this works great. However tracebacks don't work so
great: If you don't catch the exception right away, it takes work to
make the tracebacks look right when you catch it a few generator calls
down on the (conceptual) stack. I fixed this to some extent in NDB, by
passing the traceback explicitly along when setting an exception on a
Future; before I did this, tracebacks looked awful. But there are
still quite a few situations in NDB where an uncaught
exception prints a baffling traceback, showing lots of frames from the
event loop and other async machinery but not the user code that was
actually waiting for anything. I have to study Tornado's StackContext to see if
there are ideas there for improving this.

> I hope that the new async core will be designed so that
> generator-based coroutines can be plugged into it directly
> and efficiently, without the need for a lot of decorators,
> callbacks, Futures, etc. in between.

That has been my hope too. But so far when thinking about this
recently I have found the goal elusive -- somehow it seems there *has*
to be a distinction between an operation you just *yield* (this would
be waiting for a specific low-level I/O operation) and something you
use with yield-from, which returns a value through StopIteration. I
keep getting a headache when I think about this, so there must be a
Monad in there somewhere... :-( Perhaps you can clear things up by
showing some detailed (but still simple enough) example code to handle
e.g. a simple web client?

--
--Guido van Rossum (python.org/~guido)

Laurens Van Houtven

Oct 9, 2012, 2:00:21 PM
to chrysn, python...@python.org
Oh my me. This is a very long thread that I probably should have replied to a long time ago. This thread is intensely long right now, and tonight is the first chance I've had to try and go through it comprehensively. I'll try to reply to individual points made in the thread -- if I missed yours, please don't be offended, I promise it's my fault :)

FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep.

First of all, I'm glad to see that there's some more "let's get that pep along" movement. I tabled it because:

a) I didn't have enough time to contribute,
b) a lot of promised contributions ended up not happening when it came down to it, which was incredibly demotivating.

What got me exploring this thing again is the combination of this thread and the fact that I was strong-armed at PyCon ZA by a bunch of community members that shall not be named (Alex, Armin, Maciej, Larry ;-)).

First of all, I don't feel async-pep is an attempt at twisted light in the stdlib. Other than separation of transport and protocol, there's not really much there that even smells of twisted (especially since right now I'd probably throw consumers/producers out) -- and that separation is simply good practice. Twisted does the same thing, but it didn't invent it. Furthermore, the advantages seem clear: reusability and testability are more than enough for me.

If there's one take away idea from async-pep, it's reusable protocols.

The PEP should probably be a number of PEPs. At first sight, it seems that this number is at least four:

1. Protocol and transport abstractions, making no mention of asynchronous IO (this is what I want 3153 to be, because it's small, manageable, and virtually everyone appears to agree it's a fantastic idea)
2. A base reactor interface
3. A way of structuring callbacks: probably deferreds with a built-in inlineCallbacks for people who want to write synchronous-looking code with explicit yields for asynchronous procedures
4+ adapting the stdlib tools to using these new things
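
To illustrate point 1, here is a minimal sketch of a protocol that makes no mention of asynchronous IO. The method names (connection_made, data_received, write) are assumptions picked for the example, not something the PEP has settled on. The protocol never touches a socket, so it can be tested against an in-memory transport and reused unchanged on top of any real one.

```python
class LineProtocol:
    """Hypothetical protocol: splits incoming bytes into lines, echoes them upper-cased."""

    def __init__(self):
        self._buffer = b""
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Data may arrive in arbitrary fragments; only complete lines are handled.
        self._buffer += data
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            self.transport.write(line.upper() + b"\n")


class MemoryTransport:
    """In-memory transport for tests; a socket-backed one would offer the same write()."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


transport = MemoryTransport()
protocol = LineProtocol()
protocol.connection_made(transport)
protocol.data_received(b"hel")
protocol.data_received(b"lo\nwor")
protocol.data_received(b"ld\n")
```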

Re: forward path for existing asyncore code. I don't remember this being raised as an issue. If anything, it was mentioned in passing, and I think the answer to it was something to the tune of "asyncore's API is broken, fixing it is more important than backwards compat". Essentially I agree with Guido that the important part is an upgrade path to a good third-party library, which is the part about asyncore that REALLY sucks right now. Regardless, an API upgrade is probably a good idea. I'm not sure if it should go in the first PEP: given the separation I've outlined above (which may be too spread out...), there's no obvious place to put it besides it being a new PEP.

Re base reactor interface: drawing maximally from the lessons learned in twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, etc), asynchronous-looking name lookup, fd handling are the important parts. call_every can be implemented in terms of call_later on a separate object, so I think it should be (eg twisted.internet.task.LoopingCall). One thing that is apparently forgotten about is event loop integration. The prime way of having two event loops cooperate is *NOT* "run both in parallel", it's "have one call the other". Even though not all loops support this, I think it's important to get this as part of the interface (raise an exception for all I care if it doesn't work).
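
As a sketch of the call_every point: the reactor below exposes nothing but call_later, and repetition lives in a separate object that re-arms itself. FakeClockReactor is a made-up test double with a manual clock, and this LoopingCall is a simplified imitation of the idea, not Twisted's actual class or signature.

```python
import heapq
import itertools


class FakeClockReactor:
    """Hypothetical reactor with a manual clock, exposing only call_later()."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        self._counter = itertools.count()   # tie-breaker; callables are never compared

    def call_later(self, delay, fn, *args):
        heapq.heappush(self._heap, (self.now + delay, next(self._counter), fn, args))

    def advance(self, seconds):
        """Move the clock forward, running every call that comes due."""
        deadline = self.now + seconds
        while self._heap and self._heap[0][0] <= deadline:
            when, _, fn, args = heapq.heappop(self._heap)
            self.now = when
            fn(*args)
        self.now = deadline


class LoopingCall:
    """Repeats `fn` every `interval` seconds by re-arming call_later after each run."""

    def __init__(self, reactor, interval, fn):
        self.reactor, self.interval, self.fn = reactor, interval, fn
        self.running = False

    def start(self):
        self.running = True
        self.reactor.call_later(self.interval, self._tick)

    def stop(self):
        self.running = False

    def _tick(self):
        if self.running:
            self.fn()
            self.reactor.call_later(self.interval, self._tick)


reactor = FakeClockReactor()
ticks = []
looper = LoopingCall(reactor, 1.0, lambda: ticks.append(reactor.now))
looper.start()
reactor.advance(3.5)
looper.stop()
reactor.advance(2.0)   # nothing more fires after stop()
```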

cheers
lvh

Greg Ewing

Oct 9, 2012, 8:44:23 PM
to Guido van Rossum, python...@python.org
Guido van Rossum wrote:

> Indeed, in NDB this works great. However tracebacks don't work so
> great: If you don't catch the exception right away, it takes work to
> make the tracebacks look right when you catch it a few generator calls
> down on the (conceptual) stack. I fixed this to some extent in NDB, by
> passing the traceback explicitly along when setting an exception on a
> Future;

Was this before or after the recent change that was supposed
to improve tracebacks from yield-from chains? If there's still
a problem after that, maybe exception handling in yield-from
requires some more work.

> But so far when thinking about this
> recently I have found the goal elusive --

> Perhaps you can clear things up by
> showing some detailed (but still simple enough) example code to handle
> e.g. a simple web client?

You might like to take a look at this, where I develop a series of
examples culminating in a simple multi-threaded server:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt

Code here:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/

> somehow it seems there *has*
> to be a distinction between an operation you just *yield* (this would
> be waiting for a specific low-level I/O operation) and something you
> use with yield-from, which returns a value through StopIteration.

It may be worth noting that nothing in my server example uses 'yield'
to send or receive values -- yield is only used without argument as
a suspension point. But the functions containing the yields *are*
called with yield-from and may return values via StopIteration.

So I think there are (at least) two distinct ways of using generators,
but the distinction isn't quite the one you're making. Rather, we
have "coroutines" (don't yield values, do return values) and
"iterators" (do yield values, don't return values).

Moreover, it's *only* the "coroutine" variety that we need to cater
for when designing an async event system. Does that help to
alleviate any of your monad-induced headaches?
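
In miniature, and much simplified compared with the scheduler at the links above, the "coroutine" style looks like the sketch below (fetch, task and run are illustrative names only). Every yield is bare, results travel through return and yield-from, and the scheduler never inspects a yielded value.

```python
from collections import deque


def fetch(name, steps):
    for _ in range(steps):
        yield                      # suspension point only; no value in or out
    return name.upper()            # result travels via StopIteration


def task(name, steps, results):
    value = yield from fetch(name, steps)
    results.append(value)


def run(tasks):
    """Round-robin scheduler: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)
        except StopIteration:
            continue               # finished; drop it
        ready.append(t)            # still running; back of the queue


results = []
run([task("slow", 3, results), task("fast", 1, results)])
```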

--
Greg

Greg Ewing

Oct 10, 2012, 3:53:04 AM
to Guido van Rossum, python...@python.org
Guido van Rossum wrote:
> But there are
> still quite a few situations in NDB where an uncaught
> exception prints a baffling traceback, showing lots of frames from the
> event loop and other async machinery but not the user code that was
> actually waiting for anything.

I just tried an experiment using Python 3.3. I modified the
parse_request() function of my spamserver example to raise
an exception that isn't caught anywhere:

    def parse_request(line):
        tokens = line.split()
        print(tokens)
        if tokens and tokens[0] == b"EGGS":
            raise ValueError("Server is allergic to eggs")
        ...

The resulting traceback looks like this. The last two lines
show very clearly whereabouts the exception occurred in
user code. So it all seems to work quite happily.

Traceback (most recent call last):
  File "spamserver.py", line 73, in <module>
    run2()
  File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/scheduler.py", line 109, in run2
    run()
  File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/scheduler.py", line 53, in run
    next(g)
  File "spamserver.py", line 50, in handler
    n = parse_request(line)
  File "spamserver.py", line 61, in parse_request
    raise ValueError("Server is allergic to eggs")
ValueError: Server is allergic to eggs

--
Greg

Ben Darnell

Oct 10, 2012, 12:41:33 PM
to Greg Ewing, python...@python.org
On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> You might like to take a look at this, where I develop a series of
> examples culminating in a simple multi-threaded server:
>
> http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt


Thanks for this link, it was very helpful to see it all come together
from scratch. And I think the most compelling thing about it is
something that I hadn't picked up on when I looked at "yield from"
before, that it naturally preserves the call stack for exception
handling. That's a big deal, and may be worth the requirement of 3.3+
since the tricks we've used to get better exception handling in
earlier pythons have been pretty ugly. On the other hand, it does
mean starting from scratch with a new asynchronous world that's not
directly compatible with the existing Twisted or Tornado ecosystems.
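
The call-stack point is easy to check in isolation. The sketch below is loosely modelled on the spamserver example but is not that code; session and run are invented names. It collects the function names found in the traceback of an exception that crosses a yield-from boundary, and the delegating generator shows up as a frame just as a regular caller would.

```python
import traceback


def parse_request(line):
    if line == "EGGS":
        raise ValueError("Server is allergic to eggs")
    return line


def handler(line):
    yield                          # pretend to wait for I/O
    return parse_request(line)


def session(line):
    return (yield from handler(line))


def run(gen):
    """Resume `gen` to completion; on ValueError, return the traceback's frame names."""
    try:
        for _ in gen:
            pass
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


frames = run(session("EGGS"))
```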

-Ben

Mark Adam

Oct 10, 2012, 12:56:17 PM
to Ben Darnell, python...@python.org
On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell <b...@bendarnell.com> wrote:
> On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
>> Mark Adam wrote:
>>>
>>> 1) event handlers for the machine-program interface (ex. network I/O)
>>> 2) event handlers for the program-user interface (ex. mouse I/O)
>>>
>>> While similar, my gut tells me they have to be handled in completely
>>> different ways in order to preserve order (i.e. sanity).
>>
>> They can't be *completely* different, because deep down there
>> has to be a single event loop that can handle all kinds of
>> asynchronous events.
>
> There doesn't *have* to be - you could run a network event loop in one
> thread and a GUI event loop in another and pass control back and forth
> via methods like IOLoop.add_callback or Reactor.callFromThread.

No, this won't work. The key FAIL in that sentence is "...and pass
control", because the O.S. has to be in charge of things that happen
in user space. And everything in Python happens in user space.
(hence my suggestion of creating a Python O.S.).

MarkJ

Ben Darnell

Oct 10, 2012, 1:29:35 PM
to Mark Adam, python...@python.org
On Wed, Oct 10, 2012 at 9:56 AM, Mark Adam <dreamin...@gmail.com> wrote:
> On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell <b...@bendarnell.com> wrote:
>> On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
>>> Mark Adam wrote:
>>>>
>>>> 1) event handlers for the machine-program interface (ex. network I/O)
>>>> 2) event handlers for the program-user interface (ex. mouse I/O)
>>>>
>>>> While similar, my gut tells me they have to be handled in completely
>>>> different ways in order to preserve order (i.e. sanity).
>>>
>>> They can't be *completely* different, because deep down there
>>> has to be a single event loop that can handle all kinds of
>>> asynchronous events.
>>
>> There doesn't *have* to be - you could run a network event loop in one
>> thread and a GUI event loop in another and pass control back and forth
>> via methods like IOLoop.add_callback or Reactor.callFromThread.
>
> No, this won't work. The key FAIL in that sentence is "...and pass
> control", because the O.S. has to be in charge of things that happen
> in user space. And everything in Python happens in user space.
> (hence my suggestion of creating a Python O.S.).

Letting the OS/GUI library have control of the UI thread is exactly
the point I was making. Perhaps "pass control" was a little vague,
but what I meant is that you'd have two threads, one for UI and one
for networking. When you need to start a network operation from the
UI thread you'd use IOLoop.add_callback() to pass a function to the
network thread, and then when the network operation completes you'd
use the analogous function from the UI library to send the response
back and update the interface from the UI thread.
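
A minimal sketch of that hand-off, assuming each loop exposes a thread-safe
add_callback(). The CallbackLoop class below is a hypothetical stand-in for
illustration, not Tornado's IOLoop or any GUI toolkit's API:

```python
import queue
import threading

class CallbackLoop:
    """Hypothetical single-threaded loop with a thread-safe add_callback()."""

    def __init__(self):
        self._callbacks = queue.Queue()

    def add_callback(self, func, *args):
        # Safe to call from any thread: queue.Queue does the locking.
        self._callbacks.put((func, args))

    def stop(self):
        self._callbacks.put(None)

    def run(self):
        while True:
            item = self._callbacks.get()
            if item is None:
                break
            func, args = item
            func(*args)

ui_loop = CallbackLoop()
net_loop = CallbackLoop()
ui_thread = threading.current_thread()
results = []

def fetch(url):
    # Runs on the network thread; a real loop would do non-blocking I/O here.
    response = "response for " + url
    ui_loop.add_callback(show, response)

def show(response):
    # Runs on the UI thread, so it may touch UI state without locks.
    results.append((threading.current_thread() is ui_thread, response))
    net_loop.stop()
    ui_loop.stop()

net_thread = threading.Thread(target=net_loop.run)
net_thread.start()
net_loop.add_callback(fetch, "http://example.com/")   # UI thread -> network thread
ui_loop.run()
net_thread.join()
```

Neither loop ever calls into the other directly; the only shared state is the
pair of queues.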

-Ben

Greg Ewing

Oct 10, 2012, 5:23:00 PM10/10/12
to python...@python.org
>>>>Mark Adam wrote:

>>>There doesn't *have* to be - you could run a network event loop in one
>>>thread and a GUI event loop in another and pass control back and forth
>>>via methods like IOLoop.add_callback or Reactor.callFromThread.

Well, that could be done, but one of the reasons for using
an event loop approach in the first place is to avoid having
to deal with threads and all their attendant concurrency
problems.

--
Greg

Trent Nelson

Oct 10, 2012, 8:55:23 PM10/10/12
to Christian Heimes, Python-Ideas
On Mon, Oct 08, 2012 at 05:13:03PM -0700, Christian Heimes wrote:
> Am 08.10.2012 17:35, schrieb Guido van Rossum:
> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <chri...@python.org> wrote:
> >> Python's standard library doesn't contain an interface to I/O Completion
> >> Ports. I think a common event loop system is a good reason to add IOCP
> >> if somebody is up for the challenge.
> >>
> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
> >> Twisted has its own Cython based wrapper, some other libraries use a
> >> libevent-based solution.
> >
> > What's an IOCP?
>
> I/O Completion Ports, http://en.wikipedia.org/wiki/IOCP
>
> It's a Windows (and apparently also Solaris)

And AIX, too. For every OS IOCP implementation, there's a
corresponding Snakebite box :-)

> API for async IO that can handle multiple threads.

I find it helps to think of it in terms of a half-sync/half-async
pattern. The half-async part handles the I/O; the OS wakes up one
of your "I/O" threads upon incoming I/O. The job of such threads
is really just to pull/push the bytes from/to kernel/user space as
quickly as they can.

(Since Vista, Windows has provided a corresponding thread pool
API that gels really well with IOCP. Windows will optimally
manage threads based on incoming I/O; spawning/destroying
threads as necessary. You can even indicate to Windows
whether your threads will be "compute" or I/O bound, which
it uses to optimize its scheduling algorithm.)

The half-sync part is the event-loop part of your app, which simply
churns away on the data prepared for it by the async threads.

What would be neat is if the half-async path could be run outside
the GIL. They would need to be able to allocate memory that could
then be "owned" by the GIL-holding half-sync part.

You could leverage this with kqueue and epoll; have similar threads
set up to simply process I/O independent of the GIL, using the same
facilities that would be used by IOCP-processing threads.

Then the "asyncore" event-loop simply becomes the half-sync part of
the pattern, enumerating over all the I/O requests queued up for it
by all the GIL-independent half-async threads.
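
The shape of that pattern can be sketched with ordinary threads and a queue.
This is only an illustration under loud assumptions: io_worker, event_loop and
the ready queue are hypothetical names, a blocking recv() stands in for
IOCP/epoll/kqueue, and the worker is an ordinary Python thread rather than one
that runs outside the GIL:

```python
import queue
import socket
import threading

ready = queue.Queue()   # hand-off point between the two halves

def io_worker(sock):
    # "Half-async" side: only moves bytes, never interprets them.
    while True:
        data = sock.recv(4096)
        ready.put((sock, data))
        if not data:            # peer closed the connection
            return

def event_loop(handler):
    # "Half-sync" side: processes completed reads one at a time.
    while True:
        sock, data = ready.get()
        if not data:
            return
        handler(sock, data)

a, b = socket.socketpair()
received = []
worker = threading.Thread(target=io_worker, args=(b,))
worker.start()
a.sendall(b"hello")
a.close()
event_loop(lambda sock, data: received.append(data))
worker.join()
b.close()
```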

Trent.

Antoine Pitrou

Oct 11, 2012, 10:40:43 AM10/11/12
to python...@python.org
On Wed, 10 Oct 2012 20:55:23 -0400
Trent Nelson <tr...@snakebite.org> wrote:
>
> You could leverage this with kqueue and epoll; have similar threads
> set up to simply process I/O independent of the GIL, using the same
> facilities that would be used by IOCP-processing threads.

Would you really win anything by doing I/O in separate threads, while
doing normal request processing in the main thread?

That said, the idea of a common API architected around async I/O,
rather than non-blocking I/O, sounds interesting at least theoretically.
Maybe all those outdated Snakebite Operating Systems are useful for
something after all. ;-P

cheers

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Oct 11, 2012, 2:45:02 PM10/11/12
to Greg Ewing, python...@python.org
On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> Indeed, in NDB this works great. However tracebacks don't work so
>> great: If you don't catch the exception right away, it takes work to
>> make the tracebacks look right when you catch it a few generator calls
>> down on the (conceptual) stack. I fixed this to some extent in NDB, by
>> passing the traceback explicitly along when setting an exception on a
>> Future;
>
>
> Was this before or after the recent change that was supposed
> to improve tracebacks from yield-from chains? If there's still
> a problem after that, maybe exception handling in yield-from
> requires some more work.

Sadly it was with Python 2.5/2.7...

>> But so far when thinking about this
>> recently I have found the goal elusive --
>
>
>> Perhaps you can clear things up by
>>
>> showing some detailed (but still simple enough) example code to handle
>> e.g. a simple web client?
>
>
> You might like to take a look at this, where I develop a series of
> examples culminating in a simple multi-threaded server:
>
> http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt

Definitely very enlightening. Though I think you should not use
'thread' since that term is already reserved for OS threads as
supported by the threading module. In NDB I chose to use 'tasklet' --
while that also has other meanings, its meaning isn't fixed in core
Python. You could also use task, which also doesn't have a core Python
meaning. Just don't call it "process", never mind that Erlang uses
this (a number of other languages rooted in old traditions do too, I
believe).

Also I think you can now revisit it and rewrite the code to use Python 3.3.

> Code here:
>
> http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/

It does bother me somehow that you're not using .send() and yield
arguments at all. I notice that you have a lot of three-line code
blocks like this:

    block_for_reading(sock)
    yield
    data = sock.recv(1024)

The general form seems to be:

    arrange for a callback when some operation can be done without blocking
    yield
    do the operation

This seems to be begging to be collapsed into a single line, e.g.

    data = yield sock.recv_async(1024)

(I would also prefer to see the socket wrapped in an object that makes
it hard to accidentally block.)
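
One way the collapsed form could work, as a rough sketch: the generator yields
an object describing the operation, and the driver performs it and passes the
result back with .send(). The Recv class and run() driver are hypothetical
names for illustration; they are not part of Greg's scheduler or of NDB:

```python
import select
import socket

class Recv:
    """Hypothetical request object: 'resume me with up to n bytes from sock'."""
    def __init__(self, sock, n):
        self.sock = sock
        self.n = n

def run(task):
    # Minimal driver for one generator; a real scheduler would multiplex many.
    try:
        request = next(task)
        while True:
            select.select([request.sock], [], [])   # wait until readable
            data = request.sock.recv(request.n)     # will not block now
            request = task.send(data)               # becomes the yield's value
    except StopIteration as stop:
        return stop.value

def reader(sock):
    data = yield Recv(sock, 1024)    # one line instead of three
    return data.upper()

a, b = socket.socketpair()
a.sendall(b"ping")
answer = run(reader(b))
a.close()
b.close()
```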

>> somehow it seems there *has*
>> to be a distinction between an operation you just *yield* (this would
>> be waiting for a specific low-level I/O operation) and something you
>> use with yield-from, which returns a value through StopIteration.
>
> It may be worth noting that nothing in my server example uses 'yield'
> to send or receive values -- yield is only used without argument as
> a suspension point. But the functions containing the yields *are*
> called with yield-from and may return values via StopIteration.

Yeah, but see my remark above...

> So I think there are (at least) two distinct ways of using generators,
> but the distinction isn't quite the one you're making. Rather, we
> have "coroutines" (don't yield values, do return values) and
> "iterators" (do yield values, don't return values).

But surely there's still a place for send() and other PEP 342 features?

> Moreover, it's *only* the "coroutine" variety that we need to cater
> for when designing an async event system. Does that help to
> alleviate any of your monad-induced headaches?

Not entirely, no. I now have a fair amount of experience writing an async
system and helping users make sense of its error messages, and there
are some practical considerations. E.g. my users sometimes want to
treat something as a coroutine but they don't have any yields in it
(perhaps they are writing skeleton code and plan to fill in the I/O
later). Example:

    def caller():
        data = yield from reader()

    def reader():
        return 'dummy'
        yield

works, but if you drop the yield it doesn't work. With a decorator I
know how to make it work either way.
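
A sketch of what such a decorator might look like on Python 3.3+. The name
coroutine and the run() helper are assumptions made for illustration; this is
not the NDB decorator:

```python
import functools
import inspect

def coroutine(func):
    """Hypothetical decorator: make func usable with 'yield from' whether or
    not its body contains a yield."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if inspect.isgenerator(result):
            # func was a generator function: delegate to it.
            result = yield from result
        return result
    return wrapper

@coroutine
def reader():
    return 'dummy'          # no yield anywhere in the body

@coroutine
def caller():
    data = yield from reader()
    return data

def run(gen):
    # Step a generator to completion and hand back its return value.
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value

value = run(caller())
```

Because wrapper itself contains a yield from, calling the decorated function
always produces a generator, even when the wrapped body is a plain function.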

--
--Guido van Rossum (python.org/~guido)

Terry Reedy

Oct 11, 2012, 4:44:46 PM10/11/12
to python...@python.org
On 10/11/2012 2:45 PM, Guido van Rossum wrote:
> On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:

>> You might like to take a look at this, where I develop a series of
>> examples culminating in a simple multi-threaded server:
>>
>> http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt
>
> Definitely very enlightening. Though I think you should not use
> 'thread' since that term is already reserved for OS threads as
> supported by the threading module. In NDB I chose to use 'tasklet' --

I read through this also and agree that using 'thread' for 'task',
'tasklet', 'microthread', or whatever is distracting. Part of the point,
to me, is that the code does *not* use (OS) threads and the thread module.

Tim Peters intended iterators, including generators, to be an
alternative to what he viewed as 'inside-out' callback code. The idea
was that pausing where appropriate allowed code that belongs together to
be kept together. I find generator-based event loops to be somewhat
easier to understand than callback-based loops. I certainly was more
comfortable with Greg's example than what I have read about twisted. So
I would like to see a generator-based system in the stdlib.

--
Terry Jan Reedy

Guido van Rossum

Oct 11, 2012, 5:18:50 PM10/11/12
to Laurens Van Houtven, python...@python.org
On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_...@lvh.cc> wrote:
> Oh my me. This is a very long thread that I probably should have replied to
> a long time ago. This thread is intensely long right now, and tonight is the
> first chance I've had to try and go through it comprehensively. I'll try to
> reply to individual points made in the thread -- if I missed yours, please
> don't be offended, I promise it's my fault :)

No problem, I'm running behind myself...

> FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka
> async-pep.

I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153.

> First of all, I'm glad to see that there's some more "let's get that pep
> along" movement. I tabled it because:
>
> a) I didn't have enough time to contribute,
> b) a lot of promised contributions ended up not happening when it came down
> to it, which was incredibly demotivating. The combination of this thread,
> plus the fact that I was strong armed at Pycon ZA by a bunch of community
> members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into
> exploring this thing again.
>
> First of all, I don't feel async-pep is an attempt at twisted light in the
> stdlib. Other than separation of transport and protocol, there's not really
> much there that even smells of twisted (especially since right now I'd
> probably throw consumers/producers out) -- and that separation is simply
> good practice. Twisted does the same thing, but it didn't invent it.
> Furthermore, the advantages seem clear: reusability and testability are more
> than enough for me.
>
> If there's one take away idea from async-pep, it's reusable protocols.

Is there a newer version than what's on
http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any
specific proposals, after spending a lot of time giving a rationale
and defining some terms. The version on
https://github.com/lvh/async-pep doesn't seem to be any more complete.

> The PEP should probably be a number of PEPs. At first sight, it seems that
> this number is at least four:
>
> 1. Protocol and transport abstractions, making no mention of asynchronous IO
> (this is what I want 3153 to be, because it's small, manageable, and
> virtually everyone appears to agree it's a fantastic idea)

But the devil is in the details. *What* specifically are you
proposing? How would you write a protocol handler/parser without any
reference to I/O? Most protocols are two-way streets -- you read some
stuff, and you write some stuff, then you read some more. (HTTP may be
the exception here, if you don't keep the connection open.)

> 2. A base reactor interface

I agree that this should be a separate PEP. But I do think that in
practice there will be dependencies between the different PEPs you are
proposing.

> 3. A way of structuring callbacks: probably deferreds with a built-in
> inlineCallbacks for people who want to write synchronous-looking code with
> explicit yields for asynchronous procedures

Your previous two ideas sound like you're not tied to backward
compatibility with Tornado and/or Twisted (not even via an adaptation
layer). Given that we're talking Python 3.4 here that's fine with me
(though I think we should be careful to offer a path forward for those
packages and their users, even if it means making changes to the
libraries). But Twisted Deferred is pretty arcane, and I would much
rather not use it as the basis of a forward-looking design. I'd much
rather see what we can mooch off PEP 3148 (Futures).

> 4+ adapting the stdlib tools to using these new things

We at least need to have an idea for how this could be done. We're
talking serious rewrites of many of our most fundamental existing
synchronous protocol libraries (e.g. httplib, email, possibly even
io.TextIOWrapper), most of which have had only scant updates even
through the Python 3 transition apart from complications to deal with
the bytes/str dichotomy.

> Re: forward path for existing asyncore code. I don't remember this being
> raised as an issue. If anything, it was mentioned in passing, and I think
> the answer to it was something to the tune of "asyncore's API is broken,
> fixing it is more important than backwards compat". Essentially I agree with
> Guido that the important part is an upgrade path to a good third-party
> library, which is the part about asyncore that REALLY sucks right now.

I have the feeling that the main reason asyncore sucks is that it
requires you to subclass its Dispatcher class, which has a rather
treacherous interface.

> Regardless, an API upgrade is probably a good idea. I'm not sure if it
> should go in the first PEP: given the separation I've outlined above (which
> may be too spread out...), there's no obvious place to put it besides it
> being a new PEP.

Aren't all your proposals API upgrades?

> Re base reactor interface: drawing maximally from the lessons learned in
> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
> etc), asynchronous-looking name lookup, fd handling are the important parts.

That actually sounds more concrete than I'd like a reactor interface
to be. In the App Engine world, there is a definite need for a
reactor, but it cannot talk about file descriptors at all -- all I/O
is defined in terms of RPC operations which have their own (several
layers of) async management but still need to be plugged in to user
code that might want to benefit from other reactor functionality such
as scheduling and placing a call at a certain moment in the future.

> call_every can be implemented in terms of call_later on a separate object,
> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
> that is apparently forgotten about is event loop integration. The prime way
> of having two event loops cooperate is *NOT* "run both in parallel", it's
> "have one call the other". Even though not all loops support this, I think
> it's important to get this as part of the interface (raise an exception for
> all I care if it doesn't work).

This is definitely one of the things we ought to get right. My own
thoughts are slightly (perhaps only cosmetically) different again:
ideally each event loop would have a primitive operation to tell it to
run for a little while, and then some other code could tie several
event loops together.

Possibly the primitive operation would be something like "block until
either you've got one event ready, or until a certain time (possibly
0) has passed without any events, and then give us the events that are
ready and a lower bound for when you might have more work to do" -- or
maybe instead of returning the event(s) it could just call the
associated callback (it might have to if it is part of a GUI library
that has callbacks written in C/C++ for certain events like screen
refreshes).
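
A rough sketch of such a primitive, assuming a timer-only loop. MiniLoop,
run_once() and run_all() are hypothetical names; a real loop would block on
I/O readiness instead of sleeping:

```python
import heapq
import time

class MiniLoop:
    """Hypothetical loop exposing a single 'run for a little while' step."""

    def __init__(self):
        self._timers = []      # heap of (deadline, sequence, callback)
        self._seq = 0

    def call_later(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._timers,
                       (time.monotonic() + delay, self._seq, callback))

    def run_once(self):
        """Run every callback that is due; return the seconds until the next
        one, or None if nothing is scheduled."""
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback = heapq.heappop(self._timers)
            callback()
        if not self._timers:
            return None
        return max(0.0, self._timers[0][0] - now)

def run_all(loops):
    # Outer driver that ties several loops together.
    while True:
        waits = [w for w in (loop.run_once() for loop in loops)
                 if w is not None]
        if not waits:
            return
        time.sleep(min(waits))

log = []
gui, net = MiniLoop(), MiniLoop()
gui.call_later(0.02, lambda: log.append("gui"))
net.call_later(0.01, lambda: log.append("net"))
run_all([gui, net])
```

Here run_once() invokes the callbacks itself and returns only the lower bound
for when more work is due, which corresponds to the second variant described
above.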

Anyway, it would be good to have input from representatives from Wx,
Qt, Twisted and Tornado to ensure that the *functionality* required is
all there (never mind the exact signatures of the APIs needed to
provide all that functionality).

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 11, 2012, 6:28:18 PM10/11/12
to Ben Darnell, python...@python.org
Definitely sounds like something that could be simplified if you
didn't have backward compatibility baggage...

Heh. I'll try to mine it for gems.

The latter is a good tactic and I'm also using it. (Except for some
reason we had to add the concept of "immediate callbacks" to our
Future class, and those are run inside the set_result() call. But most
callbacks don't use that feature.)

I don't have a choice about making the event loop reentrant -- App
Engine's underlying RPC multiplexing implementation *is* reentrant,
and there is a large set of "classic" APIs that I cannot stop the user
from calling that reenter it. But even if my hand wasn't forced, I'm
not sure if I would make your choice. In NDB, there is a full
complement of synchronous APIs that exactly matches the async APIs,
and users are free to use the synchronous APIs in parts of their code
where they don't need concurrency. Hence, every synchronous API just
calls its async sibling and immediately waits for its result, which
implicitly invokes the event loop.

Of course, I have it easy -- multiple incoming requests are dispatched
to separate threads by the App Engine runtime, so I don't have to
worry about multiplexing at that level at all -- just end user code
that is essentially single-threaded unless they go out of their way.

I did end up debugging one user's problem where they were making a
synchronous call inside an async handler, and -- very rarely! -- the
recursive event loop calls kept stacking up until they hit a
StackOverflowError. So I would agree that async code shouldn't make
synchronous API calls; but I haven't heard yet from anyone who was
otherwise hurt by the recursive event loop invocations -- in
particular, nobody has requested locks.

Still, this sounds like an important issue to revisit when discussing
a standard reactor API as part of Laurens's PEP offensive.

As I thought about the issue of how to spell "return a value" and
looked at various approaches, I decided I definitely didn't like what
monocle does: they let you say "yield X" where X is a non-Future
value; and I saw some other solution (Twisted? Phillip Eby?) that
simply called a function named something like returnValue(X). But I
also wanted it to look like a control statement that ends a block (so
auto-indenting editors would auto-dedent the next line), and that
means there are only four choices: continue, break, raise or return.
Three of those are useless... So the only choice really was which
exception to raise. Fortunately I had the advantage of knowing that
PEP 380 was going to implement "return X" from a generator as "raise
StopIteration(X)" so I decided to be compatible with that.
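
For reference, a small sketch of that PEP 380 behaviour on Python 3.3+
(tasklet, outer and drive are illustrative names only):

```python
def tasklet():
    yield              # suspension point; a scheduler would resume us later
    return 42          # the caller sees StopIteration with .value == 42

def outer():
    value = yield from tasklet()   # yield from unwraps the returned value
    return value + 1

def drive(gen):
    # Step a generator to completion and hand back its return value.
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value

inner_result = drive(tasklet())
outer_result = drive(outer())
```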

>>> For concreteness, here's a crude sketch of what the APIs I'm talking
>>> about would look like in use (in a hypothetical future version of
>>> tornado).
>>>
>>> @future_wrap
>>> @gen.engine
>>> def async_http_client(url, callback):
>>>     parsed_url = urlparse.urlsplit(url)
>>>     # works the same whether the future comes from a thread pool or @future_wrap
>>
>> And you need the thread pool because there's no async version of
>> getaddrinfo(), right?
>
> Right.
>
>>
>>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>>     stream = IOStream(socket.socket())
>>>     yield stream.connect((addrinfo[0][-1]))
>>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>>
>> Why no yield in front of the write() call?
>
> Because we don't need to wait for the write to complete before we
> continue to the next statement. write() doesn't return anything; it
> just succeeds or fails, and if it fails the next read_until will fail
> too. (although in this case it wouldn't hurt to have the yield either)

I guess you have a certain kind of buffering built in to your stream?
So if you make two write() calls without waiting in quick succession,
does the system collapse these into one, or does it end up making two
system calls, or what? In NDB, there's a similar issue with multiple
RPCs that can be batched. I ended up writing an abstraction that
automatically combines these; the call isn't actually made until there
are no other runnable tasks. I've had to explain this a few times to
users who try to get away with overlapping CPU work and I/O, but
otherwise it's worked quite well.

Okay, I see that these are useful. However they feel like two very
different classes of callbacks -- one that is called when a *specific*
piece of I/O that was previously requested is done; another that will
be called *whenever* a certain condition becomes true on a certain
channel. The former would correspond to e.g. completion of the headers
of an incoming HTTP request; the latter might correspond to a
"listening" socket receiving another connection.

--
--Guido van Rossum (python.org/~guido)

Devin Jeanpierre

Oct 11, 2012, 6:42:55 PM10/11/12
to Guido van Rossum, python...@python.org
Could you be more specific? I've never heard Deferreds in particular
called "arcane". They're very popular in e.g. the JS world, and
possibly elsewhere. Moreover, they're extremely similar to futures, so
if one is arcane so is the other.

Maybe if you could elaborate on features of their designs that are better/worse?

As far as I know, they mostly differ in that:

- Callbacks are added in a pipeline, rather than "in parallel"
- Deferreds pass in values along the pipeline, rather than self (and
have a separate pipeline for error values).

Neither is clearly better or more obvious than the other. If anything
I generally find deferred composition more useful than deferred
tee-ing, so I feel like composition is the correct base operator, but
you could pick another. Either way, each is implementable in terms of
the other (ish?). The pipeline approach is particularly nice for the
errback pipeline, because it allows chained exception (Failure)
handling on the deferred to be very simple. The larger issue is that
futures don't make chaining easy at all, even if it is theoretically
possible.

For example, look at the following Twisted code:
http://bpaste.net/show/RfEwoaflO0qY76N8NjHx/ , and imagine how that
might generalize to more realistic error handling scenarios.

The equivalent Futures code would involve creating one Future per
callback in the pipeline and manually hooking them up with a special
callback that passes values to the next future. And if we add that to
the futures API, the API will almost certainly be somewhat similar to
what Twisted has with deferreds and chaining and such. So then,
equally arcane.
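
To make the "one Future per callback" point concrete, here is a minimal sketch
using PEP 3148's concurrent.futures.Future. The then() helper is hypothetical;
it is not part of the stdlib and not Twisted's API:

```python
from concurrent.futures import Future

def then(source, func):
    """Hypothetical helper: return a new Future holding func(source's result),
    or whichever exception occurred first along the way."""
    target = Future()

    def relay(done):
        try:
            target.set_result(func(done.result()))
        except Exception as exc:
            target.set_exception(exc)

    source.add_done_callback(relay)
    return target

# Success travels down the chain, one intermediate Future per step.
first = Future()
last = then(then(first, lambda x: x + 1), lambda x: x * 2)
first.set_result(20)

# A failure skips the remaining steps and surfaces on the final Future.
failed = Future()
tail = then(then(failed, lambda x: x + 1), lambda x: x * 2)
failed.set_exception(ValueError("boom"))
```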

To my mind, it is Futures that need to mooch off of Deferreds, not the
other way around. Twisted's Deferreds have a lot of history with
making asynchronous computation pleasant, and Futures are missing a
lot of good tools.

-- Devin

Guido van Rossum

Oct 11, 2012, 7:37:42 PM10/11/12
to Devin Jeanpierre, python...@python.org
On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
<jeanpi...@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum <gu...@python.org> wrote:
>> [...] Twisted Deferred is pretty arcane, and I would much
>> rather not use it as the basis of a forward-looking design. I'd much
>> rather see what we can mooch off PEP 3148 (Futures).
>
> Could you be more specific? I've never heard Deferreds in particular
> called "arcane". They're very popular in e.g. the JS world,

Really? Twisted is used in the JS world? Or do you just mean the
pervasiveness of callback style async programming? That's one of the
things I am desperately trying to keep out of Python, I find that
style unreadable and unmanageable (whenever I click on a button in a
website and nothing happens I know someone has a bug in their
callbacks). I understand you feel differently; but I feel the general
sentiment is that callback-based async programming is even harder than
multi-threaded programming (and nobody is claiming that threads are
easy :-).

> and possibly elsewhere. Moreover, they're extremely similar to futures, so
> if one is arcane so is the other.

I love Futures, they represent a nice simple programming model. But I
especially love that you can write async code using Futures and
yield-based coroutines (what you call inlineCallbacks) and never have
to write an explicit callback function. Ever.

> Maybe if you could elaborate on features of their designs that are better/worse?
>
> As far as I know, they mostly differ in that:
>
> - Callbacks are added in a pipeline, rather than "in parallel"
> - Deferreds pass in values along the pipeline, rather than self (and
> have a separate pipeline for error values).

These two combined are indeed what mostly feels arcane to me.

> Neither is clearly better or more obvious than the other. If anything
> I generally find deferred composition more useful than deferred
> tee-ing, so I feel like composition is the correct base operator, but
> you could pick another.

If you're writing long complicated chains of callbacks that benefit
from these features, IMO you are already doing it wrong. I understand
that this is a matter of style where I won't be able to convince you.
But style is important to me, so let's agree to disagree.

> Either way, each is implementable in terms of
> the other (ish?). The pipeline approach is particularly nice for the
> errback pipeline, because it allows chained exception (Failure)
> handling on the deferred to be very simple. The larger issue is that
> futures don't make chaining easy at all, even if it is theoretically
> possible.

But as soon as you switch from callbacks to yield-based coroutines the
chaining becomes natural, error handling is just a matter of
try/except statements (or not if you want the error to bubble up) and
(IMO) the code becomes much more readable.
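
A small sketch of that style, assuming a driver that resumes a generator each
time the Future it yielded completes. The run() function below is hypothetical
and much simpler than NDB's tasklets or Twisted's inlineCallbacks:

```python
from concurrent.futures import Future

def run(gen):
    """Hypothetical driver: resume gen whenever the Future it yielded
    completes, and return a Future for gen's own return value."""
    outcome = Future()

    def step(send, arg):
        try:
            pending = send(arg)
        except StopIteration as stop:
            outcome.set_result(stop.value)
        except Exception as exc:
            outcome.set_exception(exc)
        else:
            pending.add_done_callback(resume)

    def resume(done):
        exc = done.exception()
        if exc is not None:
            step(gen.throw, exc)          # the error surfaces at the yield
        else:
            step(gen.send, done.result())

    step(gen.send, None)
    return outcome

def fetch_with_fallback(primary, backup):
    # Error handling is a plain try/except around the yield.
    try:
        return (yield primary)
    except IOError:
        return (yield backup)

primary, backup = Future(), Future()
outcome = run(fetch_with_fallback(primary, backup))
primary.set_exception(IOError("primary down"))
backup.set_result("from backup")
```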

> For example, look at the following Twisted code:
> http://bpaste.net/show/RfEwoaflO0qY76N8NjHx/ , and imagine how that
> might generalize to more realistic error handling scenarios.

Looks fine to me. I have a lot of code like that in NDB and it works
great. (Note that NDB's Futures are not the same as PEP 3148 Futures,
although they have some things in common; in particular NDB Futures
are not tied to threads.)

> The equivalent Futures code would involve creating one Future per
> callback in the pipeline and manually hooking them up with a special
> callback that passes values to the next future. And if we add that to
> the futures API, the API will almost certainly be somewhat similar to
> what Twisted has with deferreds and chaining and such. So then,
> equally arcane.

The *implementation* of this stuff in NDB is certainly hairy; I
already posted the link to the code:
http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349
However, this is internal code and doesn't affect the Future API at
all.

> To my mind, it is Futures that need to mooch off of Deferreds, not the
> other way around. Twisted's Deferreds have a lot of history with
> making asynchronous computation pleasant, and Futures are missing a
> lot of good tools.

I am totally open to learning from Twisted's experience. I hope that
you are willing to share even if the end result might not look like
Twisted at all -- after all in Python 3.3 we have "yield from" and
return from a generator and many years of experience with different
styles of async APIs. In addition to Twisted, there's Tornado and
Monocle, and then there's the whole greenlets/gevent and
Stackless/microthreads community that we can't completely ignore. I
believe somewhere is an ideal async architecture, and I hope you can
help us discover it.

(For example, I am very interested in Twisted's experiences writing
real-world performant, robust reactors.)

--
--Guido van Rossum (python.org/~guido)

Terry Reedy

Oct 11, 2012, 8:29:05 PM10/11/12
to python...@python.org
On 10/11/2012 5:18 PM, Guido van Rossum wrote:

> Anyway, it would be good to have input from representatives from Wx,
> Qt, Twisted and Tornado to ensure that the *functionality* required is
> all there (never mind the exact signatures of the APIs needed to
> provide all that functionality).

And of course tk/tkinter (tho perhaps we can represent that). It occurs
to me that while i/o (file/socket) events can be added to a user
(mouse/key) event loop, and I suspect that some tk/tkinter apps do so,
it might be sensible to keep the two separate. A master loop could tell
the user-event loop to handle all user events and then the i/o loop to
handle one i/o event. This all depends on the relative speed of the
handler code.
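
As a toy sketch of that master loop, with both event sources modelled as plain
deques (all names are hypothetical; a real version would poll tk and the
sockets rather than in-memory queues):

```python
from collections import deque

def master_loop(user_events, io_events, handle_user, handle_io):
    # Alternate: drain every pending user event, then service one I/O event.
    while user_events or io_events:
        while user_events:
            handle_user(user_events.popleft())
        if io_events:
            handle_io(io_events.popleft())

log = []
user_events = deque(["click"])
io_events = deque(["read-1", "read-2"])

def handle_user(event):
    log.append("user:" + event)

def handle_io(event):
    log.append("io:" + event)
    if event == "read-1":
        user_events.append("key")   # input that arrived while I/O was handled

master_loop(user_events, io_events, handle_user, handle_io)
```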

--
Terry Jan Reedy

Guido van Rossum

Oct 11, 2012, 8:34:33 PM10/11/12
to Terry Reedy, python...@python.org
On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy <tjr...@udel.edu> wrote:
> On 10/11/2012 5:18 PM, Guido van Rossum wrote:
>
>> Anyway, it would be good to have input from representatives from Wx,
>> Qt, Twisted and Tornado to ensure that the *functionality* required is
>> all there (never mind the exact signatures of the APIs needed to
>> provide all that functionality).
>
>
> And of course tk/tkinter (tho perhaps we can represent that). It occurs to
> me that while i/o (file/socket) events can be added to a user (mouse/key)
> event loop, and I suspect that some tk/tkinter apps do so, it might be
> sensible to keep the two separate. A master loop could tell the user-event
> loop to handle all user events and then the i/o loop to handle one i/o
> event. This all depends on the relative speed of the handler code.

You should talk to a Tcl/Tk user (if there are any left :-). They
actually really like the unified event loop that's used for both
widget events and network events. Tk is probably also a good example
of a hybrid GUI system, where some of the callbacks (e.g. redraw
events) are implemented in C.

--
--Guido van Rossum (python.org/~guido)

Ben Darnell

Oct 11, 2012, 8:41:57 PM10/11/12
to Guido van Rossum, python...@python.org
Probably, although I still feel like callback-passing has its place.
For example, I think the Tornado chat demo
(https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py)
would be less clear with coroutines and Futures than it is now
(although it would fit better into Greg's schedule/unschedule style).
That doesn't mean that every method has to take a callback, but I'd be
reluctant to get rid of them until we have more experience with the
generator/future-focused style.

Tornado has a synchronous HTTPClient that does the same thing,
although each fetch creates and runs its own IOLoop rather than
spinning the top-level IOLoop. (This means it doesn't really make
sense to run it when there is a top-level IOLoop; it's provided as a
convenience for scripts and multi-threaded apps who want an
HTTPRequest interface consistent with the async version).

>
> Of course, I have it easy -- multiple incoming requests are dispatched
> to separate threads by the App Engine runtime, so I don't have to
> worry about multiplexing at that level at all -- just end user code
> that is essentially single-threaded unless they go out of their way.
>
> I did end up debugging one user's problem where they were making a
> synchronous call inside an async handler, and -- very rarely! -- the
> recursive event loop calls kept stacking up until they hit a
> StackOverflowError. So I would agree that async code shouldn't make
> synchronous API calls; but I haven't heard yet from anyone who was
> otherwise hurt by the recursive event loop invocations -- in
> particular, nobody has requested locks.

I think that's because you don't have file descriptor support. In a
(level-triggered) event loop if you don't drain the socket before
reentering the loop then your read handler will be called again, which
generally makes a mess. I suppose with coroutines you'd want
edge-triggered instead of level-triggered though, which might make
this problem go away.

Yes, IOStream does buffering for you. Each IOStream.write() call will
generally result in a syscall, but once the outgoing socket buffer is
full subsequent writes will be buffered in the IOStream and written
when the IOLoop says the socket is writable. (The callback argument
to write() can be used for flow control in this case.) I used to defer
the syscall until the IOLoop was idle to batch things up, but it turns
out to be more efficient in practice to just write things out each
time and let the higher level do its own buffering when appropriate.
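
The write strategy described above (attempt the syscall immediately, keep
whatever the kernel refuses, flush when the socket is writable again) can
be sketched roughly as follows. This is not Tornado's IOStream code:
`BufferedWriter`, `pending` and `on_writable` are hypothetical names, and
the demo calls `on_writable()` on every iteration instead of waiting for a
real event loop to report writability.

```python
import socket


class BufferedWriter:
    """Hypothetical helper (not Tornado's IOStream): write now, buffer the rest."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()  # bytes the kernel has not accepted yet

    def write(self, data):
        # Append first so ordering is preserved, then try to send right away.
        self.pending += data
        self._flush()

    def on_writable(self):
        # An event loop would call this when the socket is writable again.
        self._flush()

    def _flush(self):
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                return  # kernel buffer is full; wait for the next writable event
            del self.pending[:sent]


a, b = socket.socketpair()
b.setblocking(False)
writer = BufferedWriter(a)
payload = b"x" * (4 * 1024 * 1024)
writer.write(payload)  # typically only part of this fits in the socket buffer

received = 0
while received < len(payload):
    try:
        received += len(b.recv(65536))  # drain the peer side
    except BlockingIOError:
        pass
    writer.on_writable()  # stand-in for the loop's "writable" notification

a.close()
b.close()
```

Sending eagerly keeps the common case (small writes, empty buffer) to a
single syscall; the buffer only matters once the peer falls behind.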


-Ben

Ben Darnell

Oct 11, 2012, 8:57:38 PM10/11/12
to Guido van Rossum, python...@python.org
On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <gu...@python.org> wrote:
>> Re base reactor interface: drawing maximally from the lessons learned in
>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>
> That actually sounds more concrete than I'd like a reactor interface
> to be. In the App Engine world, there is a definite need for a
> reactor, but it cannot talk about file descriptors at all -- all I/O
> is defined in terms of RPC operations which have their own (several
> layers of) async management but still need to be plugged in to user
> code that might want to benefit from other reactor functionality such
> as scheduling and placing a call at a certain moment in the future.

So are you thinking of something like
reactor.add_event_listener(event_type, event_params, func)? One thing
to keep in mind is that file descriptors are somewhat special (at
least in a level-triggered event loop), because of the way the event
will keep firing until the socket buffer is drained or the event is
unregistered. I'd be inclined to keep file descriptors in the
interface even if they just raise an error on app engine, since
they're fairly fundamental to the (unixy) event loop. On the other
hand, I don't have any experience with event loops outside the
unix/network world so I don't know what other systems might need for
their event loops.
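
The level-triggered behaviour mentioned above can be observed with the
standard `selectors` module, which reports readiness in level-triggered
fashion. A minimal sketch, assuming a local socket pair as the event
source; the variable names are illustrative only:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ)

writer.send(b"hello")

# First poll: the socket is reported as readable.
first = [key.fileobj for key, _ in sel.select(timeout=1)]

# The data was not drained, so a level-triggered loop reports it again.
second = [key.fileobj for key, _ in sel.select(timeout=1)]

# Once the buffer is drained the event stops firing.
data = reader.recv(1024)
third = [key.fileobj for key, _ in sel.select(timeout=0)]

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

A handler that re-enters the loop before draining the socket would be
invoked again for the same data, which is the mess described above.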

>
>> call_every can be implemented in terms of call_later on a separate object,
>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>> that is apparently forgotten about is event loop integration. The prime way
>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>> "have one call the other". Even though not all loops support this, I think
>> it's important to get this as part of the interface (raise an exception for
>> all I care if it doesn't work).
>
> This is definitely one of the things we ought to get right. My own
> thoughts are slightly (perhaps only cosmetically) different again:
> ideally each event loop would have a primitive operation to tell it to
> run for a little while, and then some other code could tie several
> event loops together.
>
> Possibly the primitive operation would be something like "block until
> either you've got one event ready, or until a certain time (possibly
> 0) has passed without any events, and then give us the events that are
> ready and a lower bound for when you might have more work to do" -- or
> maybe instead of returning the event(s) it could just call the
> associated callback (it might have to if it is part of a GUI library
> that has callbacks written in C/C++ for certain events like screen
> refreshes).


That doesn't work very well - while one loop is waiting for its
timeout, nothing can happen on the other event loop. You have to
switch back and forth frequently to keep things responsive, which is
inefficient. I'd rather give each event loop its own thread; you can
minimize the thread-synchronization concerns by picking one loop as
"primary" and having all the others just pass callbacks over to it
when their events fire.
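
A rough sketch of the "one primary loop, others hand callbacks over"
arrangement, assuming a thread-safe queue as the hand-over mechanism.
`PrimaryLoop`, `add_callback` and `run_until` are hypothetical names used
only for this illustration, and the secondary "loop" is reduced to a thread
that produces three events.

```python
import queue
import threading


class PrimaryLoop:
    """Hypothetical primary loop: runs every callback on its own thread."""

    def __init__(self):
        self._callbacks = queue.Queue()

    def add_callback(self, fn, *args):
        # Safe to call from any thread; queue.Queue does the locking.
        self._callbacks.put((fn, args))

    def run_until(self, done):
        while not done.is_set():
            try:
                fn, args = self._callbacks.get(timeout=0.1)
            except queue.Empty:
                continue
            fn(*args)  # always executed on the primary thread


primary = PrimaryLoop()
done = threading.Event()
primary_ident = threading.get_ident()
handled = []


def handle(event):
    # Record the event and whether we are running on the primary thread.
    handled.append((event, threading.get_ident() == primary_ident))
    if event == 2:
        done.set()


def secondary_loop():
    # Stand-in for another event loop: it never runs handlers itself,
    # it only passes them to the primary loop.
    for event in range(3):
        primary.add_callback(handle, event)


thread = threading.Thread(target=secondary_loop)
thread.start()
primary.run_until(done)
thread.join()
```

Because handlers only ever run on the primary thread, user code needs no
locks of its own; the queue is the single synchronization point.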

-Ben

Greg Ewing

Oct 11, 2012, 9:32:10 PM10/11/12
to Guido van Rossum, python...@python.org
Guido van Rossum wrote:
> Though I think you should not use
> 'thread' since that term is already reserved for OS threads as
> supported by the threading module. ... You could also use task,
> which also doesn't have a core Python
> meaning.
>
> Also I think you can now revisit it and rewrite the code to use Python 3.3.

Both good ideas. I'll see about publishing an updated version.

> It does bother me somehow that you're not using .send() and yield
> arguments at all. I notice that you have a lot of three-line code
> blocks like this:
>
> block_for_reading(sock)
> yield
> data = sock.recv(1024)

I wouldn't say I have a "lot". In the spamserver, there are really
only three -- one for accepting a connection, one for reading from
a socket, and one for writing to a socket. These are primitive
operations that would be provided by an async socket library.

Generally, all the yields would be hidden inside primitives like
this. Normally, user code would never need to use 'yield', only
'yield from'.

This probably didn't come through as clearly as it might have in my
tutorial. Part of the reason is that at the time I wrote it, I was
having to manually expand yield-froms into for-loops, so I was
reluctant to use any more of them than I needed to. Also, yield-from
was a new and unfamiliar concept, and I didn't want to scare people
by overusing it. These considerations led me to push some of the
yields slightly further up the layer stack than they could be.

>
> The general form seems to be:
>
> arrange for a callback when some operation can be done without blocking
> yield
> do the operation
>
> This seems to be begging to be collapsed into a single line, e.g.
>
> data = yield sock.recv_async(1024)

I'm not sure how you're imagining that would work, but whatever
it is, it's wrong -- that just doesn't make sense.

What *would* make sense is

data = yield from sock.recv_async(1024)

with sock.recv_async() being a primitive that encapsulates the
block/yield/process triplet.
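
For illustration, a primitive of this shape could look like the sketch
below. It is not the code from the tutorial: the scheduler (`ready`,
`current`, `run`) is a deliberately tiny hypothetical one built on the
standard `selectors` module, and `recv_async` is written as a plain
function rather than a socket method.

```python
import selectors
import socket
from collections import deque

selector = selectors.DefaultSelector()
ready = deque()   # tasks (generators) that can run right now
current = None    # the task the scheduler is currently running


def block_for_reading(sock):
    # Park the current task until `sock` becomes readable.
    selector.register(sock, selectors.EVENT_READ, current)


def recv_async(sock, nbytes):
    # The block/yield/process triplet, wrapped up as one primitive.
    block_for_reading(sock)
    yield                      # bare yield: give control back to the scheduler
    return sock.recv(nbytes)   # the scheduler resumed us, so data is available


def run():
    global current
    while ready or len(selector.get_map()) > 0:
        while ready:
            current = ready.popleft()
            try:
                next(current)  # runs until the task parks itself or finishes
            except StopIteration:
                pass
        if len(selector.get_map()) > 0:
            for key, _ in selector.select():
                selector.unregister(key.fileobj)
                ready.append(key.data)  # the parked task can run again


results = []


def reader(sock):
    # User code only ever needs `yield from`.
    data = yield from recv_async(sock, 1024)
    results.append(data)


a, b = socket.socketpair()
b.sendall(b"ping")
ready.append(reader(a))
run()
a.close()
b.close()
selector.close()
```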

> (I would also prefer to see the socket wrapped in an object that makes
> it hard to accidentally block.)

It would be straightforward to make the primitives be methods of a
socket wrapper object. I only used functions in the tutorial in the
interests of keeping the amount of machinery to a bare minimum.

> But surely there's still a place for send() and other PEP 342 features?

In the wider world of generator usage, yes. If you have a
generator that it makes sense to send() things into, for
example, and you want to factor part of it out into another
function, the fact that yield-from passes through sent values
is useful.

But we're talking about a very specialised use of generators
here, and so far I haven't thought of a use for sent or yielded
values in this context that can't be done in a more straightforward
way by other means.

Keep in mind that a value yielded by a generator being used as
part of a coroutine is *not* seen by code calling it with
yield-from. Rather, it comes out in the inner loop of the
scheduler, from the next() call being used to resume the
coroutine. Likewise, any send() call would have to be made
by the scheduler, not the yield-from caller.

So, the send/yield channel is exclusively for communication
with the *scheduler* and nothing else. Under the old way of
doing generator-based coroutines, this channel was used to
simulate a call stack by yielding 'call' and 'return'
instructions that the scheduler interpreted. But all that
is now taken care of by the yield-from mechanism, and there
is nothing left for the send/yield channel to do.
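
The two channels can be seen side by side in a few lines. In this sketch
the "scheduler" is just two direct `next()` calls, and the function names
are made up for the example:

```python
def inner():
    yield "for the scheduler"   # surfaces in the scheduler's next() call
    return "for the caller"     # becomes the value of the `yield from` expression


def outer(log):
    result = yield from inner()
    log.append(result)


log = []
task = outer(log)

# The scheduler's next() sees the yielded value; outer() never does.
seen_by_scheduler = next(task)

# Resuming lets inner() return, which hands its value to outer().
try:
    next(task)
except StopIteration:
    pass
```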

> my users sometimes want to
> treat something as a coroutine but they don't have any yields in it
>
> def caller():
> data = yield from reader()
>
> def reader():
> return 'dummy'
> yield
>
> works, but if you drop the yield it doesn't work. With a decorator I
> know how to make it work either way.

If you're talking about a decorator that turns a function
into a generator, I can't see anything particularly headachish
about that. If you mean something else, you'll have to elaborate.
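
One possible reading of such a decorator, as a sketch: leave generator
functions alone and wrap plain functions so that they can still be used
with `yield from`. The name `coroutine` and the tiny `run` driver are
assumptions made for this example, not an existing API.

```python
import functools
import inspect


def coroutine(func):
    """Hypothetical decorator: make `func` usable with `yield from` either way."""
    if inspect.isgeneratorfunction(func):
        return func  # already a generator function, nothing to do

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached; its presence makes `wrapper` a generator function

    return wrapper


@coroutine
def reader():
    return 'dummy'  # no yield anywhere in the body


def caller():
    data = yield from reader()
    return data


def run(gen):
    # Minimal driver: step the generator until it returns.
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


result = run(caller())
```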

--
Greg

Mark Adam

Oct 11, 2012, 9:38:43 PM10/11/12
to Guido van Rossum, python...@python.org, Terry Reedy
On Thu, Oct 11, 2012 at 7:34 PM, Guido van Rossum <gu...@python.org> wrote:
> On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy <tjr...@udel.edu> wrote:
>> On 10/11/2012 5:18 PM, Guido van Rossum wrote:
>>
>>> Anyway, it would be good to have input from representatives from Wx,
>>> Qt, Twisted and Tornado to ensure that the *functionality* required is
>>> all there (never mind the exact signatures of the APIs needed to
>>> provide all that functionality).
>>
>>
>> And of course tk/tkinter (tho perhaps we can represent that). It occurs to
>> me that while i/o (file/socket) events can be added to a user (mouse/key)
>> event loop, and I suspect that some tk/tkinter apps do so, it might be
>> sensible to keep the two separate. A master loop could tell the user-event
>> loop to handle all user events and then the i/o loop to handle one i/o
>> event. This all depends on the relative speed of the handler code.

Here's the thing: the underlying O.S. is always handling two major I/O
channels at any given time, and it needs all its attention to do this:
the GUI and one of the following (network, file) I/O. You can
shuffle these around all you want, but somewhere the O.S. kernel is
going to have to be involved, which means that either portability or
speed is sacrificed if one is going to pursue an abstract, unified
async API.

> You should talk to a Tcl/Tk user (if there are any left :-).

I used to be one of those :)

mark

Guido van Rossum

Oct 11, 2012, 11:40:37 PM10/11/12
to Ben Darnell, python...@python.org
Hmm... That's an interesting challenge. I can't quite say I understand
that whole program yet, but I'd like to give it a try. I think it can
be made clearer than Tornado with Futures and coroutines -- it all
depends on how you define your primitives.

> That doesn't mean that every method has to take a callback, but I'd be
> reluctant to get rid of them until we have more experience with the
> generator/future-focused style.

Totally understood. Though the nice thing of Futures is that you can
tie callbacks to them *or* use them in coroutines.

I see. Yet another possible design choice.

>> Of course, I have it easy -- multiple incoming requests are dispatched
>> to separate threads by the App Engine runtime, so I don't have to
>> worry about multiplexing at that level at all -- just end user code
>> that is essentially single-threaded unless they go out of their way.
>>
>> I did end up debugging one user's problem where they were making a
>> synchronous call inside an async handler, and -- very rarely! -- the
>> recursive event loop calls kept stacking up until they hit a
>> StackOverflowError. So I would agree that async code shouldn't make
>> synchronous API calls; but I haven't heard yet from anyone who was
>> otherwise hurt by the recursive event loop invocations -- in
>> particular, nobody has requested locks.
>
> I think that's because you don't have file descriptor support. In a
> (level-triggered) event loop if you don't drain the socket before
> reentering the loop then your read handler will be called again, which
> generally makes a mess. I suppose with coroutines you'd want
> edge-triggered instead of level-triggered though, which might make
> this problem go away.

Ah, good terminology. Coroutines definitely like being edge-triggered.

Makes sense. I think different people might want to implement slightly
different IOStream-like abstractions; this would be a good test of the
infrastructure. You should be able to craft one from scratch out of
sockets and Futures, but there should be one or two standard ones as
well, and they should all happily mix and match using the same
reactor.

--
--Guido van Rossum (python.org/~guido)

Devin Jeanpierre

Oct 12, 2012, 12:29:05 AM10/12/12
to Guido van Rossum, python...@python.org
First of all, sorry for not snipping the reply I made previously.
Noticed that only after I sent it :(

On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <gu...@python.org> wrote:
> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
> <jeanpi...@gmail.com> wrote:
>> Could you be more specific? I've never heard Deferreds in particular
>> called "arcane". They're very popular in e.g. the JS world,
>
> Really? Twisted is used in the JS world? Or do you just mean the
> pervasiveness of callback style async programming?

Ah, I mean Deferreds. I attended a talk earlier this year all about
deferreds in JS, and not a single reference to Python or Twisted was
made!

These are the examples I remember mentioned in the talk:

- http://api.jquery.com/category/deferred-object/ (not very twistedish
at all, ill-liked by the speaker)
- http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
not a good example, mochikit tries to be "python in JS")
- http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
- https://github.com/kriskowal/q (also includes an explanation of why
the author likes deferreds)

There were a few more that the speaker mentioned, but didn't cover.
One of his points was that the various systems of deferreds are subtly
different, some very badly so, and that it was a mess, but that
deferreds were still awesome. JS is a language where async programming
is mainstream, so lots of people try to make it easier, and they all
do it slightly differently.

> That's one of the
> things I am desperately trying to keep out of Python, I find that
> style unreadable and unmanageable (whenever I click on a button in a
> website and nothing happens I know someone has a bug in their
> callbacks). I understand you feel different; but I feel the general
> sentiment is that callback-based async programming is even harder than
> multi-threaded programming (and nobody is claiming that threads are
> easy :-).

:S

There are (at least?) four different styles of asynchronous
computation used in Twisted, and you seem to be confused as to which
ones I'm talking about.

1. Explicit callbacks:

For example, reactor.callLater(t, lambda: print("woo hoo"))

2. Method dispatch callbacks:

Similar to the above, the reactor or somebody has a handle on your
object, and calls methods that you've defined when events happen
e.g. IProtocol's dataReceived method

3. Deferred callbacks:

When you ask for something to be done, it's set up, and you get an
object back, which you can add a pipeline of callbacks to that will be
called whenever whatever happens
e.g. twisted.internet.threads.deferToThread(print,
"x").addCallback(print, "x was printed in some other thread!")

4. Generator coroutines

These are a syntactic wrapper around deferreds. If you yield a
deferred, you will be sent the result if the deferred succeeds, or an
exception if the deferred fails.
e.g. examples from previous message
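
The send-the-result / throw-the-exception mechanics of the fourth style
can be sketched without Twisted. The example below substitutes the standard
`concurrent.futures.Future` for a Deferred, and `drive` is a hypothetical
helper written for this illustration, not Twisted's inlineCallbacks.

```python
from concurrent.futures import Future


def drive(gen, outcome):
    """Hypothetical driver: resume `gen` whenever the Future it yielded completes."""

    def step(send=None, throw=None):
        try:
            if throw is not None:
                fut = gen.throw(throw)   # failure: raise inside the generator
            else:
                fut = gen.send(send)     # success: hand the result back
        except StopIteration as stop:
            outcome.set_result(stop.value)
            return
        except Exception as exc:
            outcome.set_exception(exc)
            return
        fut.add_done_callback(resume)

    def resume(fut):
        exc = fut.exception()
        if exc is not None:
            step(throw=exc)
        else:
            step(send=fut.result())

    step()


ok, bad = Future(), Future()


def task():
    value = yield ok            # receives the result of `ok`
    try:
        yield bad               # the failure of `bad` is raised here
    except ValueError as exc:
        return (value, str(exc))


outcome = Future()
drive(task(), outcome)
ok.set_result(1)
bad.set_exception(ValueError("boom"))
```

The only explicit callback is the one inside the driver; the task itself
reads like ordinary sequential code with try/except.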


I don't see a reason for the first to exist at all, the second one is
kind of nice in some circumstances (see below), but perhaps overused.

I feel like you're railing on the first and second when I'm talking
about the third and fourth. I could be wrong.

>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>> if one is arcane so is the other.
>
> I love Futures, they represent a nice simple programming model. But I
> especially love that you can write async code using Futures and
> yield-based coroutines (what you call inlineCallbacks) and never have
> to write an explicit callback function. Ever.

The reason explicit non-deferred callbacks are involved in Twisted is
because of situations in which deferreds are not present, because of
past history in Twisted. It is not at all a limitation of deferreds or
something futures are better at, best as I'm aware.

(In case that's what you're getting at.)


Anyway, one big issue is that generator coroutines can't really
effectively replace callbacks everywhere. Consider the GUI button
example you gave. How do you write that as a coroutine?

I can see it being written like this:

def mycoroutine(gui):
    while True:
        clickevent = yield gui.mybutton1.on_click()
        # handle clickevent

But that's probably worse than using callbacks.

>> Neither is clearly better or more obvious than the other. If anything
>> I generally find deferred composition more useful than deferred
>> tee-ing, so I feel like composition is the correct base operator, but
>> you could pick another.
>
> If you're writing long complicated chains of callbacks that benefit
> from these features, IMO you are already doing it wrong. I understand
> that this is a matter of style where I won't be able to convince you.
> But style is important to me, so let's agree to disagree.

This is more than a matter of style, so at least for now I'd like to
hold off on calling it even.

In my day to day silly, synchronous, python code, I do lots of
synchronous requests. For example, it's not unreasonable for me to
want to load two different files from disk, or make several database
interactions, etc. If I want to make this asynchronous, I have to find
a way to execute multiple things that could hypothetically block, at
the same time. If I can't do that easily, then the asynchronous
solution has failed, because its entire purpose is to do everything
that I do synchronously, except without blocking the main thread.

Here's an example with lots of synchronous requests in Django:

def view_paste(request, filekey):
    try:
        fileinfo = Pastes.objects.get(key=filekey)  # blocking database query
    except Pastes.DoesNotExist:
        t = loader.get_template('pastebin/error.html')  # may read from disk
        return HttpResponse(t.render(Context(dict(error='File does not exist'))))

    f = open(fileinfo.filename)  # blocking file I/O
    fcontents = f.read()
    t = loader.get_template('pastebin/paste.html')
    return HttpResponse(t.render(Context(dict(file=fcontents))))

How many blocking requests are there? Lots. This is, in a word, a
long, complicated chain of synchronous requests. This is also very
similar to what actual django code might look like in some
circumstances. Even if we might think this is unreasonable, some
subset of alteration of this is reasonable. Certainly we should be
able to, say, load multiple (!) objects from the database, and open
the template (possibly from disk), all potentially-blocking
operations.

This is inherently a long, complicated chain of requests, whether we
implement it asynchronously or synchronously, or use Deferreds or
Futures, or write it in Java or Python. Some parts can be done at any
time before the end (loader.get_template(...)), some need to be done
in a certain order, and there's branching depending on what happens in
different cases. In order to even write this code _at all_, we need a
way to chain these IO actions together. If we can't chain them
together, we can't produce that final synthesis of results at the end.

We _need_ a pipeline or something computationally equivalent or more
powerful. Results from past "deferred computations" need to be passed
forward into future "deferred computations", in order to implement
this at all.

This is not a style issue, this is an issue of needing to be able to
solve problems that involve more than one computation where the
results of every computation matters somewhere. It's just that in this
case, some of the computations are computed asynchronously.

> I am totally open to learning from Twisted's experience. I hope that
> you are willing to share even if the end result might not look like
> Twisted at all -- after all in Python 3.3 we have "yield from" and
> return from a generator and many years of experience with different
> styles of async APIs. In addition to Twisted, there's Tornado and
> Monocle, and then there's the whole greenlets/gevent and
> Stackless/microthreads community that we can't completely ignore. I
> believe somewhere is an ideal async architecture, and I hope you can
> help us discover it.
>
> (For example, I am very interested in Twisted's experiences writing
> real-world performant, robust reactors.)

For that stuff, you'd have to speak to the main authors of Twisted.
I'm just a twisted user. :(

In the end it really doesn't matter what API you go with. The Twisted
people will wrap it up so that they are compatible, as far as that is
possible.

I hope I haven't detracted too much from the main thrust of the
surrounding discussion. Futures/deferreds are a pretty big tangent, so
sorry. I justified it to myself by figuring that it'd probably come up
anyway, somehow, since these are useful abstractions for asynchronous
programming.

-- Devin

Devin Jeanpierre

Oct 12, 2012, 12:40:04 AM10/12/12
to Guido van Rossum, python...@python.org
On Fri, Oct 12, 2012 at 12:29 AM, Devin Jeanpierre
<jeanpi...@gmail.com> wrote:
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.
>
> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
-- snip boredom --
> together, we can't produce that final synthesis of results at the end.

Ugh, just realized way after the fact that of course you meant
callbacks, not composition. I feel dumb.

Nevermind that whole segment.

Trent Nelson

Oct 12, 2012, 12:45:06 AM10/12/12
to Antoine Pitrou, python...@python.org
On Thu, Oct 11, 2012 at 07:40:43AM -0700, Antoine Pitrou wrote:
> On Wed, 10 Oct 2012 20:55:23 -0400
> Trent Nelson <tr...@snakebite.org> wrote:
> >
> > You could leverage this with kqueue and epoll; have similar threads
> > set up to simply process I/O independent of the GIL, using the same
> > facilities that would be used by IOCP-processing threads.
>
> Would you really win anything by doing I/O in separate threads, while
> doing normal request processing in the main thread?

If the I/O threads can run independent of the GIL, yes, definitely.
The whole premise of IOCP is that the kernel takes care of waking
one of your I/O handlers when data is ready. IOCP allows that to
happen completely independent of your application's event loop.

It really is the best way to do I/O. The Windows NT design team
got it right from the start. The AIX and Solaris implementations
are semantically equivalent to Windows, without the benefit of
automatic thread pool management (and a few other optimisations).

On Linux and BSD, you could get similar functionality by spawning
I/O threads that could also run independent of the GIL. They would
differ from the IOCP worker threads in the sense that they all have
their own little event loops around epoll/kqueue+timeout. i.e. they
have to continually ask "is there anything to do with this set of
fds", then process the results, then manage set synchronisation.

IOCP threads, on the other hand, wait for completion of something
that has already been requested. The thread body implementation is
significantly simpler, and no synchronisation primitives are needed.

> That said, the idea of a common API architected around async I/O,
> rather than non-blocking I/O, sounds interesting at least theoretically.

It's the best way to do it. There should really be a libevent-type
library (libiocp?) that leverages IOCP where possible, and fakes it
when not, using a half-sync/half-async pattern with threads and epoll
or kqueue on Linux and FreeBSD, falling back to processes and poll
on everything else (NetBSD, OpenBSD and HP-UX (the former two not
having robust-enough pthread implementations, the latter not having
anything better than select or poll)).
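
To make the "fake it with threads" idea concrete, here is a rough sketch
of a completion-style read emulated on top of a readiness API. Everything
in it is an assumption made for illustration: `CompletionPort` and
`post_read` are hypothetical names, the standard `selectors` module stands
in for epoll/kqueue, and the sketch relies on epoll/kqueue noticing
registrations made while another thread is already waiting. It shows the
structure only; a Python thread is still subject to the GIL except while
blocked in a system call.

```python
import queue
import selectors
import socket
import threading


class CompletionPort:
    """Hypothetical: an I/O thread turns readiness events into completions."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self.completions = queue.Queue()
        # A wake-up socket keeps the selector non-empty and lets close() stop the thread.
        self._wake_r, self._wake_w = socket.socketpair()
        self._sel.register(self._wake_r, selectors.EVENT_READ, None)
        self._thread = threading.Thread(target=self._io_loop, daemon=True)
        self._thread.start()

    def post_read(self, sock, nbytes):
        # Request a read; the result is delivered later via `completions`.
        sock.setblocking(False)
        self._sel.register(sock, selectors.EVENT_READ, nbytes)

    def _io_loop(self):
        while True:
            for key, _ in self._sel.select():
                if key.data is None:       # wake-up socket: shut down
                    return
                self._sel.unregister(key.fileobj)
                data = key.fileobj.recv(key.data)
                self.completions.put((key.fileobj, data))

    def close(self):
        self._wake_w.send(b"x")
        self._thread.join()
        self._sel.close()
        self._wake_r.close()
        self._wake_w.close()


port = CompletionPort()
a, b = socket.socketpair()
port.post_read(a, 1024)          # ask for the read up front
b.sendall(b"payload")
sock, data = port.completions.get(timeout=5)  # the main thread sees only completions
port.close()
a.close()
b.close()
```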

However, given that the best IOCP implementations are a) Windows by
a huge margin, and then b) Solaris and AIX in equal, distant second
place, I can't see that happening any time soon.

(Trying to use IOCP in the reactor fashion described above for epoll
and kqueue is far more limiting than having an IOCP-oriented API
and faking it for platforms where native support isn't available.)

> Maybe all those outdated Snakebite Operating Systems are useful for
> something after all. ;-P

All the operating systems are the latest version available!
In addition, there's also a Solaris 9 and HP-UX 11iv2 box.
The hardware, on the other hand... not so new in some cases.

Trent.

Antoine Pitrou

Oct 12, 2012, 3:14:54 AM10/12/12
to python...@python.org
On Fri, 12 Oct 2012 00:29:05 -0400
Devin Jeanpierre <jeanpi...@gmail.com>
wrote:
>
> These are the examples I remember mentioned in the talk:
>
> - http://api.jquery.com/category/deferred-object/ (not very twistedish
> at all, ill-liked by the speaker)
> - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
> not a good example, mochikit tries to be "python in JS")
> - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
> - https://github.com/kriskowal/q (also includes an explanation of why
> the author likes deferreds)

Mochikit has been dead for years.

As for the others, just because they are called "Deferred" doesn't mean
they are the same thing. None of them seems to look like Twisted's
Deferred abstraction.

> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.

A Deferred can only be called once, but a dataReceived method can be
called any number of times. So you can't use a Deferred for
dataReceived unless you introduce significant hackery.
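
The mismatch can be sketched with a toy one-shot object next to a
method-dispatch protocol. `OneShot` is a hypothetical stand-in written for
this example (it is not Twisted's Deferred), and `EchoProtocol` only
mimics the shape of a dataReceived-style interface.

```python
class OneShot:
    """Hypothetical stand-in for a Deferred/Future: it fires at most once."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        if self._fired:
            fn(self._result)
        else:
            self._callbacks.append(fn)

    def fire(self, result):
        if self._fired:
            raise RuntimeError("already fired")
        self._fired, self._result = True, result
        for fn in self._callbacks:
            fn(result)


class EchoProtocol:
    """Method-dispatch style: data_received may be called any number of times."""

    def __init__(self):
        self.chunks = []

    def data_received(self, data):
        self.chunks.append(data)


d = OneShot()
seen = []
d.add_callback(seen.append)
d.fire(b"first")
try:
    d.fire(b"second")        # a second event has nowhere to go
    second_fire_ok = True
except RuntimeError:
    second_fire_ok = False

proto = EchoProtocol()
for chunk in (b"first", b"second", b"third"):
    proto.data_received(chunk)  # repeated events are the normal case here
```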

> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
>
> I can see it being written like this:
>
> def mycoroutine(gui):
>     while True:
>         clickevent = yield gui.mybutton1.on_click()
>         # handle clickevent
>
> But that's probably worse than using callbacks.

Agreed. And that's precisely because your GUI button handler is a
dataReceived-alike :-)

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Antoine Pitrou

Oct 12, 2012, 3:18:25 AM10/12/12
to python...@python.org
On Fri, 12 Oct 2012 09:14:54 +0200
Antoine Pitrou <soli...@pitrou.net> wrote:

> On Fri, 12 Oct 2012 00:29:05 -0400
> Devin Jeanpierre <jeanpi...@gmail.com>
> wrote:
> >
> > These are the examples I remember mentioned in the talk:
> >
> > - http://api.jquery.com/category/deferred-object/ (not very twistedish
> > at all, ill-liked by the speaker)
> > - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
> > not a good example, mochikit tries to be "python in JS")
> > - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
> > - https://github.com/kriskowal/q (also includes an explanation of why
> > the author likes deferreds)
>
> Mochikit has been dead for years.
>
> As for the others, just because they are called "Deferred" doesn't mean
> they are the same thing. None of them seems to look like Twisted's
> Deferred abstraction.

Correction: actually, some of them do :-) I should have looked a bit
better.

Laurens Van Houtven

Oct 12, 2012, 5:25:41 AM10/12/12
to Guido van Rossum, python...@python.org
On Thu, Oct 11, 2012 at 11:18 PM, Guido van Rossum <gu...@python.org> wrote:
> If there's one take away idea from async-pep, it's reusable protocols.

Is there a newer version than what's on
http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any
specific proposals, after spending a lot of time giving a rationale
and defining some terms. The version on
https://github.com/lvh/async-pep doesn't seem to be any more complete.

Correct.

If I had to change it today, I'd throw out consumers and producers and just stick to a protocol API.

Do you feel that there should be less talk about rationale?
 
> The PEP should probably be a number of PEPs. At first sight, it seems that
> this number is at least four:
>
> 1. Protocol and transport abstractions, making no mention of asynchronous IO
> (this is what I want 3153 to be, because it's small, manageable, and
> virtually everyone appears to agree it's a fantastic idea)

But the devil is in the details. *What* specifically are you
proposing? How would you write a protocol handler/parser without any
reference to I/O? Most protocols are two-way streets -- you read some
stuff, and you write some stuff, then you read some more. (HTTP may be
the exception here, if you don't keep the connection open.)

It's not that there's *no* reference to IO: it's just that that reference is abstracted away in data_received and the protocol's transport object, just like Twisted's IProtocol.
 
> 2. A base reactor interface

I agree that this should be a separate PEP. But I do think that in
practice there will be dependencies between the different PEPs you are
proposing.

Absolutely.
 
> 3. A way of structuring callbacks: probably deferreds with a built-in
> inlineCallbacks for people who want to write synchronous-looking code with
> explicit yields for asynchronous procedures

Your previous two ideas sound like you're not tied to backward
compatibility with Tornado and/or Twisted (not even via an adaptation
layer). Given that we're talking Python 3.4 here that's fine with me
(though I think we should be careful to offer a path forward for those
packages and their users, even if it means making changes to the
libraries).

I'm assuming that by previous ideas you mean points 1, 2: protocol interface + reactor interface.

I don't see why Twisted's IProtocol couldn't grow an adapter for stdlib
Protocols. Ditto for Tornado. Similarly, the reactor interface could be
*provided* (through a fairly simple translation layer) by different
implementations, including Twisted.
 
But Twisted Deferred is pretty arcane, and I would much
rather not use it as the basis of a forward-looking design. I'd much
rather see what we can mooch off PEP 3148 (Futures).

I think this needs to be addressed in a separate mail, since more stuff has been said about deferreds in this thread.
 
> 4+ adapting the stdlib tools to using these new things

We at least need to have an idea for how this could be done. We're
talking serious rewrites of many of our most fundamental existing
synchronous protocol libraries (e.g. httplib, email, possibly even
io.TextIOWrapper), most of which have had only scant updates even
through the Python 3 transition apart from complications to deal with
the bytes/str dichotomy.

I certainly agree that this is a very large amount of work. However, it
has obvious huge advantages in terms of code reuse. I'm not sure if I
understand the technical barrier though. It should be quite easy to
create a blocking API with a protocol implementation that doesn't care;
just call data_received with all your data at once, and presto! (Since
transports in general don't provide guarantees as to how bytes will
arrive, existing Twisted IProtocols have to do this already anyway, and
that seems to work fine.)
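
As an illustration of that point, here is a sketch of a protocol that only
ever sees `data_received` calls and a transport object. `LineProtocol` and
`ListTransport` are hypothetical names invented for the example; the only
assumption about the transport is that it has a `write(data)` method.

```python
class ListTransport:
    """Hypothetical transport: records writes instead of touching a socket."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class LineProtocol:
    """Hypothetical protocol: echoes each newline-terminated line upper-cased."""

    def __init__(self, transport):
        self.transport = transport
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        # Bytes may arrive in arbitrary pieces, so buffer until a full line is seen.
        self._buffer += data
        *complete, self._buffer = self._buffer.split(b"\n")
        for line in complete:
            self.lines.append(line)
            self.transport.write(line.upper() + b"\n")


payload = b"hello\nworld\n"

# "Blocking" use: hand over everything at once.
t1 = ListTransport()
p1 = LineProtocol(t1)
p1.data_received(payload)

# Event-driven use: the same bytes trickle in five at a time.
t2 = ListTransport()
p2 = LineProtocol(t2)
for i in range(0, len(payload), 5):
    p2.data_received(payload[i:i + 5])
```

The protocol cannot tell the two callers apart, which is what makes it
reusable between a blocking API and an event loop.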
 
> Re: forward path for existing asyncore code. I don't remember this being
> raised as an issue. If anything, it was mentioned in passing, and I think
> the answer to it was something to the tune of "asyncore's API is broken,
> fixing it is more important than backwards compat". Essentially I agree with
> Guido that the important part is an upgrade path to a good third-party
> library, which is the part about asyncore that REALLY sucks right now.

I have the feeling that the main reason asyncore sucks is that it
requires you to subclass its Dispatcher class, which has a rather
treacherous interface.

There are at least a few others, but sure, that's an obvious one. Many of the objections I can raise, however, don't matter if there's already an *existing working solution*. I mean, sure, it can't do SSL, but if you have code that does what you want right now, then obviously SSL isn't actually needed.
 
> Regardless, an API upgrade is probably a good idea. I'm not sure if it
> should go in the first PEP: given the separation I've outlined above (which
> may be too spread out...), there's no obvious place to put it besides it
> being a new PEP.

Aren't all your proposals API upgrades?

Sorry, that was incredibly poor wording. I meant something more like an adapter: an upgrade path for existing asyncore code to new and shiny PEP 3153 code.
 
> Re base reactor interface: drawing maximally from the lessons learned in
> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
> etc), asynchronous-looking name lookup, fd handling are the important parts.

That actually sounds more concrete than I'd like a reactor interface
to be. In the App Engine world, there is a definite need for a
reactor, but it cannot talk about file descriptors at all -- all I/O
is defined in terms of RPC operations which have their own (several
layers of) async management but still need to be plugged in to user
code that might want to benefit from other reactor functionality such
as scheduling and placing a call at a certain moment in the future.

I have a hard time understanding how that would work well outside of something like GAE. IIUC, that level of abstraction was chosen because it made sense for GAE (and I don't disagree), but I'm not sure it makes sense here.

In this example, where would eg the select/epoll/whatever calls happen? Is it something that calls the reactor that then in turn calls whatever?

> call_every can be implemented in terms of call_later on a separate object,
> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
> that is apparently forgotten about is event loop integration. The prime way
> of having two event loops cooperate is *NOT* "run both in parallel", it's
> "have one call the other". Even though not all loops support this, I think
> it's important to get this as part of the interface (raise an exception for
> all I care if it doesn't work).
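
To make the call_every remark in the quoted paragraph concrete, here is a minimal sketch. ToyReactor and RepeatingCall are hypothetical names with a made-up API and a manually advanced clock; this is not the signature of twisted.internet.task.LoopingCall, only the same idea of a separate object that reschedules itself through call_later.

```python
import heapq
import itertools


class ToyReactor:
    """Hypothetical reactor offering only call_later, with a simulated clock."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def call_later(self, delay, func, *args):
        entry = (self.now + delay, next(self._counter), func, args)
        heapq.heappush(self._queue, entry)

    def advance(self, seconds):
        """Move the clock forward, running every call that becomes due."""
        deadline = self.now + seconds
        while self._queue and self._queue[0][0] <= deadline:
            when, _, func, args = heapq.heappop(self._queue)
            self.now = when
            func(*args)
        self.now = deadline


class RepeatingCall:
    """Repeats func every `interval` seconds by rescheduling itself."""

    def __init__(self, reactor, interval, func):
        self.reactor = reactor
        self.interval = interval
        self.func = func
        self.running = False

    def start(self):
        self.running = True
        self.reactor.call_later(self.interval, self._tick)

    def stop(self):
        self.running = False

    def _tick(self):
        if not self.running:
            return  # stopped while a call was still queued
        self.func()
        if self.running:
            self.reactor.call_later(self.interval, self._tick)


reactor = ToyReactor()
ticks = []
repeating = RepeatingCall(reactor, 1.0, lambda: ticks.append(reactor.now))
repeating.start()
reactor.advance(3.5)
repeating.stop()
reactor.advance(5.0)  # nothing more fires after stop()
print(ticks)
```

The reactor interface stays small this way: anything periodic lives outside it.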

This is definitely one of the things we ought to get right. My own
thoughts are slightly (perhaps only cosmetically) different again:
ideally each event loop would have a primitive operation to tell it to
run for a little while, and then some other code could tie several
event loops together.

As an API, that's pretty close to Twisted's IReactorCore.iterate, I think. It'd work well enough. The issue is only with event loops that don't cooperate so well.

Possibly the primitive operation would be something like "block until
either you've got one event ready, or until a certain time (possibly
0) has passed without any events, and then give us the events that are
ready and a lower bound for when you might have more work to do" -- or
maybe instead of returning the event(s) it could just call the
associated callback (it might have to if it is part of a GUI library
that has callbacks written in C/C++ for certain events like screen
refreshes).
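
A very simplified sketch of that primitive, assuming a simulated clock and timer-only events (ToyLoop, run_once and drive are hypothetical names, and real loops would block in select/epoll or a GUI toolkit call instead): each loop fires what is due and reports a lower bound for its next work, and some outer code ties several loops together.

```python
import heapq
import itertools


class ToyLoop:
    """Hypothetical event loop exposing a single primitive, run_once(now)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self._timers = []
        self._ids = itertools.count()  # tie-breaker for equal deadlines

    def call_at(self, when, label):
        heapq.heappush(self._timers, (when, next(self._ids), label))

    def run_once(self, now):
        """Fire every event due at `now`; return the next deadline, or None."""
        while self._timers and self._timers[0][0] <= now:
            when, _, label = heapq.heappop(self._timers)
            self.log.append((self.name, when, label))
        return self._timers[0][0] if self._timers else None


def drive(loops, now=0.0):
    """Outer code that ties several loops together until all are idle."""
    while True:
        deadlines = [loop.run_once(now) for loop in loops]
        pending = [d for d in deadlines if d is not None]
        if not pending:
            return now
        now = min(pending)  # a real driver would sleep or poll until then


log = []
gui = ToyLoop("gui", log)
net = ToyLoop("net", log)
gui.call_at(1.0, "redraw")
net.call_at(0.5, "read")
gui.call_at(2.0, "redraw")
finished_at = drive([gui, net])
print(log)
```

Neither loop knows about the other; the only shared contract is run_once and the deadline it returns.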

Anyway, it would be good to have input from representatives from Wx,
Qt, Twisted and Tornado to ensure that the *functionality* required is
all there (never mind the exact signatures of the APIs needed to
provide all that functionality).

 
--
--Guido van Rossum (python.org/~guido)

--
cheers
lvh

Laurens Van Houtven

Oct 12, 2012, 5:29:06 AM10/12/12
to Guido van Rossum, python...@python.org
I'm not quite sure why Deferreds + @inlineCallbacks is more complicated than Futures + coroutines. They seem, at least from a high level perspective, quite similar. You mention that you can both attach callbacks and use them in coroutines: deferreds do pretty much exactly the same thing (that is, at least there's something to translate your coroutine into a sequence of callbacks/errbacks).

If the arcane part of deferreds is from people writing ridiculous errback/callback chains, then I understand. Unfortunately people will write terrible code.

cheers
lvh

Devin Jeanpierre

Oct 12, 2012, 8:44:39 AM10/12/12
to Antoine Pitrou, python...@python.org
On Fri, Oct 12, 2012 at 3:14 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> Mochikit has been dead for years.

From the front page: "MochiKit is "feature complete" at 1.4 and not
currently in active development. It has done what we've needed it to
do for a number of years so we haven't bothered to make any major
changes to it."

Last update to the github repository was a few months ago.

That said, looking at their APIs now, I'm pretty sure mochikit was not
in that presentation. Its API isn't jQuery-like.

> As for the others, just because they are called "Deferred" doesn't mean
> they are the same thing. None of them seems to look like Twisted's
> Deferred abstraction.

They have separate callbacks for error and success, which are passed
values. That is the same. The callback chains are formed from
sequences of deferreds. That's different. If a callback returns a
deferred, then the rest of the chain is only called once that deferred
resolves -- that's the same, and super important.
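
A toy model of the two shared properties, as I understand them (MiniDeferred is a made-up class for illustration, not Twisted's implementation and not any of the JavaScript APIs): separate success and error callbacks, and a chain that pauses when a callback returns another deferred.

```python
class MiniDeferred:
    """Toy deferred: a chain of (on_success, on_error) pairs, fired once."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._paused = False
        self._failed = False
        self._result = None

    def add_callbacks(self, on_success, on_error=None):
        self._callbacks.append((on_success, on_error))
        if self._fired:
            self._run()
        return self

    def callback(self, value):
        self._fire(value, failed=False)

    def errback(self, error):
        self._fire(error, failed=True)

    def _fire(self, result, failed):
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result, self._failed = result, failed
        self._run()

    def _resume(self, result, failed):
        # Called when the inner deferred we were waiting on has resolved.
        self._result, self._failed = result, failed
        self._paused = False
        self._run()

    def _run(self):
        while self._callbacks and not self._paused:
            on_success, on_error = self._callbacks.pop(0)
            handler = on_error if self._failed else on_success
            if handler is None:
                continue  # no handler for this state: pass the result along
            try:
                self._result = handler(self._result)
                self._failed = False
            except Exception as exc:
                self._result, self._failed = exc, True
            if isinstance(self._result, MiniDeferred):
                # Pause the rest of the chain until the inner one resolves.
                inner = self._result
                self._paused = True
                inner.add_callbacks(
                    lambda value: self._resume(value, False),
                    lambda error: self._resume(error, True),
                )


log = []
inner = MiniDeferred()
outer = MiniDeferred()
outer.add_callbacks(lambda value: inner)          # returns a deferred
outer.add_callbacks(lambda value: log.append(value))
outer.callback("start")
log.append("outer fired")                         # chain is paused here
inner.callback("inner result")                    # chain continues now

failing = MiniDeferred()
failing.add_callbacks(lambda value: 1 / 0)
failing.add_callbacks(
    lambda value: log.append("ok"),
    lambda error: log.append(type(error).__name__),
)
failing.callback(None)
print(log)
```

The second callback on `outer` only runs once `inner` has fired, and the exception raised in `failing` is routed to the error side of the next pair.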

There's some API differences, like .addCallbacks() --> .then(); and
.callback() --> .resolve(). And IIRC jQuery had other differences, but
maybe it's just that you use .pipe() to chain deferreds because
.then() returns a Promise instead of a Deferred? I don't remember what
was weird about jQuery, it's been a while since that talk. :(

>> The reason explicit non-deferred callbacks are involved in Twisted is
>> because of situations in which deferreds are not present, because of
>> past history in Twisted. It is not at all a limitation of deferreds or
>> something futures are better at, best as I'm aware.
>
> A Deferred can only be called once, but a dataReceived method can be
> called any number of times. So you can't use a Deferred for
> dataReceived unless you introduce significant hackery.

Haha, oops! I was being dumb and only thinking of minor cases when
callbacks are used, rather than major cases.

Some people complain that Twisted's protocols (and dataReceived)
should be like that GUI button example, though. Not major hackery,
just somewhat nasty and bug-prone.
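
The one-shot versus many-shot distinction can be seen with the stdlib Future from PEP 3148, used here as a stand-in for a Deferred (an assumption on my part; this also assumes Python 3.8 or later, where a second set_result raises InvalidStateError). The Collector class is a made-up example of a dataReceived-style callback.

```python
from concurrent.futures import Future, InvalidStateError

# A future carries exactly one result.
future = Future()
future.set_result(b"first chunk")
try:
    future.set_result(b"second chunk")
    second_accepted = True
except InvalidStateError:
    second_accepted = False


class Collector:
    """Plain callback object: data_received may be called any number of times."""

    def __init__(self):
        self.chunks = []

    def data_received(self, data):
        self.chunks.append(data)


collector = Collector()
for chunk in (b"first chunk", b"second chunk"):
    collector.data_received(chunk)

print(second_accepted, len(collector.chunks))
```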

-- Devin

Guido van Rossum

Oct 12, 2012, 12:41:10 PM10/12/12
to Python-Ideas
I am going to start some new threads on this topic, to avoid going
over 100 messages. Topics will be roughly:

- reactors
- protocol implementations
- Twisted (esp. Deferred)
- Tornado
- yield from vs. Futures

It may be a while (hours, not days).

--
--Guido van Rossum (python.org/~guido)

Christian Tismer

Oct 13, 2012, 7:42:32 AM10/13/12
to Guido van Rossum, Antoine Pitrou, python...@python.org
Hi Guido and folks,

On 07.10.12 17:04, Guido van Rossum wrote:
> On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
>> On Sat, 6 Oct 2012 17:23:48 -0700
>> Guido van Rossum <gu...@python.org> wrote:
>>> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>>>> greenlets/gevents only get you half the advantages of single-threaded
>>>> "async" programming: they get you scalability in the face of a high
>>>> number of concurrent connections, but they don't get you the robustness
>>>> of cooperative multithreading (because it's not obvious when reading
>>>> the code where the possible thread-switching points are).
>>> I used to think that too, long ago, until I discovered that as you add
>>> abstraction layers, cooperative multithreading is untenable -- sooner
>>> or later you will lose track of where the threads are switched.
>> Even with an explicit notation like "yield" / "yield from"?
> If you strictly adhere to using those you should be safe (though
> distinguishing between the two may prove challenging) -- but in
> practice it's hard to get everyone and every API to use this style. So
> you'll have some blocking API calls hidden deep inside what looks like
> a perfectly innocent call to some helper function.
>
> IIUC in Go this is solved by mixing threads and lighter-weight
> constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the
> rest of the system continues to make progress by spawning another
> thread.
>
> My own experience with NDB is that it's just too hard to make everyone
> use the async APIs all the time -- so I gave up and made async APIs an
> optional feature, offering a blocking and an async version of every
> API. I didn't start out that way, but once I started writing
> documentation aimed at unsophisticated users, I realized that it was
> just too much of an uphill battle to bother.
>
> So I think it's better to accept this and deal with it, possibly
> adding locking primitives into the mix that work well with the rest of
> the framework. Building a lock out of a tasklet-based (i.e.
> non-threading) Future class is easy enough.

I'm digging in, a bit late.
Still trying to read the myriad of messages.

For now just a word:
Guido: How much I would love to use your time machine and invite
you to discuss Python's future in 1998.

Then we would have tossed greenlet/stackless and all that crap.
Entering a different context could have been folded deeply into Python,
by making it able to pickle program state in certain positions.

Just dreaming out loud :-)
It is great that this discussion is taking place, and I'll try to help.

cheers - Chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Matthias Urlichs

Oct 16, 2012, 2:40:07 PM10/16/12
to python...@python.org
I'll have to put in my ..02€ here …

Guido van Rossum <guido@...> writes:

> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library.

Yes.

I have two and a half reasons for this.

(½) Ultimately I think that switching stacks around is always going to be faster
than unwinding and re-winding things with yield().

(1) It's a whole lot easier to debug a problem with gevent than with anything
which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
standard stack trace. With anything else, the "where did this call come from"
information is not part of the call chain and thus is either unavailable, or
will have to be carried around preemptively (with associated overhead).

(2) Nothing against Twisted or any other async frameworks, but writing any
nontrivial program in it requires warping my brain into something that's *not*
second nature in Python, and never going to be.

Python is not Javascript; if you want to use the "loads of callbacks"
programming style, use node.js.


Personal experience: I have written an interpreter for an asynchronous and
vaguely Pythonic language which I use for home automation, my lawn sprinklers,
and related stuff (which I should probably release in some form). The code was
previously based on Twisted and was impossible to debug. It now uses gevent and
Just Works.

--
-- Matthias Urlichs

Greg Ewing

Oct 17, 2012, 3:30:16 AM10/17/12
to python...@python.org
Matthias Urlichs wrote:

> (1) It's a whole lot easier to debug a problem with gevent than with anything
> which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
> standard stack trace. With anything else, the "where did this call come from"
> information is not part of the call chain

With yield-from this is no longer true -- you get exactly the same
traceback from a yield-from call chain that you would get from the
corresponding ordinary call chain, without having to do anything
special. This is one of the beauties of it.
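
A small sketch to illustrate (the generator names are made up, and it needs Python 3.3 or later for yield from): an exception raised in the innermost generator carries a traceback that passes through every delegating generator, just like nested function calls would.

```python
import traceback


def inner():
    yield "ready"
    raise ValueError("boom")


def middle():
    yield from inner()


def outer():
    yield from middle()


def frames_seen():
    """Run the chain until it fails; list the function names in the traceback."""
    gen = outer()
    next(gen)  # advance to the first yield inside inner()
    try:
        next(gen)  # resuming makes inner() raise
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


print(frames_seen())
```

The list should contain outer, middle and inner in call order, after the frame that resumed the generator.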

--
Greg

Christian Tismer

Oct 17, 2012, 4:25:03 AM10/17/12
to Matthias Urlichs, python...@python.org
Ok I'll add a buck...

On 16.10.12 20:40, Matthias Urlichs wrote:
> I'll have to put in my ..02€ here …
>
> Guido van Rossum <guido@...> writes:
>
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library.
> Yes.
>
> I have two and a half reasons for this.
>
> (½) Ultimately I think that switching stacks around is always going to be faster
> than unwinding and re-winding things with yield().

If you are emulating things in Python, that may be true.

Also if you are really only switching stacks, that may be true.

But neither assumption holds here; see below.


>
> (1) It's a whole lot easier to debug a problem with gevent than with anything
> which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
> standard stack trace. With anything else, the "where did this call come from"
> information is not part of the call chain and thus is either unavailable, or
> will have to be carried around preemptively (with associated overhead).

I'm absolutely with you on the ease of straightforward coding.
But the new, efficient "yield from" is a big step in that direction;
see Greg's reply.

> (2) Nothing against Twisted or any other async frameworks, but writing any
> nontrivial program in it requires warping my brain into something that's *not*
> second nature in Python, and never going to be.

Same here.


> Python is not Javascript; if you want to use the "loads of callbacks"
> programming style, use node.js.
>
>
> Personal experience: I have written an interpreter for an asynchronous and
> vaguely Pythonic language which I use for home automation, my lawn sprinklers,
> and related stuff (which I should probably release in some form). The code was
> previously based on Twisted and was impossible to debug. It now uses gevent and
> Just Works.

You are using gevent, which uses greenlet!
That means no pure stack switching, but the stack is sliced and
moved onto the heap.
But that technique (originally from Stackless 2.0) is known to be
5-10 times slower than cooperative context switching
that is built into the interpreter.

This story is far from over.
Even PyPy with all its advanced technology still depends on stack slicing
when it emulates concurrency.

Python 3.3 has made a huge move, because this efficient nesting
of generators can deeply influence how people are coding,
maybe with the effect that stack tricks lose more of their
importance. I expect more like this to come.

Greenlets are great. Stack inversion is faster.



Laurens Van Houtven

Oct 18, 2012, 7:01:30 AM10/18/12
to Matthias Urlichs, python...@python.org
Do you use gevent's monkeypatch-the-stdlib feature?


On Tue, Oct 16, 2012 at 8:40 PM, Matthias Urlichs <matt...@urlichs.de> wrote:
I'll have to put in my ..02€ here …

Guido van Rossum <guido@...> writes:

> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library.

Yes.

I have two and a half reasons for this.

(½) Ultimately I think that switching stacks around is always going to be faster
than unwinding and re-winding things with yield().

That seems like something that can be factually proven or counterproven.
 
(1) It's a whole lot easier to debug a problem with gevent than with anything
which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
standard stack trace. With anything else, the "where did this call come from"
information is not part of the call chain and thus is either unavailable, or
will have to be carried around preemptively (with associated overhead).

gevent uses stack slicing, which IIUC is pretty expensive. Why is it not subject to the performance overhead you mention?

Can you give an example of such a crappy stack trace in twisted? I develop in it all day, and get pretty decent stack traces. The closest thing I have to a crappy stack trace is when doing functional tests with an RPC API -- obviously on the client side all I'm going to see is a fairly crappy just-an-exception. That's okay, I also get the server side exception that looks like a plain old Python traceback to me and tells me exactly where the problem is from. 

 
(2) Nothing against Twisted or any other async frameworks, but writing any
nontrivial program in it requires warping my brain into something that's *not*
second nature in Python, and never going to be.

Which ones are you thinking about other than twisted? It seems that the issue you are describing is one of semantics, not so much of whether or not it actually does things asynchronously under the hood, as e.g. gevent does too.
 
Python is not Javascript; if you want to use the "loads of callbacks"
programming style, use node.js.

None of the solutions on the table have node.js-style "loads of callbacks". Everything has some way of structuring them. It's either implicit switches (as in "can happen in the caller"), explicit switches (as in yield/yield from) or something like deferreds, some options having both of the latter.
 
Personal experience: I have written an interpreter for an asynchronous and
vaguely Pythonic language which I use for home automation, my lawn sprinklers,
and related stuff (which I should probably release in some form). The code was
previously based on Twisted and was impossible to debug. It now uses gevent and
Just Works.

If you have undebuggable code samples from that I'd love to take a look.
 

--
-- Matthias Urlichs




--
cheers
lvh
