[Python-ideas] Async API: some code to review

Guido van Rossum

Oct 28, 2012, 7:52:02 PM

to Python-Ideas

I am finally ready to show the code I worked on for the past two
weeks. This is definitely not ready for anything except as a quick
demo, but I learned enough while writing it to feel comfortable with
the PEP 380 paradigm.

I've set up a Hg repo on code.google.com, and I picked a codename:
tulip. View the code here:
http://code.google.com/p/tulip/source/browse/

It runs on Linux and OS X; I have no easy access to Windows but I'd be
happy to take contributions.

Key files in the directory:

- main.py: the main program for testing, and a rough HTTP client
- sockets.py: transports for sockets and SSL, and a buffering layer
- scheduling.py: a Task class and related stuff; this is where the PEP
380 scheduler is implemented
- polling.py: an event loop and basic polling implementations for:
select(), poll(), epoll(), kqueue()

Other junk: .hgignore, Makefile, README, p3time.py (benchmark yield
from vs. plain functions), longlines.py (stupid style checker)

More detailed discussion per file follows; please read the code along
with my description (separately they may not make much sense):

polling.py: http://code.google.com/p/tulip/source/browse/polling.py

I found it remarkably easy to come up with polling implementations
using all those different system calls. I ended up mixing in the
pollster class with the event loop class, although I'm not sure that's
the best design -- perhaps it's better if the event loop just
references the pollster as a separate object.

The pollster has a very simple API: add_reader(fd, callback, *args),
add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
poll(timeout) -> list of events. (fd means file descriptor.) There's
also pollable() which just checks if there are any fds registered. My
implementation requires fd to be an int, but that could easily be
extended to support other types of event sources. I'm not super happy
that I have parallel reader/writer APIs, but passing a separate
read/write flag didn't come out any more elegant, and I don't foresee
other operation types (though I may be wrong).

The event list started out as a tuple of (fd, flag, callback, args),
where flag is 'r' or 'w' (easily extensible); in practice neither the
fd nor the flag are used, and one of the last things I did was to wrap
callback and args into a simple object that allows cancelling the
callback; the add_*() methods return this object. (This could probably
use a little more abstraction.) Note that poll() doesn't call the
callbacks -- that's up to the event loop.

The event loop has two basic ways to register callbacks:
call_soon(callback, *args) causes callback(*args) to be called the
next time the event loop runs; call_later(delay, callback, *args)
schedules a callback at some time (relative or absolute) in the
future. It also inherits add_reader() and add_writer() from the
pollster. Then there is run(), which runs the event loop until there's
nothing left to do (no readers, no writers, no soon or later
callbacks), and run_once(), which goes through the entire list of
event sources once. (I think the order in which I do this isn't quite
right but it works for now.)
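
As a rough illustration of call_soon(), call_later(), run_once() and run(), here is a minimal sketch with no I/O at all. The class name MiniLoop is hypothetical, delays are treated as relative only, and the ordering inside run_once() is just one possible choice, not necessarily the one polling.py makes.

```python
import heapq
import time
from collections import deque


class MiniLoop:
    """Hypothetical loop with only 'soon' and 'later' callbacks (no I/O)."""

    def __init__(self):
        self.ready = deque()   # (callback, args) to run on the next pass
        self.scheduled = []    # heap of (when, seq, callback, args)
        self.seq = 0           # tie-breaker so callbacks are never compared

    def call_soon(self, callback, *args):
        self.ready.append((callback, args))

    def call_later(self, delay, callback, *args):
        self.seq += 1
        when = time.monotonic() + delay
        heapq.heappush(self.scheduled, (when, self.seq, callback, args))

    def run_once(self):
        # Move timers that are due into the ready queue.
        now = time.monotonic()
        while self.scheduled and self.scheduled[0][0] <= now:
            _, _, callback, args = heapq.heappop(self.scheduled)
            self.ready.append((callback, args))
        # Run only what is ready now; callbacks added meanwhile wait a pass.
        for _ in range(len(self.ready)):
            callback, args = self.ready.popleft()
            callback(*args)

    def run(self):
        while self.ready or self.scheduled:
            if not self.ready:
                # Nothing to do yet: sleep until the earliest timer is due.
                time.sleep(max(0.0, self.scheduled[0][0] - time.monotonic()))
            self.run_once()


loop = MiniLoop()
order = []
loop.call_later(0.02, order.append, "later")
loop.call_soon(order.append, "soon")
loop.run()
```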

Finally, there's a helper class (ThreadRunner) here which lets you run
something in a separate thread using the features of
concurrent.futures. It uses the "self-pipe trick" (Google it :-) to
ensure that the poll() call wakes up -- this is needed by
call_in_thread() at the next layer (scheduling.py). (There may be a
race condition here, but I think it can be fixed.)
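
For readers unfamiliar with the self-pipe trick, a minimal Unix-only sketch follows. It is not the ThreadRunner code: it uses a bare thread and select() to show only the wake-up mechanism, and the worker function is a made-up stand-in.

```python
import os
import select
import threading

# The loop thread polls the read end; any other thread writes to the write end.
read_fd, write_fd = os.pipe()


def worker(results):
    results.append(sum(range(1000)))  # stand-in for blocking work
    os.write(write_fd, b"x")          # wake up the select() call below


results = []
thread = threading.Thread(target=worker, args=(results,))
thread.start()

# Without the pipe, this select() would have nothing to wake it up.
ready, _, _ = select.select([read_fd], [], [], 5.0)
woken = bool(ready)
if woken:
    os.read(read_fd, 1)  # drain the wake-up byte
thread.join()
os.close(read_fd)
os.close(write_fd)
```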

Note that there are no yields (or yield froms) here; that's for the next layer:

scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py

This is the scheduler for PEP-380 style coroutines. I started with a
Scheduler class and operations along the lines of Greg Ewing's design,
with a Scheduler instance as a global variable, but ended up ripping
it out in favor of a Task object that represents a single stack of
generators chained via yield-from. There is a Context object holding
the event loop and the current task in thread-local storage, so that
multiple threads can (and must) have independent event loops.

Most user (and much library) code in this system should be written as
generators invoking other generators directly using yield from.
However, to run something as an independent task, you wrap the
generator call in a Task() constructor, possibly giving it a timeout,
and then call its start() method. A Task also acts a little like a
future -- you can wait() for it, add done-callbacks, and it preserves
the return value of the generator call. This can be used to introduce
concurrency or to give something a separate timeout. (There are also
primitives to wait for the first N completed of a bunch of Tasks.)

To invoke a primitive I/O operation, you call the current task's
block() method and then immediately yield (similar to Greg Ewing's
approach). There are helpers block_r() and block_w() that arrange for
a task to block until a file descriptor is ready for reading/writing.
Examples of their use are in sockets.py.
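
The Task and bare-yield idea can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions, not the code in scheduling.py: the hypothetical pause() stands in for block_r()/block_w(), and a paused task is re-queued immediately instead of waiting for a file descriptor.

```python
from collections import deque

ready = deque()  # tasks that can take another step


class Task:
    """Hypothetical Task: drives one stack of generators chained by yield from."""

    def __init__(self, gen):
        self.gen = gen
        self.result = None
        self.done = False

    def start(self):
        ready.append(self)

    def step(self):
        try:
            value = next(self.gen)
        except StopIteration as exc:
            self.done = True
            self.result = exc.value  # the generator's return value
        else:
            assert value is None, "only bare yields may reach the scheduler"
            ready.append(self)


def pause():
    """COROUTINE: give other tasks a turn (stands in for block_r()/block_w())."""
    yield  # the bare yield is the only real context switch


def countdown(name, n, log):
    """COROUTINE: log n steps, pausing after each one."""
    for i in range(n, 0, -1):
        log.append((name, i))
        yield from pause()
    return name + " done"


log = []
tasks = [Task(countdown("a", 2, log)), Task(countdown("b", 1, log))]
for task in tasks:
    task.start()
while ready:
    ready.popleft().step()
```

The two tasks interleave at each pause, and each Task keeps the return value of its generator, much like a future.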

There is also call_in_thread() which integrates with
polling.ThreadRunner to run a function in a separate thread and wait
for it. Also used in sockets.py.

In the docstrings I use the prefix "COROUTINE:" to indicate public
APIs that should be invoked using yield from.

sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py

This implements some internet primitives using the APIs in
scheduling.py (including block_r() and block_w()). I call them
transports but they are different from Twisted's transports; they are
closer to idealized sockets. SocketTransport wraps a plain socket,
offering recv() and send() methods that must be invoked using yield
from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
stdlib ssl sockets have good async support!). Then there is a
BufferedReader class that implements more traditional read() and
readline() coroutines (i.e., to be invoked using yield from), the
latter handy for line-oriented transports. Finally there are some
functions for connecting sockets, the highest-level one being
create_transport(). These use call_in_thread() to run
socket.getaddrinfo() in a thread (this provides IPv6 support).
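
The idea of pushing getaddrinfo() into a thread can be sketched with concurrent.futures directly. This is an assumption-laden illustration, not call_in_thread() itself: future.result() blocks the caller here, whereas in the design above the task would wait without blocking the event loop. A numeric address is used so no DNS lookup is needed.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    # getaddrinfo() has no non-blocking form, so it runs in a worker thread.
    future = executor.submit(
        socket.getaddrinfo, "127.0.0.1", 80, type=socket.SOCK_STREAM)
    infos = future.result(timeout=5)

# Each entry is (family, type, proto, canonname, sockaddr).
family, socktype, proto, canonname, sockaddr = infos[0]
```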

I don't particularly care about the exact abstractions in this module;
they are convenient and I was surprised how easy it was to add SSL,
but still these mostly serve as somewhat realistic examples of how to
use scheduling.py. (Afterthought: I think the SocketTransport's recv()
and send() methods could be made more similar to SslTransport.)

More examples in the final file:

main.py: http://code.google.com/p/tulip/source/browse/main.py

There is a simplistic HTTP client here built on top of the
sockets.*Transport abstractions. And the main code exercises this by
spawning four tasks fetching a variety of URLs (more when you
uncomment a block of code) and waiting for their results. The code is
a bit of a mess because I used it as a place to try out various APIs.

I'm most interested in feedback on the design of polling.py and
scheduling.py, and to a lesser extent on the design of sockets.py;
main.py is just an example of how this style works out in practice.

Sorry for the brain-dump style; I would like to write it all up
better, but at the same time waiting longer doesn't necessarily make
it better, so here it is, for all to see. (I also have a list of
problems I had to debug during the development and what I learned from
that; but that's too raw to post right now.)

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Richard Oudkerk

Oct 29, 2012, 9:13:15 AM

to python...@python.org

On 28/10/2012 11:52pm, Guido van Rossum wrote:
> I'm most interested in feedback on the design of polling.py and
> scheduling.py, and to a lesser extent on the design of sockets.py;
> main.py is just an example of how this style works out in practice.

What happens if two tasks try to do a read op (or two tasks try to do a
write op) on the same file descriptor? It looks like the second one to
do scheduling.block_r(fd) will cause the first task to be forgotten,
causing the first task to block forever.

Shouldn't there be a list of pending readers and a list of pending
writers for each fd?
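
A minimal sketch of that suggestion, assuming a hypothetical WaiterRegistry class and a wake-one-waiter-per-event policy (other policies are possible); none of these names come from tulip:

```python
from collections import defaultdict, deque


class WaiterRegistry:
    """Hypothetical registry keeping every pending reader for each fd."""

    def __init__(self):
        self.readers = defaultdict(deque)  # fd -> queue of waiting callbacks

    def add_reader(self, fd, callback):
        self.readers[fd].append(callback)

    def fd_readable(self, fd):
        # Wake one waiter per readiness event, in FIFO order.
        waiters = self.readers.get(fd)
        if waiters:
            callback = waiters.popleft()
            if not waiters:
                del self.readers[fd]
            callback()


registry = WaiterRegistry()
woken = []
registry.add_reader(7, lambda: woken.append("first"))
registry.add_reader(7, lambda: woken.append("second"))
registry.fd_readable(7)
registry.fd_readable(7)
```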

--
Richard

Steve Dower

Oct 29, 2012, 10:00:35 AM

to Richard Oudkerk, python...@python.org

Richard Oudkerk wrote:
> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>> I'm most interested in feedback on the design of polling.py and
>> scheduling.py, and to a lesser extent on the design of sockets.py;
>> main.py is just an example of how this style works out in practice.
>
> What happens if two tasks try to do a read op (or two tasks try to do a
> write op) on the same file descriptor? It looks like the second one to
> do scheduling.block_r(fd) will cause the first task to be forgotten,
> causing the first task to block forever.

I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time. We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.

IMO, the important questions are:

- how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
- how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
- how easy/difficult/flexible/restrictive is it to write async operations as an end user?
- how straightforward is it to consume async operations?
- how easy is it to write async code that is correct?

Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.

Cheers,
Steve

Guido van Rossum

Oct 29, 2012, 10:47:55 AM

to Steve Dower, Richard Oudkerk, python...@python.org

On Mon, Oct 29, 2012 at 7:00 AM, Steve Dower <Steve...@microsoft.com> wrote:
> Richard Oudkerk wrote:
>> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>>> I'm most interested in feedback on the design of polling.py and
>>> scheduling.py, and to a lesser extent on the design of sockets.py;
>>> main.py is just an example of how this style works out in practice.
>>
>> What happens if two tasks try to do a read op (or two tasks try to do a
>> write op) on the same file descriptor? It looks like the second one to
>> do scheduling.block_r(fd) will cause the first task to be forgotten,
>> causing the first task to block forever.
>
> I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time.

Kind of. I think if it was an important use case it might affect the
shape of the API. However I can't think of a use case where it might
make sense for two tasks to read or write the same file descriptor
without some higher-level mediation. (Even at a higher level I find it
hard to imagine, except for writing to a common log file -- but even
there you want to be sure that individual lines aren't spliced into
each other, and the semantics of send() don't prevent that.)

> We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.
>
> IMO, the important questions are:
>
> - how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
> - how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
> - how easy/difficult/flexible/restrictive is it to write async operations as an end user?
> - how straightforward is it to consume async operations?
> - how easy is it to write async code that is correct?

Yes, these are all important questions. I'm not sure that end users
would be writing new schedulers -- but 3rd party library developers
will be, and I suppose that's what you are referring to.

My own approach to answering these is to first try to figure out what
a typical application would be trying to accomplish. That's why I made
a point of implementing a 100% async HTTP client -- it's just quirky
enough that it exercises various issues (e.g. switching between
line-mode and blob mode, and the need to invoke getaddrinfo()).

> Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.

I'm looking forward to it! I suspect we'll be merging our designs shortly...

--
--Guido van Rossum (python.org/~guido)

Richard Oudkerk

Oct 29, 2012, 12:03:07 PM

to python...@python.org

On 29/10/2012 2:47pm, Guido van Rossum wrote:
> Kind of. I think if it was an important use case it might affect the
> shape of the API. However I can't think of a use case where it might
> make sense for two tasks to read or write the same file descriptor
> without some higher-level mediation. (Even at a higher level I find it
> hard to imagine, except for writing to a common log file -- but even
> there you want to be sure that individual lines aren't spliced into
> each other, and the semantics of send() don't prevent that.)

It is a common pattern to have multiple threads/processes trying to
accept connections on a single listening socket, so it would be
unfortunate to disallow that. Writing (short messages) to a pipe also
has atomic guarantees that can make having multiple writers perfectly
reasonable.

--
Richard

Antoine Pitrou

Oct 29, 2012, 12:07:31 PM

to python...@python.org

Hello Guido,

On Sun, 28 Oct 2012 16:52:02 -0700,
Guido van Rossum <gu...@python.org> wrote:

>
> The event list started out as a tuple of (fd, flag, callback, args),
> where flag is 'r' or 'w' (easily extensible); in practice neither the
> fd nor the flag are used, and one of the last things I did was to wrap
> callback and args into a simple object that allows cancelling the
> callback; the add_*() methods return this object. (This could probably
> use a little more abstraction.) Note that poll() doesn't call the
> callbacks -- that's up to the event loop.

I don't understand why the pollster takes callback objects if it never
calls them. Also the fact that it wraps them into DelayedCalls is more
mysterious to me. DelayedCalls represent one-time cancellable callbacks
with a given deadline, not callbacks which are called any number of
times on I/O events and that you can't cancel.

> scheduling.py:
> http://code.google.com/p/tulip/source/browse/scheduling.py
>
> This is the scheduler for PEP-380 style coroutines. I started with a
> Scheduler class and operations along the lines of Greg Ewing's design,
> with a Scheduler instance as a global variable, but ended up ripping
> it out in favor of a Task object that represents a single stack of
> generators chained via yield-from. There is a Context object holding
> the event loop and the current task in thread-local storage, so that
> multiple threads can (and must) have independent event loops.

YMMV, but I tend to be wary of implicit thread-local storage. What if
someone runs a function or method depending on that thread-local
storage from inside a thread pool? Weird bugs ensue.

I think explicit context is much less error-prone. Even a single global
instance (like Twisted's reactor) would be better :-)

As for the rest of the scheduling module, I can't say much since I have
a hard time reading and understanding it.

> To invoke a primitive I/O operation, you call the current task's
> block() method and then immediately yield (similar to Greg Ewing's
> approach). There are helpers block_r() and block_w() that arrange for
> a task to block until a file descriptor is ready for reading/writing.
> Examples of their use are in sockets.py.

That's weird and kind of ugly IMHO. Why would you write:

scheduling.block_w(self.sock.fileno())
yield

instead of say:

yield scheduling.block_w(self.sock.fileno())

?

Also, the fact that each call to SocketTransport.{recv,send} explicitly
registers then removes the fd on the event loop looks wasteful.

By the way, even when a fd is signalled ready, you must still be
prepared for recv() to return EAGAIN (see
http://bugs.python.org/issue9090).
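
A minimal sketch of that defensive pattern, assuming Python 3.3 or later (where EAGAIN and EWOULDBLOCK surface as BlockingIOError); the helper name try_recv is made up for the example:

```python
import select
import socket


def try_recv(sock, nbytes):
    """Return received bytes, or None if the socket was not really ready."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:  # covers EAGAIN / EWOULDBLOCK
        return None


a, b = socket.socketpair()
a.setblocking(False)
spurious = try_recv(a, 100)          # nothing sent yet: like a false wakeup
b.send(b"hello")
select.select([a], [], [], 1.0)      # wait until the data is readable
data = try_recv(a, 100)
a.close()
b.close()
```

A caller that gets None would simply go back to waiting on the fd instead of treating it as an error.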

> In the docstrings I use the prefix "COROUTINE:" to indicate public
> APIs that should be invoked using yield from.

Hmm, should they? Your approach looks a bit weird: you have functions
that should use yield, and others that should use "yield from"? That
sounds confusing to me.

I'd much rather either have all functions use "yield", or have all
functions use "yield from".

(also, I wouldn't be shocked if coroutines had to wear a special
decorator; it's a better marker than having the word COROUTINE in the
docstring, anyway :-))

> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
>
> This implements some internet primitives using the APIs in
> scheduling.py (including block_r() and block_w()). I call them
> transports but they are different from Twisted's transports; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

SslTransport.{recv,send} need the same kind of logic as do_handshake():
catch both SSLWantReadError and SSLWantWriteError, and call block_r /
block_w accordingly.
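
The retry logic could look roughly like the sketch below. It assumes Python 3.3 or later, where ssl.SSLWantReadError and ssl.SSLWantWriteError exist; the wait_readable and wait_writable parameters are hypothetical stand-ins for block_r() and block_w(), and a fake socket is used so the example runs without a network connection.

```python
import ssl


def ssl_recv(sslsock, nbytes, wait_readable, wait_writable):
    """Retry recv() until the SSL layer can make progress.

    wait_readable and wait_writable are hypothetical stand-ins for
    block_r()/block_w(); here they are plain callables taking the socket.
    """
    while True:
        try:
            return sslsock.recv(nbytes)
        except ssl.SSLWantReadError:
            wait_readable(sslsock)
        except ssl.SSLWantWriteError:
            # A read can need a write, e.g. during renegotiation.
            wait_writable(sslsock)


class FakeSSLSocket:
    """Test double that fails twice before returning data."""

    def __init__(self):
        self.errors = [ssl.SSLWantReadError(), ssl.SSLWantWriteError()]

    def recv(self, nbytes):
        if self.errors:
            raise self.errors.pop(0)
        return b"payload"[:nbytes]


waits = []
data = ssl_recv(FakeSSLSocket(), 100,
                lambda s: waits.append("r"), lambda s: waits.append("w"))
```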

> Then there is a
> BufferedReader class that implements more traditional read() and
> readline() coroutines (i.e., to be invoked using yield from), the
> latter handy for line-oriented transports.

Well... It would be nice if BufferedReader could re-use the actual
io.BufferedReader and its fast readline(), read(), readinto()
implementations.

Regards

Antoine.

Mark Hackett

Oct 29, 2012, 12:09:51 PM

to python...@python.org

On Monday 29 Oct 2012, Richard Oudkerk wrote:
> Writing (short messages) to a pipe also
> has atomic guarantees that can make having multiple writers perfectly
> reasonable.
>
> --
> Richard

Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux
(because of the string operations available in the x86 instruction set), but I
don't think anything other than an IPC message has a "you can write a string
atomically" guarantee. And I may be misremembering that.

And even if it's part of the SUS, how do we know this is true for non-UNIX
compatible systems?

Guido van Rossum

Oct 29, 2012, 12:35:16 PM

to Richard Oudkerk, python...@python.org

On Mon, Oct 29, 2012 at 9:03 AM, Richard Oudkerk <shib...@gmail.com> wrote:
> On 29/10/2012 2:47pm, Guido van Rossum wrote:
>>
>> Kind of. I think if it was an important use case it might affect the
>> shape of the API. However I can't think of a use case where it might
>> make sense for two tasks to read or write the same file descriptor
>> without some higher-level mediation. (Even at a higher level I find it
>> hard to imagine, except for writing to a common log file -- but even
>> there you want to be sure that individual lines aren't spliced into
>> each other, and the semantics of send() don't prevent that.)
>
>
> It is a common pattern to have multiple threads/processes trying to accept
> connections on a single listening socket, so it would be unfortunate to
> disallow that.

Ah, but that will work -- each thread has its own pollster, event loop
and scheduler and collection of tasks. And listening on a socket is a
pretty special case anyway -- I imagine we'd build a special API just
for that purpose.

> Writing (short messages) to a pipe also has atomic
> guarantees that can make having multiple writers perfectly reasonable.

That's a good one. I'll keep that on the list of requirements.

--
--Guido van Rossum (python.org/~guido)

Richard Oudkerk

Oct 29, 2012, 12:41:57 PM

to python...@python.org

On 29/10/2012 4:09pm, Mark Hackett wrote:
> Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux
> (because of the string operations available in the x86 instruction set), but I
> don't think anything other than an IPC message has a "you can write a string
> atomically" guarantee. And I may be misremembering that.

The guarantee I was talking about is for pipes on Unix:

<quote>
POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be
atomic: the output data is written to the pipe as a contiguous sequence.
Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may
interleave the data with data written by other processes. POSIX.1-2001
requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096
bytes.) ...
</quote>

On Windows writes to pipes in message oriented mode are also atomic.
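
On Unix, Python exposes this limit as select.PIPE_BUF. A minimal sketch of a writer that stays within the guarantee quoted above; the helper name atomic_pipe_write is made up, and the check is deliberately conservative:

```python
import os
import select

# select.PIPE_BUF is only defined on Unix.
limit = select.PIPE_BUF


def atomic_pipe_write(fd, message):
    """Write message in one write() call, refusing sizes the quote does not cover."""
    if len(message) >= limit:
        # The quoted text covers writes of less than PIPE_BUF bytes.
        raise ValueError("message too long to be guaranteed atomic")
    return os.write(fd, message)


message = b"one short log line\n"
r, w = os.pipe()
written = atomic_pipe_write(w, message)
received = os.read(r, limit)
os.close(r)
os.close(w)
```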

> And even if it's part of the SUS, how do we know this is true for non-UNIX
> compatible systems?

We don't, but that isn't necessarily a reason to ban it as evil.

Mark Hackett

Oct 29, 2012, 12:46:13 PM

to python...@python.org

On Monday 29 Oct 2012, Richard Oudkerk wrote:
>

> On Windows writes to pipes in message oriented mode are also atomic.
>
> > And even if it's part of the SUS, how do we know this is true for
> > non-UNIX compatible systems?
>
> We don't, but that isn't necessarily a reason to ban it as evil.

Good thing I didn't say to ban it, then, eh?

But if the OS cannot guarantee atomic writes (and enforce the size limit that
keeps writes atomic on the system in use), then you cannot just say "Atomic
writes mean we can safely have multiple threads accessing the pipe".

The multiple access requires atomic access.

If that cannot be guaranteed, then you cannot give multiple access.

Yury Selivanov

Oct 29, 2012, 12:47:50 PM

to Antoine Pitrou, python...@python.org

On 2012-10-29, at 12:07 PM, Antoine Pitrou <soli...@pitrou.net> wrote:

>> To invoke a primitive I/O operation, you call the current task's
>> block() method and then immediately yield (similar to Greg Ewing's
>> approach). There are helpers block_r() and block_w() that arrange for
>> a task to block until a file descriptor is ready for reading/writing.
>> Examples of their use are in sockets.py.
>
> That's weird and kind of ugly IMHO. Why would you write:
>
> scheduling.block_w(self.sock.fileno())
> yield
>
> instead of say:
>
> yield scheduling.block_w(self.sock.fileno())
>
> ?

I, personally, like and use the second approach. But I believe the
main incentive for Guido & Greg to use 'yields' like that is to make
one thing *very* clear: always use 'yield from' to call something.
The 'yield' statement is just an explicit context switch point, and it
should be used only for that purpose and only when you write
low-level APIs.

-
Yury

Mark Hackett

Oct 29, 2012, 12:47:46 PM

to python...@python.org

On Monday 29 Oct 2012, Richard Oudkerk wrote:

> On Windows writes to pipes in message oriented mode are also atomic.
>

PS: this means, as I suggested earlier, that you have to be using IPC messages
to get guaranteed atomic writes.

If someone has a Python program with multiple threads accessing the pipe, but
that pipe is NOT running in message-oriented mode, then you will get
corruption.

Yury Selivanov

Oct 29, 2012, 12:59:12 PM

to Antoine Pitrou, python...@python.org

On 2012-10-29, at 12:07 PM, Antoine Pitrou <soli...@pitrou.net> wrote:

>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
>
> Hmm, should they? Your approach looks a bit weird: you have functions
> that should use yield, and others that should use "yield from"? That
> sounds confusing to me.
>
> I'd much rather either have all functions use "yield", or have all
> functions use "yield from".
>
> (also, I wouldn't be shocked if coroutines had to wear a special
> decorator; it's a better marker than having the word COROUTINE in the
> docstring, anyway :-))

That's what bothers me as well. 'yield from' looks too long for a
simple thing it does (1); users will be confused whether they should
use 'yield' or 'yield from' (2); there is no visible difference between
a plain generator and a coroutine (3).

Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword).
With that approach it's easy to distinguish coroutines, generators and
plain functions. And it'd be easier to add some special
methods/properties to codefs, like 'in_finally()' method etc.

-
Yury

Guido van Rossum

Oct 29, 2012, 1:03:00 PM

to Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Sun, 28 Oct 2012 16:52:02 -0700,
> Guido van Rossum <gu...@python.org> wrote:
>> The event list started out as a tuple of (fd, flag, callback, args),
>> where flag is 'r' or 'w' (easily extensible); in practice neither the
>> fd nor the flag are used, and one of the last things I did was to wrap
>> callback and args into a simple object that allows cancelling the
>> callback; the add_*() methods return this object. (This could probably
>> use a little more abstraction.) Note that poll() doesn't call the
>> callbacks -- that's up to the event loop.
>
> I don't understand why the pollster takes callback objects if it never
> calls them. Also the fact that it wraps them into DelayedCalls is more
> mysterious to me. DelayedCalls represent one-time cancellable callbacks
> with a given deadline, not callbacks which are called any number of
> times on I/O events and that you can't cancel.

Yeah, this part definitely needs reworking. In the current design the
pollster is a base class of the eventloop, and the latter *does* call
them; but I want to refactor that anyway. I'll probably end up with a
pollster that registers (what are to it) opaque tokens and returns
just a list of tokens from poll(). (Unrelated: would it be useful if
poll() was an iterator?)

>> scheduling.py:
>> http://code.google.com/p/tulip/source/browse/scheduling.py
>>
>> This is the scheduler for PEP-380 style coroutines. I started with a
>> Scheduler class and operations along the lines of Greg Ewing's design,
>> with a Scheduler instance as a global variable, but ended up ripping
>> it out in favor of a Task object that represents a single stack of
>> generators chained via yield-from. There is a Context object holding
>> the event loop and the current task in thread-local storage, so that
>> multiple threads can (and must) have independent event loops.
>
> YMMV, but I tend to be wary of implicit thread-local storage. What if
> someone runs a function or method depending on that thread-local
> storage from inside a thread pool? Weird bugs ensue.

Agreed, I had to figure out one of these in the implementation of
call_in_thread() and it wasn't fun.

I don't know what else to do -- I think it's probably best if I base
my implementation on this for now so that I know it works correctly in
such an environment. In the end there will probably be an API to get
the current context and another to influence how that API gets it, so
people can plug in their own schemes, from TLS to a simple global to
something determined by an external library.
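
One way such a pluggable scheme might look is sketched below. Every name here (Context, get_context, set_context_policy and the two policies) is hypothetical; no such API exists in the posted code.

```python
import threading


class Context:
    """Hypothetical holder for per-loop state (event loop, current task)."""

    def __init__(self):
        self.eventloop = None
        self.current_task = None


_tls = threading.local()
_shared = Context()


def thread_local_policy():
    # One Context per thread, created on first use.
    if not hasattr(_tls, "context"):
        _tls.context = Context()
    return _tls.context


def global_policy():
    # One Context shared by every thread.
    return _shared


_policy = thread_local_policy


def set_context_policy(policy):
    """Choose how get_context() finds the current context."""
    global _policy
    _policy = policy


def get_context():
    return _policy()


def context_seen_in_thread():
    # Ask a separate thread which context it sees.
    seen = []
    thread = threading.Thread(target=lambda: seen.append(get_context()))
    thread.start()
    thread.join()
    return seen[0]


tls_differs = context_seen_in_thread() is not get_context()
set_context_policy(global_policy)
global_same = context_seen_in_thread() is get_context()
```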

> I think explicit context is much less error-prone. Even a single global
> instance (like Twisted's reactor) would be better :-)

I find that passing the context around everywhere makes for awkward APIs though.

> As for the rest of the scheduling module, I can't say much since I have
> a hard time reading and understanding it.

That's a problem, I need to write this up properly so that everyone
can understand it.

>> To invoke a primitive I/O operation, you call the current task's
>> block() method and then immediately yield (similar to Greg Ewing's
>> approach). There are helpers block_r() and block_w() that arrange for
>> a task to block until a file descriptor is ready for reading/writing.
>> Examples of their use are in sockets.py.
>
> That's weird and kind of ugly IMHO. Why would you write:
>
> scheduling.block_w(self.sock.fileno())
> yield
>
> instead of say:
>
> yield scheduling.block_w(self.sock.fileno())
>
> ?

This has been debated ad nauseam already (be glad you missed it);
basically, there's not a whole lot of difference but if there are some
APIs that require "yield X(args)" and others that require "yield from
Y(args)" that's really confusing. The "bare yield only" makes it
possible (though I didn't implement it here) to put some strict checks
in the scheduler -- next() should never return anything except None.
But there are other ways to do that too.

Anyway, I probably will change the API so that e.g. sockets.py doesn't
have to use this paradigm; I'll just wrap these low-level APIs in a
proper "coroutine" and then sockets.py can just use "yield from
block_r(fd)". (This is one reason why I like the "bare generators with
yield from" approach that Greg Ewing and PEP 380 recommend: it's
really cheap to wrap an API in an extra layer of yield-from; see the
yyftime.py benchmark I added to the tulip directory.)

> Also, the fact that each call to SocketTransport.{recv,send} explicitly
> registers then removes the fd on the event loop looks wasteful.

I am hoping to add some optimization for this -- I am actually
planning a hackathon (or re-education session :-) with some Twisted
folks where I hope they'll explain to me how they do this.

> By the way, even when a fd is signalled ready, you must still be
> prepared for recv() to return EAGAIN (see
> http://bugs.python.org/issue9090).

Yeah, I should know, I ran into this for a Google project too (there
was a kernel driver that was lying...). I had a cryptic remark in my
post above referring to this.

>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
>
> Hmm, should they? Your approach looks a bit weird: you have functions
> that should use yield, and others that should use "yield from"? That
> sounds confusing to me.

Yeah, see above.

> I'd much rather either have all functions use "yield", or have all
> functions use "yield from".

Agreed, and I'm strongly in favor of "yield from". The block_r() +
yield is considered an *internal* API.

> (also, I wouldn't be shocked if coroutines had to wear a special
> decorator; it's a better marker than having the word COROUTINE in the
> docstring, anyway :-))

Agreed it would be useful as documentation, and maybe an API can use
this to enforce proper coding style. It would have to be purely
decoration though -- I don't want an extra layer of wrapping to occur
each time you call a coroutine. (I.e. the decorator should just return
"func".)

>> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
>>
>> This implements some internet primitives using the APIs in
>> scheduling.py (including block_r() and block_w()). I call them
>> transports but they are different from transports in Twisted; they are
>> closer to idealized sockets. SocketTransport wraps a plain socket,
>> offering recv() and send() methods that must be invoked using yield
>> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
>> stdlib ssl sockets have good async support!).
>
> SslTransport.{recv,send} need the same kind of logic as do_handshake():
> catch both SSLWantReadError and SSLWantWriteError, and call block_r /
> block_w accordingly.

Oh... Thanks for the tip. I didn't find this in the ssl module docs.
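A rough sketch of the retry logic Antoine describes, under the assumption that waiting for readability or writability can be expressed by yielding a request to the scheduler. The tuples ("r", fd) and ("w", fd) are stand-ins for "yield from block_r(fd)" and "yield from block_w(fd)"; ssl_retry, run and fake_recv are hypothetical names, and the fake operation only simulates what a non-blocking SSL socket might raise.

```python
import ssl

def ssl_retry(op, fd):
    """Generator: call op() until it stops asking for more I/O."""
    while True:
        try:
            return op()
        except ssl.SSLWantReadError:
            yield ("r", fd)    # stand-in for: yield from block_r(fd)
        except ssl.SSLWantWriteError:
            yield ("w", fd)    # stand-in for: yield from block_w(fd)

def run(gen):
    """Toy driver: record each wait request and return the final result."""
    waits = []
    try:
        while True:
            waits.append(next(gen))
    except StopIteration as stop:
        return waits, stop.value

# Simulated behaviour: want-read, then want-write, then success.
outcomes = [ssl.SSLWantReadError(), ssl.SSLWantWriteError(), b"data"]

def fake_recv():
    item = outcomes.pop(0)
    if isinstance(item, Exception):
        raise item
    return item

waits, result = run(ssl_retry(fake_recv, 7))
```

Note that a recv() may need to wait for writability and a send() for readability (for example during renegotiation), which is why both exceptions are handled in both directions.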

>> Then there is a
>> BufferedReader class that implements more traditional read() and
>> readline() coroutines (i.e., to be invoked using yield from), the
>> latter handy for line-oriented transports.
>
> Well... It would be nice if BufferedReader could re-use the actual
> io.BufferedReader and its fast readline(), read(), readinto()
> implementations.

Agreed, I would love that too, but the problem is, *this*
BufferedReader defines methods you have to invoke with yield from.
Maybe we can come up with a solution for sharing code by modifying the
_io module though; that would be great! (I've also been thinking of
layering TextIOWrapper on top of these.)

Thanks for the thorough review!

--
--Guido van Rossum (python.org/~guido)

Cesare Di Mauro

Oct 29, 2012, 1:02:09 PM

to Mark Hackett, python...@python.org

2012/10/29 Mark Hackett <mark.h...@metoffice.gov.uk>

On Monday 29 Oct 2012, Richard Oudkerk wrote:
> Writing (short messages) to a pipe also
> has atomic guarantees that can make having multiple writers perfectly
> reasonable.
>
> --
> Richard

Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux
(because of the string operations available in the x86 instruction set), but I
don't think anything other than an IPC message has a "you can write a string
atomically" guarantee. And I may be misremembering that.

x86 and x64 string operations aren't atomic. Only a few, selected, instructions can be LOCK prefixed (XCHG is the only one that doesn't require it, since it's always locked) to ensure an atomic RMW memory operation.

Regards,

Cesare

Yury Selivanov

Oct 29, 2012, 1:08:14 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

On 2012-10-29, at 1:03 PM, Guido van Rossum <gu...@python.org> wrote:

> Agreed it would be useful as documentation, and maybe an API can use
> this to enforce proper coding style. It would have to be purely
> decoration though -- I don't want an extra layer of wrapping to occur
> each time you call a coroutine. (I.e. the decorator should just return
> "func".)

I'd also set something like 'func.__coroutine__' to True. That will allow
to analyze, introspect, validate and do other useful things.

-
Yury

Guido van Rossum

Oct 29, 2012, 1:43:09 PM

to Yury Selivanov, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 10:08 AM, Yury Selivanov
<yseliv...@gmail.com> wrote:
> On 2012-10-29, at 1:03 PM, Guido van Rossum <gu...@python.org> wrote:
>
>> Agreed it would be useful as documentation, and maybe an API can use
>> this to enforce proper coding style. It would have to be purely
>> decoration though -- I don't want an extra layer of wrapping to occur
>> each time you call a coroutine. (I.e. the decorator should just return
>> "func".)
>
> I'd also set something like 'func.__coroutine__' to True. That will allow
> to analyze, introspect, validate and do other useful things.

Yes, that sounds about right.
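A minimal sketch of such a marker decorator, assuming the attribute name __coroutine__ from the suggestion above. It returns the very same function object, so no extra call layer is introduced; the names coroutine, answer and run are illustrative only.

```python
def coroutine(func):
    """Mark func as a coroutine without wrapping it."""
    func.__coroutine__ = True
    return func

@coroutine
def answer():
    yield from ()      # makes this a generator; yields nothing
    return 42

def plain():
    pass

def run(gen):
    """Toy driver: exhaust the generator and return its result."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value

result = run(answer())
same_object = coroutine(plain) is plain
```

Tools can then check getattr(func, "__coroutine__", False) to validate or introspect an API.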

--
--Guido van Rossum (python.org/~guido)

Andrew Svetlov

Oct 29, 2012, 2:02:09 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

Pollster has to support any object as file descriptor.
The use case is ZeroMQ sockets: they are implemented at user level and
socket is just some opaque structure wrapped by Python object.
ZeroMQ has own poll function to process zmq sockets as well as regular
sockets/pipes/files.

I would like to see add_{reader,writer} and call_{soon,later} accepting
**kwargs as well as *args. At least to respect functions with
keyword-only arguments.

+1 for explicit passing loop instance and clearing role of DelayedCall.

Decorating coroutines with setting some flag looks good to me, but I
expect some problems with setting extra attribute to objects like
staticmethod/classmethod.

Thanks, Andrew.

Giampaolo Rodolà

Oct 29, 2012, 2:08:45 PM

to Guido van Rossum, Python-Ideas

2012/10/29 Guido van Rossum <gu...@python.org>

>
> I'm most interested in feedback on the design of polling.py and
> scheduling.py, and to a lesser extent on the design of sockets.py;
> main.py is just an example of how this style works out in practice.

My comments follow.

=== About polling.py ===

1 - I think DelayedCall should have a reset() method, other than just cancel().

2 - EventLoopMixin should have a call_every() method other than just
call_later()

3 - call_later() and call_every() should also take **kwargs other than
just *args

4 - I think PollsterBase should provide a method to modify() the
events registered for a certain fd (both poll() and epoll() have such
a method and it's faster compared to un/registering a fd).

Feel free to take a look at my scheduler implementation which looks
quite similar to what you've done in polling.py:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85

=== About sockets.py ===

1 - In SocketTransport it seems there's no error handling provisioned
for send() and recv().
You should expect these errors
http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60
signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"

2 - SslTransport's send() and recv() methods should suffer the same problem.

3 - I don't fully understand how data transfer works exactly but keep
in mind that the transport should interact with the pollster.
What I mean is that generally speaking a connected socket should
*always* be readable ("r"), even when it's idle, then switch to "rw"
events when sending data, then get back to "r" when all the data has
been sent.
This is *crucial* if you want to achieve high performances/scalability
and that is why PollsterBase should probably provide a modify()
method.
Please take a look at what I've done here:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809

=== Other considerations ===

This 'yield' / 'yield from' approach is new to me (I'm more of a
"callback guy") so I can't say I fully understand what's going on just
by reading the code.
What I would like to see instead of main.py is a bunch of code samples
/ demos showing how this library is supposed to be used in different
circumstances.
In details I'd like to see at least:

1 - a client example (connect(), send() a string, recv() a response, close())
2 - an echo server example (accept(), recv() string, send() it back, close())
3 - how to use a different transport (e.g. UDP)?
4 - how to run long running tasks in a thread?

Also:

5 - is it possible to use multiple "reactors" in different threads?
How? (asyncore for example achieves this by providing a separate
'map' argument for both the 'reactor' and the dispatchers)

I understand you just started with this so I'm probably asking too
much at this point in time.
Feel free to consider this a kind of a "long term review".

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/

Guido van Rossum

Oct 29, 2012, 2:10:42 PM

to Andrew Svetlov, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov
<andrew....@gmail.com> wrote:
> Pollster has to support any object as file descriptor.
> The use case is ZeroMQ sockets: they are implemented at user level and
> socket is just some opaque structure wrapped by Python object.
> ZeroMQ has own poll function to process zmq sockets as well as regular
> sockets/pipes/files.

Good call! This seems to be an excellent use case to validate the
pollster design. Are you saying that the approach I used for
SslTransport doesn't work here? (I can believe it, I've never looked
at 0MQ, but I can't tell from your message.) The insistence on
isinstance(fd, int) is mostly there so that I don't accidentally
register a socket object *and* its file descriptor at the same time --
but there are other ways to ensure that. I've added a TODO item for
now.

> I would like to see add_{reader,writer} and call_{soon,later} accepting
> **kwargs as well as *args. At least to respect functions with
> keyword-only arguments.

Hmm... I intentionally ruled those out because I wanted to leave the
door open for keyword args that modify the registration function
(add_reader etc.); it is awkward to require conventions like "your
function cannot have a keyword arg named X because we use that for our
own API" and it is even more awkward to have to retrofit new values of
X into that rule. Maybe we can come up with a simple wrapper.

> +1 for explicit passing loop instance and clearing role of DelayedCall.

Will do. (I think you meant clarifying?)

> Decorating coroutines with setting some flag looks good to me, but I
> expect some problems with setting extra attribute to objects like
> staticmethod/classmethod.

Noted.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 29, 2012, 2:43:57 PM

to Giampaolo Rodolà, Python-Ideas

On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
> 2012/10/29 Guido van Rossum <gu...@python.org>
>>
>> I'm most interested in feedback on the design of polling.py and
>> scheduling.py, and to a lesser extent on the design of sockets.py;
>> main.py is just an example of how this style works out in practice.
>
> Follows my comments.
>
> === About polling.py ===
>
> 1 - I think DelayedCall should have a reset() method, other than just cancel().

So, essentially an uncancel()? Why not just re-register in that case?
Or what's your use case? (Right now there's no problem in calling one
of these many times -- it's just that cancellation is permanent.)

> 2 - EventLoopMixin should have a call_every() method other than just
> call_later()

Arguably you can emulate that with a simple loop:

def call_every(secs, func, *args):
    while True:
        yield from scheduler.sleep(secs)
        func(*args)

(Flavor to taste to log exceptions, handle cancellation, automatically
spawn a separate task, etc.)

I can build lots of other useful things out of call_soon() and
call_later() -- but I do need at least those two as "axioms".

> 3 - call_later() and call_every() should also take **kwargs other than
> just *args

I just replied to that in a previous message; there's also a comment
in the code. How important is this really? Are there lots of use cases
that require you to pass keyword args? If it's only on occasion you
can use a lambda. (The *args is a compromise so we don't need a lambda
to wrap every callback. But I want to reserve keyword args for future
extensions to the registration functions.)
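A small sketch of the lambda workaround, with functools.partial shown as an equivalent. The call_soon below is a hypothetical stand-in that only queues callbacks (it is not tulip's implementation), and greet is a made-up callback with a keyword-only argument.

```python
import functools

ready = []  # hypothetical stand-in for the event loop's ready queue

def call_soon(callback, *args):
    """Accept positional args only; keyword args stay reserved for the API."""
    ready.append((callback, args))

def greet(name, *, punctuation="!"):
    return "hello " + name + punctuation

# Keyword arguments are bound by the caller, not by the registration function.
call_soon(lambda: greet("tulip", punctuation="?"))
call_soon(functools.partial(greet, punctuation="."), "tulip")

results = [callback(*args) for callback, args in ready]
```

Either form keeps the registration signature free to grow its own keyword parameters later.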

> 4 - I think PollsterBase should provide a method to modify() the
> events registered for a certain fd (both poll() and epoll() have such
> a method and it's faster compared to un/registering a fd).

Did you see the concrete implementations? Those where this matters
implicitly use modify() if the required flags change. I can imagine
more optimizations of the implementations (e.g. delaying
register()/modify() calls until poll() is actually called, to avoid
unnecessary churn) without making the API more complex.
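To illustrate the register-once, modify-on-change pattern under discussion, here is a sketch using the stdlib selectors module. That module was added to Python after this thread and is used here only as a stand-in for a pollster; it is not the PollsterBase API.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)

# An idle connection is watched for reads only.
sel.register(a, selectors.EVENT_READ)
idle = sel.select(timeout=0)            # nothing to read yet

# While there is data to send, also watch for writability.
sel.modify(a, selectors.EVENT_READ | selectors.EVENT_WRITE)
sending = sel.select(timeout=0)
writable = any(mask & selectors.EVENT_WRITE for key, mask in sending)

# Once everything has been sent, drop back to reads only.
sel.modify(a, selectors.EVENT_READ)

sel.unregister(a)
sel.close()
a.close()
b.close()
```

Whether the change of flags is issued explicitly by the transport or implicitly by add_writer()/remove_writer() is exactly the design question raised here; the underlying system call is the same.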

> Feel free to take a look at my scheduler implementation which looks
> quite similar to what you've done in polling.py:
> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85

Thanks, I had seen it previously, I think this also proves that
there's nothing particularly earth-shattering about this design. :-)
I'd love to copy some more of your tricks, e.g. the occasional
re-heapifying. (What usage pattern is this dealing with exactly?) I
should also check that I've taken care of all the various flags and
other details (I recall being quite surprised that with poll(), on
some platforms I need to check for POLLHUP but not on others).

> === About sockets.py ===
>
> 1 - In SocketTransport it seems there's no error handling provisioned
> for send() and recv().
> You should expect these errors
> http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60
> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"

Right, I know I have been naive about these and have already got a TODO note.

> 2 - SslTransport's send() and recv() methods should suffer the same problem.

Ditto, Antoine told me.

> 3 - I don't fully understand how data transfer works exactly but keep
> in mind that the transport should interact with the pollster.
> What I mean is that generally speaking a connected socket should
> *always* be readable ("r"), even when it's idle, then switch to "rw"
> events when sending data, then get back to "r" when all the data has
> been sent.
> This is *crucial* if you want to achieve high performances/scalability
> and that is why PollsterBase should probably provide a modify()
> method.
> Please take a look at what I've done here:
> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809

Hm. I am not convinced that managing this explicitly from the
transport is the right solution (note that my transports are quite
different from those in Twisted). But I'll keep this in mind -- I
would like to set up a benchmark suite at some point. I will probably
have to implement the server side of HTTP for that purpose, so I can
point e.g. ab at my app.

> === Other considerations ===
>
> This 'yield' / 'yield from' approach is new to me (I'm more of a
> "callback guy") so I can't say I fully understand what's going on just
> by reading the code.

Fair enough. You should probably start by reading Greg Ewing's
tutorial -- it's short and sweet:
http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html

> What I would like to see instead of main.py is a bunch of code samples
> / demos showing how this library is supposed to be used in different
> circumstances.

Agreed, more examples are needed.

> In details I'd like to see at least:
>
> 1 - a client example (connect(), send() a string, recv() a response, close())

Hm, that's all in urlfetch().

> 2 - an echo server example (accept(), recv() string, send() it back(), close()

Yes, that's missing.

> 3 - how to use a different transport (e.g. UDP)?

I haven't looked into this yet. I expect I'll have to write a
different SocketTransport for this (the existing transports are
implicitly stream-oriented) but I know that the scheduler and
eventloop implementation can handle this fine.

> 4 - how to run long running tasks in a thread?

That's implemented. Check out call_in_thread(). Note that you can pass
it an alternate threadpool (executor).

> Also:
>
> 5 - is it possible to use multiple "reactors" in different threads?

Should be possible.

> How? (asyncore for example achieves this by providing a separate
> 'map' argument for both the 'reactor' and the dispatchers)

It works by making the Context class use thread-local storage (TLS).
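A minimal sketch of the thread-local idea, assuming a holder class that lazily creates one loop object per thread. This Context is a hypothetical illustration (the "event loop" is just a placeholder object), not the class from scheduling.py.

```python
import threading

class Context(threading.local):
    """Per-thread state: __init__ runs once in each thread that touches it."""
    def __init__(self):
        self.eventloop = object()   # placeholder for a real event loop

context = Context()
main_loop = context.eventloop
seen = []

def worker():
    # This thread gets its own, separately initialized eventloop attribute.
    seen.append(context.eventloop)

t = threading.Thread(target=worker)
t.start()
t.join()
```

Each thread therefore sees its own "reactor" through the same global name, without passing a map argument around.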

> I understand you just started with this so I'm probably asking too
> much at this point in time.
> Feel free to consider this a kind of a "long term review".

You have asked many useful questions already. Since you have
implemented a real-world I/O loop yourself, your input is extremely
valuable. Thanks, and keep at it!

--
--Guido van Rossum (python.org/~guido)

Yury Selivanov

Oct 29, 2012, 3:10:17 PM

to Andrew Svetlov, python...@python.org

On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew....@gmail.com> wrote:

> Pollster has to support any object as file descriptor.
> The use case is ZeroMQ sockets: they are implemented at user level and
> socket is just some opaque structure wrapped by Python object.
> ZeroMQ has own poll function to process zmq sockets as well as regular
> sockets/pipes/files.

Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
Just get the underlying file descriptor with 'getsockopt', as described
here: http://api.zeromq.org/master:zmq-getsockopt#toc20

For instance, here are the stripped-down zmq support classes I have in my
framework:

class Socket(_zmq_Socket):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fileno = self.getsockopt(FD)

    ...

    #coroutine
    def send(self, data, *, flags=0, copy=True, track=False):
        flags |= NOBLOCK

        try:
            result = _zmq_Socket.send(self, data, flags, copy, track)
        except ZMQError as e:
            if e.errno != EAGAIN:
                raise
            self._sending = (Promise(), data, flags, copy, track)
            self._scheduler.proactor._schedule_write(self)
            return self._sending[0]
        else:
            p = Promise()
            p.send(result)
            return p
    ...

class Context(_zmq_Context):
    _socket_class = Socket

And '_schedule_write' accepts any object with 'fileno' property, and
uses an appropriate polling mechanism to poll.

So to use non-blocking ZMQ sockets, you simply do:

context = Context()
socket = context.socket(zmq.REP)
...
yield socket.send(message)

Andrew Svetlov

Oct 29, 2012, 3:24:25 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov
> <andrew....@gmail.com> wrote:
>> Pollster has to support any object as file descriptor.
>> The use case is ZeroMQ sockets: they are implemented at user level and
>> socket is just some opaque structure wrapped by Python object.
>> ZeroMQ has own poll function to process zmq sockets as well as regular
>> sockets/pipes/files.
>
> Good call! This seems to be an excellent use case to validate the
> pollster design. Are you saying that the approach I used for
> SslTransport doesn't work here? (I can believe it, I've never looked
> at 0MQ, but I can't tell from your message.) The insistence on
> isinstance(fd, int) is mostly there so that I don't accidentally
> register a socket object *and* its file descriptor at the same time --
> but there are other ways to ensure that. I've added a TODO item for
> now.
>

A 0MQ socket has no file descriptor at all; it's just a pointer to some
unspecified structure.
So 0MQ has its own *poll* function, which can process those sockets as well
as file descriptors.
Its interface mimics the poll object from the Python stdlib.
You can see https://github.com/zeromq/pyzmq/blob/master/zmq/eventloop/ioloop.py
as an example.
For 0MQ support, tulip has to have yet another reactor implementation
in line with select, epoll, kqueue etc.
Not a big deal, but it would be nice if PollsterBase did not assume the
registered object is always an int file descriptor.

>> I would like to see add_{reader,writer} and call_{soon,later} accepting
>> **kwargs as well as *args. At least to respect functions with
>> keyword-only arguments.
>
> Hmm... I intentionally ruled those out because I wanted to leave the
> door open for keyword args that modify the registration function
> (add_reader etc.); it is awkward to require conventions like "your
> function cannot have a keyword arg named X because we use that for our
> own API" and it is even more awkward to have to retrofit new values of
> X into that rule. Maybe we can come up with a simple wrapper.

It can be solved easily by using names like __when, __callback etc.
Those names will never clash with user-provided kwargs, I believe.

>
>> +1 for explicit passing loop instance and clearing role of DelayedCall.
>
> Will do. (I think you meant clarifying?)

Exactly. Thanks.

>
>> Decorating coroutines with setting some flag looks good to me, but I
>> expect some problems with setting extra attribute to objects like
>> staticmethod/classmethod.
>
> Noted.
>
> --
> --Guido van Rossum (python.org/~guido)

Thank you, Andrew Svetlov

Andrew Svetlov

Oct 29, 2012, 3:32:41 PM

to Yury Selivanov, python...@python.org

On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
> On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew....@gmail.com> wrote:
>
>> Pollster has to support any object as file descriptor.
>> The use case is ZeroMQ sockets: they are implemented at user level and
>> socket is just some opaque structure wrapped by Python object.
>> ZeroMQ has own poll function to process zmq sockets as well as regular
>> sockets/pipes/files.
>
> Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
> Just get the underlying file descriptor with 'getsockopt', as described
> here: http://api.zeromq.org/master:zmq-getsockopt#toc20

Well, I will take a look. I have used zmq poll only.
It works for reading only, not for writing, right?
As far as I know, you use the proactor pattern.
Can a reactor have problems with this approach?
Might the embedded 0MQ poll be more efficient thanks to some internal optimizations?

--
Thanks,
Andrew Svetlov

Guido van Rossum

Oct 29, 2012, 3:54:24 PM

to Andrew Svetlov, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov
<andrew....@gmail.com> wrote:
> On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <gu...@python.org> wrote:

[Andrew]

>>> I would like to see add_{reader,writer} and call_{soon,later} accepting
>>> **kwargs as well as *args. At least to respect functions with
>>> keyword-only arguments.
>>
>> Hmm... I intentionally ruled those out because I wanted to leave the
>> door open for keyword args that modify the registration function
>> (add_reader etc.); it is awkward to require conventions like "your
>> function cannot have a keyword arg named X because we use that for our
>> own API" and it is even more awkward to have to retrofit new values of
>> X into that rule. Maybe we can come up with a simple wrapper.
>
> It can be solved easily by using names like __when, __callback etc.
> Those names will never clash with user-provided kwargs, I believe.

No, those names have different meaning inside a class (they would be
transformed into _<class>__when, where <class> is the name of the
*current* class textually enclosing the use). I am not closing the
door on this one but I'd have to see a lot more evidence that this
issue is widespread.
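A short demonstration of the name-mangling behaviour described above. The class Loop and its method are hypothetical; the point is only that double-underscore parameter names written inside a class body are rewritten by the compiler, so callers outside the class cannot pass them by keyword under their written names.

```python
import inspect

class Loop:
    # Inside a class body, __when and __callback are mangled by the compiler.
    def call_later(self, __when, __callback):
        return __when, __callback

# The real parameter names carry the _Loop prefix.
names = list(inspect.signature(Loop.call_later).parameters)

try:
    Loop().call_later(__when=1, __callback=print)
    accepted = True
except TypeError:
    # Outside the class the keyword "__when" is not mangled, so it is unknown.
    accepted = False
```

Positional calls still work, which is why the trick can look fine until someone passes the argument by name.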

--
--Guido van Rossum (python.org/~guido)

Yury Selivanov

Oct 29, 2012, 3:57:26 PM

to Andrew Svetlov, python...@python.org

On 2012-10-29, at 3:32 PM, Andrew Svetlov <andrew....@gmail.com> wrote:

> On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
>> On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew....@gmail.com> wrote:
>>
>>> Pollster has to support any object as file descriptor.
>>> The use case is ZeroMQ sockets: they are implemented at user level and
>>> socket is just some opaque structure wrapped by Python object.
>>> ZeroMQ has own poll function to process zmq sockets as well as regular
>>> sockets/pipes/files.
>>
>> Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
>> Just get the underlying file descriptor with 'getsockopt', as described
>> here: http://api.zeromq.org/master:zmq-getsockopt#toc20
>
> Well, I will take a look. I have used zmq poll only.
> It works for reading only, not for writing, right?
> As far as I know, you use the proactor pattern.
> Can a reactor have problems with this approach?
> Might the embedded 0MQ poll be more efficient thanks to some internal optimizations?

It's an officially documented and supported approach. We haven't seen any
problems with it so far.

It works both for reading and writing; however, 99.9% of EAGAIN errors occur
on reading. When you 'send', it just stores your data in an internal
buffer and sends it itself. When you 'read', well, if there is no data
in buffers you get EAGAIN.

As for the performance -- I haven't tested 'zmq.poll' vs (let's say) epoll,
but I doubt there is any significant difference. And if I wanted to
write a benchmark, I'd first compare pure blocking ZMQ sockets vs
non-blocking ZMQ sockets with ZMQ.poll, as ZMQ uses threads heavily, and
probably, blocking threads-driven IO is faster than non-blocking with
polling (when FDs count is relatively small), no matter whether you use
zmq.poll or epoll/etc.

-
Yury

Andrew Svetlov

Oct 29, 2012, 3:58:35 PM

to Guido van Rossum, Python-Ideas

Well, using keyword-only arguments for passing flags can be a good point.
I can live with *args only. Maybe using **kwargs for the call_later family
only is a good compromise?
I really don't care about add_reader/add_writer; those functions are intended
for library writers.
call_later and call_soon can be used in user code often enough, and
passing keyword arguments can be convenient.

--
Thanks,
Andrew Svetlov

Andrew Svetlov

Oct 29, 2012, 4:03:12 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

I mean just something like:

def call_soon(__self, __callback, *__args, **__kwargs):
    dcall = DelayedCall(None, __callback, __args, __kwargs)
    __self.ready.append(dcall)
    return dcall

Not a big deal, though. We can leave this discussion for later.

On Mon, Oct 29, 2012 at 9:54 PM, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov
> <andrew....@gmail.com> wrote:
>> On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <gu...@python.org> wrote:
> [Andrew]
>>>> I would like to see add_{reader,writer} and call_{soon,later} accepting
>>>> **kwargs as well as *args. At least to respect functions with
>>>> keyword-only arguments.
>>>
>>> Hmm... I intentionally ruled those out because I wanted to leave the
>>> door open for keyword args that modify the registration function
>>> (add_reader etc.); it is awkward to require conventions like "your
>>> function cannot have a keyword arg named X because we use that for our
>>> own API" and it is even more awkward to have to retrofit new values of
>>> X into that rule. Maybe we can come up with a simple wrapper.
>>
>> It can be solved easily by using names like __when, __callback etc.
>> Those names will never clash with user-provided kwargs, I believe.
>
> No, those names have different meaning inside a class (they would be
> transformed into _<class>__when, where <class> is the name of the
> *current* class textually enclosing the use). I am not closing the
> door on this one but I'd have to see a lot more evidence that this
> issue is widespread.
>
> --
> --Guido van Rossum (python.org/~guido)

--
Thanks,
Andrew Svetlov

Giampaolo Rodolà

Oct 29, 2012, 5:20:44 PM

to Guido van Rossum, Python-Ideas

2012/10/29 Guido van Rossum <gu...@python.org>:

> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
>> 2012/10/29 Guido van Rossum <gu...@python.org>

>> === About polling.py ===
>>
>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>
> So, essentially an uncancel()? Why not just re-register in that case?
> Or what's your use case? (Right now there's no problem in calling one
> of these many times -- it's just that cancellation is permanent.)

The most common use case is when you want to disconnect the other peer
after a certain time of inactivity.
Ideally what you would do is schedule() an idle/timeout function and
reset() it every time the other peer sends you some data.

>> 2 - EventLoopMixin should have a call_every() method other than just
>> call_later()
>
> Arguably you can emulate that with a simple loop:
>
> def call_every(secs, func, *args):
>     while True:
>         yield from scheduler.sleep(secs)
>         func(*args)
>
> (Flavor to taste to log exceptions, handle cancellation, automatically
> spawn a separate task, etc.)
>
> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".

Agreed.

>> 3 - call_later() and call_every() should also take **kwargs other than
>> just *args
>
> I just replied to that in a previous message; there's also a comment
> in the code. How important is this really? Are there lots of use cases
> that require you to pass keyword args? If it's only on occasion you
> can use a lambda. (The *args is a compromise so we don't need a lambda
> to wrap every callback. But I want to reserve keyword args for future
> extensions to the registration functions.)

It's not crucial to have kwargs, just nice, but I understand your
motives to rule them out, in fact I reserved two kwarg names
('_errback' and '_scheduler') for the same reason.
In my experience I learned that passing an extra error handler
function (what Twisted calls 'errback') can be desirable, so that's
another thing you might want to consider.
In my scheduler implementation I achieved that by passing an _errback
keyword parameter, like this:

>>> ioloop.call_later(30, callback, _errback=err_callback)

Not very nice to use a reserved keyword, I agree.
Perhaps you can keep ruling out kwargs referred to the callback
function and change the current call_later signature as such:

- def call_later(self, when, callback, *args):
+ def call_later(self, when, callback, *args, errback=None):

...or maybe provide a DelayedCall.add_errback() method a-la Twisted.

> Thanks, I had seen it previously, I think this also proves that
> there's nothing particularly earth-shattering about this design. :-)
> I'd love to copy some more of your tricks,

Sure, go on. It's MIT licensed code.

> e.g. the occasional re-heapifying. (What usage pattern is this
> dealing with exactly?)

It's intended to avoid making the list grow with too many cancelled functions.
Imagine this use case:

WEEK = 60 * 60 * 24 * 7
for x in range(1000000):
    f = call_later(WEEK, fun)
    f.cancel()

You'll end up having a heap with millions of cancelled items which will
be freed after a week.
Instead you can keep track of the number of cancelled functions every
time cancel() is called and re-heapify the list when that number gets
too high:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#122
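A rough sketch of the cancel-counting idea, not the pyftpdlib code itself. The class names, the max_cancelled threshold and the "more than half the heap" rule are assumptions chosen for illustration; a complete scheduler would also decrement the counter when a cancelled entry is popped normally.

```python
import heapq

class DelayedCall:
    def __init__(self, when, callback, sched):
        self.when = when
        self.callback = callback
        self.cancelled = False
        self._sched = sched

    def __lt__(self, other):
        # heapq orders entries by their due time.
        return self.when < other.when

    def cancel(self):
        if not self.cancelled:
            self.cancelled = True
            self._sched.note_cancel()

class Scheduler:
    def __init__(self, max_cancelled=512):
        self.heap = []
        self.cancelled = 0
        self.max_cancelled = max_cancelled

    def call_later(self, when, callback):
        dcall = DelayedCall(when, callback, self)
        heapq.heappush(self.heap, dcall)
        return dcall

    def note_cancel(self):
        self.cancelled += 1
        # Rebuild only when cancelled entries are numerous and dominate the heap.
        if (self.cancelled > self.max_cancelled
                and self.cancelled > len(self.heap) // 2):
            self.heap = [c for c in self.heap if not c.cancelled]
            heapq.heapify(self.heap)
            self.cancelled = 0

WEEK = 60 * 60 * 24 * 7
sched = Scheduler(max_cancelled=100)
keep = sched.call_later(WEEK, print)
for _ in range(10000):
    sched.call_later(WEEK, print).cancel()
```

With this in place the heap stays small in the loop above instead of holding every cancelled call for a week.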

> should also check that I've taken care of all the various flags and
> other details (I recall being quite surprised that with poll(), on
> some platforms I need to check for POLLHUP but not on others).

Yeah, that's a painful part.
Try to look here:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464
Instead of handle_close()ing you should add the fd to the list of
readable ones ("r").
The call to recv() which will be coming next will then cause the
socket to close (you have to add the error handling to recv() first
though).

>> 3 - I don't fully understand how data transfer works exactly but keep
>> in mind that the transport should interact with the pollster.
>> What I mean is that generally speaking a connected socket should
>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>> events when sending data, then get back to "r" when all the data has
>> been sent.
>> This is *crucial* if you want to achieve high performances/scalability
>> and that is why PollsterBase should probably provide a modify()
>> method.
>> Please take a look at what I've done here:
>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
>
> Hm. I am not convinced that managing this explicitly from the
> transport is the right solution (note that my transports are quite
> different from those in Twisted). But I'll keep this in mind -- I
> would like to set up a benchmark suite at some point. I will probably
> have to implement the server side of HTTP for that purpose, so I can
> point e.g. ab at my app.

I think you might want to apply that to something slightly higher
level than the mere transport.
Something like the equivalent of asynchat.push /
asynchat.push_with_producer, if you'll ever want to go that far in
terms of abstraction, or maybe avoid that at all but make it clear in
the doc that the user should take care of that.
My point is that having a socket registered for both "r" AND "w"
events when in fact you want only "r" OR "w" is an exponential waste
of CPU cycles and it should be avoided either by the lib or by the
user.
"old select() implementation" vs "new select() implementation"
benchmark shown here reflects exactly this problem which still affects
base asyncore module:
https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6
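
A sketch of the "r", then "rw", then back to "r" switching, assuming the standard library's selectors module as a stand-in for a pollster with a modify() method. WriteBuffer and its method names are hypothetical; this is not how tulip or pyftpdlib structure it.

```python
import selectors
import socket


class WriteBuffer:
    """Stays registered for reads; adds write interest only while data is queued."""

    def __init__(self, selector, sock):
        self.selector = selector
        self.sock = sock
        self.pending = b""
        selector.register(sock, selectors.EVENT_READ, self)

    def send(self, data):
        if not data:
            return
        if not self.pending:
            # First queued bytes: switch from "r" to "rw".
            self.selector.modify(
                self.sock, selectors.EVENT_READ | selectors.EVENT_WRITE, self)
        self.pending += data

    def on_writable(self):
        sent = self.sock.send(self.pending)
        self.pending = self.pending[sent:]
        if not self.pending:
            # Everything flushed: drop back to "r" so the poller stops waking us.
            self.selector.modify(self.sock, selectors.EVENT_READ, self)

    def interest(self):
        return self.selector.get_key(self.sock).events
```

An idle connection registered for "w" is reported as writable on nearly every poll, which is the wasted work described above; dropping write interest as soon as the buffer drains avoids it.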

I'll keep following the progress on this and hopefully come up with
another set of questions and/or random thoughts.

--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/

Antoine Pitrou

Oct 29, 2012, 5:25:41 PM

to python...@python.org

On Mon, 29 Oct 2012 10:03:00 -0700

Guido van Rossum <gu...@python.org> wrote:

> >> Then there is a
> >> BufferedReader class that implements more traditional read() and
> >> readline() coroutines (i.e., to be invoked using yield from), the
> >> latter handy for line-oriented transports.
> >
> > Well... It would be nice if BufferedReader could re-use the actual
> > io.BufferedReader and its fast readline(), read(), readinto()
> > implementations.
>
> Agreed, I would love that too, but the problem is, *this*
> BufferedReader defines methods you have to invoke with yield from.
> Maybe we can come up with a solution for sharing code by modifying the
> _io module though; that would be great! (I've also been thinking of
> layering TextIOWrapper on top of these.)

There is a rather infamous issue about _io.BufferedReader and
non-blocking I/O at http://bugs.python.org/issue13322
It is a bit problematic because currently non-blocking readline()
returns '' instead of None when no data is available, meaning EOF can't
be easily detected :(

Once this issue is solved, you could use _io.BufferedReader and
work around the "partial read/readline result" issue by iterating
(hopefully in most cases there is enough data in the buffer to
return a complete read or readline, so the C optimizations are useful).
Here is how it may work:

    def __init__(self, fd):
        self.fd = fd
        self.bufio = _io.BufferedReader(...)

    def readline(self):
        chunks = []
        while True:
            line = self.bufio.readline()
            if line is not None:
                chunks.append(line)
                if line == b'' or line.endswith(b'\n'):
                    # EOF or EOL
                    return b''.join(chunks)
            yield from scheduler.block_r(self.fd)

    def read(self, n):
        chunks = []
        bytes_read = 0
        while True:
            data = self.bufio.read(n - bytes_read)
            if data is not None:
                chunks.append(data)
                bytes_read += len(data)
                if data == b'' or bytes_read == n:
                    # EOF or read satisfied
                    break
            yield from scheduler.block_r(self.fd)
        return b''.join(chunks)

As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all
(but my memories are vague).

By the way I don't know how this whole approach (of mocking socket-like
or file-like objects with coroutine-y read() / readline() methods)
lends itself to plugging into Windows' IOCP. You may rely on some raw
I/O object that registers a callback when a read() is requested and
then yields a Future object that gets completed by the callback.
I'm sure Richard has some ideas about that :-)

Regards

Antoine.

Guido van Rossum

Oct 29, 2012, 6:03:07 PM

to Giampaolo Rodolà, Python-Ideas

On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
> 2012/10/29 Guido van Rossum <gu...@python.org>:
>> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
>>> 2012/10/29 Guido van Rossum <gu...@python.org>
>>> === About polling.py ===
>>>
>>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>>
>> So, essentially an uncancel()? Why not just re-register in that case?
>> Or what's your use case? (Right now there's no problem in calling one
>> of these many times -- it's just that cancellation is permanent.)
>
> The most common use case is when you want to disconnect the other peer
> after a certain time of inactivity.
> Ideally what you would do is schedule() a idle/timeout function and
> reset() it every time the other peer sends you some data.

Um, ok, I think you are saying that you want to be able to set
timeouts and then "reset" that timeout. This is a much higher-level
thing than canceling the DelayedCall object. (I have no desire to make
DelayedCall have functionality like Twisted's Deferred. It is
something *much* simpler; it's just the API for cancelling a callback
passed to call_later(), and its other uses are similar to this.)
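
To illustrate the layering (a sketch only; FakeLoop, DelayedCall as written here, and IdleTimeout are stand-ins invented for the example, not tulip's classes): a resettable idle timeout can sit on top of call_later() and cancel() without DelayedCall itself growing a reset() method.

```python
import heapq
import itertools


class DelayedCall:
    """Simplified handle: all it supports is permanent cancellation."""

    def __init__(self, when, callback, args):
        self.when, self.callback, self.args = when, callback, args
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class FakeLoop:
    """Deterministic stand-in for an event loop; time only moves in advance()."""

    def __init__(self):
        self.now = 0.0
        self.heap = []
        self.counter = itertools.count()

    def call_later(self, delay, callback, *args):
        dcall = DelayedCall(self.now + delay, callback, args)
        heapq.heappush(self.heap, (dcall.when, next(self.counter), dcall))
        return dcall

    def advance(self, seconds):
        self.now += seconds
        while self.heap and self.heap[0][0] <= self.now:
            _, _, dcall = heapq.heappop(self.heap)
            if not dcall.cancelled:
                dcall.callback(*dcall.args)


class IdleTimeout:
    """Higher-level helper: reset() is cancel() plus a fresh call_later()."""

    def __init__(self, loop, delay, callback):
        self.loop, self.delay, self.callback = loop, delay, callback
        self.dcall = loop.call_later(delay, callback)

    def reset(self):
        self.dcall.cancel()
        self.dcall = self.loop.call_later(self.delay, self.callback)
```

A connection handler would call reset() whenever the peer sends data; the cancelled entries this leaves behind are the ones the re-heapify discussion is about.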

[...]

> Not very nice to use a reserved keyword, I agree.
> Perhaps you can keep ruling out kwargs referred to the callback
> function and change the current call_later signature as such:
>
> - def call_later(self, when, callback, *args):
> + def call_later(self, when, callback, *args, errback=None):
>
> ...or maybe provide a DelayedCall.add_errback() method a-la Twisted.

I really don't want that though! But I'm glad you're not too hell-bent
on supporting callbacks with keyword-only args.

[...]

>> should also check that I've taken care of all the various flags and
>> other details (I recall being quite surprised that with poll(), on
>> some platforms I need to check for POLLHUP but not on others).
>
> Yeah, that's a painful part.
> Try to look here:
> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464
> Instead of handle_close()ing you should add the fd to the list of
> readable ones ("r").
> The call to recv() which will be coming next will then cause the
> socket to close (you have to add the error handling to recv() first
> though).

Aha, are you suggesting that I close the socket when I detect that the
socket is closed? But what if the other side uses shutdown() to close
only one end? Depending on the protocol it might be useful to either
stop reading but keep sending, or vice versa. Maybe I could detect
that both ends are closed and then close the socket. Or are you
suggesting something else?

>>> 3 - I don't fully understand how data transfer works exactly but keep
>>> in mind that the transport should interact with the pollster.
>>> What I mean is that generally speaking a connected socket should
>>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>>> events when sending data, then get back to "r" when all the data has
>>> been sent.
>>> This is *crucial* if you want to achieve high performances/scalability
>>> and that is why PollsterBase should probably provide a modify()
>>> method.
>>> Please take a look at what I've done here:
>>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
>>
>> Hm. I am not convinced that managing this explicitly from the
>> transport is the right solution (note that my transports are quite
>> different from those in Twisted). But I'll keep this in mind -- I
>> would like to set up a benchmark suite at some point. I will probably
>> have to implement the server side of HTTP for that purpose, so I can
>> point e.g. ab at my app.
>
> I think you might want to apply that to something slightly higher
> level than the mere transport.

(Apply *what*?)

> Something like the equivalent of asynchat.push /
> asynchat.push_with_producer, if you'll ever want to go that far in
> terms of abstraction, or maybe avoid that at all but make it clear in
> the doc that the user should take care of that.

I'm actually not sufficiently familiar with asynchat to comment. I
think it's got quite a different model than what I am trying to set up
here.

> My point is that having a socket registered for both "r" AND "w"
> events when in fact you want only "r" OR "w" is an exponential waste
> of CPU cycles and it should be avoided either by the lib or by the
> user.

One task can only be blocked for reading OR writing. The only way to
have a socket registered for both is if there are separate tasks for
reading and writing, and then presumably that is what you want. (I
have a feeling you haven't fully grokked my HTTP client code yet?)

> "old select() implementation" vs "new select() implementation"
> benchmark shown here reflects exactly this problem which still affects
> base asyncore module:
> https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6

Hm, I am already using epoll or kqueue if available, otherwise poll,
falling back to select only if there's nothing else available (in
practice that's only Windows).
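
That preference order can be sketched in a few lines. best_pollster_name() is a hypothetical helper written for illustration; it only probes which polling calls the select module exposes on the current platform.

```python
import select


def best_pollster_name():
    """Return the name of the most scalable polling call available here."""
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    # select() is the lowest common denominator.
    return "select"
```

For reference, the selectors module that later entered the standard library exposes the same kind of choice as selectors.DefaultSelector.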

But I will diligently work towards a benchmarkable demo.

> I'll keep following the progress on this and hopefully come up with
> another set of questions and/or random thoughts.

Thanks!

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 29, 2012, 6:08:54 PM

to Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 2:25 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Mon, 29 Oct 2012 10:03:00 -0700
> Guido van Rossum <gu...@python.org> wrote:
>> >> Then there is a
>> >> BufferedReader class that implements more traditional read() and
>> >> readline() coroutines (i.e., to be invoked using yield from), the
>> >> latter handy for line-oriented transports.
>> >
>> > Well... It would be nice if BufferedReader could re-use the actual
>> > io.BufferedReader and its fast readline(), read(), readinto()
>> > implementations.
>>
>> Agreed, I would love that too, but the problem is, *this*
>> BufferedReader defines methods you have to invoke with yield from.
>> Maybe we can come up with a solution for sharing code by modifying the
>> _io module though; that would be great! (I've also been thinking of
>> layering TextIOWrapper on top of these.)
>
> There is a rather infamous issue about _io.BufferedReader and
> non-blocking I/O at http://bugs.python.org/issue13322
> It is a bit problematic because currently non-blocking readline()
> returns '' instead of None when no data is available, meaning EOF can't
> be easily detected :(

Eeew!

> Once this issue is solved, you could use _io.BufferedReader, and
> workaround the "partial read/readline result" issue by iterating
> (hopefully in most cases there is enough data in the buffer to
> return a complete read or readline, so the C optimizations are useful).

Yes, that's what I'm hoping for.

Hm... I wonder if it would make more sense if these standard APIs were
to return specific exceptions, like the ssl module does in
non-blocking mode? Look here (I updated since posting last night):
http://code.google.com/p/tulip/source/browse/sockets.py#142

> As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all
> (but my memories are vague).

Same suggestion... (I only found out about ssl's approach to async I/O
a few days ago. It felt brilliant and right to me. But maybe I'm
missing something?)

> By the way I don't know how this whole approach (of mocking socket-like
> or file-like objects with coroutine-y read() / readline() methods)
> lends itself to plugging into Windows' IOCP.

Me neither. I hope Steve Dower can tell us.

> You may rely on some raw
> I/O object that registers a callback when a read() is requested and
> then yields a Future object that gets completed by the callback.
> I'm sure Richard has some ideas about that :-)

Which Richard?

--
--Guido van Rossum (python.org/~guido)

Andrew Svetlov

Oct 29, 2012, 6:19:16 PM

to Guido van Rossum, Python-Ideas

On Tue, Oct 30, 2012 at 12:03 AM, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
>> 2012/10/29 Guido van Rossum <gu...@python.org>:
>>> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.ro...@gmail.com> wrote:
>>>> 2012/10/29 Guido van Rossum <gu...@python.org>
>>>> === About polling.py ===
>>>>
>>>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>>>
>>> So, essentially an uncancel()? Why not just re-register in that case?
>>> Or what's your use case? (Right now there's no problem in calling one
>>> of these many times -- it's just that cancellation is permanent.)
>>
>> The most common use case is when you want to disconnect the other peer
>> after a certain time of inactivity.
>> Ideally what you would do is schedule() a idle/timeout function and
>> reset() it every time the other peer sends you some data.
>
> Um, ok, I think you are saying that you want to be able to set
> timeouts and then "reset" that timeout. This is a much higher-level
> thing than canceling the DelayedCall object. (I have no desire to make
> DelayedCall have functionality like Twisted's Deferred. It is
> something *much* simpler; it's just the API for cancelling a callback
> passed to call_later(), and its other uses are similar to this.)
>

Twisted's DelayedCall is different from Deferred: it is used by
reactor.callLater and is returned from that function (the same as
call_later in tulip).
Interface is: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L676
Implementation is
http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35
Twisted's DelayedCall has nothing in common with Deferred; it's just
an interface for a scheduled activity, which can be called once,
cancelled or rescheduled to another time.

I've found that concept very useful when I used twisted.

--
Thanks,
Andrew Svetlov

Rene Nejsum

Oct 29, 2012, 6:23:34 PM

to Yury Selivanov, Antoine Pitrou, python...@python.org

On Oct 29, 2012, at 5:59 PM, Yury Selivanov <yseliv...@gmail.com> wrote:

> On 2012-10-29, at 12:07 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>
>>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>>> APIs that should be invoked using yield from.
>>
>> Hmm, should they? Your approach looks a bit weird: you have functions
>> that should use yield, and others that should use "yield from"? That
>> sounds confusing to me.
>>
>> I'd much rather either have all functions use "yield", or have all
>> functions use "yield from".
>>
>> (also, I wouldn't be shocked if coroutines had to wear a special
>> decorator; it's a better marker than having the word COROUTINE in the
>> docstring, anyway :-))
>
> That's what bothers me is well. 'yield from' looks too long for a
> simple thing it does (1); users will be confused whether they should
> use 'yield' or 'yield from' (2); there is no visible difference between
> a plain generator and a coroutine (3).

I agree. Was this ever commented on? I know it may be late in the discussion,
but just because you can use yield/yield from for concurrent stuff, should you?

It looks very implicit to me (breaking the second rule of the Zen of
Python, "explicit is better than implicit").

Has the delegate/event model of C# been discussed?

As always, I recommend moving the concurrent stuff to the object level; it
would be so much easier to state that a message for an object is just that:
an async message sent from one object to another… :-)
A simple decorator like @task would be enough:

    @task  # explicitly run each instance in its own thread/coroutine
    class SomeTask(object):
        def async_add(self, x, y):
            return x + y  # the caller receives a Future() holding the result

    task = SomeTask()
    n = task.async_add(2, 2)
    # Do other stuff while waiting for the answer
    print("result is %d" % n)  # the Future will wait/hang until the result is ready

br
/rene

Guido van Rossum

Oct 29, 2012, 6:26:59 PM

to Andrew Svetlov, Python-Ideas

On Mon, Oct 29, 2012 at 3:19 PM, Andrew Svetlov
<andrew....@gmail.com> wrote:
> Twisted's DelayedCall is different from Deferred, it used for
> reactor.callLater and returned from this function (the same as
> call_later from tulip)
> Interface is: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L676
> Implementation is
> http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35
> DelayedCall from twisted has nothing common with Deferred, it's just
> an interface for scheduled activity, which can be called once,
> cancelled or rescheduled to another time.
>
> I've found that concept very useful when I used twisted.

Oh dear. I had no idea there was something named DelayedCall in
Twisted. There is no intention of similarity.

Steve Dower

Oct 29, 2012, 7:00:14 PM

to Rene Nejsum, Yury Selivanov, Antoine Pitrou, python...@python.org

Rene Nejsum wrote:
>> [SNIP]

>>
>> That's what bothers me is well. 'yield from' looks too long for a
>> simple thing it does (1); users will be confused whether they should
>> use 'yield' or 'yield from' (2); there is no visible difference
>> between a plain generator and a coroutine (3).
>
> I agree, was this ever commented ? I know it maybe late in the discussion
> but just because you can use yield/yield from for concurrent stuff, should you?
>
> it looks very implicit to me (breaking the second rule)
>
> Have the delegate/event model of C# been discussed ?
>
> As always i recommend moving the concurrent stuff to the object level, it
> would be so much easier to state that a message for an object is just that:
> An async message sent from one object to another... :-) A simple decorator
> like @task would be enough:
>
>     @task # explicit run instance in own thread/coroutine
>     class SomeTask(object):
>         def async_add(self, x, y):
>             return x + y # returns a Future() with result
>
>     task = SomeTask()
>     n = task.async_add(2,2)
>     # Do other stuff while waiting for answer
>     print( "result is %d" % n ) # Future will wait/hang until result is ready

I think you'll like what I'll be sending out later tonight (US Pacific time), so hold on :) (In the meantime, feel free to read up on C#'s async/await model, which is very similar to what both Guido and I are proposing and has already been pretty well received.)

Cheers,
Steve

Greg Ewing

Oct 29, 2012, 7:16:22 PM

to python...@python.org

Steve Dower wrote:

> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?

I don't think that writing new schedulers is something an end user
will do very often. Or more precisely, it's not something they should
*have* to do except in extremely unusual circumstances.

I believe it will be possible to provide a scheduler in the stdlib
that will be satisfactory for the vast majority of applications.

--
Greg

Steve Dower

Oct 29, 2012, 7:12:54 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

Guido van Rossum wrote:
>
>> By the way I don't know how this whole approach (of mocking
>> socket-like or file-like objects with coroutine-y read() / readline()
>> methods) lends itself to plugging into Windows' IOCP.
>
> Me neither. I hope Steve Dower can tell us.

I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.

From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient at scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).

The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.

What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.
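
As a rough illustration of wrapping readiness in a completion-style API (a sketch under assumptions: recv_when_complete() and run_once() are invented names, and concurrent.futures.Future plus the selectors module stand in for whatever the scheduler would really provide), the caller only ever sees a Future that is resolved with the data:

```python
import selectors
import socket
from concurrent.futures import Future


def recv_when_complete(selector, sock, nbytes):
    """Return a Future resolved with the received bytes (completion-style).

    Internally this is readiness-based: wait until sock is readable, then recv().
    """
    future = Future()

    def on_readable():
        selector.unregister(sock)
        try:
            future.set_result(sock.recv(nbytes))
        except OSError as exc:
            future.set_exception(exc)

    selector.register(sock, selectors.EVENT_READ, on_readable)
    return future


def run_once(selector, timeout=None):
    """One event-loop iteration: call the callback stored with each ready key."""
    for key, _mask in selector.select(timeout):
        key.data()
```

On a completion-based platform the same function could hand the read to the OS and resolve the Future from the completion callback, so code written against the Future would not need to change.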

Cheers,
Steve

Guido van Rossum

Oct 29, 2012, 7:21:38 PM

to Rene Nejsum, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 3:23 PM, Rene Nejsum <re...@stranden.com> wrote:
>
> On Oct 29, 2012, at 5:59 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
>
>> On 2012-10-29, at 12:07 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>>
>>>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>>>> APIs that should be invoked using yield from.
>>>
>>> Hmm, should they? Your approach looks a bit weird: you have functions
>>> that should use yield, and others that should use "yield from"? That
>>> sounds confusing to me.
>>>
>>> I'd much rather either have all functions use "yield", or have all
>>> functions use "yield from".
>>>
>>> (also, I wouldn't be shocked if coroutines had to wear a special
>>> decorator; it's a better marker than having the word COROUTINE in the
>>> docstring, anyway :-))
>>
>> That's what bothers me is well. 'yield from' looks too long for a
>> simple thing it does (1); users will be confused whether they should
>> use 'yield' or 'yield from' (2); there is no visible difference between
>> a plain generator and a coroutine (3).
>
> I agree, was this ever commented ? I know it maybe late in the discussion
> but just because you can use yield/yield from for concurrent stuff, should you?

I explained my position on yield vs. yield from twice already in this thread.

--
--Guido van Rossum (python.org/~guido)

Steve Dower

Oct 29, 2012, 7:26:17 PM

to Greg Ewing, python...@python.org

Greg Ewing wrote:
> Steve Dower wrote:
>
>> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
>
> I don't think that writing new schedulers is something an end user will do very often. Or
> more precisely, it's not something they should *have* to do except in extremely
> unusual circumstances.
>
> I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
> for the vast majority of applications.

I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation.

Cheers,
Steve

Guido van Rossum

Oct 29, 2012, 7:29:00 PM

to Steve Dower, Antoine Pitrou, python...@python.org

On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Guido van Rossum wrote:
>>
>>> By the way I don't know how this whole approach (of mocking
>>> socket-like or file-like objects with coroutine-y read() / readline()
>>> methods) lends itself to plugging into Windows' IOCP.
>>
>> Me neither. I hope Steve Dower can tell us.
>
> I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.

Aha, somehow I thought Richard was a Mac expert. :-(

> From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient at scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).

Experts all point in its direction, so I believe IOCP is solid.

> The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.

Right. Did you see my call_in_thread() yet?
http://code.google.com/p/tulip/source/browse/scheduling.py#210
http://code.google.com/p/tulip/source/browse/polling.py#481

> What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.

I wonder if this could be done by varying the transports by platform?
Not too many people are going to write new transports -- there just
aren't that many options. And those that do might be doing something
platform-specific anyway. It shouldn't be that hard to come up with a
transport abstraction that lets protocol implementations work
regardless of whether it's a UNIX style transport or a Windows style
transport. UNIX systems with IOCP support could use those too.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 29, 2012, 7:37:52 PM

to Steve Dower, python...@python.org

On Mon, Oct 29, 2012 at 4:26 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Greg Ewing wrote:
>> Steve Dower wrote:
>>
>>> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
>>
>> I don't think that writing new schedulers is something an end user will do very often. Or
>> more precisely, it's not something they should *have* to do except in extremely
>> unusual circumstances.
>>
>> I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
>> for the vast majority of applications.
>
> I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation.

BTW, would it be useful to separate this into pollster, eventloop, and
scheduler? At least in my world these are different; of these three,
only the pollster contains platform-specific code (and then again the
transports do too -- this is a nice match IMO).

--
--Guido van Rossum (python.org/~guido)

Steve Dower

Oct 29, 2012, 7:47:51 PM

to Guido van Rossum, Antoine Pitrou, python...@python.org

> Guido van Rossum wrote:
> [SNIP]

>
> On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve...@microsoft.com> wrote:
>> The whole blocking coroutine model works really well with callback-based unblocks
>> (whether they call Future.set_result or unblock_task), so I don't think there's anything
>> to worry about here. Compatibility-wise, it should be easy to make programs portable,
>> and since we can have completely separate implementations for Linux/Mac/Windows it
>> will be possible to get good, if not excellent, performance out of each.
>
> Right. Did you see my call_in_thread() yet?
> http://code.google.com/p/tulip/source/browse/scheduling.py#210
> http://code.google.com/p/tulip/source/browse/polling.py#481

Yes, and it really stood out as one of the similarities between our work. I don't have an equivalent function, since writing "yield thread_pool.submit(...)" is sufficient (because it already returns a Future), but I haven't actually made the thread pool a property of the current scheduler. I think there's value in it.

>> What will make a difference is the ready vs. complete notifications - most async Windows
>> APIs will signal when they are complete (for example, the data has been read from the file)
>> unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this
>> difference up by making all APIs notify on completion, and if we don't do this then user code
>> may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important
>> consideration for good cross-platform libraries.
>
> I wonder if this could be done by varying the transports by platform?
> Not too many people are going to write new transports -- there just aren't that many options.
> And those that do might be doing something platform-specific anyway. It shouldn't be that hard
> to come up with a transport abstraction that lets protocol implementations work regardless of
> whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support
> could use those too.

I feel like a bit of a tease now, since I still haven't posted my code (it's coming, but I also have day work to do [also Python related]), but I've really left this side of things out of my definition completely in favour of allowing schedulers to "unblock" known functions. For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])", and if the scheduler can then it will give the library code a Future. How the scheduler ends up implementing the asynchronous-select is entirely up to the scheduler, and if it can't do it, the caller can do it their own way (which probably means using a thread pool as a last resort).

What I would expect this to result in is a set of platform-specific default schedulers that do common operations well and other (3rd-party) schedulers that do particular things really well. So if you want high performance single-threaded sockets, you replace the default scheduler with another one - but if Windows doesn't support the optimized scheduler, you can use the default scheduler without your code breaking.

Writing this now it seems to be even clearer that we've approached the problem differently, which should mean there'll be room to share parts of the designs and come up with a really solid result. I'm looking forward to it.

Cheers,
Steve

Greg Ewing

Oct 29, 2012, 7:53:56 PM

to python...@python.org

Mark Hackett wrote:

> Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux
> (because of the string operations available in the x86 instruction set), but I
> don't think anything other than an IPC message has a "you can write a string
> atomically" guarantee. And I may be misremembering that.

It seems to be a POSIX requirement:

    PIPE_BUF
        POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be
        atomic: the output data is written to the pipe as a contiguous
        sequence.

(From http://dell9.ma.utexas.edu/cgi-bin/man-cgi?pipe+7)

There's no corresponding guarantee for reading, though. The process
on the other end can't be sure of getting the data from one write()
call in a single read() call. In other words, the write does *not*
establish a record boundary.
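
A small sketch of the consequence for the reader, assuming newline-terminated messages as the framing convention (roundtrip() and read_all() are made-up helpers): since a read() may return part of one write or several writes glued together, the reader collects bytes and splits them itself.

```python
import os


def read_all(fd):
    """Read until EOF; chunk boundaries carry no meaning."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def roundtrip(messages):
    """Write each message with a trailing newline, then recover them by framing."""
    read_fd, write_fd = os.pipe()
    try:
        for message in messages:
            # Small writes (well under PIPE_BUF), so they cannot block here.
            os.write(write_fd, message + b"\n")
    finally:
        os.close(write_fd)
    try:
        data = read_all(read_fd)
    finally:
        os.close(read_fd)
    # The newline delimiter, not the write() call, marks the record boundary.
    return data.split(b"\n")[:-1]
```

The atomicity guarantee only means that concurrent writers will not interleave bytes within one small write; it says nothing about how the bytes are grouped when read.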

--
Greg

Richard Oudkerk

Oct 29, 2012, 8:01:23 PM

to python...@python.org

On 29/10/2012 11:29pm, Guido van Rossum wrote:
> I wonder if this could be done by varying the transports by platform?
> Not too many people are going to write new transports -- there just
> aren't that many options. And those that do might be doing something
> platform-specific anyway. It shouldn't be that hard to come up with a
> transport abstraction that lets protocol implementations work
> regardless of whether it's a UNIX style transport or a Windows style
> transport. UNIX systems with IOCP support could use those too.

Yes, having separate implementations of the transport layer should work.

But I think it would be cleaner to put all the platform specific stuff
in the pollster, and make the pollster poll-for-completion rather than
poll-for-readiness. (Is this the "proactor pattern"?) That seems to be
the direction libevent has moved in.
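
To make the distinction concrete, here is a rough sketch of a poll-for-completion interface emulated on top of select(). The class CompletionPollster and its recv()/poll() methods are hypothetical names chosen for illustration; a real IOCP-backed version would let the OS perform the read rather than calling sock.recv() itself.

```python
import select
import socket


class CompletionPollster:
    """Hypothetical completion-style pollster, emulated with select()."""

    def __init__(self):
        self._pending = {}  # socket -> (nbytes, callback)

    def recv(self, sock, nbytes, callback):
        # The caller asks for the *operation*, not for readiness.
        self._pending[sock] = (nbytes, callback)

    def poll(self, timeout=None):
        if not self._pending:
            return
        readable, _, _ = select.select(list(self._pending), [], [], timeout)
        for sock in readable:
            nbytes, callback = self._pending.pop(sock)
            # The pollster performs the read and reports the finished result.
            callback(sock.recv(nbytes))


a, b = socket.socketpair()
results = []
pollster = CompletionPollster()
pollster.recv(b, 4096, results.append)
a.sendall(b"hello")
pollster.poll(timeout=1.0)
a.close()
b.close()
```

With a readiness-style pollster the callback would only learn that b is readable and would have to call recv() itself.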

--
Richard

Greg Ewing

Oct 29, 2012, 8:19:08 PM10/29/12

to python...@python.org

Guido van Rossum wrote:
>>I would like to see add_{reader,writer} and call_{soon,later} accepting
>>**kwargs as well as *args. At least to respect functions with
>>keyword-only arguments.
>
> Hmm... I intentionally ruled those out because I wanted to leave the
> door open for keyword args that modify the registration function

One way to accommodate that would be to make the
registration API look like this:

call_later(my_func)(arg1, ..., kwd = value, ...)
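
A small sketch of how such a two-stage call could work, assuming a toy EventLoop that only records what was scheduled (the class, the delay keyword and the greet function are illustrative, not tulip's API):

```python
import functools


class EventLoop:
    """Hypothetical loop that just records scheduled callbacks."""

    def __init__(self):
        self.scheduled = []  # list of (delay, zero-argument callable)

    def call_later(self, func, *, delay=0.0):
        # Keywords given in this first call configure the registration.
        def bind(*args, **kwargs):
            # Everything given in the second call is forwarded to func.
            bound = functools.partial(func, *args, **kwargs)
            self.scheduled.append((delay, bound))
        return bind


def greet(name, *, punctuation="."):
    return "hello " + name + punctuation


loop = EventLoop()
loop.call_later(greet, delay=1.5)("world", punctuation="!")
delay, callback = loop.scheduled[0]
```

The two argument lists never mix, so the registration function stays free to grow its own keyword arguments later.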

--
Greg

Greg Ewing

Oct 29, 2012, 8:24:18 PM10/29/12

to Python-Ideas

Guido van Rossum wrote:

> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".

Isn't call_soon() equivalent to call_later() with a
time delay of 0?

If so, then call_later() is really the only axiomatic one.
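
As a sketch of that idea, assuming a toy timer heap rather than tulip's actual event loop (MiniLoop is a made-up name), call_soon() can be a one-line wrapper as long as ties are broken in insertion order:

```python
import heapq
import itertools
import time


class MiniLoop:
    """Toy loop in which call_later() is the only primitive."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._heap, (when, next(self._counter), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def run(self):
        while self._heap:
            when, _, callback, args = heapq.heappop(self._heap)
            remaining = when - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            callback(*args)


order = []
loop = MiniLoop()
loop.call_later(0.05, order.append, "later")
loop.call_soon(order.append, "first")
loop.call_soon(order.append, "second")
loop.run()
```

Whether a real loop wants to pay for a heap push on every call_soon() is a separate performance question.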

Greg Ewing

Oct 29, 2012, 8:25:43 PM10/29/12

to python...@python.org

Andrew Svetlov wrote:

> A 0MQ socket has no file descriptor at all; it's just a pointer to some
> unspecified structure.
> So 0MQ has its own *poll* function which can process those sockets as well
> as file descriptors.

Aaargh... yet another event loop that wants to rule
the world. This is not good.

--
Greg

Nick Coghlan

Oct 29, 2012, 8:34:24 PM10/29/12

to Guido van Rossum, Antoine Pitrou, python...@python.org

On Tue, Oct 30, 2012 at 9:29 AM, Guido van Rossum <gu...@python.org> wrote:
> Aha, somehow I thought Richard was a Mac expert. :-(

Just in case anyone else confused the two names (I know I have in the past):

Ronald Oussoren = Mac expert
Richard Oudkerk = multiprocessing expert (including tools for
inter-process communication)

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Yury Selivanov

Oct 29, 2012, 8:43:23 PM10/29/12

to Guido van Rossum, Python-Ideas

Guido,

Finally got some time to do a review & read what others posted.
Some comments are more general, some are more implementation-specific
(hopefully you want to hear the latter ones as well)

And I'm still in the process of digesting your approach & code (as
I've spent too much time with my implementation)...

On 2012-10-28, at 7:52 PM, Guido van Rossum <gu...@python.org> wrote:
[...]
> polling.py: http://code.google.com/p/tulip/source/browse/polling.py
[...]

1. I'd make EventLoopMixin a separate entity from pollsters, so that you'd
be able to add many different pollsters to one EventLoop. This way
you can have specialized pollsters for different types of IO, including
UI etc.

2. Sometimes, there is a need to run a coroutine in a threadpool. I know it
sounds weird, but it's probably worth exploring.

3. In my framework each threadpool worker has its own local context, with
various information like which Task ran the operation etc.

And a few small things:

4. epoll.poll and other syscalls need to be wrapped in try..except to catch
and ignore (and log?) EINTR type of exceptions.

5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
too.
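
For item 4, a minimal sketch of the wrapper, assuming the pollster exposes some poll(timeout) callable (poll_ignoring_eintr and the two stand-in poll functions are invented for the example). Note that from Python 3.5 on the interpreter retries most interrupted system calls itself (PEP 475), so this mainly matters on the 3.3-era interpreters discussed here.

```python
def poll_ignoring_eintr(poll, timeout):
    """Call poll(timeout); treat an interrupting signal as 'no events'."""
    try:
        return poll(timeout)
    except InterruptedError:
        # EINTR: a signal arrived while we were blocked. Returning an empty
        # event list lets the loop go around again and re-check its timers.
        return []


def interrupted_poll(timeout):
    # Stand-in for epoll.poll() being interrupted by a signal.
    raise InterruptedError


def normal_poll(timeout):
    # Stand-in for a poll call that reports one ready file descriptor.
    return [(7, "readable")]


interrupted_events = poll_ignoring_eintr(interrupted_poll, 0.1)
normal_events = poll_ignoring_eintr(normal_poll, 0.1)
```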

> scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py
[...]

> In the docstrings I use the prefix "COROUTINE:" to indicate public
> APIs that should be invoked using yield from.

[...]

Like others, I would definitely suggest adding a decorator to make
coroutines more distinguishable. It would be even better if we can return
a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)'
instead of:

task = scheduling.Task(doit(), timeout=2.1)
task.start()
scheduling.run()

And avoid manual Task instantiation at all.

I also liked the simplicity of the Task class. I think it'd be easy
to mix greenlets in it by switching in a new greenlet on each 'step'.
That will give you 'yield_()' function, which you can use in the same
way you use 'yield' statement now (I'm not proposing to incorporate
greenlets in the lib itself, but rather to provide an option to do so)
Hence there should be a way to plug your own Task (sub-)class in.

Thank you,
Yury

Guido van Rossum

Oct 29, 2012, 9:02:43 PM10/29/12

to Richard Oudkerk, python...@python.org

On Mon, Oct 29, 2012 at 5:01 PM, Richard Oudkerk <shib...@gmail.com> wrote:
> On 29/10/2012 11:29pm, Guido van Rossum wrote:
>>
>> I wonder if this could be done by varying the transports by platform?
>> Not too many people are going to write new transports -- there just
>> aren't that many options. And those that do might be doing something
>> platform-specific anyway. It shouldn't be that hard to come up with a
>> transport abstraction that lets protocol implementations work
>> regardless of whether it's a UNIX style transport or a Windows style
>> transport. UNIX systems with IOCP support could use those too.
>
>
> Yes, having separate implementations of the transport layer should work.
>
> But I think it would be cleaner to put all the platform specific stuff in
> the pollster, and make the pollster poll-for-completion rather than
> poll-for-readiness. (Is this the "proactor pattern"?) That seems to be the
> direction libevent has moved in.

Interesting. I'd like to hear what Twisted thinks of this. (I will
find out next week. :-)

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 29, 2012, 10:07:21 PM10/29/12

to Yury Selivanov, Python-Ideas

On Mon, Oct 29, 2012 at 5:43 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
> Finally got some time to do a review & read what others posted.

Great!

> Some comments are more general, some are more implementation-specific
> (hopefully you want to hear the latter ones as well)

Yes!

> And I'm still in the process of digesting your approach & code (as
> I've spent too much time with my implementation)...

Heh. :-)

> On 2012-10-28, at 7:52 PM, Guido van Rossum <gu...@python.org> wrote:
> [...]
>> polling.py: http://code.google.com/p/tulip/source/browse/polling.py
> [...]
>
> 1. I'd make EventLoopMixin a separate entity from pollsters, so that you'd
> be able to add many different pollsters to one EventLoop. This way
> you can have specialized pollsters for different types of IO, including
> UI etc.

I came to the same conclusion, so I fixed this. See the latest version.

(BTW, I also renamed add_reader() etc. on the Pollster class to
register_reader() etc. -- I dislike similar APIs on different classes
to have the same name if there's not a strict super class override
involved.)

> 2. Sometimes, there is a need to run a coroutine in a threadpool. I know it
> sounds weird, but it's probably worth exploring.

I think that can be done quite simply. Since each thread has its own
eventloop (via the magic of TLS), it's as simple as writing a function
that creates a task, starts it, and then runs the eventloop. There's
nothing else running in that particular thread, and its eventloop will
terminate when there's nothing left to do there -- i.e. when the task
is done. Sketch:

    def some_generator(arg):
        ...stuff using yield from...
        return 42

    def run_it_in_the_threadpool(arg):
        t = Task(some_generator(arg))
        t.start()
        scheduling.run()
        return t.result

    # And in your code:
    result = yield from scheduling.call_in_thread(run_it_in_the_threadpool, arg)

    # Now result == 42.
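
A self-contained approximation of the same idea, using only the stdlib: concurrent.futures stands in for scheduling.call_in_thread(), and a trivial run_to_completion() helper stands in for Task plus scheduling.run(). Both substitutions are assumptions made so the example runs without tulip.

```python
from concurrent.futures import ThreadPoolExecutor


def some_generator(arg):
    yield  # stand-in for a point where the coroutine would suspend
    return arg * 2


def run_to_completion(gen):
    # Crude stand-in for a per-thread scheduler: step the generator until
    # it finishes and hand back its return value (PEP 380 semantics).
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


def run_it_in_the_threadpool(arg):
    return run_to_completion(some_generator(arg))


with ThreadPoolExecutor(max_workers=1) as pool:
    result = pool.submit(run_it_in_the_threadpool, 21).result()
```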

> 3. In my framework each threadpool worker has its own local context, with
> various information like which Task ran the operation etc.

I think I have this too -- Thread-Local Storage!
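
A rough sketch of the thread-local pattern being referred to, assuming a placeholder EventLoop class and a get_event_loop() accessor (both names are illustrative here, not taken from the tulip source):

```python
import threading

_local = threading.local()


class EventLoop:
    """Placeholder for whatever per-thread state the loop carries."""


def get_event_loop():
    # Each thread lazily creates, then reuses, its own loop object.
    loop = getattr(_local, "loop", None)
    if loop is None:
        loop = _local.loop = EventLoop()
    return loop


main_loop = get_event_loop()
seen_in_worker = []
worker = threading.Thread(target=lambda: seen_in_worker.append(get_event_loop()))
worker.start()
worker.join()
```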

> And a few small things:
>
> 4. epoll.poll and other syscalls need to be wrapped in try..except to catch
> and ignore (and log?) EINTR type of exceptions.

Good point.

> 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
> too.

Do you have a code sample? I haven't found a need yet.

>> scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py
> [...]
>
>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
> [...]
>
> Like others, I would definitely suggest adding a decorator to make
> coroutines more distinguishable.

That's definitely on my TODO list.
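
One possible shape for such a decorator, sketched under the assumption that marking is all it needs to do (the names coroutine, is_coroutine and the is_coroutine attribute are invented for illustration). It leaves the generator function's return value untouched, so calling it still yields a plain generator object.

```python
import inspect


def coroutine(func):
    """Mark a generator function as meant to be used with 'yield from'."""
    if not inspect.isgeneratorfunction(func):
        raise TypeError("%r is not a generator function" % (func,))
    func.is_coroutine = True
    return func


def is_coroutine(obj):
    return getattr(obj, "is_coroutine", False)


@coroutine
def doit():
    yield
    return 42


def plain():
    return 42


try:
    coroutine(plain)
    rejected = False
except TypeError:
    rejected = True
```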

> It would be even better if we can return
> a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)'
> instead of:
>
> task = scheduling.Task(doit(), timeout=2.1)
> task.start()
> scheduling.run()

The run() call shouldn't be necessary unless you are at the toplevel.

> And avoid manual Task instantiation at all.

Hm. I want the generator function to return just a generator object,
and I can't add methods to that. But we can come up with a decent API.

> I also liked the simplicity of the Task class. I think it'd be easy
> to mix greenlets in it by switching in a new greenlet on each 'step'.
> That will give you 'yield_()' function, which you can use in the same
> way you use 'yield' statement now (I'm not proposing to incorporate
> greenlets in the lib itself, but rather to provide an option to do so)
> Hence there should be a way to plug your own Task (sub-)class in.

Hm. Someone else will have to give that a try.

Thanks for your feedback!!

--
--Guido van Rossum (python.org/~guido)

Yury Selivanov

Oct 29, 2012, 10:18:25 PM10/29/12

to Guido van Rossum, Python-Ideas

On 2012-10-29, at 10:07 PM, Guido van Rossum <gu...@python.org> wrote:
[...]

>> 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
>> too.
>
> Do you have a code sample? I haven't found a need yet.

Just a code dump from my epoll proactor:

    if ev & EPOLLHUP:
        sock.close(_error_cls=ConnectionResetError)
        self._unschedule(fd)
        continue

    if ev & EPOLLERR:
        sock.close(_error_cls=ConnectionError, _error_msg='socket error in epoll proactor')
        self._unschedule(fd)
        continue

[...]

>> It would be even better if we can return
>> a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)'
>> instead of:
>>
>> task = scheduling.Task(doit(), timeout=2.1)
>> task.start()
>> scheduling.run()
>
> The run() call shouldn't be necessary unless you are at the toplevel.

Yes, that's just sugar to make top-level runs more appealing.
You'll also get a nice way of setting timeouts:

yield from coro().with_timeout(1.0)

[...]

>> I also liked the simplicity of the Task class. I think it'd be easy
>> to mix greenlets in it by switching in a new greenlet on each 'step'.
>> That will give you 'yield_()' function, which you can use in the same
>> way you use 'yield' statement now (I'm not proposing to incorporate
>> greenlets in the lib itself, but rather to provide an option to do so)
>> Hence there should be a way to plug your own Task (sub-)class in.
>
> Hm. Someone else will have to give that a try.

I'll be that someone once we choose the direction ;) IMO the greenlets
integration is a very important topic.

-
Yury

Greg Ewing

Oct 30, 2012, 1:10:28 AM10/30/12

to python...@python.org

Steve Dower wrote:

> From my point of view, IOCP fits in very well provided the callbacks (which will
> run in the IOCP thread pool) are only used to unblock tasks.

Is it really necessary to have a separate thread just to handle
unblocking tasks? That thread will have very little to do, so
it could just as well run the tasks too, couldn't it?

--
Greg

Greg Ewing

Oct 30, 2012, 1:20:13 AM10/30/12

to python...@python.org

Steve Dower wrote:

>>I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
>>for the vast majority of applications.
>
> I agree, and I chose my words poorly for that point: "library/framework
> developers" is more accurate than "end user".

I don't think that even library developers should need to write
their own scheduler very often.

> And since I expect every GUI
> framework is going to need (or at least want) their own scheduler,

I don't agree with that. They might need their own event loop,
but I haven't seen any reason so far to think they would need
their own coroutine scheduler.

Remember that Guido wants to keep the event loop stuff and the
scheduler stuff very clearly separated. The scheduler will all
be pure Python and should be usable with just about any
event loop.

--
Greg

Greg Ewing

Oct 30, 2012, 1:27:34 AM10/30/12

to python...@python.org

Steve Dower wrote:
> For example, (library) code that needs
> a socket to be ready can ask the current scheduler if it can do "select([sock],
> [], [])",

I think you're mixing up the scheduler and event loop layers
here. If the scheduler is involved in this at all, it would only
be to pass the request on to the event loop.

--
Greg

Greg Ewing

Oct 30, 2012, 1:36:10 AM10/30/12

to Python-Ideas

Yury Selivanov wrote:
> It would be even better if we can return
> a tiny wrapper that lets you simply write 'doit.run().with_timeout(2.1)'
> instead of:
>
> task = scheduling.Task(doit(), timeout=2.1)
> task.start()
> scheduling.run()

I would prefer spelling this something like

scheduling.spawn(doit(), timeout=2.1)

A newly spawned task should be scheduled automatically; if
you're not ready for it to run yet, then don't spawn it
until you are.

Also, it should almost *never* be necessary to call
scheduling.run(). That should happen only in a very few
places, mostly buried deep inside the scheduling/event
loop system.
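
A toy illustration of the spawn() spelling, assuming a bare round-robin scheduler with no I/O and no timeout support (Task, spawn and run here are simplified stand-ins, not the tulip implementations):

```python
from collections import deque

_ready = deque()


class Task:
    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None


def spawn(gen):
    # Creating a task and scheduling it are one step: no separate start().
    task = Task(gen)
    _ready.append(task)
    return task


def run():
    # Round-robin: give each ready task one step at a time.
    while _ready:
        task = _ready.popleft()
        try:
            next(task.gen)
        except StopIteration as stop:
            task.done = True
            task.result = stop.value
        else:
            _ready.append(task)


trace = []


def doit(name, steps):
    for i in range(steps):
        trace.append((name, i))
        yield
    return name + " done"


t1 = spawn(doit("a", 2))
t2 = spawn(doit("b", 1))
run()  # needed once at the top level only
```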

--
Greg

Laurens Van Houtven

Oct 30, 2012, 6:12:17 AM10/30/12

to Guido van Rossum, Python-Ideas

Hi,

I've been following the PEP380-related threads and I've reviewed this stuff, while trying to do the protocols/transports PEP, and trying to glue the two together.

The biggest difference I can see is that protocols as they've been discussed are "pull": they get called when some data arrives. They don't know how much data there is; they just get told "here's some data". The obvious difference with the API in, eg:

https://code.google.com/p/tulip/source/browse/sockets.py#56

... is that now I have to tell a socket to read n bytes, which "blocks" the coroutine, then I get some data.

Now, there doesn't have to be an issue; you could simply say:

    data = yield from s.recv(4096) # that's the magic number usually right
    proto.data_received(data)

It seems a bit boilerplatey, but I suppose that eventually could be hidden away.
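
Spelled out a little further, the glue could look roughly like the following. This is a sketch under assumptions: make_fake_recv() fakes a coroutine-style recv() so the example runs without sockets, and CollectingProtocol is a made-up callback-style protocol with a data_received() method.

```python
class CollectingProtocol:
    """Callback-style protocol: it is simply handed whatever arrives."""

    def __init__(self):
        self.chunks = []

    def data_received(self, data):
        self.chunks.append(data)


def make_fake_recv(chunks):
    pending = list(chunks)

    def recv(nbytes):
        yield  # pretend to wait until the socket is readable
        return pending.pop(0) if pending else b""  # b"" signals EOF

    return recv


def pump(recv, proto):
    # The only coroutine-specific code: read, then hand the data over.
    while True:
        data = yield from recv(4096)
        if not data:
            break
        proto.data_received(data)


proto = CollectingProtocol()
for _ in pump(make_fake_recv([b"GET / ", b"HTTP/1.0\r\n"]), proto):
    pass  # a real scheduler would do the stepping here
```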

But this style is pervasive, for example that's how reading by lines works:

https://code.google.com/p/tulip/source/browse/echosvr.py#20

While I'm not a big fan (I may be convinced if I see a protocol test that looks nice), I'm just wondering if there's any point in trying to write the pull-style protocols when this works quite differently.

Additionally, I'm not sure if readline belongs on the socket. I understand the simile with files, though. With the coroutine style I could see how the most obvious fit would be something like tornado's read_until, or an as_lines that essentially calls read_until repeatedly. Can the delimiter for this be modified?

My main syntactic gripe is that when I write @inlineCallbacks code or monocle code or whatever, when I say "yield" I'm yielding to the reactor. That makes sense to me (I realize natural language arguments don't always make sense in a programming language context). "yield from" less so (but okay, that's what it has to look like). But this just seems weird to me:

yield from trans.send(line.upper())

Not only do I not understand why I'm yielding there in the first place (I don't have to wait for anything, I just want to push some data out!), it feels like all of my yields have been replaced with yield froms for no obvious reason (well, there are reasons, I'm just trying to look at this naively).

I guess Twisted gets away with this because of deferred chaining: that one deferred might have tons of callbacks in the background, many of which are also doing IO operations, resulting in a sequence of asynchronous operations that only at the end cause the generator to be run some more.

I guess that belongs in a different thread, though. Even then, I'm not sure if I'm uncomfortable because I'm seeing something different from what I'm used to, or if my argument from English actually makes any sense whatsoever.

Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this).

cheers
lvh

Antoine Pitrou

Oct 30, 2012, 7:36:41 AM10/30/12

to python...@python.org

Le Tue, 30 Oct 2012 18:10:28 +1300,
Greg Ewing <greg....@canterbury.ac.nz> a
écrit :

> Steve Dower wrote:
>
> > From my point of view, IOCP fits in very well provided the
> > callbacks (which will run in the IOCP thread pool) are only used to
> > unblock tasks.
>
> Is it really necessary to have a separate thread just to handle
> unblocking tasks? That thread will have very little to do, so
> it could just as well run the tasks too, couldn't it?

The IOCP thread pool is managed by Windows, not you.

Regards

Antoine.

Jakob Bowyer

Oct 30, 2012, 9:10:53 AM10/30/12

to Laurens Van Houtven, Python-Ideas

Sorry to chime in, but would this be a case where there could be the
syntax `yield to` ?

Kristján Valur Jónsson

Oct 30, 2012, 12:05:45 PM10/30/12

to Guido van Rossum, Richard Oudkerk, python...@python.org

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> bounces+kristjan=ccpgam...@python.org] On Behalf Of Guido van
> Rossum
> Sent: 29. október 2012 16:35
> To: Richard Oudkerk
> Cc: python...@python.org
> Subject: Re: [Python-ideas] Async API: some code to review
> > It is a common pattern to have multiple threads/processes trying to
> > accept connections on an single listening socket, so it would be
> > unfortunate to disallow that.
>
> Ah, but that will work -- each thread has its own pollster, event loop and
> scheduler and collection of tasks. And listening on a socket is a pretty special
> case anyway -- I imagine we'd build a special API just for that purpose.
>

I don't think he meant actual "threads" but rather thread in the context of coroutines.
In StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.

We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked. Such errors are surprising and hard to debug.

K

Steve Dower

Oct 30, 2012, 12:27:37 PM10/30/12

to Greg Ewing, python...@python.org

Greg Ewing wrote:
> Steve Dower wrote:
>> For example, (library) code that needs a socket to be ready can ask
>> the current scheduler if it can do "select([sock], [], [])",
>
> I think you're mixing up the scheduler and event loop layers here. If the scheduler
> is involved in this at all, it would only be to pass the request on to the event loop.

Could you clarify for me what goes into each layer? I've been treating "scheduler" and "event loop" as more-or-less synonyms (I see an event loop as one possible implementation of a scheduler).

If you consider the scheduler to be the part that calls __next__() on the generator and sets up callbacks, that is implemented in my _Awaiter class, and should never need to be touched.

Possibly the difference in terminology comes out because I'm not treating I/O specially? As far as wattle is concerned, I/O is just another operation that will eventually call Future.set_result(). I've tried to capture this in my write-up: https://bitbucket.org/stevedower/wattle/wiki/Proposal

Cheers,
Steve

Steve Dower

Oct 30, 2012, 12:32:19 PM10/30/12

to Greg Ewing, python...@python.org

Greg Ewing wrote:
> Steve Dower wrote:
>
>> From my point of view, IOCP fits in very well provided the callbacks
>> (which will run in the IOCP thread pool) are only used to unblock tasks.
>
> Is it really necessary to have a separate thread just to handle unblocking tasks?
> That thread will have very little to do, so it could just as well run the tasks too,
> couldn't it?

In the C10k problem (which seems to keep coming up as our "goal") that thread will have a lot to do.

I would expect that most actual users of this API could keep running on that thread without issue, but since it is OS managed and belongs to a pool, the chances of deadlocking are much higher than on a 'real' CPU thread. Limiting its work to unblocking at least prevents the end developer from having to worry about this.

Cheers,
Steve

Guido van Rossum

Oct 30, 2012, 12:40:18 PM10/30/12

to Kristján Valur Jónsson, Richard Oudkerk, python...@python.org

[Richard Oudkerk (?)]

>> > It is a common pattern to have multiple threads/processes trying to
>> > accept connections on an single listening socket, so it would be
>> > unfortunate to disallow that.

[Guido]

>> Ah, but that will work -- each thread has its own pollster, event loop and
>> scheduler and collection of tasks. And listening on a socket is a pretty special
>> case anyway -- I imagine we'd build a special API just for that purpose.

On Tue, Oct 30, 2012 at 9:05 AM, Kristján Valur Jónsson
<kris...@ccpgames.com> wrote:
> I don't think he meant actual "threads" but rather thread in the context of coroutines.

(Yes, we figured that out already. :-)

> In StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.

What kind of time savings are we talking about? I imagine that the
accept() loop I put in tulip/echosvr.py is fast enough in terms of
response time (latency) -- throughput would seem the more important
measure (and I have no idea of this yet).
http://code.google.com/p/tulip/source/browse/echosvr.py#37

> We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked. Such errors are surprising and hard to debug.

That's a good point. It should either cause an immediate, clear
exception, or interleave the data without compromising integrity of
the scheduler or the app.

--
--Guido van Rossum (python.org/~guido)

Kristján Valur Jónsson

Oct 30, 2012, 12:11:40 PM10/30/12

to Greg Ewing, python...@python.org

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-

> bounces+kristjan=ccpgam...@python.org] On Behalf Of Greg Ewing
> Sent: 30. október 2012 05:10
> To: python...@python.org
> Subject: Re: [Python-ideas] non-blocking buffered I/O
> Steve Dower wrote:
>
> > From my point of view, IOCP fits in very well provided the callbacks
> > (which will run in the IOCP thread pool) are only used to unblock tasks.
>
> Is it really necessary to have a separate thread just to handle unblocking
> tasks? That thread will have very little to do, so it could just as well run the
> tasks too, couldn't it?

StacklessIO (which is an IOCP implementation for stackless) uses callbacks on an arbitrary thread (in practice a worker thread from Windows' own thread pool that it keeps for such things) to unblock tasklets. You don't want to do any significant work on such a thread because it is used for other stuff by the system.

By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient.
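
A pure-Python analogy of that hand-off, assuming a queue.Queue as the "pending call" channel (this is only an illustration of the pattern, not how StacklessIO is implemented): the foreign thread does nothing but enqueue a callable, and the main thread, which would otherwise be sleeping, wakes up and runs it.

```python
import queue
import threading

pending_calls = queue.Queue()
woken = []


def completion_callback():
    # Runs on the "external" worker thread: do as little as possible and
    # leave the real work to the main thread.
    pending_calls.put(lambda: woken.append("tasklet-1"))


worker = threading.Thread(target=completion_callback)
worker.start()

# Main thread: the blocking get() plays the role of an interruptible sleep.
call = pending_calls.get(timeout=1.0)
call()
worker.join()
```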

K

Guido van Rossum

Oct 30, 2012, 1:34:12 PM10/30/12

to Laurens Van Houtven, Python-Ideas

On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_...@lvh.cc> wrote:
> I've been following the PEP380-related threads and I've reviewed this stuff,
> while trying to do the protocols/transports PEP, and trying to glue the two
> together.

Thanks! I know it can't be easy to keep up with all the threads (and
now code repos).

> The biggest difference I can see is that protocols as they've been discussed
> are "pull": they get called when some data arrives. They don't know how much
> data there is; they just get told "here's some data". The obvious difference
> with the API in, eg:
>
> https://code.google.com/p/tulip/source/browse/sockets.py#56
>
> ... is that now I have to tell a socket to read n bytes, which "blocks" the
> coroutine, then I get some data.

Yes. But do note that sockets.py is mostly a throw-away example
written to support the only style I am familiar with -- synchronous
reads and writes. My point in writing this particular set of
transports is that I want to take existing synchronous code (e.g. a
threaded server built using the stdlib's
socketserver.ThreadingTCPServer class) and make minimal changes to the
protocol logic to support async operation -- those minimal changes
should boil down to using a different way to set up a connection or a
listening socket or constructing a stream from a socket, and putting
"yield from" in front of the blocking operations (recv(), send(), and
the read/readline/write operations on the streams).

I'm still looking for guidance from Twisted and Tornado (and you!) to
come up with better abstractions for transports and protocols. The
underlying event loop *does* support a style where an object registers
a callback function once which is called repeatedly, as long as the
socket is readable (or writable, depending on the registration call).
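
The register-once, called-repeatedly style can be sketched with the stdlib selectors module, which was added to Python after this thread and is used here only as a stand-in for tulip's pollster; the on_readable callback and the hand-written dispatch loop are assumptions made for the example.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
left, right = socket.socketpair()
received = []


def on_readable(sock):
    received.append(sock.recv(4096))


# Registered once; invoked every time the socket turns out to be readable.
selector.register(right, selectors.EVENT_READ, on_readable)

for payload in (b"ping", b"pong"):
    left.sendall(payload)
    for key, _events in selector.select(timeout=1.0):
        key.data(key.fileobj)  # key.data holds the registered callback

selector.unregister(right)
selector.close()
left.close()
right.close()
```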

> Now, there doesn't have to be an issue; you could simply say:
>
> data = yield from s.recv(4096) # that's the magic number usually right
> proto.data_received(data)

(Off-topic: ages ago I determined that the optimal block size is
actually 8192. But for all I know it is 256K these days. :-)

> It seems a bit boilerplatey, but I suppose that eventually could be hidden
> away.
>
> But this style is pervasive, for example that's how reading by lines works:
>
> https://code.google.com/p/tulip/source/browse/echosvr.py#20

Right -- again, this is all geared towards making it palatable for
people used to writing synchronous code (either single-threaded or
multi-threaded), not for people used to Twisted.

> While I'm not a big fan (I may be convinced if I see a protocol test that
> looks nice);

Check out urlfetch() in main.py:
http://code.google.com/p/tulip/source/browse/main.py#39

For sure, this isn't "pretty" and it should be rewritten using more
abstraction -- I only wrote the entire thing as a single function
because I was focused on the scheduler and event loop. And it is
clearly missing a buffering layer for writing (it currently uses a
separate send() call for each line of the HTTP headers, blech). But it
implements a fairly complex (?) protocol and it performs well enough.

> I'm just wondering if there's any point in trying to write the
> pull-style protocols when this works quite differently.

Perhaps you could try to write some pull-style transports and
protocols for tulip to see if anything's missing from the scheduler
and eventloop APIs or implementations? I'd be happy to rename
sockets.py to push_sockets.py so there's room for a competing
pull_sockets.py, and then we can compare apples to apples.

(Unlike the yield vs. yield-from issue, where I am very biased, I am
not biased about push vs. pull style. I just coded up what I was most
familiar with first.)

> Additionally, I'm not sure if readline belongs on the socket.

It isn't -- it is on the BufferedReader, which wraps around the socket
(or other socket-like transport, like SSL). This is similar to the way
the stdlib socket.socket class has a makefile() method that returns a
stream wrapping the socket.

> I understand the simile with files, though.

Right, that's where I've gotten most of my inspiration. I figure they
are a good model to lure unsuspecting regular Python users in. :-)

> With the coroutine style I could see how the
> most obvious fit would be something like tornado's read_until, or an
> as_lines that essentially calls read_until repeatedly. Can the delimiter for
> this be modified?

You can write your own BufferedReader, and if this is a common pattern
we can make it a standard API. Unlike the SocketTransport and
SslTransport classes, which contain various I/O hacks and integrate
tightly with the polling capability of the eventloop, I consider
BufferedReader plain user code. Antoine also hinted that with not too
many changes we could reuse the existing buffering classes in the
stdlib io module, which are implemented in C.
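
As an illustration of how little is needed for a configurable delimiter, here is a sketch of the buffering core only, with the I/O left out. DelimitedBuffer, feed() and read_until() are hypothetical names; this is not the BufferedReader from sockets.py.

```python
class DelimitedBuffer:
    """Accumulates bytes and hands out delimiter-terminated records."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self._buffer = b""

    def feed(self, data):
        self._buffer += data

    def read_until(self):
        """Return the next record including its delimiter, or None."""
        head, sep, tail = self._buffer.partition(self.delimiter)
        if not sep:
            return None  # no complete record buffered yet
        self._buffer = tail
        return head + sep


buf = DelimitedBuffer(delimiter=b"\r\n")
buf.feed(b"HTTP/1.1 200 OK\r\nContent-")
first = buf.read_until()
incomplete = buf.read_until()
buf.feed(b"Length: 0\r\n")
second = buf.read_until()
```

A coroutine-style reader would call feed() with each chunk it receives and loop until read_until() returns something other than None.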

> My main syntactic gripe is that when I write @inlineCallbacks code or
> monocle code or whatever, when I say "yield" I'm yielding to the reactor.
> That makes sense to me (I realize natural language arguments don't always
> make sense in a programming language context). "yield from" less so (but
> okay, that's what it has to look like). But this just seems weird to me:
>
> yield from trans.send(line.upper())
>
> Not only do I not understand why I'm yielding there in the first place (I
> don't have to wait for anything, I just want to push some data out!), it
> feels like all of my yields have been replaced with yield froms for no
> obvious reason (well, there are reasons, I'm just trying to look at this
> naively).

Are you talking about yield vs. yield-from here, or about the need to
suspend every write? Regarding yield vs. yield-from, please squint and
get used to seeing yield-from everywhere -- the scheduler
implementation becomes *much* simpler and *much* faster using
yield-from, so much so that there really is no competition.

As to why you would have to suspend each time you call send(), that's
mostly just an artefact of the incomplete example -- I didn't
implement a BufferedWriter yet. I also have some worries about a task
producing data at a rate faster than the socket can drain it from the
buffer, but in practice I would probably relent and implement a
write() call that returns immediately and should *not* be used with
yield-from. (Unfortunately you can't have a call that works with or
without yield-from.) I think there's a throttling mechanism in Twisted
that can probably be copied here.

> I guess Twisted gets away with this because of deferred chaining: that one
> deferred might have tons of callbacks in the background, many of which are also
> doing IO operations, resulting in a sequence of asynchronous operations that
> only at the end cause the generator to be run some more.
>
> I guess that belongs in a different thread, though. Even then, I'm not sure
> if I'm uncomfortable because I'm seeing something different from what I'm
> used to, or if my argument from English actually makes any sense whatsoever.
>
> Speaking of protocol tests, what would those look like? How do I yell, say,
> "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> transport, and call the handler with that? (I realize it's early days to be
> thinking that far ahead; I'm just trying to figure out how I can contribute
> a good protocol definition to all of this).

Actually I think the ease of writing tests should definitely be taken
into account when designing the APIs here. In the Zope world, Jim
Fulton wrote a simple abstraction for networking code that explicitly
provides for testing: http://packages.python.org/zc.ngi/ (it also
supports yield-style callbacks, similar to Twisted's inlineCallbacks).

I currently don't have any tests, apart from manually running main.py
and checking its output. I am a bit hesitant to add unit tests in this
early stage, because keeping the tests passing inevitably slows down
the process of ripping apart the API and rebuilding it in a different
way -- something I do at least once a day, whenever I get feedback or
a clever thought strikes me or something annoying reaches my trigger
level.

But I should probably write at least *some* tests, I'm sure it will be
enlightening and I will end up changing the APIs to make testing
easier. It's in the TODO.
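
(A minimal sketch of the mock-transport test asked about above, assuming a handler written in the yield-from style. MockTransport, upper_handler and run are hypothetical names for illustration and are not part of tulip; the mock only mimics the recv()/send() shape described in this thread.)

```python
# Hypothetical names throughout: MockTransport, upper_handler, and run are
# illustrations, not part of tulip.

class MockTransport:
    """In-memory stand-in for a transport; no sockets or file descriptors."""

    def __init__(self, incoming):
        self.incoming = list(incoming)   # chunks the "peer" will deliver
        self.sent = []                   # everything the handler wrote

    def recv(self, n):
        # A real transport would suspend here until data is ready; the mock
        # yields once to exercise the same suspension path, then returns data.
        yield
        return self.incoming.pop(0) if self.incoming else b''

    def send(self, data):
        yield
        self.sent.append(data)


def upper_handler(trans):
    """Echo each chunk back upper-cased until the peer closes."""
    while True:
        data = yield from trans.recv(1024)
        if not data:
            break
        yield from trans.send(data.upper())


def run(coroutine):
    """Trivial driver: resume the generator until it finishes."""
    for _ in coroutine:
        pass


trans = MockTransport([b'POST /blah HTTP/1.1\r\n', b'hello\r\n'])
run(upper_handler(trans))
print(trans.sent)
```

The test needs no pollster or scheduler: the driver loop simply resumes the generator, and the assertions inspect what the handler wrote to the mock.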

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 30, 2012, 1:47:24 PM10/30/12

to Kristján Valur Jónsson, python...@python.org

On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson
<kris...@ccpgames.com> wrote:
> By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient.

In which Python version? The GIL has been redesigned at least once.
Also the latency (not necessarily cost) to acquire the GIL varies by
the sys.setswitchinterval setting. (Actually the more responsive you
make it, the more it will cost you in overall performance.)

I do think that using the pending call mechanism is the right solution here.

--
--Guido van Rossum (python.org/~guido)

Richard Oudkerk

Oct 30, 2012, 1:50:53 PM10/30/12

to python...@python.org

On 30/10/2012 4:40pm, Guido van Rossum wrote:
> What kind of time savings are we talking about? I imagine that the
> accept() loop I put in tulip/echosvr.py is fast enough in terms of
> response time (latency) -- throughput would seem the more important
> measure (and I have no idea of this yet).
> http://code.google.com/p/tulip/source/browse/echosvr.py#37

With Windows overlapped I/O I think you can get substantially better
throughput by starting many AcceptEx() calls in parallel. (For bonus
points you can also recycle the accepted connections using DisconnectEx().)

Even so, Windows socket code always seems to be much slower than the
equivalent on Linux.

--
Richard

Guido van Rossum

Oct 30, 2012, 2:10:10 PM10/30/12

to Richard Oudkerk, python...@python.org

On Tue, Oct 30, 2012 at 10:50 AM, Richard Oudkerk <shib...@gmail.com> wrote:
> On 30/10/2012 4:40pm, Guido van Rossum wrote:
>>
>> What kind of time savings are we talking about? I imagine that the
>> accept() loop I put in tulip/echosvr.py is fast enough in terms of
>> response time (latency) -- throughput would seem the more important
>> measure (and I have no idea of this yet).
>> http://code.google.com/p/tulip/source/browse/echosvr.py#37

> With Windows overlapped I/O I think you can get substantially better
> throughput by starting many AcceptEx() calls in parallel. (For bonus points
> you can also recycle the accepted connections using DisconnectEx().)

Hm... I already have on my list that the transports should probably be
platform dependent. So this would suggest that the standard accept
loop should be abstracted as a method on the transport object, right?

> Even so, Windows socket code always seems to be much slower than the
> equivalent on Linux.

Is this Python sockets code or are you also talking about other
languages, like C++?

--
--Guido van Rossum (python.org/~guido)

Antoine Pitrou

Oct 30, 2012, 3:31:01 PM10/30/12

to python...@python.org

On Tue, 30 Oct 2012 10:34:12 -0700

Guido van Rossum <gu...@python.org> wrote:
> >

> > Speaking of protocol tests, what would those look like? How do I yell, say,
> > "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> > transport, and call the handler with that? (I realize it's early days to be
> > thinking that far ahead; I'm just trying to figure out how I can contribute
> > a good protocol definition to all of this).
>
> Actually I think the ease of writing tests should definitely be taken
> into account when designing the APIs here.

+11 !

Regards

Antoine.

Guido van Rossum

Oct 30, 2012, 4:24:23 PM10/30/12

to Richard Oudkerk, Python-Ideas

On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shib...@gmail.com> wrote:
> The difference in speed between AF_INET sockets and pipes on Windows is much
> bigger than the difference between AF_INET sockets and pipes on Unix.
>
> (Who knows, maybe it is just my firewall which is causing the slowdown...)

Here's another unscientific benchmark: I wrote a stupid "http" server
(stupider than echosvr.py actually) that accepts HTTP requests and
responds with the shortest possible "200 Ok" response. This should
provide an adequate benchmark of how fast the event loop, scheduler,
and transport are at accepting and closing connections (and reading
and writing small amounts). On my linux box at work, over localhost,
it seems I can handle 10K requests (sent using 'ab' over localhost) in
1.6 seconds. Is that good or bad? The box has insane amounts of memory
and 12 cores (?) and rates at around 115K pystones.

(I tried to repro this on my Mac, but I am running into problems,
perhaps due to system limits.)

Carlo Pires

Oct 30, 2012, 4:33:12 PM10/30/12

to Guido van Rossum, Richard Oudkerk, Python-Ideas

2012/10/30 Guido van Rossum <gu...@python.org>

> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than echosvr.py actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

Take a look at

http://nichol.as/benchmark-of-python-web-servers

It is a bit outdated but can be useful to get some insight.
--

Carlo Pires

Greg Ewing

Oct 30, 2012, 4:37:44 PM10/30/12

to python...@python.org

Kristján Valur Jónsson wrote:
> in StacklessIO (our custom sockets lib for stackless) multiple tasklets can
> have an "accept" pending on a socket, so that when multiple connections arrive,
> wakeup time is minimal.

With sufficiently cheap tasks, there's another way to approach
this: one task is dedicated to accepting connections from the
socket, and it spawns a new task to handle each connection.
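
(A rough sketch of the one-acceptor-task pattern, written against the asyncio module that later grew out of this work rather than the tulip code under review; that substitution is an assumption. The names acceptor, handle and demo are hypothetical, and the demo connects to itself over localhost.)

```python
import asyncio
import socket


async def handle(loop, conn):
    """One task per connection: read a little, reply upper-cased, close."""
    with conn:
        data = await loop.sock_recv(conn, 20)
        await loop.sock_sendall(conn, data.upper())


async def acceptor(loop, listener):
    """The single task dedicated to accepting connections."""
    while True:
        conn, _ = await loop.sock_accept(listener)
        conn.setblocking(False)
        loop.create_task(handle(loop, conn))   # spawn and keep accepting


async def demo():
    loop = asyncio.get_running_loop()
    listener = socket.socket()
    listener.bind(('127.0.0.1', 0))
    listener.listen(100)
    listener.setblocking(False)
    accept_task = loop.create_task(acceptor(loop, listener))

    client = socket.socket()
    client.setblocking(False)
    await loop.sock_connect(client, listener.getsockname())
    await loop.sock_sendall(client, b'foo')
    reply = await loop.sock_recv(client, 20)
    client.close()

    # Stop the acceptor cleanly before closing the listening socket.
    accept_task.cancel()
    try:
        await accept_task
    except asyncio.CancelledError:
        pass
    listener.close()
    return reply
```

Calling asyncio.run(demo()) runs one round trip. The acceptor never waits on a handler, so accept latency does not depend on how long any one connection takes.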

--
Greg

Antoine Pitrou

Oct 30, 2012, 5:30:20 PM10/30/12

to python...@python.org

On Tue, 30 Oct 2012 13:24:23 -0700

Guido van Rossum <gu...@python.org> wrote:

> On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shib...@gmail.com> wrote:
> > The difference in speed between AF_INET sockets and pipes on Windows is much
> > bigger than the difference between AF_INET sockets and pipes on Unix.
> >
> > (Who knows, maybe it is just my firewall which is causing the slowdown...)
>
> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than echosvr.py actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

It sounds both good and useless to me :)

Regards

Antoine.

Paul Colomiets

Oct 30, 2012, 5:45:46 PM10/30/12

to Richard Oudkerk, python...@python.org

Hi Richard,

On Mon, Oct 29, 2012 at 3:13 PM, Richard Oudkerk <shib...@gmail.com> wrote:
> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>>
>> I'm most interested in feedback on the design of polling.py and
>> scheduling.py, and to a lesser extent on the design of sockets.py;
>> main.py is just an example of how this style works out in practice.
>
>
> What happens if two tasks try to do a read op (or two tasks try to do a
> write op) on the same file descriptor? It looks like the second one to do
> scheduling.block_r(fd) will cause the first task to be forgotten, causing
> the first task to block forever.
>
> Shouldn't there be a list of pending readers and a list of pending writers
> for each fd?
>

There is another approach to handle this. You create a dedicated
coroutine that does the writing (or reading). If another coroutine
needs to write, it puts data into a queue (or channel) and waits until
the writer coroutine picks it up. This way you don't have to care about
the atomicity of writes, and a lot of other things.

This approach is similar to what Greg Ewing proposed for handling
accept() recently.
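
(A minimal sketch of the dedicated-writer idea, assuming asyncio and its Queue as stand-ins for whatever queue or channel type tulip would offer. The names writer, producer and demo are hypothetical, and a bytearray stands in for the socket.)

```python
import asyncio


async def writer(queue, sink):
    """Sole owner of the connection: writes one whole message at a time."""
    while True:
        message = await queue.get()
        if message is None:              # sentinel: no more data
            break
        for byte in message:             # simulate partial writes
            sink.append(byte)
            await asyncio.sleep(0)       # let other tasks run mid-message


async def producer(queue, message, count):
    """Any task that wants to write just enqueues complete messages."""
    for _ in range(count):
        await queue.put(message)


async def demo():
    queue = asyncio.Queue()
    sink = bytearray()                   # stands in for the socket
    writer_task = asyncio.create_task(writer(queue, sink))
    await asyncio.gather(producer(queue, b'aaaa', 3),
                         producer(queue, b'bbbb', 3))
    await queue.put(None)
    await writer_task
    return bytes(sink)
```

Even though the writer gives up control in the middle of every message, messages from the two producers never interleave, because only one task ever touches the sink.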

--
Paul

Richard Oudkerk

Oct 30, 2012, 6:01:19 PM10/30/12

to python...@python.org

On 30/10/2012 8:24pm, Guido van Rossum wrote:
> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than echosvr.py actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

I tried the simple single threaded benchmark below on my laptop.

                                       | Connections/sec
---------------------------------------+-----------------
Linux                                  | 6000-11000
Linux in a VM (with 1 cpu assigned)    | 4600
Windows                                | 1400

On Windows this sometimes failed with:

OSError: [WinError 10055] An operation on a socket could not
be performed because the system lacked sufficient buffer
space or because a queue was full

import socket, time, sys, argparse

N = 10000

def server():
    l = socket.socket()
    l.bind(('127.0.0.1', 0))
    l.listen(100)
    print('listening on port', l.getsockname()[1])
    while True:
        a, _ = l.accept()
        data = a.recv(20)
        a.sendall(data.upper())
        a.close()

def client(port):
    start = time.time()
    for i in range(N):
        with socket.socket() as c:
            c.connect(('127.0.0.1', port))
            c.sendall(b'foo')
            res = c.recv(20)
            assert res == b'FOO'
            c.close()
    elapsed = time.time() - start
    print("elapsed=%s, connections/sec=%s" % (elapsed, N/elapsed))

parser = argparse.ArgumentParser()
parser.add_argument('--port', type=int, default=None,
                    help='port to connect to')
args = parser.parse_args()

if args.port is not None:
    client(args.port)
else:
    server()

--
Richard

Rene Nejsum

Oct 31, 2012, 2:16:47 AM10/31/12

to Paul Colomiets, Richard Oudkerk, python...@python.org

>
> There is another approach to handle this. You create a dedicated
> coroutine that does the writing (or reading). If another coroutine
> needs to write, it puts data into a queue (or channel) and waits until
> the writer coroutine picks it up. This way you don't have to care about
> the atomicity of writes, and a lot of other things.

I support this idea, IMHO it's by far the easiest (or least problematic)
way to handle the complexity of concurrency.

What's the general position on monkey patching existing libs? This
might not be possible with the above?

/rene

Kristján Valur Jónsson

Oct 31, 2012, 5:29:43 AM10/31/12

to Guido van Rossum, python...@python.org

> -----Original Message-----
> From: gvanr...@gmail.com [mailto:gvanr...@gmail.com] On Behalf
> Of Guido van Rossum

> Sent: 30. október 2012 17:47
> To: Kristján Valur Jónsson
> Cc: python...@python.org
> Subject: Re: [Python-ideas] non-blocking buffered I/O
>

> On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson
> <kris...@ccpgames.com> wrote:
> > By the way: We found that acquiring the GIL by a random external thread
> in response to the IOCP to wake up tasklets was incredibly expensive. I
> spent a lot of effort figuring out why that is and found no real answer. The
> mechanism we now use is to let the external worker thread schedule a
> "pending call" which is serviced by the main thread at the earliest
> opportunity. Also, the main thread is interrupted if it is doing a sleep. This is
> much more efficient.
>
> In which Python version? The GIL has been redesigned at least once.
> Also the latency (not necessarily cost) to acquire the GIL varies by the
> sys.setswitchinterval setting. (Actually the more responsive you make it, the
> more it will cost you in overall performance.)
>
> I do think that using the pending call mechanism is the right solution here.

I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :)

Anyway I don't think the issue is much affected by the particular GIL implementation.
Alternative a)
- Callback comes on an arbitrary thread.
- The arbitrary thread calls PyGILState_Ensure. (This causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired.)
- The arbitrary thread does whatever Python gymnastics are required to complete the IO (wake up the tasklet).
- The arbitrary thread calls PyGILState_Release.

For whatever reason, this approach _increased CPU usage_ on a loaded server. Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok. I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur. I did experiment with having a cache of unused threadstates on the ready for external threads, but it didn't get me anywhere. This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu.

Alternative b)
- Callback comes on an arbitrary thread.
- The external thread calls PyEval_SchedulePendingCall(). This grabs a static lock, puts in a record, and signals to Python that something needs to be done immediately.
- The external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
- The main thread wakes up from its sleep (if it was sleeping).
- The main thread runs Python code, causing it to immediately service the scheduled pending call, causing it to perform the wait.

In reality, StacklessIO uses a slight variation of the above:

StacklessIO dispatch system
- Callback comes on an arbitrary thread.
- The external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread. This is protected by its own lock, and doesn't need the GIL.
- The external thread calls PyEval_SchedulePendingCall() to "tick" the dispatch buffer.
- The external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
- If the main thread is sleeping: the main thread wakes up from its sleep.
- Immediately after sleeping, the main thread will 'tick' the dispatch queue.
- After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work. If not, it may continue sleeping.
- The main thread runs Python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue. This may be a no-op if the main thread was sleeping and was already ticked.
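
(A loose Python-level analogue of the hand-off described above, sketched with asyncio's call_soon_threadsafe rather than the C-level pending-call machinery; that substitution is an assumption, and this does not reproduce the PyGILState cost being discussed. io_worker and wait_for_completion are hypothetical names.)

```python
import asyncio
import threading


def io_worker(loop, completed, payload):
    """Runs on an arbitrary thread, like an I/O completion callback."""
    # Hand the result to the loop thread instead of touching task state here.
    # call_soon_threadsafe also wakes the loop if it is sleeping in its poll.
    loop.call_soon_threadsafe(completed.set_result, payload)


async def wait_for_completion():
    loop = asyncio.get_running_loop()
    completed = loop.create_future()
    thread = threading.Thread(target=io_worker,
                              args=(loop, completed, b'data'))
    thread.start()
    result = await completed             # resumed by the scheduled callback
    thread.join()
    return result
```

The worker thread does no scheduling work itself; it only queues a call and interrupts the sleeping loop, which is the same division of labour as in the dispatch system above.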

The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up. A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu.

The reason I'm mentioning this here is that this is important. We have spent quite some time and energy on trying to figure out the most efficient way to complete IOCP from an arbitrary thread and this is the end result. Perhaps things can be done to improve this. Also, it is really important to study these things under real load, experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field.

Glyph

Oct 31, 2012, 6:10:18 AM10/31/12

to Guido van Rossum, Python-Ideas

Finally getting around to this one...

I am sorry if I'm repeating any criticism that has already been rehashed in this thread. There is really a deluge of mail here and I can't keep up with it. I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week.

To make a long story short, my main points here are:

- I think tulip unfortunately has a lot of the problems I tried to describe in earlier messages,
- it would be really great if we could have a core I/O interface that we could use for interoperability with Twisted before bolting a requirement for coroutine trampolines on to everything,
- Twisted-style protocol/transport separation is really important and this should not neglect it. As I've tried to illustrate in previous messages, an API where applications have to call send() or recv() is just not going to behave intuitively in edge cases or perform well,
- I know it's a prototype, but this isn't such an unexplored area that it should be developed without TDD: all this code should both have tests and provide testing support to show how applications that use it can be tested, and
- the scheduler module needs some example implementation of something like Twisted's gatherResults for me to critique its expressiveness; it looks like it might be missing something in the area of one task coordinating multiple others but I can't tell.

On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at python.org> wrote:

> The pollster has a very simple API: add_reader(fd, callback, *args),
> add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
> poll(timeout) -> list of events. (fd means file descriptor.) There's
> also pollable() which just checks if there are any fds registered. My
> implementation requires fd to be an int, but that could easily be
> extended to support other types of event sources.

I don't see how that is. All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g.: abstract interfaces for those event sources). Are you saying that a totally different pollster could just accept a different type to add_reader, and not an integer? If so, how would application code know how to construct something else?

> I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong).

add_reader and add_writer is an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. echosvr.py) needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport. These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to.

It looks like you've already addressed the fact that some transports need to be platform-specific. That's not quite accurate, unless you take a very broad definition of "platform". In Twisted, the basic socket-based TCP transport is actually supported across all platforms; but some other *APIs* (well, let's be honest, right now, just IOCP, but there have been others, such as Java's native I/O APIs under Jython, in the past) need transports of their own.

You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O. This is why I keep talking about IOCP. It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD. (GUI libraries often do this because they have to support Windows and therefore IOCP.) Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance. This is really a design error on 0MQ's part, but, you have to deal with it anyway ;-).

More importantly, concretely tying everything to sockets is just bad design. You want to be able to operate on pipes and PTYs (which need to call read(), or a bunch of gross ioctl()s and then read(), not recv()). You want to be able to operate on these things in unit tests without involving any actual file descriptors or syscalls. The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress echosvr.py down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9 line) echo server example on the front page of <http://twistedmatrix.com/trac/>. It's closer in length to the (19 line) full line-based publish/subscribe protocol over on the third tab.

Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders. One of the big advantages of event driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing to have a large, elaborate simulation edifice to pretend to be a socket. But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing.

> The event loop has two basic ways to register callbacks:
> call_soon(callback, *args) causes callback(*args) to be called the
> next time the event loop runs; call_later(delay, callback, *args)
> schedules a callback at some time (relative or absolute) in the
> future.

"relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :).

> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
>
> This implements some internet primitives using the APIs in
> scheduling.py (including block_r() and block_w()). I call them
> transports but they are different from transports in Twisted; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from.

I feel I should note that these methods behave inconsistently; send() behaves as sendall(), re-trying its writes until it receives a full buffer, but recv() may yield a short read.

(But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.)

> SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

stdlib ssl sockets have async support that makes a number of UNIX-y assumptions. The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop. This plagued us for many years within Twisted and has only relatively recently been fixed: <http://tm.tl/593>.

Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe. Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout.

It's reasonable, of course, for lots of Python applications to not care about high-performance, high-concurrency SSL on Windows; select() works okay for many applications on Windows. And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets. But, as I'll explain in a moment, this is one reason that it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this.

> I don't particularly care about the exact abstractions in this module;
> they are convenient and I was surprised how easy it was to add SSL,
> but still these mostly serve as somewhat realistic examples of how to
> use scheduling.py.

This is where I think we really differ.

I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones). However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python. Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it. I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks. (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses - does this happen, with PEP-380 generators? <http://twistedmatrix.com/trac/ticket/4157>)

What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer.

Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks. However, lots of Python programmers are going to use what you come up with. They'd use it even if it didn't really work, just because it's bundled in and it's convenient. But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and the superiority to scheduling I/O operations with threads.

What I think is really very important in the design of this new system is to present an API whereby:

- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv; they should be able to just transform some data callback into Python into some data callback in C); it should be able to leverage tulip without much more work,
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program,
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much,
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something, and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.
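
(To make "feed parser" concrete, here is a minimal sketch of one, assuming CRLF-delimited lines. LineParser and its line_received callback are hypothetical names, loosely modelled on the data_received naming used later in this message; no generators or event loop are involved.)

```python
class LineParser:
    """Push-style ("feed") parser: whoever has bytes calls data_received()."""

    def __init__(self, line_received):
        self.line_received = line_received   # callback, invoked once per line
        self.buffer = b''

    def data_received(self, data):
        # Keep any trailing partial line in the buffer for the next call.
        self.buffer += data
        *lines, self.buffer = self.buffer.split(b'\r\n')
        for line in lines:
            self.line_received(line)


lines = []
parser = LineParser(lines.append)
# Chunk boundaries deliberately fall mid-line and mid-delimiter.
for chunk in (b'POST /bl', b'ah HTTP/1.1\r\nHost: x\r', b'\n\r\n'):
    parser.data_received(chunk)
print(lines)
```

Because the parser is just an object with a method, the same instance can be fed by an event loop callback, by a coroutine that reads in a loop, or directly by a unit test as above.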

As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.

It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed. In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".

This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators. I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated.

task.unblock() is a method; protocol.data_received is a method. Both can be invoked at the same level by an event loop. Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.
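
(A small sketch of that layering, using asyncio Futures as the sentinel object; asyncio postdates this thread, so treat the choice as an assumption. FutureProtocol, read_once and demo are hypothetical names.)

```python
import asyncio


class FutureProtocol:
    """Callback-style protocol whose only job is to resolve a Future."""

    def __init__(self, loop):
        self.received = loop.create_future()

    def data_received(self, data):
        # Called by the event loop (or by a test); knows nothing of coroutines.
        if not self.received.done():
            self.received.set_result(data)


async def read_once(protocol):
    # The coroutine layer sits on top and waits on the sentinel object.
    data = await protocol.received
    return data.upper()


async def demo():
    loop = asyncio.get_running_loop()
    protocol = FutureProtocol(loop)
    # Stand-in for the event loop delivering bytes from a connection.
    loop.call_soon(protocol.data_received, b'ping')
    return await read_once(protocol)
```

The protocol class can be tested by calling data_received() directly, and the coroutine only depends on the Future, not on how the bytes arrived.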

> I'm most interested in feedback on the design of polling.py and
> scheduling.py, and to a lesser extent on the design of sockets.py;
> main.py is just an example of how this style works out in practice.

It looks to me like there's a design error in scheduling.py with respect to coordinating concurrent operations. If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done. I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work.
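
(For comparison, this is roughly what waiting on two operations at once looks like with asyncio.gather, the closest stdlib analogue to gatherResults that exists today. It is not something scheduling.py offers, and gather itself wraps each coroutine in a task under the hood. fetch and parent are hypothetical names.)

```python
import asyncio


async def fetch(name, delay):
    await asyncio.sleep(delay)           # stands in for an I/O wait
    return name.upper()


async def parent():
    # Wait on several operations at once; results come back in call order,
    # regardless of which one finishes first.
    return await asyncio.gather(fetch('a', 0.02), fetch('b', 0.01))
```

Running asyncio.run(parent()) waits for both operations concurrently rather than one after the other.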

Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code?

Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that. It looks like the first task to call it will just hang forever, and the second one will "win"? What are the intended semantics?
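One possible answer, sketched with a hypothetical ReaderRegistry class rather than tulip's actual pollster: treat a second registration for the same fd as an error and raise a real exception, so the conflict surfaces at the call site instead of as a task that hangs forever.

```python
class ReaderRegistry:
    """Hypothetical fd -> callback table that refuses duplicate registrations."""

    def __init__(self):
        self._readers = {}

    def add_reader(self, fd, callback, *args):
        if fd in self._readers:
            # A real exception rather than an assert, so it survives python -O.
            raise RuntimeError("fd %r already has a reader registered" % (fd,))
        self._readers[fd] = (callback, args)

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def dispatch(self, fd):
        callback, args = self._readers[fd]
        return callback(*args)
```

Whether duplicates should be rejected, queued, or replaced is exactly the semantic question above; this sketch only shows the "reject loudly" option.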

Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls. A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor will call kevent() once to register interest, then kevent() again to block, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued. tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon. Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request.
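The register-once pattern can be sketched with the standard selectors module, which postdates this thread and is used here only as a stand-in for a stateful multiplexor. The function name receive_all and the socketpair setup are assumptions made for the example.

```python
import selectors
import socket


def receive_all(messages):
    """Read several messages while registering the socket with the selector once."""
    sel = selectors.DefaultSelector()  # kqueue, epoll, poll or select, per platform
    writer, reader = socket.socketpair()
    reader.setblocking(False)
    registrations = 0
    received = []

    sel.register(reader, selectors.EVENT_READ)  # one-time registration
    registrations += 1

    for msg in messages:
        writer.sendall(msg)
        # Wait for readiness; the registration stays in place between reads.
        for key, _mask in sel.select(timeout=1.0):
            received.append(key.fileobj.recv(4096))

    sel.unregister(reader)
    sel.close()
    writer.close()
    reader.close()
    return received, registrations
```

The registration count stays at one no matter how many messages arrive, which is the behaviour described for the Twisted transport; a remove-and-re-add design would pay one extra registration call per read.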

Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit.

-glyph

Glyph

Oct 31, 2012, 6:20:29 AM

to Greg Ewing, Python-Ideas

On Oct 29, 2012, at 5:25 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> Andrew Svetlov wrote:
>
>> 0MQ socket has no file descriptor at all, it's just pointer to some
>> unspecified structure.
>> So 0MQ has own *poll* function which can process that sockets as well
>> as file descriptors.
>
> Aaargh... yet another event loop that wants to rule
> the world. This is not good.

As a wise man once said, "everybody wants to rule the world".

All event loops have their own run() API, and expect to be on top of everything, driving the loop. This is one of the central principles of Twisted's design; by not attempting to directly take control of any loop, and providing a high-level wrapper around run, and an API that would accommodate every wacky wrapper around poll and select and kqueue and GetQueuedCompletionStatus, we could be a single loop that everything can use as an API and get the advantages of whatever event driven thing is popular this week.

You can't accomplish this by trying to force other loops to play by your rules; rather, accommodate and pave over their peculiarities and it'll be your API that their users actually write to.

(In the land of Mordor, where the shadows lie.)

-glyph

Kristján Valur Jónsson

Oct 31, 2012, 6:07:10 AM

to Guido van Rossum, Richard Oudkerk, python...@python.org

> -----Original Message-----
> From: gvanr...@gmail.com [mailto:gvanr...@gmail.com] On Behalf
> Of Guido van Rossum

> Sent: 30. október 2012 16:40
> To: Kristján Valur Jónsson

> Cc: Richard Oudkerk; python...@python.org
> Subject: Re: [Python-ideas] Async API: some code to review
>

> What kind of time savings are we talking about? I imagine that the
> accept() loop I put in tulip/echosvr.py is fast enough in terms of response
> time (latency) -- throughput would seem the more important measure (and I
> have no idea of this yet).
> http://code.google.com/p/tulip/source/browse/echosvr.py#37
>

To be honest, it isn't serious for applications that serve few connections, but for things like web servers, it becomes important.
Looking at your code:

a) will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was available. I don't think there is any way to selectively do completion-based IO, i.e. do immediate mode if possible. You either go for one or the other, on Windows at least. In select-based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing. It might be best to stick to one system.
b) will either switch to the new task immediately (possible in stackless) or cause the start of t to wait until the next round in the event loop.

In this case, t will not start executing until after going around the loop twice. Only one new connection can be accepted per loop. Imagine two http requests coming in simultaneously, at t=0.

The sequence of operations will then be this (assuming FIFO scheduling):
main loop runs
accept 1 returns. task 1 created. accept 2 scheduled
main loop runs, making task 1 and accept 2 runnable
task 1 runs. does processing. performs send, and blocks
accept 2 returns, task 2 created
main loop runs, making task 2 runnable
task 2 runs, does processing, performs send.

Contributing to latency in this scenario are all the "main loop" runs. Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved.

An alternative is something like this:
def loop():
    while True:
        conn, addr = yield from listener.accept()
        handler(conn, addr)
for i in range(n_handlers):
    t = scheduling.Task(loop)
    t.start()

Here, events will be different:
main loop runs, accept 1 and accept 2 runnable
accept 1 returns, starting handler, processing and blocking on send
accept 2 returns, starting handler, processing, and blocking on send

As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client.

In my experience with RPC systems based on this kind of asynchronous Python IO, lowering the response time between when user space is made aware of the request and when Python actually starts _processing_ it is critical to responsiveness.

Cheers

Guido van Rossum

Oct 31, 2012, 11:01:03 AM

to Kristján Valur Jónsson, python...@python.org

Modern CPUs are black boxes full of magic. I'm not too surprised that
running Python code on multiple threads incurs some kind of overhead
that keeping the Python interpreter in one thread avoids.

On Wed, Oct 31, 2012 at 2:29 AM, Kristján Valur Jónsson

--
--Guido van Rossum (python.org/~guido)

Kristján Valur Jónsson

Oct 31, 2012, 11:10:10 AM

to Guido van Rossum, python...@python.org

> -----Original Message-----
> From: gvanr...@gmail.com [mailto:gvanr...@gmail.com] On Behalf
> Of Guido van Rossum
> Sent: 31. október 2012 15:01
> To: Kristján Valur Jónsson
> Cc: python...@python.org
> Subject: Re: [Python-ideas] non-blocking buffered I/O
>

> Modern CPUs are black boxes full of magic. I'm not too surprised that running
> Python code on multiple threads incurs some kind of overhead that keeping
> the Python interpreter in one thread avoids.
>

Ah, but I forgot to mention one weird thing:
If we used a pool of threads for the callbacks, and pre-initialized those threads with Python states, and then acquired the GIL using
PyEval_RestoreThread(), then this overhead went away.
It was only the dynamic thread state acquired using PyGILState_Ensure() that caused CPU overhead.
Using the fixed pool was not acceptable in the long run; in particular, we didn't want to complicate things to another level by adding a thread pool manager to the whole thing when the OS is fully capable of providing an external callback thread.

I regret not spending more time on this so as to be able to provide an actual performance analysis and fix. Instead I have to be that weird old man in the tavern uttering inscrutable warnings that no young adventurer pays any attention to :)

K

Guido van Rossum

Oct 31, 2012, 11:37:01 AM

to Kristján Valur Jónsson, Richard Oudkerk, python...@python.org

Ok, this is a good point: the more you can do without having to go
through the main loop again the better.

I already took this to heart in my recent rewrites of recv() and
send() -- they try to read/write the underlying socket first, and if
it works, the task isn't suspended; only if they receive EAGAIN or
something similar do they block the task and go back to the top.

In fact, Listener.accept() does the same thing -- meaning the loop can
go around many times without blocking a single time. (The listening
socket is in non-blocking mode so accept() will raise EAGAIN when
there *isn't* another client connection ready immediately.)
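A rough sketch of that try-first pattern follows; it is not the actual tulip code. The ("wait_readable", sock) tuple is a hypothetical request that some scheduler would have to understand, and step() is a helper invented here to show whether the generator suspended.

```python
import socket


def recv(sock, n):
    """Generator-style recv: try the socket first, suspend only if it would block."""
    while True:
        try:
            return sock.recv(n)
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK on a non-blocking socket
            # Hypothetical request object; the scheduler would resume this
            # generator once the socket becomes readable.
            yield ("wait_readable", sock)


def step(gen):
    """Advance once; return ('done', value) or ('suspended', request)."""
    try:
        return ("suspended", next(gen))
    except StopIteration as exc:
        return ("done", exc.value)
```

When data is already buffered, the generator finishes on its first step without ever yielding, so a caller using 'yield from recv(sock, n)' never goes back to the scheduler.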

This is also one of the advantages of yield-from; you *never* go back
to the end of the ready queue just to invoke another layer of
abstraction. (Steve tries to approximate this by running the generator
immediately until the first yield, but the caller still ends up
suspending to the scheduler, because they are using yield which
doesn't avoid the suspension, unlike yield-from.)

--Guido

--
--Guido van Rossum (python.org/~guido)

Steve Dower

Oct 31, 2012, 11:51:35 AM

to Guido van Rossum, Kristján Valur Jónsson, Richard Oudkerk, python...@python.org

Guido van Rossum wrote:
> This is also one of the advantages of yield-from; you *never* go back to the end
> of the ready queue just to invoke another layer of abstraction. (Steve tries to
> approximate this by running the generator immediately until the first yield, but
> the caller still ends up suspending to the scheduler, because they are using
> yield which doesn't avoid the suspension, unlike yield-from.)

This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).

Cheers,
Steve

Kristján Valur Jónsson

Oct 31, 2012, 11:59:18 AM

to Guido van Rossum, Richard Oudkerk, python...@python.org

> -----Original Message-----
> From: gvanr...@gmail.com [mailto:gvanr...@gmail.com] On Behalf
> Of Guido van Rossum
> Sent: 31. október 2012 15:37
> To: Kristján Valur Jónsson
> Cc: Richard Oudkerk; python...@python.org
> Subject: Re: [Python-ideas] Async API: some code to review
>

> Ok, this is a good point: the more you can do without having to go through
> the main loop again the better.
>
> I already took this to heart in my recent rewrites of recv() and
> send() -- they try to read/write the underlying socket first, and if it works,
> the task isn't suspended; only if they receive EAGAIN or something similar do
> they block the task and go back to the top.

Yes, this is possible for non-blocking style IO. However, for IO architectures that are based on completions, you can't always mix and match.
On Windows, for example, it is complicated to do because of how AcceptEx works. I recall socket properties, the overlapped property and other things interfering.
I also recall testing the use of first trying non-blocking IO (for accept and send/recv) and then resorting to an IOCP call. If I recall correctly, the added overhead of trying a non-blocking call for the usual case of it failing was detrimental to the whole exercise. The non-blocking IO calls took non-trivial time to complete.

The approach of having multiple "threads" doing accept also avoids the delay required to dispatch the request from the accepting thread to the worker thread.

> In fact, Listener.accept() does the same thing -- meaning the loop can go

> This is also one of the advantages of yield-from; you *never* go back to the
> end of the ready queue just to invoke another layer of abstraction.

My experience with this stuff is of course based on stackless/gevent style programming, so some of it may not apply :)
Personally, I feel that things should just magically work, from the programmer's point of view, rather than have to manually leave a trace of breadcrumbs through the stack using "yield" constructs. But that's just me.

K

Guido van Rossum

Oct 31, 2012, 5:18:28 PM

to Steve Dower, Richard Oudkerk, python...@python.org

On Wed, Oct 31, 2012 at 8:51 AM, Steve Dower <Steve...@microsoft.com> wrote:
> Guido van Rossum wrote:
>> This is also one of the advantages of yield-from; you *never* go back to the end
>> of the ready queue just to invoke another layer of abstraction. (Steve tries to
>> approximate this by running the generator immediately until the first yield, but
>> the caller still ends up suspending to the scheduler, because they are using
>> yield which doesn't avoid the suspension, unlike yield-from.)
>
> This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

I think you are missing the point. Even if you don't make a roundtrip
through the queue, *each* yield statement, if it is executed at all,
must transfer control to the scheduler. What you're proposing is just
making the scheduler immediately resume the generator.

So, if you have a trivial task, like this:

@async
def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator

and a caller:

@async
def caller():
    foo = yield trivial(42)
    print(foo)

then the call to trivial(42) returns a Future that already has the
result 42 set in it. But caller() still suspends to the scheduler,
yielding that Future. The scheduler can resume caller() immediately
but the damage (overhead) is done.

In contrast, in the yield-from world, we'd write this

def trivial(x):
    return x
    yield from ()  # Unreachable

def caller():
    foo = yield from trivial(42)
    print(foo)

where the latter expands roughly to the following, without reference
to the scheduler at all:

def caller():
    _gen = trivial(42)
    try:
        while True:
            _val = next(_gen)
            yield _val
    except StopIteration as _exc:
        foo = _exc.value
    print(foo)

The first next(_gen) call raises StopIteration so the yield is never
reached -- the scheduler doesn't know that any of this is going on.
And there's no need to do anything special to advance the generator to
the first yield manually either.

(It's different of course when a generator is wrapped in a Task()
constructor. But that should be relatively rare.)
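This can be checked with a small stand-in for the scheduler. The drive() function below is an assumption made for the demonstration, not part of tulip: it counts how often the outermost generator suspends, and caller() takes an output list so the result can be inspected instead of printed.

```python
def trivial(x):
    return x
    yield from ()  # Unreachable, but makes this a generator function


def caller(out):
    foo = yield from trivial(42)
    out.append(foo)


def drive(gen):
    """Stand-in for a scheduler: count how many times the generator suspends."""
    suspensions = 0
    try:
        while True:
            next(gen)
            suspensions += 1
    except StopIteration:
        pass
    return suspensions
```

Driving caller() this way records zero suspensions while still delivering 42 to the caller, which matches the expansion shown above.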

> The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).

Just get with the program and use yield-from exclusively.

--
--Guido van Rossum (python.org/~guido)

Yury Selivanov

Oct 31, 2012, 5:31:02 PM

to Guido van Rossum, Richard Oudkerk, python...@python.org

On 2012-10-31, at 5:18 PM, Guido van Rossum <gu...@python.org> wrote:

> @async
> def trivial(x):
>     return x
>     yield  # Unreachable, but makes it a generator

FWIW, just a crazy comment: if we make the @async decorator clone
the code object of a passed function and set 0x0020 (CO_GENERATOR)
in its co_flags, then any passed function becomes a generator, even
if it doesn't have yields/yield-froms ;)

-
Yury

Steve Dower

Oct 31, 2012, 5:31:58 PM

to Guido van Rossum, Richard Oudkerk, python...@python.org

Guido van Rossum wrote:
> Just get with the program and use yield-from exclusively.

I didn't realise there was a "program" here, just a discussion about an API design. I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute.

When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.

Cheers,
Steve

Andrew Svetlov

Oct 31, 2012, 5:34:02 PM

to Yury Selivanov, Richard Oudkerk, python...@python.org

Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library.

--
Thanks,
Andrew Svetlov

Yury Selivanov

Oct 31, 2012, 5:41:51 PM

to Andrew Svetlov, Richard Oudkerk, python...@python.org

On 2012-10-31, at 5:34 PM, Andrew Svetlov <andrew....@gmail.com> wrote:

> Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library.

I know that I sort of created an image for myself of
"a guy who solves any problem by patching opcodes on live code",
but don't worry, I'll never ever recommend such solutions for
stdlib/python :)

This is, however, a nice technique to rapidly prototype
and test interesting ideas.

Guido van Rossum

Oct 31, 2012, 5:51:47 PM

to Steve Dower, Richard Oudkerk, python...@python.org

On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Guido van Rossum wrote:
>> Just get with the program and use yield-from exclusively.
>
> I didn't realise there was a "program" here, just a discussion about an API design.

Sorry, I left off a smiley. :-)

> I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute.

What about the usability argument? Don't you think users will be
confused by the need to use yield from some times and just yield other
times? Yes, they may be able to tell by looking up the definition and
checking how it is decorated, but that doesn't really help.

> When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.

Understood. What exactly is it that makes Futures so ideal for your
current needs? Is it integration with threads?

Another tack: could you make use of tulip/polling.py? That doesn't use
generators of any form; it is meant as an integration point with other
styles of async programming (although I am not claiming that it is any
good in its current form -- this too is just a strawman to shoot
down).

--
--Guido van Rossum (python.org/~guido)

Steve Dower

Oct 31, 2012, 6:36:13 PM

to Guido van Rossum, python...@python.org

Guido van Rossum wrote:
> On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve...@microsoft.com> wrote:
>> Guido van Rossum wrote:
>>> Just get with the program and use yield-from exclusively.
>>
>> I didn't realise there was a "program" here, just a discussion about an API
>> design.
>
> Sorry, I left off a smiley. :-)

Always a risk in email communication - no offence taken.

>> I've already raised my concerns with using yield from exclusively, but since
>> the performance argument trumps all of those then there is little more I can
>> contribute.
>
> What about the usability argument? Don't you think users will be confused by the
> need to use yield from some times and just yield other times? Yes, they may be
> able to tell by looking up the definition and checking how it is decorated, but
> that doesn't really help.

Users only ever _need_ to write yield. The only reason that wattle does not work with Python 3.2 is because of non-blank returns inside generators.

There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.

>> When a final design begins to stabilise, I will see how I can make use of it
>> in my own code. Until then, I'll continue using Futures, which are ideal for my
>> current needs. I won't be forcing 'yield from' onto my users until its usage is
>> clear and I can provide them with suitable guidance.
>
> Understood. What exactly is it that makes Futures so ideal for your current
> needs? Is it integration with threads?
>
> Another tack: could you make use of tulip/polling.py? That doesn't use
> generators of any form; it is meant as an integration point with other styles of
> async programming (although I am not claiming that it is any good in its current
> form -- this too is just a strawman to shoot down).

I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.

We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.

(* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)

The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support. For Python, we are aiming for closer to the async/await model (which is also how we chose the names).

Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.

There are three aspects of this that work better and result in cleaner code with wattle than with tulip:

- event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread. In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)

- the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).

- Future objects can be marshalled directly from Python into Windows, completing the interop story. Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type). Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs. At least with wattle, the user does not have to do anything different from any of their other @async functions.

Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.

Cheers,
Steve

Don Spaulding

Nov 1, 2012, 11:31:11 AM

to Steve Dower, python...@python.org

On Wed, Oct 31, 2012 at 5:36 PM, Steve Dower <Steve...@microsoft.com> wrote:

> Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.

Oh, what sad times are these when passing ruffians can say 'The Microsoft Way' at will to old developers. There is a pestilence upon this land! Nothing is sacred. Even those who arrange and design async APIs are under considerable hegemonic stress at this point in time.

/me crawls back under his rock.

Guido van Rossum

Nov 1, 2012, 11:44:48 AM

to Steve Dower, python...@python.org

On Wed, Oct 31, 2012 at 3:36 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Guido van Rossum wrote:
> There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.

Actually, it is not just optimization. The logic of the scheduler also
becomes much simpler.

> I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.

Actually I wish you'd written this sooner. I don't know about you, but
my brain has a hard time understanding abstractions that are presented
without concrete use cases and implementations alongside; OTOH I
delight in taking a concrete mess and extracting abstractions from it.
(The Twisted guys are also masters at this.)

So far I didn't really "get" the reasons you brought up for some of
the complications you introduced (like multiple Future implementations).
Now I think I'm glimpsing your reasons.

> We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.

Interesting. The lack of synchronous wrappers does seem a step back,
but is probably useful as a forcing function given the desire to keep
the UI responsive at all times.

> (* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)
>
> The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support.

Erik Meijer introduced me to async/await on Elba two months ago. I was
very excited to recognize exactly what I'd done for NDB with @tasklet
and yield, supported by the type checking.

> For Python, we are aiming for closer to the async/await model (which is also how we chose the names).

If we weren't so reluctant to introduce new keywords in Python we
might introduce await as an alias for yield from in the future.

> Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.

Very interesting. I'd love to see a much longer narrative on this.
(You can send it to me directly if you feel it would distract the list
or if you feel it's inappropriate to share widely. I'll keep it under
my hat as long as you say so.)

> There are three aspects of this that work better and result in cleaner code with wattle than with tulip:
>
> - event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread.

I think this is "fire-and-forget"? I.e. you initiate an action and
then just let it run until completion without ever checking the
result? In tulip you currently do that by wrapping it in a Task and
calling its start() method. (BTW I think I'm going to get rid of
start() -- creating a Task should just start it.)

> In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)

Are you saying that this property (you don't wait for the result) is
required by the operation rather than an option for the user? I'm only
familiar with the latter -- e.g. I can imagine firing off an operation
that writes a log entry somewhere but not caring about whether it
succeeded -- but I would still make it *possible* to check on the
operation if the caller cares (what if it's a very important log
message?).

If there's no option for the caller, the API should present itself as
a regular function/method and the task-spawning part should be hidden
inside it -- I see no need for the caller to know about this.

What exactly do you mean by "reliably intercept this case" ? A
concrete example would help.

> - the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).

Ok, so what is the API offered by the OS event loop? I really want to
make sure that tulip can interface with strange event loops, and this
may be the most concrete example so far -- and it may be an important
one.

> - Future objects can be marshalled directly from Python into Windows, completing the interop story.

What do you mean by marshalled here? Surely not the stdlib marshal
module. Do you just mean that Future objects can be recognized by the
foreign-function interface and wrapped by / copied into native Windows
8 datatypes?

I understand your event loop understands Futures? All of them? Or only
the ones of the specific type that it also returns?

> Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type).

I can't quite follow you here, probably due to lack of imagination on
my part. Can you help me with a (somewhat) concrete example?

> Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs.

Concrete example?

> At least with wattle, the user does not have to do anything different from any of their other @async functions.

This is because you can put type checks inside @async, which sees the
function object before it's called, rather than the scheduler, which
only sees what it returned, right? That's a trick I use in NDB as well
and I think tulip will end up requiring a decorator too -- but it will
just "mark" the function rather than wrap it in another one, unless
the function is not a generator (in which case it will probably have
to wrap it in something that is a generator). I could imagine a debug
version of the decorator that added wrappers in all cases though.

> Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.

No worries about that. I agree that we need concrete examples that
take us beyond the world of sockets; it's just that sockets are where
most of the interest lies (Tornado is a webserver, Twisted is often
admired because of its implementations of many internet protocols,
people benchmark async frameworks on how many HTTP requests per second
they can serve) and I haven't worked with any type of GUI framework in
a very long time. (Kudos for trying your way Tk!)

--
--Guido van Rossum (python.org/~guido)

Steve Dower

Nov 1, 2012, 12:44:45 PM11/1/12

to Guido van Rossum, python...@python.org

Guido van Rossum wrote:
> On Wed, Oct 31, 2012 at 3:36 PM, Steve Dower <Steve...@microsoft.com> wrote:
>> Guido van Rossum wrote:
>> There is only one reason to use 'yield from' and that is for the performance
>> optimisation, which I do acknowledge and did observe in my own benchmarks.
>
> Actually, it is not just optimization. The logic of the scheduler also becomes
> much simpler.

I'd argue that it doesn't; the implementation of 'yield from' in the interpreter just happens to match the most common case. In any case, the affected area of code (which I haven't been calling 'scheduler', and that seems to have caused some confusion elsewhere) only has to be written once and never touched again. It could even be migrated into C, which should significantly improve the performance. (In wattle, this is the _Awaiter class.)

>> I know I've been vague about our intended application (deliberately so, to try
>> and keep the discussion neutral), but I'll lay out some details.
>
> Actually I wish you'd written this sooner. I don't know about you, but my brain
> has a hard time understanding abstractions that are presented without concrete
> use cases and implementations alongside; OTOH I delight in taking a concrete
> mess and extract abstractions from it.
> (The Twisted guys are also masters at this.)
>
> So far I didn't really "get" the reasons you brought up for some of
> complications you introduced (like multiple Future implementations).
> Now I think I'm glimpsing your reasons.

Part of the art of conversation is figuring out how the other participants need to hear something. My apologies for not figuring this out sooner :)

>> We're working on adding support for Windows 8 apps (formerly known as Metro)
>> written in Python. These will use the new API (WinRT) which is highly
>> asynchronous - even operations such as opening a file are only* available as an
>> asynchronous function. The intention is to never block on the UI thread.
>
> Interesting. The lack of synchronous wrappers does seem a step back, but is
> probably useful as a forcing function given the desire to keep the UI responsive
> at all times.

Indeed. Based on the Win 8 apps I regularly use, it's worked well. On the other hand, updating CPython to avoid the synchronous ones (which I've done, and will be submitting for consideration soon, once I've been able to test on an ARM device) is less fun.

>> (* Some synchronous Win32 APIs are still available from C++, but these
>> are actively discouraged and restricted in many ways. Most of Win32 is
>> not usable.)
>>
>> The model used for these async APIs is future-based: every *Async() function
>> returns a future for a task that is already running. The caller is not allowed
>> to wait on this future - the only option is to attach a callback. C# and VB use
>> their async/await keywords (good 8 min intro video on those:
>> http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript
>> and C++ have multi-line lambda support.
>
> Erik Meijer introduced me to async/await on Elba two months ago. I was very
> excited to recognize exactly what I'd done for NDB with @tasklet and yield,
> supported by the type checking.
>
>> For Python, we are aiming for closer to the async/await model (which is also
>> how we chose the names).
>
> If we weren't so reluctant to introduce new keywords in Python we might
> introduce await as an alias for yield from in the future.

We discussed that internally and decided that it was unnecessary, or at least that it should be a proper keyword rather than an alias (as in, you can't use 'await' to delegate to a subgenerator). I'd rather see codef added first, since that (could) remove the need for the decorators.

>> Incidentally, our early designs used yield from exclusively. It was only when
>> we started discovering edge-cases where things broke, as well as the impact on
>> code 'cleanliness', that we switched to yield.
>
> Very interesting. I'd love to see a much longer narrative on this.
> (You can send it to me directly if you feel it would distract the list or if you
> feel it's inappropriate to share widely. I'll keep it under my hat as long as
> you say so.)

If I get a chance to write something up then I will do that. I'll quite happily post it publicly, though it may go on my blog rather than here - this email is going to be long enough already. There is very little already written up since we discussed most of it at a whiteboard, though I do still have some early code iterations.

>> There are three aspects of this that work better and result in cleaner code
>> with wattle than with tulip:
>>
>> - event handlers can be "async-void", such that when the event is raised by
>> the OS/GUI/device/whatever the handler can use asynchronous tasks without
>> blocking the main thread.
>
> I think this is "fire-and-forget"? I.e. you initiate an action and then just let
> it run until completion without ever checking the result? In tulip you currently
> do that by wrapping it in a Task and calling its start() method. (BTW I think
> I'm going to get rid of start() -- creating a Task should just start it.)

Yes, exactly. The only thing I dislike about tulip's current approach is that it requires two functions. If/when we support it, we'd provide a decorator that does the wrapping.
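
A minimal sketch of what such a wrapping decorator could look like. The Task class, the run queue and the fire_and_forget name are hypothetical stand-ins for illustration, not tulip's or wattle's actual API:

```python
import functools


class Task:
    """Hypothetical stand-in for a task; not tulip's actual Task class."""

    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None

    def step(self):
        # Advance the generator by one step; record the result when it ends.
        try:
            next(self.gen)
        except StopIteration as exc:
            self.done = True
            self.result = exc.value


ready = []  # hypothetical run queue, drained by run_all()


def fire_and_forget(func):
    """Wrap a generator function so that calling it schedules a Task."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        task = Task(func(*args, **kwargs))
        ready.append(task)
        return task  # the caller is free to ignore this
    return wrapper


def run_all():
    # Round-robin over the queued tasks until all of them have finished.
    while ready:
        task = ready.pop(0)
        task.step()
        if not task.done:
            ready.append(task)


log = []


@fire_and_forget
def on_click(name):
    log.append("start " + name)
    yield  # pretend to wait for an asynchronous operation
    log.append("finish " + name)


on_click("button")  # returns immediately; the handler body has not run yet
run_all()           # the handler runs to completion inside the loop
```

The handler is written once, as a single function, and the event source never needs to know that a task was created.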

>> In this case, the caller receives a future but ignores it because it
>> does not care about the final result. (We could achieve this under
>> 'yield from' by requiring a decorator, which would then probably
>> prevent other Python code from calling the handler directly. There is
>> very limited opportunity for us to reliably intercept this case.)
>
> Are you saying that this property (you don't wait for the result) is required by
> the operation rather than an option for the user? I'm only familiar with the
> latter -- e.g. I can imagine firing off an operation that writes a log entry
> somewhere but not caring about whether it succeeded -- but I would still make it
> *possible* to check on the operation if the caller cares (what if it's a very
> important log message?).
>
> If there's no option for the caller, the API should present itself as a regular
> function/method and the task-spawning part should be hidden inside it -- I see
> no need for the caller to know about this.
>
> What exactly do you mean by "reliably intercept this case" ? A concrete example
> would help.

You're exactly right, there is no need for the original caller (for example, Windows itself) to know about the task. However, every incoming call initially comes through a COM interface that we provide (written in C) that will then invoke the Python function. This is our opportunity to intercept by looking at the returned value from the Python function before returning to the original caller.

Under wattle, we can type check here for a Future (or compatible interface), which is only ever used for async functions. On the other hand, we cannot reliably type-check for a generator to determine whether it is supposed to be async or supposed to be an iterator.

If the interface we implement expects an iterator then we can assume that we should treat the generator like that. However, if the user intended their code to be async and used 'yield from' with no decorator, we cannot provide any useful feedback: they will simply return a sequence of null pointers that is executed as quickly as the caller wants to - there is no scheduler involved in this case.
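
The ambiguity can be sketched as follows. This is only an illustration: classify() is a hypothetical helper, and concurrent.futures.Future stands in for wattle's Future type.

```python
import inspect
from concurrent.futures import Future  # stand-in for wattle's Future


def classify(result):
    """Hypothetical check on the value returned by a Python callback."""
    if isinstance(result, Future):
        return "async"       # only async functions return a Future
    if inspect.isgenerator(result):
        return "ambiguous"   # iterator or undecorated coroutine? cannot tell
    return "plain value"


def iterate():
    # An ordinary generator, meant to be consumed as an iterator.
    yield "a"


def undecorated_async():
    # Meant to be async, but indistinguishable from iterate() by type.
    data = yield from iterate()
    return data


kinds = [classify(iterate()), classify(undecorated_async()), classify(Future())]
```

Both generator-based functions land in the same bucket, which is why the return type alone cannot drive the decision.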

>> - the event loop is implemented by the OS. Our Scheduler implementation does
>> not need to provide an event loop, since we can submit() calls to the OS-level
>> loop. This pattern also allows wattle to 'sit on top of' any other event loop,
>> probably including Twisted and 0MQ, though I have not tried it (except with
>> Tcl).
>
> Ok, so what is the API offered by the OS event loop? I really want to make sure
> that tulip can interface with strange event loops, and this may be the most
> concrete example so far -- and it may be an important one.

There are three main APIs involved:

* Windows.UI.Core.CoreDispatcher.run_async() (and run_idle_async(), which uses a low priority)
* Windows.System.Threading.ThreadPool.run_async()
* any API that returns a future (==an object implementing IAsyncInfo)

Strictly, the third category covers the first two, since they both return a future, but they are also the APIs that allow the user/developer to schedule work on or off the UI thread (respectively).

For wattle, they equate directly to Scheduler.submit, Scheduler.thread_pool.submit (which wasn't in the code, but was suggested in the write-up) and Future.

>> - Future objects can be marshalled directly from Python into Windows,
>> completing the interop story.
>
> What do you mean by marshalled here? Surely not the stdlib marshal module.

No.

> Do you just mean that Future objects can be recognized by the foreign-function
> interface and wrapped by / copied into native Windows 8 datatypes?

Yes, this is exactly what we would do. The FFI creates a WinRT object that forwards calls between Python and Windows as necessary. (This is a general mechanism that we use for many types, so it doesn't matter how the Future is created. On a related note, returning a Future from Python code into Windows will not be a common occurrence - it is far more common for Python to consume Futures that are passed in.)

> I understand your event loop understands Futures? All of them? Or only the ones
> of the specific type that it also returns?

It's based on an interface, so as long as we can provide (equivalents of) add_done_callback() and result() then the FFI will do the rest.
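
As a rough sketch of what "based on an interface" means here, assuming a duck-typed check (looks_like_future() and MyFuture are hypothetical; concurrent.futures.Future is used only because it already offers both methods):

```python
from concurrent.futures import Future


def looks_like_future(obj):
    """Hypothetical duck-typed check for the two methods the FFI needs."""
    return all(callable(getattr(obj, name, None))
               for name in ("add_done_callback", "result"))


class MyFuture:
    """An unrelated class that satisfies the same interface."""

    def __init__(self):
        self._callbacks = []
        self._result = None
        self._done = False

    def add_done_callback(self, fn):
        # Call immediately if already complete, otherwise remember it.
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def result(self):
        return self._result

    def set_result(self, value):
        self._result = value
        self._done = True
        for fn in self._callbacks:
            fn(self)


seen = []
for fut in (Future(), MyFuture()):
    if looks_like_future(fut):
        fut.add_done_callback(lambda f: seen.append(f.result()))
        fut.set_result(42)
```

Either object can be consumed the same way, regardless of which library created it.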

>> Even with tulip, we would probably still require a decorator for this case so
>> that we can marshal regular generators as iterables (for which there is a
>> specific type).
>
> I can't quite follow you here, probably due to lack of imagination on my part.
> Can you help me with a (somewhat) concrete example?

Given a (Windows) prototype:

IIterable<String> GetItems();

We want to allow the Python function to be implemented as:

def get_items():
    for data in ['a', 'b', 'c']:
        yield data

This is a pretty straightforward mapping: Python returns a generator, which supports the same interface as IIterable, so we can marshal the object out and convert each element to a string.

The problem is when a (possibly too keen) user writes the following code:

def get_items():
    data = yield from get_data_async()
    return data

Now the returned generator is full of None, which we will happily convert to a sequence of empty strings (==null pointers in Win8). With wattle, the yielded objects would be Futures, which would still be converted to strings, but at least are obviously incorrect. Also, since the user should be in the habit of adding @async already, we can raise an error even earlier when the return value is a future and not a generator.
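
A small runnable sketch of the yield-from failure mode. get_data_async() here is a hypothetical operation written in the bare-yield style, not a real API:

```python
def get_data_async():
    # Hypothetical async operation in the yield-from style: each bare
    # `yield` hands control back to a scheduler until the data is ready.
    yield
    yield
    return ["a", "b", "c"]


def get_items():
    data = yield from get_data_async()
    return data


# A caller that expects an iterable simply iterates; no scheduler is involved,
# so it sees the bare yields and never sees the returned data.
items = list(get_items())
```

The caller receives one None per suspension point, and the real result is silently discarded along with the StopIteration that carries it.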

Unfortunately, nothing can fix this code (except maybe a new keyword):

def get_items():
    data = yield from get_data_async()
    for item in data:
        yield item

>> Without a decorator, we would probably have to ban both cases to prevent
>> subtly misbehaving programs.
>
> Concrete example?

Given above. By banning both cases we would always raise TypeError when a generator is returned, even if an iterable or an async operation is expected, because we can't be sure which one we have.

>> At least with wattle, the user does not have to do anything different from any
>> of their other @async functions.
>
> This is because you can put type checks inside @async, which sees the function
> object before it's called, rather than the scheduler, which only sees what it
> returned, right? That's a trick I use in NDB as well and I think tulip will end
> up requiring a decorator too -- but it will just "mark" the function rather than
> wrap it in another one, unless the function is not a generator (in which case it
> will probably have to wrap it in something that is a generator). I could imagine
> a debug version of the decorator that added wrappers in all cases though.

It's not so much the type checks inside @async - those are basically to support non-generator functions being wrapped (though there is little benefit to this apart from maintaining a consistent interface). The benefit is that the _returned object_ is always going to be some sort of Future.

Because of the way that our FFI will work, a simple marker on the function would be sufficient for our interop purposes. However, I don't think it is a general enough solution (for example, if the caller is already in Python then they may not get to see the function before it is called - Twisted might be affected by this, though I'm not sure).
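
For illustration, a marker-style decorator along the lines described in the quoted text might look like the sketch below. The async_marker name and the _is_async attribute are assumptions made for this example, not part of tulip or wattle:

```python
import functools
import inspect


def async_marker(func):
    """Hypothetical marker-style decorator.

    Generator functions are only tagged; plain functions are wrapped in a
    generator so that every marked function returns a generator when called.
    """
    if inspect.isgeneratorfunction(func):
        func._is_async = True  # mark only; no wrapper
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # unreachable; its presence makes wrapper a generator function

    wrapper._is_async = True
    return wrapper


def run(gen):
    # Drive a generator to completion and return its final value.
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value


@async_marker
def gen_style():
    yield
    return 1


@async_marker
def plain_style():
    return 2


results = (run(gen_style()), run(plain_style()))
```

The marker is visible to anything that inspects the function object, but a caller that only ever sees the returned generator gets no help from it, which is the limitation described above.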

What might work best is allowing the replacement scheduler/pollster to provide or override the decorator somehow, though I don't see any convenient way to do this.

>> Despite this intended application, I have tried to approach this design task
>> independently to produce an API that will work for many cases, especially given
>> the narrow focus on sockets. If people decide to get hung up on "the Microsoft
>> way" or similar rubbish then I will feel vindicated for not mentioning it
>> earlier :-) - it has not had any more influence on wattle than any of my other
>> past experience has.
>
> No worries about that. I agree that we need concrete examples that take us
> beyond the world of sockets; it's just that sockets are where most of the
> interest lies (Tornado is a webserver, Twisted is often admired because of its
> implementations of many internet protocols, people benchmark async frameworks on
> how many HTTP requests per second they can serve) and I haven't worked with any
> type of GUI framework in a very long time. (Kudos for trying your way Tk!)

I don't blame you for avoiding GUI frameworks... there are very few that work well. Hopefully when we fully support XAML-based GUIs that will change somewhat, at least for Windows developers.

Also, I didn't include the Tk scheduler in BitBucket, but just to illustrate the simplicity of wrapping an existing loop I've posted the full code below (it still has some old names in it):

import contexts

class TkContext(contexts.CallableContext):
    def __init__(self, app):
        self.app = app

    @staticmethod
    def invoke(callable, args, kwargs):
        callable(*args, **kwargs)

    def submit(self, callable, *args, **kwargs):
        '''Adds a callable to invoke within this context.'''
        self.app.after(0, TkContext.invoke, callable, args, kwargs)

Cheers,
Steve

Terry Reedy

Nov 1, 2012, 1:50:56 PM11/1/12

to python...@python.org

On 11/1/2012 12:44 PM, Steve Dower wrote:

>>> C# and VB use
>>> their async/await keywords (good 8 min intro video on those:
>>> http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778

Thanks for the link. It makes much of this discussion more concrete for
me. As a potential user, the easy async = @async, await = yield from
transformation (additions) is what I would like for Python.

I do realize that the particular task was picked to be easy and that
other things might be harder (on Windows), and that Python has the
additional problem of working on multiple platforms. But I think 'make
easy things easy and difficult things possible' applies here.

I have no problem with 'yield from' instead of 'await' = 'wait for'.
Actually, the caller of the movie list fetcher did *not* wait for the
entire list to be fetched, even asynchronously. Rather, it displayed
items as they were available (yielded). So the app does less waiting,
and 'yield as available' is what 'await' does in that example.

Actually, I do not see how just adding 4 keywords would necessarily have
the effect it did. I imagine there is a bit more to the story than was
shown, like the 'original' code being carefully written so that the
change would have the effect it did. The video is, after all, an
advertorial. Nonetheless, it was impressive.

--
Terry Jan Reedy

Steve Dower

Nov 1, 2012, 2:07:56 PM11/1/12

to Terry Reedy, python...@python.org

Terry Reedy wrote:
> On 11/1/2012 12:44 PM, Steve Dower wrote:
>>>> C# and VB use
>>>> their async/await keywords (good 8 min intro video on those:
>>>> http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778
>

> [SNIP]

>
> Actually, I do not see how just adding 4 keywords would necessarily have
> the effect it did. I imagine there is a bit more to the story than was shown,
> like the 'original' code being carefully written so that the change would
> have the effect it did. The video is, after all, an advertorial. Nonetheless,
> it was impressive.

It is certainly a dramatic demo, and you are right to be skeptical. The "carefully written" part is that the code already used paging as part of its query - the "give me movies from 1950" request is actually a series of "give me 10 movies from 1950 starting from {0, 10, 20, 30, ...}" requests (this is why you see the progress counter go up by 10 each time) - and it's already updating the UI between each page. The "4 keywords" also activate a significant amount of compiler machinery that actually rewrites the original code, much like the conversion to a generator, so there is quite a bit of magic.
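
In Python terms, the paging structure is roughly the following. This is a sketch only: fetch_page() and the 25-entry catalogue are made up for the example and have nothing to do with the code in the video.

```python
def fetch_page(year, start, size=10):
    """Hypothetical stand-in for one paged request to the movie service."""
    catalogue = ["%d movie %d" % (year, i) for i in range(25)]
    return catalogue[start:start + size]


def movies_from(year, size=10):
    # Issue one request per page: start = 0, 10, 20, ...
    start = 0
    while True:
        page = fetch_page(year, start, size)
        if not page:
            break
        yield page  # the UI can be updated between pages
        start += size


progress = []
for page in movies_from(1950):
    progress.append(len(page))  # e.g. bump the progress counter
```

Because the loop already pauses between pages, making each request asynchronous changes where the waiting happens without changing the shape of the code.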

There are plenty of videos at http://channel9.msdn.com/search?term=async+await that go much deeper into how it all works, including the 3rd-party extensibility mechanisms.

(And apologies about the video only being available with Silverlight - I didn't realise this when I originally posted it. The videos at the later link are much more readily available, but also very deeply technical and quite long.)

Cheers,
Steve

Greg Ewing

Nov 1, 2012, 6:37:59 PM11/1/12

to python...@python.org

Guido van Rossum wrote:
> If we weren't so reluctant to introduce new keywords in Python we
> might introduce await as an alias for yield from in the future.

Or 'cocall'. :-)

> I think tulip will end up requiring a decorator too -- but it will
> just "mark" the function rather than wrap it in another one, unless
> the function is not a generator (in which case it will probably have
> to wrap it in something that is a generator).

I don't see how that helps much, because the scheduler
doesn't see generators used in yield-from calls. There
is *no* way to catch the mistake of writing

foo()

when you should have written

yield from foo()

instead.

This is one way that codef/cocall (or some variant on it)
would help, by clearly diagnosing that mistake.
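
A short demonstration of the silent failure, using a hypothetical generator-based foo():

```python
calls = []


def foo():
    calls.append("foo ran")
    yield


def buggy():
    foo()  # creates a generator object and discards it; the body never runs
    yield


def correct():
    yield from foo()
    yield


list(buggy())
after_buggy = list(calls)    # foo's body has not executed

list(correct())
after_correct = list(calls)  # foo's body has executed exactly once
```

No exception or warning is raised in the buggy version; the call simply has no effect.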

--
Greg

Devin Jeanpierre

Nov 7, 2012, 4:11:47 AM11/7/12

to Glyph, Python-Ideas

It's been a week, and nobody has responded to Glyph's email. I don't
think I know enough to agree or disagree with what he said, but it was
well-written and it looked important. Also, Glyph has a lot of
experience with this sort of thing, and it would be a shame if he was
discouraged by the lack of response. We can't really expect people to
contribute if their opinions are ignored.

Can relevant people please take another look at his post?

-- Devin