[Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

Eric Snow

unread,

Oct 14, 2012, 12:15:26 PM10/14/12

to python...@python.org

On Oct 14, 2012 8:42 AM, "Guido van Rossum" <gu...@python.org> wrote:
> Sadly it looks that
>
> r = yield from (f1(), f2())
>
> ends up interpreting the tuple as the iterator, and you end up with
>
> r = (f1(), f2())
>
> (i.e., a tuple of generators) rather than the desired
>
> r = ((yield from f1()), (yield from f2()))

Didn't want this tangent to get lost to the async discussion. Would it be too late to make a change along these lines? Would it be enough of an improvement to be warranted?

-eric

Guido van Rossum

unread,

Oct 14, 2012, 1:33:55 PM10/14/12

to Eric Snow, python...@python.org

3.3 has been released. It's too late. Also I'm not sure what change
*could* be made. Surely yield from <a function returning a tuple>
should just iterate over that tuple -- that's fundamental to yield
from. The only thing that could be done might be to change "yield from
x, y" to mean something different than "yield from (x, y)" -- but
that's questionable at best, and violates many other contexts (e.g.
"return x, y", "yield x, y", "for i in x, y:").

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Steven D'Aprano

unread,

Oct 14, 2012, 3:14:19 PM10/14/12

to python...@python.org

On 15/10/12 03:15, Eric Snow wrote:
> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<gu...@python.org> wrote:
>> Sadly it looks that
>>
>> r = yield from (f1(), f2())
>>
>> ends up interpreting the tuple as the iterator, and you end up with
>>
>> r = (f1(), f2())
>>
>> (i.e., a tuple of generators) rather than the desired
>>
>> r = ((yield from f1()), (yield from f2()))

How about this?

r = yield from *(f1(), f2())

which currently is a SyntaxError in 3.3.

--
Steven

Guido van Rossum

unread,

Oct 14, 2012, 3:19:04 PM10/14/12

to Steven D'Aprano, python...@python.org

On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> On 15/10/12 03:15, Eric Snow wrote:
>>
>> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<gu...@python.org> wrote:
>>>
>>> Sadly it looks that
>>>
>>> r = yield from (f1(), f2())
>>>
>>> ends up interpreting the tuple as the iterator, and you end up with
>>>
>>> r = (f1(), f2())
>>>
>>> (i.e., a tuple of generators) rather than the desired
>>>
>>> r = ((yield from f1()), (yield from f2()))
>
>
> How about this?
>
> r = yield from *(f1(), f2())
>
>
> which currently is a SyntaxError in 3.3.

I think it's too early to start proposing new syntax for a problem we
don't even know is common at all.

Greg Ewing's proposal works for me:

r = yield from par(f1(), f2())

--
--Guido van Rossum (python.org/~guido)

Christian Tismer

unread,

Oct 14, 2012, 4:55:34 PM10/14/12

to Guido van Rossum, python...@python.org

Hmmm...

On 14.10.12 21:19, Guido van Rossum wrote:
> On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> On 15/10/12 03:15, Eric Snow wrote:
>>> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<gu...@python.org> wrote:
>>>> Sadly it looks that
>>>>
>>>> r = yield from (f1(), f2())
>>>>
>>>> ends up interpreting the tuple as the iterator, and you end up with
>>>>
>>>> r = (f1(), f2())
>>>>
>>>> (i.e., a tuple of generators) rather than the desired
>>>>
>>>> r = ((yield from f1()), (yield from f2()))
>>
>> How about this?
>>
>> r = yield from *(f1(), f2())
>>
>>
>> which currently is a SyntaxError in 3.3.
> I think it's too early to start proposing new syntax for a problem we
> don't even know is common at all.
>
> Greg Ewing's proposal works for me:
>
> r = yield from par(f1(), f2())

I'm not very positive about all I've read in the last 50 hours.

The concept of generators IMHO gets overly bent towards modelling
a sensible syntax for a problem that not even had a convincing
solution in a dialect that already has full coroutines.
'par' and 'join' and friends should be considered without thinking
of generators in the first place. This is attacking too many problems
in one shot.

My approach would be to first find out how async operations should
be modelled the best under the assumption that we have a coroutine
concept that works without headaches about yielding in and out from
something to whatnot.

After that is settled and gets consensus, then I would think about
bearable patterns to implement that using generators. And when we
know what we really need, maybe considering more suitable Syntax.

my 0.2 thousand yen - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Greg Ewing

unread,

Oct 15, 2012, 1:10:56 AM10/15/12

to python...@python.org

Guido van Rossum wrote:

> I think it's too early to start proposing new syntax for a problem we
> don't even know is common at all.
>
> Greg Ewing's proposal works for me:
>
> r = yield from par(f1(), f2())

Also, whoever's proposing this needs to understand that even
if the suggested change to yield-from were made, it would NOT
automatically result in par() behaviour. It would just be
another way of sequentially calling two sub-generators.

--
Greg

Greg Ewing

unread,

Oct 15, 2012, 1:34:45 AM10/15/12

to python...@python.org

Christian Tismer wrote:

> My approach would be to first find out how async operations should
> be modelled the best under the assumption that we have a coroutine
> concept that works without headaches about yielding in and out from
> something to whatnot.

I think we already know that. People like Dijkstra and Hoare
figured it all out decades ago.

That's what my generator-oriented approach is based on --
using standard techniques for managing concurrency.

> After that is settled and gets consensus, then I would think about
> bearable patterns to implement that using generators. And when we
> know what we really need, maybe considering more suitable Syntax.

Given that we don't want to use OS threads or greenlets,
but we're happy to use generators, all that's left is to
find bearable patterns for doing so.

--
Greg

Christian Tismer

unread,

Oct 15, 2012, 4:24:53 AM10/15/12

to Greg Ewing, python...@python.org

On 15.10.12 07:34, Greg Ewing wrote:
> Christian Tismer wrote:
>
>> My approach would be to first find out how async operations should
>> be modelled the best under the assumption that we have a coroutine
>> concept that works without headaches about yielding in and out from
>> something to whatnot.
>
> I think we already know that. People like Dijkstra and Hoare
> figured it all out decades ago.
>
> That's what my generator-oriented approach is based on --
> using standard techniques for managing concurrency.
>

Sure, the theory is clear and well-known.
Not so clear is which of the concepts to implement to
what detail, and things like the C10K problem still are a challenge
to solve efficiently for Python.

http://www.kegel.com/c10k.html

I think it is necessary to take these considerations into account
at least and to think about updating large sets of waiting
channels efficiently, using appropriate data structures.

>> After that is settled and gets consensus, then I would think about
>> bearable patterns to implement that using generators. And when we
>> know what we really need, maybe considering more suitable Syntax.
>
> Given that we don't want to use OS threads or greenlets,
> but we're happy to use generators, all that's left is to
> find bearable patterns for doing so.

Question: Is it already given that something like greenlets is out
of consideration? I did not find a final say on that (but I'm bad at
searching...)

Is the whole discussion about what would be best, or just
"you can choose any implementation provided it's generators" ? :-)

cheers - Chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Laurens Van Houtven

unread,

Oct 15, 2012, 5:29:37 AM10/15/12

to Christian Tismer, python...@python.org

On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer <tis...@stackless.com> wrote:

Question: Is it already given that something like greenlets is out
of consideration? I did not find a final say on that (but I'm bad at
searching...)

I think an number of people have expressed a distaste for implicit task switching. That doesn't mean "no", but I'm guessing what's going to happen is having some kind of explicit, generator based thing, with an underlying API that makes implementing greenlets pretty easy.

Is the whole discussion about what would be best, or just
"you can choose any implementation provided it's generators" ? :-)

cheers - Chris

--
cheers

lvh

Christian Tismer

unread,

Oct 15, 2012, 6:38:21 AM10/15/12

to Laurens Van Houtven, python...@python.org

Thanks for your reply.

Just one thing that I don't get.
What do you mean by 'implicit taskswitching' ?
There is no such thing in greenlet, if you really meant that
Library from Armin Rigo.

greenlets do everything explicitly, no pre-emption at all.

So, is there a general understanding what a greenlet is and what not?
Just to make sure that the discussed terms are clearly defined.

Nick Coghlan

unread,

Oct 15, 2012, 8:18:54 AM10/15/12

to Christian Tismer, python...@python.org

On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tis...@stackless.com> wrote:
> Just one thing that I don't get.
> What do you mean by 'implicit taskswitching' ?
> There is no such thing in greenlet, if you really meant that
> Library from Armin Rigo.
>
> greenlets do everything explicitly, no pre-emption at all.
>
> So, is there a general understanding what a greenlet is and what not?
> Just to make sure that the discussed terms are clearly defined.

With greenlets, your potential switching points are every function
call (because you can call switch() from anywhere, and you can't
reliably know the name of *every* IO operation, or operation that
implicitly invokes an IO operation).

With generators, there is always an explicit *local* marker within the
generator body of the potential switching points: yield expressions
(including yield from). Ordinary function calls cannot cause the
function to be suspended.

So greenlets give you the scalability benefits of microthreading (as
almost any OS supports a couple of orders of magnitude more sockets
than it can threads), but without the same benefits of locally visible
suspension points that are provided by generators and explicit
callbacks.

That's the philosophical reason. As a *practical* matter, there's
still the problem you described in more detail elsewhere that CPython
relies too much on the C stack to support suspension of arbitrary call
chains without the stack switching assembly code in
Stackless/greenlets.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Christian Tismer

unread,

Oct 15, 2012, 9:57:53 AM10/15/12

to Nick Coghlan, python...@python.org

Hey Nick,

On 15.10.12 14:18, Nick Coghlan wrote:
> On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tis...@stackless.com> wrote:
>> Just one thing that I don't get.
>> What do you mean by 'implicit taskswitching' ?
>> There is no such thing in greenlet, if you really meant that
>> Library from Armin Rigo.
>>
>> greenlets do everything explicitly, no pre-emption at all.
>>
>> So, is there a general understanding what a greenlet is and what not?
>> Just to make sure that the discussed terms are clearly defined.
> With greenlets, your potential switching points are every function
> call (because you can call switch() from anywhere, and you can't
> reliably know the name of *every* IO operation, or operation that
> implicitly invokes an IO operation).

That's true, and you will wonder: I never liked that!
See below (you'll wonder even more)

> With generators, there is always an explicit *local* marker within the
> generator body of the potential switching points: yield expressions
> (including yield from). Ordinary function calls cannot cause the
> function to be suspended.
>
> So greenlets give you the scalability benefits of microthreading (as
> almost any OS supports a couple of orders of magnitude more sockets
> than it can threads), but without the same benefits of locally visible
> suspension points that are provided by generators and explicit
> callbacks.

Yes, I understood that a lot better now.
The nice trick of the (actually a bit ugly) explicit down-chaining
of the locally visible switching points is the one thing that makes
a huge difference, both for Stackless and Greenlets.
Because we could never know the exact switching points, things became
so difficult to handle.

> That's the philosophical reason. As a *practical* matter, there's
> still the problem you described in more detail elsewhere that CPython
> relies too much on the C stack to support suspension of arbitrary call
> chains without the stack switching assembly code in
> Stackless/greenlets.
>

Right, CPython still keeps unneccessary crap on the C stack.
But that's not the point right now, because on the other hand,
in the context of a possible yield (from or not), the C stack
is clean, and this enables switching.
And actually in such clean positions, Stackless Python (as opposed to
Greenlets) does soft-switching, which is very similar to what the generators
are doing - there is no assembly stuff involved at all.

So in the context of switching, CPython is presumably more efficient
than greenlet (because of stack slicing), and a bit less efficient than
stackless because of the generator chaining.

I have begun studying the code for YIELD_FROM. As it is written, every
next iteration elevates the chain of generators once up and down.
Maybe that can be avoided by changing the frame chain, so this can become
a cheaper O(1) operation.

Alternatively I could also imagine to write real generators or coroutines
as an extension module. It would use the same concept as generators,
internally. No big deal, not changing the interpreter, maybe adding a bit.

I think this would make Greenlet and even Stackless obsolete in most
cases which are of real use.

I would like to discuss this and maybe do a prototype.

cheers - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Nick Coghlan

unread,

Oct 15, 2012, 10:46:00 AM10/15/12

to Christian Tismer, python...@python.org

On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tis...@stackless.com> wrote:
> So in the context of switching, CPython is presumably more efficient
> than greenlet (because of stack slicing), and a bit less efficient than
> stackless because of the generator chaining.
>
> I have begun studying the code for YIELD_FROM. As it is written, every
> next iteration elevates the chain of generators once up and down.
> Maybe that can be avoided by changing the frame chain, so this can become
> a cheaper O(1) operation.

Yes, we certainly talked about that, but I don't believe anyone came
up with the code needed to make it behave itself properly when
unwinding the stack. (Either that or someone *did* try it, and then
undid it because it broke the test suite, which amounts to the same
thing. Mercurial could say for sure)

> Alternatively I could also imagine to write real generators or coroutines
> as an extension module. It would use the same concept as generators,
> internally. No big deal, not changing the interpreter, maybe adding a bit.

Tangentially related, there are some patches [1,2] on the tracker
looking to shuffle a few things related to generator state around to
get them out of the frame objects and into the generator objects where
they belong. There are definitely a few things that could do with
cleaning up in this space.

[1] http://bugs.python.org/issue13897
[2] http://bugs.python.org/issue13607

> I think this would make Greenlet and even Stackless obsolete in most
> cases which are of real use.

The "take this synchronous code and magically make it scale better"
aspect is still a nice feature of greenlets & gevent.

>
> I would like to discuss this and maybe do a prototype.

Sure, I think there's several things we can do better here, and I
think the test suite is comprehensive enough to keep us honest.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Christian Tismer

unread,

Oct 15, 2012, 12:41:27 PM10/15/12

to Nick Coghlan, python...@python.org

On 15.10.12 16:46, Nick Coghlan wrote:
> On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tis...@stackless.com> wrote:

...

>> Alternatively I could also imagine to write real generators or coroutines
>> as an extension module. It would use the same concept as generators,
>> internally. No big deal, not changing the interpreter, maybe adding a bit.
> Tangentially related, there are some patches [1,2] on the tracker
> looking to shuffle a few things related to generator state around to
> get them out of the frame objects and into the generator objects where
> they belong. There are definitely a few things that could do with
> cleaning up in this space.
>
> [1] http://bugs.python.org/issue13897
> [2] http://bugs.python.org/issue13607

Thanks for pointing me at that. I think Mark Shannon has quite similar
thoughts.
I need to talk to him.

>> I think this would make Greenlet and even Stackless obsolete in most
>> cases which are of real use.
> The "take this synchronous code and magically make it scale better"
> aspect is still a nice feature of greenlets & gevent.

I had a deeper look into gevent and how it uses greenlet and
does its monkey-patching. Indeed, cute!
My assumption was that I could write a surrogate greenlet
using the advanced generators.

But I overlooked that for this to work, everything must behave
like generators. Not only the surrogate greenlet, but also
the code that it wants to switch. Argh...

A work-around for gevent would be a rewrite of all supported
modules to patch. Not a cake walk.

Thanks, you gave me a lot of insight!

>> I would like to discuss this and maybe do a prototype.
> Sure, I think there's several things we can do better here, and I
> think the test suite is comprehensive enough to keep us honest.
>
> Cheers,
> Nick.
>

Cheers - Chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Christian Tismer

unread,

Oct 15, 2012, 1:25:31 PM10/15/12

to Nick Coghlan, python...@python.org

On 15.10.12 15:57, Christian Tismer wrote:
>
> Right, CPython still keeps unneccessary crap on the C stack.
> But that's not the point right now, because on the other hand,
> in the context of a possible yield (from or not), the C stack
> is clean, and this enables switching.
> And actually in such clean positions, Stackless Python (as opposed to
> Greenlets) does soft-switching, which is very similar to what the
> generators
> are doing - there is no assembly stuff involved at all.

I'm sorry about the expression "crap". Please read this as "stuff".

I was not aware of the unfriendliness of this word and will be more
careful next time.

cheers - Chris

Greg Ewing

unread,

Oct 15, 2012, 4:19:58 PM10/15/12

to python...@python.org

Christian Tismer wrote:

> Question: Is it already given that something like greenlets is out
> of consideration?

Greenlets will always be available to those who want and
are able to use them.

But there's a desire to have something in the standard library
that is completely portable and doesn't rely on any platform
dependent techniques or tricks. That's what we're talking
about here.

--
Greg

Greg Ewing

unread,

Oct 15, 2012, 8:44:22 PM10/15/12

to python...@python.org

Christian Tismer wrote:

> Right, CPython still keeps unneccessary crap on the C stack.

It's not just Python leaving stuff on the stack that's a
problem, it's external C code that calls back into Python.

> But that's not the point right now, because on the other hand,
> in the context of a possible yield (from or not), the C stack
> is clean, and this enables switching.

> And actually in such clean positions, Stackless Python (as opposed to
> Greenlets) does soft-switching, which is very similar to what the
> generators
> are doing - there is no assembly stuff involved at all.

But the assembly code still needs to be there to handle the
cases where you *can't* do soft switching. It's the presence
of the code that's the issue here, not how frequently it
gets called.

> I have begun studying the code for YIELD_FROM. As it is written, every
> next iteration elevates the chain of generators once up and down.
> Maybe that can be avoided by changing the frame chain, so this can become
> a cheaper O(1) operation.

My original implementation of yield-from actually *did* avoid
this, by keeping a C-level pointer chain of yielding-from frames.
But that part was ripped out at the last minute when someone
discovered that it had a detrimental effect on tracebacks.

There are probably other ways the traceback problem could be
fixed, so maybe we will get this optimisation back one day.

--
Greg

Nick Coghlan

unread,

Oct 15, 2012, 9:49:01 PM10/15/12

to Greg Ewing, python...@python.org

On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
<greg....@canterbury.ac.nz> wrote:
> My original implementation of yield-from actually *did* avoid
> this, by keeping a C-level pointer chain of yielding-from frames.
> But that part was ripped out at the last minute when someone
> discovered that it had a detrimental effect on tracebacks.
>
> There are probably other ways the traceback problem could be
> fixed, so maybe we will get this optimisation back one day.

Ah, I thought I remembered something along those lines. IIRC, it was a
bug report on one of the alphas that prompted us to change it.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Christian Tismer

unread,

Oct 18, 2012, 9:12:58 PM10/18/12

to Greg Ewing, python...@python.org

Hi Greg,

coming back to this after quite a storm in my brain...

On 16.10.12 02:44, Greg Ewing wrote:
> Christian Tismer wrote:
>
>> Right, CPython still keeps unneccessary crap on the C stack.
>
> It's not just Python leaving stuff on the stack that's a
> problem, it's external C code that calls back into Python.
>

That one is something that I think to ignore.
Of course there are quite some situations where callbacks
into Python are a problem, but I don't want to put this into Python.

There are ways to cope with this, for instance using greenlet as
an optional extension module that handles these cases.
Alternatively, Python itself can do it with strictly controlled threads.

But I think leaving that out will simplify matters a lot and keeps Python
clean. In the end, I want to model something that is likely to be
accepted.

>> But that's not the point right now, because on the other hand,
>> in the context of a possible yield (from or not), the C stack
>> is clean, and this enables switching.
>
>> And actually in such clean positions, Stackless Python (as opposed to
>> Greenlets) does soft-switching, which is very similar to what the
>> generators
>> are doing - there is no assembly stuff involved at all.
>
> But the assembly code still needs to be there to handle the
> cases where you *can't* do soft switching. It's the presence
> of the code that's the issue here, not how frequently it
> gets called.

No, I'm intending to really rip that out.
Or better, I want to do a rewrite of a subset of Stackless,
actually the functionality that allows to implement greenlets
or multi-level generators, task scheduling and so on.

In effect, I want to find something that enables some extended
switching. Emulated, without hacking the kernel in the first place.

Generators are restricted to call/yield/return positions, and
I thing that's fine. What can be switched is totally clear by
definitions, and I like that. I'm talking of exactly that.

What I dislike is a different topic ;-)

>> I have begun studying the code for YIELD_FROM. As it is written, every
>> next iteration elevates the chain of generators once up and down.
>> Maybe that can be avoided by changing the frame chain, so this can
>> become
>> a cheaper O(1) operation.
>
> My original implementation of yield-from actually *did* avoid
> this, by keeping a C-level pointer chain of yielding-from frames.
> But that part was ripped out at the last minute when someone
> discovered that it had a detrimental effect on tracebacks.
>
> There are probably other ways the traceback problem could be
> fixed, so maybe we will get this optimisation back one day.

Ok, let's ignore this O(n) problem for now. _yield from_ is anyway
probably faster by more than an order of magnitude, so it will
serve your purpose (nesting generators) pretty well.

My problem is different because I want a scaling building block
for building higher level structures, and I would love to build them
using _yield from_ .

There are a few things which contradict completely my thinking:

- _yield from_ works from the top. That is, if I have five nested iterators
and I want to drive them, then I have to call the root generator?!
I see that that works, but it is against all what I'm used to.

How can I inject the info that I want to switch context?

- generators always yield to the caller, and also return values to the
caller. What I'm looking for is to express a switch so something
else?

- generators are able to free the stack, when they yield. But when they
are active, they use the full stack. At least when I follow the pattern
"generator is calling sub-generator".
A deeply nested recursion is therefore something to avoid. :-(

Now I'm playing around with different approaches to model something
flexible that gives me more freedom. Right now I'm trying a slightly
pervert approach to give me an _unwindable_, merely a frame-like
object that can vanish on demand.

I'm also experimenting with emulating different kinds of _yield_".
Since there is only one kind of yield to one target, I get the problem
to distinguish that for different purposes.

Example:

I can take a set of nested functions in their native form.
Then replacing ordinary calls by _yield from_ and inserting proper
yields before actually returning, I now have the equivalent of a nested
function call, that I can drive with another _yield from_ .
This is now a function that permanently releases the stack.

Now I would like to give one of the nested function the ability to
transfer execution somewhere else. The support is insofar there,
as the stack is freed all the time. But this function that wants to
switch needs to pass the fact that it wants to switch, plus the target
somewhere. As I understood it, I would need to yield that to the
driver function.
In order to do that, I would need to yield a tuple or a bound object.
This is a different purpose than the simple driver functionality.

Do you see it? In my understanding, a switch would not be driven from
the top and then dispatched upon, but a called function below the
function to be switched would modify something that leads to a
switch as a result.
In this example, the generator-emulated function would be driven
by thousands of yields, kind of polled to catch the one event that
needs to be supported by a switching action.

This looks wrong for me, like doing things upside down.

To shorten this: I have the problem that I have your very efficient yield
collector, but I need to dispatch on what is intended by the yield,
instead of initiating a reaction from where I am.
All in all, I can't get rid of the thought "un-pythonic".

So I'm still thinking of a frame-like object that allows me to control
its execution, let it vanish, and so on, and use it as a building block.
As always, I'm feeling bad when going this road, because I want to use
the eficient _yield from_ as much as possible.

But it may be my missing experience.

Do you understand, and maybe see where I have the wrong
brain shortcuts?
How do you write something composable that scales?

Cheers -- Chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Greg Ewing

unread,

Oct 19, 2012, 1:15:33 AM10/19/12

to python...@python.org

Christian Tismer wrote:

> - generators are able to free the stack, when they yield. But when they
> are active, they use the full stack. At least when I follow the pattern
> "generator is calling sub-generator".
> A deeply nested recursion is therefore something to avoid. :-(

Only if yield-from chains aren't optimised the way they
used to be.

In any case, for the application we're talking about here,
the difference will probably not be noticeable.

> But this function that wants to
> switch needs to pass the fact that it wants to switch, plus the target
> somewhere. As I understood it, I would need to yield that to the
> driver function.

You understand incorrectly. In my scheduler, the yields
don't send or receive values at all. Communicating with the
scheduler, for example to tell it to allow another task to
run, is done by calling functions. A yield must be done to
actually allow a switch, but the yield itself doesn't send
any information.

> Do you see it? In my understanding, a switch would not be driven from
> the top and then dispatched upon, but a called function below the
> function to be switched would modify something that leads to a
> switch as a result.

That's pretty much what happens in my scheduler.

> Do you understand, and maybe see where I have the wrong
> brain shortcuts?
> How do you write something composable that scales?

I think you should study my scheduler tutorial. If you can
understand how that works, I think it will answer many of
your questions.

http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/

--
Greg

Christian Tismer

unread,

Oct 19, 2012, 8:05:20 AM10/19/12

to Greg Ewing, python...@python.org

On 19.10.12 07:15, Greg Ewing wrote:
> Christian Tismer wrote:
>
>> - generators are able to free the stack, when they yield. But when they
>> are active, they use the full stack. At least when I follow the
>> pattern
>> "generator is calling sub-generator".
>> A deeply nested recursion is therefore something to avoid. :-(
>
> Only if yield-from chains aren't optimised the way they
> used to be.

Does that mean a very deep recursion would be efficient?

I'm trying to find that change in the hg history right now.

Can you give me a hint how your initial implementation
works, the initial patch source?
>
> ...

>> But this function that wants to
>> switch needs to pass the fact that it wants to switch, plus the target
>> somewhere. As I understood it, I would need to yield that to the
>> driver function.
>
> You understand incorrectly. In my scheduler, the yields
> don't send or receive values at all. Communicating with the
> scheduler, for example to tell it to allow another task to
> run, is done by calling functions. A yield must be done to
> actually allow a switch, but the yield itself doesn't send
> any information.

I have studied that yesterday already in depth and like that quite much.
It is probably just the problem that I had with generators from their
beginning.

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Christian Tismer

unread,

Oct 19, 2012, 8:55:57 AM10/19/12

to Nick Coghlan, python...@python.org

Hi Nick,

On 16.10.12 03:49, Nick Coghlan wrote:
> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
> <greg....@canterbury.ac.nz> wrote:
>> My original implementation of yield-from actually *did* avoid
>> this, by keeping a C-level pointer chain of yielding-from frames.
>> But that part was ripped out at the last minute when someone
>> discovered that it had a detrimental effect on tracebacks.
>>
>> There are probably other ways the traceback problem could be
>> fixed, so maybe we will get this optimisation back one day.
> Ah, I thought I remembered something along those lines. IIRC, it was a
> bug report on one of the alphas that prompted us to change it.
>

I was curious and searched quite a lot.
It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
from Marc Shannon, patched by Benjamin.

Now I found the original implementation. That looks very much
as I'm thinking it should be.

Quite a dramatic change which works well, but really seems to remove
what I would call "now I can emulate most of Stackless" efficiently.

Maybe I should just try to think it would be implemented as before,
build an abstraction and just use it for now.

I will spend my time at PyCon de for sprinting on "yield from".

cheers - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Nick Coghlan

unread,

Oct 19, 2012, 9:56:04 AM10/19/12

to Christian Tismer, python...@python.org

On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tis...@stackless.com> wrote:
> I was curious and searched quite a lot.
> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
> from Marc Shannon, patched by Benjamin.
>
> Now I found the original implementation. That looks very much
> as I'm thinking it should be.
>
> Quite a dramatic change which works well, but really seems to remove
> what I would call "now I can emulate most of Stackless" efficiently.
>
> Maybe I should just try to think it would be implemented as before,
> build an abstraction and just use it for now.
>
> I will spend my time at PyCon de for sprinting on "yield from".

Yeah, if we can get Greg's original optimised behaviour while still
supporting introspection properly, that's really where we want to be.
That's the main reason I'm a fan of Mark's other patches moving more
of the generator state from the frame objects out into the generator
objects - my suspicion is that generator objects themselves need to be
maintaining a full "generator stack" independent of the frame stack in
the main eval loop in order to get the best of both worlds (i.e.
optimised suspend/resume with confusing debuggers).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Mark Lawrence

unread,

Oct 19, 2012, 10:05:54 AM10/19/12

to python...@python.org

On 19/10/2012 14:56, Nick Coghlan wrote:
> On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tis...@stackless.com> wrote:
>> I was curious and searched quite a lot.
>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>> from Marc Shannon, patched by Benjamin.
>>
>> Now I found the original implementation. That looks very much
>> as I'm thinking it should be.
>>
>> Quite a dramatic change which works well, but really seems to remove
>> what I would call "now I can emulate most of Stackless" efficiently.
>>
>> Maybe I should just try to think it would be implemented as before,
>> build an abstraction and just use it for now.
>>
>> I will spend my time at PyCon de for sprinting on "yield from".
>
> Yeah, if we can get Greg's original optimised behaviour while still
> supporting introspection properly, that's really where we want to be.
> That's the main reason I'm a fan of Mark's other patches moving more
> of the generator state from the frame objects out into the generator
> objects - my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume with confusing debuggers).
>
> Cheers,
> Nick.
>

There's nothing like confusing debuggers or have I read that wrong? :)

--
Cheers.

Mark Lawrence.

Nick Coghlan

unread,

Oct 19, 2012, 10:16:01 AM10/19/12

to Mark Lawrence, python...@python.org

On Sat, Oct 20, 2012 at 12:05 AM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> There's nothing like confusing debuggers or have I read that wrong? :)

Yeah, that was the main issue that resulted in the design change - the
optimised approach confused a lot of the introspection machinery. So
the challenge is to restore the optimisation while *also* adding in
mechanisms to preserve the introspection support.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Guido van Rossum

unread,

Oct 19, 2012, 12:07:24 PM10/19/12

to Christian Tismer, python...@python.org

On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tis...@stackless.com> wrote:
> On 19.10.12 07:15, Greg Ewing wrote:
>>
>> Christian Tismer wrote:
>>
>>> - generators are able to free the stack, when they yield. But when they
>>> are active, they use the full stack. At least when I follow the
>>> pattern
>>> "generator is calling sub-generator".
>>> A deeply nested recursion is therefore something to avoid. :-(
>>
>>
>> Only if yield-from chains aren't optimised the way they
>> used to be.
>
>
> Does that mean a very deep recursion would be efficient?

TBH, I am not interested in making very deep recursion work at all. If
you need that, you're doing it wrong in my opinion.

--
--Guido van Rossum (python.org/~guido)

Christian Tismer

unread,

Oct 19, 2012, 12:18:42 PM10/19/12

to Nick Coghlan, python...@python.org

On 19.10.12 15:56, Nick Coghlan wrote:
> On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tis...@stackless.com> wrote:
>> I was curious and searched quite a lot.
>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>> from Marc Shannon, patched by Benjamin.
>>
>> Now I found the original implementation. That looks very much
>> as I'm thinking it should be.
>>
>> Quite a dramatic change which works well, but really seems to remove
>> what I would call "now I can emulate most of Stackless" efficiently.
>>
>> Maybe I should just try to think it would be implemented as before,
>> build an abstraction and just use it for now.
>>
>> I will spend my time at PyCon de for sprinting on "yield from".
> Yeah, if we can get Greg's original optimised behaviour while still
> supporting introspection properly, that's really where we want to be.
> That's the main reason I'm a fan of Mark's other patches moving more
> of the generator state from the frame objects out into the generator
> objects - my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume with confusing debuggers).

That may be very true in order to get real generators.
The storm in my brain is quite intense the last days...

Actually I would like to have a python context where it gets into
"async mode" and interprets all functions defined in that mode as
generators.
In that mode, generators are not meant as generators, but async-enabled
functions.
I see "yield from" as a low-level construct that should not even be
exposed, but be applied automatically in async mode.
That way, we could write normal functions and could implement a real "Yield"
without the "yield from" helper visible everywhere.

Not sure how to do that right. I'm playing with AST a bit to get a feeling
for this.

To give you an idea where my thoughts are meandering around,
I would like to point you at

http://doc.pypy.org/en/latest/stackless.html

That is an implementation that comes close to what I'm thinking.
The drawback of the current PyPy implementation is that it
used greenlet style for its underlying switching. That is what
I want to replace with some "yield from" construct.

cheers - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Christian Tismer

unread,

Oct 19, 2012, 12:50:39 PM10/19/12

to Guido van Rossum, python...@python.org

On 19.10.12 18:07, Guido van Rossum wrote:
> On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tis...@stackless.com> wrote:
>> On 19.10.12 07:15, Greg Ewing wrote:
>>> Christian Tismer wrote:
>>>
>>>> - generators are able to free the stack, when they yield. But when they
>>>> are active, they use the full stack. At least when I follow the
>>>> pattern
>>>> "generator is calling sub-generator".
>>>> A deeply nested recursion is therefore something to avoid. :-(
>>>
>>> Only if yield-from chains aren't optimised the way they
>>> used to be.
>>
>> Does that mean a very deep recursion would be efficient?
> TBH, I am not interested in making very deep recursion work at all. If
> you need that, you're doing it wrong in my opinion.

Misunderstanding I think. Of course I don't want to use deep recursion.
But people might write things that happen several levels deep and
then iterating over lots of stuff. A true generator would have no
problem with that.
Assume just five layers of generators that have to be re-invoked
for a tight yielding loop is quite some overhead that can be avoided.

The reason why I care is that existing implementations that use
greenlet style could be turned into pure python, given that I manage
to write the right support functions, and replace all functions by
generators that emulate functions with async behavior.

It would just be great if that worked at the same speed, independent
from at which stack level an iteration happens.

Agreed that new code like that would be bad style.

ciao - chris

Guido van Rossum

unread,

Oct 19, 2012, 1:18:38 PM10/19/12

to Christian Tismer, python...@python.org

On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tis...@stackless.com> wrote:
> On 19.10.12 18:07, Guido van Rossum wrote:
>>
>> On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tis...@stackless.com>
>> wrote:
>>>
>>> On 19.10.12 07:15, Greg Ewing wrote:
>>>>
>>>> Christian Tismer wrote:
>>>>
>>>>> - generators are able to free the stack, when they yield. But when they
>>>>> are active, they use the full stack. At least when I follow the
>>>>> pattern
>>>>> "generator is calling sub-generator".
>>>>> A deeply nested recursion is therefore something to avoid. :-(
>>>>
>>>>
>>>> Only if yield-from chains aren't optimised the way they
>>>> used to be.
>>>
>>>
>>> Does that mean a very deep recursion would be efficient?
>>
>> TBH, I am not interested in making very deep recursion work at all. If
>> you need that, you're doing it wrong in my opinion.
>
>
> Misunderstanding I think. Of course I don't want to use deep recursion.
> But people might write things that happen several levels deep and
> then iterating over lots of stuff. A true generator would have no
> problem with that.

Okay, good. I agree that this use case should be as fast as possible
-- as long as we still see every frame involved when a traceback is
printed.

> Assume just five layers of generators that have to be re-invoked
> for a tight yielding loop is quite some overhead that can be avoided.
>
> The reason why I care is that existing implementations that use
> greenlet style could be turned into pure python, given that I manage
> to write the right support functions, and replace all functions by
> generators that emulate functions with async behavior.
>
> It would just be great if that worked at the same speed, independent
> from at which stack level an iteration happens.

Yup.

> Agreed that new code like that would be bad style.

Like "what"?

--
--Guido van Rossum (python.org/~guido)

Christian Tismer

unread,

Oct 19, 2012, 1:36:39 PM10/19/12

to Guido van Rossum, python...@python.org

On 19.10.12 19:18, Guido van Rossum wrote:
> On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tis...@stackless.com> wrote:
>> On 19.10.12 18:07, Guido van Rossum wrote:

>>> ...

>>> TBH, I am not interested in making very deep recursion work at all. If
>>> you need that, you're doing it wrong in my opinion.

>> ...

>> Agreed that new code like that would be bad style.
> Like "what"?
>

Like code that excercises deep recursion thoughtlessly ;-)
in contrast to code that happens to be quite nested because
of a systematic transformation.

So correctness first, big Oh later.

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Mark Shannon

unread,

Oct 19, 2012, 5:15:38 PM10/19/12

to python...@python.org

On 19/10/12 13:55, Christian Tismer wrote:
> Hi Nick,
>
> On 16.10.12 03:49, Nick Coghlan wrote:
>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
>> <greg....@canterbury.ac.nz> wrote:
>>> My original implementation of yield-from actually *did* avoid
>>> this, by keeping a C-level pointer chain of yielding-from frames.
>>> But that part was ripped out at the last minute when someone
>>> discovered that it had a detrimental effect on tracebacks.
>>>
>>> There are probably other ways the traceback problem could be
>>> fixed, so maybe we will get this optimisation back one day.
>> Ah, I thought I remembered something along those lines. IIRC, it was a
>> bug report on one of the alphas that prompted us to change it.
>>
>
> I was curious and searched quite a lot.
> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
> from Marc Shannon, patched by Benjamin.
>
> Now I found the original implementation. That looks very much
> as I'm thinking it should be.
>
> Quite a dramatic change which works well, but really seems to remove
> what I would call "now I can emulate most of Stackless" efficiently.
>
> Maybe I should just try to think it would be implemented as before,
> build an abstraction and just use it for now.
>
> I will spend my time at PyCon de for sprinting on "yield from".
>

The current implementation may not be much slower than Greg's original
version. One of the main costs of making a call is the creation of a new
frame. But calling into a generator does not need a new frame, so the
cost will be reduced.
Unless anyone has evidence to the contrary :)

Rather than increasing the performance of this special case, I would
suggest that improving the performance of calls & returns in general
would be a more worthwhile goal.
Calls and returns ought to be cheap.

Cheers,
Mark

Guido van Rossum

unread,

Oct 19, 2012, 5:31:17 PM10/19/12

to Mark Shannon, python...@python.org

I did a basic timing test using a simple recursive function and a
recursive PEP-380 coroutine computing the same value (see attachment).
The coroutine version is a little over twice as slow as the function
version. I find that acceptable. This went 20 deep, making 2 recursive
calls at each level (except at the deepest level).

Output on my MacBook Pro:

plain 2097151 0.5880069732666016
coro. 2097151 1.2958409786224365

This was a Python 3.3 built a few days ago from the 3.3 branch.

p3time.py

Christian Tismer

unread,

Oct 19, 2012, 6:31:15 PM10/19/12

to Guido van Rossum, python...@python.org

Hi Guido, Marc, all,

this is a veery promising result, telling
me that the big Oh can in fact be
neglected in real applications. 20 for
twi is good!

I will of course do an analysis and find
the parameters of the quadratic, but my
concern is pretty much tamed.

For me that means there will soon be
a library that contains real generators
and more building blocks.

I think using those would simplify the
design of the async API quite a lot.

I suggest to regard current generator
constructs as low-level helpers for
Implementing the real concurrency
building blocks.

Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract
from this now and think in terms of
higher level constructs.

Let's assume generators and coroutines,
and model concurrency from that. I
believe this unwinds the brains and
clarifies things a lot.

I will provide sone classes for that at
the pycon.de sprint, unless somebody
implements it earlier (please don't).

This email was written in non-linear
order, so please ignore logic inversions.

Cheers - chris

Sent from my Ei4Steve

> <p3time.py>

Christian Tismer

unread,

Oct 19, 2012, 6:45:15 PM10/19/12

to Guido van Rossum, python...@python.org

s/twi/two/

Sent from my Ei4Steve

Greg Ewing

unread,

Oct 19, 2012, 7:02:17 PM10/19/12

to python...@python.org

Christian Tismer wrote:

> Can you give me a hint how your initial implementation
> works, the initial patch source?

You can find my initial patches here:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yield_from.html

Essentially, an extra field f_yieldfrom is added to frame
objects. When a 'yield from' is started, the f_yieldfrom field
of the calling frame is set to point to the called frame.

The __next__ method of a generator first traverses the
f_yieldfrom chain to find the frame at the end, and then
resumes that frame. So most of the time, only the innermost
frame of a nested yield-from chain is actually entered in
response to a next() call.

(There are some complications due to the fact that you can
'yield from' something that's not a generator, but the above
is effectively what happens when all the objects in the
chain are generators.)

--
Greg

Greg Ewing

unread,

Oct 19, 2012, 7:50:20 PM10/19/12

to python...@python.org

Nick Coghlan wrote:
> my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume with confusing debuggers).

The f_yieldfrom chain effectively *is* a generator stack, it's
just linked in the opposite direction to the way stacks normally
are. While you probably could move f_yieldfrom out of the frame
object and into the generator-iterator object, I don't see how
it would make any difference to the traceback issue.

I'm not even sure why my original implementation was getting
tracebacks wrong. What *should* happen is that if an exception
comes out of a generator being yielded from, the tail is
chopped off the f_yieldfrom chain and the exception is thrown
into the next frame up, thereby adding its frame to the
traceback.

It may simply be that there was a minor bug in my implementation
that could be fixed without ditching the whole f_yieldfrom
idea. I may look into this if I find time.

--
Greg

Greg Ewing

unread,

Oct 19, 2012, 8:50:28 PM10/19/12

to python...@python.org

Christian Tismer wrote:
> Actually I would like to have a python context where it gets into
> "async mode" and interprets all functions defined in that mode as
> generators.

That sounds somewhat similar to another idea I proposed a while
ago:

There would be a special kind of function called a "cofunction",
that you define using "codef" instead of "def". A cofunction
is essentially a generator, but with a special property: when
one cofunction calls another, the call is implicitly made as
a "yield from" call.

This scheme wouldn't be completely transparent, since the
cofunctions have to be defined in a special way. But the calls
would look like ordinary calls.

There's a PEP describing a variation on the idea here:

http://www.python.org/dev/peps/pep-3152/

In that version, calls to cofunctions are specially marked
using a "cocall" keyword. But since writing that, I've come to
believe that my original idea (where the cocalls are implicit)
was better.

--
Greg

Christian Tismer

unread,

Oct 19, 2012, 9:17:02 PM10/19/12

to Guido van Rossum, python...@python.org

Errhm....,

What you are comparing seems to have a constant factor of about 2.5.

minimax:py3 tismer$ python3 p3time.py
plain 0 1 0.00000
coro. 0 1 0.00001
relat 0 1 8.50000
plain 1 3 0.00000
coro. 1 3 0.00001
relat 1 3 2.77778
plain 2 7 0.00000
coro. 2 7 0.00001
relat 2 7 3.62500
plain 3 15 0.00000
coro. 3 15 0.00001
relat 3 15 2.87500
plain 4 31 0.00001
coro. 4 31 0.00002
relat 4 31 2.42424
plain 5 63 0.00002
coro. 5 63 0.00004
relat 5 63 2.46032
plain 6 127 0.00003
coro. 6 127 0.00007
relat 6 127 2.52542
plain 7 255 0.00006
coro. 7 255 0.00014
relat 7 255 2.38272
plain 8 511 0.00011
coro. 8 511 0.00028
relat 8 511 2.49356
plain 9 1023 0.00022
coro. 9 1023 0.00055
relat 9 1023 2.50327
plain 10 2047 0.00042
coro. 10 2047 0.00106
relat 10 2047 2.50956
plain 11 4095 0.00083
coro. 11 4095 0.00204
relat 11 4095 2.44699
plain 12 8191 0.00167
coro. 12 8191 0.00441
relat 12 8191 2.64792
plain 13 16383 0.00340
coro. 13 16383 0.00855
relat 13 16383 2.51881
plain 14 32767 0.00876
coro. 14 32767 0.01823
relat 14 32767 2.08106
plain 15 65535 0.01419
coro. 15 65535 0.03507
relat 15 65535 2.47131
plain 16 131071 0.02669
coro. 16 131071 0.06874
relat 16 131071 2.57515
plain 17 262143 0.05448
coro. 17 262143 0.13699
relat 17 262143 2.51467
plain 18 524287 0.10843
coro. 18 524287 0.27395
relat 18 524287 2.52660
plain 19 1048575 0.21310
coro. 19 1048575 0.54573
relat 19 1048575 2.56095
plain 20 2097151 0.42802
coro. 20 2097151 1.06199
relat 20 2097151 2.48114
plain 21 4194303 0.86531
coro. 21 4194303 2.19048
relat 21 4194303 2.53143

ciao - chris

p3time.py

Terry Reedy

unread,

Oct 19, 2012, 9:55:22 PM10/19/12

to python...@python.org

On 10/19/2012 5:31 PM, Guido van Rossum wrote:

> I did a basic timing test using a simple recursive function and a
> recursive PEP-380 coroutine computing the same value (see attachment).
> The coroutine version is a little over twice as slow as the function
> version. I find that acceptable. This went 20 deep, making 2 recursive
> calls at each level (except at the deepest level).
>
> Output on my MacBook Pro:
>
> plain 2097151 0.5880069732666016
> coro. 2097151 1.2958409786224365
>
> This was a Python 3.3 built a few days ago from the 3.3 branch.

At the top level, the coroutine version adds 2097151 next() calls.
Suspecting that that, not the addition of 'yield from' was responsible
for most of the extra time, I added

def trivial():
for i in range(2097151):
yield
raise StopIteration(2097151)
...
t0 = time.time()
try:
g = trivial()
while True:
next(g)
except StopIteration as err:
k = err.value
t1 = time.time()
print('triv.', k, t1-t0)

The result supports the hypothesis.

plain 2097151 0.4590260982513428
coro. 2097151 0.9180529117584229
triv. 2097151 0.39902305603027344

I don't know what to make of this in the context of asynch operations,
but in 'traditional' use, the generator would not replace a function
returning a single number but one returning a list (of, in this case,
2097151 numbers), so each next replaces a .append method call.

--
Terry Jan Reedy

Nick Coghlan

unread,

Oct 19, 2012, 9:56:45 PM10/19/12

to Greg Ewing, python...@python.org

On Sat, Oct 20, 2012 at 10:50 AM, Greg Ewing
<greg....@canterbury.ac.nz> wrote:
> Christian Tismer wrote:
>>
>> Actually I would like to have a python context where it gets into
>> "async mode" and interprets all functions defined in that mode as
>> generators.
>
>
> That sounds somewhat similar to another idea I proposed a while
> ago:
>
> There would be a special kind of function called a "cofunction",
> that you define using "codef" instead of "def". A cofunction
> is essentially a generator, but with a special property: when
> one cofunction calls another, the call is implicitly made as
> a "yield from" call.
>
> This scheme wouldn't be completely transparent, since the
> cofunctions have to be defined in a special way. But the calls
> would look like ordinary calls.

Please don't lose sight of the fact that yield-based suspension points
looking like something other than an ordinary function call is a
*feature*, not a bug.

The idea is that the flow control, especially the fact that "other
code may run here, so the world may have changed before we get to the
next expression", is visible *locally* in each function, rather than
relying on global knowledge of which calls may lead to a task switch.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Steve Dower

unread,

Oct 19, 2012, 10:41:52 PM10/19/12

to python...@python.org

I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.

I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented.

I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".

The write-up is here: http://stevedower.id.au/blog/async-api-for-python/

I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!)

And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.

Let the tearing apart of my approach commence! :)

Cheers,
Steve

Greg Ewing

unread,

Oct 20, 2012, 1:37:54 AM10/20/12

to python...@python.org

Nick Coghlan wrote:

> Please don't lose sight of the fact that yield-based suspension points
> looking like something other than an ordinary function call is a
> *feature*, not a bug.

People keep asserting that, but I don't think we have enough
experience with the explicit-switching-point-markers-all-the-
way-up style of coding to tell whether it's a good idea or not.

My gut feeling is that the explicit markers will help at the
lowest levels, where you're trying to protect a critical section,
but at the upper levels they will just be noise that causes
unnecessary worry.

In one of Guido's earlier posts (which I can't find now,
unfortunately), he said something that made it sound like
he was coming around to that point of view too, but he
seems to have gone back on that recently.

--
Greg

Christian Tismer

unread,

Oct 20, 2012, 7:52:19 AM10/20/12

to Greg Ewing, python...@python.org

Hi Greg,

On 20.10.12 02:50, Greg Ewing wrote:
> Christian Tismer wrote:
>> Actually I would like to have a python context where it gets into
>> "async mode" and interprets all functions defined in that mode as
>> generators.
>
> That sounds somewhat similar to another idea I proposed a while
> ago:
>
> There would be a special kind of function called a "cofunction",
> that you define using "codef" instead of "def". A cofunction
> is essentially a generator, but with a special property: when
> one cofunction calls another, the call is implicitly made as
> a "yield from" call.
>

Whow, I had a look at the patch. Without talking about the syntax,
this is very close to what I'm trying without a patch.
No, it is almost identical.

> This scheme wouldn't be completely transparent, since the
> cofunctions have to be defined in a special way. But the calls
> would look like ordinary calls.
>
> There's a PEP describing a variation on the idea here:
>
> http://www.python.org/dev/peps/pep-3152/
>
> In that version, calls to cofunctions are specially marked
> using a "cocall" keyword. But since writing that, I've come to
> believe that my original idea (where the cocalls are implicit)
> was better.

Yes, without the keyword it looks better. Would you raise an
exception if something is called that is not a cofunction? Or
would that be an ordinary call?

The only difference is that I'm not aiming at coroutines in
the first place, but just having the concept of a *suspendable*
function.

What has happened to the PEP, was it rejected?

ciao - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Nick Coghlan

unread,

Oct 20, 2012, 8:16:46 AM10/20/12

to Christian Tismer, python...@python.org

On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tis...@stackless.com> wrote:
> What has happened to the PEP, was it rejected?

No, it's still open. We just wanted to give the yield from PEP a
chance to see some use on its own before we started trying to take it
any further, and Greg was amenable to that approach.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Christian Tismer

unread,

Oct 20, 2012, 8:54:00 AM10/20/12

to Greg Ewing, python...@python.org

I just saw that it is in flux and did not please you as well.

A rough idea would be to start the whole interpreter in suspendable
mode. Maybe that's too much. I'm seeking a way to tell a whole bunch
of functions that they should be suspendable.
What if we had a flag (unclear how) that function calls should behave
like cofunctions now. That flag would be initiated by a root call
and then propagated to the callees, without any syntax change?

In any case it should be possible to inquire/assert that a function
is running as cofunction.

Guido van Rossum

unread,

Oct 20, 2012, 12:30:16 PM10/20/12

to Greg Ewing, python...@python.org

On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing
<greg....@canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>
>> Please don't lose sight of the fact that yield-based suspension points
>> looking like something other than an ordinary function call is a
>> *feature*, not a bug.

(Ironically, Christian just revived an old thread where Nick was of a
different opinion.)

> People keep asserting that, but I don't think we have enough
> experience with the explicit-switching-point-markers-all-the-
> way-up style of coding to tell whether it's a good idea or not.

Hm. I would say I have a lot of real world experience with this style:
App Engine's NDB. It's in use by 1000s of people. I've written a lot
of complicated database code in it (integrating several layers of
caching). This style really does hold up well.

Now, I think that if it could yield from, NDB would be even better,
but for most code it would be a very minimal change, and the issue
here (the requirement to mark suspension points) is the same.

In C# they also have a lot of experience with this style -- functions
declared as async must be waited for using await, and the type checker
enforces that it's await all the way up (I think a function using
await must be declared as async, too).

> My gut feeling is that the explicit markers will help at the
> lowest levels, where you're trying to protect a critical section,
> but at the upper levels they will just be noise that causes
> unnecessary worry.

Actually, an earlier experience (like 25 years earier :-) suggests
that it's at the higher levels where you get in trouble without the
markers -- because you still have critical sections in end user code,
but it's impossible to remember which functions you call may cause a
task switch.

> In one of Guido's earlier posts (which I can't find now,
> unfortunately), he said something that made it sound like
> he was coming around to that point of view too, but he
> seems to have gone back on that recently.

I was probably more waxing philosophically on the reasons why people
like greenlets/gevent (if they like it). I feel I am pretty
consistently in favor of marking switch points, at least in the
context we are currently discussing (where high-speed async event
handling is the thing to do).

For less-performant situations I'm fine with writing classic
synchronous-looking code, and running it in multiple OS threads for
concurrency reasons. But the purpose of designing a new async API is
to break the barrier of one thread per connection.

--
--Guido van Rossum (python.org/~guido)

Christian Tismer

unread,

Oct 20, 2012, 1:48:33 PM10/20/12

to Guido van Rossum, python...@python.org

It is of course a bit confusing to find out who thought what and when ;-)

And yes, I see your point, but I have difficulties to see how it is done
best. If I take Stackless as an example, well, there would everything
potentially be marked as some codef, because it is simply everywhere
enabled.

But just for the fact that something _supports_ suspension or switching
is IMHO not enough reason to clutter the code everywhere.
What I think is need is a way to distinguish critical code paths.
Not sure how this should be.

Maybe it's my limited understanding.
The generator-based functions do not get switched from alone.
If they want to do that, they call some special function, and I would
mark them for doing that.
But all the tree up? I can't see the benefit so much.
Maybe it would be less verbose to have decorators that assert something
does _not_ switch, like guards? Or maybe add properties?

I agree that one needs certain information about the program that
can easily be extracted. Perhaps this could be done with an analysing
tool. This tool would only need additional hints if things are very dynamic,
like variables holding certain constructs which are known at runtime
only.

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Guido van Rossum

unread,

Oct 20, 2012, 2:12:37 PM10/20/12

to Steve Dower, python...@python.org

On Fri, Oct 19, 2012 at 7:41 PM, Steve Dower <Steve...@microsoft.com> wrote:
> I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.
>
> I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented.
>
> I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".
>
> The write-up is here: http://stevedower.id.au/blog/async-api-for-python/
>
> I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!)
>
> And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.
>
> Let the tearing apart of my approach commence! :)

Couple of questions and comments.

- You mention a query interface a few times but there are no details
in your example code; can you elaborate? (Or was that a typo for
queue?)

- This is almost completely isomorphic with NDB's tasklets, except
that you borrow the Future class implementation from
concurrent.futures -- I think that's the wrong building block to start
with, because it is linked too closely to threads.

- There is a big speed difference between yield from <generator> and
yield <future>. With yield <future>, the scheduler has to do
significant work for each yield at an intermediate level, whereas with
yield from, the schedule is only involved when actual blocking needs
to be performed. In my experience, real code has lots of intermediate
levels. Therefore I would like to use yield from. You can already do
most things with yield from that you can do with Futures; there are a
few operations that need a helper (in particular spawning truly
concurrent tasks), but the helper code can be much simpler than the
Future object, and isn't needed as often, so it's still a bare win.

- Nit: I don't like calling the event loop context; there are too many
things called context (e.g. context managers in Python), so I prefer
to call it what it is -- event loop or I/O loop.

- Nittier nit: you have a few stray colons, e.g. "it = iter(fn(*args,
**kwargs)):" .

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 20, 2012, 2:29:09 PM10/20/12

to Christian Tismer, python...@python.org

Maybe it would help if I was more straightforward.

I do not want to solve this problem by introducing yet another
language mechanism. This rules out codef, greenlets, and Stackless.

I want to solve it using what we have in Python 3.3. And I want to
solve it for all platforms where Python 3.3 runs and for all Python
3.3 implementations (assuming Jython, IronPython and PyPy will
eventually get there).

Basically this leaves as options OS threads, callbacks (+ Deferred),
or yield [from].

Using OS threads the problem is solved without writing any code, but
does not scale, so it does not really solve the problem.

To scale we need either callbacks or yield [from], or both.

I accept that some people prefer to use callbacks and Deferred. I want
to respect this choice and I want to see integration with this style
at the event loop level.

But for myself, I know that I want to write *most* code without
callbacks (or Deferreds), and I am quite comfortable to use yield or
yield from instead. (I have written a lot of code using yield
<future>, and I am now practicing yield from <generator> -- the
transition is quite easy and I like what I see.)

If you are not happy with what we can do in (portable) Python 3.3, we
are not talking about solving the same problem.

If you are happy using OS threads, we are also not talking about
solving the same problem. (To be sure, there is a place for them in my
solution -- but it is only needed for things we cannot run
asynchronously, such as socket.getaddrinfo().)

If you are not happy using callbacks/Deferred nor using yield[from],
you're welcome to use greenlets or Stackless. But they will not be in
the standard library.

--
--Guido van Rossum (python.org/~guido)

Jasper St. Pierre

unread,

Oct 20, 2012, 3:25:44 PM10/20/12

to Guido van Rossum, python...@python.org

I'm curious now... you keep mentioning Futures and Deferreds like
they're two separate entities. What distinction between the two do you
see?

--
Jasper

Steve Dower

unread,

Oct 20, 2012, 3:31:12 PM10/20/12

to Guido van Rossum, python...@python.org

> - Nit: I don't like calling the event loop context; there are too many
> things called context (e.g. context managers in Python), so I prefer
> to call it what it is -- event loop or I/O loop.

The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.

> - You mention a query interface a few times but there are no details
> in your example code; can you elaborate? (Or was that a typo for
> queue?)

I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type. The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)

> - This is almost completely isomorphic with NDB's tasklets, except
> that you borrow the Future class implementation from
> concurrent.futures -- I think that's the wrong building block to start
> with, because it is linked too closely to threads.

As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.

> - There is a big speed difference between yield from <generator> and
> yield <future>. With yield <future>, the scheduler has to do
> significant work for each yield at an intermediate level, whereas with
> yield from, the schedule is only involved when actual blocking needs
> to be performed. In my experience, real code has lots of intermediate
> levels. Therefore I would like to use yield from. You can already do
> most things with yield from that you can do with Futures; there are a
> few operations that need a helper (in particular spawning truly
> concurrent tasks), but the helper code can be much simpler than the
> Future object, and isn't needed as often, so it's still a bare win.

I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task.

yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)

Christian Tismer

unread,

Oct 20, 2012, 4:06:26 PM10/20/12

to Nick Coghlan, python...@python.org

Thanks!
Then I have another short one...

Sent from my Ei4Steve

On Oct 20, 2012, at 14:16, Nick Coghlan <ncog...@gmail.com> wrote:

> On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tis...@stackless.com> wrote:
>> What has happened to the PEP, was it rejected?
>
> No, it's still open. We just wanted to give the yield from PEP a
> chance to see some use on its own before we started trying to take it
> any further, and Greg was amenable to that approach.

I often see the phrase "coroutine" but
without explicit mention if symmetric
(greenlet) or asymmetric (Lua).

Then when I hear myself quibbling
about "full generators" then I mean
generators that are made up of a few
functions that all can yield to the
caller of this generator.

Question: is "full generators" equivalent
to "asymmetric coroutine"?

Question 2: when people talk about
coroutines, is "asymmetric" the default
here?

The reason that I ask is my fear to
create problems when answering to
messages with different default meanings
in mind.

Thanks - chris

Guido van Rossum

unread,

Oct 20, 2012, 6:38:52 PM10/20/12

to Jasper St. Pierre, python...@python.org

On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre
<jstp...@mecheye.net> wrote:
> I'm curious now... you keep mentioning Futures and Deferreds like
> they're two separate entities. What distinction between the two do you
> see?

They have different interfaces and you end up using them differently.
In particular, quoting myself from another thread, here is how I use
the terms:

- Future: something with roughly the interface but not necessarily the
implementation of PEP 3148.

- Deferred: the Twisted Deferred class or something with very similar
functionality (there are some in the JavaScript world).

The big difference between Futures and Deferreds is that Deferreds can
easily be chains together to create multiple stages, and each callback
is called with the value returned from the previous stage; also,
Deferreds have separate callback chains for regular values and errors.

Guido van Rossum

unread,

Oct 20, 2012, 6:39:59 PM10/20/12

to Christian Tismer, python...@python.org

On Sat, Oct 20, 2012 at 1:06 PM, Christian Tismer <tis...@stackless.com> wrote:
> I often see the phrase "coroutine" but
> without explicit mention if symmetric
> (greenlet) or asymmetric (Lua).
>
> Then when I hear myself quibbling
> about "full generators" then I mean
> generators that are made up of a few
> functions that all can yield to the
> caller of this generator.
>
> Question: is "full generators" equivalent
> to "asymmetric coroutine"?
>
> Question 2: when people talk about
> coroutines, is "asymmetric" the default
> here?
>
> The reason that I ask is my fear to
> create problems when answering to
> messages with different default meanings
> in mind.

I believe I just answered this is another thread (that you also started).

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 20, 2012, 7:11:15 PM10/20/12

to Steve Dower, python...@python.org

On Sat, Oct 20, 2012 at 12:31 PM, Steve Dower <Steve...@microsoft.com> wrote:
>> - Nit: I don't like calling the event loop context; there are too many
>> things called context (e.g. context managers in Python), so I prefer
>> to call it what it is -- event loop or I/O loop.
>
> The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.

I'm happy to settle on event loop. (Terminology in this area seems
fraught with conflicting conventions; Twisted calls it a reactor,
after the reactor pattern, but I've been chided by others for using
this term without explanation; Tornado calls it I/O loop.)

>> - You mention a query interface a few times but there are no details
>> in your example code; can you elaborate? (Or was that a typo for
>> queue?)
>
> I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type.

Then what is the use? What *is* its contract?

> The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)

I'm all for declaring select() an implementation detail. It doesn't
scale on any platform; on Windows it only works for sockets; the
properly scaling alternative varies per platform. (It is IOCP on
Windows, right?)

This *probably* also means that the concept of file descriptor is out
the window (even though Tornado apparently cannot do anything without
it -- it's probably not used on Windows at all). And I suspect that it
means that the implementation of the socket abstraction will vary per
platform. The collection of other implementations of the same
abstraction available, and even available other abstractions, will
also vary per platform -- on Unix, there are pseudo ttys, pipes, named
pipes, and unix domain sockets; I don't recall the set available on
Windows, but I'm sure it is different. Then there is SSL/TLS, which
feels like it requires special handling but in the end implements an
abstraction similar to sockets.

I assume that in many cases it is easy to bridge from the various
platform-specific abstractions and implementation to more
cross-platform abstractions; this is where the notions of transports
and protocols seem most important. I haven't explored those enough,
sadly.

One note inspired by my mention of SSL, but also by discussions about
GUI event loops in other threads: it is easy to think that everything
is reducible to a file descriptor, but often it is not that easy. E.g.
with something like SSL, you can't just select on the underlying
socket, and then when it's ready call the read() method of the SSL
layer -- it's possible that the read() will still block because the
socket didn't have enough bytes to be able to decrypt the next block
of data. Similar for sockets associated with e.g. GUI event management
(e.g. X).

>> - This is almost completely isomorphic with NDB's tasklets, except
>> that you borrow the Future class implementation from
>> concurrent.futures -- I think that's the wrong building block to start
>> with, because it is linked too closely to threads.
>
> As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.

I wish it was true. But the Future class contains a condition
variable, and the Waiter class used by the implementation uses an
event. Both are directly imported from the threading module, and if
you block on either of these, it is a hard block (not even
interruptable by a signal).

Don't worry too much about this -- it's just the particular
implementation (concurrent.futures.Future). We can define a better
Future class for our purposes elsewhere, with the same interface (or a
subset -- I don't care much for the whole cancellation feature) but
without references to threading. For those Futures, we'll have to
decide what should happen if you call result() when the Future isn't
done yet -- raise an error (similar to EWOULDBLOCK), or somehow block,
possibly running a recursive event loop? (That's what NDB does, but
not everybody likes it.)

I think the normal approach would be to ask the scheduler to suspend
the current task until the Future is ready -- it can easily arrange
for that by adding a callback. In NDB this is spelled "yield
<future>". In the yield-from <generator> world we could spell it that
way too (i.e. yield, not yield from), or we could make it so that we
can write yield from <future>, or perhaps we need a helper call: yield
from wait(<future>) or maybe a method on the Future class (since it is
our own), yield from <future>.wait(). These are API design details.

(I also have a need to block for the Futures returned by
ThreadPoolExecutor and ProcessPoolExecutor -- those are handy when you
really can't run something inline in the event loop -- the simplest
example being getaddrinfo(), which may block for DNS.)

>> - There is a big speed difference between yield from <generator> and
>> yield <future>. With yield <future>, the scheduler has to do
>> significant work for each yield at an intermediate level, whereas with
>> yield from, the schedule is only involved when actual blocking needs
>> to be performed. In my experience, real code has lots of intermediate
>> levels. Therefore I would like to use yield from. You can already do
>> most things with yield from that you can do with Futures; there are a
>> few operations that need a helper (in particular spawning truly
>> concurrent tasks), but the helper code can be much simpler than the
>> Future object, and isn't needed as often, so it's still a bare win.
>
> I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created.

IIUC every yield must pass a Future, and every time that happens the
scheduler gets it and must arrange for a callback on that Future which
resumes the generator. I have code like that in NDB and you have very
similar code like that in your version (wrapper in @async, and later
_Awaiter._step()).

> The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task.

Interesting that you always run the first step immediately. I don't do
this in NDB. Can you explain why you think you need it? (It may simply
be an optimization I've overlooked. :-)

> yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)

Yeah, this is what I do in NDB, as I mentioned above (the recursive
event loop call).

But I suspect it would be very easy to write a helper function that
you give a generator and which runs it to completion. It would also
have to invoke the event loop, but that seems unavoidable, and
otherwise the event loop isn't running in interactive mode, right?
(Unless it runs in a separate thread, in which case the helper
function should just communicate with that thread.)

Final remark: I keep wondering if it's better to try and stay "pure"
in the public API and use only yield from, plus some helpers like
spawn(), join() and par(), or if a decent, pragmatic public API can
offer a combination. I worry that most users will have a hard time
remembering when to use yield and when yield from.

--
--Guido van Rossum (python.org/~guido)

Greg Ewing

unread,

Oct 20, 2012, 7:52:40 PM10/20/12

to python...@python.org

Christian Tismer wrote:
> Would you raise an
> exception if something is called that is not a cofunction? Or
> would that be an ordinary call?

A cofunction calling a non-cofunction is fine, it just makes
an ordinary call.

But if a non-cofunction tries to call a cofunction using an
ordinary call, an exception raised. Effectively, cofunctions
do *not* implement __call__ (instead they implement a new
protocol __cocall__).

> The only difference is that I'm not aiming at coroutines in
> the first place, but just having the concept of a *suspendable*
> function.

I'm not sure what the distinction is.

> What has happened to the PEP, was it rejected?

No, its status is still listed as "draft". It's probably too
soon to consider whether it should be accepted or rejected;
we need more experience with yield-from based task systems
first.

--
Greg

Greg Ewing

unread,

Oct 20, 2012, 8:09:04 PM10/20/12

to python...@python.org

Christian Tismer wrote:

> A rough idea would be to start the whole interpreter in suspendable
> mode. Maybe that's too much. I'm seeking a way to tell a whole bunch
> of functions that they should be suspendable.

I'm not sure it's really feasible to do that. It seems easy
enough at first sight, but keep in mind that it would only work
for pure Python code called directly from other pure Python
code. There are many corners where it wouldn't work -- all the
operator methods, for example, and anything else called through
a type slot -- unless you went to a *lot* of work to provide
alternative suspendable versions of all the type slots.

--
Greg

Nick Coghlan

unread,

Oct 20, 2012, 10:18:31 PM10/20/12

to Guido van Rossum, python...@python.org

On Sun, Oct 21, 2012 at 2:30 AM, Guido van Rossum <gu...@python.org> wrote:
> On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing
> <greg....@canterbury.ac.nz> wrote:
>> Nick Coghlan wrote:
>>
>>> Please don't lose sight of the fact that yield-based suspension points
>>> looking like something other than an ordinary function call is a
>>> *feature*, not a bug.
>
> (Ironically, Christian just revived an old thread where Nick was of a
> different opinion.)

I like greenlets too, just for the ease of converting the scaling
constraints of existing concurrent code from
number-of-threads-per-process to number-of-open-sockets-per-process.

I've come to the conclusion that they're no substitute for explicitly
asynchronous code, though, and the assembler magic needed to make them
work with arbitrary C code (either in the language core or in C
extensions) makes them a poor fit for the standard library.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Christian Tismer

unread,

Oct 21, 2012, 12:40:51 AM10/21/12

to Greg Ewing, python...@python.org

On 21.10.12 01:52, Greg Ewing wrote:
> Christian Tismer wrote:

> ...

>> The only difference is that I'm not aiming at coroutines in
>> the first place, but just having the concept of a *suspendable*
>> function.
>
> I'm not sure what the distinction is.

This comes maybe from my use of 'coroutine', 'tasklet', 'generator'
etc. which differs from the meaning where others are thinking of.
I'm mostly talking in the PyPy and Stackless community, which
creates confusion.
In that world, 'generator' for instance means a whole bunch of
functions that can play together and yield to the caller of _the_
generator. The same holds for coroutines in that world.

In python-world, things seem to be more often made of single functions.
Switching context to that:

'coroutine' implies to think about coroutines, the intended action.
'suspendable' instead is neutral without any a-priori intent to
switch or something. It just tells the ability that it can be suspended.
That sounds more like a property.

The 'suspendable' is meant as a building block for higher-level
things, which include for instance coroutines (in any flavor).

Technically the same, when we're talking about one single function
that implements it.

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Greg Ewing

unread,

Oct 21, 2012, 1:19:46 AM10/21/12

to python...@python.org

Guido van Rossum wrote:
> In the yield-from <generator> world we could spell it that
> way too (i.e. yield, not yield from), or we could make it so that we
> can write yield from <future>, or perhaps we need a helper call: yield
> from wait(<future>) or maybe a method on the Future class (since it is
> our own), yield from <future>.wait().

This will depend to some extent on whether Futures are considered
part of the tasks layer or part of the callbacks layer. If they're
considered part of the callbacks layer, they shouldn't have any
methods that must be called with yield-from.

> Final remark: I keep wondering if it's better to try and stay "pure"
> in the public API and use only yield from, plus some helpers like
> spawn(), join() and par(), or if a decent, pragmatic public API can
> offer a combination. I worry that most users will have a hard time
> remembering when to use yield and when yield from.

As I've said, I think it would be better to have only 'yield from'
calls in the public API, because it gives the implementation the
greatest freedom.

--
Greg

Steve Dower

unread,

Oct 21, 2012, 12:47:04 PM10/21/12

to Greg Ewing, python...@python.org

Greg Ewing wrote:
> This will depend to some extent on whether Futures are considered
> part of the tasks layer or part of the callbacks layer. If they're
> considered part of the callbacks layer, they shouldn't have any
> methods that must be called with yield-from.

I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation). Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not.

> As I've said, I think it would be better to have only 'yield from'
> calls in the public API, because it gives the implementation the
> greatest freedom.

I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent:

@async
def task1():
yield do_something_async_returning_a_future()

@async
def task2():
yield task1()
yield task1()

@async
def task3():
yield task2()

task3().result()

And doing the same thing with yield from:

def task1():
yield do_something_async_returning_a_future()

def task2():
yield from task1()
yield from task1()

@async
def task3():
yield from task2()

task3().result()

This is also equivalent to this code:

@async
def task3():
yield do_something_async_returning_a_future()
yield do_something_async_returning_a_future()

task3().result()

And this:

def task():
f = Future()
do_something_async_returning_a_future().add_done_callback(
lambda _: do_something_async_returning_a_future().add_done_callback(
lambda _: f.set_result(None)
)
)
return f

My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.)

Cheers,
Steve

Guido van Rossum

unread,

Oct 21, 2012, 1:08:42 PM10/21/12

to Steve Dower, Python-Ideas

On Oct 21, 2012 9:48 AM, "Steve Dower" <Steve...@microsoft.com> wrote:
>
> Greg Ewing wrote:
> > This will depend to some extent on whether Futures are considered
> > part of the tasks layer or part of the callbacks layer. If they're
> > considered part of the callbacks layer, they shouldn't have any
> > methods that must be called with yield-from.
>
> I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation).

Did you check the source? That's simply incorrect. It uses locks, of the threading variety.

( However one could write an implementation with the same interface that doesn't.)

> Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not.

This I like.

Hm. I think it'll be confusing. And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.

--Guido van Rossum (sent from Android phone)

Steve Dower

unread,

Oct 21, 2012, 4:07:49 PM10/21/12

to Guido van Rossum, Python-Ideas

> Did you check the source? That's simply incorrect. It uses locks, of the threading variety.

Yes, I've spent a lot of time in the source for Future while working on this. It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.

> ( However one could write an implementation with the same interface that doesn't.)

And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted. (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.)

(*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)

> Hm. I think it'll be confusing.

I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.

> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.

Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)

Guido van Rossum

unread,

Oct 21, 2012, 8:23:52 PM10/21/12

to Steve Dower, Python-Ideas

On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve...@microsoft.com> wrote:
>> Did you check the source? That's simply incorrect. It uses locks, of the threading variety.
>
> Yes, I've spent a lot of time in the source for Future while working on this.

Sorry, I should have realized this, since your code example contained
monkey-patching that Future class...

> It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.

I don't see it that way. Any time you acquire a lock, you may be
blocked for a long time. In a typical event loop that's an absolute
no-no. Typically, to wait for another thread, you give the other
thread a callback that adds a new event for *this* thread.

Now, it's possible that in Windows, when using IOCP, the philosophy is
different -- I think I've read in
http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
there can be multiple threads reading events from a single queue.

But AFAIK, in Twisted and Tornado and similar systems, and probably
even in gevent and Stackless, there is a strong culture around having
only a single thread handling events (at least only one thread at a
time), since the assumption is that as long as you don't suspend, you
can trust that the world doesn't change, and that assumption becomes
invalid when other threads may also be handling events from the same
queue. It's possible to design a world where different threads have
their own event queues, and this assumption would only be valid for
events belonging to the same queue; however that seems complicated.
And you still don't want to ever attempt to acquire a *threading*
lock, because you end up blocking the entire event loop.

>> ( However one could write an implementation with the same interface that doesn't.)
>
> And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted.

Yes, here I think we are in (possibly violent :-) agreement.

> (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.)

Perhaps. Lots of possibilities in this design space.

> (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)

Agreed. I don't see much use for the cancellation stuff and all the
extra complexity that adds to the interface. BTW, I think
concurrent.futures.Future doesn't stop you from calling set_result()
or set_exception() more than once, which I think is a mistake -- I do
enforce that in NDB's Futures.

[Here you snipped some context. You proposed having public APIs that
use "yield <future>" and leaving "yield from <generator>" as something
the user can use in her own program. To which I replied:]

>> Hm. I think it'll be confusing.
>
> I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.

Let's agree to disagree on this. I think they are both valid design
choices with different trade-offs. We should explore both directions
further so as to form a better opinion.

>> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
>
> Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)

Surely "whatever you like" is constrained by whatever the Task type
defines. Maybe it looks like a Future and has a blocking method to
wait for the result, like .result() on concurrent.futures.Future? If
you want that functionality for generators you just have to call some
function, passing it the generator as an argument. Remember, Python
doesn't consider that an inferior choice of API design compared to
making something a method of the object itself -- witness len(),
repr() and many others.

FWIW, if I may sound antagonistic, I actually think that we're mostly
in violent agreement, and I think we're getting closer to coming up
with a sensible set of requirements and possibly even an API proposal.
Keep it coming!

--
--Guido van Rossum (python.org/~guido)

Eric V. Smith

unread,

Oct 21, 2012, 9:18:35 PM10/21/12

to python...@python.org

On 10/21/2012 8:23 PM, Guido van Rossum wrote:
> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.
>
> Now, it's possible that in Windows, when using IOCP, the philosophy is
> different -- I think I've read in
> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
> there can be multiple threads reading events from a single queue.

Correct. The typical usage of an IOCP is that you create as many threads
as you have CPUs (or cores, or execution units, or whatever the kids
call them these days), then they can all wait on the same IOCP. So if
you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
work if the IOCP has work items for them.

--
Eric.

Guido van Rossum

unread,

Oct 22, 2012, 12:10:57 AM10/22/12

to Eric V. Smith, python...@python.org

On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <er...@trueblade.com> wrote:
> On 10/21/2012 8:23 PM, Guido van Rossum wrote:
>> I don't see it that way. Any time you acquire a lock, you may be
>> blocked for a long time. In a typical event loop that's an absolute
>> no-no. Typically, to wait for another thread, you give the other
>> thread a callback that adds a new event for *this* thread.
>>
>> Now, it's possible that in Windows, when using IOCP, the philosophy is
>> different -- I think I've read in
>> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
>> there can be multiple threads reading events from a single queue.
>
> Correct. The typical usage of an IOCP is that you create as many threads
> as you have CPUs (or cores, or execution units, or whatever the kids
> call them these days), then they can all wait on the same IOCP. So if
> you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
> work if the IOCP has work items for them.

So what's the typical way to do locking in such a system? Waiting for
a lock seems bad; and you can't assume that no other callbacks may run
while you are running. What synchronization primitives are typically
used?

--
--Guido van Rossum (python.org/~guido)

Steve Dower

unread,

Oct 22, 2012, 1:30:24 AM10/22/12

to Guido van Rossum, Python-Ideas

(Sorry about cutting context, I'll try not to do that again, but I also try to avoid reposting an entire email.)

> > It has synchronisation which is _aware_ of threads, but it never
> > creates, requires or uses them. It simply ensures thread-safe
> > reentrancy, which will be required for any general solution unless
> > it is completely banned from interacting across CPU threads.
>
> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.

Agreed, but when you're waiting for another thread to stop reading its queue so you can add to it, how are you supposed to queue an event while you wait?

The lock in Future is only an issue in result() where we wait for another thread to complete the event, but that is the entire point of that function. FWIW I don't see any scheduler ever calling result(), but there are valid situations for a user to call it (REPL, already on a worker thread, unit tests). Everywhere else the lock is required for thread safety. It could be a different lock to the one in result, but I don't think anything is gained from that. Rewriting Future in C and using CPU CAS primitives might be possible, but probably only of limited value.

> Now, it's possible that in Windows, when using IOCP, the philosophy is
> different -- I think I've read in
> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
> there can be multiple threads reading events from a single queue.
> But AFAIK, in Twisted and Tornado and similar systems, and probably
> even in gevent and Stackless, there is a strong culture around having
> only a single thread handling events (at least only one thread at a
> time), since the assumption is that as long as you don't suspend, you
> can trust that the world doesn't change, and that assumption becomes
> invalid when other threads may also be handling events from the same
> queue.

This is true, and my understanding is that IOCP is basically just a thread pool, and the 'single queue' means that all the threads are waiting on all the events and you can't guarantee which thread will get which. This is better than creating a new thread for each file, but I think that's all it is meant to be. We can easily write a single thread that can wait on all I/O, scheduling callbacks on the main thread, if necessary. I'm pretty sure that all platforms have better ways to do this though, but because they're all different it will need different implementations.

> It's possible to design a world where different threads have
> their own event queues, and this assumption would only be valid for
> events belonging to the same queue; however that seems complicated.
> And you still don't want to ever attempt to acquire a *threading*
> lock, because you end up blocking the entire event loop.

Multiple threads with independent queues should be okay, though definitely an advanced scenario. I'm sure this would be preferable to having multiple processes with one thread/queue each in some cases. In any case, this is easy enough to implement with TLS.

> > (*I'm inclined to define [the required Future interface] as 'result()', 'done()',

> > 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()'
> > functions. Maybe more, but I think that's sufficient. The current '_waiters'
> > list is an optimisation for add_done_callback(), and doesn't need to be part
> > of the interface.)
>
> Agreed. I don't see much use for the cancellation stuff and all the
> extra complexity that adds to the interface. BTW, I think
> concurrent.futures.Future doesn't stop you from calling set_result()
> or set_exception() more than once, which I think is a mistake -- I do
> enforce that in NDB's Futures.

I agree, there should be no way to set the result or exception more than once.

On cancellation, while there is some complexity involved I do think we can make use of 'cancel' and 'cancelled' functions to pass a signal back into the worker:

op = do_something_async() # not yielded
button.on_click += lambda: op.cancel()
try:
result = yield op
except CancelledError:
return False

def do_something_async():
f = Future()

def threadproc():
total = 0
for i in range(10000):
if f.cancelled(): raise CancelledError
total += i
f.set_result(total)

Thread(target=threadproc).run()
return f

I certainly would not want to see the CancelledError be raised automatically - this is no thread.abort() call - but it may be convenient to have an interface for "self._cancelled = True" and "return self._cancelled" that at least saves people from coming up with their own way of passing it in. The worker may completely ignore it, or complete anyway, but for long running operations it may be very handy.

(I'll stop before I start thinking about partial results... :) )

> [Here you snipped some context. You proposed having public APIs that
> use "yield <future>" and leaving "yield from <generator>" as something
> the user can use in her own program. To which I replied:]
>
> >> Hm. I think it'll be confusing.
> >
> > I think the basic case ("just make it work") will be simpler, and the advanced
> > case ("minimise memory/CPU usage") will be more complicated.
>
> Let's agree to disagree on this. I think they are both valid design
> choices with different trade-offs. We should explore both directions
> further so as to form a better opinion.

Probably we need some working implementations to code against.

> > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
> >
> > Personally, I'd prefer developers to get a correct solution without having to
> > understand how the whole thing works (the "pit of success"). I'm also sceptical
> > of any other rule being as portable and composable - I don't think a standard
> > library should have APIs where "you must only call this function with yield-from".
> > ('await' in C# is not compulsory - you can take the Task returned from an async
> > method and do whatever you like with it.)
>
> Surely "whatever you like" is constrained by whatever the Task type
> defines. Maybe it looks like a Future and has a blocking method to
> wait for the result, like .result() on concurrent.futures.Future? If
> you want that functionality for generators you just have to call some
> function, passing it the generator as an argument. Remember, Python
> doesn't consider that an inferior choice of API design compared to
> making something a method of the object itself -- witness len(),
> repr() and many others.

I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value.

(And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)

> FWIW, if I may sound antagonistic, I actually think that we're mostly
> in violent agreement, and I think we're getting closer to coming up
> with a sensible set of requirements and possibly even an API proposal.
> Keep it coming!

I do my best work when someone is arguing with me :)

Cheers,
Steve

Guido van Rossum

unread,

Oct 22, 2012, 10:59:56 AM10/22/12

to Steve Dower, Python-Ideas

On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve...@microsoft.com> wrote:
[Stuff about Futures and threads]

Personally, I'm interested in designing a system, including an event
loop, where you can rely on the properties of cooperative scheduling
to avoid ever touching (OS) threading locks. I think such a system
should be "pure" and all interaction with threads should be mediated
by the event loop. (It's okay if this means that the implementation of
the event loop must at some point acquire a threading lock.) The
Futures used by the tasks to coordinate amongst themselves should not
require locking -- they should themselves be able to rely on the
guarantees of the event loop not to invoke multiple callbacks in
parallel.

IIUC you can do this on Windows with IOCP too, simply by only having a
single thread reading events.

>> > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
>> >
>> > Personally, I'd prefer developers to get a correct solution without having to
>> > understand how the whole thing works (the "pit of success"). I'm also sceptical
>> > of any other rule being as portable and composable - I don't think a standard
>> > library should have APIs where "you must only call this function with yield-from".
>> > ('await' in C# is not compulsory - you can take the Task returned from an async
>> > method and do whatever you like with it.)
>>
>> Surely "whatever you like" is constrained by whatever the Task type
>> defines. Maybe it looks like a Future and has a blocking method to
>> wait for the result, like .result() on concurrent.futures.Future? If
>> you want that functionality for generators you just have to call some
>> function, passing it the generator as an argument. Remember, Python
>> doesn't consider that an inferior choice of API design compared to
>> making something a method of the object itself -- witness len(),
>> repr() and many others.
>
> I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value.

Sorry, probably just got distracted (I was reading on three different
devices while on a family outing :-).

But my answer is short: to me, the PEP 380 style is perfectly portable
and composable. If you think it isn't, please elaborate.

> (And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)

Same here.

--
--Guido van Rossum (python.org/~guido)

Steve Dower

unread,

Oct 22, 2012, 11:55:27 AM10/22/12

to Guido van Rossum, python...@python.org

> Personally, I'm interested in designing a system, including an event loop,
> where you can rely on the properties of cooperative scheduling to avoid
> ever touching (OS) threading locks. I think such a system should be "pure"
> and all interaction with threads should be mediated by the event loop.
> (It's okay if this means that the implementation of the event loop must at
> some point acquire a threading lock.) The Futures used by the tasks to
> coordinate amongst themselves should not require locking -- they should
> themselves be able to rely on the guarantees of the event loop not to
> invoke multiple callbacks in parallel.

Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.

(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)

> IIUC you can do this on Windows with IOCP too, simply by only having a
> single thread reading events.

Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.

[ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
> To me, the PEP 380 style is perfectly portable and composable. If you think

> it isn't, please elaborate.

I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.

By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Here, the user still gets the benefits like:

def not_an_async_func():
ops = list(map(get_url_async, list_of_urls))
# all URLs are now downloading in parallel, let's do some other synchronous stuff
results = list(map(Future.result, ops))

Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.

(And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).

Cheers,
Steve

Eric V. Smith

unread,

Oct 22, 2012, 7:21:07 AM10/22/12

to Guido van Rossum, python...@python.org

On 10/22/2012 12:10 AM, Guido van Rossum wrote:
> On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <er...@trueblade.com> wrote:
>> On 10/21/2012 8:23 PM, Guido van Rossum wrote:
>>> I don't see it that way. Any time you acquire a lock, you may be
>>> blocked for a long time. In a typical event loop that's an absolute
>>> no-no. Typically, to wait for another thread, you give the other
>>> thread a callback that adds a new event for *this* thread.
>>>
>>> Now, it's possible that in Windows, when using IOCP, the philosophy is
>>> different -- I think I've read in
>>> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that
>>> there can be multiple threads reading events from a single queue.
>>
>> Correct. The typical usage of an IOCP is that you create as many threads
>> as you have CPUs (or cores, or execution units, or whatever the kids
>> call them these days), then they can all wait on the same IOCP. So if
>> you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
>> work if the IOCP has work items for them.
>
> So what's the typical way to do locking in such a system? Waiting for
> a lock seems bad; and you can't assume that no other callbacks may run
> while you are running. What synchronization primitives are typically
> used?

When I've done it (admittedly 10 years ago) we just used critical
sections, since we weren't blocking for long (mostly memory management).
I'm not sure if that's a best practice or not. The IOCP will actually
let you block, then it will release another thread. So if you know
you're going to block, you should create more threads than you have CPUs.

Here's the relevant paragraph from the IOCP link you posted above:

"The system also allows a thread waiting in GetQueuedCompletionStatus to
process a completion packet if another running thread associated with
the same I/O completion port enters a wait state for other reasons, for
example the SuspendThread function. When the thread in the wait state
begins running again, there may be a brief period when the number of
active threads exceeds the concurrency value. However, the system
quickly reduces this number by not allowing any new active threads until
the number of active threads falls below the concurrency value. This is
one reason to have your application create more threads in its thread
pool than the concurrency value. Thread pool management is beyond the
scope of this topic, but a good rule of thumb is to have a minimum of
twice as many threads in the thread pool as there are processors on the
system. For additional information about thread pooling, see Thread Pools."

Jasper St. Pierre

unread,

Oct 22, 2012, 12:46:47 PM10/22/12

to Guido van Rossum, python...@python.org

On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum <gu...@python.org> wrote:
> On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre
> <jstp...@mecheye.net> wrote:
>> I'm curious now... you keep mentioning Futures and Deferreds like
>> they're two separate entities. What distinction between the two do you
>> see?
>
> They have different interfaces and you end up using them differently.

Who is "you" supposed to refer to?

> In particular, quoting myself from another thread, here is how I use
> the terms:
>
> - Future: something with roughly the interface but not necessarily the
> implementation of PEP 3148.
>
> - Deferred: the Twisted Deferred class or something with very similar
> functionality (there are some in the JavaScript world).
>
> The big difference between Futures and Deferreds is that Deferreds can
> easily be chains together to create multiple stages, and each callback
> is called with the value returned from the previous stage; also,
> Deferreds have separate callback chains for regular values and errors.

Chaining is an add-on to the system and not necessarily required.
Dojo's Deferreds, modelled directly after Twisted's, don't have direct
chaining with multiple callbacks per Deferred, but instead addCallback
returns a new Deferred, which it may pass on to. This means that each
Deferred has one result, and chaining is done slightly differently.

The whole point of chaining is just convenience of mutating a value
before it's passed to the caller. It's possible to live without it.
Compare:

from async_http_client import fetch_page
from some_xml_library import parse_xml

def fetch_xml(url):
d = fetch_page(url)
d.add_callback(parse_xml)
return d

with:

def fetch_xml(url):
def parse_page(result):
d.callback(parse_xml(result))

d = Deferred()
page = fetch_page(url)
page.add_callback(parse_page)
return d

The two functions, treated as a black box, are equivalent. The
distinction is convenience.

> --
> --Guido van Rossum (python.org/~guido)

--
Jasper

Guido van Rossum

unread,

Oct 22, 2012, 1:03:53 PM10/22/12

to Jasper St. Pierre, python...@python.org

On Mon, Oct 22, 2012 at 9:46 AM, Jasper St. Pierre

Jasper, I don't know you. You may be a wizard-levelTwisted user, or
maybe you once saw a Twisted tutorial. All I know is that when I
started this discussion I used the term Future thinking Deferreds were
just Futures, and then Twisted core developers started explaining me
that Deferreds are so much more than Futures (I think it may have been
Glyph himself, in one of his longer posts). So please go argue the
distinction or similarity with the Twisted core developers, not with
me.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 22, 2012, 1:34:38 PM10/22/12

to Steve Dower, python...@python.org

On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve...@microsoft.com> wrote:
>> Personally, I'm interested in designing a system, including an event loop,
>> where you can rely on the properties of cooperative scheduling to avoid
>> ever touching (OS) threading locks. I think such a system should be "pure"
>> and all interaction with threads should be mediated by the event loop.
>> (It's okay if this means that the implementation of the event loop must at
>> some point acquire a threading lock.) The Futures used by the tasks to
>> coordinate amongst themselves should not require locking -- they should
>> themselves be able to rely on the guarantees of the event loop not to
>> invoke multiple callbacks in parallel.
>
> Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

I'm curious what the Twisted folks have to say about this. Or the
folks using gevent.

I think your world view is colored by Windows; that's fine, we need
input from experienced Windows users. But I can certainly imagine
other ways of dealing with this.

For example, in CPython, at least, a callback that is called directly
by the OS cannot call straight into Python anyway -- you have to
acquire the GIL first. This pretty much means that an unconstrained
callback directly from the OS cannot call straight into Python -- it
has to put something into a queue, and the bytecode interpreter will
eventuall call it (possibly in another thread). This is how signal
handlers are invoked too.

> That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code

If interrupt-scheduled means what I think it means, this can only be C
code. For the Python callback, see above.

> queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.

I don't actually believe we should avoid all locks. I do believe that
there should be a separate mechanism, likely OS-specific, whereby the
"pure" async world and the "messy" threading world can hand off data
to each other. It is probably unavoidable that the implementation of
this mechanism touches a threading lock. But this does not mean that
the rest of the "pure" world should need to use a Future class that
touches threading locks.

> (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)

This sounds like a potentially fatal clash in terminology. In the way
I use 'async', Twisted, Tornado and gevent certainly qualify, and all
those have huge parts of their API where there is no non-deterministic
switching in sight -- in fact, they all carefully fence off the part
that does interact with threads. For example, the Twisted folks have
argued that one of the big advantages of using Twisted's Deferred
class is that while a callback is running, the state of the world
remains constant (except for actions made by the callback itself,
obviously).

What other term should we use to encompass this world view (which IMO
is a perfectly valid abstraction for a lot of I/O-related
concurrency)?

>> IIUC you can do this on Windows with IOCP too, simply by only having a
>> single thread reading events.
>
> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.

It does sound like this may be unique to Windows, or at least not
shared with most of the UNIX world (UNIX ports of IOCP
notwithstanding).

> [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
>> To me, the PEP 380 style is perfectly portable and composable. If you think
>> it isn't, please elaborate.
>
> I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.

And that is by design -- Greg *wants* it to be that way, and so far I
haven't found a reason to disagree with him. It seems you just
fundamentally disagree with the design, but your arguments come from a
fundamentally different world view.

> By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread).

Note that in the YF world, there are also ways to stop the yield to
bubble all the way to the top. You simply call the generator function,
which gives you a generator object, and the scheduler module or class
can offer a variety of APIs to do things with it -- e.g. run it
without waiting for it (yet), run several of these in parallel until
one of them (or all of them) completes, etc.

> Here, the user still gets the benefits like:
>
> def not_an_async_func():
> ops = list(map(get_url_async, list_of_urls))
> # all URLs are now downloading in parallel, let's do some other synchronous stuff
> results = list(map(Future.result, ops))

And in the YF world you can do that too.

> Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.

I don't see that. The user just has to be able to get a reference to
the schedule, which should be part of the scheduler's API (e.g. a
function in its module that returns the current scheduler instance).

> (And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

With YF it doesn't have to be the base of the stack. It just usually is.

I feel we are going around in circles.

> However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).

I don't know what OpenCV is, but assuming it is something that doesn't
know about YF, then it needs to run in a thread of its own (or a
threadpool). It is perfectly possible to add a primitive operation to
the YF scheduler that says "run this in a threadpool and wake me up
when it produces a result". The public API for that primitive can
certainly use YF itself -- the messing interface with threads can be
completely hidden from view. IMO YF scheduler worth using for real
work must provide such a primitive (it was one of the first things I
had to do in my own prototype, to be able to call
socket.getaddrinfo()).

--
--Guido van Rossum (python.org/~guido)

Steve Dower

unread,

Oct 22, 2012, 3:18:02 PM10/22/12

to Guido van Rossum, python...@python.org

>>> Personally, I'm interested in designing a system, including an event
>>> loop, where you can rely on the properties of cooperative scheduling
>>> to avoid ever touching (OS) threading locks. I think such a system should be "pure"
>>> and all interaction with threads should be mediated by the event loop.
>>> (It's okay if this means that the implementation of the event loop
>>> must at some point acquire a threading lock.) The Futures used by the
>>> tasks to coordinate amongst themselves should not require locking --
>>> they should themselves be able to rely on the guarantees of the event
>>> loop not to invoke multiple callbacks in parallel.
>>
>> Unfortunately, a "pure" system means that no async operation can ever have an OS
>> provided callback (or one that comes from outside the world of the scheduler). The purity
>> in this case becomes infectious and limits what operations can be continued from(/waited
>> on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for
>> that loop, whether by modifying a queue or setting a Future. This kind of system does not
>> help with callback-based I/O.
>
> I'm curious what the Twisted folks have to say about this. Or the folks using gevent.

So am I, but my guess would be that as long as you stay within their 'world' everything is fine (I haven't seen any Twisted code to make me believe otherwise, but happy to accept examples - I have no experience with it directly, though I believe I've used similar concepts before). This is fine for a library or framework, but I don't think it's appropriate for a standard library - maybe this is where our views differ?

> I think your world view is colored by Windows; that's fine, we need input from experienced
> Windows users. But I can certainly imagine other ways of dealing with this.

Coloured by threads is probably more accurate, but then again, throwing threads around wildly is definitely a Windows thing :). I also have a background in microcontrollers, including writing my own pre-emptive and cooperative schedulers that worked with external devices, so I'm trying to draw on that as much as my Windows experience.

> For example, in CPython, at least, a callback that is called directly by the OS cannot
> call straight into Python anyway -- you have to acquire the GIL first. This pretty much
> means that an unconstrained callback directly from the OS cannot call straight into Python
> -- it has to put something into a queue, and the bytecode interpreter will eventuall call
> it (possibly in another thread). This is how signal handlers are invoked too.

I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).

>> That's not to say that I want big heavy locks everywhere, but as soon
>> as you potentially have two interrupt-scheduled pieces of code
>
> If interrupt-scheduled means what I think it means, this can only be C code. For the
> Python callback, see above.

I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.

>> queuing to the same loop you need to synchronise access to the data structure. As soon
>> as you get the state and result of a future non-atomically, you need synchronization. I
>> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably
>> help, assuming it's all Python code), and the current implementation of Future is a bit on
>> the heavy side (but also suitable for much more arbitrary uses), but I really believe that
>> avoiding all locks is a bad idea.
>
> I don't actually believe we should avoid all locks. I do believe that there should be a
> separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy"
> threading world can hand off data to each other. It is probably unavoidable that the
> implementation of this mechanism touches a threading lock. But this does not mean that the
> rest of the "pure" world should need to use a Future class that touches threading locks.

We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.

>> (Also, I don't consider cooperative multitasking to be "async" - async
>> requires at least two simultaneous (or at least non-deterministically
>> switching) tasks, whether these are CPU threads or hardware-controlled
>> I/O.)
>
> This sounds like a potentially fatal clash in terminology. In the way I use 'async',
> Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API
> where there is no non-deterministic switching in sight -- in fact, they all carefully
> fence off the part that does interact with threads. For example, the Twisted folks have
> argued that one of the big advantages of using Twisted's Deferred class is that while a
> callback is running, the state of the world remains constant (except for actions made by
> the callback itself, obviously).
>
> What other term should we use to encompass this world view (which IMO is a perfectly valid
> abstraction for a lot of I/O-related concurrency)?

It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometime use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)

>>> IIUC you can do this on Windows with IOCP too, simply by only having
>>> a single thread reading events.
>>
>> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any
> more completions) you need to schedule it back to another thread. This requires
> synchronization.
>
> It does sound like this may be unique to Windows, or at least not shared with most of the
> UNIX world (UNIX ports of IOCP notwithstanding).

IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.

>> However, as you pointed out earlier, if all you are trying to achieve is "pure"
>> coroutines, then YF is perfectly appropriate. But this is because of the high level of
>> cooperation required between the involved tasklets. As I understand it, coroutines gain me
>> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is
>> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are
>> great for within a program, but they don't extend so well into libraries, and certainly
>> provide no compatibility with existing ones (whereas, at worst, I can always write "yield
>> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library
>> [except maybe a threading library... don't take that "any" too literally]).
>
> I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then
> it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add
> a primitive operation to the YF scheduler that says "run this in a threadpool and wake me
> up when it produces a result". The public API for that primitive can certainly use YF
> itself -- the messing interface with threads can be completely hidden from view. IMO YF
> scheduler worth using for real work must provide such a primitive (it was one of the first
> things I had to do in my own prototype, to be able to call socket.getaddrinfo()).

Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).

Cheers,
Steve

Guido van Rossum

unread,

Oct 22, 2012, 4:26:06 PM10/22/12

to Steve Dower, python...@python.org

On Mon, Oct 22, 2012 at 12:18 PM, Steve Dower <Steve...@microsoft.com> wrote:
[Quoting me]

>> For example, in CPython, at least, a callback that is called directly by the OS cannot
>> call straight into Python anyway -- you have to acquire the GIL first. This pretty much
>> means that an unconstrained callback directly from the OS cannot call straight into Python
>> -- it has to put something into a queue, and the bytecode interpreter will eventuall call
>> it (possibly in another thread). This is how signal handlers are invoked too.
>
> I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).

Ok, forget the GIL (though PyPy has one). Anyway, the existing
mechanism I was referring to does *not* guarantee that the callback
keeps the GIL as long as it runs. The GIL is used to emulate
preemptive scheduling while still protecting CPython's internal data
structures from concurrent access. It makes no guarantees for user
data. Even "x = d[key]" may release the GIL if the dict contains keys
whose __eq__ is implemented in Python.

But the crucial point of the mechanism is that you don't call straight
into Python from the OS-level callback (which is written in C or some
other low-level language). You arrange for the interpreter to call the
Python-level callback at some later time. So you might as well use
this to enforce single-threading, if that's the way of your world.

>>> That's not to say that I want big heavy locks everywhere, but as soon
>>> as you potentially have two interrupt-scheduled pieces of code
>>
>> If interrupt-scheduled means what I think it means, this can only be C code. For the
>> Python callback, see above.
>
> I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.

At least in CPython, it's not just the GIL. The queue I'm talking
about above must exist even in a CPython version that has no threading
support (and hence no GIL). You still cannot call into Python from a
signal handler or other callback called directly by the OS kernel. You
must delay it until the bytecode interpreter is at a good stopping
point. Check out this code:
http://hg.python.org/cpython/file/daad150b4670/Python/ceval.c#l496
(AddPendingCall and friends).

>>> queuing to the same loop you need to synchronise access to the data structure. As soon
>>> as you get the state and result of a future non-atomically, you need synchronization. I
>>> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably
>>> help, assuming it's all Python code), and the current implementation of Future is a bit on
>>> the heavy side (but also suitable for much more arbitrary uses), but I really believe that
>>> avoiding all locks is a bad idea.
>>
>> I don't actually believe we should avoid all locks. I do believe that there should be a
>> separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy"
>> threading world can hand off data to each other. It is probably unavoidable that the
>> implementation of this mechanism touches a threading lock. But this does not mean that the
>> rest of the "pure" world should need to use a Future class that touches threading locks.
>
> We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.

Yes, this sounds find. I note that the existing APIs already encourage
leaving the creation of the Future to library code -- you don't
construct a Future, typically, but call an executor's submit() method.

>>> (Also, I don't consider cooperative multitasking to be "async" - async
>>> requires at least two simultaneous (or at least non-deterministically
>>> switching) tasks, whether these are CPU threads or hardware-controlled
>>> I/O.)
>>
>> This sounds like a potentially fatal clash in terminology. In the way I use 'async',
>> Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API
>> where there is no non-deterministic switching in sight -- in fact, they all carefully
>> fence off the part that does interact with threads. For example, the Twisted folks have
>> argued that one of the big advantages of using Twisted's Deferred class is that while a
>> callback is running, the state of the world remains constant (except for actions made by
>> the callback itself, obviously).
>>
>> What other term should we use to encompass this world view (which IMO is a perfectly valid
>> abstraction for a lot of I/O-related concurrency)?
>
> It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometime use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)

Hm. I guess you are talking about the low-level (or should I say
OS-kernel-called) callback; most event frameworks for Python (except
perhaps gevent?) use user-level callback extensively -- in fact that's
where Twisted wants you to do all the work.

So, again a clash of terminology...

(Aside: please don't use 'continuation' for 'task'. The use of this
term in Scheme has forever tainted the word for me.)

>>>> IIUC you can do this on Windows with IOCP too, simply by only having
>>>> a single thread reading events.
>>>
>>> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any
>> more completions) you need to schedule it back to another thread. This requires
>> synchronization.
>>
>> It does sound like this may be unique to Windows, or at least not shared with most of the
>> UNIX world (UNIX ports of IOCP notwithstanding).
>
> IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.

So maybe IOCP is not all that relevant. Very early on in this
discussion, IOCP was brought up as an important example of a system
for async I/O that had a significantly *different* API than the
typical select/poll/etc.-based systems found on UNIX platforms. But
its relevance may well decompose into a few separable concerns:

- Don't assume everything is a file descriptor.

- On some systems, the natural way to do async I/O is *not* to wait
until the socket (or other event source) is ready, but to ask it to
perform a specific operation in "overlapping" (or async) mode, and you
will get an event back when it is done.

- Event queues are powerful.

- You cannot ignore threads everywhere.

>>> However, as you pointed out earlier, if all you are trying to achieve is "pure"
>>> coroutines, then YF is perfectly appropriate. But this is because of the high level of
>>> cooperation required between the involved tasklets. As I understand it, coroutines gain me
>>> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is
>>> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are
>>> great for within a program, but they don't extend so well into libraries, and certainly
>>> provide no compatibility with existing ones (whereas, at worst, I can always write "yield
>>> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library
>>> [except maybe a threading library... don't take that "any" too literally]).
>>
>> I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then
>> it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add
>> a primitive operation to the YF scheduler that says "run this in a threadpool and wake me
>> up when it produces a result". The public API for that primitive can certainly use YF
>> itself -- the messing interface with threads can be completely hidden from view. IMO YF
>> scheduler worth using for real work must provide such a primitive (it was one of the first
>> things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
>
> Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).

But @async must imported from *somewhere*, and that's where the
decisions are made on how the scheduler works. If you want to use a
different scheduler you still have to import a different @async.

(TBH I don't understand your with_options() thing. If that's how you
propose switching scheduler implementations, there's still a default
behavior that you'd have to change on a per-call basis.)

And about threads and futures: I am making a principled stance that
you shouldn't have to use threads, and you shouldn't have to use a
future implementation that's tied to threads. But maybe we should hear
from some Twisted folks...

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 22, 2012, 4:58:14 PM10/22/12

to Steve Dower, python...@python.org

Steve, I realize that continued point-by-point rebuttals probably are
getting pointless. Maybe your enthusiasm and energy would be better
spent trying to propose and implement (a prototype) of an API in the
style that you prefer? Maybe we can battle it out in code more
easily...

Steve Dower

unread,

Oct 22, 2012, 5:32:42 PM10/22/12

to Guido van Rossum, python...@python.org

Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.

Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.

Guido van Rossum

unread,

Oct 22, 2012, 5:41:55 PM10/22/12

to Steve Dower, python...@python.org

On Mon, Oct 22, 2012 at 2:32 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.
>
> Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.

I have a prototype implementing an async web client that fetches a
page given a URL. Primitives I have in mind include running several of
these concurrently and waiting for the first to come up with a result,
or waiting for all results, or getting the results as they are ready.
I have an event loop that can use select, poll, epoll, and kqueue
(though I've only lightly tested it, on Linux and OSX, so I'm sure
I've missed some corner cases and optimization possibilities). The
fetcher calls socket.getaddrinfo() in a threadpool.

Greg Ewing

unread,

Oct 22, 2012, 6:09:41 PM10/22/12

to python...@python.org

Guido van Rossum wrote:
> On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve...@microsoft.com> wrote:

>> Yes, but unless you run all subsequent code on the IOCP thread (thereby
>> blocking any more completions) you need to schedule it back to another thread.
>> This requires synchronization.

I think there's an assumption behind this whole async tasks discussion
that the tasks being scheduled are I/O bound. We're trying to overlap
CPU activity with I/O, and different I/O activities with each other.
We're *not* trying to achieve concurrency of CPU-bound tasks -- the
GIL prevents that anyway for pure Python code.

The whole Windows IOCP thing, on the other hand, seems to be geared
towards having a pool of threads, any of which can handle any I/O
operation. That's not helpful for us; when one of our tasks blocks
waiting for I/O, the completion of that I/O must wake up *that particular
task*, and it must be run using the same OS thread that was running
it before.

I gather that Windows provides a way of making an async I/O request
and specifying a callback for that request. If that's the case, do
we need to bother with an IOCP at all? Just have the callback wake
up the associated task directly.

--
Greg

Guido van Rossum

unread,

Oct 22, 2012, 6:30:46 PM10/22/12

to Greg Ewing, python...@python.org

On Mon, Oct 22, 2012 at 3:09 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> I think there's an assumption behind this whole async tasks discussion
> that the tasks being scheduled are I/O bound. We're trying to overlap
> CPU activity with I/O, and different I/O activities with each other.
> We're *not* trying to achieve concurrency of CPU-bound tasks -- the
> GIL prevents that anyway for pure Python code.

Right. Of course.

> The whole Windows IOCP thing, on the other hand, seems to be geared
> towards having a pool of threads, any of which can handle any I/O
> operation. That's not helpful for us; when one of our tasks blocks
> waiting for I/O, the completion of that I/O must wake up *that particular
> task*, and it must be run using the same OS thread that was running
> it before.

The reason we can't ignore IOCP is that it is apparently the *only*
way to do async I/O in a scalable way. The only other polling
primitive available is select() which does not scale. (Or so it is
asserted by many folks; I haven't tested this, but I believe the
argument against select() scaling in general.)

> I gather that Windows provides a way of making an async I/O request
> and specifying a callback for that request. If that's the case, do
> we need to bother with an IOCP at all? Just have the callback wake
> up the associated task directly.

AFAICT the way to do that goes through IOCP...

--
--Guido van Rossum (python.org/~guido)

Greg Ewing

unread,

Oct 22, 2012, 6:33:00 PM10/22/12

to python...@python.org

Guido van Rossum wrote:

> (Aside: please don't use 'continuation' for 'task'. The use of this
> term in Scheme has forever tainted the word for me.)

It has a broader meaning than the one in Scheme; essentially
it's a synonym for "callback".

I agree it shouldn't be used as a synonym for "task", though.
In any of its forms, a continuation isn't an entire task, it's
something that you call to cause the resumption of a task
from a particular suspension point.

--
Greg

Steve Dower

unread,

Oct 22, 2012, 6:31:10 PM10/22/12

to Greg Ewing, python...@python.org

>>> Yes, but unless you run all subsequent code on the IOCP thread
>>> (thereby blocking any more completions) you need to schedule it back to another thread.
>>> This requires synchronization.
>
> I think there's an assumption behind this whole async tasks discussion that the tasks
> being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and
> different I/O activities with each other.
> We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that
> anyway for pure Python code.

Sure, but it's easy enough to slip it in for (nearly) free. The only other option is complete exclusion of CPU-bound concurrency, which also rules out running C functions (outside the GIL) on a separate thread.

> The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool
> of threads, any of which can handle any I/O operation. That's not helpful for us; when one
> of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that
> particular task*, and it must be run using the same OS thread that was running it before.
>
> I gather that Windows provides a way of making an async I/O request and specifying a
> callback for that request. If that's the case, do we need to bother with an IOCP at all?
> Just have the callback wake up the associated task directly.

IOCP is probably not useful at all, and as Guido said, it was brought up as an example of a non-select style of waiting. APIs like ReadFileEx/WriteFileEx let you provide the callback directly without using IOCP. In any case, even if we did use IOCP it would be an implementation detail and would not affect how the API is exposed.

(Also, love your work on PEP 380. Despite my hesitation about using yield from for this API, I do really like using it with generators.)

Cheers,
Steve

Guido van Rossum

unread,

Oct 22, 2012, 6:35:12 PM10/22/12

to Greg Ewing, python...@python.org

On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> (Aside: please don't use 'continuation' for 'task'. The use of this
>> term in Scheme has forever tainted the word for me.)
>
> It has a broader meaning than the one in Scheme; essentially
> it's a synonym for "callback".

(Off-topic:) But does that meaning apply to Scheme? If so, I wish
someone would have told me 15 years ago...

> I agree it shouldn't be used as a synonym for "task", though.
> In any of its forms, a continuation isn't an entire task, it's
> something that you call to cause the resumption of a task
> from a particular suspension point.

I guess that was just Steve showing off. :-)

--
--Guido van Rossum (python.org/~guido)

Steve Dower

unread,

Oct 22, 2012, 6:49:40 PM10/22/12

to Guido van Rossum, Greg Ewing, python...@python.org

Alertable I/O (<http://msdn.microsoft.com/en-us/library/windows/desktop/aa363772.aspx>) and overlapped I/O are two alternatives to IOCP on Windows.

>> I agree [continuation] shouldn't be used as a synonym for "task", though.

>> In any of its forms, a continuation isn't an entire task, it's
>> something that you call to cause the resumption of a task
>> from a particular suspension point.
>
> I guess that was just Steve showing off. :-)

Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).

Guido van Rossum

unread,

Oct 22, 2012, 6:56:48 PM10/22/12

to Steve Dower, python...@python.org

On Mon, Oct 22, 2012 at 3:49 PM, Steve Dower <Steve...@microsoft.com> wrote:
> Alertable I/O (<http://msdn.microsoft.com/en-us/library/windows/desktop/aa363772.aspx>) and overlapped I/O are two alternatives to IOCP on Windows.
>
>>> I agree [continuation] shouldn't be used as a synonym for "task", though.
>>> In any of its forms, a continuation isn't an entire task, it's
>>> something that you call to cause the resumption of a task
>>> from a particular suspension point.
>>
>> I guess that was just Steve showing off. :-)
>
> Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).

Yeah, I have the same terminology hang-up with the term
"continuation-passing-style" for web callbacks.

Reading back what you wrote, you were indeed trying to distinguish
between the "callback" (which you consider the thing that's directly
invoked by the OS) and "the rest of the task" (e.g. the code that runs
when the yield is resumed), which you were referring to as
"continuation". I'd just use "the rest of the task" here.

--
--Guido van Rossum (python.org/~guido)

Greg Ewing

unread,

Oct 22, 2012, 7:04:57 PM10/22/12

to python...@python.org

Guido van Rossum wrote:

> The reason we can't ignore IOCP is that it is apparently the *only*
> way to do async I/O in a scalable way. The only other polling
> primitive available is select() which does not scale.

There seems to be an alternative to polling, though. There are
functions called ReadFileEx and WriteFileEx that allow you to
pass in a routine to be called when the operation completes:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx

Is there some reason that this doesn't scale either?

--
Greg

Guido van Rossum

unread,

Oct 22, 2012, 7:09:28 PM10/22/12

to Greg Ewing, python...@python.org

On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> The reason we can't ignore IOCP is that it is apparently the *only*
>> way to do async I/O in a scalable way. The only other polling
>> primitive available is select() which does not scale.
>
>
> There seems to be an alternative to polling, though. There are
> functions called ReadFileEx and WriteFileEx that allow you to
> pass in a routine to be called when the operation completes:
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx
>
> Is there some reason that this doesn't scale either?

I don't know, we've reached territory I don't know at all. Are there
also similar calls for Accept() and Connect() on sockets? Those seem
the other major blocking primitives that are frequently used.

FWIW, here is where I read about IOCP being the only scalable way on
Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x-only.html

--
--Guido van Rossum (python.org/~guido)

Tom Hoover

unread,

Oct 22, 2012, 7:36:35 PM10/22/12

to Guido van Rossum, python...@python.org

On Mon, Oct 22, 2012 at 4:09 PM, Guido van Rossum <gu...@python.org> wrote:
>
> On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> > Guido van Rossum wrote:
> >
> >> The reason we can't ignore IOCP is that it is apparently the *only*
> >> way to do async I/O in a scalable way. The only other polling
> >> primitive available is select() which does not scale.
> >
> >
> > There seems to be an alternative to polling, though. There are
> > functions called ReadFileEx and WriteFileEx that allow you to
> > pass in a routine to be called when the operation completes:
> >
> > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx
> > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx
> >
> > Is there some reason that this doesn't scale either?
>
> I don't know, we've reached territory I don't know at all. Are there
> also similar calls for Accept() and Connect() on sockets? Those seem
> the other major blocking primitives that are frequently used.
>
> FWIW, here is where I read about IOCP being the only scalable way on
> Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x-only.html

It's been years since I've looked at this stuff, but I believe that
you want to use AcceptEx and ConnectEx in conjunction with IOCP.

event_iocp.c and listener.c in libevent 2.0.x could help shed some
light on the details.

Greg Ewing

unread,

Oct 22, 2012, 7:48:39 PM10/22/12

to python...@python.org

Guido van Rossum wrote:
> On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
>
>>It has a broader meaning than the one in Scheme; essentially
>>it's a synonym for "callback".
>
> (Off-topic:) But does that meaning apply to Scheme? If so, I wish
> someone would have told me 15 years ago...

It does, in the sense that a continuation appears to the
Scheme programmer as a callable object.

The connection goes deeper as well. There's a style of
programming called "continuation-passing style", in which
nothing ever returns -- every function is passed another
function to be called with its result. In a language such
as Scheme that supports tail calls, you can use this style
extensively without fear of overflowing the call stack.

You're using this style whenever you chain callbacks
together using Futures or Deferreds. The callbacks don't
return values; instead, each callback arranges for another
callback to be called, passing it the result.

This is also the way monadic I/O works in Haskell. None
of the I/O functions ever return, they just call another
function and pass it the result. A combination of currying
and syntactic sugar is used to hide the fact that you're
passing callbacks -- aka continuations -- around all
over the place.

Now, it turns out that you can define all the semantics
of Scheme, including its continuations, by writing a Scheme
interpreter in Scheme that doesn't itself use Scheme
continuations. You do it by writing the whole interpereter
in continuation-passing style, and it becomes clear that
at that level, the "continuations" are just ordinary
functions, relying on lexical scoping to capture all of the
necessary state.

> I guess that was just Steve showing off. :-)

Not really -- to someone with a Scheme or FP background,
it's near-impossible to look at something like a chain
of Deferred callbacks without the word "continuation"
springing to mind. I agree that it's not helpful to
anyone without such a background, however.

--
Greg

Guido van Rossum

unread,

Oct 22, 2012, 7:54:41 PM10/22/12

to Greg Ewing, python...@python.org

And, predictably, that gave me a headache... :-)

--Guido van Rossum (sent from Android phone)

Greg Ewing

unread,

Oct 22, 2012, 8:07:01 PM10/22/12

to python...@python.org

Guido van Rossum wrote:
> And, predictably, that gave me a headache... :-)

Oops, sorry, Guido -- I shouldn't have mentioned
the M-word. :-)

Dino Viehland

unread,

Oct 23, 2012, 12:46:43 AM10/23/12

to Greg Ewing, python...@python.org

Greg wrote:
> Guido van Rossum wrote:
>
> > The reason we can't ignore IOCP is that it is apparently the *only*
> > way to do async I/O in a scalable way. The only other polling
> > primitive available is select() which does not scale.
>
> There seems to be an alternative to polling, though. There are functions called
> ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when
> the operation completes:
>
> http://msdn.microsoft.com/en-
> us/library/windows/desktop/aa365468%28v=vs.85%29.aspx
> http://msdn.microsoft.com/en-
> us/library/windows/desktop/aa365748%28v=vs.85%29.aspx
>
> Is there some reason that this doesn't scale either?

I suspect it's because it has the completion routine is being invoked on the same
thread that issued the I/O. The thread has to first block in an alertable wait (e.g.
WaitForMultipleObjectsEx or WSAWaitForMultipleEvents). So you'll only get 1
thread doing I/Os and CPU work vs IOCP's where many threads can share both
workloads.

Eric Snow

unread,

Oct 23, 2012, 1:07:17 AM10/23/12

to Steve Dower, python...@python.org

On Mon, Oct 22, 2012 at 9:55 AM, Steve Dower <Steve...@microsoft.com> wrote:
> I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to
> delegate part of its operations to another generator." Using 'yield from' (YF, for convenience)
> requires (a) that the caller is a generator and (b) that the callee is a generator.

Rather, the callee must be some iterable:

def f():
yield from [1, 2, 3]

for x in f():
print(x)

-eric

Jim Jewett

unread,

Oct 23, 2012, 3:17:01 AM10/23/12

to Guido van Rossum, python...@python.org

On 10/19/12, Guido van Rossum <gu...@python.org> wrote:

> I did a basic timing test using a simple recursive function and a
> recursive PEP-380 coroutine computing the same value (see attachment).
> The coroutine version is a little over twice as slow as the function
> version. I find that acceptable. This went 20 deep, making 2 recursive
> calls at each level (except at the deepest level).

Note that the co-routine code (copied below) does not involve a
scheduler that unwraps futures; there is no scheduler, and nothing
runs concurrently.

def coroutine(n):
if n <= 0:
return 1
l = yield from coroutine(n-1)
r = yield from coroutine(n-1)
return l + 1 + r

I like the above code; my concern was that yield might get co-opted
for use with scheduler loops, which would have to track the parent
task explicitly, and prevent it from being rescheduled too early.

-jJ

Benoit Chesneau

unread,

Oct 23, 2012, 3:19:59 AM10/23/12

to Guido van Rossum, Python-Ideas

On Oct 22, 2012, at 4:59 PM, Guido van Rossum <gu...@python.org> wrote:

> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve...@microsoft.com> wrote:
> [Stuff about Futures and threads]

>
> Personally, I'm interested in designing a system, including an event
> loop, where you can rely on the properties of cooperative scheduling
> to avoid ever touching (OS) threading locks. I think such a system
> should be "pure" and all interaction with threads should be mediated
> by the event loop. (It's okay if this means that the implementation of
> the event loop must at some point acquire a threading lock.) The
> Futures used by the tasks to coordinate amongst themselves should not
> require locking -- they should themselves be able to rely on the
> guarantees of the event loop not to invoke multiple callbacks in
> parallel.
>

> IIUC you can do this on Windows with IOCP too, simply by only having a
> single thread reading events.
>

Maybe it is worth to have a look on libuv and the way it mixes threads and and event loop [1]. Libuv is one of the good event loop around able to use IOCP and other events systems on other arch (kqueue, …) and I was thinking when reading all the exchange around that it would perfectly fit in our cases. Or at least something like it:

- It provides a common api for IO watchers: read, write, writelines, readable, writeable that can probably be extend over remote systems
- Have a job queue system for threds that is working mostly like the Futures but using the event loop

In any case there is a pyuv binding [2] if some want to test. Even a twisted reactor [3]

I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled:

- In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue.

- For I/Os it exists a common api to all Connections and Listeners (Conn & Listen classes) that generally ask on a poll server. This poll server has for only task to register FDs and wake up the groutines that wait on read or fd events. This this poll server is running in a blocking loop it is automatically let by the scheduler in a thread. This pol server could be likely be replaced by an event loop if someone want.

In my opinion the Go concurrency & memory model [5] could perfectly fit in the Python world and I'm surprised none already spoke about it.

In flower greenlets could probably be replaced by generators but i like the API proposed by any coroutine pattern. I wonder if continulets [6] couldn't be ported in cpython to handle that…

- benoît

[1] http://nikhilm.github.com/uvbook/threads.html & http://github.com/joyent/libuv
[2] https://github.com/saghul/pyuv
[3] https://github.com/saghul/twisted-pyuv
[4] https://github.com/benoitc/flower
[5] http://golang.org/ref/mem
[6] http://doc.pypy.org/en/latest/stackless.html#continulets

Jim Jewett

unread,

Oct 23, 2012, 3:34:58 AM10/23/12

to Guido van Rossum, Python-Ideas

On 10/21/12, Guido van Rossum <gu...@python.org> wrote:
> On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve...@microsoft.com>
> wrote:

>> It has synchronisation which is _aware_ of threads, but it never creates,
>> requires or uses them. It simply ensures thread-safe reentrancy, which
>> will be required for any general solution unless it is completely banned
>> from interacting across CPU threads.

> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.

That (with or without rescheduling this thread to actually process the
event) is a perfectly reasonable solution, but I'm not sure how
obvious it is. People willing to deal with the conventions and
contortions of twisted are likely to just use twisted. A general API
should have a straightforward way to weight for a result; even
explicitly calling wait() may be too much to ask if you want to keep
assuming that other events will cooperate.

> Perhaps. Lots of possibilities in this design space.
>
>> (*I'm inclined to define this [the Future interface] as 'result()', 'done()',
>> 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()'
>> functions. Maybe more, but I think that's sufficient. The current
>> '_waiters' list is an optimisation for add_done_callback(), and doesn't
>> need to be part of the interface.)

> Agreed. I don't see much use for the cancellation stuff and all the
> extra complexity that adds to the interface.

wait_for_any may well be launching different strategies to solve the
same problem, and intending to ignore all but the fastest. It makes
sense to go ahead and cancel the slower strategies. (That said, I
agree that the API shouldn't guarantee that other tasks are actually
cancelled, let alone that they are cancelled before side effects
occur.)

-jJ

Guido van Rossum

unread,

Oct 23, 2012, 10:44:31 AM10/23/12

to Jim Jewett, python...@python.org

On Tue, Oct 23, 2012 at 12:17 AM, Jim Jewett <jimjj...@gmail.com> wrote:
> On 10/19/12, Guido van Rossum <gu...@python.org> wrote:
>
>> I did a basic timing test using a simple recursive function and a
>> recursive PEP-380 coroutine computing the same value (see attachment).
>> The coroutine version is a little over twice as slow as the function
>> version. I find that acceptable. This went 20 deep, making 2 recursive
>> calls at each level (except at the deepest level).
>
> Note that the co-routine code (copied below) does not involve a
> scheduler that unwraps futures; there is no scheduler, and nothing
> runs concurrently.
>
> def coroutine(n):
> if n <= 0:
> return 1
> l = yield from coroutine(n-1)
> r = yield from coroutine(n-1)
> return l + 1 + r
>
> I like the above code; my concern was that yield might get co-opted
> for use with scheduler loops, which would have to track the parent
> task explicitly, and prevent it from being rescheduled too early.

Don't worry. There is no way that a scheduler can change the meaning
of yield from. All its power stems from its ability to decide when to
call next(), and that is the same power that the app has itself.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 23, 2012, 10:48:36 AM10/23/12

to Benoit Chesneau, Python-Ideas

Thanks for the pointer to and description of libuv; it had come up in
my research yet but so far I have not looked it up actively. Now I
will. Also thanks for your reminder of the Goroutine model -- this is
definitely something to look at for inspiration as well. (Though does
Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-)

--Guido

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,

Oct 23, 2012, 10:54:46 AM10/23/12

to Jim Jewett, Python-Ideas

On Tue, Oct 23, 2012 at 12:34 AM, Jim Jewett <jimjj...@gmail.com> wrote:
> On 10/21/12, Guido van Rossum <gu...@python.org> wrote:
>> On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve...@microsoft.com>
>> wrote:
>
>>> It has synchronisation which is _aware_ of threads, but it never creates,
>>> requires or uses them. It simply ensures thread-safe reentrancy, which
>>> will be required for any general solution unless it is completely banned
>>> from interacting across CPU threads.
>
>> I don't see it that way. Any time you acquire a lock, you may be
>> blocked for a long time. In a typical event loop that's an absolute
>> no-no. Typically, to wait for another thread, you give the other
>> thread a callback that adds a new event for *this* thread.
>
> That (with or without rescheduling this thread to actually process the
> event) is a perfectly reasonable solution, but I'm not sure how
> obvious it is. People willing to deal with the conventions and
> contortions of twisted are likely to just use twisted.

I think part of my point is that we can package all this up in a way
that is a lot less scary than Twisted's reputation. And remember,
there are many other frameworks that use similar machinery. There's
Tornado, Monocle (which runs on top of Tornado *or* Twisted), and of
course the stdlib's asyncore, which is antiquated but still much used
-- AFAIL Zope is still built around it.

> A general API
> should have a straightforward way to wait for a result; even

> explicitly calling wait() may be too much to ask if you want to keep
> assuming that other events will cooperate.

Here I have some real world relevant experience: NDB, App Engine's new
Datastore API (which I wrote). It is async under the hood (yield + its
own flavor of Futures), and users who want the most performance from
their app are encouraged to use the async APIs directly -- but users
who don't care can ignore their existence completely. There are
thousands of users, and I've seen people explain the async stuff to
each other on StackOverflow, so I think it is quite accessible.

>> Agreed. I don't see much use for the cancellation stuff and all the
>> extra complexity that adds to the interface.
>
> wait_for_any may well be launching different strategies to solve the
> same problem, and intending to ignore all but the fastest. It makes
> sense to go ahead and cancel the slower strategies. (That said, I
> agree that the API shouldn't guarantee that other tasks are actually
> cancelled, let alone that they are cancelled before side effects
> occur.)

Agreed. And it's not hard to implement a custom cancellation mechanism either.

--
--Guido van Rossum (python.org/~guido)

Brett Cannon

unread,

Oct 23, 2012, 12:09:56 PM10/23/12

to Guido van Rossum, Python-Ideas

Go is available for Windows: http://golang.org/doc/install#windows