current stackless coroutines proposal

Michael Kilburn

Mar 6, 2018, 5:02:02 PM
to ISO C++ Standard - Future Proposals
Hi,

We all know about the stackless coroutines proposal Gor Nishanov is working on right now -- there is a good chance it is going to land in C++20.

It is essentially a composition of various tools that causes the compiler to generate a C++ object (a state machine) that optionally allows the calling coroutine to "subscribe" to a "ready" event (via co_await). All of this is hidden behind a plain function declaration. And to make it efficient, a few compiler optimizations are available (if the coroutine body is visible) -- heap allocation elision, etc.

More info can be found in discussion here:

I find the approach taken isn't ideal:
- if the optimizations require the body to be visible -- what is the benefit of hiding the coroutine behind a "plain function" facade?
- all these transformations lead to the creation of a plain C++ object -- instead of inventing new semantics for interacting with a coroutine, why don't we simply expose that object's interface to the end user?
- having to propagate 'co_await's down the call tree doesn't seem ideal
- for coroutines that could live on the stack, you have to rely on the compiler to notice that
- etc.

So instead of hiding the coroutine behind a 'plain function' interface and teaching the compiler how to work around its limitations -- why don't we use an approach similar to the one used by templates? They do a similar thing -- generate C++ entities (functions and objects) from template code... Like this:

coroutine int mycoro(int a, char* b) { ... }  // has to be defined at the point of declaration

int main()
{
    for(auto x: mycoro(1,"abc")) ... ;    // we explicitly allocate the coroutine on the stack
}

In this way:
- each client "sees" the entire coroutine and knows how big its frame is
- you explicitly control where the coroutine frame is allocated
- no need for new compiler optimizations -- in fact, only the portion of the compiler that performs template translation needs to be changed
- the generated object can be very similar to the one in the current proposal -- the main difference is how the coroutine is represented in the code
    - the resulting object will have a resume() method (to switch the machine to its next state) and on_ready(ready_cb) (to enable "awaiting", visible only from another coroutine) -- see the sketch below
    - note that an "awaitable" coroutine is one that isn't always "ready" after each resume() call -- this can be auto-detected by the compiler, or specified explicitly
- since a coroutine clearly differs from a function -- no need for a keyword when one coroutine calls another
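
To make that concrete, here is a minimal sketch of what the generated object might look like for mycoro above. Everything in it is illustrative -- the two-state body, the std::function-based callback, the exact member names -- it is just the shape I have in mind, not a worked-out design:

#include <functional>
#include <utility>

// hypothetical state machine generated from `coroutine int mycoro(int a, char* b)`
struct mycoro_frame {
    // captured parameters and locals -- the frame size is visible to every caller
    int a; char* b;
    int state = 0;
    int value = 0;                      // last produced value
    std::function<void()> ready_cb;     // set via on_ready(), used by an awaiting coroutine

    mycoro_frame(int a, char* b) : a(a), b(b) {}

    // advance the machine to its next suspension point; returns false when finished
    bool resume() {
        switch (state) {
        case 0: value = a;     state = 1; if (ready_cb) ready_cb(); return true;
        case 1: value = a + 1; state = 2; if (ready_cb) ready_cb(); return true;
        default: return false;
        }
    }

    // "subscribe" to the ready event -- only meaningful when the caller is itself a coroutine
    template<class Cb> void on_ready(Cb cb) { ready_cb = std::move(cb); }
};

The caller then places the frame wherever it likes -- on the stack, inside another object, on the heap -- and drives it with resume().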

What do you think about this approach?

Regards,
Michael.


P.S. Note that coroutine generation doesn't even require any changes to the language -- it could theoretically be done by some sort of tool that takes coroutine code and builds a state-machine C++ class, which can later be compiled by a C++ compiler and used by your code.

Nicol Bolas

Mar 7, 2018, 12:33:15 AM
to ISO C++ Standard - Future Proposals


On Tuesday, March 6, 2018 at 5:02:02 PM UTC-5, Michael Kilburn wrote:
Hi,

We all know about the stackless coroutines proposal Gor Nishanov is working on right now -- there is a good chance it is going to land in C++20.

It is essentially a composition of various tools that causes the compiler to generate a C++ object (a state machine) that optionally allows the calling coroutine to "subscribe" to a "ready" event (via co_await). All of this is hidden behind a plain function declaration. And to make it efficient, a few compiler optimizations are available (if the coroutine body is visible) -- heap allocation elision, etc.

More info can be found in discussion here:

I find the approach taken isn't ideal:
- if the optimizations require the body to be visible -- what is the benefit of hiding the coroutine behind a "plain function" facade?

Which optimizations are we talking about? Does everyone who wants to use coroutines want those optimizations? At all times? If not, why force them on those who just want easily written async code?

- all these transformations lead to the creation of a plain C++ object -- instead of inventing new semantics for interacting with a coroutine, why don't we simply expose that object's interface to the end user?

Because the whole point of the Coroutines TS proposal is to make asynchronous code look almost identical to synchronous code. There have been many discussions of alternative coroutine systems, and while I prefer many designs on the basis of better functionality, not one of these alternatives offers the interface simplicity, from an end-user perspective, that the Coroutines TS does.

Oh yes, writing an awaitable type is a nightmarish netherworld of pain, suffering, and subtle bugs. But using such a type is quick and easy.
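
For reference, the awaiter surface itself is small -- here is a minimal sketch of a trivial awaitable under the TS, using the TS-era <experimental/coroutine> header (the "suspend, then resume immediately" behaviour is purely for illustration). The pain is in getting the suspension and rescheduling semantics right, not in the amount of code:

#include <experimental/coroutine>

struct resume_now {
    bool await_ready() const noexcept { return false; }      // "not ready": take the suspend path
    bool await_suspend(std::experimental::coroutine_handle<>) const noexcept {
        return false;                                         // returning false resumes the coroutine immediately
    }
    int await_resume() const noexcept { return 42; }          // the value `co_await resume_now{}` produces
};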
 
- having to propagate 'co_await's down the call tree doesn't seem ideal

Yes, as has been noted by many others.

- for coroutines that could live on the stack, you have to rely on the compiler to notice that

Yes, again, we know that. We knew that years ago.

- etc.

So instead of hiding the coroutine behind a 'plain function' interface and teaching the compiler how to work around its limitations -- why don't we use an approach similar to the one used by templates? They do a similar thing -- generate C++ entities (functions and objects) from template code... Like this:

coroutine int mycoro(int a, char* b) { ... }  // has to be defined at the point of declaration

int main()
{
    for(auto x: mycoro(1,"abc")) ... ;    // we explicitly allocate the coroutine on the stack
}

P0939 makes it clear that the committee is adamant about getting the Coroutines TS in C++20 with its current design, and similarly makes it clear that 11th hour redesigns aren't going to be allowed to completely derail existing proposals that have had years of prior design. As such, any discussion in this direction is entirely academic.

Your design appears to be similar to the various "resumable functions" proposals: P0114 and so forth. They look great for generators, but they require a lot more effort to write genuine asynchronous function calls -- or at least, they can't do so without effort on the part of the caller.

Remember: the "Coroutines TS" is really the "Continuations TS": it's all about halting a function's execution and using the result of an expression to schedule the continuation of that function's execution (and unpacking the result of the expression when the function is resumed). At its core, it's all about putting `.then` in the language. This usage pattern is the Coroutines TS system working at its best. So if you want to challenge it, you need to find a design that can accomplish what it does better than it currently does.

Your design handles generators well. But generators have never been a serious aspect of the Coroutines TS design. Oh sure, it can handle generators to some degree, but it's not very good at them because it preserves no call stack beyond its own frame. It's exceedingly bad at composing generators, and so forth.

But you're not going to stop the Coroutines TS from becoming part of the standard by presenting a solution that only solves a minor part of the problem the TS is intended to solve.

Michael Kilburn

Mar 7, 2018, 1:29:30 AM
to std-pr...@isocpp.org
On Tue, Mar 6, 2018 at 11:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Tuesday, March 6, 2018 at 5:02:02 PM UTC-5, Michael Kilburn wrote:
But you're not going to stop the Coroutines TS from becoming part of the standard by presenting a solution that only solves a minor part of the problem the TS is intended to solve.

Damn it! My evil plans got foiled again!

Well, all I wanted to hear was opinions on the given approach -- because fundamentally it changes very little in the current proposal. Only instead of hiding behind an opaque "just a function" facade, the coroutine hides behind a transparent "template" facade. The way I see it -- it makes implementation (on the compiler side) easier. Was this approach ever considered? If yes -- why was it rejected?


Which optimizations are we talking about?

There is a talk given by Gor about "disappearing coroutines" where (assuming the compiler has the required optimizations) the coroutine gets completely "inlined". He mentioned the "heap allocation elision" optimization there -- which to me looks like a big mistake (just like copy elision was).


They look great for generators, but they require a lot more effort to write genuine asynchronous function calls.

No, not really... At least I don't see it.

Ville Voutilainen

Mar 7, 2018, 3:08:08 AM
to ISO C++ Standard - Future Proposals
On 7 March 2018 at 07:33, Nicol Bolas <jmck...@gmail.com> wrote:
> P0939 makes it clear that the committee is adamant about getting the
> Coroutines TS in C++20 with its current design, and similarly makes it clear

The authors of P0939 are not the committee, they don't speak for the committee,
and they don't make decisions for the committee. It's a recommendation.

> that 11th hour redesigns aren't going to be allowed to completely derail
> existing proposals that have had years of prior design. As such, any
> discussion in this direction is entirely academic.

That remains to be seen, because such discussions are going to happen, in the
committee.

Todd Fleming

Mar 7, 2018, 9:29:21 AM
to ISO C++ Standard - Future Proposals
On Wednesday, March 7, 2018 at 1:29:30 AM UTC-5, Michael Kilburn wrote:
There is a talk given by Gor about "disappearing coroutines" where (assuming the compiler has the required optimizations) the coroutine gets completely "inlined". He mentioned the "heap allocation elision" optimization there -- which to me looks like a big mistake (just like copy elision was).


Heap elision has been around for quite a while in clang, long before coroutines. What's the big mistake with heap elision and copy elision? My code has benefited from both.

Nicol Bolas

Mar 7, 2018, 9:51:20 AM
to ISO C++ Standard - Future Proposals
On Wednesday, March 7, 2018 at 1:29:30 AM UTC-5, Michael Kilburn wrote:
On Tue, Mar 6, 2018 at 11:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Tuesday, March 6, 2018 at 5:02:02 PM UTC-5, Michael Kilburn wrote:
But you're not going to stop the Coroutines TS from becoming part of the standard by presenting a solution that only solves a minor part of the problem the TS is intended to solve.

Damn it! My evil plans got foiled again!

Well, all I wanted to hear was opinions on the given approach -- because fundamentally it changes very little in the current proposal. Only instead of hiding behind an opaque "just a function" facade, the coroutine hides behind a transparent "template" facade. The way I see it -- it makes implementation (on the compiler side) easier. Was this approach ever considered? If yes -- why was it rejected?

Resumable function-style coroutines weren't "rejected", as I understand it. They just stopped running the race. The people proposing them never implemented them, while Gor took the time and effort to get a decent implementation into a shipping compiler for people to use. Gor iterated heavily on the proposal, while the others never really improved or changed.

Coroutines TS has won essentially by default; the people behind it put in the work to prove that the idea functions, and its competition didn't.

Which optimizations are we talking about?

There is a talk given by Gor about "disappearing coroutines" where (assuming the compiler has the required optimizations) the coroutine gets completely "inlined". He mentioned the "heap allocation elision" optimization there -- which to me looks like a big mistake (just like copy elision was).

Sure, but not every case of coroutines relies on that. Coroutines being completely inlined is primarily for generator coroutines, which again aren't exactly the primary use case for the feature.

They look great for generators, but they require a lot more effort to write genuine asynchronous function calls.

No, not really... At least I don't see it.

OK, take this synchronous code:

int sync_call(...)
{
    auto val = some_function(...);
    return val + 1;
}

This is the async version of that, using `co_await`:

std::future<int> async_call(...)
{
    auto val = co_await std::async(some_function, ...);
    co_return val + 1;
}

Write the equivalent of `async_call` using your coroutines system. Here are the rules. You must invoke `std::async`. You must use `future::then` to resume the rest of `async_call`. Your function must return a `future<int>` (which the caller themselves can use `::then` on). Oh, and don't forget: the future returned from the `async` call may not be a `future<int>`; merely some type that can have one added to it, the result of which is convertible to `int`.

Lee Howes

Mar 7, 2018, 12:18:01 PM
to std-pr...@isocpp.org
Gor iterated heavily on the proposal, while the others never really improved or changed.
 
Oliver Kowalke has updated his series of fibers papers: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0876r0.pdf so there is continued work on the stackful coroutines approach, at least in the sense of the required primitives.


Michael:
On "having to propagate 'co_await's down the call tree doesn't seem ideal" for many of us this is absolutely ideal, it's so ideal that we've experimented with enforcing it even in fiber-based code that doesn't need it and this feature is one reason that may cause us to port some of our fiber-based code to these coroutines!  The safety benefits of doing that are enormous in terms of enforcement of asynchronous interfaces, guaranteeing where code runs under complete control of caller and callee library constructs, and ensuring that different libraries do not interfere with each others' synchronisation primitives. We've actively been *removing* continuation support from futures in some parts of our codebase for similar reasons - explicitness has enormous value. A few weeks of extra work strengthening a library to make it trivial to use by a caller, while maintaining all the safety my library needs, is well worth the effort.

We clearly need to be able to co_await on opaque library calls - so there are going to be cases all over the codebase where the compiler can not, and indeed should not, have any visibility into the library code. We even want coroutines to sit behind dynamic dispatch - is there any reason why a virtual function should not be a coroutine? Enforcing visibility to the compiler would break that. Those are cases where we will not get automatic heap elision. We may want explicit stack allocation instead and this can be implemented on top of the current TS, though some modifications would make it cleaner. We may not care because the odd heap allocation across asynchronous library interfaces is trivial, and we have worse than that now with synchronising promise/future pairs. 

A lot of work has gone into this, and while it is not perfect and there are certainly concerns from some parties about rushing it into the standard, on balance the long years of discussion around questions similar to those you are asking have got us to this point.



Nicol Bolas

Mar 7, 2018, 4:17:21 PM
to ISO C++ Standard - Future Proposals
On Wednesday, March 7, 2018 at 12:18:01 PM UTC-5, Lee Howes wrote:
Gor iterated heavily on the proposal, while the others never really improved or changed.
 
Oliver Kowalke has updated his series of fibers papers: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0876r0.pdf so there is continued work on the stackful coroutines approach, at least in the sense of the required primitives.

I was talking about the language-based proposals. The library coroutines proposal has had a number of iterations over the years, though some of the changes have been pretty radical.

Michael Kilburn

Mar 7, 2018, 9:40:07 PM
to std-pr...@isocpp.org
Copy elision is an illusion, an unnecessary complication -- all (decent) compilers have forever generated code that constructs a value at a certain location (according to calling conventions/etc.), and the callee (or the caller, in the case of RVO) simply assumes ownership of that object. In the language it was represented as an optional optimization, and this was a constant PITA for everyone. Finally, after three decades, it was made mandatory -- which is still an unnecessary complication. The related semantics (the caller constructs and passes ownership to the callee) should simply be enshrined in the standard, and the idea that parameters somehow get copied into function scope (but the compiler will probably optimize that away) -- forgotten like a bad dream.
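
For illustration, the C++17 rule in one picture:

struct Big { int data[1024]; };

Big make() { return Big{}; }   // the Big{} prvalue is constructed directly in the caller's storage
Big b = make();                // since C++17 no copy and no move happen here -- there is only ever one object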

Heap allocation elision looks very similar to copy elision -- in both cases it is an optional optimization that you can't rely on. I bet that in 30 years, after much hand-wringing, it will be made mandatory.

There is another side effect of heap elision -- it makes calculating stack size requirements hard. I like the idea of writing completely correct code -- that is, code that will run forever and gracefully handle any circumstance (including OOMs) as long as the language runtime, hardware and all other layers don't break the promises they made. In C++ this can (almost) be achieved by calculating the maximum stack size (via code analysis) and reserving it at the start of the program. Heap allocation elision makes that harder. Me don't like it.

--
Sincerely yours,
Michael.

Michael Kilburn

Mar 7, 2018, 10:07:37 PM
to std-pr...@isocpp.org
On Wed, Mar 7, 2018 at 8:51 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Wednesday, March 7, 2018 at 1:29:30 AM UTC-5, Michael Kilburn wrote:
On Tue, Mar 6, 2018 at 11:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Tuesday, March 6, 2018 at 5:02:02 PM UTC-5, Michael Kilburn wrote:
But you're not going to stop the Coroutines TS from becoming part of the standard by presenting a solution that only solves a minor part of the problem the TS is intended to solve.

Damn it! My evil plans got foiled again!

Well, all I wanted to hear was opinions on the given approach -- because fundamentally it changes very little in the current proposal. Only instead of hiding behind an opaque "just a function" facade, the coroutine hides behind a transparent "template" facade. The way I see it -- it makes implementation (on the compiler side) easier. Was this approach ever considered? If yes -- why was it rejected?

Resumable function-style coroutines weren't "rejected", as I understand it. They just stopped running the race. The people proposing them never implemented them, while Gor took the time and effort to get a decent implementation into a shipping compiler for people to use. Gor iterated heavily on the proposal, while the others never really improved or changed.

Coroutines TS has won essentially by default; the people behind it put in the work to prove that the idea functions, and its competition didn't.

I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration. In my idea the compiler can generate precisely the same language constructs during coroutine "instantiation" (as it does in the current proposal) -- the same Awaitable<T> return types, etc.

 
Which optimizations are we talking about?

There is a talk given by Gor about "disappearing coroutines" where (assuming the compiler has the required optimizations) the coroutine gets completely "inlined". He mentioned the "heap allocation elision" optimization there -- which to me looks like a big mistake (just like copy elision was).

Sure, but not every case of coroutines relies on that. Coroutines being completely inlined is primarily for generator coroutines, which again aren't exactly the primary use case for the feature.

And yet it is one of the selling points -- unfortunately it requires an optional compiler optimization, i.e. you can't rely on it. Also, it requires the compiler to be able to observe the coroutine body, which naturally leads to a question -- "why hide it behind a 'plain function' facade at all?". Why don't we allow the user to make the related decisions explicitly (like where the coroutine frame will be allocated)?

The current proposal leads to a situation where I can't use a coroutine in a noexcept function -- because I can't rely on the compiler to use heap allocation elision.

 
They look great for generators, but they require a lot more effort to write genuine asynchronous function calls.

No, not really... At least I don't see it.

OK, take this synchronous code:

int sync_call(...)
{
    auto val = some_function(...);
    return val + 1;
}

This is the async version of that, using `co_await`:

std::future<int> async_call(...)
{
    auto val = co_await std::async(some_function, ...);
    co_return val + 1;
}

Write the equivalent of `async_call` using your coroutines system. Here are the rules. You must invoke `std::async`. You must use `future::then` to resume the rest of `async_call`. Your function must return a `future<int>` (which the caller themselves can use `::then` on). Oh, and don't forget: the future returned from the `async` call may not be a `future<int>`; merely some type that can have one added to it, the result of which is convertible to `int`.

coroutine int coro_call(...)
{
    auto val = std::coro_async(some_function, ...)();
    return val + 1;
}

or (if the compiler treats certain function calls made from a coroutine in a special way -- based on the function signature, for example):

coroutine int coro_call(...)
{
    auto val = std::async(some_function, ...);
    return val + 1;
}

You can come up with a bunch of interesting transformations for the "coroutine translation" step, but discussing them was not the goal of my original post.

--
Sincerely yours,
Michael.

Nicol Bolas

Mar 7, 2018, 10:33:53 PM
to ISO C++ Standard - Future Proposals
On Wednesday, March 7, 2018 at 10:07:37 PM UTC-5, Michael Kilburn wrote:
On Wed, Mar 7, 2018 at 8:51 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Wednesday, March 7, 2018 at 1:29:30 AM UTC-5, Michael Kilburn wrote:
On Tue, Mar 6, 2018 at 11:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Tuesday, March 6, 2018 at 5:02:02 PM UTC-5, Michael Kilburn wrote:
But you're not going to stop the Coroutines TS from becoming part of the standard by presenting a solution that only solves a minor part of the problem the TS is intended to solve.

Damn it! My evil plans got foiled again!

Well, all I wanted to hear was opinions on the given approach -- because fundamentally it changes very little in the current proposal. Only instead of hiding behind an opaque "just a function" facade, the coroutine hides behind a transparent "template" facade. The way I see it -- it makes implementation (on the compiler side) easier. Was this approach ever considered? If yes -- why was it rejected?

Resumable function-style coroutines weren't "rejected", as I understand it. They just stopped running the race. The people proposing them never implemented them, while Gor took the time and effort to get a decent implementation into a shipping compiler for people to use. Gor iterated heavily on the proposal, while the others never really improved or changed.

Coroutines TS has won essentially by default; the people behind it put in the work to prove that the idea functions, and its competition didn't.

I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration.

But that's practically the point of the Coroutines TS design: that the compiler generates the coroutine machinery based entirely on what is going on inside of the function, not how the outside world uses it. And as will be discussed below, the ramifications of changing "only one aspect of it" fundamentally change the nature of what you're talking about.

In my idea the compiler can generate precisely the same language constructs during coroutine "instantiation" (as it does in the current proposal) -- the same Awaitable<T> return types, etc.

 
Which optimizations are we talking about?

There is a talk given by Gor about "disappearing coroutines" where (assuming the compiler has the required optimizations) the coroutine gets completely "inlined". He mentioned the "heap allocation elision" optimization there -- which to me looks like a big mistake (just like copy elision was).

Sure, but not every case of coroutines relies on that. Coroutines being completely inlined is primarily for generator coroutines, which again aren't exactly the primary use case for the feature.

And yet it is one of the selling points -- unfortunately it requires an optional compiler optimization, i.e. you can't rely on it. Also, it requires the compiler to be able to observe the coroutine body, which naturally leads to a question -- "why hide it behind a 'plain function' facade at all?". Why don't we allow the user to make the related decisions explicitly (like where the coroutine frame will be allocated)?

The current proposal leads to a situation where I can't use a coroutine in a noexcept function -- because I can't rely on the compiler to use heap allocation elision.

If it's a concern, trap the exception so that it doesn't try to exit the `noexcept` function.

They look great for generators, but they require a lot more effort to write genuine asynchronous function calls.

No, not really... At least I don't see it.

OK, take this synchronous code:

int sync_call(...)
{
    auto val = some_function(...);
    return val + 1;
}

This is the async version of that, using `co_await`:

std::future<int> async_call(...)
{
    auto val = co_await std::async(some_function, ...);
    co_return val + 1;
}

Write the equivalent of `async_call` using your coroutines system. Here are the rules. You must invoke `std::async`. You must use `future::then` to resume the rest of `async_call`. Your function must return a `future<int>` (which the caller themselves can use `::then` on). Oh, and don't forget: the future returned from the `async` call may not be a `future<int>`; merely some type that can have one added to it, the result of which is convertible to `int`.

coroutine int coro_call(...)
{
    auto val = std::coro_async(some_function, ...)();
    return val + 1;
}

You broke the rules. You didn't invoke `std::async`; you made a new function. Under this design, any asynchronous library will have to have `coro_` versions of all of its asynchronous functions, rather than just using `future`s or similar such types that allow you to apply continuations to them.

Also, note that this function doesn't return a `future<int>`. Which means that if the user wants to use a continuation, they actually can't. Indeed, the user can't even call it like a regular function, can they? Since it may halt mid-stream, you either have to wrap the call in some synchronization primitive or the caller itself must be a `coroutine` function that can therefore be halted.

By contrast, a coroutine function can *always* be called just like a regular function. It has the specified return value, and behaves exactly like its signature says it does. This fact is a fundamental part of the system's design.

The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed.

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Michael Kilburn

Mar 8, 2018, 1:15:38 AM
to std-pr...@isocpp.org
On Wed, Mar 7, 2018 at 9:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Wednesday, March 7, 2018 at 10:07:37 PM UTC-5, Michael Kilburn wrote:
I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration.

But that's practically the point of the Coroutines TS design: that the compiler generates the coroutine machinery based entirely on what is going on inside of the function, not how the outside world uses it. And as will be discussed below, the ramifications of changing "only one aspect of it" fundamentally change the nature of what you're talking about.

I am sticking to my guns (i.e. going to claim I am being misunderstood). I'll try to explain it again:
- stackless coroutines are when your code gets transformed into a C++ object which represents a state machine (and the associated fluff to tie it into the rest of your code)
- the idea is to use a template-like approach to the "transformation" step -- as far as I am concerned it could generate exactly the same declarations as you write in the MSVC version that supports Gor's proposal. I.e.

coroutine int mc(...) { ...; coro2(); ... }

could result in generating:

future<int> mc(...) { ...; co_await  coro2(); ... }

or any other representation -- I really don't care.

But the difference is that all translation units will see everything, unlike the current state where (typically) only one has the full picture. This would allow every call site to know everything about the generated state machine (e.g. its size). This will allow certain features:
- the caller can explicitly control the location of the coroutine frame
- a coroutine will be clearly different from a function -- which would allow us to avoid the necessity of co_await or co_return. The compiler can see that the current coroutine calls another coroutine and generate the required fluff automatically. You can introduce a new keyword (or convention) that will allow you to call a coroutine in another way (but I doubt this need will be great).
- inlining of some of the generated C++ object's methods
- you need to update only one portion of the compiler -- the one that transforms coroutine declarations into a state machine class declaration. No need to introduce additional logic in other areas.
- etc.

I.e. the idea is not about the coroutine implementation or the semantics of the generated functions -- it is about having every client see the entire coroutine. At this stage I really don't care exactly how the resulting state machine will behave or what methods it will expose.

 
And yet it is one of the selling points -- unfortunately it requires an optional compiler optimization, i.e. you can't rely on it. Also, it requires the compiler to be able to observe the coroutine body, which naturally leads to a question -- "why hide it behind a 'plain function' facade at all?". Why don't we allow the user to make the related decisions explicitly (like where the coroutine frame will be allocated)?

The current proposal leads to a situation where I can't use a coroutine in a noexcept function -- because I can't rely on the compiler to use heap allocation elision.

If it's a concern, trap the exception so that it doesn't try to exit the `noexcept` function.

Nope, I know I can build a noexcept function that can count to ten. But I can't do it with coroutines -- because I can't force its frame to be on the stack (it is an optional optimization). Catching std::bad_alloc gives me nothing -- what am I going to do when I catch it -- call std::terminate()?
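
To make the concern concrete (using MSVC's experimental generator type purely for illustration -- any generator-like wrapper has the same issue):

#include <experimental/generator>   // MSVC's TS-era generator, requires /await

// under the TS the frame of this coroutine is conceptually heap-allocated, and elision
// of that allocation is an optimization I cannot rely on -- so I cannot honestly put
// this on a noexcept call path
std::experimental::generator<int> count_to_ten()
{
    for (int i = 1; i <= 10; ++i)
        co_yield i;
}

// usage: for (int v : count_to_ten()) { ... }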


You broke the rules. You didn't invoke `std::async`; you made a new function. Under this design, any asynchronous library will have to have `coro_` versions of all of its asynchronous functions, rather than just using `future`s or similar such types that allow you to apply continuations to them.

I see... one of the aims is to integrate coroutines with the future/async stuff... I am probably behind on all this. All async libs I looked at used the same approach -- the public API consists of a function (async_foo) that takes a callback. Then someone somewhere cranks the event loop, which eventually calls a "process_events()" library function that in turn calls the aforementioned callback.

So, for this library to add coroutine support you'd have to create a second inline function (or macro), coro_foo(), that takes the address of the resume() method of the current coroutine, registers it using async_foo() and suspends the current coroutine. I.e. in this design you still have to add a second version of your async_foo() -- a coroutine-aware wrapper coro_foo(). I see no fault in this approach -- everything is clear and clean.
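
To sketch the library shape I'm describing (a toy model -- the names and the fake "result" are purely illustrative):

#include <functional>
#include <queue>

static std::queue<std::function<void()>> pending;   // completions waiting for the event loop

// public API: start an operation and register a completion callback
void async_foo(int request, std::function<void(int /*result*/)> cb)
{
    // a real library hands `request` to the OS; here we just queue a fake completion
    pending.push([cb, request] { cb(request * 2); });
}

// cranked by the event loop somewhere; invokes the callbacks once results are ready
void process_events()
{
    while (!pending.empty()) { pending.front()(); pending.pop(); }
}

In the model I'm describing, coro_foo() would just be a thin inline wrapper that passes the current coroutine's resume() as cb and then suspends.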

Now, if the library instead exports "future<int> foo()" -- there is no need for coro_foo(), but the implementation becomes less efficient and more complex, because of type erasure, having to allocate memory, having to move values and other stuff (like synchronization). Am I right? Is it a good price for not having to add a (rather simple) coro_foo()?


Also, note that this function doesn't return a `future<int>`. Which means that if the user wants to use a continuation, they actually can't. Indeed, the user can't even call it like a regular function, can they? Since it may halt mid-stream, you either have to wrap the call in some synchronization primitive or the caller itself must be a `coroutine` function that can therefore be halted.

It isn't a function -- it is a coroutine. Transforming it may result in a function that returns future<int> (I don't have a "new design" -- all I presented was a simple idea that is very far from being a "design").

Yes, you can't invoke coro_call() from a normal function (unless it is inline and is called from another coroutine). But you could create an instance of it (i.e. create an instance of the state machine class that coro_call's transformation produces) and (depending on the generated class interface) call its public methods.

 
By contrast, a coroutine function can *always* be called just like a regular function. It has the specified return value, and behaves exactly like its signature says it does. This fact is a fundamental part of the system's design.

Yes, because the coroutine is hidden behind a "plain function" facade, thus hiding the state machine details from the caller -- its frame size, its methods, etc. All you have is "future<int> myname(...);". Some details get hidden in the future<int>, which introduces an (arguably unnecessary) layer of indirection. This layer disappears if the "transformation" step described above generates a state machine class visible in its entirety in every TU that uses it.

 
The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed. 

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Thank you, I will read it. But as I said -- it isn't what I was talking about. I am probably not the best communicator.

In any case -- I am not insisting that this is a brilliant idea that will turn the world on its head. Just asking for opinions and whether it was already considered.


--
Sincerely yours,
Michael.

Michael Kilburn

Mar 8, 2018, 2:09:23 AM
to std-pr...@isocpp.org
On Wed, Mar 7, 2018 at 11:17 AM, Lee Howes <xri...@gmail.com> wrote:
Gor iterated heavily on the proposal, while the others never really improved or changed.
 
Oliver Kowalke has updated his series of fibers papers: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0876r0.pdf so there is continued work on the stackful coroutines approach, at least in the sense of the required primitives.

Well, as I understand it, Microsoft sponsors Gor's work -- so it isn't surprising that his proposal progresses faster and is more favored by a committee whose activities are coincidentally sponsored by MS too. :-)

But "evil corporation" jokes aside -- I don't see stackful coroutine as a competitor to stackless one. Latter one is essentially a C++ state machine class generated at compile time, while in former everything happens dynamically at runtime. Plus it allows execution flow to jump around in a "non-linear" way (i.e. A -> B -> C -> A switching).


Michael:
On "having to propagate 'co_await's down the call tree doesn't seem ideal" for many of us this is absolutely ideal, it's so ideal that we've experimented with enforcing it even in fiber-based code that doesn't need it and this feature is one reason that may cause us to port some of our fiber-based code to these coroutines!  The safety benefits of doing that are enormous in terms of enforcement of asynchronous interfaces, guaranteeing where code runs under complete control of caller and callee library constructs, and ensuring that different libraries do not interfere with each others' synchronisation primitives.

It seems I am missing a lot of background here -- how are all these things related to fibers or co_await?

 
We've actively been *removing* continuation support from futures in some parts of our codebase for similar reasons - explicitness has enormous value. A few weeks of extra work strengthening a library to make it trivial to use by a caller, while maintaining all the safety my library needs, is well worth the effort.

Same here -- how is "continuation support of futures" related to "co_await"? Which (afaik) is a suspension marker for the compiler and the place where we subscribe to the "data-is-ready" event of another coroutine.


We clearly need to be able to co_await on opaque library calls - so there are going to be cases all over the codebase where the compiler can not, and indeed should not, have any visibility into the library code.

Nothing prevents you from hiding the state machine object (generated from the coroutine declaration) behind a function manually -- in a way similar to what the current proposal does.


We even want coroutines to sit behind dynamic dispatch - is there any reason why a virtual function should not be a coroutine?

Ultimately, a stackless coroutine is a C++ object with methods like resume()/etc. I don't see why you shouldn't be able to treat it as such -- i.e. storing it as a member variable of another class and accessing it in a virtual function.


Enforcing visibility to the compiler would break that. Those are cases where we will not get automatic heap elision. We may want explicit stack allocation instead and this can be implemented on top of the current TS, though some modifications would make it cleaner. We may not care because the odd heap allocation across asynchronous library interfaces is trivial, and we have worse than that now with synchronising promise/future pairs. 

Yes, I think having the ability to explicitly control where the state machine is allocated is a must.

Hmm... You know, I've spent more than two decades writing relatively complex multithreaded (occasionally lockless) code and I never got into the future/async paradigm... Never really needed it.


A lot of work has gone into this, and while it is not perfect and there are certainly concerns from some parties about rushing it into the standard, on balance the long years of discussion around questions similar to those you are asking have got us to this point.

Well, in this case I am trying to catch up with you. Can't do it without asking questions.


Regards,
Michael.

Michael Kilburn

Mar 8, 2018, 4:54:24 AM
to std-pr...@isocpp.org
On Thu, Mar 8, 2018 at 12:15 AM, Michael Kilburn <crusad...@gmail.com> wrote:
I.e. the idea is not about the coroutine implementation or the semantics of the generated functions -- it is about having every client see the entire coroutine. At this stage I really don't care exactly how the resulting state machine will behave or what methods it will expose.

One more argument -- consider a template class and a stackless coroutine. Both of them are language-provided mechanisms for transforming some code into a C++ class -- i.e. fundamentally they do the same thing. In the case of a template class there is no way to hide its declaration from users -- the entire class definition is visible to every user (even though you may tuck away the definitions of individual member functions in some TU). A stackless coroutine (in its current form) hides the resulting class -- this lack of symmetry bothers me. It forces designers to come up with mechanisms to work around the resulting problems (e.g. heap allocation elision allows us to delegate allocation to a party that knows the state machine frame size and move the allocation to the stack, if certain criteria are met).
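
To illustrate the symmetry I mean (an ordinary class template, nothing coroutine-specific):

// every TU that uses buffer<int> sees the full class definition, including its size...
template<class T>
struct buffer {
    T data[16];                 // sizeof(buffer<int>) is known at every call site
    void push(const T& v);      // ...even though this member's definition can live in a single TU
};                              //    (together with an explicit instantiation)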


Regards,
Michael.



Todd Fleming

Mar 8, 2018, 8:43:16 AM
to ISO C++ Standard - Future Proposals
On Wednesday, March 7, 2018 at 10:07:37 PM UTC-5, Michael Kilburn wrote:
I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration. In my idea the compiler can generate precisely the same language constructs during coroutine "instantiation" (as it does in the current proposal) -- the same Awaitable<T> return types, etc.

 
And yet it is one of the selling points -- unfortunately it requires an optional compiler optimization, i.e. you can't rely on it. Also, it requires the compiler to be able to observe the coroutine body, which naturally leads to a question -- "why hide it behind a 'plain function' facade at all?". Why don't we allow the user to make the related decisions explicitly (like where the coroutine frame will be allocated)?

The current proposal leads to a situation where I can't use a coroutine in a noexcept function -- because I can't rely on the compiler to use heap allocation elision.

 
Instead of using templates as an analogy, how about lambda functions?

If lambda functions were like the current stackless coroutine proposal:
  • Type erasure would be mandatory, not opt-in.
  • Lambda functions would always live on the heap, except when the optimizer finds a way not to.
If the proposal was more like lambda functions:
  • The user could opt into type erasure.
  • The coroutine object could live on the stack, or inside another object, or anywhere else (see the sketch below).
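
A two-line reminder of how the lambda side of the analogy works today:

#include <functional>

auto add_one = [](int x) { return x + 1; };   // concrete closure type: known size, lives wherever you put it
std::function<int(int)> erased = add_one;     // type erasure is opt-in, and may heap-allocate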

Other notes:
  • An unwrapped coroutine object probably wouldn't be callable. It may have an inconvenient interface that's only useful for wrappers to use.
  • The current proposal needs changes to std::future; this alternate idea may need additional changes. Maybe std::promise could do the type erasure for the async case?
  • My initial guess is the object couldn't be copyable or movable; it'd have to be RVO'ed to its final location.
Todd

Todd Fleming

Mar 8, 2018, 8:56:40 AM
to ISO C++ Standard - Future Proposals
There's a potential problem with doing this inside clang:
  • To not be type erased, clang's front end would have to define the object, since it handles layout issues, sizeof(), and other things.
  • The object would contain the optimized state, but the front end can't predict this.
Todd

inkwizyt...@gmail.com

Mar 8, 2018, 9:08:50 AM
to ISO C++ Standard - Future Proposals
 
If I understand the problem correctly, then we could alter the current proposal to optionally allow embedding the coroutine state into the return object instead of using a heap allocation.
We would then have three kinds: always heap, heap-or-fixed, and fixed only.
The return value would need something like a `std::aligned_storage` member of some arbitrary size. If the coroutine's variables fit in this storage, the whole function compiles and uses the storage for them; if they don't fit and the heap-allocation option isn't enabled, the function fails to compile.
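
A sketch of the shape I mean (the names and the size bound are arbitrary):

#include <cstddef>
#include <type_traits>

template<class T, std::size_t Size>
struct generator {
    std::aligned_storage_t<Size> state;   // the coroutine frame would be constructed in here...
    // ...with the usual resume()/value() interface from the current proposal on top;
    // if the frame does not fit and the heap fallback is disabled, compilation fails
};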

This has some drawbacks. One is that you need to be careful when you stack generators: you have to calculate the size of everything used inside (everything that is live across a `co_yield`):
generator<int, 30 + sizeof(bar(0))> foo(int i)
{
    int j = i;
    for (auto k : bar(i)) co_yield (++j) * k;
}
Another is that the return value would be neither movable nor copyable, because otherwise you would have to define what happens to the "local variables" when the storage is copied.

Lee Howes

Mar 8, 2018, 11:55:08 AM
to std-pr...@isocpp.org
> Ultimately, a stackless coroutine is a C++ object with methods like resume()/etc. I don't see why you shouldn't be able to treat it as such -- i.e. storing it as a member variable of another class and accessing it in a virtual function.

Maybe I'm missing something here. You want the compiler to be able to see the body of the called coroutine, as I understand it. To be clear I am saying I need to be able to do something like:

class Foo {
  virtual Awaitable bar();
};

co_await my_foo->bar();

Now, you also said:
> Nothing prevents you from hiding the state machine object (generated from the coroutine declaration) behind a function manually

So I *think* you are saying that you can have a strictly visible coroutine that you can wrap in an async call to heap allocate it in case you want to pass the awaitable around, do some bulk collect operation or whatever. Is that right?

That's a reasonable point of view. At the moment we are pretty happy with the state of inlining when the functions are visible, and are more interested in ensuring there are no heap allocations in the example above, when they are not visible. We really can't afford to significantly increase the amount of compiler-visible code we have; optimising for separate compilation is really the only option.

It seems I am missing a lot of background here -- how are all these things related to fibers or co_await?

They aren't, directly; I just wanted to make it clear that your statement that "having to propagate 'co_await's down the call tree doesn't seem ideal" is absolutely not the position my library developers have. It is not universal - there are many languages in which this approach is a conscious design choice.


Nicol Bolas

Mar 8, 2018, 12:12:14 PM
to ISO C++ Standard - Future Proposals
On Thursday, March 8, 2018 at 1:15:38 AM UTC-5, Michael Kilburn wrote:
On Wed, Mar 7, 2018 at 9:33 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Wednesday, March 7, 2018 at 10:07:37 PM UTC-5, Michael Kilburn wrote:
I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration.

But that's practically the point of the Coroutines TS design: that the compiler generates the coroutine machinery based entirely on what is going on inside of the function, not how the outside world uses it. And as will be discussed below, the ramifications of changing "only one aspect of it" fundamentally change the nature of what you're talking about.

I am sticking to my guns (i.e. going to claim I am being misunderstood). I'll try to explain it again:
- stackless coroutines are when your code gets transformed into a C++ object which represents a state machine (and the associated fluff to tie it into the rest of your code)
- the idea is to use a template-like approach to the "transformation" step -- as far as I am concerned it could generate exactly the same declarations as you write in the MSVC version that supports Gor's proposal. I.e.

You are not being misunderstood. You're trying to equate all "stackless coroutine" proposals; you're claiming that they're all just minor variations of the same concept.

They are not. Coroutines TS is not just creating a resumable function; it's a lot more than that. It's implementing a specific model of coroutines, which is different from the model you're defining.

If you want "just resumable functions", then you need to understand that this really is a completely different proposal with a completely different design from the Coroutines TS. It's not a slight modification of Coroutines TS (which is evidence in the fact that your design literally removes all of the Coroutines TS's keywords).

Your design is a suspend-down coroutine model. When your kind of coroutine yields, it always returns to its nearest non-coroutine caller, who is responsible for scheduling its resumption at some point. Coroutines TS is a suspend-up coroutine model: the code responsible for scheduling its resumption is the code inside the coroutine, not necessarily its caller. If the code inside the coroutine suspends to the caller, that's because the particular coroutine chooses to.

Generators are the classic case of suspend-down. That's why your design works especially well with them, and why the Coroutines TS works so poorly with them. Continuations however are a classic case of suspend-up. This is why your design requires lots of extra work to use them, while the Coroutines TS makes it look like synchronous code.

Coroutines TS will never be as good at generators as your idea is. But neither will your idea be as good at asynchronous code transformations as the Coroutines TS is.

But the difference is that all translation units will see everything, unlike the current state where (typically) only one has the full picture. This would allow every call site to know everything about the generated state machine (e.g. its size). This will allow certain features:
- the caller can explicitly control the location of the coroutine frame
- a coroutine will be clearly different from a function -- which would allow us to avoid the necessity of co_await or co_return.

No, you still need `co_await` and such (or something of a similar nature), because not all asynchronous functions should have to be visible. So you still need to be able to suspend your function and schedule its resumption based on the return value from some other function.

The way the resumable expressions system handled this was to force you to do what the Coroutines TS effectively does when you invoke `co_await` and such: manually wrap your resumable function in a hidden lambda that stores a promise type and returns a future hooked into it.

The entire point of the Coroutines TS is to keep you from having to write such boilerplate. Every design decision is based on that.
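
A rough sketch of the sort of wiring you otherwise end up writing by hand (the detached thread and the stand-in some_function are only there to make the sketch self-contained; the point is the promise/future plumbing and the hand-written continuation):

#include <exception>
#include <future>
#include <memory>
#include <thread>

int some_function() { return 41; }   // stand-in for the asynchronous work

std::future<int> async_call_manual()
{
    auto p   = std::make_shared<std::promise<int>>();
    auto fut = p->get_future();
    std::thread([p] {
        try         { p->set_value(some_function() + 1); }           // the continuation, spelled out by hand
        catch (...) { p->set_exception(std::current_exception()); }  // and its error path, also by hand
    }).detach();
    return fut;
}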

The compiler can see that the current coroutine calls another coroutine and generate the required fluff automatically. You can introduce a new keyword (or convention) that will allow you to call a coroutine in another way (but I doubt this need will be great).
- inlining of some of the generated C++ object's methods

Inlining is only relevant when dealing with suspend-down style coroutines. That is, generators and the like. Inlining is not relevant (or possible) when dealing with suspend-up continuations. This is why such inlining is an optimization to Coroutines TS rather than a requirement. It's a suspend-up model, so it does things in a suspend-up way.

- you need to update only one portion of the compiler -- the one that transforms coroutine declarations into a state machine class declaration. No need to introduce additional logic in other areas.
- etc.

I.e. the idea is not about the coroutine implementation or the semantics of the generated functions -- it is about having every client see the entire coroutine. At this stage I really don't care exactly how the resulting state machine will behave or what methods it will expose.

You broke the rules. You didn't invoke `std::async`; you made a new function. Under this design, any asynchronous library will have to have `coro_` versions of all of its asynchronous functions, rather than just using `future`s or similar such types that allow you to apply continuations to them.

I see... one of the aims is to integrate coroutines with the future/async stuff... I am probably behind on all this. All async libs I looked at used the same approach -- the public API consists of a function (async_foo) that takes a callback. Then someone somewhere cranks the event loop, which eventually calls a "process_events()" library function that in turn calls the aforementioned callback.

So, for this library to add coroutine support you'd have to create a second inline function (or macro), coro_foo(), that takes the address of the resume() method of the current coroutine, registers it using async_foo() and suspends the current coroutine. I.e. in this design you still have to add a second version of your async_foo() -- a coroutine-aware wrapper coro_foo(). I see no fault in this approach -- everything is clear and clean.

Now, if the library instead exports "future<int> foo()" -- there is no need for coro_foo(), but the implementation becomes less efficient and more complex, because of type erasure, having to allocate memory, having to move values and other stuff (like synchronization). Am I right? Is it a good price for not having to add a (rather simple) coro_foo()?

It all depends: is the person calling the async routine the one who actually has the continuation? The `future.then`-style interface allows anyone at any time to hook a continuation into the process. Your `coro_foo` style requires that the exact caller be the one who provides the continuation.

And what if the continuation function itself needs a continuation? The caller has to provide that too. And what if the inner continuation needs to access something from the outer continuation? Well, you have to allocate that explicitly. And so forth.

`co_await`-style coding handles this with no manual intervention. Herb Sutter made a great presentation a while back (skip ahead to around 51 minutes) on the failings of explicit callback-style continuations through lambdas and such, and demonstrated how use of `await` can make asynchronous code look exactly like synchronous code.

I'll reproduce his example here, in case you're unwilling to watch the video.

Here's the synchronous code. It reads from a given filename, appending `suffix` to each "chunk" in the file. It's based on one of Microsoft's asynchronous file IO APIs (note that I've slightly adjusted some of the code):

string read(string filename, string suffix)
{
    istream fi = open(filename).get();
    string ret, chunk;
    while((chunk = fi.read().get()).size())
        ret += chunk + suffix;

    return ret;
}

All of the `.get()` calls are there to convert asynchronous tasks into synchronous operations. Our goal is to take this code and make it asynchronous to the caller.

That `while` loop is the pernicious part. You have to invoke `fi.read()`, but then you have to provide a continuation function to it. That continuation function must provoke additional `fi.read()` calls as needed. And each of those calls must pass a continuation function. Namely itself. That pretty much requires heap allocation, lambdas, and heap allocation of lambdas. Nobody wants to write that code, and nobody wants to debug it.
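
Roughly -- and this is not Herb's code, just a sketch assuming the futures returned by open()/read() had a then() member -- it looks something like this, with all the shared_ptr plumbing and the self-referential, heap-allocated continuation:

std::future<string> read_cb(string filename, string suffix)
{
    auto ret  = std::make_shared<string>();
    auto done = std::make_shared<std::promise<string>>();
    open(filename).then([=](auto fi_fut) {
        auto fi   = std::make_shared<istream>(fi_fut.get());
        auto step = std::make_shared<std::function<void()>>();
        *step = [=] {                                    // the continuation must capture *itself*, via the heap
            fi->read().then([=](auto chunk_fut) {
                string chunk = chunk_fut.get();
                if (chunk.size()) { *ret += chunk + suffix; (*step)(); }
                else              done->set_value(*ret);
            });
        };
        (*step)();
    });
    return done->get_future();
}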

The Coroutines TS equivalent would look like this:

task<string> read(string filename, string suffix)
{
    istream fi = co_await open(filename);
    string ret, chunk;
    while((chunk = co_await fi.read()).size())
        ret += chunk + suffix;

    co_return ret;
}

You cannot get code that is simpler than that. And your `coro_func` version would not work, because this code needs to continue itself. You'd have to write a lambda, heap-allocate it, and do a bunch of other stuff to make the explicit continuation code work.

Coroutines TS exists to make that explicit work unnecessary.

The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed. 

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Thank you, I will read it. But as I said -- it isn't what I was talking about.

How can you say it isn't what you were talking about when you haven't read it? It is essentially your design, only more fully fleshed out. Like your design, it marks coroutine functions. Unlike your design, it recognizes that within a coroutine, you might want to call a coroutine that suspends to you rather than to your caller, so it has syntax for that. Coroutines have to be defined inline, just like your design. And so forth.

And it has all of the downsides of your proposal. `co_await`-style continuations are harder to use, requiring explicit coding of wrappers and the like.

I am probably not the best communicator.

In any case -- I am not insisting that this is a brilliant idea that will turn the world on its head. Just asking for opinions and whether it was already considered.

And as previously stated, a design almost identical to yours was considered. As far as I can tell, it wasn't "rejected" so much as the people behind it stopped working on it. They took the paper design as far as they could, but implementing it would require a great deal of time and effort. Gor and Microsoft were willing/able to put that time and effort in; the people behind the resumable expressions proposal weren't.

Arthur O'Dwyer

unread,
Mar 8, 2018, 6:04:13 PM3/8/18
to ISO C++ Standard - Future Proposals
On Thursday, March 8, 2018 at 9:12:14 AM UTC-8, Nicol Bolas wrote:
[...]

The way the resumable expressions system handled this was to force you to do what the Coroutines TS effectively does when you invoke `co_await` and such: manually wrap your resumable function in a hidden lambda that stores a promise type and returns a future hooked into it.

The entire point of the Coroutines TS is to keep you from having to write such boilerplate. Every design decision is based on that.
[...]
`co_await`-style coding handles this with no manual intervention. Herb Sutter made a great presentation a while back (skip ahead to around 51 minutes) on the failings of explicit callback-style continuations through lambdas and such, and demonstrated how use of `await` can make asynchronous code look exactly like synchronous code.

I'll reproduce his example here, in case you're unwilling to watch the video.

Here's the synchronous code. It reads from a given filename, appending `suffix` to each "chunk" in the file. It's based on one of Microsoft's asynchronous file IO APIs (note that I've slightly adjusted some of the code):

string read(string filename, string suffix)
{
  istream fi = open(filename).get();

  string ret, chunk;

  while((chunk = fi.read().get()).size())
    ret += chunk + suffix;

  return ret;
}

All of the `.get()` calls are there to convert asynchronous tasks into synchronous operations. Our goal is to take this code and make it asynchronous to the caller.

That `while` loop is the pernicious part. You have to invoke `fi.read()`, but then you have to provide a continuation function to it. That continuation function must provoke additional `fi.read()` calls as needed. And each of those calls must pass a continuation function. Namely itself. That pretty much requires heap allocation, lambdas, and heap allocation of lambdas. Nobody wants to write that code, and nobody wants to debug it.

The Coroutines TS equivalent would look like this:

task<string> read(string filename, string suffix)
{
  istream fi = co_await open(filename);

  string ret, chunk;

  while((chunk = co_await fi.read()).size())
    ret += chunk + suffix;

  co_return ret;
}

You cannot get code that is simpler than that.

IMHO, this is a good example of a problem that is hard with futures and easier with co_foo syntax.
For the record, let me tackle your original example,

std::future<int> async_call(...) {
    auto val = co_await std::async(some_function, ...);
    co_return val + 1;
}

This one is easy to do with something like Boost.Future. With my own toy futures library, it looks like this:

nonstd::future<int> async_call(...) {
    return nonstd::async(some_function, ...)
        .on_value([](auto val) { return val + 1; });
}

I even find this future-based version easier to understand (at the moment): I find it easier to trace the exceptional control flow "past" the on_value call, and just say "aha, if the future was not satisfied with a value, then we'll propagate the exception."  With the co_foo-based version, IIUC, you have to remember that every "co_await" includes an implicit "co_throw", so that the operand of the "co_return" statement will not be evaluated if some_function throws. (But most likely this is exactly as obvious in hindsight as the idea of exceptional codepaths to begin with, and I just need a bit more time to wrap my head around it!)
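To spell out the control flow I mean (sketch only -- `task<int>` and `some_async_op` here are just stand-ins, not real library types):

task<int> async_call()
{
    // If the awaited operation completed with an exception, the awaiter
    // rethrows it right at the co_await -- so the co_return operand below
    // is never evaluated.
    int val = co_await some_async_op();
    co_return val + 1;   // reached only on the success path
}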

Now here's the tricky example. With co_foo syntax:

task<string> read(string filename, string suffix)
{
  istream fi = co_await open(filename);

  string ret, chunk;

  while((chunk = co_await fi.read()).size())
    ret += chunk + suffix;

  co_return ret;
}

(I'm guessing that task<T> is just another way to spell nonstd::future<T>.)
In future-based syntax, this loop would be rather messy:

nonstd::future<string> read_impl(string ret, string suffix, istream fi)
{
  return fi.read().on_value_f(
    [ret = std::move(ret), suffix = std::move(suffix), fi = std::move(fi)](string chunk) mutable
    {
      if (chunk.size()) {
        ret += chunk + suffix;
        return read_impl(std::move(ret), std::move(suffix), std::move(fi));
      } else {
        return nonstd::make_ready_future<string>(std::move(ret));
      }
    }
  );
}

nonstd::future<string> read(string filename, string suffix)
{
  return open(filename).on_value_f([suffix = std::move(suffix)](istream fi) mutable {
    return read_impl("", std::move(suffix), std::move(fi));
  });
}


(Notice that in both Herb's co_foo example and in mine, we ignore the fact that the initialization of `istream fi` from the awaited value of `open(...)` is highly likely to be slicing away important information.  Let's assume that this is some STL2-ish "value-semantic istream" that doesn't care about slicing issues.)

Personally I'm also skeptical of the Coroutines TS package (weird keywords, lots and lots of compiler magic, unclear customizability), but I do have to say that the "future.then" approach is... suboptimal. We desperately need some kind of idiom for working with continuations in C++, and Coroutines TS is gamely attempting to tackle that problem head-on.

–Arthur

Nicol Bolas

unread,
Mar 8, 2018, 8:49:15 PM3/8/18
to ISO C++ Standard - Future Proposals
On Thursday, March 8, 2018 at 6:04:13 PM UTC-5, Arthur O'Dwyer wrote:
(Notice that in both Herb's co_foo example and in mine, we ignore the fact that the initialization of `istream fi` from the awaited value of `open(...)` is highly likely to be slicing away important information.  Let's assume that this is some STL2-ish "value-semantic istream" that doesn't care about slicing issues.)

I don't think he was talking about std::istream. I'm guessing that's some Microsoft-defined asynchronous stream type.
 
Personally I'm also skeptical of the Coroutines TS package (weird keywords, lots and lots of compiler magic, unclear customizability), but I do have to say that the "future.then" approach is... suboptimal. We desperately need some kind of idiom for working with continuations in C++, and Coroutines TS is gamely attempting to tackle that problem head-on.

The only thing I disagree with here is that the "future.then" approach is suboptimal. I think it's perfectly reasonable to tie "thing that generates a value" with "apply process which manipulates a generated value, producing a new value". Now granted, I wouldn't put that in `future<T>`; I think it would make far more sense in a `task<T>` type, which represents a potentially asynchronous execution. Having such a division would allow us to avoid the terrible mistake of `std::async`'s return type, as well as represent ways of providing executors and so forth.
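As a strawman of the shape I have in mind (purely a sketch: a free function rather than a member `.then`, with `std::async` standing in for a real executor-aware implementation, so ignore the extra blocking thread):

#include <future>
#include <utility>

template <class T, class F>
auto then(std::future<T> f, F cont)
{
  // Chain a continuation onto a future: wait for the value on a helper
  // thread, then apply the continuation and expose the result as a future.
  return std::async(std::launch::async,
    [f = std::move(f), cont = std::move(cont)]() mutable {
      return cont(f.get());
    });
}

// usage: auto next = then(std::move(prev), [](int v) { return v + 1; });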

The problem with `.then` and similar continuation interfaces is that they are such terrible interfaces to use. Despite how logical they are conceptually, they lead to such difficult-to-manage code in non-trivial cases that we have to introduce whole new language features just to be able to work with them in a reasonable fashion.

I don't much care for the Coroutines TS. It doesn't really mirror the way I normally work with tasks, and the lack of a form of "future.then" that allows control over exactly where asynchronous tasks get executed makes it a non-starter to me. I personally would like to see resumable functions or its equivalent get into the standard at some point, even if it is alongside the Coroutines TS, just so that we can have generators that don't suck.

But I simply cannot deny that the Coroutines TS, from a human readability perspective, is absolutely peerless at its intended use case.

Michael Kilburn

unread,
Mar 9, 2018, 3:19:12 AM3/9/18
to std-pr...@isocpp.org
On Thu, Mar 8, 2018 at 7:43 AM, Todd Fleming <tbfl...@gmail.com> wrote:
On Wednesday, March 7, 2018 at 10:07:37 PM UTC-5, Michael Kilburn wrote:
I think you (and others) misunderstood my idea -- I do not advocate against the current proposal, I am aiming only at one aspect of it -- namely hiding the coroutine behind a "plain function" declaration. In my idea the compiler can generate precisely the same language constructs during coroutine "instantiation" (as it does in the current proposal) -- the same Awaitable<T> as return types, etc. 

 
And yet it is one of the selling points -- unfortunately it requires an optional compiler optimization, i.e. you can't rely on it. Also, it requires the compiler to be able to observe the coroutine body, which naturally leads to a question -- "why hide it behind a 'plain function' facade at all?". Why don't we allow the user to make the related decisions explicitly (like where the coroutine frame will be allocated)?

The current proposal leads to a situation where I can't use a coroutine in a noexcept function -- because I can't rely on the compiler to use heap allocation elision.

 
Instead of using templates as an analogy, how about lambda functions?

If lambda functions were like the current stackless coroutine proposal:
  • Type erasure would be mandatory, not opt-in.
  • Lambda functions would always live on the heap, except when the optimizer finds a way not to.
If the proposal was more like lambda functions:
  • The user could opt into type erasure.
  • The coroutine object could live on the stack, or inside another object, or anywhere else.

Yep, that is a good analogy too. And my point was that it opens a few doors (if we need them), namely:
- ability to deal with the state machine object directly (i.e. allocate it on the stack, call its methods, etc), which makes the heap elision optimization irrelevant
- those methods can be inlined
- treat other coroutine calls made from this coroutine differently, i.e.:

we could flip this convention:

future<void> coro1()
{
    auto x = co_await coro2();  // suspend here until coro2 is in "data ready" state
    auto y = coro2();           // no suspend
}

to this one (note that fundamentally we change nothing -- we still need a way to distinguish between two types of calls):

coroutine void coro1()
{
    auto x = coro2()();             // can suspend, if coro2 is awaitable
    coro2 c2;
    auto y = co_nowait c2();        // no suspend, I suspect this use case will be rare
    // or  auto y = c2.resume();    // kick off coro2 until first suspend
    return;                         // no need for co_return
}

- and few others I no longer remember :-)

Keep in mind I don't insist on these changes, but the idea was presented with the aim of making these options available in the context of the current Coroutines TS.



Other notes:
  • An unwrapped coroutine object probably wouldn't be callable. It may have an inconvenient interface that's only useful for wrappers to use.
  • The current proposal needs changes to std::future; this alternate idea may need additional changes. Maybe std::promise could do the type erasure for the async case?
  • My initial guess is the object couldn't be copyable or movable; it'd have to be RVO'ed to its final location.
Todd


Michael Kilburn

unread,
Mar 9, 2018, 3:23:42 AM3/9/18
to std-pr...@isocpp.org
On Thu, Mar 8, 2018 at 8:08 AM, <inkwizyt...@gmail.com> wrote:
On Thursday, March 8, 2018 at 10:54:24 AM UTC+1, Michael Kilburn wrote:
On Thu, Mar 8, 2018 at 12:15 AM, Michael Kilburn <crusad...@gmail.com> wrote:
i.e. the idea is not about the coroutine implementation or the semantics of the generated functions -- it is about having every client see the entire coroutine. At this stage I really don't care exactly how the resulting state machine will behave or what methods it will expose.

One more argument -- consider a template class and a stackless coroutine. Both of them are language-provided mechanisms for transforming some code into a C++ class -- i.e. fundamentally they do the same thing. In the case of a template class there is no way to hide its declaration from users -- at any point the entire class definition is visible to every user (even though you may tuck away the definitions of individual member functions into some TU). A stackless coroutine (in its current form) hides the resulting class -- this lack of symmetry bothers me. It forces designers to come up with mechanisms to work around the resulting problems (e.g. heap allocation elision allows us to delegate allocation to a party that knows the state machine frame size and move the allocation to the stack, if certain criteria are met).

If I understand this problem correctly, then we could alter the current proposal to allow optionally embedding the coroutine state in the return object instead of using a heap allocation.
We would then have 3 kinds: always heap, heap and fixed, only fixed.
The return value would need something like `std::aligned_storage` with some arbitrary size. If the coroutine's values fit in this storage, the whole function compiles and uses this storage for its variables; if not, and the heap-allocation option is not enabled, the function fails to compile.

This will have some drawbacks. One is that when you stack generators you will need to be careful to calculate the size of the things used inside (that are live across a `co_yield`):
generator<int, 30 + sizeof(bar(0))> foo(int i)
{
    int j = i;
    for (auto k : bar(i)) co_yield (++j) * k;
}
Another is that the return value will not be movable or copyable, because otherwise you would need to define how it handles its "local variables" when the storage is copied.
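A rough sketch of such a return object (hypothetical, nothing worked out -- `fixed_generator` is just a name I made up):

#include <cstddef>

template <class T, std::size_t FrameSize>
class fixed_generator
{
    // The coroutine frame is placed into this fixed-size buffer; if it does
    // not fit (and heap fallback is disabled), the coroutine fails to compile.
    alignas(std::max_align_t) unsigned char frame_[FrameSize];

public:
    // Non-copyable and non-movable: the frame may contain pointers into
    // itself, which is exactly the drawback described above.
    fixed_generator(const fixed_generator&) = delete;
    fixed_generator& operator=(const fixed_generator&) = delete;

    // resume()/value() machinery omitted.
};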

As I said before -- what you are going to do with the newly available design options is a different story. As far as I am concerned, the coroutine translation step may generate exactly the same code (with keywords and all) that the current TS suggests. I.e. the only effect would be that all coroutines become inline. Though I certainly would like to be able to control frame allocation.


--
Sincerely yours,
Michael.

Michael Kilburn

unread,
Mar 9, 2018, 3:35:17 AM3/9/18
to std-pr...@isocpp.org
On Thu, Mar 8, 2018 at 10:55 AM, Lee Howes <xri...@gmail.com> wrote:
> Ultimately, stackless coroutine is a C++ object with methods like resume()/etc. I don't see why you shouldn't be able to treat it as such -- i.e. storing it as member variable of another class and access in a virtual  function.

Maybe I'm missing something here. You want the compiler to be able to see the body of the called coroutine, as I understand it.

yes, it will be able to see the body, the resulting state machine object, the frame size, etc

 
To be clear I am saying I need to be able to do something like:

class Foo {
  virtual Awaitable bar();
};

co_await my_foo->bar();

Hmm... I guess it should be possible:

coroutine void coro_bar(...) { ... }

class Foo {
    virtual Awaitable bar() { return cb_(); }
    coro_bar cb_;  // an instance of state-machine class generated from coro_bar's definition
};

does it make sense? I didn't think about it this way, but it should probably work.


Now, you also said:
> Nothing prevents you from hiding state machine object (generated from couroutine declaration) behind a function manually

So I *think* you are saying that you can have a strictly visible coroutine that you can wrap in an async call to heap allocate it in case you want to pass the awaitable around, do some bulk collect operation or whatever. Is that right?

Yes, similar to the code above. Not sure what you mean by "bulk collect operation", though.

 
That's a reasonable point of view. At the moment we are pretty happy with the state of inlining when the functions are visible, and are more interested in ensuring there are no heap allocations in the example above, when they are not visible. We really can't afford to significantly increase the amount of compiler-visible code we have; optimising for separate compilation is really the only option.

You mean compilation time will suffer because now every TU will be translating that coroutine? It is the same problem as with templates. Plus, if you really worry about this -- move the coroutine declarations into a separate TU and hide them behind a plain function.

--
Sincerely yours,
Michael.

Michael Kilburn

unread,
Mar 9, 2018, 4:10:36 AM3/9/18
to std-pr...@isocpp.org
On Thu, Mar 8, 2018 at 11:12 AM, Nicol Bolas <jmck...@gmail.com> wrote:
You are not being misunderstood. You're trying to equate all "stackless coroutine" proposals; you're claiming that they're all just minor variations of the same concept.

No, I don't. But you correctly noted that some existing proposals already have the "always visible" feature. That is ok -- I presented this idea in the context of the current proposal in order to enable the options of:
- manually controlling the generated state machine object (allocation, etc) to make the heap elision optimization irrelevant
- changing the convention -- e.g. use co_nowait instead of co_await (see my answer to Todd)
- a few others

If you don't like any of these options -- naturally, that idea is a no go. But maybe some of them can be appealing?


They are not. Coroutines TS is not just creating a resumable function; it's a lot more than that. It's implementing a specific model of coroutines, which is different from the model you're defining.

If you want "just resumable functions", then you need to understand that this really is a completely different proposal with a completely different design from the Coroutines TS. It's not a slight modification of Coroutines TS (which is evidence in the fact that your design literally removes all of the Coroutines TS's keywords).

Keywords may be removed, but their intended effect stays unchanged.

 
Your design is a suspend-down coroutine model. When your kind of coroutine yields, it always returns to its nearest non-coroutine caller, who is responsible for scheduling its resumption at some point. Coroutines TS is a suspend-up coroutine model: the code responsible for scheduling its resumption is the code inside the coroutine, not necessarily its caller. If the code inside the coroutine suspends to the caller, that's because the particular coroutine chooses to.

Generators are the classic case of suspend-down. That's why your design works especially well with them, and why the Coroutines TS works so poorly with them. Continuations however are a classic case of suspend-up. This is why your design requires lots of extra work to use them, while the Coroutines TS makes it look like synchronous code.

Coroutines TS will never be as good at generators as your idea is. But neither will your idea be as good at asynchronous code transformations as the Coroutines TS is.

Again, the idea presented doesn't fundamentally change anything in the current proposal with respect to returned types (future/etc) or the way it suspends. All it does is expose the generated state machine class to every user, thus enabling certain options that designers may or may not take.


The compiler can see that the current coroutine calls another coroutine and generate the required fluff automatically. You can introduce a new keyword (or convention) that will allow you to call a coroutine the other way (but I doubt this need will be great).
- inlining of some of the generated C++ object's methods

Inlining is only relevant when dealing with suspend-down style coroutines. That is, generators and the like. Inlining is not relevant (or possible) when dealing with suspend-up continuations. This is why such inlining is an optimization to Coroutines TS rather than a requirement. It's a suspend-up model, so it does things in a suspend-up way.

True, I was thinking about other methods a given coroutine object may have. Maybe this would allow us to avoid type erasure? I didn't think about it, tbh.
 

I see... one of the aims is to integrate coroutines with the future/async stuff... I am probably behind on all this. All async libs I looked at used the same approach -- the public API consists of a function (async_foo) that takes a callback. Then someone somewhere cranks the event loop, which eventually calls the "process_events()" library function, which in turn calls the aforementioned callback.

So, for this library to add coroutine support you'd have to create a second inline function (or macro), coro_foo(), that takes the address of the current coroutine's resume() method, registers it using async_foo() and suspends the current coroutine. I.e. in this design you still have to add a second version of your async_foo() -- a coroutine-aware wrapper coro_foo(). I see no fault in this approach -- everything is clear and clean.

Now, if the library instead exports "future<int> foo()" -- there is no need for coro_foo(), but the implementation becomes less efficient and more complex, because of type erasure, having to allocate memory, having to move values and other stuff (like synchronization). Am I right? Is it a good price for not having to add a (rather simple) coro_foo()?

It all depends: is the person calling the async routine the one who actually has the continuation? The `future.then`-style interface allows anyone at any time to hook a continuation into the process. Your `coro_foo` style requires that the exact caller be the one who provides the continuation.

And what if the continuation function itself needs a continuation? The caller has to provide that too. And what if the inner continuation needs to access something from the outer continuation? Well, you have to allocate that explicitly. And so forth.

How often will that happen? Your typical caller of a coroutine is probably another coroutine that awaits on it. In that case it works just fine. In other cases, the related boilerplate can be buried in some utility function(s).

 
`co_await`-style coding handles this with no manual intervention. Herb Sutter made a great presentation a while back (skip ahead to around 51 minutes) on the failings of explicit callback-style continuations through lambdas and such, and demonstrated how use of `await` can make asynchronous code look exactly like synchronous code.

I'll reproduce his example here, in case you're unwilling to watch the video.

Ok, now that was below the belt. I did watch it.


Here's the synchronous code. It reads from a given filename, appending `suffix` to each "chunk" in the file. It's based on one of Microsoft's asynchronous file IO APIs (note that I've slightly adjusted some of the code):
 
<lots of snipping> 

And your `coro_func` version would not work, due to the need of this code to continue itself. You'd have to write a lambda, heap-allocate it, and do a bunch of other stuff to make the explicit continuation code work.

Why wouldn't it work? coro_foo() will end up calling async_foo(), passing my current coroutine's resume() as a callback, and suspending (i.e. returning). The code will be practically the same. I am either fantastically missing something or we are not on the same page.


The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed. 

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Thank you, I will read it. But as I said -- it isn't what I was talking about.

How can you say it isn't what you were talking about when you haven't read it?

I meant that I did not propose a new design -- I presented an idea (for the current design) that may change it a bit, opening up some options.

--
Sincerely yours,
Michael.

Lee Howes

unread,
Mar 9, 2018, 12:30:31 PM3/9/18
to std-pr...@isocpp.org
> coroutine void coro_bar(...) { ... }
> class Foo {
>   virtual Awaitable bar() { return cb_(); }
>     coro_bar cb_;  // an instance of state-machine class generated from coro_bar's definition
> };

> does it make sense? I didn't think about it this way, but it should probably work.

You've added shared state. Now what happens if I want to do something like this:

Foo f;
thread t([&](){ await f.bar(); });
thread t2([&](){ await f.bar(); });
t.join();
t2.join();

I need two copies of cb_, or for cb_ to use heap allocation magic to hide that, or at the very least to synchronize the coroutine state.

 It is the same problem as with templates. 

It certainly is. It is a huge problem.
move the coroutine declarations into a separate TU and hide them behind a plain function.




Lee Howes

unread,
Mar 9, 2018, 12:33:24 PM3/9/18
to std-pr...@isocpp.org
Ok, enter sent the message. I wanted to also add:

> move the coroutine declarations into a separate TU and hide them behind a plain function.

but then I am back needing hooks like the coroutines TS provides - but unlike the coroutines TS I don't see from your example what type that plain function should return. So now, I wrap my coroutine in a free function foo.

something foo();

bar() {
  await foo()? I'm not sure what the syntax would be here for awaiting on a plain function... do I need a coroutine type defined here? 
}

Nicol Bolas

unread,
Mar 9, 2018, 12:38:20 PM3/9/18
to ISO C++ Standard - Future Proposals
On Friday, March 9, 2018 at 4:10:36 AM UTC-5, Michael Kilburn wrote:
On Thu, Mar 8, 2018 at 11:12 AM, Nicol Bolas <jmck...@gmail.com> wrote:
You are not being misunderstood. You're trying to equate all "stackless coroutine" proposals; you're claiming that they're all just minor variations of the same concept.

No, I don't. But you correctly noted that some of existing proposals already have "always visible" feature. That is ok -- I presented this idea in context of current proposal in order to enable options of:
- manually controlling generated state machine object (allocation, etc) to make heap elision optimization irrelevant
- changing the convention -- e.g. use co_nowait instead of co_await (see my answer to Todd)
- few others

If you don't like any of these options -- naturally, that idea is a no go. But maybe some of them can be appealing?

It's not a question of liking or disliking the options. These options are irrelevant for the use cases that the Coroutines TS is intended to work with. They force the system to move away from the optimal syntax for suspend-up-style programming and towards a suspend-down model.

The basic concepts you're describing are appealing; I implore you to read P0114. But they are not appropriate for this proposal. Your idea is trying to turn a suspend-up system into a suspend-down one.

They are not. Coroutines TS is not just creating a resumable function; it's a lot more than that. It's implementing a specific model of coroutines, which is different from the model you're defining.

If you want "just resumable functions", then you need to understand that this really is a completely different proposal with a completely different design from the Coroutines TS. It's not a slight modification of Coroutines TS (which is evidence in the fact that your design literally removes all of the Coroutines TS's keywords).

Keywords may be removed, but their intended effect stays unchanged.

How can something which no longer exists have an effect?

For example, how does one of your coroutines yield values? What's funny is that, since I've read P0114, I have a pretty good idea what your answer will be ;)

Your design is a suspend-down coroutine model. When your kind of coroutine yields, it always returns to its nearest non-coroutine caller, who is responsible for scheduling its resumption at some point. Coroutines TS is a suspend-up coroutine model: the code responsible for scheduling its resumption is the code inside the coroutine, not necessarily its caller. If the code inside the coroutine suspends to the caller, that's because the particular coroutine chooses to.

Generators are the classic case of suspend-down. That's why your design works especially well with them, and why the Coroutines TS works so poorly with them. Continuations however are a classic case of suspend-up. This is why your design requires lots of extra work to use them, while the Coroutines TS makes it look like synchronous code.

Coroutines TS will never be as good at generators as your idea is. But neither will your idea be as good at asynchronous code transformations as the Coroutines TS is.

Again, the idea presented doesn't fundamentally change anything in current proposal with respect to returned types (future/etc) or the way it suspends. All it does is exposes generated state machine class to every user, thus enabling certain options that designers may or may not take.

And by forcing every user to have to interact with such state (even if that interaction is just to pass it to a utility function), it makes it more difficult to deal with suspend-up programming.

I see... one of the aims is to integrate coroutines with future/async stuff... I am probably behind on all this. All async libs I looked at used same approach -- public API consists of a function (async_foo) that takes a callback. Then someone somewhere cranks the event loop which eventually calls "process_events()" library function that in turn calls aforementioned callback.

So, for this library to add coroutine support you'd have to create second inline function (or macro) coro_foo() that takes address of resume() method of current coroutine, registers it using async_foo() and suspends current coroutine. I.e. in this design you still have to add second version of your async_foo() -- a coroutine-aware wrapper coro_foo(). I see no fault is this approach -- everything is clear and clean.

Now, if library instead exports "future<int> foo()" -- there is no need for coro_foo(), but implementation becomes less efficient and more complex. Because of type erasures, having to allocate memory, having to move values and other stuff (like synchronization). Am I right? Is it a good price for not having to add a (rather simple) coro_foo()?

It all depends: is the person calling the async routine the one who actually has the continuation? The `future.then`-style interface allows anyone at any time to hook a continuation into the process. Your `coro_foo` style requires that the exact caller be the one who provides the continuation.

And what if the continuation function itself needs a continuation? The caller has to provide that too. And what if the inner continuation needs to access something from the outer continuation? Well, you have to allocate that explicitly. And so forth.

How often it will happen?

Then-style continuations have flexibility as a big part of their design. You're not required to put all continuations in a single call graph. You can pass `future`s or whatever to whomever wants to add to them, and they can add as many functions as they like, mutating the value with each added functor.

This happens as often as the user wants it to happen. It is not your place to interfere in that.

Your typical caller of a coroutine is probably another coroutine that awaits on it. In this case it works just fine. In others -- related boilerplate can be buried into some utility function(s).

Please read P0114; everything you just said is almost exactly what it says. A keyword to prevent awaiting on a coroutine within a coroutine. A keyword to explicitly declare a function to be resumable/coroutine. Having to use utility function(s) to support suspend-up programming. All of these are core aspects of P0114.

You're just using different words, but the design you've come up with is the same. And that's important because that design is very much not a minor change to Coroutines TS.

`co_await`-style coding handles this with no manual intervention. Herb Sutter made a great presentation a while back (skip ahead to around 51 minutes) on the failings of explicit callback-style continuations through lambdas and such, and demonstrated how use of `await` can make asynchronous code look exactly like synchronous code.

I'll reproduce his example here, in case you're unwilling to watch the video.

Ok, now that was below the belt. I did watch it.


Here's the synchronous code. It reads from a given filename, appending `suffix` to each "chunk" in the file. It's based on one of Microsoft's asynchronous file IO APIs (note that I've slightly adjusted some of the code):
 
<lots of snipping> 

And your `coro_func` version would not work, due to the need of this code to continue itself. You'd have to write a lambda, heap-allocate it, and do a bunch of other stuff to make the explicit continuation code work.

Why it wouldn't work? coro_foo() will end up calling async_foo() passing my current coroutine resume() as a callback and suspending (i.e. returning). Code will be practically the same. I either fantastically missing something or we are not on the same page.

Show me how the code would "be practically the same". Present the equivalent code using your idea.

And please note that it must be the equivalent code: based on `.then` style continuations and the like. So the return value still needs to be a `task<string>`. And the function itself must be a regular function (and thus cannot directly use your `coroutine` keyword).

The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed. 

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Thank you, I will read it. But as I said -- it isn't what I was talking about.

How can you say it isn't what you were talking about when you haven't read it?

I meant I did not propose a new design -- I presented an idea (for current design) that may change it a bit opening up some options.

You keep saying that it's not a new design, and yet all evidence says that it is. Here are a number of aspects of the design of Coroutines TS:

1. Coroutines appear to be regular functions using a regular interface. Function overloading works as normal with them.
2. Coroutines have a built-in and hidden state object with an internal return value object.
3. The internal functor that represents a coroutine is only visible to the code that absolutely needs to know it exists: the code which schedules its resumption.
4. The scheduling of the resumption of the coroutine is done entirely from within the coroutine itself. It may delegate this to the caller, but this is something it must explicitly choose to do.
5. There is direct and effortless support for suspend-up via the `co_await` operator and its associated machinery.

I would say that most of these are fundamental aspects of what makes the Coroutines TS what it is. Here is how your design compares:

1. Coroutines are not regular functions. You haven't explained how function overloading works with coroutines.
2. Coroutines do not have built-in and hidden state objects; they directly expose this state to the code talking about them.
3. The code calling a coroutine must interact with the internal functor that represents a coroutine.
4. Scheduling the resumption of the coroutine is always granted to the caller.
5. Suspend-up requires explicit effort from the caller. There is apparently no `co_await` operator at all.

It's not even clear if your system allows the user to create their own promise/future types. Coroutines TS allows this by default; the coroutine machinery inspects the coroutine function's signature to determine what the internal promise object will be (if your return value is std::future<T>, the promise type would be std::promise<T>). Yours seems to require the caller to decide what kind of promise/future will be used. So there's another difference.
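For reference, that mapping goes through the `coroutine_traits` customization point. Roughly (the `my::task`/`my_task_promise` names are placeholders, and the exact header/namespace spelling varies between implementations of the TS):

#include <experimental/coroutine>

namespace my { template <class T> struct task { /* ... */ }; }   // placeholder return type
template <class T> struct my_task_promise;                       // placeholder promise type

// By default the machinery looks for a nested ReturnType::promise_type; for a
// return type you don't control, coroutine_traits can be specialized instead.
// A std::future<T> -> std::promise<T> mapping is wired up the same way.
namespace std { namespace experimental {
    template <class T, class... Args>
    struct coroutine_traits<my::task<T>, Args...> {
        using promise_type = my_task_promise<T>;
    };
}}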

How can you say that these changes do not constitute a "new design" (and, as I keep reminding you, your design is almost exactly P0114)? You're not making a minor tweak to an existing proposal; you're fundamentally changing it. You even seem to understand that when you agreed with Todd's analogy with lambdas. An "always type-erased" lambda is very much a new design compared to a "not type-erased" lambda. They may do similar things, but they do them in a fundamentally different way.

To you and your use cases, this may not seem like a big change. But every use case you've presented is for suspend-down, not for suspend-up style coding. And suspend-up is what the Coroutines TS is all about.

So this very much is a new design.

Michael Kilburn

unread,
Mar 9, 2018, 3:14:05 PM3/9/18
to std-pr...@isocpp.org
On Fri, Mar 9, 2018 at 11:33 AM, Lee Howes <xri...@gmail.com> wrote:
On 9 March 2018 at 09:30, Lee Howes <xri...@gmail.com> wrote:
> coroutine void coro_bar(...) { ... }
> class Foo {
>   virtual Awaitable bar() { return cb_(); }
>     coro_bar cb_;  // an instance of state-machine class generated from coro_bar's definition
> };

> does it make sense? I didn't think about it this way, but it should probably work.

You've added shared state. Now what happens if I want to do something like this:

Foo f;
thread t([&](){ await f.bar(); });
thread t2([&](){ await f.bar(); });
t.join();
t2.join();

I need two copies of cb_, or for cb_ to use heap allocation magic to hide that, or at the very least to synchronize the coroutine state.

Ah, now I understand what you wanted to do. In this case:

coroutine void coro_bar(...) { ... }
class Foo {
    virtual Awaitable bar()
    {
        auto cb = new coro_bar(); // basically same thing current TS does under the hood
        return (*cb)();
    }
};
 

Ok, enter sent the message. I wanted to also add:
> move coroutine declarions into separate TU and hide them behind plain function.
but then I am back needing hooks like the coroutines TS provides - but unlike the coroutines TS I don't see from your example what type that plain function should return. So now, I wrap my coroutine in a free function foo.
something foo();
bar() {
  await foo()? I'm not sure what the syntax would be here for awaiting on a plain function... do I need a coroutine type defined here? 
}

coroutine void coro_foo() {...}

awaitable<void> bar()
{
    auto cb = new coro_foo();
    return (*cb)();
}

// somewhere in another TU
awaitable<void> bar();

co_await bar();


Sincerely yours,
Michael.

Michael Kilburn

unread,
Mar 9, 2018, 5:00:43 PM3/9/18
to std-pr...@isocpp.org
On Fri, Mar 9, 2018 at 11:38 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, March 9, 2018 at 4:10:36 AM UTC-5, Michael Kilburn wrote:
On Thu, Mar 8, 2018 at 11:12 AM, Nicol Bolas <jmck...@gmail.com> wrote:
You are not being misunderstood. You're trying to equate all "stackless coroutine" proposals; you're claiming that they're all just minor variations of the same concept.

No, I don't. But you correctly noted that some of existing proposals already have "always visible" feature. That is ok -- I presented this idea in context of current proposal in order to enable options of:
- manually controlling generated state machine object (allocation, etc) to make heap elision optimization irrelevant
- changing the convention -- e.g. use co_nowait instead of co_await (see my answer to Todd)
- few others

If you don't like any of these options -- naturally, that idea is a no go. But maybe some of them can be appealing?

It's not a question of liking or disliking the options. These options are irrelevant for the use cases that the Coroutines TS is intended to work with. They force the system to move away from the optimal syntax for suspend-up-style programming and towards a suspend-down model.

The basic concepts you're describing are appealing; I implore you to read P0114. But they are not appropriate for this proposal. Your idea is trying to turn a suspend-up system into a suspend-down one.

No, the idea is to force all coroutines to be inline and clearly mark them as coroutines. At minimum, everything else can stay the same -- return types, semantics, co_* keywords, etc.

 
They are not. Coroutines TS is not just creating a resumable function; it's a lot more than that. It's implementing a specific model of coroutines, which is different from the model you're defining.

If you want "just resumable functions", then you need to understand that this really is a completely different proposal with a completely different design from the Coroutines TS. It's not a slight modification of Coroutines TS (which is evidence in the fact that your design literally removes all of the Coroutines TS's keywords).

Keywords may be removed, but their intended effect stays unchanged.

How can something which no longer exists have an effect?

like this:
coroutine ... coro1() { ... }

coroutine ... coro2() {
    coro1();                        // we know coro1 is a coroutine (because coro1 is marked so) and we know it is being invoked from a coroutine(because coro2 is marked so)
                                    // therefore we may choose to await by default when calling it
    auto x = co_noawait coro1();    // ... and (for example) require a keyword for non-awaiting call (to distinguish between two calling modes)
}

again -- it is just an option; it allows us to avoid spelling out co_await on coroutine calls (while retaining co_await semantics). You may argue that having an explicit co_await is better for aesthetic (or some other) purposes -- I am fine with that.


For example, how does one of your coroutines yield values? What's funny is that, since I've read P0114, I have a pretty good idea what your answer will be ;)

same as it is done today -- co_yield


Why it wouldn't work? coro_foo() will end up calling async_foo() passing my current coroutine resume() as a callback and suspending (i.e. returning). Code will be practically the same. I either fantastically missing something or we are not on the same page.

Show me how the code would "be practically the same". Present the equivalent code using your idea.

And please note that it must be the equivalent code: based on `.then` style continuations and the like. So the return value still needs to be a `task<string>`. And the function itself must be a regular function (and thus cannot directly use your `coroutine` keyword).

My idea has nothing to do with all this -- here we were exploring the differences between a library that uses a callback-based API (it will have async_foo() and coro_foo() for each "foo") and a library that uses futures/tasks (which will have only one foo()). My note was that I don't see a particular advantage to the second approach. I didn't set the goal of reproducing .then-style continuations; in fact the question was "why would you use the futures approach if the other one is more efficient and can produce very similar client code?". Nevertheless, here is a blueprint of how it may look:

void async_foo(..., user_cb, user_cb_data);

inline string coro_foo(...)     // as mentioned before coro_foo has to be inline or macro (to be able to have access to coroutine that calls it)
{
    string ret;
    struct control_struct {
        ...
    } control(&ret, &current_coro.resume);
    
    auto cb = [](..., user_cb_data) {
        ((control_struct*)user_cb_data)->set_result_and_call_resume( make_string(...) );
    }; 
    async_foo(..., cb, &control);
    suspend_current_coro;
    return ret;
}

// user code:
coroutine string coro_user1(...)
{
  string chunk = coro_foo(...);
  while((chunk = coro_foo(...)).size()) ...;
  ...
}

i.e. it can be done without heap allocations, etc., and you still end up with the same simple user-side code. This can probably be optimized further -- a coroutine can have multiple resume() methods with different signatures (so that the data foo passes into the callback could be passed directly into the coroutine as part of resumption, removing the need for "string ret"). Or maybe some trick to avoid calling the "string ret" ctor -- that lambda can construct "ret" in-place.

but I'll repeat again -- this digression has nothing to do with my idea!!!!! It was an attempt to figure out how a library API (and coroutines) may look without using futures/etc.


The kind of design you've presented is exactly what the resumable expressions proposal used. Only it was a bit cleverer about it, such that you would just have a resumable function called "await" that could be overloaded for some "awaitable" type, which would do the scheduling and unpacking, returning the unpacked value once resumed. 

You really should look at that proposal; it's clearly what you want. And yes, it was looked at, but it didn't move forward past P0114R0.

Thank you, I will read it. But as I said -- it isn't what I was talking about.

How can you say it isn't what you were talking about when you haven't read it?

I meant I did not propose a new design -- I presented an idea (for current design) that may change it a bit opening up some options.

You keep saying that it's not a new design, and yet all evidence says that it is. Here are a number of aspects of the design of Coroutines TS:

1. Coroutines appear to be regular functions using a regular interface. Function overloading works as normal with them.
2. Coroutines have a built-in and hidden state object with an internal return value object.
3. The internal functor that represents a coroutine is only visible to the code that absolutely needs to know it exists: the code which schedules its resumption.
4. The scheduling of the resumption of the coroutine is done entirely from within the coroutine itself. It may delegate this to the caller, but this is something it must explicitly choose to do.
5. There is direct and effortless support for suspend-up via the `co_await` operator and its associated machinery.

I would say that most of these are fundamental aspects of what makes the Coroutines TS what it is. Here is how your design compares:

1. Coroutines are not regular functions. You haven't explained how function overloading works with coroutines.
2. Coroutines do not have built-in and hidden state objects; they directly expose this state to the code talking about them.
3. The code calling a coroutine must interact with the internal functor that represents a coroutine.
4. Scheduling the resumption of the coroutine is always granted to the caller.
5. Suspend-up requires explicit effort from the caller. There is apparently no `co_await` operator at all.

It's not even clear if your system allows the user to create their own promise/future types. Coroutines TS allows this by default; the coroutine machinery inspects the coroutine function's signature to determine what the internal promise object will be (if your return value is std::future<T>, the promise type would be std::promise<T>). Yours seems to require the caller to decide what kind of promise/future will be used. So there's another difference.

How can you say that these changes do not constitute a "new design" (and, as I keep reminding you, your design is almost exactly P0114)? You're not making a minor tweak to an existing proposal; you're fundamentally changing it. You even seem to understand that when you agreed with Todd's analogy with lambdas. An "always type-erased" lambda is very much a new design compared to a "not type-erased" lambda. They may do similar things, but they do them in a fundamentally different way.

To you and your use cases, this may not seem like a big change. But every use case you've presented is for suspend-down, not for suspend-up style coding. And suspend-up is what the Coroutines TS is all about.

So this very much is a new design.

Here I can only bang my head against the desk -- you've decided for yourself what I am trying to present and no amount of arguing can cause you to budge. I can only give up :-)

--
Sincerely yours,
Michael.

Nicol Bolas

unread,
Mar 9, 2018, 6:14:28 PM3/9/18
to ISO C++ Standard - Future Proposals
On Friday, March 9, 2018 at 5:00:43 PM UTC-5, Michael Kilburn wrote:
On Fri, Mar 9, 2018 at 11:38 AM, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, March 9, 2018 at 4:10:36 AM UTC-5, Michael Kilburn wrote:
On Thu, Mar 8, 2018 at 11:12 AM, Nicol Bolas <jmck...@gmail.com> wrote:
You are not being misunderstood. You're trying to equate all "stackless coroutine" proposals; you're claiming that they're all just minor variations of the same concept.

No, I don't. But you correctly noted that some of existing proposals already have "always visible" feature. That is ok -- I presented this idea in context of current proposal in order to enable options of:
- manually controlling generated state machine object (allocation, etc) to make heap elision optimization irrelevant
- changing convention -- e.g. use co_nowait instead of c_await (see my answer to Todd)
- few others

If you don't like any of these options -- naturally, that idea is a no go. But maybe some of them can be appealing?

It's not a question of liking or disliking the options. These options are irrelevant for the use cases that the Coroutines TS is intended to work with. They force the system to move away from the optimal syntax for suspend-up-style programming and towards a suspend-down model.

The basic concepts you're describing are appealing; I implore you to read P0114. But they are not appropriate for this proposal. Your idea is trying to turn a suspend-up system into a suspend-down one.

No, the idea is to force all coroutines to be inline and clearly mark them as coroutines. At minimum, everything else can stay the same -- return types, semantics, co_* keywords, etc.

It should be noted that the original coroutines design did indeed require a keyword on the coroutine function declaration/definition: `async`. This was dropped due to lack of need, since the outside world doesn't interact with coroutines in a different way from regular functions.

If "everything else can stay the same" about the Coroutines TS, then you can get the exact same effect of what you want by doing this:

[[coroutine]] inline future<int> some_coroutine(...) {...}

That is, if you want to encourage the compiler to generate nicer code by adding some syntax and declaring the function inline, then you are free to do so.

But this gains you nothing. After all, if "everything else can stay the same", then the coroutine handle is still type-erased. The coroutine state still has to be heap allocated, since there is no guarantee that the code being given the coroutine handle will be inline as well or otherwise visible (remember: the Coroutines TS's "semantics" are suspend-up; the caller does not necessarily get the handle). You still have to `co_await` or `co_yield` throughout your call stack to get to the eventual non-coroutine caller. And so forth.

So if this "at a minimum" suggestion is what you really want, then literally every problem you raise with the Coroutines TS remains unresolved. What's the point? It doesn't fix anything you believe is wrong.

Clearly, if you want to solve those problems, declaring these functions `inline` and slapping a keyword on them alone isn't going to get it done. You have to actually change the semantics of the Coroutines TS.

Which is exactly what you've suggested, even in your original post. So again, it's not clear why you're talking about some "at a minimum" that doesn't match what you've actually asked for.

For example, how does one of your coroutines yield values? What's funny is that, since I've read P0114, I have a pretty good idea what your answer will be ;)

same as it is done today -- co_yield

The behavior of the `co_yield` expression is based on the concept of a promise object, defined by the Coroutines TS. The type of the promise object of a coroutine is based on the signature of the coroutine function, using a customization point. This typically only bothers to look at the return type, but it could look at the full signature.

`std::future<int>`, when used as a coroutine return value, would have `std::promise<int>` be the promise object type. For some hypothetical `generator<int>` return type, you would need to provide a `promise_generator<int>` type to match. The current Coroutines TS proposal doesn't have one in the standard library, but if you're going to make a generator coroutine, that's how it would work.
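To make the mechanism concrete, here is roughly what such a generator and its promise could look like (sketch only; spelled with the TS's std::experimental names, whose exact header varies between implementations):

#include <experimental/coroutine>
#include <exception>
#include <utility>

namespace coro = std::experimental;

template <class T>
struct generator {
    struct promise_type {
        T current{};
        generator get_return_object() {
            return generator{coro::coroutine_handle<promise_type>::from_promise(*this)};
        }
        coro::suspend_always initial_suspend() { return {}; }
        coro::suspend_always final_suspend() { return {}; }
        // co_yield expr; is rewritten into a call to yield_value(expr).
        coro::suspend_always yield_value(T v) { current = std::move(v); return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }   // keep the sketch simple
    };

    coro::coroutine_handle<promise_type> h;

    explicit generator(coro::coroutine_handle<promise_type> h_) : h(h_) {}
    generator(generator&& other) noexcept : h(std::exchange(other.h, {})) {}
    generator(const generator&) = delete;
    ~generator() { if (h) h.destroy(); }

    bool next() { h.resume(); return !h.done(); }   // run to the next co_yield (or the end)
    T value() const { return h.promise().current; }
};

// The compiler finds generator<int>::promise_type from the declared return
// type and routes each co_yield in the body through yield_value().
generator<int> iota(int n) { for (int i = 0; i < n; ++i) co_yield i; }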

But your `coroutine`-labeled functions no longer return types like `future<int>` or `generator<int>`. This is evident by every example you've given here, including the very first one:

coroutine int mycoro(int a, char* b) { ... }  // has to be defined in declaration

That function returns an `int`.

And since you have said nothing about the nature of the coroutine promise object in your design, I must assume that your `coroutine`-qualified functions have no such concept. And if your design still has the promise object, then you need to explain how `mycoro` maps from returning `int` to `promise<int>` or `promise_generator<int>` would work, since literally everything in the Coroutines TS is based on the promise object type.

Because `co_yield`'s behavior is based on promise objects, it is no longer clear what `co_yield` does in your design. So I contest the idea that it is the "same as it is done today". Even if you keep the keyword, the actual behavior of `co_yield` under your design has radically changed.

In the current design, `co_yield` stores a value in the promise object by calling a specific function, then halts execution of the function through the promise object, putting the coroutine_handle to be resumed in the return value object which the caller then gets.

Since your design apparently rips out all of the stuff that `co_yield` is based on, I don't know what `co_yield` does in your design. And while what you want it to do might conceptually have similar effects, it will achieve these effects via completely different mechanisms. As such, this keyword will not be the "same as it is done today".

Here I can only bang my head against the desk -- you've decided for yourself what I am trying to present and no amount of arguing can cause you to budge. I can only give up :-)

Maybe because what you are trying to present changes with every post you make. Sometimes you say that we don't need `co_await` and instead use `co_noawait` or some such. Sometimes you say that we'd still have `co_await`. Sometimes you say that you just want inlining with no changes to anything else. Sometimes you say that a coroutine function is an object with some special properties that you never fully define.

It's impossible to talk concretely about a nebulous, moving target. All I can do is look at the code you present and try to deduce from it what you're trying to do.

For example, let's look at this piece of code:


    coroutine void coro_foo() {...}

    awaitable<void> bar()
    {
        auto cb = new coro_foo();
        return (*cb)();
    }

I can only hypothesize as to what this code is actually doing. You're invoking the `new` operator on what appears to be a value of type `void`. But since you've talked about coroutines exposing some kind of object to the user, I assume that using `()` on an identifier that is a coroutine function yields some kind of object. Or perhaps `coro_foo` is grammatically a typename (in which case, coroutines cannot be overloaded). Either one would work.

So there's this object that got heap allocated. And then you dereference the pointer to it, and invoke its `operator()` overload, returning the result of it. What does that do?

Again, I can only guess. There's a heap object that represents a coroutine that has not yet started execution. You invoke `operator()` on it, and return the result of that, constructing a thing called `awaitable<void>`. So  I'm guessing this expression kicks off the coroutine. But does it actually invoke the coroutine function at that point?

What does the operator() return? Is it in some way related to `(*cb)`? And if not, how does `cb` get deleted?

None of that is well-defined.

So here's an idea: write out an actual proposal. Not vague ideas, but a fully-fleshed-out design. And I mean something that is as fully defined as the current Coroutines TS. I should be able to read your proposal, look at any code that uses it, and know exactly what every expression is and how it will behave. And right now, I cannot say that about anything you've suggested.

I want to stop playing 20 questions with you about what you want. Lay out the entire scheme.

And no; simply saying "be inline and clearly mark them as coroutines" is not that. The above code is very far from that.