Resumable expressions p0114r0 vs async/await P0057R0

Germán Diago

Oct 3, 2015, 7:41:18 AM
to ISO C++ Standard - Future Proposals
Hello everyone,

I do not mean to start a flame here, but I am still wondering why the coroutines from
P0057R0 are still being considered.

For what it is worth, I find the paper from Christopher Kohlhoff very clarifying, very well
reasoned, and providing alternatives for all the important use cases from P0057R0 with
superior implementations.

I still share the same concerns as before for P0057R0, mainly:

- mandatory type erasure.
- as Christopher mentions, embedding a scheduler into the language is not a nice thing.
- viral await is also something to be aware of.


On top of that, he shows alternatives for implementations:

1. Generators (reified and type-erased).
2. await.



You also have yield as an object, which I think can be of advantage in many situations.

But my real question is, since I am not an expert:

1. Is there something that can be done with P0057R0 that simply cannot be done with resumable expressions + reasonable library support?

To me, await/async + an embedded scheduler is something like getting married to an implementation detail that
a run-time must support. The mandatory type erasure is not nice compared to being able to generate
what you would write by hand, a function object, which is what resumable expressions do.
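To make the "function object" metaphor concrete, here is a rough C++17 sketch of the kind of object meant here: a generator whose entire suspended state is one data member. It is not P0114 syntax (no shipping compiler accepts that), and the names `hello_gen` and `drain` are hypothetical. Because the state is stored inline in a concrete type, there is no heap allocation and no type erasure, so every call is a candidate for inlining.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical hand-written generator object (not P0114 syntax).
// The "coroutine state" is a single member, the cursor, stored inline:
// no heap allocation, no type erasure.
class hello_gen {
public:
    explicit hello_gen(const char* p) : p_(p) { assert(p_ != nullptr); }

    // Each call continues where the previous call stopped and "yields" the
    // next character, or std::nullopt once the string is exhausted.
    std::optional<char> operator()() {
        if (*p_ == '\0') return std::nullopt;
        return *p_++;
    }

private:
    const char* p_;  // the only state that must survive between calls
};

// Usage helper: drain the generator into a string.
inline std::string drain(hello_gen gen) {
    std::string out;
    while (auto c = gen()) out.push_back(*c);
    return out;
}
```

The object is exactly as large as the state it carries, which is the "strictly required space" property argued for in this thread.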

Am I missing anything here? As I say, my knowledge is quite limited in this area.



Nicol Bolas

Oct 3, 2015, 9:57:18 AM
to ISO C++ Standard - Future Proposals


On Saturday, October 3, 2015 at 7:41:18 AM UTC-4, Germán Diago wrote:
> Hello everyone,
>
> I do not mean to start a flame here, but I am still wondering why the coroutines from
> P0057R0 are still being considered.
>
> For what it is worth, I find the paper from Christopher Kohlhoff very clarifying, very well
> reasoned, and providing alternatives for all the important use cases from P0057R0 with
> superior implementations.

I am in no way deeply familiar with either of these two proposals. However, after skimming P0114, one thing seems very clear: P0057 is much farther along, in terms of actually creating an implementable standard.

The P0057 paper itself is actual wording, ready to be incorporated into the standard. Not only that, P0057 has actual, live implementation experience behind it. You can go get VS2015 right now and play with their implementation of a version of this functionality.

P0114 seems more... experimental. It sounds like something that has been discussed to some degree, but is as of yet lacking a proof-of-concept implementation. A lot is said about how it would be "possible" to implement some particular facet under their new rules. But the paper never claims that they've taken Clang or GCC or whatever and actually implemented it.

That's not to say that P0114 is dead and all effort should be focused on P0057. But however much you may find P0114 to be technically superior, P0057 has earned the right to be considered.

german...@hubblehome.com

Oct 3, 2015, 11:12:08 PM
to ISO C++ Standard - Future Proposals

> P0114 seems more... experimental. It sounds like something that has been discussed to some degree, but is as of yet lacking a proof-of-concept implementation. A lot is said about how it would be "possible" to implement some particular facet under their new rules. But the paper never claims that they've taken Clang or GCC or whatever and actually implemented it.

There is an experimental implementation.
 
> That's not to say that P0114 is dead and all effort should be focused on P0057. But however much you may find P0114 to be technically superior, P0057 has earned the right to be considered.

Well, to me P0057 violates the *zero-overhead principle*, in a way that the other proposal avoids, in my humble opinion. You do need boxing.
Besides that, it has other disadvantages, and I see a bit of a mistake, again, in my opinion, to embed a scheduler into the language, when you could do it in a library, as Christopher's paper shows.

But I am not on the committee, nor have I proposed anything. It just seems to me that Christopher's proposal is more lightweight and can do everything
that P0057 can do, and do it better.

Arash Partow

Oct 4, 2015, 12:01:52 AM
to std-pr...@isocpp.org
Nicol Bolas wrote:
>
> The P0057 paper itself is actual wording, ready to be incorporated into the
> standard. Not only that, P0057 has actual, live implementation experience
> behind it. You can go get VS2015 right now and play with their
> implementation of a version of this functionality.
>
> P0114 seems more... experimental. It sounds like something that has been
> discussed to some degree, but is as of yet lacking a proof-of-concept
> implementation.
>

I believe CK may have a POC implementation available - that is a set
of patches against clang that adds the various features needed eg:
'resumable' et al.



That said, no single proposal has yet discussed why a compliant
compiler will never be able to deduce such scenarios without extra
keywords and rearranging of code - Similar things have been done to
achieve tail call optimizations, why can this not be done with
coroutines?

Gor Nishanov

Oct 4, 2015, 12:44:48 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
> I see a bit of a mistake, again, in my opinion, to embed a scheduler into the language, when you could do it in a library, as Christopher's paper shows.

There is absolutely no embedded scheduler in P0057 and never was. P0057 and its predecessors provide syntactic sugar for common async and sync patterns and it is up to the library to decide what meaning to imbue the coroutine with.

I suggest looking at this presentation:

http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4287.pdf

which walks through some aspects of the P0057 proposal. Note that the await syntax is actually quite old. It first appeared as do-notation in Haskell in 1998, and you may notice that P0057 can be used to perform more general "monadic" transformations, not only coroutines.

Another thing that the presentation above highlights is that the proposed abstraction is unique: it is not just zero-overhead, it is negative overhead :-). That is, for some problems, taking well-written code that uses functions/callbacks and rewriting it using a higher-level abstraction, namely the coroutines proposed by P0057, will result in a simpler implementation, smaller object size and faster execution.

german...@hubblehome.com

Oct 4, 2015, 7:14:05 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Sunday, October 4, 2015 at 11:44:48 AM UTC+7, Gor Nishanov wrote:
>> I see a bit of a mistake, again, in my opinion, to embed a scheduler into the language, when you could do it in a library, as Christopher's paper shows.
>
> There is absolutely no embedded scheduler in P0057 and never was.

Hello Gor. If there is no scheduler, I do not understand how await can work. Forgive my ignorance; as I said above, I do not know the details. But
my understanding is that if you call await, the state of the suspended coroutine must be kept somewhere. Where is that state held?


 
> P0057 and its predecessors provide syntactic sugar for common async and sync patterns and it is up to the library to decide what meaning to imbue the coroutine with.
 
 
> Another thing that the presentation above highlights is that the proposed abstraction is unique: it is not just zero-overhead, it is negative overhead :-). That is, for some problems, taking well-written code that uses functions/callbacks and rewriting it using a higher-level abstraction, namely the coroutines proposed by P0057, will result in a simpler implementation, smaller object size and faster execution.

I do not yet see how it can achieve this negative overhead. The coroutines are even type-erased, as Chris's papers mention. What can be better than having inlinable, reified coroutines? I just do not get it.
 
I have three questions here:

1. How is the negative overhead achieved?
2. This would have negative overhead *compared* to an implementation with resumable expressions?
3. Do these optimizations are fancy? We have had good inliners for years, but it seems the coroutines from P0057 mandate
type erasure.

Sorry if I make any mistakes in my explanation; I am not an expert on these papers. I just happen to understand
Christopher's function-object metaphor quite well, and I find it very hard to see anything more performant than non-type-erased coroutines
that take only the space strictly required.


Regards

german...@hubblehome.com

Oct 4, 2015, 7:16:19 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
Oh, my English! I write too fast:

3. Do these optimizations are fancy? ----> Are these optimizations fancy?

Nicol Bolas

Oct 4, 2015, 9:26:37 AM
to ISO C++ Standard - Future Proposals
On Sunday, October 4, 2015 at 12:01:52 AM UTC-4, Arash Partow wrote:
> That said, no single proposal has yet discussed why a compliant
> compiler will never be able to deduce such scenarios without extra
> keywords and rearranging of code - Similar things have been done to
> achieve tail call optimizations, why can this not be done with
> coroutines?

... I don't understand what you mean.

Again, this may just be my ignorance on the details of these coroutine ideas, but I thought the idea of resumable functions (P0057), at its core, was the ability to halt the execution of a function (yield) and later return to where that function's execution was halted (resume). That is, to my understanding, the core feature.

How could the compiler deduce that I want to do that... without giving me syntax to actually do that (at which point it's not "deduction" anymore)?

You may be confusing P0057 with a pure await/async model that is bound to CPU threading and so forth. It's rather lower-level than that. And even that is not something the compiler can deduce, since it very much affects the apparent behavior of the program.

By contrast, proper tail calls are not apparent behavior. Well, not with regard to the standard, at any rate.

Vicente J. Botet Escriba

Oct 4, 2015, 11:44:22 AM
to std-pr...@isocpp.org, german...@hubblehome.com
Le 04/10/15 06:44, Gor Nishanov a écrit :
>>
>> I suggest to look at this presentation:
>>
>> http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4287.pdf
>>
>> which walks through some of the aspects of P0057 proposal. Note, that the
>> await syntax is actually quite old. It first appeared as do-notation in
>> Haskell in 1998 and you may notice that P0057 can be used to perform more
>> general "monadic" transformations and not only limited to coroutines.
Hmm, await cannot work with list as a monad, can it?
But no proposal is attempting to take care of this case.

Vicente

Nicol Bolas

Oct 5, 2015, 10:42:05 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Sunday, October 4, 2015 at 7:14:05 AM UTC-4, german...@hubblehome.com wrote:
> On Sunday, October 4, 2015 at 11:44:48 AM UTC+7, Gor Nishanov wrote:
>>> I see a bit of a mistake, again, in my opinion, to embed a scheduler into the language, when you could do it in a library, as Christopher's paper shows.
>>
>> There is absolutely no embedded scheduler in P0057 and never was.
>
> Hello Gor. If there is no scheduler, I do not understand how await can work. Forgive my ignorance; as I said above, I do not know the details. But
> my understanding is that if you call await, the state of the suspended coroutine must be kept somewhere. Where is that state held?

After reading the proposal a bit, it's clear that `await` does not actually "wait" on anything. For the most part, it's a syntactic transformation on an expression.

The expression that `await` applies to must result in an object that has a certain interface. And the logic for `await` calls that interface. If there is any scheduling logic going on, it is in the implementation of that interface, not in `await` itself.

As such, the storage in question is in the object resulting from the `await` expression.

And the closest to scheduling that `await` gets is the decision to check if the value is ready before yielding.

> Sorry if I make any mistakes in my explanation; I am not an expert on these papers. I just happen to understand
> Christopher's function-object metaphor quite well, and I find it very hard to see anything more performant than non-type-erased coroutines
> that take only the space strictly required.

How much more performant? Is it enough to be worth arguing about? After all, most things you'll be using await for won't be cheap operations. Will you actually notice any such performance loss?

That's not to say that I much like resumable functions as a proposal. I can't say I look forward to doing a bunch of `await`ing and using specialized return values just to allow some deeply nested function to perform a `yield` back to the original caller. And that's not even using threading.

That being said, I noticed one thing about resumable expressions that makes it a complete deal-breaker for me:

> When a resumable function is used in a resumable expression, the definition of the function must appear before the end of the translation unit.

Um, no. I understand that this requirement is not necessarily recursive. That is, you don't need the definition of every function the resumable one calls. However, if the resumable one you call itself makes a resumable call, you will need those definitions. And if they make resumable calls, you'll need those definitions. And so forth.

If it's a choice between forbidding inlining and forcing inlining, I'll accept the overhead of forbidding inlining.

Germán Diago

Oct 6, 2015, 7:34:03 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


> How much more performant? Is it enough to be worth arguing about? After all, most things you'll be using await for won't be cheap operations. Will you actually notice any such performance loss?

Well, this is the usual argument for using "productivity languages". As far as I know, the performance ideal of C++ is that between C++ and machine code, the only other choice should be assembly.
So far, that has served me well. Boxing is bad, bad, bad. I do not think it is a good idea in a language abstraction. About the scheduling, I am not sure, but I believe what you say for now :).
 
 
>> When a resumable function is used in a resumable expression, the definition of the function must appear before the end of the translation unit.

There is an example of a boxed generator with separate compilation in the paper. Doesn't that contradict what you are claiming?

 

Gor Nishanov

Oct 6, 2015, 10:04:25 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
> How much more performant? Is it enough to be worth arguing about? After all, most things you'll be using await for won't be cheap operations. Will you actually notice any such performance loss?

You should not expect any performance loss. When applied to concrete problems, you should expect the Coroutines proposal to be as fast as or faster than an equivalent solution using resumable expressions.

> If it's a choice between forbidding inlining and forcing inlining, I'll accept the overhead of forbidding inlining.

If coroutine lifetime is fully enclosed in the lifetime of the calling function, then we can
1) elide allocation and use a temporary on the stack of the caller
2) replace indirect calls with direct calls and inline as appropriate:

For example:

auto hello(char const* p) {
   while (*p) yield *p++;
}

int main() {
   for (auto c : hello("Hello, world"))
      putchar(c);
}

Should produce the same code as if you had written:

int main() {
   auto p = "Hello, world";
   while (*p) putchar(*p++);
}


Nicol Bolas

Oct 6, 2015, 10:05:11 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Tuesday, October 6, 2015 at 7:34:03 AM UTC-4, Germán Diago wrote:
>> How much more performant? Is it enough to be worth arguing about? After all, most things you'll be using await for won't be cheap operations. Will you actually notice any such performance loss?
>
> Well, this is the usual argument for using "productivity languages". As far as I know, the performance ideal of C++ is that between C++ and machine code, the only other choice should be assembly.

Yeah, tell that to iostream.

While that might be a goal of C++, it's not an overriding goal. It doesn't automatically pre-empt all other considerations.

> So far, that has served me well. Boxing is bad, bad, bad.

Why?

> I do not think it is a good idea in a language abstraction. About the scheduling, I am not sure, but I believe what you say for now :).
>
>>> When a resumable function is used in a resumable expression, the definition of the function must appear before the end of the translation unit.
>
> There is an example of a boxed generator with separate compilation in the paper. Doesn't that contradict what you are claiming?

So let me get this straight.

A resumable function, under this design, is inline. However, by making my function not resumable, I can get the effect of a non-inline resumable function by making the function body actually a lambda (which the compiler will deduce is a resumable function), and returning it in some object. And if I have allocation needs, I have to explicitly specify them in the body of every function that has those needs. And so forth.

Meaning that, in a rather common case, the user has to do a lot of work. Isn't the whole point of a compiler to do that sort of gruntwork for you? Why not make `resumable` do this "boxing" work for you, and have `inline resumable` do what the current thing suggests? After all, the current `resumable` implies `inline`, so we'd just be making it explicit.

So `inline resumable` means to do what it currently says. And `resumable` means to automatically do boxing and so forth. Or if you prefer `resumable` to mean the current case, introduce another keyword to have the compiler generate the "boxing" code for you.

Users should not have to do this nonsense manually, especially considering how common such code will be.
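A rough C++17 sketch of the "boxing" described above: the state machine is written as a lambda, and `std::function` type-erases it so that `make_hello` could be declared in a header and defined in another translation unit. `std::function` is a stand-in for whatever type-erased wrapper a library would provide; this is not P0114's syntax, and `char_source`, `make_hello` and `drain_boxed` are hypothetical names. The price is an indirect call per step and possibly a heap allocation inside `std::function`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical type-erased "box" for a character generator.
using char_source = std::function<std::optional<char>()>;

// Could be declared in a header and defined elsewhere: callers see only
// char_source, never the lambda's concrete type.
char_source make_hello(const char* p) {
    // The lambda is the state machine; its captured cursor is the saved state.
    return [p]() mutable -> std::optional<char> {
        if (*p == '\0') return std::nullopt;
        return *p++;
    };
}

// Each call through std::function is an indirect call.
std::string drain_boxed(char_source src) {
    assert(src);  // the box must hold a callable
    std::string out;
    while (auto c = src()) out.push_back(*c);
    return out;
}
```

This is the wrapper the user has to write by hand in every such function, which is the boilerplate being objected to.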

Nicol Bolas

Oct 6, 2015, 10:57:15 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Tuesday, October 6, 2015 at 10:04:25 AM UTC-4, Gor Nishanov wrote:
>> If it's a choice between forbidding inlining and forcing inlining, I'll accept the overhead of forbidding inlining.
>
> If coroutine lifetime is fully enclosed in the lifetime of the calling function, then we can
> 1) elide allocation and use a temporary on the stack of the caller
> 2) replace indirect calls with direct calls and inline as appropriate:

How exactly does std::experimental::generator<T> accomplish that? How can the object know that it is contained entirely in this way? After all, the promise type is what holds the state, and therefore the promise has to decide whether to statically or dynamically allocate memory, as well as how to handle the forwarding to the function to be resumed.

Is `generator` a magical compiler-only type, or could a user somehow implement these optimizations themselves? Or am I misunderstanding something about how all this works?

Gor Nishanov

Oct 6, 2015, 11:57:10 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Tuesday, October 6, 2015 at 7:57:15 AM UTC-7, Nicol Bolas wrote:
> How exactly does std::experimental::generator<T> accomplish that? How can the object know that it is contained entirely in this way? After all, the promise type is what holds the state, and therefore the promise has to decide whether to statically or dynamically allocate memory, as well as how to handle the forwarding to the function to be resumed.

There are only two magic types in this proposal: coroutine_traits, which lets the compiler figure out which promise_type describes the coroutine semantics, and coroutine_handle<P>, which is synthesized by the compiler to allow resumption and destruction of the coroutine.

If you look at the implementation of coroutine_handle in <experimental/resumable>, you will notice the following two members:

void coroutine_handle::resume() { _coro_resume(_Ptr); }
void coroutine_handle::destroy() { _coro_destroy(_Ptr); }


_coro_resume and _coro_destroy are intrinsics implemented in our optimizer. After inlining into main, the optimizer will observe the following sequence:

$fp = _coro_alloc_elision() ? alloca(_coro_frame_size()) : operator new (_coro_frame_size()); // frame size of hello$ coroutine
...
_coro_resume($fp)
...
_coro_destroy($fp); <-- here

Now the optimizer can reason about the lifetime, and also about which function _coro_resume($fp) and _coro_destroy($fp) go to.
Since $fp does not escape in this example, the optimizer replaces _coro_alloc_elision with 1; thus, allocation is done via alloca(constant), which the optimizer turns into a normal automatic variable. _coro_resume and _coro_destroy are replaced with direct calls to hello$resume_coro, which after inlining will lead to what I showed in my previous post.

I talked to Clang implementers and they are planning to add this optimization too. Note that it is explicitly allowed by

P0057/[dcl.fct.def.coroutine]/8 A coroutine may need to allocate memory to store objects with automatic storage duration local
to the coroutine. If so, it shall obtain the storage by calling an allocation function (3.7.4.1).
The allocation function’s name is looked up in the scope of the promise type of the coroutine.
If this lookup fails to find the name, the allocation function’s name is looked up in the global
scope...


In other words, when allocation is needed, this is how the compiler figures out which allocation functions to use. But if the compiler does not need to allocate, it does not have to.

Germán Diago

Oct 6, 2015, 1:32:31 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
> Why not make `resumable` do this "boxing" work for you, and have `inline resumable` do what the current thing suggests?

Because you can provide library solutions for the boxing in the proposals, without embedding mandatory boxing into the feature.
Just provide a generator<int> and you are done. Nothing prevents you from providing library types that box, but
the feature does not *force* boxing on you from the beginning.

I cannot see how resumable expressions are worse than await when:

1. it can still provide types for boxing on the library side.
2. it can emulate async from a library.
3. no viral await when refactoring.
4. it does *not* mandate boxing.
5. it can be embedded as a member variable. Maybe even relax restrictions for copy/move as needed in later proposals.
6. it does not need the fancy escape analysis that await currently needs, which, as of today, is a Microsoft-specific compiler optimization.
 

> Users should not have to do this nonsense manually, especially considering how common such code will be.

Just providing a library type solves the full problem. 

Gor Nishanov

Oct 6, 2015, 4:00:28 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Tuesday, October 6, 2015 at 10:32:31 AM UTC-7, Germán Diago wrote:
>> Why not make `resumable` do this "boxing" work for you, and have `inline resumable` do what the current thing suggests?
>
> Because you can provide library solutions for the boxing in the proposals, without embedding mandatory boxing into the feature.

Germán:

On Boilerplate:

The starting point for my proposal was lambda-*: a lambda that keeps all of the objects with automatic storage duration from its body in the lambda function object. I wanted an abstraction more fundamental than what was offered by the earlier await proposals (pre-N4134) and by await in other languages, but one on top of which I could efficiently build C#-like await syntax. After running around with that idea for a month, I came to the conclusion that, when applied to concrete problems, it requires more boilerplate code and does not result in more efficient code. Moreover, in those cases where you don't need to allocate your lambda* on the heap and can put it on the stack, I could elide the heap allocation in the optimizer for N4134. Hence, I tabled lambda* until better times.

Chris's proposal suffers from the same problem as lambda*. It requires you to write more code as a user without providing tangible benefit. As I said before, for any concrete problem, you will get the same or more efficient code with my proposal than with Chris's. Thus, there is no justification for the added complexity.

Look at the async state machine problem which I discussed in http://wg21.link/N4287. Or look at http://wg21.link/P0055, which explains how await SomeAsyncOp is expanded. Now compare what it takes to get from await f() in my proposal, or from await(f()) or f(use_await) in P0114R0, to the actual OS call. Abstraction overhead is lower in P0057.

On Optimizations

C++ is a language that offers an ability to create zero-overhead abstractions (or negative overhead in case of P0057). However, zero-overhead part comes from the optimizer. When STL was first proposed in 1994, no compiler in the world could make it efficient. It took more than ten years before compilers caught up. Optimization technology is fundamental to C++ abstractions.

Await or Not

If you look at http://wg21.link/P0054, you will find a section "Exploring design space" which sketches out how you can evolve P0057 to add the "magic" so that you don't have to write awaits. However, I am not sure that absence of explicit indication of suspend points is a good thing, but, may get convinced otherwise in the future.

Cheers,
Gor

Nicol Bolas

Oct 6, 2015, 9:48:07 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Tuesday, October 6, 2015 at 1:32:31 PM UTC-4, Germán Diago wrote:
>> Why not make `resumable` do this "boxing" work for you, and have `inline resumable` do what the current thing suggests?
>
> Because you can provide library solutions for the boxing in the proposals, without embedding mandatory boxing into the feature.
> Just provide a generator<int> and you are done. Nothing prevents you from providing library types that box, but
> the feature does not *force* boxing on you from the beginning.

Perhaps you misunderstood what I meant when I was talking about "boxing".

I'm not talking about `generator<T>`. I'm talking about the lambda wrapper part, with optional allocator and whatnot. Having to write `return [=](){<actual function>};` around every resumable function I write is a pain that I would rather not have to deal with.

There is absolutely no reason why that boilerplate can't be written by the compiler.

> I cannot see how resumable expressions are worse than await when:

"Worse" is ultimately a matter of opinion. Some restrictions will be considered worse by some people than others.

However, I have to say that Gor Nishanov seems to be winning the argument here, since P0057 is being implemented efficiently, with all the inlining and other simplifications that you claimed were not possible. That was your biggest argument against resumable functions, and it turned out to be wrong in at least some of the cases. You can try to dismiss the fact that a good optimizer made the code equivalent if you like, but that doesn't change the fact that optimizers aren't getting worse over time. They're getting better.

If there is no objective performance difference between them, then most of your case just evaporated. So the only remaining functional difference is that P0114 requires explicit "boxing" if your code needs boxing.

If you're going to bring up having to use `await` frequently:

> 3. no viral await when refactoring.

You say that as though resumable expressions don't have their viral aspects too. You can only call a resumable function as part of a resumable context: either the calling function is resumable or the expression making that call is resumable. Both of these require explicit annotation (except in those cases where the compiler magically works it out for you... somehow).

Oh sure, you won't be using `resumable` like you would `await`, or quite as much. But the fact is, you can't call a `resumable` function unless you've typed `resumable` somewhere nearby. So you still have to annotate up your call graph. And you still can't call coroutines of either kind without some kind of annotation.

So I'm not seeing how that's a point in resumable expressions' favor.

Oh, and I have to agree with P0054: I think I'd rather see awaiting and yielding happen explicitly than have them be implicit. Also, I seem to recall that an explicit `await` was something that the `expected` guys wanted to be able to key off of. So an explicit keyword also lets the functionality be used for other purposes.

Nicol Bolas

Oct 6, 2015, 10:10:14 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Tuesday, October 6, 2015 at 1:32:31 PM UTC-4, Germán Diago wrote:

> 2. it can emulate async from a library.

Just FYI: `async` doesn't exist, and hasn't existed in the resumable functions proposals for quite some time.

Germán Diago

Oct 6, 2015, 11:44:49 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com

Sorry, I meant await. But it is good that it has disappeared.

Germán Diago

Oct 7, 2015, 12:00:34 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Wednesday, October 7, 2015, 3:00:28 (UTC+7), Gor Nishanov wrote:
> On Tuesday, October 6, 2015 at 10:32:31 AM UTC-7, Germán Diago wrote:
>>> Why not make `resumable` do this "boxing" work for you, and have `inline resumable` do what the current thing suggests?
>>
>> Because you can provide library solutions for the boxing in the proposals, without embedding mandatory boxing into the feature.
>
> Germán:
>
> On Boilerplate:
>
> Chris's proposal suffers from the same problem as lambda*. It requires you to write more code as a user without providing tangible benefit.

Well, I am not sure what boilerplate you are talking about. Chris's proposal is more "low-level". But you can build on top of it, in a library, everything
that can be done with await, can't you? Also, implementing resumable expressions in a compiler cannot be that hard,
and you can reuse existing optimizer technology: basically, the inliner.
In the C++ style of things, where we are starting to use things as "Regular" objects and so on, I find the function object metaphor very clear.
But I also see some benefits; please tell me how they compare to your proposal, because I understand Chris's proposal better, so I could be wrong:

About the negative overhead: I saw your slides and it is impressive, I must admit. But:

1. I do not understand how negative overhead is achieved.
2. Can that negative overhead *not* be achieved with Chris's proposal? Is it exclusive to your implementation?


In Chris's proposal:

1. You can also implement await. It is a matter of providing a library solution.
2. You can embed a resumable expression as a member.
3. You do not need a yield keyword; yield is reified in an object and is simple to implement. You could also save this state somewhere, as an object. Can this be done with await?
4. What is the space overhead of await? As far as I know, resumable expressions need only the strictly necessary space. Do we need optimizations to eliminate this overhead? Can that be done?
5. Reified (not type-erased) resumable expressions: is this possible in your proposal? I think it was mentioned before that the optimizer can discover this and inline when needed, anyway? It can also do escape analysis.
6. Type-erased resumable expressions. This is opt-in in Chris's proposal. It is mandatory in your proposal, right? You rely on the optimizer to eliminate this overhead?



Germán Diago

Oct 7, 2015, 12:09:28 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


3. No viral await when refactoring.

You say that as though resumable expressions don't have their viral aspects too. You can only call a resumable function as part of a resumable context: either the calling function is resumable or the expression making that call is resumable. Both of these require explicit annotation (except in those cases where the compiler magically works it out for you... somehow).

Oh sure, you won't be using `resumable` like you would `await`, or quite as much. But the fact is, you can't call a `resumable` function unless you've typed `resumable` somewhere nearby. So you still have to annotate up your call graph. And you still can't call coroutines of either kind without some kind of annotation.

I think that comparison is simply not honest: if you put an await 7 levels down the stack, you need to decorate every level above it with await. With resumable, you would need to do it at the 7th level only, and you
would not need to refactor the rest of the code. That is 7 refactorings vs. 1. Not to mention the reusability problem that Chris exposes in the paper: you cannot reuse algorithms, for example,
with await. You cannot have member variables with await either, right? These are all tangible benefits of having a function object as the representation.
 

So I'm not seeing how that's a point in resumable expression's favor.

You can see it: imagine a deep stack of calls. How much refactoring do you need in each of the proposals? Imagine code reuse: resumable expressions can reuse code.
We cannot say the same about await, *unless* I missed something. In the C++ style, I think resumable expressions are better behaved than await, in the sense that
each one is just a function object: you know what it is doing, you could make it (maybe in future proposals) copyable and movable, and you know the representation: jump point + strictly needed data.
I think the resumable expressions proposal sets the bar very high for the other proposals, because besides its own benefits, you can also implement
what the other proposals are proposing.

 

Oh, and I have to agree with P0054: I think I'd rather see awaiting and yielding happen explicitly than have them be implicit.

For me, implicit yielding is precisely what a non-object-based yield does. In Chris's proposal, a yielder is an object you can pass around,
and you can build more abstractions on top of it. He got it right: he did not hide anything in a way that prevents any functionality.

I see Gor's proposal as powerful too, but I think it hides too much. Actually, that is not the main point; the point is that you can do
what that proposal does on top of resumable expressions, unless I am misunderstanding something.
 

 

Germán Diago

Oct 7, 2015, 12:17:08 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com

Await or Not

If you look at http://wg21.link/P0054, you will find a section "Exploring design space" which sketches out how you can evolve P0057 to add the "magic" so that you don't have to write awaits. However, I am not sure that the absence of an explicit indication of suspend points is a good thing, though I may be convinced otherwise in the future.


Would this make other code as reusable as with Chris's proposal? That would be a good thing then: even if you want to be explicit, you would not need to infect everything up the stack with await.
I favor this, sure.

Nicol Bolas

Oct 7, 2015, 10:39:07 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Wednesday, October 7, 2015 at 12:09:28 AM UTC-4, Germán Diago wrote:
3. No viral await when refactoring.

You say that as though resumable expressions don't have their viral aspects too. You can only call a resumable function as part of a resumable context: either the calling function is resumable or the expression making that call is resumable. Both of these require explicit annotation (except in those cases where the compiler magically works it out for you... somehow).

Oh sure, you won't be using `resumable` like you would `await`, or quite as much. But the fact is, you can't call a `resumable` function unless you've typed `resumable` somewhere nearby. So you still have to annotate up your call graph. And you still can't call coroutines of either kind without some kind of annotation.

I think that comparison is simply not honest: if you put an await 7 levels down the stack, you need to decorate every level above it with await.

The way `await` works is that it halts the current function and returns control to the calling function in such a way that the calling function can resume it later.

Therefore, the only reason you would need to "put await 7 levels down the stack" is if you want every function in that call graph to halt when the top-most function does, and thus return control to the caller "7 levels down". Correct?

If so:
 
With resumable, you would need to do it at the 7th level only, and you
would not need to refactor the rest of the code.

That's not true.

A function marked `resumable` can only be called from a resumable context. This is either another function marked `resumable` or from an expression marked `resumable`.

So this is illegal:

resumable int level3()
{
    return 5;
}

int level2()
{
    return level3(); // Cannot call a resumable function here.
}

Therefore, every one of those 7 levels is going to have to be a `resumable` function. So every one of those levels will have to mark their signature with `resumable`. Any code that calls into any one of those 7 levels will have to mark each use of them with `resumable`, or will themselves have to be coroutines.

So yes, it's just as viral. Only it's worse, because not only do you have to mark them `resumable`, they must be inline.

The only saving grace you get is that the proposal allows automatic deduction of resumable functions. But not everywhere; only in template code and lambdas. So normal functions don't provide this feature.

That is 7 refactorings vs. 1. Not to mention the reusability problem that Chris exposes in the paper: you cannot reuse algorithms, for example,
with await.

That helps demonstrate the viral nature of resumable expressions. If I call an algorithm that internally does an implicitly resumable operation, then that algorithm internally becomes a coroutine. It becomes resumable.

Which means... I now must call that algorithm instantiation from a resumable context. So either my function itself is `resumable`, or I have to say `resumable for_each(...)`.

This doesn't invalidate your point, namely that algorithms will deduce how to properly be `resumable` for their contents. The exact suspend/resume points will not be defined by the writer of the algorithm, but by the functions the algorithm actually calls. And there is value to that.

At this point, that is basically the only advantage of resumable expressions. It's a non-trivial thing to be sure, but I don't see anything about resumable functions that would prevent you from addressing these concerns there.

What resumable functions lack relative to resumable expressions are two things:

1) A way for a function to effectively force the caller to become a coroutine (implicit await).

2) A way for the caller of a function to reverse the implicit `await` of a function call (that's what `resumable` applied to expressions does).

These features are all it takes to allow for the kind of template code reuse you're talking about. Though it does make it slightly more inconvenient than the resumable expressions model, since RE coroutines don't have return type requirements.

But neither of these is impossible with the resumable functions model. It's simply a matter of finding the best way to add those features in.

And of deciding if we want them at all (that's not necessarily a given).

You cannot have member variables with await either, right?

I don't know what you mean by this.
 
These are all tangible benefits from having a function object as a representation.

... huh? Those benefits have nothing to do with "having a function object as a representation." Those benefits come from having to declare whether a function is a coroutine at the function level, rather than in the function's implementation.

So I'm not seeing how that's a point in resumable expression's favor.

You can see it: imagine a deep stack of calls. How much refactoring do you need in each of the proposals? Imagine code reuse: resumable expressions can reuse code.

Code reuse in template functions, perhaps. Code reuse elsewhere? Not so much.
 
We cannot say the same about await, *unless* I missed something. In the C++ style, I think resumable expressions are better behaved than await, in the sense that
each one is just a function object: you know what it is doing, you could make it (maybe in future proposals) copyable and movable, and you know the representation: jump point + strictly needed data.
I think the resumable expressions proposal sets the bar very high for the other proposals, because besides its own benefits, you can also implement
what the other proposals are proposing.

Yes, you could implement resumable functions on top of resumable expressions. But that doesn't prove resumable expressions are better. Nor does it prove that anything you could build atop resumable expressions cannot also be built atop resumable functions.

The thing is that there is one really important fact that cannot be gotten around.

Resumable functions are a pretty well-proven concept. We have multiple implementations of them, apparently. We have experience in implementing them, with an evaluation of optimization opportunities. We have actual standard wording for it.

Can you tell me that resumable expressions have anywhere near the field experience?

Nicol Bolas

Oct 7, 2015, 11:21:59 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com

Actually, I misunderstood two things about resumable expressions, both of which lead me to believe that P0114 is broken. Though not irreparably.

1) Implicit resumable deduction.

Apparently, implicit resumable deduction happens more often than I thought. It happens on `inline` functions, member functions in class definitions, and so forth. Indeed, pretty much the only place where resumable deduction is not implicit... is in normal functions in .cpp files.

Which leads to breakage #1:

// Some header file.
inline int a_func()
{
    return 5;
}

inline int b_func()
{
    break resumable;
    return 5;
}

// Some cpp file.

static void func()
{
    auto x = a_func() + b_func();
    internal_global_var += x;
}

void caller()
{
    func();
}

OK, where will the compile error point? Well, the compile error will cite the first line of `func`. Which seems OK. You're calling a resumable function from a non-resumable context, and the function isn't inline, so it won't be implicitly deduced. That sounds legitimate.

But wait: what happens if you stick `inline` in front of `func`? It's a static function, so making it `inline` seems harmless (though silly). Yet suddenly... the error moves. Now it points at `caller`.

Why? The last time anyone mentioned `resumable` was 2 function levels below `caller`, and in a completely different file. All because someone thought they'd help the compiler out with inlining by (mis)using the `inline` keyword.

The distance between where the last `resumable` was and where the improper use of a resumable function call is (ie: where the error actually happened) should be exactly one. That is, one call. The compiler should be able to point at the call, and the user should be able to see, from the call itself, that this function can't be called in this way.

I should not have to look at the implementation of your code, and the code you call, ad infinitum, before I'm able to decide whether and/or how I can call your function.

If the question is this: should it be possible to accidentally write a coroutine by calling a coroutine? My answer to that should be "only if the caller can easily see that they've done so." And the only way to do that is to put something in the function signature that says "hey, I'm a coroutine; if you call me, so are you."

For resumable expressions, that's spelled `resumable`. And therefore, every resumable function ought to be explicitly tagged as such.

Of course, taking away implicit `resumable` deduction breaks the marquee feature of resumable expressions: the ability for templates to become coroutines based on what template parameters they're given.

I don't care. The downsides of this approach are not worth the advantages.

`resumable` may not be "viral". But it is just as infectious as await. And a silent infection is far more pernicious than a noisy one.

2) Implicit `resumable` deduction oversight.

I must assume that this is an oversight. Otherwise, the proposal makes no sense.

The proposal states that a function is implicitly resumable if it calls `break resumable` or calls any resumable function. Period.

What about calling a resumable function within a resumable expression? IE: `resumable auto x = resumable_func()`. No exception is listed for this; calling `resumable_func` makes the calling function resumable, period. And if it's not in an implicitly resumable deduction context, it is a compile error.

I have to assume that this was simply a mistake. That the section on implicit deduction was meant to have allowances for resumable expressions, that any functions called in such an expression effectively don't count. Otherwise, main itself will have to be resumable if you call any resumable function.

german...@hubblehome.com

Oct 7, 2015, 11:37:45 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


Therefore, the only reason you would need to "put await 7 levels down the stack" is if you want every function in that call graph to halt when the top-most function does, and thus return control to the caller "7 levels down". Correct?

If so:
 
With resumable, you would need to do it at the 7th level only, and you
would not need to refactor the rest of the code.

That's not true.

As I understood it (maybe I am wrong), the function will suspend internally.
 

A function marked `resumable` can only be called from a resumable context. This is either another function marked `resumable` or from an expression marked `resumable`.

So this is illegal:

resumable int level3()
{
    return 5;
}

int level2()
{
    return level3(); // Cannot call a resumable function here.
}

Therefore, every one of those 7 levels is going to have to be a `resumable` function. So every one of those levels will have to mark their signature with `resumable`. Any code that calls into any one of those 7 levels will have to mark each use of them with `resumable`, or will themselves have to be coroutines.

 
So yes, it's just as viral. Only it's worse, because not only do you have to mark them `resumable`, they must be inline.

I am not sure about this limitation. I will take a look again.

 
The only saving grace you get is that the proposal allows automatic deduction of resumable functions. But not everywhere; only in template code and lambdas. So normal functions don't provide this feature.

Now I understand why it works. I did not catch this at first.
 

That is 7 refactorings vs. 1. Not to mention the reusability problem that Chris exposes in the paper: you cannot reuse algorithms, for example,
with await.

That helps demonstrate the viral nature of resumable expressions. If I call an algorithm that internally does an implicitly resumable operation, then that algorithm internally becomes a coroutine. It becomes resumable.

Yes, true. Though it is still more reusable than await in this regard.
 

Which means... I now must call that algorithm instantiation from a resumable context. So either my function itself is `resumable`, or I have to say `resumable for_each(...)`.

This doesn't invalidate your point, namely that algorithms will deduce how to properly be `resumable` for their contents. The exact suspend/resume points will not be defined by the writer of the algorithm, but by the functions the algorithm actually calls. And there is value to that.

Agree.
 

At this point, that is basically the only advantage of resumable expressions. It's a non-trivial thing to be sure, but I don't see anything about resumable functions that would prevent you from addressing these concerns there.

What resumable functions lack relative to resumable expressions are two things:

1) A way for a function to effectively force the caller to become a coroutine (implicit await).

2) A way for the caller of a function to reverse the implicit `await` of a function call (that's what `resumable` applied to expressions does).

These features are all it takes to allow for the kind of template code reuse you're talking about. Though it does make it slightly more inconvenient than the resumable expressions model, since RE coroutines don't have return type requirements.

 
But neither of these is impossible with the resumable functions model. It's simply a matter of finding the best way to add those features in.

It would be nice to have those.
 

And of deciding if we want them at all (that's not necessarily a given).

You cannot have member variables with await either, right?

I mean a member variable that holds a resumable expression: you can have that, because it is a function object.
Can this be done with resumable functions? As far as I understand (but again, you seem
to understand the proposal better than I do), you can only store the result, not the object itself.
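A sketch of the point being made, under the assumption that a resumable expression lowers to something like a hand-written function object (the names `greeter_task` and `session` are hypothetical, not from either proposal): the suspended computation itself, not just its eventual result, sits inside another object as an ordinary data member.

```cpp
#include <cassert>
#include <string>

// Hypothetical resumable task written by hand as a function object: it
// produces one word per resumption.
class greeter_task {
public:
    // Returns false once there is nothing more to produce.
    bool resume(std::string& out) {
        switch (step_) {
        case 0: out = "hello"; step_ = 1; return true;
        case 1: out = "world"; step_ = 2; return true;
        }
        return false;
    }

private:
    int step_ = 0;  // jump point; no other state needs to survive
};

// Because the task is an ordinary object, it can be a data member, and the
// enclosing class decides when to resume it.
class session {
public:
    std::string next_or_empty() {
        std::string word;
        return task_.resume(word) ? word : std::string();
    }

private:
    greeter_task task_;  // the suspended computation itself, stored inline
};
```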

 

I don't know what you mean by this.
 
These are all tangible benefits from having a function object as a representation.

If you have a function object, we understand how to store it reified (non-type-erased), and how to extend it to
a copyable, movable object when that makes sense. That is what I meant.
 
... huh? Those benefits have nothing to do with "having a function object as a representation." Those benefits come from having to declare whether a function is a coroutine at the function level, rather than in the function's implementation.
 
Code reuse in template functions, perhaps. Code reuse elsewhere? Not so much.

The whole STL is not a small thing to dismiss... not to mention all the template libraries out there in the wild.
 

Yes, you could implement resumable functions on top of resumable expressions. But that doesn't prove resumable expressions are better. Nor does it prove that anything you could build atop resumable expressions cannot also be built atop resumable functions.

Maybe it is just me, but I find resumable expressions very easy to translate, in my head, into what they really are.
I cannot say the same about await. It has its merits too, sure, but resumable expressions seem simpler to me,
and everything else seems to be composable on top of them.
 

Can you tell me that resumable expressions have anywhere near the field experience?

Well, the closest thing we have is creating function objects with stackless coroutines: http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/overview/core/coroutine.html

That is a function object with a resumable feature. I think what a resumable expression does fits in my head
the same way a lambda does.
I cannot say the same when I reason about await, at least not so easily. And I am still not convinced about
the fact that you cannot hold it as an object (am I right?) but can only get its result, nor about the fact that you must type-erase it
(sure, there are optimizations for that, but for now are they only in the MS compiler?).
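To make the Boost.Asio comparison concrete without relying on Asio's own macros, here is a sketch of the same pattern: a function object that records its jump point and posts a copy of itself as its own completion handler. `event_loop` and `two_step_op` are hypothetical, the "asynchronous operations" are simulated by posting to a queue, and none of this is Asio's API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded event loop (hypothetical; not Boost.Asio's API).
struct event_loop {
    std::deque<std::function<void()>> ready;

    void post(std::function<void()> f) { ready.push_back(std::move(f)); }

    // Runs handlers until none are left; handlers may post more work.
    void run() {
        while (!ready.empty()) {
            auto f = std::move(ready.front());
            ready.pop_front();
            f();
        }
    }
};

// Function object that re-enters itself as its own completion handler,
// in the spirit of asio::coroutine but with a plain switch.
struct two_step_op {
    event_loop* loop;
    std::vector<std::string>* trace;
    int state;  // jump point, copied along with the handler

    void operator()() {
        switch (state) {
        case 0:
            trace->push_back("start first operation");
            state = 1;
            loop->post(*this);  // "suspend": completion will invoke a copy of us
            return;
        case 1:
            trace->push_back("first done, start second");
            state = 2;
            loop->post(*this);
            return;
        case 2:
            trace->push_back("all done");
            return;
        }
    }
};
```

Because the jump point lives in the object, the suspended operation is just a value that the loop can hold and invoke, which matches the lambda-like intuition above.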

Thank you very much for your feedback, it made me understand a few points I was wrong about.
 

Gor Nishanov

Oct 7, 2015, 1:31:48 PM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
German:

If you noticed, the theme of my answers to your questions was to move you away from a feature-list-style comparison and toward looking at how each is reduced to practice.

Alex Stepanov said many insightful things and one of them was: "I still believe in abstraction, but now I know that one ends with abstraction, not starts with it. I learned that one has to adapt abstractions to reality and not the other way around." (http://web.archive.org/web/20071120015600/http://www.research.att.com/~bs/hopl-almost-final.pdf page 18).

I suggested earlier taking some concrete example, such as tcp_reader, writing it both ways, and analyzing it from three angles:

1) how much code a user has to write to use this abstraction to solve the problem
2) how much code a library/framework developer has to write to support this abstraction
3) what the abstraction penalty is: how many instructions (after optimization) need to be executed to get from the abstraction to the hardware.

I believe approaching the comparisons in this way will help you discover the answers to the questions you seek.

On hidden magic:

The coroutine proposal is similar to range-based for: the compiler provides the syntactic sugar, and the magic is the idea of an iterable, which allows the compiler to communicate with the library.
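The range-based for analogy can be made concrete. The compiler writes the loop, but every step goes through a small library-side contract (`begin()`, `end()`, `!=`, prefix `++`, unary `*`). `int_range` is a hypothetical type, and `sum_desugared` shows roughly (following the C++11/14 specification of range-based for) the loop the compiler produces for `sum_with_range_for`.

```cpp
#include <cassert>

// Hypothetical library type: the half-open range [first, last).
// Range-based for only needs begin()/end() returning something that
// supports !=, prefix ++ and unary *; that is the whole "iterable" contract.
class int_range {
public:
    class iterator {
    public:
        explicit iterator(int v) : v_(v) {}
        int operator*() const { return v_; }
        iterator& operator++() { ++v_; return *this; }
        bool operator!=(const iterator& o) const { return v_ != o.v_; }
    private:
        int v_;
    };

    int_range(int first, int last) : first_(first), last_(last) {}
    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }

private:
    int first_, last_;
};

int sum_with_range_for(int first, int last) {
    int total = 0;
    for (int x : int_range(first, last)) total += x;  // compiler-generated loop
    return total;
}

// Roughly what the compiler writes for the loop above.
int sum_desugared(int first, int last) {
    int total = 0;
    auto&& range = int_range(first, last);
    for (auto it = range.begin(), end = range.end(); it != end; ++it) {
        int x = *it;
        total += x;
    }
    return total;
}
```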

Similarly, with the coroutine proposal, the magic that gets you zero/negative overhead is in the idea of an awaitable. You do the magic yourself. You can find samples of "negative-overhead" awaitables in the slides at http://wg21.link/N4287. Also, http://wg21.link/P0055 shows how this magic can be extended via the CompletionToken technique to any template library that models its API after the networking library.

The transformation that the compiler performs is specified in http://wg21.link/P0057.
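A sketch of what "awaitable" means, modelled on the `await_ready` / `await_suspend` / `await_resume` member functions that P0057 uses (see the paper for the exact expansion). Everything here is illustrative: `cached_read` and `hand_expanded_await` are hypothetical names, and a `std::function<void()>` continuation stands in for the `coroutine_handle<>` the real protocol passes to `await_suspend`. It shows only the library-side hook that lets an awaitable skip suspension entirely when the result is already available; it makes no claim about how the measured negative overhead was obtained.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical awaitable modelled on the await_ready / await_suspend /
// await_resume trio. A std::function stands in for the coroutine handle.
struct cached_read {
    bool have_value;                 // e.g. data already sitting in a buffer
    int value;
    std::function<void()>* pending;  // where a slow read parks the continuation

    bool await_ready() const { return have_value; }
    void await_suspend(std::function<void()> resume_me) { *pending = std::move(resume_me); }
    int await_resume() const { return value; }
};

// Hand expansion of `int v = await op;`, for illustration only: if the
// awaitable reports it is ready, the value is taken immediately and nothing
// suspends. Returns true if the result was produced without suspending.
bool hand_expanded_await(cached_read& op, int& result) {
    if (op.await_ready()) {
        result = op.await_resume();  // fast path: no suspension at all
        return true;
    }
    // Slow path: park the "rest of the function" until the library resumes it.
    op.await_suspend([&op, &result] { result = op.await_resume(); });
    return false;
}
```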

Cheers,
Gor

german...@hubblehome.com

Oct 9, 2015, 1:49:45 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Thursday, October 8, 2015 at 12:31:48 AM UTC+7, Gor Nishanov wrote:
German:

If you noticed, the theme of my answers to your questions was to move you away from a feature-list-style comparison and toward looking at how each is reduced to practice.

Alex Stepanov said many insightful things and one of them was: "I still believe in abstraction, but now I know that one ends with abstraction, not starts with it. I learned that one has to adapt abstractions to reality and not the other way around." (http://web.archive.org/web/20071120015600/http://www.research.att.com/~bs/hopl-almost-final.pdf page 18).

Well, I agree that you have accumulated a good deal of experience during the implementation; no one can deny that. And the evidence shows that, for your test cases,
the negative overhead is impressive.
 
On hidden magic:

The coroutine proposal is similar to range-based for: the compiler provides the syntactic sugar, and the magic is the idea of an iterable, which allows the compiler to communicate with the library.

That is nice, because I really thought there was a scheduler embedded in some way. But this stays on the library side, right?

 
Similarly, with the coroutine proposal, the magic that gets you zero/negative overhead is in the idea of an awaitable. You do the magic yourself. You can find samples of "negative-overhead" awaitables in the slides at http://wg21.link/N4287. Also, http://wg21.link/P0055 shows how this magic can be extended via the CompletionToken technique to any template library that models its API after the networking library.
 
The transformation that the compiler performs is specified in http://wg21.link/P0057.


Given all this, I see your proposal as a good candidate too, though you already know my preference and why. Basically:

1. Resumable expressions do not need to be type-erased, but can be.
2. Resumable expressions can be held as objects (even non-type-erased ones? I am not sure).

What are the chances that we could capture the coroutines themselves in variables and make them copyable and movable?
I see having objects that can be copied/moved/compared, etc. as a standard idiom; I think that is the trend lately.
I do not mean the rest is not good, but why should we rule out these semantics for coroutines in the first place?


Also, I see the yield keyword. I am not sure how it works. In Chris's proposal that is on the library side, represented as an object.
What are the differences?

german...@hubblehome.com

Oct 9, 2015, 1:51:23 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com

1. Resumable expressions do not need to be type-erased, but can be.
2. Resumable expressions can be held as objects (even non-type-erased ones? I am not sure).
 
Also:

3. Template code reuse also seems appealing to me.

Gor Nishanov

Oct 9, 2015, 10:19:40 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Thursday, October 8, 2015 at 10:49:45 PM UTC-7, germa...@gmail.com wrote:

Well, I agree that you have accumulated a good deal of experience during the implementation; no one can deny that. And the evidence shows that, for your test cases,
the negative overhead is impressive.

Perhaps I was too cryptic in my previous response. The point of Stepanov's quote was that the usefulness of an abstraction comes from how it helps to solve a real problem. You start with the problem, and you end with an abstraction (see N4287 slide 7 for more). Thus, if you believe that a particular aspect of resumable expressions is awesome, take a real problem (possibly reduced to the size of tcp_reader), code it up using resumable expression syntax, and then compare it with how the same problem can be solved using coroutines. Evaluate it on the three criteria I listed: how much the end user writes, how much library support is required, and what the abstraction penalty is.
 
Also, I see the yield keyword. I am not sure how it works. In Chris's proposal that is on the library side, represented as an object.

`yield expr` is syntactic sugar for `await $p.yield_value(expr)`. See P0057R0, [expr.yield].
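Gor's one-line description can be unpacked with a sketch. The code below is a hand expansion written for illustration, not the wording of [expr.yield]: `int_generator_promise` plays the role of `$p`, `suspend_always_like` is a hypothetical awaitable that always asks to suspend, and the `switch` stands in for the resume points a compiler would generate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical awaitable that always asks the coroutine to suspend.
struct suspend_always_like {
    bool await_ready() const { return false; }
};

// Hypothetical promise: `yield expr` becomes `await p.yield_value(expr)`, so
// the promise stores the value and the awaitable it returns decides whether
// to suspend.
struct int_generator_promise {
    int current = 0;
    suspend_always_like yield_value(int v) { current = v; return {}; }
};

// Hand-written stand-in for the coroutine body
//     for (int i = 0; i < n; ++i) yield i * i;
struct squares_frame {
    int_generator_promise p;  // the "$p" of the desugaring
    int n;
    int i = 0;
    int state = 0;            // resume point

    explicit squares_frame(int count) : n(count) {}

    // Returns true when suspended at a yield, false when the body has finished.
    bool resume() {
        switch (state) {
        case 0:
            for (i = 0; i < n; ++i) {
                if (!p.yield_value(i * i).await_ready()) {
                    state = 1;
                    return true;  // suspended; p.current holds the yielded value
                }
        case 1:;                  // resumption continues the loop from here
            }
            state = 2;
        }
        return false;
    }
};

std::vector<int> collect_squares(int n) {
    squares_frame f(n);
    std::vector<int> out;
    while (f.resume()) out.push_back(f.p.current);
    return out;
}
```

In this reading the keyword fixes only where the suspension happens; what is done with the value is decided by the library-defined `yield_value`.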

Nicol Bolas

Oct 9, 2015, 11:21:56 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Friday, October 9, 2015 at 1:49:45 AM UTC-4, germa...@gmail.com wrote:
What are the chances that we could capture the coroutines themselves in variables and make them copyable and movable?
I see having objects that can be copied/moved/compared, etc. as a standard idiom; I think that is the trend lately.
I do not mean the rest is not good, but why should we rule out these semantics for coroutines in the first place?

If you're talking about value semantics, that doesn't make sense for coroutines. Remember that part of a coroutine's state is the stack. And stack variables are often references or pointers to other stack variables. You cannot effectively copy such a construct. And it's silly for the user to have to define a "copy constructor" for their call stack.
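A minimal illustration of the point about stack variables referring to each other, assuming a suspended frame that keeps a pointer to one of its own locals (`frame` and `copy_writes_into_original` are hypothetical names). The implicitly generated member-wise copy duplicates the pointer, but not what it should point at.

```cpp
#include <cassert>

// Hypothetical suspended frame: `cursor` points at another "local"
// (`buffer`) in the same frame, as stack variables often do.
struct frame {
    int buffer[4] = {0, 0, 0, 0};
    int* cursor = buffer;  // points into *this* frame

    void write(int v) { *cursor++ = v; }
};

// After a member-wise copy, the copy's cursor still points into the
// original's buffer, so writes through the copy land in the wrong frame.
bool copy_writes_into_original() {
    frame original;
    original.write(1);
    frame copy = original;  // default copy: pointer copied, not re-targeted
    copy.write(2);          // writes original.buffer[1], not copy.buffer[1]
    return original.buffer[1] == 2 && copy.buffer[1] == 0;
}
```

Making such a copy correct would require re-targeting every internal pointer, which is the user-written "copy constructor for the call stack" that the paragraph above calls silly.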

That's why `coroutine_handle` has reference semantics. It just makes more sense for coroutines. That doesn't prevent you from being able to pass them (and any containing object) around. You can even have a `std::vector<generator<int>>` and resume each one in turn.
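A hand-written sketch of storing several suspended computations in a `std::vector` and resuming each in turn. `counter_generator` is hypothetical and is not P0057's `generator<int>`; like an owner of a coroutine handle, it holds its frame through a pointer, so moving the object (for example during vector growth) moves only the pointer and never copies the frame.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical heap-allocated frame of a generator counting from `next` to `last`.
struct counter_frame {
    int next;
    int last;
};

// Move-only owner of a frame: copying is disabled (unique_ptr member),
// but moving, e.g. into or within a vector, is fine.
class counter_generator {
public:
    counter_generator(int first, int last)
        : frame_(new counter_frame{first, last}) {}

    // Produces the next value; returns false when exhausted.
    bool resume(int& out) {
        if (frame_->next > frame_->last) return false;
        out = frame_->next++;
        return true;
    }

private:
    std::unique_ptr<counter_frame> frame_;  // moved, never copied
};

// Resume each generator in turn until all of them are exhausted.
std::vector<int> round_robin(std::vector<counter_generator>& gens) {
    std::vector<int> out;
    bool any = true;
    while (any) {
        any = false;
        for (auto& g : gens) {
            int v;
            if (g.resume(v)) { out.push_back(v); any = true; }
        }
    }
    return out;
}
```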

Or to put it another way, just because you see `await` used to catch the coroutine promise returned by a coroutine does not mean you have to use it that way.
 
Also, I see the yield keyword. I am not sure how it works.

It returns a value and suspends the coroutine's execution at that point.

If you're wondering about the details of how the value is passed to the coroutine promise and all, that's part of the proposal.

Germán Diago

Oct 10, 2015, 12:24:34 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com


On Friday, October 9, 2015, 22:21:56 (UTC+7), Nicol Bolas wrote:


On Friday, October 9, 2015 at 1:49:45 AM UTC-4, germa...@gmail.com wrote:
What are the chances that we could capture the coroutines themselves in variables and make them copyable and movable?
I see having objects that can be copied/moved/compared, etc. as a standard idiom; I think that is the trend lately.
I do not mean the rest is not good, but why should we rule out these semantics for coroutines in the first place?

If you're talking about value semantics, that doesn't make sense for coroutines. Remember that part of a coroutine's state is the stack. And stack variables are often references or pointers to other stack variables. You cannot effectively copy such a construct. And it's silly for the user to have to define a "copy constructor" for their call stack.

In some cases it makes sense; in others it does not. It all depends, I think. Let me look for more serious use cases.

 

That's why `coroutine_handle` has reference semantics. It just makes more sense for coroutines. That doesn't prevent you from being able to pass them (and any containing object) around. You can even have a `std::vector<generator<int>>` and resume each one in turn.

That you can store them is nice, indeed! 
 
Or to put it another way, just because you see `await` used to catch the coroutine promise returned by a coroutine does not mean you have to use it that way.

I am not sure about this. I have to take a more serious look at both proposals and do a comparison. I think a good starting point would be to convert Gor's code
to resumable expressions, which I think is the more low-level proposal, and see how the code looks.

Yesterday I was taking a look, and I still have the impression that Gor's proposal is not as minimal as it could be. It does not embed any scheduler, that is true,
but I see that it "hardcodes" into the language a protocol that is bigger than Chris's. But as I said, I need to take a more serious look at this to make a really
fair and accurate comparison.
 
 
Also, I see the yield keyword. I am not sure how it works.

It returns a value and suspends the coroutine's execution at that point.

In resumable expressions, that can be done with a library abstraction; why should putting it into the language be better?
Remember that once you put something into the language, there is no going back. This is my main reason for putting things
into a library when possible. Resumable expressions are minimal. Still, hold on, I need to take a more serious look at this.
 
If you're wondering about the details of how the value is passed to the coroutine promise and all, that's part of the proposal.

Well, on first impression, you all know which proposal I would favour, but I need to read up further on this. I hope I can
give a more in-depth comparison, though I cannot promise; it is a lot of work. :)

Regards
 

Nicol Bolas

Oct 10, 2015, 2:34:40 AM
to ISO C++ Standard - Future Proposals, german...@hubblehome.com
On Saturday, October 10, 2015 at 12:24:34 AM UTC-4, Germán Diago wrote:
On Friday, October 9, 2015, 22:21:56 (UTC+7), Nicol Bolas wrote:
Or to put it another way, just because you see `await` used to catch the coroutine promise returned by a coroutine does not mean you have to use it that way.

I am not sure about this. I have to take a more serious look at both proposals and do a comparison. I think a good starting point would be to convert Gor's code
to resumable expressions, which I think is the more low-level proposal, and see how the code looks.

Yesterday I was taking a look, and I still have the impression that Gor's proposal is not as minimal as it could be. It does not embed any scheduler, that is true,
but I see that it "hardcodes" into the language a protocol that is bigger than Chris's. But as I said, I need to take a more serious look at this to make a really
fair and accurate comparison.

I guess my question is this: why does it matter which is "lower level" than the other?

A proposal should be as low level as it needs to be, and no lower. So, given equal performance in similar situations (and Gor has provided evidence that this is possible in at least some cases), the principal difference maker should be actual functionality, not the level of abstraction.

Indeed, I feel quite the opposite from you. So long as they provide equivalent functionality with equivalent performance, the higher level one should be considered better. `int[5]` is unquestionably lower level than `array<int, 5>`, but we tell people to almost always use the latter instead of the former. We do so because, though the latter is higher level, it causes no loss of performance, and it improves safety and ease-of-use over the former.

Also, I see the yield keyword. I am not sure how it works.

It returns a value and suspends the coroutine's execution at that point.

In resumable expressions, that can be done on top of a library abstraction; why would putting this into the language be better?

The question could easily be turned around: why is putting it into the library better? Because it's more "low-level", by some measurement? What good does that do me as a user of the language?

After all, with the exception of `break`, there's nothing range-based for can do that `for_each` cannot. Yet we thought that was important enough to put into the language.
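To make that comparison concrete, here is a small sketch (the function names are made up for illustration) of the same summation written with range-based `for` and with `std::for_each`, plus the `break` case that the library algorithm cannot express directly:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Range-based for: sums every element.
int sum_with_range_for(const std::vector<int>& v) {
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}

// std::for_each: the same loop expressed through a library algorithm and a lambda.
int sum_with_for_each(const std::vector<int>& v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), [&sum](int x) { sum += x; });
    return sum;
}

// The exception: `break` stops a range-based for early,
// which std::for_each has no direct way to express.
int sum_until_negative(const std::vector<int>& v) {
    int sum = 0;
    for (int x : v) {
        if (x < 0) break;
        sum += x;
    }
    return sum;
}
```

For `{1, 2, 3, -1, 4}` the first two functions agree (9), while the early-exit version stops at the negative element (6).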

So I do not see why merely allowing a feature to be implemented in the library rather than the language is a point in that version's favor. It's interesting and useful to note, but it is not, by itself, an advantage.

Remember that when you put something into the language, there is no way back.

I contest the idea that language features are more immutable than library features. The standard library contains a lot of deficient elements, but I don't see those being undone. We still have iostreams lying around, despite wide-spread conventional wisdom saying not to use them. We still have STL containers that don't erase the allocator's type, despite this being an oft-requested feature.

Oh sure, we seem to be getting rid of `auto_ptr` and a couple of other small things. But I don't see any evidence that library functionality is so much more malleable than language features.

Mistakes will persist, no matter whether they are language or library mistakes. So we shouldn't be that much more afraid of language screwups than library ones. Indeed, the latter will be far more persistent, since most people aren't going to be directly interfacing with the low-level parts.

Or to put it another way, however you choose to implement "yield", that is what people are going to use. As a common coroutine feature, lots of code will be written against it. And once implemented, you will be no more able to correct flaws in a library `yield` function than in a language `yield` expression.

Nicol Bolas

Oct 10, 2015, 10:22:40 AM10/10/15
to ISO C++ Standard - Future Proposals
On the recent `operator await` syntax in P0057: is it possible for a user to call this operator themselves? And if so, will it work correctly for types that don't provide one (that is, resolving to the original type or issuing a compiler error)?

If not, it would probably be useful if the user could invoke it themselves.

Gor Nishanov

Oct 10, 2015, 10:58:13 AM10/10/15
to ISO C++ Standard - Future Proposals


On Saturday, October 10, 2015 at 7:22:40 AM UTC-7, Nicol Bolas wrote:
On the recent `operator await` syntax in P0057: is it possible for a user to call this operator themselves? And if so, will it work correctly for types that don't provide one (that is, resolving to the original type or issuing a compiler error)?

If not, it would probably be useful if the user could invoke it themselves.

Absolutely, observe me doing this in the main of the attached program:

int main() {

   operator await(1ms);


One thing that is somewhat awkward:

   operator await(1ms);

is not the same as

   await 1ms;

The first one gets me an awaitable from 1ms; the second one awaits 1ms.
The difference between calling an operator function directly and calling it via operator notation is not exactly novel.
The behavior of (x || y) is different from calling operator||(x,y).
I don't like it much. I don't mind operator await being renamed to get_awaiter or something, but I do think that operator await is prettier, and that it is easier to think of the language synthesizing operator await for some types than synthesizing a function.
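The same kind of gap between operator notation and a direct call shows up with `operator->`, which, like `operator await`, the language applies and then keeps going. In this sketch (the `Widget`, `Inner` and `Outer` types are made up for illustration), `o->id` drills through both wrappers, while `o.operator->()` applies only the first level and hands back the intermediate object, much as `operator await(x)` hands back an awaiter without awaiting it:

```cpp
#include <cassert>

struct Widget {
    int id = 42;
};

// Wrapper whose operator-> yields a raw pointer.
struct Inner {
    Widget* w;
    Widget* operator->() const { return w; }
};

// Wrapper whose operator-> yields another wrapper, not a raw pointer.
struct Outer {
    Inner inner;
    Inner operator->() const { return inner; }
};

// Operator notation: the language keeps applying operator-> until it
// reaches a raw pointer, so this reaches Widget::id.
int via_notation(const Outer& o) { return o->id; }

// Direct call: exactly one level is applied; we get the intermediate Inner.
Inner via_direct_call(const Outer& o) { return o.operator->(); }
```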

Again, I am slightly leaning toward operator await rather than get_awaiter, but if Core/Evolution/LWG/LEWG wants something else, absolutely.
opawait.cpp

Gor Nishanov

Oct 10, 2015, 11:01:43 AM10/10/15
to ISO C++ Standard - Future Proposals
Correction. I think you meant (referring to the classes in the opawait.cpp attached to my previous response):

operator await(awaiter{ 1ms });


not


operator await(1ms);


Yes, I think the wording is written to make it possible; however, I just checked, and the implementation we ship in VS Update 1 does not do that. I filed a bug against myself.

Evgeny Panasyuk

Oct 12, 2015, 1:08:21 AM10/12/15
to std-pr...@isocpp.org
04.10.2015 18:44, Vicente J. Botet Escriba:
>>>
>>> I suggest to look at this presentation:
>>>
>>> http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4287.pdf
>>>
>>> which walks through some of the aspects of the P0057 proposal. Note that
>>> the await syntax is actually quite old. It first appeared as do-notation in
>>> Haskell in 1998, and you may notice that P0057 can be used to perform more
>>> general "monadic" transformations, not only coroutines.
> Hmm, await cannot work with list as a monad, can it?
> But no proposal is attempting to take care of this case.


If we use the same underlying technique as the macro-based stackless
coroutines of Boost.Asio, then it can work with the list monad, because such a
coroutine is just a value type which can be copied/moved.

Here is a small live demo of a list-monad-like coroutine built on the stackless coroutines
from Boost.Asio: http://coliru.stacked-crooked.com/a/465f5bcb59c8b0b3

--
Evgeny Panasyuk

Richard Smith

Oct 12, 2015, 2:46:54 PM10/12/15
to std-pr...@isocpp.org
On Sun, Oct 11, 2015 at 10:08 PM, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
04.10.2015 18:44, Vicente J. Botet Escriba:

I suggest to look at this presentation:

http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4287.pdf

which walks through some of the aspects of the P0057 proposal. Note that the
await syntax is actually quite old. It first appeared as do-notation in
Haskell in 1998, and you may notice that P0057 can be used to perform more
general "monadic" transformations, not only coroutines.
Hmm, await cannot work with list as a monad, can it?
But no proposal is attempting to take care of this case.


If we use the same underlying technique as the macro-based stackless coroutines of Boost.Asio, then it can work with the list monad, because such a coroutine is just a value type which can be copied/moved.

It doesn't really work; you can't support local variables with such a model, because their lifetimes could be reentered after they end. P0057 is fundamentally a coroutines proposal, not a monads proposal, because it does not support repeated resumption from the same suspension state; I think this is the right semantic match for an impure language such as C++.

Evgeny Panasyuk

Oct 12, 2015, 3:19:55 PM10/12/15
to std-pr...@isocpp.org
12.10.2015 21:46, Richard Smith:
> If use same underlying technique as was used at macro-based
> stackless coroutines of Boost.Asio then it can work with list monad,
> because such coroutine is just value type which can be copied/moved.
>
>
> It doesn't really work; you can't support local variables with such a
> model, because their lifetimes could be reentered after they end.

Local variables do work with the technique used by the stackless coroutines of
Boost.Asio (and proposals like N4244).

With such an approach, the coroutine is transformed into a class. Local variables
are transformed into fields of the class (more precisely, into nested unions
corresponding to scopes, as described in N4244), and the coroutine body is
transformed into a method-state-machine whose states correspond to the
yield points.
This can already be implemented via macros to some extent.
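A hand-written sketch of that transformation for a trivial generator, in the spirit of the switch-based macros Boost.Asio uses. The `squares` class, the `std::optional` return convention and the state numbering are illustrative assumptions, not what N4244 or Boost.Asio actually generate. The parameter and the loop variable become members, and `operator()` jumps back to the saved suspension point:

```cpp
#include <cassert>
#include <optional>

// Hand-written lowering of a hypothetical stackless coroutine:
//
//   generator squares(int n) {
//       for (int i = 0; i < n; ++i) yield i * i;
//   }
class squares {
public:
    explicit squares(int n) : n_(n) {}

    // Resume: returns the next yielded value, or nullopt once finished.
    std::optional<int> operator()() {
        switch (state_) {
        case 0:                          // first entry
            for (i_ = 0; i_ < n_; ++i_) {
                state_ = 1;              // remember the suspension point
                return i_ * i_;          // "yield"
        case 1:;                         // resumption lands here
            }
            state_ = 2;                  // ran off the end of the body
            [[fallthrough]];
        default:
            return std::nullopt;
        }
    }

private:
    int n_;          // parameter, now a field
    int i_ = 0;      // local variable, now a field
    int state_ = 0;  // index of the current suspension point
};
```

Because the object is an ordinary value with a size fixed at compile time, it can live on the stack or inside a container and can be copied without any heap allocation, which is the property Evgeny is relying on.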

Moreover, C#'s await is implemented based on a similar approach:
http://www.codeproject.com/Articles/535635/Async-Await-and-the-Generated-StateMachine

> P0057
> is fundamentally a coroutines proposal, not a monads proposal, because
> it does not support repeated resumption from the same suspension state;
> I think this is the right semantic match for an impure language such as C++.

It is intrinsically non-zero overhead, due to type-erasure/allocations.
This fact alone is strong argument against it.

Whereas with an approach based on a method-state-machine, we can get both:
generality and performance.

Gor Nishanov

Oct 12, 2015, 3:30:44 PM10/12/15
to ISO C++ Standard - Future Proposals
Evgeny:

Note the word "purity" in Richard's answer.
If you write the body of the coroutine in a pure manner, you can hack P0057 and, in your await_suspend for the list monad, resume the coroutine multiple times.
You need to provide a proper final_suspend and return_value to make it work. But it will work ONLY if your body is pure :-). That is, the body of your coroutine. And you cannot save any state in the awaiter, since it is torn down at the end of the full expression; hence, I am using thread_local to ferry a value from await_suspend to await_resume.

Here is a "do not try this at home" awaiter for list<T>. Untested. Just an idea of what it could look like.

template <typename T>
auto operator await(list<T> const& l) {
    // Block-scope thread_local (implicitly static); a local class may use it,
    // but cannot declare a static data member of its own.
    static thread_local T const* result_ = nullptr;

    struct awaiter {
        list<T> const* list_;

        bool await_ready() { return false; }

        void await_suspend(coroutine_handle<> h) {
            auto l = list_;
            for (auto&& item : *l) { result_ = &item; h.resume(); }
            // add code to extract the result from the promise and do something with it.
        }

        // for every element of the list return the value that we stashed in thread_local
        T const& await_resume() { return *result_; }
    };

    return awaiter{&l};
}


Richard Smith

Oct 12, 2015, 3:53:42 PM10/12/15
to std-pr...@isocpp.org
On Mon, Oct 12, 2015 at 12:19 PM, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
12.10.2015 21:46, Richard Smith:
    If we use the same underlying technique as the macro-based
    stackless coroutines of Boost.Asio, then it can work with the list monad,
    because such a coroutine is just a value type which can be copied/moved.


It doesn't really work; you can't support local variables with such a
model, because their lifetimes could be reentered after they end.

Local variables do work with the technique used by the stackless coroutines of Boost.Asio (and proposals like N4244).

With such an approach, the coroutine is transformed into a class. Local variables are transformed into fields of the class (more precisely, into nested unions corresponding to scopes, as described in N4244), and the coroutine body is transformed into a method-state-machine whose states correspond to the yield points.
This can already be implemented via macros to some extent.

I think you've missed my point about object lifetime. Consider:

list_monad<int> f(list_monad<int> ints) {
  {
    auto x = make_shared<int>(42);
    auto &r = x;
    int y = await ints; // #1, suppose this behaves like a list monad
    cout << *r + y;
  } // #2
  return 0;
}

No matter how you transform this into a class, it won't actually work (and rightly so): the lifetime of the x object ends the first time line #2 is reached. When you try to resume at line #1, there's no way to bring x back to life again. Now, you might suggest that the way to solve this is to make a copy of the monad state at the point where we hit the 'await', so you can "safely" resume it multiple times. But that doesn't work either: your copy's 'r' would refer to the original's 'x' (whose lifetime has ended), not to the copy's 'x'.
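Richard's aliasing problem can be seen directly in a lowered frame. In this sketch, `frame` is a hypothetical lowering of `f` at suspension point #1, not any compiler's actual output: the reference `r` becomes a pointer member, and a memberwise copy leaves the copy's `r` pointing into the original frame, so once the original is destroyed the copy dangles:

```cpp
#include <cassert>
#include <memory>

// Hypothetical lowered frame of f() while suspended at `await ints` (#1).
struct frame {
    std::shared_ptr<int> x = std::make_shared<int>(42);  // local `x`
    std::shared_ptr<int>* r = &x;                        // local `auto &r = x;`
    int suspend_point = 1;                               // parked at #1
};
```

Copying such a frame to "fork" the coroutine is just the implicit copy constructor, and the copied `r` still aliases the original's `x`.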

Moreover, C#'s await is implemented based on a similar approach: http://www.codeproject.com/Articles/535635/Async-Await-and-the-Generated-StateMachine

The implementation approach is fine for coroutines (C#'s await doesn't support the list monad / continuations), but doesn't work for the full generality of monads in a system with mutable state.

Richard Smith

Oct 12, 2015, 3:59:29 PM10/12/15
to std-pr...@isocpp.org
On Mon, Oct 12, 2015 at 12:30 PM, Gor Nishanov <gorni...@gmail.com> wrote:
Evgeny:

Note the word "purity" in Richard's answer.
If you write the body of the coroutine in a pure manner, you can hack P0057 and, in your await_suspend for the list monad, resume the coroutine multiple times.
You need to provide a proper final_suspend and return_value to make it work. But it will work ONLY if your body is pure :-). That is, the body of your coroutine. And you cannot save any state in the awaiter, since it is torn down at the end of the full expression; hence, I am using thread_local to ferry a value from await_suspend to await_resume.

Here is a "do not try this at home" awaiter for list<T>. Untested. Just an idea of what it could look like.

template <typename T>
auto operator await(list<T> const& l) {
    static thread_local T const* result_ = nullptr;

    struct awaiter {
        list<T> const* list_;

        bool await_ready() { return false; }

        void await_suspend(coroutine_handle<> h) {
            auto l = list_;
            for (auto&& item : *l) { result_ = &item; h.resume(); }

For this to work, I think you'd need your coroutine to (somehow) repeatedly await the list item. That is, instead of:

list_monad<int> foo(list_monad<int> v) {
  int x = await v;
  return x * x;
}

... you'd need to write:

list_monad<int> foo(list_monad<int> v) {
loop:
  int x = await v;
  // somehow return x * x then conditionally goto loop.
}

... because you don't have any kind of call/cc primitive.

Gor Nishanov

Oct 12, 2015, 4:24:56 PM10/12/15
to ISO C++ Standard - Future Proposals
... you'd need to write:

list_monad<int> foo(list_monad<int> v) {
loop:
  int x = await v;
  // somehow return x * x then conditionally goto loop.
}

... because you don't have any kind of call/cc primitive.

Yep. You are right. Without "checkpointing" of the coroutine state you would need a loop. But if you had :-) checkpointing, then:
Maybe this:

    void await_suspend(coroutine_handle<> h) {
        auto l = list_;
        auto checkpoint = h.checkpoint();
        for (auto&& item : *l) { result_ = &item; h.resume(); h.load(checkpoint); }
        // add code to extract the result from the promise and do something with it.
    }

For "pure" functions checkpointing is cheap. The only state they have is which suspend point they are at.
In no way am I suggesting that we are going to do checkpointing. Just geeking out.
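For a coroutine with no locals live across its suspension points, that observation can be sketched with a value-type state machine: taking a checkpoint is copying the suspension index, and loading it is an assignment. This is purely illustrative; `pure_steps` is hypothetical, and P0057's `coroutine_handle<>` has no `checkpoint()`/`load()` members:

```cpp
#include <cassert>

// Hypothetical "pure" coroutine: nothing but the suspension-point index
// survives between resumptions, so the whole frame is one int.
struct pure_steps {
    int point = 0;

    // Runs to the next suspension point and returns the value produced there;
    // returns -1 once the coroutine has finished.
    int resume() {
        switch (point) {
        case 0: point = 1; return 10;
        case 1: point = 2; return 20;
        case 2: point = 3; return 30;
        default: return -1;
        }
    }
};
```

Usage: after the first `resume()`, copy the object to take a checkpoint, run it to completion, then assign the copy back and it replays from the second suspension point.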


Evgeny Panasyuk

Oct 12, 2015, 5:08:21 PM10/12/15
to std-pr...@isocpp.org
12.10.2015 22:53, Richard Smith:
>
> I think you've missed my point about object lifetime. Consider:
>
> list_monad<int> f(list_monad<int> ints) {
>
> No matter how you transform this into a class, it won't actually work
> (and rightly so): the lifetime of the x object ends the first time line
> #2 is reached. When you try to resume at line #1, there's no way to
> bring x back to life again. Now, you might suggest that the way to solve
> this is to make a copy of the monad state at the point where we hit the
> 'await', so you can "safely" resume it multiple times. But that doesn't
> work either: your copy's 'r' would refer to the original's 'x' (whose
> lifetime has ended), not to the copy's 'x'.

Thank you for the detailed description, I get your point.
I am aware of this issue, and I agree that it can lead to subtle bugs,
because such code works in an unintuitive/unfamiliar manner.

Nevertheless, I don't think that possibility of such bugs makes whole
approach non-usable. I think it is acceptable price for performance and
generality/features/power it provides.
C++ was never a defensive language.

For instance, I want to use non-owning raw pointers, and I accept the
price of an increased possibility of memory corruption. And if one needs
higher defensiveness, it is possible to use shared/weak_ptr in a
casual/wasteful manner.

The same applies here: I want to copy/move/fork/serialize/etc. coroutines,
and I agree to pay for the possibility of problems with local lifetime
issues. But if someone would like to avoid such issues, and does not need
fork/etc., then they could use non-copyable, non-movable coroutines
allocated on the heap.

>
> Moreover, C#'s await is implemented based on a similar approach:
> http://www.codeproject.com/Articles/535635/Async-Await-and-the-Generated-StateMachine
>
>
> The implementation approach is fine for coroutines (C#'s await doesn't
> support the list monad / continuations),

Yes, C# await doesn't support copying of state (at least not in a
straightforward way).
My point here is that an approach based on this kind of transformation
(coroutine body into method-state-machine, locals into class fields) is
already implemented and used in one of the mainstream languages.

> but doesn't work for the full
> generality of monads in a system with mutable state.

Why? As far as I can see, copying a coroutine is similar to call/cc, which in
turn is somewhat dual to monads.
By the way, some time ago I made an example of how to use call/cc to get
"monadic flow", including the List monad: http://ideone.com/7uOVe2

Gor Nishanov

Oct 12, 2015, 6:18:24 PM10/12/15
to ISO C++ Standard - Future Proposals


On Monday, October 12, 2015 at 2:08:21 PM UTC-7, Evgeny Panasyuk wrote:
Same applies here - I want to copy/move/fork/serialize/etc coroutines,
and I agree to pay for possibility of problems with locals lifetime
issues.

Write a proposal. It is trivial to add clone(), checkpoint(), save(), and restore() members to coroutine_handle<>. A coroutine handle is just a pointer to a blob of memory representing the current state of the coroutine.

In fact, you can probably hack it up today using VS 2015 RTM, by providing your own allocator and learning the size and location of the memory block representing the coroutine state. Once you know it, you can pretty much do whatever you want with it.

Evgeny Panasyuk

Oct 12, 2015, 6:47:24 PM10/12/15
to std-pr...@isocpp.org
13.10.2015 1:18, Gor Nishanov:

> Same applies here - I want to copy/move/fork/serialize/etc coroutines,
> and I agree to pay for possibility of problems with locals lifetime
> issues.

> Write a proposal. It is trivial to add clone(), checkpoint(), save(),
> restore() members to coroutine_handle<>. Coroutine handle is just a
> pointer to a blob of memory representing the current state of the coroutine.

My main concern about P0057R0 is type erasure: it is far from being
zero-overhead.
And as far as I can see, a stackless coroutine can be implemented without such
type erasure. Its size is known at compile time, so it can be just a
normal type with all its data contained within its sizeof; there is no
need for any special allocation/deallocation of "remote" parts.

> In fact, you can probably hack it up today using VS 2015 RTM, by
> providing your own allocator and learning the size and location of the
> memory block representing the coroutine state. Once you know it, you can
> pretty much do whatever you want with it.

Well, the same is possible with stackful coroutines and even with threads.
But it is just a chunk of raw bits, and all the type info required to do a
proper copy/move (not just memcpy) is removed during compilation.

Gor Nishanov

Oct 12, 2015, 7:03:34 PM10/12/15
to ISO C++ Standard - Future Proposals


On Monday, October 12, 2015 at 3:47:24 PM UTC-7, Evgeny Panasyuk wrote:
My main concern about P0057R0 is type-erasure - it is far from being
zero-overhead.

For a year now I have had an outstanding challenge to anyone who thinks that way: come up with a real-world problem, reduce it to a manageable size (say, async_tcp_reader), write it up both ways using P0057 and whatever you consider zero overhead, and evaluate on three criteria:

1) How much code the end user has to write
2) How much library support is required
3) What the abstraction penalty is: how many instructions need to be executed to get from, say, await Read(buf, len) to a low-level API/hardware call, say WSARecv

My statement is that P0057 is as good or better on all 3 criteria than any other proposal I've seen. If you want to accept the challenge, write up an equivalent to TcpReader described in one of these two presentations:


or


Evgeny Panasyuk

Oct 12, 2015, 10:28:35 PM10/12/15
to ISO C++ Standard - Future Proposals
12 oct 2015 г., 22:30:44 UTC+3 Gor Nishanov :
Evgeny:

Note the word "purity" in Richard's answer.
If you write a body of the coroutine in a pure manner, you can hack P0057 and in your await_suspend for the list monad resume the coroutine multiple times.

Here is an example of how it can be implemented using method-state-machine macros :
http://coliru.stacked-crooked.com/a/a463b0a5504c7401
COROUTINE(vector<int>, list_demo, (int, param),
    (int, local_x)
    (int, local_y))
{
    AWAIT(local_x =) vector<int>{1,2,3};
    AWAIT(local_y =) vector<int>{10, 20, 30};

    RETURN(local_x + local_y + param);
}
COROUTINE_END;

int main()
{
    auto xs = list_demo{1000}();
    for(auto x : xs)
        cout << x << " ";
}
// Prints: 1011 1021 1031 1012 1022 1032 1013 1023 1033

There is no purity requirement, and actually coroutine body may contain imperative loops.

Possible syntax with language support:
vector<int> list_demo(int param)
{
    int local_x = await vector<int>{1,2,3};
    int local_y = await vector<int>{10, 20, 30};

    return local_x + local_y + param;
}
As you can see, it is a straightforward syntactic transformation of the macro-based version.
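For readers unfamiliar with the list monad, the output printed above is what you get by treating each `await` over a `vector` as "continue once for every element". A plain C++ sketch of those semantics, with no coroutines involved (it shows the meaning, not how the macros implement it):

```cpp
#include <cassert>
#include <vector>

// List-monad meaning of list_demo: each `await` over a vector becomes a loop
// over its elements, and every path reaching `return` contributes one result.
std::vector<int> list_demo(int param) {
    std::vector<int> results;
    for (int local_x : std::vector<int>{1, 2, 3}) {
        for (int local_y : std::vector<int>{10, 20, 30}) {
            results.push_back(local_x + local_y + param);
        }
    }
    return results;
}
```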

Germán Diago

Oct 13, 2015, 7:58:01 AM10/13/15
to ISO C++ Standard - Future Proposals

It is intrinsically non-zero overhead, due to type-erasure/allocations.
This fact alone is strong argument against it.

This is my main concern with the proposal: type-erasure and allocations.
Seems that there are some fancy optimizations possible, but I am not sure
why we should rely on these when there are other alternatives.
 
Whereas with an approach based on a method-state-machine, we can get both:
generality and performance.

I agree with this. On top of that I do think you can implement everything on
top of this without polluting the core language.

 

Gor Nishanov

Oct 13, 2015, 9:25:22 AM10/13/15
to ISO C++ Standard - Future Proposals


On Tuesday, October 13, 2015 at 4:58:01 AM UTC-7, Germán Diago wrote:
This is my main concern with the proposal: type-erasure and allocations.
Seems that there are some fancy optimizations possible, but I am not sure
why we should rely on these when there are other alternatives.

At the moment, there is NO alternative that can match the zero overhead of P0057. If you believe there is one, earlier in this thread I offered you a way to validate whether your belief is true or not.

Evgeny Panasyuk

Oct 13, 2015, 5:37:50 PM10/13/15
to std-pr...@isocpp.org
13.10.2015 2:03, Gor Nishanov:
>
> My main concern about P0057R0 is type-erasure - it is far from being
> zero-overhead.
>
>
> I have had an outstanding challenge for a year already to anyone who
> thinks that way to come up with a real world problem, reduce it to
> a manageable size (say async_tcp_reader), write it up both ways using
> P0057 and whatever you consider zero overhead and evaluate on three
> criteria:

An example of a real-world problem is generator/yield.
An extra allocation here results in significant overhead. Even if some
kind of "small object optimization" scheme is used, it is still not
zero overhead.

> 1) How much code end-user have to write

Code is very similar in both cases.

> 2) How much library support required

Nearly the same. Maybe some additional customization points.

> 3) What is an abstraction penalty, how many instructions need to get
> executed to get from, say, await Read(buf, len) to a low-level
> API/hardware, say WSARecv

Because of concrete types instead of type erasure (and allocations), the
abstraction penalty is much lower in a subset of cases, like generators.
The execution path is very similar, perhaps even shorter due to fewer indirections.

> My statement is that P0057 is as good or better on all 3 criteria than
> any other proposal I've seen. If you want to accept the challenge, write
> up an equivalent to TcpReader described in one of these two presentations:

There are use cases where allocation is OK, for instance in the case of a
large structure and/or one that is moved around many times. Your
example shows exactly this. But it is not the only use case for
stackless coroutines.

Nicol Bolas

Oct 13, 2015, 6:22:40 PM10/13/15
to ISO C++ Standard - Future Proposals
On Tuesday, October 13, 2015 at 5:37:50 PM UTC-4, Evgeny Panasyuk wrote:
13.10.2015 2:03, Gor Nishanov:
>
>     My main concern about P0057R0 is type-erasure - it is far from being
>     zero-overhead.
>
>
> I have had an outstanding challenge for a year already to anyone who
> thinks that way to come up with a real world problem, reduce it to
> a manageable size (say async_tcp_reader), write it up both ways using
> P0057 and whatever you consider zero overhead and evaluate on three
> criteria:

An example of a real-world problem is generator/yield.
An extra allocation here results in significant overhead. Even if some
kind of "small object optimization" scheme is used - it is still not
zero overhead.

Except that he's already proven (in this thread no less) that a good optimizer can elide the allocation. If the compiler can reasonably make it zero overhead, then it is zero overhead.

Evgeny Panasyuk

Oct 13, 2015, 7:45:42 PM10/13/15
to ISO C++ Standard - Future Proposals
14.10.2015 1:22, Nicol Bolas:


1. It is impossible (practically) in the general case.
For instance, when we put coroutines in a container, like:
vector<coroutine> x(N);
With coroutines of concrete type, whose sizeof is known at compile time, this can be done within a single allocation.
But if the coroutine type is erased, then we will have N+1 allocations in the general case, and they can't practically be elided.

2. Even if we consider only function scopes, escape analysis would not give a 100% guarantee of elision in every case. First, I think it would hit the halting problem; second, some of the functions in the call tree may not be inlined for good reasons, and this would blind the analysis.

3. This would put an additional burden on implementers, and I don't see reasonable benefits that we get for such a burden.

4. C++11 has lambdas with concrete types; this ensures zero overhead and fits naturally into the language. We don't have type-erased closures. We can use external type erasure like std::function when needed.
Why should we have type erasure for stackless coroutines?
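Point 1 can be measured with a small sketch. It assumes, for illustration, that a type-erased coroutine keeps its frame in a separate heap block (modeled with `std::unique_ptr` to a polymorphic base), while a concrete coroutine type can be stored inline like any other value; `concrete_coro` and `erased_frame` are made-up stand-ins, not types from either proposal. The global `operator new` is replaced only to count allocations:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Count every call to the (replaceable) global allocation function.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Publishing a pointer through a volatile keeps the optimizer from
// eliding the allocations being counted.
static void* volatile g_sink = nullptr;

// Stand-in for a coroutine whose frame type and size are known at compile time.
struct concrete_coro {
    int state = 0;
    int local = 0;
};

// Stand-in for a type-erased coroutine: each frame lives behind a base pointer.
struct erased_frame_base {
    virtual ~erased_frame_base() = default;
};
struct erased_frame : erased_frame_base {
    int state = 0;
    int local = 0;
};

std::size_t allocations_concrete(std::size_t n) {
    std::size_t before = g_allocations;
    std::vector<concrete_coro> v(n);  // all n frames share one buffer
    g_sink = v.data();
    return g_allocations - before;
}

std::size_t allocations_erased(std::size_t n) {
    std::size_t before = g_allocations;
    std::vector<std::unique_ptr<erased_frame_base>> v;
    v.reserve(n);                                       // one buffer for the handles
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::make_unique<erased_frame>());  // plus one heap frame each
    g_sink = v.data();
    return g_allocations - before;
}
```

With common standard libraries in optimized builds this typically reports 1 allocation for the concrete vector and N + 1 for the type-erased one. Debug builds of some libraries add bookkeeping allocations to both, so comparing the difference (N) is more robust than comparing absolute counts.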
 

Gor Nishanov

Oct 13, 2015, 8:03:56 PM10/13/15
to ISO C++ Standard - Future Proposals
Please see this thread from last year.
Make sure to read to the point where Ville said: "Ouch" and what followed afterwards.

Gor Nishanov

Oct 13, 2015, 8:11:57 PM10/13/15
to ISO C++ Standard - Future Proposals
Okay. Slightly less cryptic reply. Coroutine frame must be stationary once the coroutine starts running.
Resumable Expressions abandoned the movability/copyability of the lambda* that was present in the earlier resumable lambda proposal, due to the reasons highlighted in the thread I linked earlier. Thus, resumable expressions are in exactly the same boat as P0057.

The difference is that in Resumable Expressions you must do type erasure by hand, which is difficult to eliminate,
whereas in P0057 the compiler decides whether it needs to do type erasure or not, thus allowing it to be optimized out when unnecessary.


Evgeny Panasyuk

Oct 13, 2015, 8:20:25 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 3:03, Gor Nishanov:
Thanks for the link. I already commented on this issue in the current topic:
https://groups.google.com/a/isocpp.org/d/msg/std-proposals/L5ZsY1SYnrA/kGXSVV4RDgAJ

In short, yes - I agree, it could lead to subtle bugs.
But the same applies to raw non-owning pointers: they could lead to subtle
bugs too, but we still use them.
And in general: we do not ban square root just because we have negative
numbers.

Evgeny Panasyuk

Oct 13, 2015, 8:25:45 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 3:11, Gor Nishanov:
> Whereas in P0057 compiler decides whether it needs to do type erasure or
> not, thus, allowing to optimize it out when unnecessary.

What about the "vector<coroutine>(N)" use case? As far as I can see, the overhead of N
allocations can't be elided automatically.

Gor Nishanov

Oct 13, 2015, 8:32:47 PM10/13/15
to ISO C++ Standard - Future Proposals
You can add the ability to clone() and restore() the coroutine state on top of either proposal.
If someone feels strongly about it, he/she can write and submit a proposal to make it happen.

Gor Nishanov

Oct 13, 2015, 8:34:34 PM10/13/15
to ISO C++ Standard - Future Proposals

On Tuesday, October 13, 2015 at 5:25:45 PM UTC-7, Evgeny Panasyuk wrote:
What about the "vector<coroutine>(N)" use case? As far as I can see, the overhead of N
allocations can't be elided automatically.

How can P0114 help you with that? Coroutine state is uncopyable and unmovable there as well. 

Evgeny Panasyuk

Oct 13, 2015, 8:44:51 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 3:32, Gor Nishanov:
clone/restore are just desirable features. Yes, maybe they can be added
separately.
But I am talking about allocation overhead, which exists here due to the
erased type; that is a separate issue. If we standardize an erased type in ISO C++,
it will be there for a long time, and it would be hard to fix.

For instance, I expect generators to be fast, small, zero (or at
least almost zero) overhead things, like a transform
iterator. I do not want a transform iterator that is allocated on the heap.

Nicol Bolas

Oct 13, 2015, 9:57:55 PM10/13/15
to ISO C++ Standard - Future Proposals
On Tuesday, October 13, 2015 at 7:45:42 PM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 1:22, Nicol Bolas:
On Tuesday, October 13, 2015 at 5:37:50 PM UTC-4, Evgeny Panasyuk wrote:
13.10.2015 2:03, Gor Nishanov:
>
>     My main concern about P0057R0 is type-erasure - it is far from being
>     zero-overhead.
>
>
> I have had an outstanding challenge for a year already to anyone who
> thinks that way to come up with a real world problem, reduce it to
> a manageable size (say async_tcp_reader), write it up both ways using
> P0057 and whatever you consider zero overhead and evaluate on three
> criteria:

An example of a real-world problem is generator/yield.
An extra allocation here results in significant overhead. Even if some
kind of "small object optimization" scheme is used - it is still not
zero overhead.

Except that he's already proven (in this thread no less) that a good optimizer can elide the allocation. If the compiler can reasonably make it zero overhead, then it is zero overhead.



1. It is impossible (practically) in the general case.
For instance, when we put coroutines in a container, like:
vector<coroutine> x(N);
With coroutines of concrete type, whose sizeof is known at compile time, this can be done within a single allocation.
But if the coroutine type is erased, then we will have N+1 allocations in the general case, and they can't practically be elided.

Ignoring the rest of the discussion on this point, I never claimed that P0057 could guarantee elision in the case you present here. Before, you asked about a specific problem, and I answered with a specific example showing that it was elidable. What you've shown here hardly disproves my point.

Also... how does `vector<coroutine>` make any kind of sense with regard to P0114? The type isn't type erased, so each coroutine has its own type. Therefore, in order to put them in a homogeneous container like `vector`, you'll have to type-erase them. Which requires memory allocation.

At which point, your version gains nothing over P0057.
 
2. Even if we consider only function scopes, escape analysis would not give a 100% guarantee of elision in every case. First, I think it would hit the halting problem; second, some of the functions in the call tree may not be inlined for good reasons, and this would blind the analysis.

P0114 requires that all resumable functions you call are inlined. If they're not inlined, you have to manually box them (and the boxing function is no longer resumable). Boxing involves type erasure. And as previously stated, memory allocation.

In order for P0114 to not require the same allocations as P0057, you must be using resumable functions directly, without boxing. So they must be inline. And therefore, your second problem is a non-issue for comparable cases: the compiler for the TU has access to all relevant code.

The only question that remains is this: given full inlining, where does the optimizer break down?

Do you have any actual knowledge that it breaks down in common cases? Or can a smart one cover 80-90% of these cases? Stop talking theory as though this weren't an idea that has already been implemented on at least one compiler.
 
3. This would put an additional burden on implementers, and I don't see reasonable benefits that we would get for such a burden.

No, it doesn't. Or rather, it's the same burden, it's just in a different place.

P0114 requires implementations to go through whole hierarchies of inline function calls and generate types that represent their stacks. It puts a lot of burden on implementer too; it's just in the implementation of the feature rather than the optimization phase.

It's more or less the same work either way. Though admittedly, P0114 does make it a bit easier for the compiler to see it.

4. C++11 has lambdas with concrete types; this ensures zero overhead and fits naturally into the language. We don't have type-erased closures. We can use external type erasure like std::function when needed.
Why should we have type erasure for stackless coroutines?

Because implementing await machinery (promises, awaitable, etc) is hard enough as it is. Adding a template on top of everything only makes things harder.

Evgeny Panasyuk

Oct 13, 2015, 10:18:44 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 3:34, Gor Nishanov:
It does not erase the type:
"
No hidden memory allocations. The memory representation of a resumable
expression can be wherever you need it: on the stack, a global, or a
member of a possibly heap-allocated object.
"

And there is a possibility of copyability:
"
10.1 Allowing copyability
By allowing copyability of resumable objects, we enable interesting use
cases such as undo stacks. Although this behaviour comes with risk
associated with aliasing of local variables, an explicit opt in may be
feasible.
"

But even if there were no copyability/movability, and as a
result we could not use std::vector, we could still place N coroutines into an
array with a single allocation: make_unique<coroutine[]>(N)
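The copyability point can be made concrete with a hand-written state machine. This is a sketch for illustration, not P0114 syntax: `running_sum` is a hypothetical coroutine written in the switch-based style that Boost.Asio's stackless coroutine macros expand to, so its locals and its resume point are plain members. Copying the object snapshots the suspended computation, which is the kind of thing an undo stack needs.

```cpp
#include <cassert>

// Hypothetical concrete-type stackless coroutine, written by hand in the
// switch-based style used by macro-based stackless coroutines.
// Its "locals" (i, sum) and its resume point (state) are ordinary members.
struct running_sum {
    int state = 0;  // saved resume point
    int i = 0;      // coroutine local
    int sum = 0;    // coroutine local

    // Runs to the next suspension point; returns false once finished.
    bool resume() {
        switch (state) {
        case 0:
            for (i = 1; i <= 5; ++i) {
                sum += i;
                state = 1;
                return true;  // suspend after adding each term
        case 1:;              // a later resume() continues from here
            }
        }
        state = 2;  // finished (any further resume() also lands here)
        return false;
    }
};
```

Copying is safe here because no local refers to another local. If one did (say, a pointer into a local buffer), the copy would still point into the original object, which is the aliasing risk the quoted paper mentions.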

Gor Nishanov

Oct 13, 2015, 10:31:34 PM10/13/15
to ISO C++ Standard - Future Proposals

On Tuesday, October 13, 2015 at 7:18:44 PM UTC-7, Evgeny Panasyuk wrote:

It does not erase the type:

It forces you to do it by hand, as you can see in both the generator and async examples. The key is that it requires a stationary frame, and for that you either need to heap-allocate or keep the lifetime of the coroutine fully enclosed in the lifetime of its consumer. That is exactly the case where heap elision is done.
 
But even if there were no copyability/movability, and as a
result we could not use std::vector, we could still place N coroutines into an
array with a single allocation: make_unique<coroutine[]>(N)

P0057 allows you to customize allocation, so you can achieve the same goal by different means.
I do not claim that the feature lists of P0057 and P0114 are identical. I claim that for a complicated problem, like async programming, for example,

a P0057 solution will result in:

1) less user-written code
2) less library support code
3) less abstraction overhead (see TcpReader, for example)

than a solution to the same problem in P0114.
If you want to argue for the superiority of P0114, pick a problem (hint, hint: async programming), write a solution, and compare it with the P0057 equivalent.

Evgeny Panasyuk

Oct 13, 2015, 11:17:53 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 4:57, Nicol Bolas:
> Except that he's already proven (in this thread no less) that a
> good optimizer can elide the allocation. If the compiler can
> reasonably /make/ it zero overhead, then it /is/ zero overhead.
>
>
>
> 1. It is (practically) impossible in the general case.
> For instance, when we put coroutines in a container, like:
>
>     vector<coroutine> x(N);
>
> With coroutines of concrete types, whose sizeof is known at
> compile time, this can be done with a single allocation.
> But if the coroutine type is erased, then we will have N+1 allocations in
> the general case, and they can't practically be elided.
>
>
> Ignoring the rest of the discussion on this point, I never claimed that
> P0057 could guarantee elision in the case you present here. Before, you
> asked about a /specific/ problem, and I answered with a specific example
> showing that it was elidable. What you've shown here hardly disproves my
> point.

It is not zero overhead even with a good optimizer/compiler, because they
can't elide every allocation, and I am not talking about exotic cases.

>
> Also... how does `vector<coroutine>` make any kind of sense with regard
> to P0114? The type isn't type erased, so each coroutine has its own
> type. Therefore, in order to put them in a homogeneous container like
> `vector`, you'll have to type-erase them. Which requires memory allocation.
> At which point, your version gains /nothing/ over P0057.

Identical coroutines have the same concrete type. For instance, with P0114 it
might be:

struct concrete_coroutine
{
    resumable auto r = expression;
    // ...
};
...
make_unique<concrete_coroutine[]>(N);

For example, imagine a TCP server: the coroutine for each incoming
connection does the same job and has the same locals, and as a
consequence they all have the same type.

LIVE DEMO using macro-based stackless coroutines from Boost.Asio:
http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
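As a rough sketch of the TCP-server case, assuming a hand-written state machine rather than any proposed syntax (`connection_coro` and `make_connections` are hypothetical names): because the state machine is a named class, it sidesteps the question of `auto` data members, and all N coroutine frames end up in a single `make_unique<T[]>(N)` allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical per-connection coroutine for a toy server, spelled out by hand
// as a state machine. Every connection runs the same code, so every instance
// has the same type and a sizeof known at compile time.
struct connection_coro {
    int state = 0;             // saved resume point
    std::size_t received = 0;  // coroutine local that survives suspensions

    // Resumes the coroutine with the next incoming message; returns the reply.
    std::string resume(const std::string& msg) {
        switch (state) {
        case 0:  // the first message must be a greeting
            if (msg != "hello") {
                state = 2;  // reject and finish
                return "bye";
            }
            state = 1;
            return "welcome";
        case 1:  // afterwards: echo with a running count
            ++received;
            return std::to_string(received) + ":" + msg;
        default:  // finished: ignore further input
            return {};
        }
    }
};

// All N coroutine frames live in one contiguous block: a single allocation.
std::unique_ptr<connection_coro[]> make_connections(std::size_t n) {
    return std::make_unique<connection_coro[]>(n);
}
```

The replies are `std::string`s, but the frames themselves are contiguous and sized at compile time, which is the property being argued for here.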


>
> 2. Even if we consider only function scopes, escape analysis would
> not give a 100% guarantee of elision in every case. First, I
> think it would hit the halting problem; second, some of the functions in the
> call tree may not be inlined for good reasons, and this would
> blind the analysis.
>
>
> P0114 /requires/ that all resumable functions you call are inlined. If
> they're not inlined, you have to manually box them (and the boxing
> function is no longer resumable). Boxing involves type erasure. And as
> previously stated, memory allocation.
>
> In order for P0114 to not require the same allocations as P0057, you
> must be using resumable functions directly, without boxing. So they must
> be inline. And therefore, your second problem is a non-issue for
> comparable cases: the compiler for the TU has access to all relevant
> code.


It requires inlining of resumable calls inside the body of
resumable functions.
Outside of a resumable context, you can .resume() a coroutine without inlining.

>
> The only question that remains is this: given full inlining, where does
> the optimizer break down?

For instance, when you store a coroutine in some container, which is not an
unusual case.

> Stop talking theory as
> though this weren't an idea that has already been implemented on at
> least one compiler.

I am not talking specifically about P0114. I am talking about stackless
coroutines with concrete non-type-erased types.

And this is already implementable in macro-library form. I have already
shown several examples in this thread, even await for the List monad. And
these examples are live: you can test them, modify them, and recompile them
in your browser with a few mouse clicks. It is not just a
theory; it is an approach proven to work.

>
> 3. This would put an additional burden on implementers, and I don't see
> reasonable benefits that we would get for such a burden.
>
>
> No, it doesn't. Or rather, it's the same burden, it's just in a
> different place.

It is not the same burden. Stackless coroutines with concrete types,
P0114, and macro-based solutions do not need special, tricky
allocation elision; they just don't do any allocation in the first place.

> P0114 requires implementations to go through whole hierarchies of inline
> function calls and generate types that represent their stacks. It puts a
> lot of burden on implementer too; it's just in the implementation of the
> feature rather than the /optimization/ phase.
>
> It's more or less the same work either way. Though admittedly, P0114
> does make it a bit easier for the compiler to see it.

Again, I am not talking specifically about P0114. Even P0057 could be
changed to have a concrete coroutine type.

>
> 4. C++11 has lambdas with concrete types; this ensures zero
> overhead and fits naturally into the language. We don't have
> type-erased closures. We can use external type erasure like
> std::function when needed.
> Why should we have type erasure for stackless coroutines?
>
>
> Because implementing await machinery (promises, awaitable, etc) is hard
> enough as it is.

It is implementable to some extent even with macros, albeit with
unattractive syntax.

> Adding a template on top of everything only makes
> things harder.

Which template? Do you mean the mandatory inlining in P0114? It requires
this inlining in order to solve an orthogonal and harder problem, not the
problem of allocations.
N4244, a predecessor of sorts to P0114, also does not force type erasure
and allocation, but it does not require the inlining you are referring to.


Evgeny Panasyuk

Oct 13, 2015, 11:46:31 PM10/13/15
to std-pr...@isocpp.org
14.10.2015 5:31, Gor Nishanov:
>
> It does not erase the type:
>
>
> It forces you to do it by hand, as you can see in both the generator
> and async examples.

Not "by hand": this type erasure can live in the standard library, just
like std::function. And it is not hard at all to use lambdas with
std::function when needed.

> The key is that it requires a stationary frame, and for that
> you either need to heap-allocate or keep the lifetime of the coroutine
> fully enclosed in the lifetime of its consumer. That is exactly the case
> where heap elision is done.

1. Even if we consider only function scopes, escape analysis would not
give a 100% guarantee of elision in every case. First, I think it
would hit the halting problem; second, some of the functions in the call tree may
not be inlined for good reasons, and this would blind the analysis.
With a concrete coroutine type, no analysis is required at all.

2. Coroutine locals can be copied/moved along with the coroutine itself; this is
already implementable via macros. P0114 does not exclude the possibility
of copying.
But even if we added .clone() to P0057, we would still have an erased type.

3. Again, consider the case when a coroutine is stored in a structure or an array,
like make_unique<coroutine[]>(N).
Yes, it will be on the heap, but the coroutines would not be allocated
separately; it will be just one allocation.
The same applies to structures. For instance, if we have:

struct Foo
{
    coroutine x;
    // ...
};

and then do make_unique<Foo>(), then with a type-erased coroutine
there will be two allocations, but with a coroutine of concrete type,
just one.
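The allocation count for the `struct Foo` case can be sketched as follows, under stated assumptions: `counter_frame`, `foo_concrete`, `foo_handle` and `counted_make_unique` are hypothetical, and a `std::unique_ptr` to a separately allocated frame stands in for a coroutine whose frame sits behind a handle. Type erasure itself is not modeled, only the extra allocation it implies when elision does not happen. Allocations are counted explicitly by the helper, so the numbers do not depend on the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Every heap allocation in this sketch goes through this hypothetical helper
// so that it can be counted.
static std::size_t g_allocations = 0;

template <class T>
std::unique_ptr<T> counted_make_unique() {
    ++g_allocations;
    return std::make_unique<T>();
}

// Hypothetical coroutine frame with a concrete type (hand-written state machine).
struct counter_frame {
    int n = 0;
    bool resume() { return ++n < 3; }  // suspends twice, then finishes
};

// Frame stored by value: creating a foo_concrete is a single allocation.
struct foo_concrete {
    counter_frame x;
    int other = 0;
};

// Frame behind a pointer, as when a coroutine object only holds a handle
// to a separately allocated frame: creating a foo_handle takes two allocations.
struct foo_handle {
    std::unique_ptr<counter_frame> x = counted_make_unique<counter_frame>();
    int other = 0;
};
```

With the frame stored by value, one allocation covers `Foo` and its coroutine; with the frame behind a pointer, the second allocation remains unless the compiler can prove it away.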

>
> But even if there were no copyability/movability, and as a
> result we could not use std::vector, we could still place N coroutines into an
> array with a single allocation: make_unique<coroutine[]>(N)
>
>
> P0057 allows you to customize allocation, so you can achieve the same
> goal by different means.

1. Even with custom allocation, an array of coroutines incurs O(N)
overhead versus a possible O(1).

2. You still need somewhere to put the allocation buffer. And things are
complicated (resulting in overhead) by the fact that you do not know at
compile time how much space each coroutine will take.

> I do not claim that the feature lists of P0057 and P0114 are identical. I
> claim that for a complicated problem, like async programming, for example,
>
> a P0057 solution will result in:
>
> 1) less user-written code
> 2) less library support code
> 3) less abstraction overhead (see TcpReader, for example)
>
> than a solution to the same problem in P0114.
> If you want to argue for the superiority of P0114, pick a problem (hint, hint:
> async programming), write a solution, and compare it with the P0057 equivalent.


I am not talking specifically about P0114. Right now I am talking about
stackless coroutines with concrete, non-type-erased types. And this is
possible even for P0057, without any major syntax change.

Gor Nishanov

Oct 13, 2015, 11:50:16 PM10/13/15
to ISO C++ Standard - Future Proposals


On Tuesday, October 13, 2015 at 8:17:53 PM UTC-7, Evgeny Panasyuk wrote:
I am not talking specifically about P0114. I am talking about stackless
coroutines with concrete non-type-erased types.

The starting point of my design was a lambda* with the properties you describe. When applied to problems I needed solving I found it unsatisfactory and therefore went with N4134 proposal. That does not mean that at some point, somebody won't be able to invent a better lambda* and get it standardized.

P0114, P0057 and lambda* are all powered by the same underlying machinery. A transformation of a state machine written in imperative fashion into an actual state machine. The difference is in a public face of the state machine. You just need to figure out compelling use-cases and sane semantics that are not already covered efficiently by P0057 and write a proposal. I cannot do it for you as the problems I need solving, namely async I/O and async programming in general, are addressed by P0057 succinctly and efficiently.

Evgeny Panasyuk

Oct 14, 2015, 12:14:59 AM10/14/15
to ISO C++ Standard - Future Proposals
14 October 2015, 6:50:16 UTC+3, Gor Nishanov:


On Tuesday, October 13, 2015 at 8:17:53 PM UTC-7, Evgeny Panasyuk wrote:
I am not talking specifically about P0114. I am talking about stackless
coroutines with concrete non-type-erased types.

The starting point of my design was a lambda* with the properties you describe. When applied to problems I needed solving I found it unsatisfactory and therefore went with N4134 proposal. That does not mean that at some point, somebody won't be able to invent a better lambda* and get it standardized.


Well, this is also the problem. If we already had stackless coroutines in the standard, it would be much harder to get an additional kind in.
 
P0114, P0057 and lambda* are all powered by the same underlying machinery. A transformation of a state machine written in imperative fashion into an actual state machine. The difference is in a public face of the state machine. You just need to figure out compelling use-cases and sane semantics that are not already covered efficiently by P0057 and write a proposal. I cannot do it for you as the problems I need solving, namely async I/O and async programming in general, are addressed by P0057 succinctly and efficiently.


If you need only async I/O, then yes, I can imagine that an extra allocation is tolerable in that context. But P0057 covers not only async I/O but also, for instance, generators. And for generators (like transform iterators), an extra allocation is a huge price.
It is an acceptable price for languages like C# (especially taking into account fast happy-path allocations in the first generation of a copying GC), but it is definitely not acceptable for C++, which has costly default allocations and an ambitious zero-overhead goal.

And if you only consider the performance of async I/O use cases, then I think the proposal should reflect this directly somehow.
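To illustrate what a generator with a concrete type looks like, here is a minimal sketch, assuming a hand-written state machine in place of compiler-generated code; `transform_gen`, `make_transform_gen` and `collect` are hypothetical names. The generator's whole frame (loop variable and resume point) is part of the object, so in the usage below it lives on the caller's stack and the generator itself performs no heap allocation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical concrete-type generator: yields f(x) for each x in [first, last).
template <class F>
struct transform_gen {
    int first;
    int last;
    F f;
    int state = 0;  // saved resume point
    int x = 0;      // loop variable, kept across suspensions

    // Writes the next value to out; returns false when exhausted.
    bool next(int& out) {
        switch (state) {
        case 0:
            for (x = first; x < last; ++x) {
                out = f(x);
                state = 1;
                return true;  // "yield f(x)"
        case 1:;              // resume here on the next call
            }
        }
        state = 2;  // exhausted
        return false;
    }
};

template <class F>
transform_gen<F> make_transform_gen(int first, int last, F f) {
    return {first, last, f};
}

// Drains a generator into a vector (the vector is only there for inspection).
template <class G>
std::vector<int> collect(G& g) {
    std::vector<int> values;
    int v = 0;
    while (g.next(v)) values.push_back(v);
    return values;
}
```

Whether a type-erased generator pays for an allocation in the same loop depends on the optimizer eliding it; the concrete version has nothing to elide.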

Nicol Bolas

Oct 14, 2015, 12:46:51 AM10/14/15
to ISO C++ Standard - Future Proposals
On Tuesday, October 13, 2015 at 11:17:53 PM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 4:57, Nicol Bolas:
>         Except that he's already proven (in this thread no less) that a
>         good optimizer can elide the allocation. If the compiler can
>         reasonably /make/ it zero overhead, then it /is/ zero overhead.
>
>
>
>     1. It is (practically) impossible in the general case.
>     For instance, when we put coroutines in a container, like:
>
>         vector<coroutine> x(N);
>
>     With coroutines of concrete types, whose sizeof is known at
>     compile time, this can be done with a single allocation.
>     But if the coroutine type is erased, then we will have N+1 allocations in
>     the general case, and they can't practically be elided.
>
>
> Ignoring the rest of the discussion on this point, I never claimed that
> P0057 could guarantee elision in the case you present here. Before, you
> asked about a /specific/ problem, and I answered with a specific example
> showing that it was elidable. What you've shown here hardly disproves my
> point.

It is not zero overhead even with a good optimizer/compiler, because they
can't elide every allocation, and I am not talking about exotic cases.

Thus far, including in this post, you haven't mentioned an example that would actually compile.

You linked to some macro code, but macros are, basically, cheating. They get to break all kinds of C++ rules, which an actual language feature would not.
 
>
> Also... how does `vector<coroutine>` make any kind of sense with regard
> to P0114? The type isn't type erased, so each coroutine has its own
> type. Therefore, in order to put them in a homogeneous container like
> `vector`, you'll have to type-erase them. Which requires memory allocation.
 > At which point, your version gains /nothing/ over P0057.

Identical coroutines have the same concrete type. For instance, with P0114 it
might be:

struct concrete_coroutine
{
    resumable auto r = expression;
    // ...
};
...
make_unique<concrete_coroutine[]>(N);

`auto` doesn't work that way. Non-static data members cannot be `auto`. Normally I wouldn't care about a small issue like that, but it basically makes your code impossible.

Without `auto` NSDMI (and I wouldn't hold my breath on seeing it), you can't store a resumable expression. So you can't make containers of them.

Unless you erase their types. So again, you've gained nothing.

Your macro solution gets around this because it uses macros.
 
Again, I am not talking specifically about P0114. Even P0057 could be
changed to have a concrete coroutine type.

I'd be curious to see how, exactly.

And I don't mean some macro nonsense. I mean the specific details of how you turn P0057's `coroutine_handle` into a type.

See, the way P0057 works is that the coroutine object is introduced in one specific place: the awaiter object's `await_suspend` method. That's the first place where user code gets to touch a coroutine, and if the method doesn't store it or otherwise keep it around, it's also the last.

That's what I meant when I said "adding a template". Because now, `await_suspend` must become a template function. There's no other way to capture a parameter of an arbitrary, compiler-generated type.

But what of generators and promises in such a scenario? A `generator<int>` needs to be able to store any coroutine, so it has to... type erase it. The promise type cannot be a template on the coroutine type, since it was declared long before the coroutine handle appeared. And so forth.

In short, the entirety of P0057 is designed around a type-erased `coroutine_handle`. You can't simply declare that it's not type-erased and expect everything to work reasonably. The entire design would need to be rethought.
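A rough model of the point about `await_suspend`, without any real coroutine machinery: `erased_handle`, `make_handle`, `io_awaiter`, `io_awaiter_template`, `frame_a` and `frame_b` are hypothetical stand-ins. An erased handle is essentially a frame pointer plus a resume function, so a non-template awaiter can accept any coroutine. An awaiter that receives concrete frame types has to be a template, and as soon as it needs to store the continuation for later, it ends up erasing the type anyway.

```cpp
#include <cassert>

// Hypothetical model of a type-erased coroutine handle: a pointer to the frame
// plus a function that knows how to resume that particular frame type.
struct erased_handle {
    void* frame = nullptr;
    void (*resume_fn)(void*) = nullptr;
    void resume() const { resume_fn(frame); }
};

// Erases a concrete frame type into an erased_handle.
template <class Frame>
erased_handle make_handle(Frame& f) {
    return {&f, [](void* p) { static_cast<Frame*>(p)->resume(); }};
}

// Non-template awaiter: accepts any coroutine, because the handle is erased.
struct io_awaiter {
    erased_handle* pending;  // where the "I/O layer" keeps the continuation
    void await_suspend(erased_handle h) { *pending = h; }
};

// Awaiter for concrete frame types: it must be a template, and storing the
// continuation for later still means erasing the frame type.
struct io_awaiter_template {
    erased_handle* pending;
    template <class Frame>
    void await_suspend(Frame& f) { *pending = make_handle(f); }
};

// Two unrelated, hypothetical frame types.
struct frame_a { int resumed = 0; void resume() { ++resumed; } };
struct frame_b { double resumed = 0; void resume() { resumed += 0.5; } };
```

This only models the argument; P0057's actual `coroutine_handle` is opaque, and the sketch makes no claim about its layout.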

If you want this done, then you're going to need to go through the effort of designing the feature to work without type erasure. Then you have to get someone to implement it. Then, you can know whether it works just as well as P0057, whether it's equally easy to use, and how much of a performance advantage it gets.

If any.

Nicol Bolas

Oct 14, 2015, 12:51:07 AM10/14/15
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 12:14:59 AM UTC-4, Evgeny Panasyuk wrote:
If you need only async I/O, then yes, I can imagine that an extra allocation is tolerable in that context. But P0057 covers not only async I/O but also, for instance, generators. And for generators (like transform iterators), an extra allocation is a huge price.

A price you will never pay because it will be elided.

Please stop repeating statements that have been disproven; it's not helping your case. You have yet to post an example of a generator that would not be elided.

Evgeny Panasyuk

Oct 14, 2015, 1:13:56 AM10/14/15
to std-pr...@isocpp.org
14.10.2015 7:51, Nicol Bolas:

> If you need only async I/O, then yes, I can imagine that an extra
> allocation is tolerable in that context. But P0057 covers not
> only async I/O but also, for instance, generators. And for
> generators (like transform iterators), an extra allocation is a huge price.
>
>
> A price you will never pay because it will be elided.
>
> Please stop repeating statements that have been disproven; it's not
> helping your case. You have yet to post an example of a generator that
> would not be elided.

I have already described it several times: just put the generator into some
structure/array or return it somewhere.
In a situation similar to
http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
the allocation of a P0057 generator will not be elided: there will be N
allocations, one for each coroutine.

You can read what Gor said earlier in this thread:


"If coroutine lifetime is fully enclosed in the lifetime of the calling
function, then we can
1) elide allocation and use a temporary on the stack of the caller
2) replace indirect calls with direct calls and inline as appropriate:"


There is an "if" condition. Even if we assume that optimizers always do
the elision when the condition is true, there is still no elision when
the condition is false.

Ville Voutilainen

Oct 14, 2015, 1:36:28 AM10/14/15
to ISO C++ Standard - Future Proposals
On 14 October 2015 at 07:14, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
>> The starting point of my design was a lambda* with the properties you
>> describe. When applied to problems I needed solving I found it
>> unsatisfactory and therefore went with N4134 proposal. That does not mean
>> that at some point, somebody won't be able to invent a better lambda* and
>> get it standardized.
> Well, this is also the problem. If we already had stackless
> coroutines in the standard, it would be much harder to get an additional kind in.

That "much harder" is fairly questionable. While there apparently are people
who think working on e.g. stackful coroutines becomes a pointless exercise
if Gor's proposal is accepted, and while there are people who prefer
picking one rather than many solutions (albeit the problems being different),

- we have plenty of people in the committee who understand that the different
solutions are tackling different problems,
- for those who don't understand it, the proposal authors can fairly easily
re-explain it
- while they are doing that re-explaining, Gor is going to nod vigorously
and help them explain it, even though it's not the main focus of his
attention as far as the overall design space of these facilities goes,

so, compared to almost any other facility that would provide an
alternative/additional
approach for something already partially tackled by a standard facility,
standardizing something additional in this area may be much easier than people
fear.

Evgeny Panasyuk

Oct 14, 2015, 2:02:45 AM10/14/15
to std-pr...@isocpp.org
14.10.2015 8:36, Ville Voutilainen:
> On 14 October 2015 at 07:14, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
>>> The starting point of my design was a lambda* with the properties you
>>> describe. When applied to problems I needed solving I found it
>>> unsatisfactory and therefore went with N4134 proposal. That does not mean
>>> that at some point, somebody won't be able to invent a better lambda* and
>>> get it standardized.
>> Well, this is also the problem. If we already had stackless
>> coroutines in the standard, it would be much harder to get an additional kind in.
>
> That "much harder" is fairly questionable. While there apparently are people
> who think working on e.g. stackful coroutines becomes a pointless exercise
> if Gor's proposal is accepted, and while there are people who prefer
> picking one rather than many solutions (albeit the problems being different),

Indeed, there is an overlap between the use cases of stackless and stackful
coroutines. But in addition, each one has its own area where it beats
the other approach.

And I think ISO C++ needs both stackless and stackful coroutines.
(though I would prefer to get stackless into the standard first, because stackful
can be implemented entirely in a library, like Boost.Context/Coroutine)

But here the situation is different: both options target exactly the same use
cases, and one is even strictly more powerful than the other. If we had
stackless coroutines with a concrete type, we could always
implement type erasure on top of them, just like we have std::function
type erasure for concrete lambda types.
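The `std::function` analogy can be sketched like this, assuming coroutines are ordinary objects with a `resume()` member; `count_to_three`, `run_once` and `any_coroutine` are hypothetical names. The concrete types carry no allocation of their own, and the wrapper pays one heap allocation per coroutine only when a uniform type is needed, for example to put different coroutines in one `std::vector`.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Two hypothetical concrete coroutines: distinct types, no allocation of their own.
struct count_to_three {
    int n = 0;
    bool resume() { return ++n < 3; }  // suspends twice, then finishes
};
struct run_once {
    bool done = false;
    bool resume() { done = true; return false; }  // finishes immediately
};

// Opt-in type erasure layered on top, in the spirit of std::function:
// the heap allocation is paid only when a uniform type is requested.
class any_coroutine {
    struct base {
        virtual ~base() = default;
        virtual bool resume() = 0;
    };
    template <class C>
    struct model final : base {
        C c;
        explicit model(C x) : c(std::move(x)) {}
        bool resume() override { return c.resume(); }
    };
    std::unique_ptr<base> self_;

public:
    // Constrained so it never competes with the implicit move constructor.
    template <class C,
              class = std::enable_if_t<!std::is_same<std::decay_t<C>, any_coroutine>::value>>
    explicit any_coroutine(C c) : self_(std::make_unique<model<C>>(std::move(c))) {}

    bool resume() { return self_->resume(); }
};
```

Code that does not need a uniform type keeps using `count_to_three` directly and never touches the heap, which mirrors how lambdas are used with and without `std::function`.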


German Diago

Oct 14, 2015, 3:12:22 AM10/14/15
to std-pr...@isocpp.org
On Wed, Oct 14, 2015 at 7:11 AM, Gor Nishanov <gorni...@gmail.com> wrote:
Okay. Slightly less cryptic reply. The coroutine frame must be stationary once the coroutine starts running.
Resumable Expressions abandoned the movability/copyability of the lambda* that was present in the earlier resumable lambda proposal.

The new paper mentions an opt-in for move/copy. It could be proposed. I find it useful.
 

The difference is that in Resumable Expressions you must do type erasure by hand, which is difficult to eliminate,
whereas in P0057 the compiler decides whether it needs to do type erasure or not, thus allowing it to be optimized out when unnecessary.

What I really like about resumable expressions is that how they work is really, really obvious and lightweight.
Your implementation, Gor, looks good to me. But I think we can avoid some of the trouble. I do not see a problem
with having library-abstracted generators on top of resumable expressions. Why should we embed a full protocol in the language itself
when we can get it done with only "break resumable" and have the rest as library abstractions? I just do not get why,
because you can still have your generators, your await, everything, and remove type erasure and have *real* zero overhead
from the beginning, without any fancy optimizations or escape analysis. I am against introducing into the language something that
is not inherently zero-overhead when we have alternatives. Do not get me wrong: in my opinion, the proposal gives a lot of inspiration
for how to do a few great things. But I honestly think we can do better.

I know about your suggestion on how to compare, but I simply do not have enough time. I wish I had, but I am short on time.
I find resumable expressions more understandable with respect to the traditional C++ model, and I think they guarantee zero overhead
in more cases than your proposal. I do recognize that the numbers you show for your implementation look good, but, again:

1. They need compiler optimizations such as escape analysis.
2. No matter how you put it, they are not inherently zero-overhead given the current state of the art. You even mentioned that Clang
is planning to introduce an optimization that is not available yet. I think we should not get into that trouble;
we have alternatives. There are also more compilers around: Intel, IBM...
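A small sketch of why a frame must stay put once it is running, assuming a callback-based completion queue in place of a real OS API; `completion_queue` and `reader_frame` are hypothetical names. Once `start()` hands out the frame's address, moving the frame would leave the queue with a dangling pointer, so the copy operations are deleted (which also suppresses the implicit moves).

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an OS completion queue: it remembers callbacks
// and runs them when "I/O" completes.
struct completion_queue {
    std::vector<std::function<void(int)>> pending;
    void complete_all(int bytes) {
        for (auto& cb : pending) cb(bytes);
        pending.clear();
    }
};

// Hypothetical coroutine frame for a read. Once start() runs, the queue holds
// the frame's address, so the frame must not move. Deleting the copy
// operations (which also suppresses the implicit moves) lets the compiler
// enforce that.
struct reader_frame {
    int state = 0;  // 0: not started, 1: suspended on read, 2: done
    int total = 0;

    reader_frame() = default;
    reader_frame(const reader_frame&) = delete;
    reader_frame& operator=(const reader_frame&) = delete;

    void start(completion_queue& q) {
        state = 1;  // suspend until the read completes
        q.pending.push_back([this](int bytes) { on_complete(bytes); });
    }
    void on_complete(int bytes) {  // resumption
        total += bytes;
        state = 2;
    }
};
```

In this sketch, a copy or move would only be safe before `start()`, which is one possible reading of the opt-in that the new paper mentions.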


German Diago

Oct 14, 2015, 3:14:49 AM10/14/15
to std-pr...@isocpp.org
P0114 requires that all resumable functions you call are inlined. If they're not inlined, you have to manually box them (and the boxing function is no longer resumable). Boxing involves type erasure. And as previously stated, memory allocation.

You can box it. You *can*; you do not *need* to. And you do not need an inline resumable expression if you just implement it and type-erase it in a .cpp file. You still have the same amount of power as with your proposal; you just have the *additional* option of inlining.
I think this alone already makes that proposal inherently "more zero-overhead".
 

German Diago

Oct 14, 2015, 3:18:53 AM10/14/15
to std-pr...@isocpp.org


Not "by hand": this type erasure can live in the standard library, just like std::function. And it is not hard at all to use lambdas with std::function when needed.

Completely agree. This is the way it should be done, IMHO, rather than embedding it into the language for no gain.
I do not claim that the feature lists of P0057 and P0114 are identical. I
claim that for a complicated problem, like async programming, for example,

a P0057 solution will result in:

1) less user-written code

This is due to embedding more into the language. Resumable expressions can do all of it in libraries.
So as a solution, I find a library superior to embedding it into the language.
 
2) less library support code

Again, because it is embedded.
 
3) less abstraction overhead (see TcpReader, for example)

This is a carefully chosen use case. There are many others. Though, I cannot deny that it is a real one.
 
than a solution to the same problem in P0114.
If you want to argue for the superiority of P0114, pick a problem (hint, hint:
async programming), write a solution, and compare it with the P0057 equivalent.

Needing to write less code does not mean that it is zero-overhead, which is the main point of the discussion.
 

Ville Voutilainen

Oct 14, 2015, 3:42:53 AM10/14/15
to ISO C++ Standard - Future Proposals
On 14 October 2015 at 09:02, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
>> That "much harder" is fairly questionable. While there apparently are
>> people
>> who think working on e.g. stackful coroutines becomes a pointless exercise
>> if Gor's proposal is accepted, and while there are people who prefer
>> picking one rather than many solutions (albeit the problems being
>> different),
>
>
> Indeed, there is an overlap between the use cases of stackless and stackful
> coroutines. But in addition, each one has its own area where it beats
> the other approach.

Correct.

> And I think ISO C++ needs both stackless and stackful coroutines. (though
> I would prefer to get stackless into the standard first, because stackful can be
> implemented entirely in a library, like Boost.Context/Coroutine)

Fully agreed.

> But here the situation is different: both options target exactly the same use cases,
> and one is even strictly more powerful than the other. If we had
> stackless coroutines with a concrete type, we could always implement
> type erasure on top of them, just like we have std::function type erasure for
> concrete lambda types.

The use cases may be slightly different, because stackful coroutines
do not require an Awaitable type all through the call stack, whereas
stackless coroutines do. The use case may be the same, but which
facility to apply depends on other less-technical-things, like whether
the user of a coroutine controls the full call stack.

As far as having the concrete type goes, that sounds like it requires even
more inlining and across-call-stack transparency. In a stackless coroutine,
the erased type combined with elision of the erasure and allocations avoids
having all coroutines have a different type.
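What the extra inlining and across-call-stack transparency means for concrete types can be sketched with two hand-written state machines; `inner_coro` and `outer_coro` are hypothetical names. When the outer coroutine awaits the inner one, the inner frame becomes a member of the outer frame, so the outer type's layout depends on the inner type's definition, and each suspension of the inner coroutine propagates outward through its caller.

```cpp
#include <cassert>

// Hypothetical inner coroutine: suspends twice, finishes on the third resume.
struct inner_coro {
    int step = 0;
    bool resume() { return ++step < 3; }
};

// Hypothetical outer coroutine that awaits the inner one. With concrete types
// the callee's frame is a member of the caller's frame, so the caller's type
// and size depend on the callee's definition: the compiler must see it.
struct outer_coro {
    int state = 0;     // 0: not started, 1: awaiting inner, 2: done
    inner_coro inner;  // callee frame embedded in the caller frame
    int inner_done = 0;

    bool resume() {
        if (state == 0) state = 1;            // start: begin awaiting inner
        if (state == 1) {
            if (inner.resume()) return true;  // inner suspended, so do we
            ++inner_done;                     // inner finished: continue
            state = 2;
        }
        return false;
    }
};
```

With an erased type, `outer_coro` could instead hold a handle to a separately allocated inner frame, which decouples the two types at the cost of an allocation that the optimizer then has to elide.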

Oliver Kowalke

Oct 14, 2015, 4:40:19 AM10/14/15
to std-pr...@isocpp.org
2015-10-14 8:02 GMT+02:00 Evgeny Panasyuk <evgeny....@gmail.com>:

And I think ISO C++ needs both stackless and stackful coroutines. (though I would prefer to get stackless into the standard first, because stackful can be implemented entirely in a library, like Boost.Context/Coroutine)

but implementing context switching is cumbersome (because it requires assembler); it is better for compiler vendors to provide implementations for all those combinations of architecture + ABI + binary format.

thorsten...@gmail.com

Oct 14, 2015, 7:50:33 AM10/14/15
to ISO C++ Standard - Future Proposals


On Tuesday, October 13, 2015 at 1:03:34 AM UTC+2, Gor Nishanov wrote:

I have had an outstanding challenge for a year already to anyone who thinks that way: come up with a real-world problem, reduce it to a manageable size (say async_tcp_reader), write it up both ways, using P0057 and whatever you consider zero overhead, and evaluate on three criteria:

1) How much code the end user has to write
2) How much library support is required
3) What the abstraction penalty is: how many instructions need to get executed to get from, say, await Read(buf, len) to a low-level API/hardware, say WSARecv

My statement is that P0057 is as good or better on all 3 criteria than any other proposal I've seen. If you want to accept the challenge, write up an equivalent to TcpReader described in one of these two presentations:


That is a good way forward. I think the abstraction penalty should be the same, otherwise the "resumable" proposal is dead. Given that, I don't agree that the amount of code is the most important aspect. What matters here is that normal programmers should be able to write correct programs without subtle bugs. I don't mind writing a little more code, if the resulting code is easier to get correct.

kind regards

Thorsten
 

Gor Nishanov

Oct 14, 2015, 9:37:50 AM
to ISO C++ Standard - Future Proposals, thorsten...@gmail.com


On Wednesday, October 14, 2015 at 4:50:33 AM UTC-7, thorsten...@gmail.com wrote:
3) What is the abstraction penalty: how many instructions need to get executed to get from, say, await Read(buf, len) to a low-level API/hardware, say WSARecv

My statement is that P0057 is as good or better on all 3 criteria than any other proposal I've seen. If you want to accept the challenge, write up an equivalent to TcpReader described in one of these two presentations:

That is a good way forward. I think the abstraction penalty should be the same,

I really hoped that one of the proponents will do the exercise and reach the same conclusion as I did. Namely that P0114 has significantly higher overhead measured in hundreds more instructions, memory barriers, type erasure and memory allocations.

If you look at http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/p0055r0.html, it shows how you can automate creation of awaitables for template based libraries dealing with async I/O that gets you code equivalent to hand written assembly.

     ResultType r = await async_xyz(p);

becomes

     async_xyz`Awaiter __tmp{p}; 
     $promise.resume_addr = &__resume_label;   // save the resumption point of the coroutine
     __tmp.resume = $RBP;                      // inlined await_suspend
     os_xyz(p,&OsContextBase::Invoke, &__tmp); // inlined await_suspend
     jmp Epilogue; // suspends the coroutine
__resume_label:    // will be resumed at this point once the operation is finished
     R r = move(__tmp.result); // inlined await_resume

which is pretty much what you would have written by hand if you wrote your coroutine in assembly.
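To make the lowering above easier to follow, here is a rough, hand-written C++ analogue of such a coroutine frame. It is only a sketch, not the code a P0055/P0057 compiler actually generates: `FakeOs`, `os_read` and `ReadFrame` are hypothetical names, the "OS" merely queues completions, and the resume label is modeled as an integer state instead of a code address.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an OS async API (e.g. a WSARecv-style call that
// reports completion through a callback plus a context pointer).
struct FakeOs {
    struct Pending { void (*invoke)(void*, int); void* ctx; int result; };
    std::vector<Pending> pending;

    void os_read(int request, void (*invoke)(void*, int), void* ctx) {
        pending.push_back({invoke, ctx, request * 2});  // pretend result
    }
    // Deliver completions, as an I/O completion loop would.
    void run() {
        while (!pending.empty()) {
            Pending p = pending.back();
            pending.pop_back();
            p.invoke(p.ctx, p.result);  // may queue further operations
        }
    }
};

// Hand-written frame for a coroutine equivalent to:
//   int a = await read(param);  int b = await read(a);  total = a + b;
struct ReadFrame {
    FakeOs* os;
    int resume_point = 0;  // plays the role of $promise.resume_addr
    int param;
    int a = 0, b = 0;      // locals that must survive suspension
    int tmp_result = 0;    // the awaiter's result slot (__tmp.result)
    int total = 0;
    bool done = false;

    ReadFrame(FakeOs* o, int p) : os(o), param(p) {}

    // Completion callback: store the result, then resume the frame.
    static void invoke(void* ctx, int result) {
        auto* self = static_cast<ReadFrame*>(ctx);
        self->tmp_result = result;
        self->resume();
    }

    void resume() {
        switch (resume_point) {
        case 0:
            resume_point = 1;                   // save the resumption point
            os->os_read(param, &invoke, this);  // start the operation
            return;                             // suspend
        case 1:
            a = tmp_result;                     // "await_resume"
            resume_point = 2;
            os->os_read(a, &invoke, this);
            return;
        case 2:
            b = tmp_result;
            total = a + b;
            done = true;
            resume_point = 3;
            return;
        default:
            return;                             // already finished
        }
    }
};
```

All coroutine state lives in one fixed-size object, and each completion re-enters `resume()` through a plain function pointer plus context.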
Compare that to the "can do it in a library" approach.

First, look at the heavyweight await-emulation machinery described in section 12.5 of P0114R0, starting on page 21.

But let's say await is not a natural pattern for P0114, and that is why it is heavyweight. Let's look at what could be a more natural pattern, shown in:
https://github.com/chriskohlhoff/resumable-expressions/blob/master/examples/await4.cpp . Same situation: not as bad as with the await emulation, but still bad.

Still, I think this is fixable. If P0114RX gets syntactic sugar equivalent to await and yield, it will not need to do as much work in the library, so it will become as efficient as P0057. 

What remains are various inconveniences of how coroutines are presented to the user.
You need to do manual type erasure, like in this example (await4.cpp):

resumable void echo(tcp::socket socket);

resumable void listen(tcp::acceptor acceptor) {
    ...
    spawn([s = std::move(socket)]() mutable { echo(std::move(s)); });

When I say manual type erasure, I mean that instead of just calling echo, you need to wrap it in a lambda and give to type-erasing library helper spawn that does wrapping and allocation for you.
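A minimal sketch of what such a type-erasing helper could look like, assuming a toy single-threaded scheduler. `Scheduler` and `spawn` here are hypothetical, not the helper from await4.cpp, and a `std::string` stands in for the socket. Note that `std::function` requires a copyable callable, so a lambda owning a move-only `tcp::socket` would need a move-only wrapper instead (C++23 adds `std::move_only_function`).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-threaded scheduler: a FIFO of type-erased tasks.
struct Scheduler {
    std::deque<std::function<void()>> tasks;

    void run() {
        while (!tasks.empty()) {
            std::function<void()> task = std::move(tasks.front());
            tasks.pop_front();
            task();  // a task may spawn further tasks
        }
    }
};

// Hypothetical spawn: erases the callable's concrete type by storing it
// in std::function (which may heap-allocate, depending on its size).
template <class F>
void spawn(Scheduler& s, F&& f) {
    s.tasks.emplace_back(std::forward<F>(f));
}
```

Usage mirrors the listen/echo snippet: the caller wraps the work in a lambda and hands it over, and the concrete lambda type disappears behind `std::function<void()>`.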

Another limitation is that you have to put everything in one file for the code above to work: in order for listen to be able to use echo in a resumable expression, it needs to see the body of echo, so echo has to be defined in the same TU. If you want to put them in different files, you need to create little type-erasing wrappers, as in:

echo.h:
void echo(tcp::socket socket);

echo.cpp:

resumable void echo_impl(tcp::socket socket) { ... }

void echo(tcp::socket socket) {
    spawn([s = std::move(socket)]() mutable { echo_impl(std::move(s)); });
}

I am not so much criticizing Chris as criticizing myself. I went through this design already when exploring lambda* nearly two years ago. I reached the conclusion that, for the problems I tried to apply it to, it did not offer anything to compensate for its complexity compared to boring C#-like await syntax. Hence, I no longer pursue this approach.

It does not mean that for some problems, some incarnation of lambda* might be better than P0057. That is wonderful. When this happen, let's add it in, in addition to P0057 and P0099 (modestly called "A low-level API for stackful context switching").

Nicol Bolas

Oct 14, 2015, 9:47:09 AM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 1:13:56 AM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 7:51, Nicol Bolas:

>     If you need only async I/O - yes, I could imagine that extra
>     allocation is tolerable in such context. But P0057 describes not
>     only async I/O - but also for instance generators. And for
>     generators (like transform iterators) an extra allocation is huge price.
>
>
> A price you will never pay because it will be elided.
>
> Please stop repeating statements that have been disproven; it's not
> helping your case. You have yet to post an example of a generator that
> would not be elided.

I already described it several times, just put generator into some
structure/array or return somewhere.
In similar situation:
http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
- allocation of P0057 generator will not be elided, there will be N
allocations, i.e. for each coroutine.

And in your case, the `Coroutine` will have to type-erase them too. Thus performing N allocations.

Put it another way. In order to make something a member of a struct, you must first be able to name it. In C++ as it currently stands, it is impossible to store an unnamable type in a non-static data member. Whether it's a lambda or the result of a resumable expression or anything else, it simply cannot happen.

Templates will not save you, because you can't do this:

Coroutine<resumable {expr}>.

You can't even do this:

using coroutine_type = decltype(resumable {expr});

Why do these fail? Because each separate `expr`, even if it's technically the exact same function, will result in a different type. Just like a lambda, copying-and-pasting the expression will yield a different type.
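The distinct-type point can be checked today with ordinary lambdas standing in for resumable expressions (which no shipping compiler implements). Before C++20 a lambda expression cannot even appear inside `decltype`, which is why this sketch names the closure types through small factory functions; `make_a` and `make_b` are illustrative names.

```cpp
#include <cassert>
#include <type_traits>

// Two textually identical lambda expressions...
inline auto make_a() { return [] { return 1; }; }
inline auto make_b() { return [] { return 1; }; }

// ...produce two distinct, unnamed closure types.
using A = decltype(make_a());
using B = decltype(make_b());
static_assert(!std::is_same<A, B>::value, "each lambda expression has its own type");

// Re-evaluating the same lambda expression always yields the same type.
static_assert(std::is_same<A, decltype(make_a())>::value, "same expression, same type");
```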

What you're suggesting is impossible. Or at least, it's impossible without trickery (ie: macros).

So Boost.Asio is either doing type erasure or it is cheating. Any core feature will not be allowed to cheat, so you'll have to use type erasure to store the result of such an operation.

You can read what Gor said previously in this topic:


"If coroutine lifetime is fully enclosed in the lifetime of the calling
function, then we can
1) elide allocation and use a temporary on the stack of the caller
2) replace indirect calls with direct calls and inline as appropriate:"

There is "if" condition. Even if we assume that optimizers always do
elision when condition is true, there is still no elision for cases with
false condition.

The only way you could avoid a dynamic allocation while still leaving the lifetime of the calling function is if you could copy/move the coroutine type. And that's just not reasonable.

Or rather, whether a coroutine is movable is based entirely on its implementation. An implementation that you may not have access to. So how could the compiler possibly know that function X will return an immobile type while function Y returns a mobile one?

Of course, for P0114, the question is moot: all resumable functions must be inline, and thus any code catching them will know whether they're mobile or not. But for any other suggestion, the question still stands: if the function isn't inline, how do you know if the coroutine type is mobile?

This is a question that has yet to yield a satisfactory answer. P0114 says that you have to box non-inline functions, which means you always pay overhead for them, unlike P0057, which allows the possibility of optimizing the overhead based on usage.

Without a plan to deal with coroutine type mobility for non-inline cases, there's no reason to talk about what happens if a non-erased coroutine type escapes its owning function.

P0057 has a plan for this. You don't thus far.

Nicol Bolas

Oct 14, 2015, 9:59:33 AM
to ISO C++ Standard - Future Proposals, thorsten...@gmail.com
On Wednesday, October 14, 2015 at 9:37:50 AM UTC-4, Gor Nishanov wrote:
It does not mean that for some problems, some incarnation of lambda* might be better than P0057. That is wonderful. When this happen, let's add it in, in addition to P0057 and P0099 (modestly called "A low-level API for stackful context switching").

There are two problems with the "let's add it in" approach.

First, how do you teach when to use which? For stackful vs. stackless, it's quite easy. You usually know when you genuinely want a stack, and emulating that with stackless (as I discovered) becomes amazingly difficult very quickly. Similarly, stackful makes it very apparent that creating an execution_context is allocating memory, so it's not cheap (though I was surprised to see that Boost.Context's switching only cost ~50 cycles or so).

So with P0114 in the mix, how do you tell people when to use which? Do you use P0114 when you want stackless that can go farther than one or two functions? Is there some simple guideline you can tell beginners about when to use which tool?

The second problem is interoperation. How P0099 and P0057 interoperate is pretty obvious. How P0114 would interop with P0057 is... less obvious. What happens if you `break resume` in an awaitable? Can you use `await` in a resumable function? What madness does `await resumable <expr>` accomplish?

Now, they may have obvious interactions. I haven't gone through a detailed analysis of both. But it is disconcerting.

Gor Nishanov

Oct 14, 2015, 10:12:13 AM
to ISO C++ Standard - Future Proposals, thorsten...@gmail.com


On Wednesday, October 14, 2015 at 6:59:33 AM UTC-7, Nicol Bolas wrote:
On Wednesday, October 14, 2015 at 9:37:50 AM UTC-4, Gor Nishanov wrote:
It does not mean that for some problems, some incarnation of lambda* might be better than P0057. That is wonderful. When this happen, let's add it in, in addition to P0057 and P0099 (modestly called "A low-level API for stackful context switching").

There are two problems with the "let's add it in" approach.

First, how do you teach when to use which? 

The second problem is interoperation. How P0114 would interop with P0057?
 
I need to learn to be more direct. The answers to your questions are in the two "some"s I used in the sentence you quoted.
The first "some" means that there must be a problem for which lambda* has a clear benefit over P0057. Thus, if lambda* is accepted, there is some important problem that P0057 does not address efficiently. That should give a pretty clear indication of when you should not use P0057 and should use lambda* instead.

The second "some" refers to "some incarnation of lambda*". We don't know what it is at the moment, so it is premature to worry about how it will interoperate. It might do so magnificently or not at all. We don't know.

Giovanni Piero Deretta

Oct 14, 2015, 11:16:35 AM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 2:47:09 PM UTC+1, Nicol Bolas wrote:
On Wednesday, October 14, 2015 at 1:13:56 AM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 7:51, Nicol Bolas:

>     If you need only async I/O - yes, I could imagine that extra
>     allocation is tolerable in such context. But P0057 describes not
>     only async I/O - but also for instance generators. And for
>     generators (like transform iterators) an extra allocation is huge price.
>
>
> A price you will never pay because it will be elided.
>
> Please stop repeating statements that have been disproven; it's not
> helping your case. You have yet to post an example of a generator that
> would not be elided.

I already described it several times, just put generator into some
structure/array or return somewhere.
In similar situation:
http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
- allocation of P0057 generator will not be elided, there will be N
allocations, i.e. for each coroutine.

And in your case, the `Coroutine` will have to type-erase them too. Thus performing N allocations.

Put it another way. In order to make something a member of a struct, you must first be able to name it. In C++ as it currently stands, it is impossible to store an unnamable type in a non-static data member. Whether it's a lambda or the result of a resumable expression or anything else, it simply cannot happen.

It is trivially possible. The name is unimportant, only the type is. In this example a lambda stands for an unnamed type. I could have used other unnamed types.

template<class Impl> struct Coroutine { Impl body; };

auto createACoroutine()
{
   auto coro = [] {...};
   Coroutine<decltype(coro)> coroWrapper = {coro};
   return coroWrapper;
}


-- gpd
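A compilable version of the snippet above, with a stateful lambda standing in for the coroutine body (an assumption: no compiler produces resumable-expression objects today). The wrapper owns the closure by value, so copying it copies the suspended state and no heap allocation is involved.

```cpp
#include <cassert>

// Stores an unnamed closure type as a data member; the type is supplied
// by decltype at the point where the closure is created.
template <class Impl>
struct Coroutine {
    Impl body;
    int operator()() { return body(); }  // "resume" the stand-in coroutine
};

inline auto createACoroutine()
{
    // Hypothetical coroutine body: returns 1, 2, 3, ... on successive calls.
    auto coro = [counter = 0]() mutable { return ++counter; };
    Coroutine<decltype(coro)> coroWrapper = {coro};
    return coroWrapper;
}
```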

Nicol Bolas

Oct 14, 2015, 11:31:39 AM
to ISO C++ Standard - Future Proposals

OK, yes you can do that. My mistake.

However, that code is not a complete example. It doesn't match with your sample code (which currently uses type erasure/chicanery). So what does the non-trick version look like?

const int numCoroutines = 10000;
std::vector<What> v;
v.reserve(numCoroutines);
for (int i : range(0, numCoroutines))
  v.emplace_back(createACoroutine());

What goes in `What`?

Oh sure, you can do it if you invert it and wrap it in a function call:

auto createCoroutineVector(const int numCoroutines)
{
  std::vector<decltype(createACoroutine())> v;
  v.reserve(numCoroutines);
  for (int i : range(0, numCoroutines))
    v.emplace_back(createACoroutine());
  return v;
}

But this now forces every piece of code to use template deduction to interact with this data. This makes the code a lot less readable.

Just look at this example for inscrutability. To be able to know what to do with the return value, I have to look through two function definitions. The more layers between the source type and the destination, the less readable the code gets.

I would much rather have a genuine type and type-erasure than to have to search through 5 function calls just to figure out what a type is supposed to be.
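For comparison, a compilable sketch of the deduction-based version, using a plain loop instead of the `range` helper and a stateful lambda as a stand-in coroutine (both assumptions). It works, but the element type can only be reached through `decltype`/`auto`, which is the readability cost described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a coroutine factory returning an unnamed concrete type.
inline auto createACoroutine()
{
    return [n = 0]() mutable { return ++n; };
}

// The element type is recovered with decltype; callers must use auto.
inline auto createCoroutineVector(std::size_t numCoroutines)
{
    std::vector<decltype(createACoroutine())> v;
    v.reserve(numCoroutines);
    for (std::size_t i = 0; i < numCoroutines; ++i)
        v.emplace_back(createACoroutine());  // no per-element heap allocation
    return v;
}
```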

Giovanni Piero Deretta

Oct 14, 2015, 11:42:41 AM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 4:31:39 PM UTC+1, Nicol Bolas wrote:


On Wednesday, October 14, 2015 at 11:16:35 AM UTC-4, Giovanni Piero Deretta wrote:
On Wednesday, October 14, 2015 at 2:47:09 PM UTC+1, Nicol Bolas wrote:
On Wednesday, October 14, 2015 at 1:13:56 AM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 7:51, Nicol Bolas:

>     If you need only async I/O - yes, I could imagine that extra
>     allocation is tolerable in such context. But P0057 describes not
>     only async I/O - but also for instance generators. And for
>     generators (like transform iterators) an extra allocation is huge price.
>
>
> A price you will never pay because it will be elided.
>
> Please stop repeating statements that have been disproven; it's not
> helping your case. You have yet to post an example of a generator that
> would not be elided.

I already described it several times, just put generator into some
structure/array or return somewhere.
In similar situation:
http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
- allocation of P0057 generator will not be elided, there will be N
allocations, i.e. for each coroutine.

And in your case, the `Coroutine` will have to type-erase them too. Thus performing N allocations.

Put it another way. In order to make something a member of a struct, you must first be able to name it. In C++ as it currently stands, it is impossible to store an unnamable type in a non-static data member. Whether it's a lambda or the result of a resumable expression or anything else, it simply cannot happen.

It is trivially possible. The name is unimportant, only the type is. In this example a lambda stands for an unnamed type. I could have used other unnamed types.
[...]

OK, yes you can do that. My mistake.

However, that code is not a complete example. It doesn't match with your sample code (which currently uses type erasure/chicanery). So what does the non-trick version look like?

Note that I'm not the original poster.

You write the same code directly inside createCoroutineVector. There is no need for the extra function call; createACoroutine was purely an example.

-- gpd

Evgeny Panasyuk

Oct 14, 2015, 2:29:22 PM
to ISO C++ Standard - Future Proposals
On 14 October 2015, 7:46:51 UTC+3, Nicol Bolas wrote:

Thus far, including in this post, you haven't mentioned an example that would actually compile.


Actually examples do compile and do run.
 
You linked to some macro code, but macros are, basically, cheating. They get to break all kinds of C++ rules, which an actual language feature would not.

Macros allow us to emulate a language feature, to test it now, with current compilers. Even Stroustrup uses macros in the Mach7 library to emulate a language feature.
I think it is obvious that following macro-based code:
COROUTINE(vector<int>, list_demo, (int, param),
    (int, local_x)
    (int, local_y))
{
    AWAIT(local_x =) vector<int>{1,2,3};
    AWAIT(local_y =) vector<int>{10, 20, 30};

    RETURN(local_x + local_y + param);
}
COROUTINE_END;

Is equivalent to following code with language support:

vector<int> list_demo(int param)
{
    int local_x = await vector<int>{1,2,3};
    int local_y = await vector<int>{10, 20, 30};

    return local_x + local_y + param;
}

And if the macro-based version works, then this one will work without problems.
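For readers unfamiliar with how macros can emulate stackless coroutines, here is a minimal sketch of the well-known switch-based (Duff's device / Protothreads-style) technique. It is not the COROUTINE/AWAIT macro library referenced above; `CO_BEGIN`, `CO_YIELD`, `CO_END` and `counter_gen` are invented names. Any variable that must survive a suspension is stored as a member, which is why the macro version above has to list `local_x` and `local_y` explicitly.

```cpp
#include <cassert>

// Hypothetical macros for a switch-based stackless coroutine. The resume
// point is kept in `state_`; __LINE__ gives each yield its own case label.
#define CO_BEGIN switch (state_) { case 0:
#define CO_YIELD(value) \
    do { state_ = __LINE__; return (value); case __LINE__:; } while (0)
#define CO_END } state_ = -1; return -1

// Yields 1, 2, 3 and then -1 on every later call.
struct counter_gen {
    int state_ = 0;
    int i = 0;  // a "local" that must survive suspension, hence a member

    int operator()() {
        CO_BEGIN;
        for (i = 1; i <= 3; ++i)
            CO_YIELD(i);
        CO_END;
    }
};
```

The resulting `counter_gen` is a concrete, copyable value type with no heap allocation, which is the property being argued for here.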


 
>
> Also... how does `vector<coroutine>` make any kind of sense with regard
> to P0114? The type isn't type erased, so each coroutine has its own
> type. Therefore, in order to put them in a homogeneous container like
> `vector`, you'll have to type-erase them. Which requires memory allocation.
> At which point, your version gains /nothing/ over P0057.

Same coroutines have same concrete types. For instance, with P0114 it
may be:
struct concrete_coroutine
{
     resumable auto r = expression;
     // ...
};
...
make_unique<concrete_coroutine[]>(N);

`auto` doesn't work that way. Non-static data members cannot be `auto`. Normally I wouldn't care about a small issue like that, but it basically makes your code impossible.


Such usage of auto appears in p0114r0.pdf on page 11.
 

Without `auto` NSDMI (and I wouldn't hold my breath on seeing it), you can't store a resumable expression. So you can't make containers of them.

Unless you erase their types. So again, you've gained nothing.

Your macro solution gets around this because it uses macros.
 

I showed two versions above - macro based version and possible syntax with language support.
Do you have any concrete reasoning why it would be impossible without macros?
 
Again, I am not talking specifically about P0114. Even P0057 can be
changed to have a concrete coroutine type.

I'd be curious to see how, exactly.


Currently it works like this:
struct generator
{
     ...
     coroutine_handle<promise_type> coro;
};

generator example_generator()
{
    yield 1;
}

int main()
{
    generator x = example_generator();
    x.move_next();
    x.current_value();
}

With a concrete coroutine type, it could be something like this:

template<template<typename> class coroutine_value>
struct generator
{
     ...
     coroutine_value<promise_type> coro;
};

generator example_generator()
{
    yield 1;
}
// example_generator is transformed to:
using example_generator = generator< synthesized_coroutine >;

int main()
{
    example_generator x{};
    x.move_next();
    x.current_value();
}
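To make the idea concrete, here is a sketch of that transformation with the "compiler" step done by hand: `synthesized_coroutine` is written out explicitly, and `int_promise`, `resume` and its `bool` return protocol are assumptions, not anything from P0057. It only shows that the coroutine state can be stored by value inside `generator`, so `example_generator` becomes an ordinary type whose size is known at compile time.

```cpp
#include <cassert>

// Where yielded values are deposited (hypothetical promise type).
struct int_promise {
    int current = 0;
};

// Library-side generator, parameterized on a compiler-synthesized
// coroutine template (written by hand below).
template <template <typename> class coroutine_value>
struct generator {
    using promise_type = int_promise;
    promise_type promise;
    coroutine_value<promise_type> coro;  // stored inline: no allocation

    bool move_next() { return coro.resume(promise); }
    int current_value() const { return promise.current; }
};

// Hand-written stand-in for what a compiler might synthesize from
//   generator example_generator() { yield 1; }
template <typename Promise>
struct synthesized_coroutine {
    int state = 0;

    // Runs to the next yield; returns false once the body has completed.
    bool resume(Promise& p) {
        switch (state) {
        case 0:
            p.current = 1;  // yield 1;
            state = 1;
            return true;
        default:
            return false;
        }
    }
};

using example_generator = generator<synthesized_coroutine>;
```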

 

If you want this done, then you're going to need to go through the effort of designing the feature to work without type erasure. Then you have to get someone to implement it.

It could be implemented even with macros, to some extent. And I think a macro-based solution is enough for a proof of concept.
 
Then, you can know whether it works just as well as P0057, whether it's equally easy to use, and how much of a performance advantage it gets.

If any.

Of course it gives a performance advantage, because it does not impose an extra mandatory allocation.

Nicol Bolas

Oct 14, 2015, 3:35:49 PM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 2:29:22 PM UTC-4, Evgeny Panasyuk wrote:
On 14 October 2015, 7:46:51 UTC+3, Nicol Bolas wrote:

Thus far, including in this post, you haven't mentioned an example that would actually compile.


Actually examples do compile and do run.
 
You linked to some macro code, but macros are, basically, cheating. They get to break all kinds of C++ rules, which an actual language feature would not.

Macros allow us to emulate a language feature, to test it now, with current compilers. Even Stroustrup uses macros in the Mach7 library to emulate a language feature.
I think it is obvious that following macro-based code:
COROUTINE(vector<int>, list_demo, (int, param),
    (int, local_x)
    (int, local_y))
{
    AWAIT(local_x =) vector<int>{1,2,3};
    AWAIT(local_y =) vector<int>{10, 20, 30};

    RETURN(local_x + local_y + param);
}
COROUTINE_END;

Is equivalent to following code with language support:

vector<int> list_demo(int param)
{
    int local_x = await vector<int>{1,2,3};
    int local_y = await vector<int>{10, 20, 30};

    return local_x + local_y + param;
}

And if the macro-based version works, then this one will work without problems.

I'll talk about this more later, but a good language feature should be minimal, not do whatever it takes. That's why a macro approach is a bad idea for a proposal. It's fine for a general sketch. But macros make you brave; you can do anything with them.

When it comes to a language feature, you shouldn't do whatever you can. You should do just enough, and no more.

Same coroutines have same concrete types. For instance, with P0114 it
may be:
struct concrete_coroutine
{
     resumable auto r = expression;
     // ...
};
...
make_unique<concrete_coroutine[]>(N);

`auto` doesn't work that way. Non-static data members cannot be `auto`. Normally I wouldn't care about a small issue like that, but it basically makes your code impossible.


Such usage of auto appears in p0114r0.pdf on page 11.

True, but that doesn't make it correct. C++14 doesn't let `auto` do that; the standard is very clear on that. And P0114 does not actually propose allowing `auto` to do that.

All you've shown is that P0114 is in error.

Without `auto` NSDMI (and I wouldn't hold my breath on seeing it), you can't store a resumable expression. So you can't make containers of them.

Unless you erase their types. So again, you've gained nothing.

Your macro solution gets around this because it uses macros.
 

I showed two versions above - macro based version and possible syntax with language support.
Do you have any concrete reasoning why it would be impossible without macros?

Because you'd have to get language support for `auto` in NSDMIs. And I just linked you to a discussion about precisely that and how it's not gonna happen. So your "possible syntax with language support" doesn't hold water.


Um, what does that code mean? Where does `synthesized_coroutine` come from? How does `example_generator` get defined twice? And how does `example_generator` return a template that has no template arguments?

A nice thing about resumable functions is that they don't take a sledgehammer to basic elements of the language. If a coroutine function returns a type, it returns that type, and the return value has all the rights and behaviors of a return value from a regular function.

With P0057, C++ works as normal, except where absolutely necessary.

What you're suggesting requires a bunch of different changes to lots of elements of C++. You have to be able to return a template with no arguments, whose arguments are provided by that `using` declaration, I guess. And that the argument has to be able to be generated from... whatever `synthesized_coroutine` is. And so on.

That's a huge amount of work to do just to avoid type erasure. And not just library work; that's core language work. Lots of it.

After all, there's no proposal even remotely like this at present. Even your idea above is incomplete, as it's not clear what all of those pieces actually mean or do (P0057 makes `await` mean one thing. What does it mean in your idea?). You have one general notion: coroutines having a firm type. And you're ready and willing to invent a plethora of subsidiary C++ language features that exist for the sole purpose of making that work.

That's not a good way to make a solid proposal. If that one thing requires so many subsidiary language features... maybe that one thing is not worth it.

Even if we accept that this is a good way to make a proposal... it's not a proposal yet. It's just some ideas being batted around on a forum. None of the various coroutine proposals do anything like what you've suggested. Why should we halt or delay progress on P0057 because you think you might be able to do better?

I hate to use this phrase as a way to win arguments, but "perfect is the enemy of good".

Evgeny Panasyuk

Oct 14, 2015, 4:14:31 PM
to ISO C++ Standard - Future Proposals
On 14 October 2015, 22:35:49 UTC+3, Nicol Bolas wrote:

And if the macro-based version works, then this one will work without problems.

I'll talk about this more later, but a good language feature should be minimal, not do whatever it takes. That's why a macro approach is a bad idea for a proposal. It's fine for a general sketch. But macros make you brave; you can do anything with them.

When it comes to a language feature, you shouldn't do whatever you can. You should do just enough, and no more.

I don't do "anything" here, and I don't see that it requires "anything". The syntax is very similar to what P0057 proposes.
 

Without `auto` NSDMI (and I wouldn't hold my breath on seeing it), you can't store a resumable expression. So you can't make containers of them.

Unless you erase their types. So again, you've gained nothing.

Your macro solution gets around this because it uses macros.
 

I showed two versions above - macro based version and possible syntax with language support.
Do you have any concrete reasoning why it would be impossible without macros?

Because you'd have to get language support for `auto` in NSDMIs. And I just linked you to a discussion about precisely that and how it's not gonna happen. So your "possible syntax with language support" doesn't hold water.

Again, here I am not talking about P0114. The example code above is much closer to P0057 than to P0114, and it does not require "auto in NSDMI", as the code clearly shows.
 


With a concrete coroutine type, it could be something like this:

template<template<typename> class coroutine_value>
struct generator
{
     ...
     coroutine_value<promise_type> coro;
};

generator example_generator()
{
    yield 1;
}
// example_generator is transformed to:
using example_generator = generator< synthesized_coroutine >;

int main()
{
    example_generator x{};
    x.move_next();
    x.current_value();
}


Um, what does that code mean? Where does `synthesized_coroutine` come from?

The "using" part is done by the compiler; synthesized_coroutine comes from the compiler.
 
How does `example_generator` get defined twice?

It is not defined twice. The first one is what the user writes; the second one (the "using" part) is what the compiler generates for this code. In essence, the user's code is transformed into a type named example_generator.
 
And how does `example_generator` return a template that has no template arguments?


If you don't like it, it is possible to return a type with a nested template inside. For instance:
struct generator
{
    template<template<typename> class coroutine_value>
    struct apply { ... };
};

 
 
A nice thing about resumable functions is that they don't take a sledgehammer to basic elements of the language. If a coroutine function returns a type, it returns that type, and the return value has all the rights and behaviors of a return value from a regular function.


It is not truly a return type. Even P0057 does not have a true return type: you can't return a value of that type from the body. It just mimics normal function syntax, but it is not a normal function at all; it is just a synthetic language construct.
 
What you're suggesting requires a bunch of different changes to lots of elements of C++. You have to be able to return a template with no arguments, whose arguments are provided by that `using` declaration, I guess.

No, it does not require changes to lots of C++ elements. In both cases this is not a true function; it is just something with function-like syntax that defines a coroutine.
 

After all, there's no proposal even remotely like this at present. Even your idea above is incomplete, as it's not clear what all of those pieces actually mean or do (P0057 makes `await` mean one thing. What does it mean in your idea?). You have one general notion: coroutines having a firm type. And you're ready and willing to invent a plethora of subsidiary C++ language features that exist for the sole purpose of making that work.


I am not proposing to invent a plethora of subsidiary features.
 
That's not a good way to make a solid proposal. If that one thing requires so many subsidiary language features... maybe that one thing is not worth it.


No, this does not require subsidiary language features.
 
Even if we accept that this is a good way to make a proposal... it's not a proposal yet. It's just some ideas being batted around on a forum. None of the various coroutine proposals do anything like what you've suggested. Why should we halt or delay progress on P0057 because you think you might be able to do better?


If the authors of P0057 still insist on a design with intrinsic overhead and a high burden on optimizers, then you are right: probably the viable path is to make another proposal.
Personally, I would prefer to get fast coroutines in, say, 2020 than to get some coroutines in 2017.

Evgeny Panasyuk

Oct 14, 2015, 5:09:05 PM
to ISO C++ Standard - Future Proposals
On 14 October 2015, 10:42:53 UTC+3, Ville Voutilainen wrote:

As far as having the concrete type goes, that sounds like it requires even
more inlining and across-call-stack transparency.

It actually requires less inlining and transparency. For instance, this code:
future<int> concrete_coroutine()
{
    int local = await async_operation();
    return local;
}
Can be straightforwardly transformed to something like:
struct concrete_coroutine
{
    state_value_type current_state;
    int local;

    future<int> method_state_machine(); // or operator()()
};
Where method_state_machine can be compiled separately, in another translation unit.
And actually this approach is already implementable with macros, to some extent (it works, but compiler-side transformation will give better result).
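A compilable sketch of that transformation, done by hand. The split between a header-visible struct and an out-of-line member definition mimics compiling `method_state_machine` in another translation unit (here both parts sit in one listing). The resume protocol (passing the awaited value in as `incoming` and returning `true` on completion) and the `result` member are assumptions for illustration, and `future<int>` is left out to keep it self-contained.

```cpp
#include <cassert>

// --- what a header could contain: the full layout is visible ---
struct concrete_coroutine
{
    int current_state = 0;  // which suspension point to resume from
    int local = 0;          // local variable that lives across the await
    int result = 0;         // value produced by `return local;`

    // Advances the state machine. When resuming, `incoming` carries the value
    // produced by async_operation(). Returns true once the body has finished.
    bool method_state_machine(int incoming);
};

// --- what could be compiled in a separate .cpp file ---
bool concrete_coroutine::method_state_machine(int incoming)
{
    switch (current_state) {
    case 0:                  // start: issue async_operation(), then suspend
        current_state = 1;
        return false;
    case 1:                  // resume: int local = await async_operation();
        local = incoming;
        result = local;      // return local;
        current_state = 2;
        return true;
    default:                 // already finished
        return true;
    }
}
```

Because the layout is in the "header", callers can place such coroutines on the stack, in structs, or in arrays without knowing the body.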

 
In a stackless coroutine,
the erased type combined with elision of the erasure and allocations avoids
having all coroutines have a different type.

Yes, but it is easy to get an erased type from a concrete one when needed.
For instance, different lambdas have different types, but can easily be placed into std::function (if they have the appropriate signature).
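A short sketch of that erasure step, with two hypothetical factories returning closures of different types. Converting to `std::function` gives them one nameable type that can share a container; the trade-off is an indirect call per invocation, and the conversion may heap-allocate unless the implementation's small-buffer optimization applies.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Two hypothetical factories whose closures have different concrete types.
inline auto make_counter() { return [n = 0]() mutable { return ++n; }; }
inline auto make_constant(int k) { return [k] { return k; }; }

// Erasing both to std::function<int()> gives them one nameable type.
inline std::vector<std::function<int()>> make_mixed()
{
    std::vector<std::function<int()>> v;
    v.emplace_back(make_counter());
    v.emplace_back(make_constant(7));
    return v;
}
```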

Ville Voutilainen

Oct 14, 2015, 5:18:21 PM
to ISO C++ Standard - Future Proposals
On 15 October 2015 at 00:09, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
> On 14 October 2015, 10:42:53 UTC+3, Ville Voutilainen wrote:
>>
>>
>> As far as having the concrete type goes, that sounds like it requires even
>> more inlining and across-call-stack transparency.
>
>
> It actually requires less inlining and transparency. For instance, this
> code:
> future<int> concrete_coroutine()
> {
> int local = await async_operation();
> return local;
> }
> Can be straightforwardly transformed to something like:
> struct concrete_coroutine
> {
> state_value_type current_state;
> int local;
>
> future<int> method_state_machine(); // or operator()()
> };
> Where method_state_machine can be compiled separately, in another
> translation unit.
> And actually this approach is already implementable with macros, to some
> extent (it works, but compiler-side transformation will give better result).

Where does this transformation happen translation-unit-wise, and how
would the method_state_machine get compiled in a different translation
unit? What type does the caller of the previous concrete_coroutine() see?

>> In a stackless coroutine,
>> the erased type combined with elision of the erasure and allocations
>> avoids
>> having all coroutines have a different type.
> Yes, but it is easy to get erased type from concrete when needed.
> For instance different lambdas have different types, but can be easily
> placed into std::function (if has appropriate signature).

For some values of "easily". For the many users who don't care about the
underlying type of the coroutine, it's not so easy when they have to wrap every
time they use a coroutine.

Evgeny Panasyuk

Oct 14, 2015, 5:46:34 PM
to ISO C++ Standard - Future Proposals
On 15 October 2015, 0:18:21 UTC+3, Ville Voutilainen wrote:
On 15 October 2015 at 00:09, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
> On 14 October 2015, 10:42:53 UTC+3, Ville Voutilainen wrote:
>>
>>
>> As far as having the concrete type goes, that sounds like it requires even
>> more inlining and across-call-stack transparency.
>
>
> It actually requires less inlining and transparency. For instance, this
> code:
> future<int> concrete_coroutine()
> {
>     int local = await async_operation();
>     return local;
> }
> Can be straightforwardly transformed to something like:
> struct concrete_coroutine
> {
>     state_value_type current_state;
>     int local;
>
>     future<int> method_state_machine(); // or operator()()
> };
> Where method_state_machine can be compiled separately, in another
> translation unit.
> And actually this approach is already implementable with macros, to some
> extent (it works, but compiler-side transformation will give better result).

Where does this transformation happen translation-unit-wise, and how
would the method_state_machine get compiled in a different translation
unit?

This is a good point. It looks like such a transformation would have to happen in every translation unit that uses the coroutine, in order to deduce the size of the structure (maybe not full code generation, just analysis of the locals). The method itself could be compiled in only one of those translation units, using a mechanism similar to extern template declarations and explicit instantiation.
But my point still holds: this method need not be inlined (in the optimizer sense) and can still produce zero allocations.
 
What type does the caller of the previous concrete_coroutine() see?


What do you mean? Which "previous"?
 
>> In a stackless coroutine,
>> the erased type combined with elision of the erasure and allocations
>> avoids
>> having all coroutines have a different type.
> Yes, but it is easy to get erased type from concrete when needed.
> For instance different lambdas have different types, but can be easily
> placed into std::function (if has appropriate signature).

For some values of "easily". For the many users who don't care about the
underlying type of the coroutine, it's not so easy when they have to wrap every
time they use a coroutine.

It can be done even without explicit wrapping, just by relying on different coroutine_traits specializations. One specialization may give the concrete coroutine type, and another can erase the concrete type and give the user an erased type.
For instance:
concrete_generator<int> cg1()
{
    yield 1;
}

concrete_generator<int> cg2()
{
    yield 2;
}

Here cg1 and cg2 are different types.

But here:
type_erased_generator<int> teg1()
{
    yield 1;
}

type_erased_generator<int> teg2()
{
    yield 2;
}
teg1 and teg2 would have the same type.

The user does not have to wrap cg1 and cg2 manually (though they can); instead, they may use type_erased_generator from the start, and it will do the type erasure itself via the coroutine_traits mechanism.
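A hand-written sketch of the distinction, in plain C++17 without any coroutine support. The cg1 and cg2 structs stand in for frames a compiler might synthesize, and type_erased_generator is a hypothetical wrapper; in the scheme described above the erasure would be selected through a coroutine_traits specialization, whereas here it is spelled out explicitly.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>

// Hypothetical stand-ins for compiler-synthesized frames: each body gets its own type.
struct cg1
{
    bool done = false;
    std::optional<int> next() { if (done) return std::nullopt; done = true; return 1; } // yield 1;
};

struct cg2
{
    bool done = false;
    std::optional<int> next() { if (done) return std::nullopt; done = true; return 2; } // yield 2;
};

static_assert(!std::is_same_v<cg1, cg2>, "concrete coroutines have distinct types");

// Hypothetical wrapper: any concrete generator of T can be stored behind one common type.
template <typename T>
class type_erased_generator
{
public:
    template <typename Concrete>
    explicit type_erased_generator(Concrete c)
        : next_([c]() mutable { return c.next(); }) {}

    std::optional<T> next() { return next_(); }

private:
    std::function<std::optional<T>()> next_;  // the erasure point
};

// Here the erasure is written out; in the idea above the traits would request it.
type_erased_generator<int> teg1() { return type_erased_generator<int>(cg1{}); }
type_erased_generator<int> teg2() { return type_erased_generator<int>(cg2{}); }

static_assert(std::is_same_v<decltype(teg1()), decltype(teg2())>,
              "erased generators share one type");
```

The erased functions share one return type at the cost of an indirect call through std::function (which may also allocate); the concrete structs pay neither cost.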

Ville Voutilainen

Oct 14, 2015, 5:53:41 PM
to ISO C++ Standard - Future Proposals
On 15 October 2015 at 00:46, Evgeny Panasyuk <evgeny....@gmail.com> wrote:
>> What type does the caller of the previous concrete_coroutine() see?
> What do you mean? Which "previous"?

You described how

future<int> concrete_coroutine()

is supposedly transformed. I don't know what that transformation does
from the point of view of the caller of concrete_coroutine.

Evgeny Panasyuk

Oct 14, 2015, 6:32:26 PM
to ISO C++ Standard - Future Proposals
On 15 October 2015, 0:53:41 UTC+3, Ville Voutilainen wrote:

At the low level, it gives a coroutine type with several methods, such as resume and is_terminated. It can be used like this:
void test()
{
    concrete_coroutine coro{};
    future<int> f = coro.resume(); // or coro()
}

_______________
Another example:
concrete_low_level_generator<int> positive_numbers(int N)
{
    for(int x=1; x<=N; ++x)
        yield x;
}

void test()
{
    positive_numbers xs{100};
    while(xs.resume())
        print(xs.current_value());
}
This example is very similar to the one described on page 14 of P0057R0.
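For comparison, a hand-written C++17 approximation of the frame that positive_numbers might be transformed into. The member names and the started flag are assumptions for illustration, not part of P0057R0; the point is that resume() continues the loop from the saved x, with no heap allocation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical hand-written frame for:
//     concrete_low_level_generator<int> positive_numbers(int N)
//     { for (int x = 1; x <= N; ++x) yield x; }
// The parameter and the loop variable become data members.
struct positive_numbers
{
    int N;                 // coroutine parameter
    int x = 0;             // loop variable, preserved across suspensions
    bool started = false;  // has the loop been entered yet?

    // Runs the body until the next "yield x;". Returns false once the loop ends.
    bool resume()
    {
        if (!started) { started = true; x = 1; }  // for (int x = 1; ...
        else          { ++x; }                     // ...; ++x)
        return x <= N;                             // ...; x <= N; ...
    }

    int current_value() const { return x; }
};

// Mirrors test() above, collecting the values instead of printing them.
std::vector<int> collect(int n)
{
    std::vector<int> out;
    positive_numbers xs{n};
    while (xs.resume())
        out.push_back(xs.current_value());
    return out;
}
```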
Coroutine traits may provide higher-level abstractions on top of this, for example a type that behaves like a range:
concrete_high_level_generator<int> positive_numbers(int N)
{
    for(int x=1; x<=N; ++x)
        yield x;
}

void test()
{
    for(auto x : positive_numbers{100})
        print(x);
}
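One possible shape for such a range layer, again written by hand: the struct below (hypothetical, not from any proposal) wraps the same resume()/current_value() logic in begin()/end() so that a range-based for works. It relies on C++17 allowing begin() and end() to return different types.

```cpp
#include <cassert>

// Hypothetical range-like generator, of the kind a coroutine_traits
// specialization might synthesize. Single-pass only.
struct positive_numbers
{
    int N;
    int x = 0;
    bool started = false;

    bool resume()
    {
        if (!started) { started = true; x = 1; } else { ++x; }
        return x <= N;
    }
    int current_value() const { return x; }

    // Range interface: begin() runs the body up to the first yield.
    struct sentinel {};
    struct iterator
    {
        positive_numbers* gen;
        bool alive;  // false once the body has run to completion

        int operator*() const { return gen->current_value(); }
        iterator& operator++() { alive = gen->resume(); return *this; }
        bool operator!=(sentinel) const { return alive; }
    };

    iterator begin() { return iterator{this, resume()}; }
    sentinel end() { return {}; }
};

// Mirrors test() above, summing instead of printing.
int sum_to(int n)
{
    int total = 0;
    for (auto x : positive_numbers{n})
        total += x;
    return total;
}
```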



 

Evgeny Panasyuk

Oct 14, 2015, 8:22:00 PM
to std-pr...@isocpp.org
14.10.2015 16:47, Nicol Bolas:

> I already described it several times, just put generator into some
> structure/array or return somewhere.
> In similar situation:
> http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
> <http://coliru.stacked-crooked.com/a/0c09744abd5e57ae>
> - allocation of P0057 generator will not be elided, there will be N
> allocations, i.e. for each coroutine.
>
>
> And in your case, the `Coroutine` will have to type-erase them too. Thus
> performing N allocations.

No need for type-erasure - no need for N allocations.

>
> Put it another way. In order to make something a member of a struct, you
> must first be able to name it. In C++ as it currently stands, it is
> /impossible/ to store an unnamable type in a non-static data member.
> Whether it's a lambda or the result of a resumable expression or
> anything else, it simply /cannot happen/.


Again, I am not talking specifically about P0114r0. I am talking about
at least adding the possibility of having concrete types to a P0057R0-like proposal.

For instance, here:

generator<int> numbers()
{
    yield 1;
}

we can use "numbers" as the name of a synthesized class (which represents
the concrete coroutine), instead of as the name of a synthesized function.

>
> What you're suggesting is impossible. Or at least, it's impossible
> without trickery (ie: macros).

It is possible without any trickery. It is just a transformation of
function-like code into a class that has the name of that function-like entity.

>
> So Boost.Asio is either doing type erasure or it is cheating. Any core
> feature will not be allowed to cheat, so you'll /have/ to use type
> erasure to store the result of such an operation.

There is no cheating and no type erasure. We can just use the name given by the user.

>
> The only way you could avoid a dynamic allocation while still leaving
> the lifetime of the calling function is if you could copy/move the
> coroutine type. And that's just not reasonable.

First of all, I would like to have copy and move semantics, or at least
some explicit control over them.

Anyway, even without copy and move semantics, yes, an allocation
will be needed when the coroutine has to outlive the calling
function, but there will be fewer allocations than with the proposed P0057R0.

For instance, make_unique<concrete_coroutine[]>(N) is just one
allocation, instead of N+1.

Another example is

struct Widget { concrete_coroutine x; };
make_unique<Widget>()

This is also one allocation, while P0057R0 would result in two
allocations: one for the Widget itself and another for the coroutine.
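The allocation arithmetic can be made explicit with a hypothetical counting helper. This sketch counts only the allocation requests the code makes, and it models P0057R0 as one heap frame per coroutine, which is the case the thread describes when elision does not apply; it does not measure any real compiler's behaviour. The frame, Widget and ErasedWidget types are illustrative stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical coroutine frame whose size is known at compile time.
struct frame
{
    int state = 0;
    int local = 0;
};

// Hypothetical helper: counts separate heap blocks requested through it.
struct allocation_counter
{
    std::size_t count = 0;

    template <typename T, typename... Args>
    std::unique_ptr<T> make(Args&&... args)
    {
        ++count;
        return std::make_unique<T>(std::forward<Args>(args)...);
    }
};

// Concrete frames stored inline in one array: 1 allocation.
std::size_t concrete_array(std::size_t n)
{
    allocation_counter c;
    auto frames = c.make<frame[]>(n);
    return c.count;
}

// Erased frames: an array of handles plus one heap frame per coroutine: N + 1.
std::size_t erased_array(std::size_t n)
{
    allocation_counter c;
    auto handles = c.make<std::unique_ptr<frame>[]>(n);
    for (std::size_t i = 0; i < n; ++i)
        handles[i] = c.make<frame>();
    return c.count;
}

struct Widget { frame x; };                         // frame stored inline
struct ErasedWidget { std::unique_ptr<frame> x; };  // frame behind a pointer

std::size_t concrete_widget()
{
    allocation_counter c;
    auto w = c.make<Widget>();        // one allocation covers the frame too
    return c.count;
}

std::size_t erased_widget()
{
    allocation_counter c;
    auto w = c.make<ErasedWidget>();  // allocation for the widget
    w->x = c.make<frame>();           // plus one for the coroutine frame
    return c.count;
}
```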

german...@hubblehome.com

Oct 15, 2015, 11:41:54 PM
to ISO C++ Standard - Future Proposals


First of all, I would like to have copy and move semantics, or at least
some explicit control over them.

+1
 

Nicol Bolas

Oct 16, 2015, 11:52:28 AM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 8:22:00 PM UTC-4, Evgeny Panasyuk wrote:
14.10.2015 16:47, Nicol Bolas:

>     I already described it several times, just put generator into some
>     structure/array or return somewhere.
>     In similar situation:
>     http://coliru.stacked-crooked.com/a/0c09744abd5e57ae
>     <http://coliru.stacked-crooked.com/a/0c09744abd5e57ae>
>     - allocation of P0057 generator will not be elided, there will be N
>     allocations, i.e. for each coroutine.
>
>
> And in your case, the `Coroutine` will have to type-erase them too. Thus
> performing N allocations.

No need for type-erasure - no need for N allocations.

>
> Put it another way. In order to make something a member of a struct, you
> must first be able to name it. In C++ as it currently stands, it is
> /impossible/ to store an unnamable type in a non-static data member.
> Whether it's a lambda or the result of a resumable expression or
> anything else, it simply /cannot happen/.


Again, I am not talking specifically about P0114r0. I am talking about
at least adding the possibility of having concrete types to a P0057R0-like proposal.

Well, it's hard to gauge how reasonable a proposal is when said proposal doesn't actually exist. You don't have a proposal; you just have some general notions of how you think it ought to act, with no demonstrated knowledge of how feasible that will be to implement.

And no, Boost.Asio's macro hacks are not a feasibility study.

For an example of the feasibility issue, let's use your example:
 
generator<int> numbers()
{
    yield 1;
}

we can use "numbers" as a name for synthesized class (which represents
concrete coroutine), instead of name for synthesized function.

No, you cannot. Why? Well:

generator<int> numbers();

static_assert(std::is_function_v<decltype(numbers)>);

This assert should never fire. Yet you want to make it fire.

That's breaking basic rules of C++: a function declaration should be a function declaration, not a struct declaration. Even lambdas don't look like non-lambda functions.

So let's skip past that obviously non-functional idea. Let's say that you allow users to decorate a function definition. Maybe you even use lambda syntax, since it is similar:

[]numbers() -> generator<int>;

OK, so the compiler sees this and knows that `numbers` is a struct.

How big is it?

The compiler doesn't know. The compiler cannot know. Not from the information presented here. What `numbers` is here is an incomplete type.

The only way to generate a complete type is to complete the function definition of `numbers`. That way, the alignment and storage of the stack data are available.

And that means that the function must be inline. Not only that, you can't have virtual functions use this at all.

So all you've done is re-invent P0114 with slightly different syntax. For someone who keeps claiming that their idea isn't P0114, it seems to have a lot of P0114's restrictions.

The beauty of P0057's design is that it works with C++ as it currently exists. It changes the bare minimum needed to make the feature work. It doesn't require that resumable functions are inlined or anything like that. It doesn't make function declarations automatically become struct declarations.

All of the work for resumable functions happens within the function that is resumable, and external code is none the wiser. I can manipulate a resumable function as normal, I can stick one in a std::function, I can make it virtual, non-inline, anything. It's just a normal function.

What, did you think P0057 made the decision to type-erase coroutines on a whim? It is there specifically to avoid all of these elements. That design decision is what allows P0057 to work generally. No forced inlining. Virtual calls are allowed. Resumable functions look and behave just like any other functions.

Nicol Bolas

Oct 16, 2015, 12:26:18 PM
to ISO C++ Standard - Future Proposals
On Wednesday, October 14, 2015 at 4:14:31 PM UTC-4, Evgeny Panasyuk wrote:
On 14 October 2015, 22:35:49 UTC+3, Nicol Bolas wrote:
With a concrete coroutine type, it could be something like this:


template<template<typename> class coroutine_value>
struct generator
{
    ...
    coroutine_value<promise_type> coro;
};

generator example_generator()
{
    yield 1;
}
// example_generator is transformed to:
using example_generator = generator< synthesized_coroutine >;

int main()
{
    example_generator x{};
    x.move_next();
    x.current_value();
}


Um, what does that code mean? Where does `synthesized_coroutine` come from?

The "using" part is done by the compiler; synthesized_coroutine comes from the compiler.
 
How does `example_generator` get defined twice?

It is not defined twice. The first one is what the user writes; the second one (the "using" part) is what the compiler produces for this code. In essence, the user's code is transformed into a type named example_generator.
 
And how does `example_generator` return a template that has no template arguments?


If you don't like it,

It's not a question of what I like. It is simply not possible in C++. You cannot return a template; you can only return a concrete type. This may be a specific instantiation of a template, but you cannot return a template itself.

What you wrote is syntactic nonsense. Therefore, if you want it to stop being syntactic nonsense, your proposal will need to define what it means.

That's fine, but it is another feature added to your idea. Hence the whole "plethora of subsidiary features" I was talking about. Every time someone points out a problem with making coroutines concrete types, you resolve it by adding another feature to the language. I remind you that it's impossible to return a template, so you then define how being a coroutine makes the previously impossible possible.

That's a new feature.

The nice thing about P0057 is that it doesn't have very many new features. It pretty much stops at function suspend/resume and internally-generated promise types. The return type of a coroutine is no different from any other type. The awaiter type, even the promise type are all types using the C++ rules for types.

Your proposal seems to require a lot of special-case handling at the type level.

it is possible to return a type with a template inside. For instance:
struct generator
{
    template<template<typename> class coroutine_value>
    struct apply { ... };
};


OK, so... what does that do? Does `generator` store the coroutine? That seems more or less impossible, since you have the same problem: a template parameter for the return type getting filled in by the function returning it.

Somewhere, there's a variable whose type, an instantiation of a template, has one of its template arguments get filled in by the compiler. That's a very new thing that's unlike normal C++ code.

A nice thing about resumable functions is that it doesn't take a sledgehammer to basic elements of the language. If a coroutine function returns a type, it returns that type, and the return value has all the rights and behaviors of a return value from a regular function.


It is not truly a return type. Even P0057 does not have a true return type: you can't return a value of that type from the body. It just mimics normal function syntax, but it is not a normal function at all; it is just a synthetic language construct.

OK, internally it may only "mimic normal function syntax", but by design, the "mimicry" is complete. It looks and behaves no different from any other function. There is absolutely no way to tell the difference between it and non-coroutine functions.

Your proposal exposes everyone to the deep guts of working with a coroutine. All just for some minor performance gain that in most situations compilers can optimize out. And even when they can't, you can optimize them out with a decent allocator.

Even if we accept that this is a good way to make a proposal... it's not a proposal yet. It's just some ideas being batted around on a forum. None of the various coroutine proposals do anything like what you've suggested. Why should we halt or delay progress on P0057 because you think you might be able to do better?


If the authors of P0057 still insist on a design with intrinsic overhead and a high burden on optimizers, then you are right: the viable path is probably to make another proposal.

Passive-aggressiveness does not prove your point. For example, you have yet to prove that the "burden on optimizers" is "high" by some definition of that word. It's merely "non-zero".

Just like the burden on optimizers for dealing with template code, inlining, and the like.

Personally, I would rather get fast coroutines in, say, 2020 than get some coroutines in 2017.

Considering that you haven't proven that P0057 is particularly slow, I remain yet unconvinced that the performance gains you claim are necessary to be "fast" are actually worth those 3 years.

Germán Diago

Oct 17, 2015, 4:52:14 AM
to std-pr...@isocpp.org


> Passive-aggressiveness does not prove your point. For example, you have yet to prove that the "burden on optimizers" is "high" by some definition of that word. It's merely "non-zero".

I think there is no need to, well, insult anyone for holding a different view by insinuating that he is being too aggressive. I think it is good to have this discussion.

That said, one of the principles of C++ is the zero-overhead principle, not the "little overhead" principle. Why? Because C++ is for maximum performance, and if you do something suboptimal by design, people are going to invent another solution.

Herb Sutter described this zero overhead as leaving no room for anything between C++ and assembly. I agree with that.
The only library whose design I don't like, which you mentioned before, is iostreams. No library has followed the iostreams path since then; we mostly have generic, templated components that do not rely on inheritance.

Erasure does have costs, and there are alternatives. Chris's paper mentions the inherent overhead of erasure; why would it be mentioned if it were not important? Once erasure is embedded in the design, it cannot be controlled in all scenarios. That is simply true. So the question here should be whether we can have a design with inherently minimal overhead, not merely which solution one sympathizes with.

Evgeny Panasyuk

Oct 17, 2015, 7:14:40 AM
to ISO C++ Standard - Future Proposals
On 16 October 2015, 18:52:28 UTC+3, Nicol Bolas wrote:

Again, I am not talking specifically about P0114r0. I am talking about
at least adding the possibility of having concrete types to a P0057R0-like proposal.

Well, it's hard to gauge how reasonable a proposal is when said proposal doesn't actually exist. You don't have a proposal; you just have some general notions of how you think it ought to act, with no demonstrated knowledge of how feasible that will be to implement.

Having just a feature proposal and an implementation is not enough reason to standardize; a proposed feature must also fit the main design goals of the language.
I am showing a major flaw in the existing proposal. Do you think it should not be discussed?
I agree that the best way is to write a full-featured proposal, but first I want to discuss it here.
 

And no, Boost.Asio's macro hacks are not a feasibility study.


These hacks allow us to investigate possible paths with little effort.
 

That's breaking basic rules of C++: a function declaration should be a function declaration, not a struct declaration. Even lambdas don't look like non-lambda functions.

I agree: if it is not a function, it should not pretend to be one.
 
OK, so the compiler sees this and knows that `numbers` is a struct.

How big is it?

The compiler doesn't know. The compiler cannot know. Not from the information presented here. What `numbers` is here is an incomplete type.

The only way to generate a complete type is to complete the function definition of `numbers`. That way, the alignment and storage of the stack data is available.

And that means that the function must be inline.

Yes, if someone wants a coroutine with a concrete type, its body must be visible to the compiler at the point of use. If type erasure is OK, then the body can be hidden completely. And this is orthogonal to the syntax issues.
 

So all you've done is re-invent P0114 with slightly different syntax. For someone who keeps claiming that their idea isn't P0114, it seems to have a lot of P0114's restrictions.


P0114 sets a much more ambitious goal: it tries to merge/fuse several stack frames into one coroutine.
 

The beauty of P0057's design is that it works with C++ as it currently exists. It changes the bare minimum needed to make the feature work. It doesn't require that resumable functions are inlined or anything like that. It doesn't make function declarations automatically become struct declarations.

P0057 can be changed to allow concrete coroutine types with very small syntax modifications. For instance:
// Type-erased version:
generator<int> numbers(int x)
{
    yield x;
    ...
}
// The compiler uses std::coroutine_traits<generator<int>, int> (as proposed in P0057).
/**************************/

// Version with a concrete coroutine type:

auto numbers(int x, concrete_generator_tag = 0)
{
    yield x;
    ...
}
// The compiler uses coroutine_traits<auto_result_tag, int, concrete_generator_tag>,
// and based on this specialization it generates the return type and value.
// The type of the coroutine is decltype(numbers(1)), i.e.:
decltype(numbers(1)) coro = numbers(1);

This approach is even closer to P0057.
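Part of this idea can already be approximated in today's C++: a function with an auto return type can return a local struct, giving each such function its own concrete type that callers name through decltype. The sketch below is ordinary C++17 with hypothetical names; it does not model the tag parameter or coroutine_traits, and it also shows that the body must be visible wherever the return type is deduced.

```cpp
#include <cassert>
#include <optional>

// Each call returns a concrete, compiler-generated type; no erasure, no allocation.
auto numbers(int x)
{
    // Hypothetical hand-written frame for the body { yield x; }
    struct frame
    {
        int x;
        bool done = false;

        std::optional<int> next()
        {
            if (done) return std::nullopt;
            done = true;
            return x;  // yield x;
        }
    };
    return frame{x};
}
```

Usage mirrors the line above: `decltype(numbers(1)) coro = numbers(7);` names the concrete type without the user ever spelling it.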
 

What, did you think P0057 made the decision to type-erase coroutines on a whim? It is there specifically to avoid all of these elements. That design decision is what allows P0057 to work generally. No forced inlining. Virtual calls are allowed. Resumable functions look and behave just like any other functions.

Type erasure can be optional; it should not be the only way.

One of the main use cases for coroutines is generators; even the authors of P0057 frequently refer to this use case. Many languages that have some kind of stackless coroutines started by supporting this use case, like Python's and C#'s yield.

While mandatory type erasure can be tolerated in cases like async I/O, for generators it adds a huge overhead, which can't be tolerated in C++.

Evgeny Panasyuk

Oct 17, 2015, 7:35:35 AM
to ISO C++ Standard - Future Proposals
On 16 October 2015, 19:26:18 UTC+3, Nicol Bolas wrote:

It's not a question of what I like. It is simply not possible in C++. You cannot return a template; you can only return a concrete type. This may be a specific instantiation of a template, but you cannot return a template itself.

You can return a concrete type that has a template inside.
 

What you wrote is syntactic nonsense. Therefore, if you want it to stop being syntactic nonsense, your proposal will need to define what it means.

Again, it is not a finished proposal.
First I am pointing out flaws in the existing proposal, and I want to discuss them.
 

Your proposal seems to require a lot of special-case handling at the type level.

No, it does not require much of special-case handling.
 

it is possible to return a type with a template inside. For instance:
struct generator
{
    template<template<typename> class coroutine_value>
    struct apply { ... };
};


OK, so... what does that do? Does `generator` store the coroutine?

It describes how to create the result type.
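A compilable sketch of what "a concrete type with a template inside" could mean, in standard C++17. The names generator, apply, promise_type and synthesized_coroutine are hypothetical; synthesized_coroutine plays the part the compiler would fill in, and the final alias declaration is the step the compiler would perform for `generator example_generator() { yield 1; }`.

```cpp
#include <cassert>

// Library side: generator stores nothing itself; its member template describes
// how to build the result type once a coroutine frame template is supplied.
struct generator
{
    template <template <typename> class coroutine_value>
    struct apply
    {
        struct promise_type { int current = 0; };  // hypothetical promise
        coroutine_value<promise_type> coro;         // concrete frame stored inline

        bool move_next() { return coro.resume(); }
        int current_value() const { return coro.promise.current; }
    };
};

// Stand-in for what a compiler would synthesize from the body { yield 1; }
template <typename Promise>
struct synthesized_coroutine
{
    Promise promise{};
    int state = 0;

    bool resume()
    {
        if (state == 0) { promise.current = 1; state = 1; return true; }  // yield 1;
        return false;                                                      // end of body
    }
};

// The transformation step the thread attributes to the compiler:
using example_generator = generator::apply<synthesized_coroutine>;
```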
 

OK, internally it may only "mimic normal function syntax", but by design, the "mimicry" is complete. It looks and behaves no different from any other function. There is absolutely no way to tell the difference between it and non-coroutine functions.

This is possible for concrete coroutines too (if they are required to mimic function syntax). We already have "auto" return types on functions, and this mechanism can be used for concrete coroutines. See my previous message.
 
Your proposal exposes everyone to the deep guts of working with a coroutine. All just for some minor performance gain that in most situations compilers can optimize out.

It is not a minor difference. One allocation plus virtual/indirect calls for resumption are a huge overhead for things like generators.
 
And even when they can't, you can optimize them out with a decent allocator.

First of all, I don't want to use custom allocators for simple things like generators.
Second, even a custom allocator is not zero-overhead: at the very least it must check the size, because the size is not known at compile time.
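To illustrate the size-check point, here is a minimal single-block recycling allocator of the kind one might write for type-erased coroutine frames. It is a hypothetical sketch, not any proposal's allocator interface: since the frame size only arrives at run time, allocate() has to compare it against the cached block before reusing it, a check a concrete, fixed-size frame would not need.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical recycling allocator that keeps at most one freed block.
class frame_pool
{
public:
    frame_pool() = default;
    frame_pool(const frame_pool&) = delete;
    frame_pool& operator=(const frame_pool&) = delete;
    ~frame_pool() { ::operator delete(cached_); }

    void* allocate(std::size_t size)
    {
        if (cached_ != nullptr && size <= cached_size_)  // the run-time size check
        {
            void* p = cached_;
            cached_ = nullptr;
            return p;
        }
        return ::operator new(size);
    }

    void deallocate(void* p, std::size_t size)
    {
        if (cached_ == nullptr)  // keep one block around for the next frame
        {
            cached_ = p;
            cached_size_ = size;  // conservative: block is at least this large
            return;
        }
        ::operator delete(p);
    }

private:
    void* cached_ = nullptr;
    std::size_t cached_size_ = 0;
};
```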
 

Even if we accept that this is a good way to make a proposal... it's not a proposal yet. It's just some ideas being batted around on a forum. None of the various coroutine proposals do anything like what you've suggested. Why should we halt or delay progress on P0057 because you think you might be able to do better?


If the authors of P0057 still insist on a design with intrinsic overhead and a high burden on optimizers, then you are right: the viable path is probably to make another proposal.

Passive-aggressiveness does not prove your point.

Constantly calling things you do not like "nonsense" does not prove your point either.
 
For example, you have yet to prove that the "burden on optimizers" is "high" by some definition of that word. It's merely "non-zero".

Just like the burden on optimizers for dealing with template code, inlining, and the like.

Well, I agree. I think I should write a detailed report on this issue, showing concrete aspects of the overhead, what compilers can do today, etc.
 

Personally, I would rather get fast coroutines in, say, 2020 than get some coroutines in 2017.

Considering that you haven't proven that P0057 is particularly slow, I remain yet unconvinced that the performance gains you claim are necessary to be "fast" are actually worth those 3 years.

I have already made some tests, checked the generated assembly for both versions, etc. I should do more tests and then write some kind of report based on them.