
Why can't I understand what coroutines are?


Juha Nieminen

Aug 4, 2019, 5:37:49 AM
C++20 will introduce support for coroutines. To this day, for reasons
that are completely mysterious even to me, I have *absolutely no idea*
what they are. I just don't get it.

I go to the Wikipedia page "Coroutine"... and it tells me nothing.

"Coroutines are computer program components that generalize subroutines
for non-preemptive multitasking, by allowing execution to be suspended
and resumed. Coroutines are well-suited for implementing familiar program
components such as cooperative tasks, exceptions, event loops, iterators,
infinite lists and pipes."

This tells me nothing. I don't get it.

"By contrast, coroutines can exit by calling other coroutines, which may
later return to the point where they were invoked in the original
coroutine; from the coroutine's point of view, it is not exiting but
calling another coroutine."

Can exit by calling other coroutines... And then the execution later
resumes from that calling point. This is not "exiting" but "yielding",
whatever that's supposed to mean.

Sounds like a normal function call to me, but what do I know? I can't
understand what this is saying. It's like it's speaking in legible
English, yet it somehow is complete gibberish at the same time. I don't
get it.

It then gives a pseudocode example of coroutines... that look exactly
like a function calling another (which then just returns to the
original caller).

==========================================
var q := new queue

coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce
==========================================

This looks to me exactly like

==========================================
var q := new queue

coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        call consume

coroutine consume
    while q is not empty
        remove some items from q
        use the items
    return
==========================================

But what do I know? I don't get it.

Ok, Wikipedia is just a generic article about coroutines. Maybe
https://en.cppreference.com/w/cpp/language/coroutines has a better
explanation. It's done from the perspective of C++ in particular,
after all.

"A coroutine is a function that can suspend execution to be resumed
later. Coroutines are stackless: they suspend execution by returning
to the caller. This allows for sequential code that executes
asynchronously (e.g. to handle non-blocking I/O without explicit
callbacks), and also supports algorithms on lazy-computed infinite
sequences and other uses."

A function that can suspend execution to be resumed later...
Yeah, I don't get it. Allows sequential code that executes
asynchronously... What? Supports algorithms on lazy-computed infinite
sequences... Ok, but what does this all mean? I don't get it.

The examples given are complete gibberish. I have *absolutely no
idea* what something like this is supposed to mean:

co_await async_write(socket, buffer(data, n));

co_await? What?

It seems that everybody else is awaiting (hah!) and praising coroutines
like the second coming of Christ. Yet I just can't understand what they
are, or how exactly they are used or useful. All web pages I can find
are legible English that still somehow manages to be completely meaningless
gibberish that tells me nothing.

Chris M. Thomasson

Aug 4, 2019, 5:58:46 AM
On 8/4/2019 2:37 AM, Juha Nieminen wrote:
> C++20 will introduce support for coroutines. To this day, for reasons
> that are completely mysterious even to me, I have *absolutely no idea*
> what they are. I just don't get it.
[...]

They are like Fibers on Windows:

https://docs.microsoft.com/en-us/windows/win32/procthread/fibers

http://man7.org/linux/man-pages/man3/getcontext.3.html

https://docs.microsoft.com/en-us/windows/win32/procthread/user-mode-scheduling

Louis Krupp

Aug 4, 2019, 6:58:20 AM
On Sun, 4 Aug 2019 09:37:39 -0000 (UTC), Juha Nieminen
<nos...@thanks.invalid> wrote:

>C++20 will introduce support for coroutines. To this day, for reasons
>that are completely mysterious even to me, I have *absolutely no idea*
>what they are. I just don't get it.
<snip>

It might help to look at coroutines in another language, say Lua.

A main program and a coroutine cooperate like this:

The main program creates the coroutine object, associating it with a
function to be executed. At some point, the main program hands control
to the coroutine, possibly passing it arguments, and then waits for
the coroutine to either yield (or return) and possibly pass back
results. When the main program gets around to it, it resumes the
coroutine and waits again for it to yield or return.

The coroutine wakes up when it's told, possibly accepting arguments,
and then does whatever it wants until it decides it's time to either
yield control back to the main program or return, possibly passing
results back to the main program. A coroutine that yields waits until
it's told to resume by the main program; a coroutine that returns
simply goes away and cannot be told to resume.

Here's a simple example of a program that does a couple of trivial
computations and then quits:

===

-- Function to run in coroutine
--
function cf(n)
    print("coroutine beginning n = ", n)
    k = n * n
    print("coroutine yielding, returning ", k)
    coroutine.yield(k)

    print("coroutine resuming")
    k = k * n
    print("coroutine yielding, returning", k)
    coroutine.yield(k)

    print("coroutine resuming")
    k = -1
    print("coroutine all done, returning", k)
    return(k)
end

-- Main program
--
print("main creating coroutine")
co = coroutine.create(cf)

print("main telling coroutine to resume (actually, to start)")
print("main waiting for coroutine to yield or return")
status, result = coroutine.resume(co, 7)
print("main result from coroutine = ", result)

print("main telling coroutine to resume")
print("main waiting for coroutine to yield or return")
status, result = coroutine.resume(co)
print("main result from coroutine = ", result)

print("main telling coroutine to resume")
print("main waiting for coroutine to yield or return")
status, result = coroutine.resume(co)
print("main result from coroutine = ", result)

===

The output:

main creating coroutine
main telling coroutine to resume (actually, to start)
main waiting for coroutine to yield or return
coroutine beginning n = 7
coroutine yielding, returning 49
main result from coroutine = 49
main telling coroutine to resume
main waiting for coroutine to yield or return
coroutine resuming
coroutine yielding, returning 343
main result from coroutine = 343
main telling coroutine to resume
main waiting for coroutine to yield or return
coroutine resuming
coroutine all done, returning -1
main result from coroutine = -1

I hope this helps a little.

Louis

Chris Vine

Aug 4, 2019, 9:09:20 AM
On Sun, 4 Aug 2019 09:37:39 -0000 (UTC)
Juha Nieminen <nos...@thanks.invalid> wrote:
[snip]
> A function that can suspend execution to be resumed later...
> Yeah, I don't get it. Allows sequential code that executes
> asynchronously... What? Supports algorithms on lazy-computed infinite
> sequences... Ok, but what does this all mean? I don't get it.
>
> The examples given are complete gibberish. I have *absolutely no
> idea* what something like this is supposed to mean:
>
> co_await async_write(socket, buffer(data, n));
>
> co_await? What?
>
> It seems that everybody else is awaiting (hah!) and praising coroutines
> like the second coming of Christ. Yet I just can't understand what they
> are, or how exactly they are used or useful. All web pages I can find
> are legible English that still somehow manages to be completely meaningless
> gibberish that tells me nothing.

You have summarized coroutines quite well in your posting, and I
shouldn't get too hung up on one particular implementation (the one
which may be included in C++20).

The easiest coroutines to understand, and the ones used to implement
"await" semantics on asynchronous operations, are asymmetrical
coroutines implemented using delimited continuations. These normally
have only two control-flow operations, namely a "yield" or "suspend"
operation, and a "resume" operation. When a function executes a
"yield" or "suspend" operation, it returns control to its caller,
somewhat like a return statement. However, unlike a return statement,
a "resume" operation on the coroutine in question will return program
execution to the point at which the yield previously occurred.

Asynchronous programming with callbacks normally works so that when an
i/o operation might block because the resource in question (read or
write) is unavailable, instead of blocking the function returns
immediately and registers a callback with the program's event loop
(usually based on select() or poll()), representing the i/o operation's
continuation when the resource becomes available. Programming this way
using callbacks in the style of, say, Node.js is troublesome because it
gives rise to what is sometimes called inversion of control, aka
"callback hell". In particular, control flow ceases to appear as a
sequential series of i/o operations which can easily be reasoned about.

Coroutines can offer a way out because when an i/o resource is
unavailable the i/o operation concerned can (i) register a callback
with the event loop which resumes the operation when the resource
becomes available, and then (ii) suspend to the caller (which
indirectly is the event loop). When the resource becomes available the
event loop resumes the coroutine at the point of suspension, so
"tricking" the control flow so that it appears to the coder to be
sequential (like blocking i/o) even though it is in fact executing
asynchronously via an event loop.
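
To make that concrete, here is a minimal sketch of that shape using the
proposed C++20 syntax (the `task` and `suspend_to_loop` names are my own
illustration, not from any library; a simple queue stands in for the
event loop):

===

// Compiles as C++20. The awaiter parks the suspended coroutine on the
// "event loop" queue; the loop resumes it later.
#include <coroutine>
#include <deque>
#include <iostream>

std::deque<std::coroutine_handle<>> ready;    // toy "event loop" queue

struct suspend_to_loop {
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) { ready.push_back(h); }
    void await_resume() const noexcept {}
};

struct task {                                 // minimal fire-and-forget type
    struct promise_type {
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

task worker(int id) {
    std::cout << "task " << id << " part 1\n";
    co_await suspend_to_loop{};               // suspend; the loop resumes us
    std::cout << "task " << id << " part 2\n";
}

int main() {
    worker(1);                                // runs until the first co_await
    worker(2);
    while (!ready.empty()) {                  // the "event loop"
        auto h = ready.front();
        ready.pop_front();
        h.resume();           // control re-enters just after the co_await
    }
}

===

The two tasks interleave (part 1, part 1, part 2, part 2) on a single
thread: each reads as sequential code, yet neither blocks the other.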

The Scheme language has first-class continuations built into the
language which makes implementing coroutines relatively trivial. Here
is one library which uses asymmetrical coroutines for the purposes
mentioned above which may or may not help:
https://github.com/ChrisVine/chez-a-sync/wiki .
ECMAScript and Python also now have somewhat more limited asymmetrical
coroutines.

Sam

Aug 4, 2019, 9:17:43 AM
Juha Nieminen writes:

> This tells me nothing. I don't get it.

You are not alone. I had the same reaction.

After reading all the available documentation on it, I believe that
coroutines are a solution in search of a problem.

Sam

Aug 4, 2019, 9:18:37 AM
Chris M. Thomasson writes:

> On 8/4/2019 2:37 AM, Juha Nieminen wrote:
>> C++20 will introduce support for coroutines. To this day, for reasons
>> that are completely mysterious even to me, I have *absolutely no idea*
>> what they are. I just don't get it.
> [...]
>
> They are like Fibers on Windows:

That's because Windows always sucked at multithreading.


Sam

Aug 4, 2019, 9:24:01 AM
Chris Vine writes:

> Asynchronous programming with callbacks normally works so that when an
> i/o operation might block because the resource in question (read or
> write) is unavailable, instead of blocking the function returns
> immediately and registers a callback with the program's event loop
> (usually based on select() or poll()), representing the i/o operation's
> continuation when the resource becomes available. Programming this way
> using callbacks in the style of, say, Node.js is troublesome because it
> gives rise to what is sometimes called inversion of control, aka
> "callback hell". In particular, control flow ceases to appear as a
> sequential series of i/o operations which can easily be reasoned about.

This problem was solved a long time ago. It is called "threads".

The funniest thing about this is that Gnome is the most extreme case of this.
The entire Gnome stack is just one event dispatching main loop. Everything
is callbacks. But since Gnome is C code, none of this will help it, that's
the funny part.

Chris Vine

Aug 4, 2019, 9:37:11 AM
Threads are very definitely not the answer. Threads work well with
cpu-bound workloads. They are hopeless with i/o-bound workloads, which
is why co-operative (possibly single-threaded) multi-tasking using
coroutines has become popular.

If you use native OS threads for i/o bound workloads, you are doing it
wrong.

Sam

Aug 4, 2019, 10:31:14 AM
So instead of simply spawning off a separate thread whose only job is a
laughably simple loop that waits for I/O to happen, and then does something
simple with it, like saving some data that's read from the socket into a
file, it's much easier to build another rube-goldbergian event-loop based
contraption, and the entire application now must be dragged, kicking and
screaming, into rebuilding under that framework? That's the easier approach?

Coroutines are not going to be better at this, either.

> If you use native OS threads for i/o bound workloads, you are doing it
> wrong.

Nope. One only needs to understand that I/O bound workloads are not the only
reason that threads exist.

Robert Wessel

Aug 4, 2019, 11:12:41 AM
On Sun, 4 Aug 2019 09:37:39 -0000 (UTC), Juha Nieminen
<nos...@thanks.invalid> wrote:

The problem is that this is a bit oversimplified:

var q := new queue

coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce

It's not wrong, but rather it's been reduced to so simple a form that
coroutines are not a great help. That's one of the classic "problems"
with coroutines - there's not a trivial example of why you'd want
them.

Consider instead:
var z := ...

coroutine produce
    loop
        ...code...
        loop
            ...code...
            create z
            yield to consume
            ...code...
        endloop
        ...code...
    endloop

coroutine consume
    loop
        ...code...
        loop
            ...code...
            yield to produce
            process z
            ...code...
        endloop
        ...code...
    endloop

Now try unwinding those into a simple subroutine call.

The point is that the producer can generate the output at any
convenient point (that's not different from having the producer call
the consumer at a point convenient for the producer), but also the
*consumer* can acquire the item at a point convenient for the
*consumer*.

Consider a program that processes an input stream - it would normally
call getc() (or whatever) at various places, probably several nested
subroutines and/or loops deep. Consider how much more convoluted the
code would be if instead the OS called main() with every character
(and yes, that's the model many event driven GUIs use).

The point of a coroutine is that it can "suspend" at some arbitrary
point in its logic (say a dozen subroutine calls and loops deep, plus
a bunch of state from local variables), and then continue from that
point (still those dozen subroutine calls and loops deep, with state
intact), rather than at the beginning of the routine again.
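
To see the difference, here is a toy illustration (my own, not from the
example above) of what the push-style "OS calls you with every
character" model forces: the routine's position in its logic must be
reified as explicit state, rather than just living in the call stack as
it does with a pull-style getc() loop.

===

// Push-style word counter: called once per character, so "where we are"
// must be stored as an explicit enum rather than as a point in the code.
struct parser {
    enum { in_word, between_words } state = between_words;
    int words = 0;
    void feed(char c) {
        if (c == ' ' || c == '\n') state = between_words;
        else if (state == between_words) { state = in_word; ++words; }
    }
};

// Pull-style, the same logic is simply a loop whose "state" is the
// program counter:
//   while ((c = getc(f)) != EOF) { if a new word starts here: ++words; }

===

A coroutine lets you keep the pull-style shape while still being fed
data one piece at a time.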

Chris Vine

Aug 4, 2019, 11:43:16 AM
On Sun, 04 Aug 2019 10:31:02 -0400
Sam <s...@email-scan.com> wrote:
> Chris Vine writes:
>
> > On Sun, 04 Aug 2019 09:23:52 -0400
> > Sam <s...@email-scan.com> wrote:
> > >
> > > The funniest thing about this is that Gnome is the most extreme case of this.
> > > The entire Gnome stack is just one event dispatching main loop. Everything
> > > is callbacks. But since Gnome is C code, none of this will help it, that's
> > > the funny part.
> >
> > Threads are very definitely not the answer. Threads work well with
> > cpu-bound workloads. They are hopeless with i/o-bound workloads, which
> > is why co-operative (possibly single-threaded) multi-tasking using
> > coroutines has become popular.
>
> So instead of simply spawning off a separate thread whose only job is a
> laughably simple loop that waits for I/O to happen, and then does something
> simple with it, like saving some data that's read from the socket into a
> file, it's much easier to build another rube-goldbergian event-loop based
> contraption, and the entire application now must be dragged, kicking and
> screaming, into rebuilding under that framework? That's the easier approach?

You have it wrong.

For toy programs or for cases where there are only a few concurrent i/o
operations running at any one time then native threads are fine. But
using native OS threads with blocking i/o on i/o-bound workloads does
not scale. No one writes a server these days with one thread per
connection, which would be required for your blocking i/o: instead you
multiplex connections with poll/epoll/kqueue and tune the number of
threads to the workload and the number of cores available. You can have
millions of C++20-style stackless coroutines running at any one time
on a thread, although you are going to run out of file descriptors
before any such limit is reached (I see that my relatively modest 8 core
machine has a hard maximum of 805038 open descriptors although other
things would probably break before then).
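
In outline (error handling omitted, and the accept/read bodies elided),
that multiplexing is just:

===

#include <sys/epoll.h>

void event_loop(int listen_fd) {
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);  // one thread, many sockets
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // accept() the new connection and EPOLL_CTL_ADD it
            } else {
                // read() will not block now; run this connection's
                // state machine (or resume its coroutine) here
            }
        }
    }
}

===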

> Coroutines are not going to be better at this, either.
>
> > If you use native OS threads for i/o bound workloads, you are doing it
> > wrong.
>
> Nope. One only needs to understand that I/O bound workloads are not the only
> reason that threads exist.

Clueless.

Mr Flibble

Aug 4, 2019, 12:57:30 PM
This is why coroutines should be banned: the average programmer simply
doesn't understand them. This state of affairs is probably partly due to
the pervasive understanding that the abstract machine has a stack. C++ is
complicated enough.

/Flibble

--
"Snakes didn't evolve, instead talking snakes with legs changed into
snakes." - Rick C. Hodgin

“You won’t burn in hell. But be nice anyway.” – Ricky Gervais

“I see Atheists are fighting and killing each other again, over who
doesn’t believe in any God the most. Oh, no..wait.. that never happens.” –
Ricky Gervais

"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."

Chris M. Thomasson

Aug 4, 2019, 2:16:08 PM
lol. ;^)

Chris M. Thomasson

Aug 4, 2019, 2:21:01 PM
Threads can work fairly well with IO; take a look at IOCP:

https://docs.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports


Chris Vine

Aug 4, 2019, 3:31:03 PM
On Sun, 4 Aug 2019 11:20:51 -0700
"Chris M. Thomasson" <invalid_chris_t...@invalid.com>
wrote:
I don't know microsoft's implementations, but as far as I can make out
the i/o is still done asynchronously (the file operations themselves do
not block) but when an asynchronous operation completes that completion
is handled by the completion port. Although threads can wait on the
completion port for an i/o operation to reach completion, there appears
to be some form of multiplexing so that one thread can handle a number
of completions, and the documentation suggests a number of
threads equal to the number of cores (as opposed to the number of
connections) as a good starting point, which seems reasonable.

But I cannot say the documentation you refer to is that clear: is my
summary wrong (and if so, what's the point of completion ports in the
first place)? I find it hard to believe microsoft would have gone for
the one thread per connection option favoured by my respondent.

Scott Lurndal

Aug 4, 2019, 3:46:31 PM
I disagree. So do the Open Data Plane (ODP) folks. Threads work very
well with I/O bound workloads when used properly. Last time I found a
coroutine useful was in PDP-11 assembler (JSR PC,@(SP)+).

Sam

Aug 4, 2019, 4:44:03 PM
Chris Vine writes:

>
> You have it wrong.
>
> For toy programs or for cases where there are only a few concurrent i/o
> operations running at any one time then native threads are fine. But
> using native OS threads with blocking i/o on i/o-bound workloads does
> not scale.

Says who?

> No one writes a server these days with one thread per
> connection, which would be required for your blocking i/o: instead you
> multiplex connections with poll/epoll/kqueue and tune the number of
> threads to the workload and the number of cores available.

Maybe you just don't know many people who write server code.

> You can have
> millions of C++20-style stackless coroutines running at any one time
> on a thread, although you are going to run out of file descriptors
> before any such limit is reached

Sure, and since coroutines introduce an additional level of complexity –
something also needs to be aware and keep track of which coroutine has
something that needs to be done, instead of efficiently encapsulating all
I/O handling logic in a simple, straightforward, execution thread; and no
other part of the code needs to give a shit – it's still unclear what problem
they're trying to solve.

Modern operating systems have no problems scaling to thousands, and maybe
even tens of thousands of execution threads. Perhaps you don't have much
exposure to them, and are only exposed to technically flawed operating
systems, with utterly crappy thread implementations?

> > > If you use native OS threads for i/o bound workloads, you are doing it
> > > wrong.
> >
> > Nope. One only needs to understand that I/O bound workloads are not the only
> > reason that threads exist.
>
> Clueless.

As opposed to someone who religiously believes that the only reason
someone would use threads is CPU-bound code?

Maybe you should talk to Chrome and Firefox folks, and teach them how to
write code, since you know so much about it, and explain how stupid they
are, creating dozens of threads even when most of their load is I/O bound.

Chris M. Thomasson

Aug 4, 2019, 4:58:26 PM
On 8/4/2019 1:43 PM, Sam wrote:
> Chris Vine writes:
>
>>
>> You have it wrong.
>>
>> For toy programs or for cases where there are only a few concurrent i/o
>> operations running at any one time then native threads are fine.  But
>> using native OS threads with blocking i/o on i/o-bound workloads does
>> not scale.
>
> Says who?
>
>>            No one writes a server these days with one thread per
>> connection, which would be required for your blocking i/o: instead you
>> multiplex connections with poll/epoll/kqueue and tune the number of
>> threads to the workload and the number of cores available.
>
> Maybe you just don't know many people who write server code.

I remember using IOCP back on WinNT 4. The idea of a thread per
connection simply does not scale. However, creating a thread or two per
CPU, and using a queue, like IOCP can scale.

Chris M. Thomasson

Aug 4, 2019, 5:03:35 PM
You treat each connection as a little state machine. This state has a
socket, file, whatever, along with a memory buffer, or a pointer into
some memory for io. It gets associated with the IOCP. Create several
worker threads, one or two per CPU, and set the concurrency level to the
number of CPUs. Now, worker threads wait on the IOCP for events. Each
event has the state of the connection. If a read event fires, the memory
buffer has the data, or there might be an error, whatever.

Actually, imvho, creating servers with IOCP is fairly straightforward.
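
In skeleton form (error handling omitted; `connection` here is a
hypothetical per-connection struct of my own, not a Win32 type):

===

#include <winsock2.h>
#include <windows.h>
#include <thread>
#include <vector>

struct connection {            // the "little state machine"
    SOCKET     sock;
    OVERLAPPED ov;             // one outstanding read
    char       buf[4096];
    // ... parser state, etc.
};

void worker(HANDLE iocp) {
    for (;;) {
        DWORD        bytes = 0;
        ULONG_PTR    key   = 0;     // we stored the connection* here
        LPOVERLAPPED ov    = nullptr;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
            continue;               // I/O error on that connection
        auto* c = reinterpret_cast<connection*>(key);
        // `bytes` of data are now in c->buf: advance c's state machine,
        // then post the next WSARecv() on c->sock ...
    }
}

int main() {
    // 0 = let the system default the concurrency to the number of CPUs
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
    // each accepted socket is associated once with the port:
    //   CreateIoCompletionPort((HANDLE)c->sock, iocp, (ULONG_PTR)c, 0);
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < 2 * std::thread::hardware_concurrency(); ++i)
        pool.emplace_back(worker, iocp);
    for (auto& t : pool) t.join();  // workers loop forever in this sketch
}

===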

Sam

Aug 4, 2019, 5:36:22 PM
Chris M. Thomasson writes:

> On 8/4/2019 1:43 PM, Sam wrote:
>>> multiplex connections with poll/epoll/kqueue and tune the number of
>>> threads to the workload and the number of cores available.
>>
>> Maybe you just don't know many people who write server code.
>
> I remember using IOCP back on WinNT 4. The idea of a thread per connection
> simply does not scale. However, creating a thread or two per CPU, and using
> a queue, like IOCP can scale.

I agree that "a thread per connection simply does not scale" on Microsoft
Windows. However that's only true for Microsoft Windows, and its crap
multithreading.

A thread per connection scales perfectly fine, on Linux. Even with hundreds
of connections. Thread overhead is quite negligible on Linux, and you end up
writing clean, orderly code that runs logically from start to finish,
instead of getting torn into shreds in order to conform to event loop or
callback-based design patterns.

Mr Flibble

Aug 4, 2019, 5:43:57 PM
Nonsense. You should have no more than one logical thread per physical
thread for a particular task type.

Chris Vine

Aug 4, 2019, 5:53:24 PM
I disagree. And "hundreds of connections" is not good enough.

Mr Flibble

Aug 4, 2019, 6:10:21 PM
You are correct to disagree. Sam has obviously never heard of "epoll".

Alf P. Steinbach

Aug 4, 2019, 7:11:29 PM
On 04.08.2019 11:37, Juha Nieminen wrote:
> C++20 will introduce support for coroutines. To this day, for reasons
> that are completely mysterious even to me, I have *absolutely no idea*
> what they are. I just don't get it.
>
> I go to the Wikipedia page "Coroutine"... and it tells me nothing.

[snip]

I'll address the conceptual only.

When I saw this posting earlier today I thought I'd whip up a concrete
example, like the coroutines I implemented in the mid 1990s. However, I
discovered that I would then be late for a dinner, so I put it on hold.

Coroutines are just cooperative multitasking, multitasking with a single
thread of execution but multiple call stacks. When you CALL a coroutine
you create a new instance, with its own stack. There is nothing like
that in the standard library. When a coroutine TRANSFERs to another
coroutine it's like a `longjmp`, except that `longjmp` was designed to
jump up to an earlier point in a call chain, while a coroutine transfer
jumps to a separate call chain.

As I recall you're familiar with 16-bit Windows programming.

In 16-bit Windows (Windows 3.x in the early 1990s) each program
execution was a coroutine. When a program was launched, a stack area was
allocated for it. That's a call of a coroutine. When the program called
`GetMessage` or `Yield`, some other program instance would get a chance
to continue to run (after waiting for /its/ call to `GetMessage` or
`Yield` to return). That's a coroutine transfer.

The 16-bit Windows program executions were /huge/, heavy coroutines.

However, the main advantage of coroutines in ordinary programming is
with coroutines as a very light-weight multitasking solution. Since
there's only one thread of execution there are fewer synchronization
issues. In particular one doesn't have to worry about whether one
coroutine sees the memory changes effected by some other coroutine.
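
For a concrete taste of the general, stackful flavour, the <ucontext.h>
functions linked earlier in the thread are enough to build one by hand.
A sketch (error checking omitted; the API is obsolescent in POSIX but
still available on glibc):

===

#include <ucontext.h>
#include <iostream>

ucontext_t main_ctx, co_ctx;
char co_stack[64 * 1024];            // the coroutine's own call stack

void body() {
    std::cout << "coroutine: first run\n";
    swapcontext(&co_ctx, &main_ctx); // TRANSFER back to main
    std::cout << "coroutine: resumed\n";
}                                    // falling off the end goes to uc_link

int main() {
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp   = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link          = &main_ctx; // where to go when body() returns
    makecontext(&co_ctx, body, 0);

    swapcontext(&main_ctx, &co_ctx);  // CALL: start the coroutine
    std::cout << "main: coroutine yielded\n";
    swapcontext(&main_ctx, &co_ctx);  // resume it where it left off
    std::cout << "main: coroutine finished\n";
}

===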


Cheers & hht.,

- Alf

Sam

Aug 4, 2019, 9:32:33 PM
You can disagree all you want. Historical facts prove otherwise. Since
ancient days, the simplest task on Unixes were done as pipelined tasks, as
multiple execution contexts.

$ sort <file | uniq -c

That dates back decades. From the earliest days, shells were explicitly
designed to execute pipelined commands. Pipelined commands were their main
feature. All of this is pretty much I/O bound. Multiple execution threads.

Because of that, and battle-tested over decades, Unix was tuned to
effortlessly implement tasks that use multiple processes working in
parallel. Threads were just the next evolution, dropping most of what little
was left, in terms of per-execution context overhead. Linux inherited Unix's
legacy, and orientation, towards low-overhead multiple execution threads.

All the historical network server processes on Unix, then Linux, were
multi-process based. Even though select() existed pretty much since 4.2 BSD
days, it was virtually unheard of for a network server to be a monolithic
process, multiplexing for its clients using select(). An incoming network
connection starts a new process, just for that network connection. Didn't
matter what network service it was. SMTP, finger, login, or what, it was all
one process per connection.

There was a reason for that. It didn't take an Einstein to figure out that
with a monolithic server, once it picked up an I/O event it had to do what
that event needs to do, and nothing else can happen until that's done. Even
if another client blurted out a packet, too bad, so sad. It needs to wait
until the current I/O event's processing is done. It made a lot more sense
to simply have another available CPU, not being busy with anything, run with
it. And, goshdarndammit, if that socket had a different execution thread
sucking from it, why, that spare CPU will know exactly what to do.

So, monolithic processes were rare. They are a little bit more common today
than before, but they are still a rarity. And, today, even network servers
that kick off multiple processes per client can be found. So, sorry, but
facts disagree: multiple execution contexts, either as standalone processes
or lightweight intra-process threads rule the roost in I/O bound contexts.
Linux inherits Unix's legacy in this respect, and the differences between
processes and threads, on Linux, are mostly cosmetic. clone() creates both
of them. You just specify which parts of the parent process (virtual memory
map, file descriptors, filesystem namespace etc…) are shared with the new
execution context, and that's pretty much it.

I can understand why coming from a MS-Windows background makes someone frown
at execution threads, unless all they do is stay away from the operating
system, and work entirely on their own, spinning the CPU. The only reason
stuff like IOCP exists on Windows is because Windows sucks at multi-
threading and multi-processing. It always sucked, and will likely always
suck.

But that's not the case with the real world out there. Being able to have a
clean, encapsulated, logical execution thread, always busy with the forward
march of progress from the starting point to the finish line, without being
forced into contorting into an event/dispatch based framework, results in
smaller, leaner code without all that event-based/dispatching cruft. It
doesn't matter which one of the file descriptors is now ready for I/O. It no
longer matters. The OS kernel already figured it out, and there's no good
reason for userspace to piss away more CPU cycles doing exactly the same
thing. The OS kernel knows which I/O is done, and which thread is waiting on
it, and it already knows whose ass to kick into gear.

As such, execution threads are perfect for I/O-based load. It is no wonder
that Linux rules the roost in TOP500 space. It's a dirty little secret that
all those supercomputers are just a bunch of machines with optimized
networking tying hundreds of thousands of threads together. Yes, threads.

Gee whiz, how can that possibly happen?


Robert Wessel

Aug 4, 2019, 11:05:13 PM
If you only have "hundreds" of connections, use any technique you
like, scaling isn't an issue.

Chris M. Thomasson

Aug 4, 2019, 11:10:22 PM
On 8/4/2019 2:36 PM, Sam wrote:
> Chris M. Thomasson writes:
>
>> On 8/4/2019 1:43 PM, Sam wrote:
>>>> multiplex connections with poll/epoll/kqueue and tune the number of
>>>> threads to the workload and the number of cores available.
>>>
>>> Maybe you just don't know many people who write server code.
>>
>> I remember using IOCP back on WinNT 4. The idea of a thread per
>> connection simply does not scale. However, creating a thread or two
>> per CPU, and using a queue, like IOCP can scale.
>
> I agree that "a thread per connection simply does not scale" on
> Microsoft Windows. However that's only true for Microsoft Windows, and
> its crap multithreading.
>
> A thread per connection scales perfectly fine, on Linux. Even with
> hundreds of connections.

Hundreds? Try tens of thousands...

Back on WinNT 4.0, my server code could handle around 40,000 concurrent
TCP connections, using a handful of threads.

There were bursts of activity, where many connections were being rapidly
created and destroyed. My stress testing client programs tried to swamp
the server with various scenarios. This was a long time ago.

Paavo Helde

Aug 5, 2019, 2:37:00 AM
IOCP is used in Windows to *reduce* the number of needed threads (at
least in user space), which is important because in Windows the thread
creation is a pretty heavyweight operation. So it's strange that IOCP is
mentioned as an example of "Threads can work fairly well with IO", it's
rather the opposite, at least on Windows.

The Linux equivalent of IOCP is AIO, see e.g. "man aio_read". However,
some googling suggests the Linux implementation is not yet optimal (I
guess this means it is not yet built into the kernel to the same extent
as in Windows).
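
For reference, the basic shape of that interface is (error checking
omitted; the path is just an example; link with -lrt on older glibc):

===

#include <aio.h>
#include <fcntl.h>
#include <unistd.h>
#include <iostream>

int main() {
    int fd = open("/etc/hostname", O_RDONLY);  // any readable file
    char buf[256];
    aiocb cb{};
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;
    aio_read(&cb);                     // returns immediately
    const aiocb* list[] = { &cb };
    aio_suspend(list, 1, nullptr);     // wait (or poll aio_error() instead)
    ssize_t n = aio_return(&cb);       // bytes read
    std::cout.write(buf, n);
    close(fd);
}

===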

Also, the popular Boost.Asio library has "asynchronous" already in its
name and uses async IO in the background as appropriate. It also has
implementations of coroutines, so maybe one can study its documentation
and examples in order to get familiar with coroutines.

BTW, Boost.Asio does not create a "thread per connection". On the contrary,
it expects the client program to set up a thread pool (with the number
of threads based on the number of cores, typically) which is reused for
all requests.


Martijn van Buul

Aug 5, 2019, 3:46:13 AM
* Juha Nieminen:
> C++20 will introduce support for coroutines. To this day, for reasons
> that are completely mysterious even to me, I have *absolutely no idea*
> what they are. I just don't get it.

I've been using lua in the past, and I was heavily using coroutines at
some point. I don't claim to be an expert on these matters, but I'll
tell you why *I* used them.

The key is in the following snippet :

> It then gives a pseudocode example of coroutines... that look exactly
> like a function calling another (which then just returns to the
> original caller).
>
>==========================================
> var q := new queue
>
> coroutine produce
>     loop
>         while q is not full
>             create some new items
>             add the items to q
>         yield to consume
>
> coroutine consume
>     loop
>         while q is not empty
>             remove some items from q
>             use the items
>         yield to produce


... except that the snippet threw out the baby with the bathwater, by
explicitly yielding to a specific target in both directions. That indeed
doesn't add any real benefit, but when done differently, coroutines offer
two benefits:

1) They decouple the consumer and the producer.
2) You don't need to maintain state in an explicit container, you can
just use local variables on the stack.

Coroutines certainly aren't necessary here ("But I can do it with a
function call to an interface method" is a straw man argument, as no one
claimed otherwise - there are always multiple solutions to a single problem),
but they certainly make some solutions a lot more elegant.

I would suggest reading chapter 9 of "Programming in Lua". I know it's not
C++, but maybe it'll help as a primer. An older version is online at

https://www.lua.org/pil/9.1.html

--
Martijn van Buul - pi...@dohd.org

Juha Nieminen

Aug 5, 2019, 4:31:26 AM
Louis Krupp <lkr...@nospam.pssw.com.invalid> wrote:
> A main program and a coroutine cooperate like this:
>
> The main program creates the coroutine object, associating it with a
> function to be executed. At some point, the main program hands control
> to the coroutine, possibly passing it arguments, and then waits for
> the coroutine to either yield (or return) and possibly pass back
> results. When the main program gets around to it, it resumes the
> coroutine and waits again for it to yield or return.
>
> The coroutine wakes up when it's told, possibly accepting arguments,
> and then does whatever it wants until it decides it's time to either
> yield control back to the main program or return, possibly passing
> results back to the main program. A coroutine that yields waits until
> it's told to resume by the main program; a coroutine that returns
> simply goes away and cannot be told to resume.

I'm still not sure how that's different from a regular function.

Or, perhaps more precisely, a lambda function (because as far as
I understand, a coroutine can have its own state, just like a
lambda can have captured variables, making the lambda effectively
a stateful functor.)

Is the difference that a coroutine can "return" (so to speak)
from anywhere inside itself, and the next time it gets "called"
again it resumes from that point forward, rather than from the
beginning (like a lambda would)?

I'm still not exactly sure how that's so useful in practice
(compared to stateful functors, like lambdas.)

Juha Nieminen

Aug 5, 2019, 4:37:29 AM
Sam <s...@email-scan.com> wrote:
> This problem was solved a long time ago. It is called "threads".

I think that with "threads" you are implying pre-emptive multitasking.
Doesn't that introduce lots of problems relating to mutual exclusion?
Quite a significant portion of bugs out there are related to
multithreading problems.

Juha Nieminen

Aug 5, 2019, 4:42:56 AM
Robert Wessel <robert...@yahoo.com> wrote:
> The point of a coroutine is that it can "suspend" at some arbitrary
> point in its logic (say a dozen subroutine calls and loops deep, plus
> a bunch of state from local variables), and then continue from that
> point (still those dozen subroutine calls and loops deep, with state
> intact), rather than at the beginning of the routine again.

From all the text and discussion out there I'm getting the picture that
coroutines are a bit like lambda functions (ie. essentially stateful
functors), with the difference that the coroutine can be "called" again
in such a manner that it continues from where it "returned" last time,
rather than always from the beginning (which would happen with lambdas).

Is that correct?

Is there a simple example of a situation where this is significantly
beneficial compared to a lambda or an explicitly written functor
(ie. a class with member variables and an operator())?

Juha Nieminen

Aug 5, 2019, 4:50:32 AM
Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
> Coroutines are just cooperative multitasking, multitasking with a single
> thread of execution but multiple call stacks. When you CALL a coroutine
> you create a new instance, with its own stack.

Does this mean that if a coroutine calls other functions normally (which
themselves may then call other functions and so on), this chain of calls
will use its own stack that's separate and independent from the stack
that's used by main() and everything it calls?

And if any of those functions along the chain "yields" (or whatever the
term was to "exit" this "pseudo-thread"), to later be returned to that
point of execution, all that stack is still there and this chain of
function calls will continue as before?

If that's the case, then your explanation gave me completely new insight
into coroutines that I didn't know nor understand before.

Martijn van Buul

Aug 5, 2019, 5:51:12 AM
* Juha Nieminen:

> Is the difference that a coroutine can "return" (so to speak)
> from anywhere inside itself, and the next time it gets "called"
> again it resumes from that point forward, rather than from the
> beginning (like a lambda would)?

Yup. It acts like a blocking call, in a way.

> I'm still not exactly sure how that's so useful in practice
> (compared to stateful functors, like lambdas.)

Consider a producer/consumer implemented using threads. In this
case 'publish' and 'consume' operate on some kind of queue, possibly
with a depth of 1. If a producer publishes something to a full queue,
it blocks. If the consumer tries to consume from an empty queue, it
blocks.

class ProducerClass
{
    [...]

    void mainLoop()
    {
        while ( ... )
        {
            Item newItem;

            [... do work on newItem ...]

            publish( newItem);
        }
    }
};

class ConsumerClass
{
    [...]

    void mainLoop()
    {
        while ( ... )
        {
            std::vector<Item> packet;
            packet.reserve(10);

            for (int i = 0; i < packet.capacity(); ++i)
            {
                Item newItem = consume();

                [.. possibly do something on newItem ...]
                packet.push_back(newItem);
            }

            [... do something on a packet of 10 items ...]

        }
    }
};

In this case, having the consumer and producer execute in their own
threads offers no performance benefit, but it might still be very beneficial
to implement it this way, depending on details of either consumer or
producer. As with all examples, the above example is a bit silly.

With a 1:1 relationship between consumer and producer, such
a threaded solution would offer no parallelisation, so using threads here
only serves to simplify the implementation of the consumer or producer -
performance-wise it's detrimental. Here, coroutines would offer
a compromise: the separation and implementation benefits of a threaded
solution, without the overhead caused by blocking.

Another possible use of coroutines would be iterators. Suppose I have a

struct Foo
{

};

struct Bar
{
    [...]
    std::list<Foo> foos;
};

struct Quux
{
    [...]
    std::vector<Bar> bars;
};

struct Wibble
{
    std::map< ..., Quux> quuxMap;
};


Suppose I want to offer a way to iterate over all instances of "Foo" inside
a "Wibble", without exposing any of the classes inbetween (because they're
private to an implementation, for example). There are numerous options:

* Create a std::container<Foo> GetFoos(const Wibble &) method. Could be
expensive (because Foo is expensive to copy), impossible (because
Foo has a deleted copy constructor) - and is generally undesirable
anyway.
* Create an

ApplyOnFoos(const Wibble &, const std::function<void(const Foo&)> &callback)

method (or something similar using templates, details schmetails) that
iterates over all elements and calls the callback function for every
instance of 'Foo'. Could work, but the receiving end might have
to capture quite a bit to make this work.
* Create an iterator class along the line of

class outputFunctor
{
    outputFunctor(Wibble &) {...}

    std::optional<std::reference_wrapper<Foo>> operator()() {...}
};

(Or create an STL iterator, which is essentially the same). Will
definitely work, but the implementation is going to be awkward.

The first two have the benefit that their implementation can use simple
constructs (more to the point: A nested range-based for loop will do the
trick). Without coroutines, implementing the functor will require storing
iterators to the intermediate class inside the functor. With coroutines,
you can implement it using the same simple nested-range for loop.

I haven't touched C++'s coroutines yet (I'm waiting for C++20 to arrive)
so I don't know whether the required boilerplate will offset any
implementation benefits, but in Lua it would be something like

function outputIterator(wibble)
    local _generator = coroutine.create(
        function()
            for _, quux in pairs(wibble.quuxMap) do
                for _, bar in pairs(quux.bars) do
                    for _, foo in pairs(bar.foos) do
                        coroutine.yield(foo)
                    end
                end
            end
        end)

    return function ()
        local _, nextFoo = coroutine.resume(_generator)
        return nextFoo
    end
end

usage:

local myWibble = [....]

for foo in outputIterator(myWibble) do
    print ("Yay, a foo!", tostring(foo))
end

(Untested, and my lua is a bit rusty)

David Brown

Aug 5, 2019, 6:06:20 AM
Yes, it certainly does.

One way to think about coroutines is like threads, but cooperatively
multitasking instead of preemptive multitasking. You, the programmer,
have full control over which coroutine is running at a time, and when it
can be blocked. This has disadvantages, of course - it means more
manual control, and it cannot take advantage of multiple cores (you need
to run your coroutines from different threads for that). But it has the
advantage of being simpler and lighter (not every system is a multi-core
multi-GHz monster), and you don't need to worry about locking,
synchronisation, atomic accesses, races, contention, or anything like that.

I am looking forward to coroutines for my sub-GHz single core embedded
systems.

Juha Nieminen

Aug 5, 2019, 8:20:47 AM
David Brown <david...@hesbynett.no> wrote:
> One way to think about coroutines is like threads, but cooperatively
> multitasking instead of preemptive multitasking.

From another forum I got the impression that C++20 coroutines are stackless.
Meaning that they don't have their own stack that's separate from the one
used by main(). Which means that you can't have a chain of function calls
in a coroutine, have one of those functions "yield", and then later return
back to that point and resume execution as normal. (Basically, and if I
understand correctly, only the coroutine function itself can "yield",
not any of the functions it might call in the normal way.)

This would make coroutines rather different from threads. Actual threads
have each their own stack that they use for function call parameters etc,
and which is independent from other threads.

In other words, it sounds to me like coroutines are, essentially,
lambda functions, or stateful functor objects, with a jump at the
beginning of the function to the position in the code that last yielded.

Chris Vine

Aug 5, 2019, 8:36:18 AM
This is essentially correct although in speaking of lambda functions
with closures you are moving from the concept ("what is a coroutine?")
to one possible implementation.

Asymmetric coroutines (ones which can only suspend to the caller) are
indeed like ordinary functions (not necessarily lambda functions) except
that they can emit an enhanced return statement at some point in the
function's execution and, by some arrangement to be implemented, return
to that point later. Symmetric coroutines don't "return" in quite this
way - instead they yield to some other coroutine, which need not be the
same as the one that last resumed them.

One possible but fragile implementation of a coroutine is a stateful
functor built by hand along the lines you describe. For example, for an
asymmetric coroutine you could construct a functor whose operator() has
three return statements as possible exit points. The functor object
could have an enum as member data which describes what point the
function has at present reached. When the functor is to "yield" it
sets the enum to the correct point, saves its local variables to the
functor's data variables and then returns. When invoked again
("resumed") it first examines the enum and jumps to the correct part of
the operator() function where it is to resume, and if necessary
re-establishes its local state. Note also that C has setjmp() and
longjmp() as primitives which are similar to delimited continuations
(but not recommended for use in C++ because longjumps give undefined
behaviour if non-trivial objects are in scope).
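
For instance, a hand-rolled resumable "counter" of that fragile kind
could look like this (purely illustrative):

===

#include <optional>
#include <iostream>

class counter {
    enum class point { start, after_one, after_two, done } at = point::start;
    int n = 0;                        // "local variable" saved across yields
public:
    std::optional<int> operator()() {
        switch (at) {                 // jump to the saved resume point
        case point::start:
            n = 1;
            at = point::after_one;
            return n;                 // "yield" 1
        case point::after_one:
            n += 1;
            at = point::after_two;
            return n;                 // "yield" 2
        case point::after_two:
            n += 1;
            at = point::done;
            return n;                 // "yield" 3
        case point::done:
        default:
            return std::nullopt;      // "returned": cannot be resumed
        }
    }
};

int main() {
    counter c;
    while (auto v = c()) std::cout << *v << '\n';   // prints 1 2 3
}

===

All the bookkeeping a compiler does for you behind a yield is here done
by hand, which is exactly why this style is fragile.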

In addition to symmetric and asymmetric coroutines, you can also have
stackful and stackless coroutines. "Stackless" is a bit misleading:
it means that the coroutine does not need a stack while it is suspended
(clearly it needs one when resumed), allowing the stack to be reused
in the meantime.

Alf P. Steinbach

Aug 5, 2019, 10:59:37 AM
On 05.08.2019 10:50, Juha Nieminen wrote:
> Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
>> Coroutines are just cooperative multitasking, multitasking with a single
>> thread of execution but multiple call stacks. When you CALL a coroutine
>> you create a new instance, with its own stack.
>
> Does this mean that if a coroutine calls other functions normally (which
> themselves may then call other functions and so on), this chain of calls
> will use its own stack that's separate and independent from the stack
> that's used by main() and everything it calls?

Conceptually, yes, and with a completely general implementation such as
Windows fibers or the old Boost coroutines, or e.g. Modula-2 coroutines,
yes.

However, C++ coroutines will be so called "stackless" coroutines as an
optimization and restriction. According to ¹cppreference, "Coroutines
are stackless: they suspend execution by returning to the caller". As I
read it that means all coroutines using the C++20 syntax.

"Stackless" is possible by restricting the place that a coroutine
transfer can be specified to the coroutine body itself, with no
transfer in any code that it calls.

With this restriction one ensures that the coroutine's stack has a
/very/ small maximum size when a transfer happens. Its own
parameters and expression evaluation are then all that it needs its own
little stack for, because when a call out of the coroutine can't cause a
transfer, such a call will return with the stack that's used for
the call reverted back to its state at the call. Which means that calls
out of the coroutine can be done on a stack that's common to all such
"stackless" coroutines.

The restriction also plays another rôle: with C++20 a function is
designated as a coroutine by containing one of the keywords that
specify a transfer, namely any of `co_await`, `co_yield` or `co_return`.


> And if any of those functions along the chain "yields" (or whatever the
> term was to "exit" this "pseudo-thread"), to later be returned to that
> point of execution, all that stack is still there and this chain of
> function calls will continue as before?

Yes, with general coroutines.

But not with C++20 coroutines, due to the optimization & restriction.


> If that's the case, then your explanation gave me completely new insight
> into coroutines that I didn't know nor understand before.

Thanks.


Cheers!,

- Alf

Links:
¹ https://en.cppreference.com/w/cpp/language/coroutines

Mr Flibble

Aug 5, 2019, 1:33:03 PM
> multi-process based. Even though select() existed pretty much since 4.2
> multi-threading and multi-processing. It always sucked, and will likely
> always suck.
>
> But that's not the case with the real world out there. Being able to have
> a clean, encapsulated, logical execution thread, always busy with the
> forward march of progress from the starting point to the finish line,
> without being forced into contorting into an event/dispatch based
> framework, results in smaller, leaner code without all that
> event-based/dispatching cruft. It doesn't matter which one of the file
> descriptors is now ready for I/O. It no longer matters. The OS kernel
> already figured it out, and there's no good reason for userspace to piss
> away more CPU cycles doing exactly the same thing. The OS kernel knows
> which I/O is done, and which thread is waiting on it, and it already knows
> whose ass to kick into gear.
>
> As such, execution threads are perfect for I/O-based load. It is no wonder
> that Linux rules the roost in TOP500 space. It's a dirty little secret
> that all those supercomputers are just a bunch of machines with optimized
> networking tying hundreds of thousands of threads together. Yes, threads.
>
> Gee whiz, how can that possibly happen?

epoll(...), dear.

Scott Lurndal

Aug 5, 2019, 1:48:18 PM
Mr Flibble <flibbleREM...@i42.co.uk> writes:
>On 05/08/2019 02:32, Sam wrote:
>> Chris Vine writes:

>>
>> As such, execution threads are perfect for I/O-based load. It is no wonder
>> that Linux rules the roost in TOP500 space. It's a dirty little secret
>> that all those supercomputers are just a bunch of machines with optimized
>> networking tying hundreds of thousands of threads together. Yes, threads.
>>
>> Gee whiz, how can that possibly happen?
>
>epoll(...), dear.
>

How does epoll help with high volume disk I/O?

Bart

Aug 5, 2019, 3:19:47 PM
I don't get coroutines either. Since I have also implemented languages,
I sometimes put in features I've heard of which sound neat, although
that's often just for box-ticking as I then never use the feature again.

But coroutines, I wouldn't even bother ticking a box for.

However... with 'generators', I do understand how they can be useful,
and they are something I've been thinking about adding. And generators,
I understand, can be implemented with coroutines.

(Although I would just create whatever I can see will be necessary. If I
end up inventing something like a coroutine, then that would be cool
too, even if I couldn't tell you what they are.)

Anyway, 'generators' might be one easy-to-understand application of
'coroutines'.

Christian Gollwitzer

Aug 5, 2019, 4:27:02 PM
Am 05.08.19 um 10:42 schrieb Juha Nieminen:
> From all the text and discussion out there I'm getting the picture that
> coroutines are a bit like lambda functions (ie. essentially stateful
> functors), with the difference that the coroutine can be "called" again
> in such a manner that it continues from where it "returned" last time,
> rather than always from the beginning (which would happen with lambdas).
>
> Is that correct?

Similar to my understanding of it.
>
> Is there a simple example of a situation where this is significantly
> beneficial compared to a lambda or an explicitly written functor
> (ie. a class with member variables and an operator())?
>

When you yield() from inside a loop. With coroutines, you can implement
generators - a functor which returns a value from a set, one at a time -
similar to iterators. You can write your code *as if* the yield were
a simple function call.

For example, consider a function which prints the time as hours:minutes
from midnight to midnight:

void printtime() {
    for (int hour=0; hour<24; hour++) {
        for (int min=0; min<60; min++) {
            std::cout<<hour<<":"<<min<<std::endl;
        }
    }
}

Now you want a functor that returns the formatted string, one at a time,
for each call. Coroutine solution:*

void printtime() {
    for (int hour=0; hour<24; hour++) {
        for (int min=0; min<60; min++) {
            std::ostringstream os;
            os<<hour<<":"<<min;
            co_yield(os.str());
        }
    }
}


The only significant change is print -> yield. If you had to write this
thing manually, you would need to unroll the loops and correctly switch
the state when the minutes surpass 60.

With two for loops, that is still practically doable, albeit
inconvenient - but now consider implementing an iterator for
depth-first traversal of a tree. Trivial algorithm for printing:

void dftraverse(Tree* root) {
    std::cout<<root->content;
    if (root->left) dftraverse(root->left);
    if (root->right) dftraverse(root->right);
}

Now please unroll this into a functor which produces one value at a
time. Good luck getting it right the first time; basically you would
be reinventing an iterative algorithm from TAOCP.

Coroutine solution:

void dftraverse(Tree* root) {
    co_yield(root->content);
    if (root->left) dftraverse(root->left);
    if (root->right) dftraverse(root->right);
}

voilà.

If you're patient, search for videos from Gor Nishanov, he has provided
some examples in C++ (tree traversal is one of them).

Christian


(*) I haven't tried C++ coroutines, so the syntax may be off. I'm using
them in Python and Tcl only
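
For completeness, the scaffolding the co_yield version needs before it
compiles is a generator type with a suitable promise. A hand-written
sketch (C++20 itself ships no std::generator; the names here are my own):

===

#include <coroutine>
#include <iostream>
#include <sstream>
#include <string>

struct generator {
    struct promise_type {
        std::string current;
        generator get_return_object() {
            return generator{
                std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(std::string v) {
            current = std::move(v);
            return {};
        }
        void return_void() {}
        void unhandled_exception() {}
    };
    std::coroutine_handle<promise_type> h;
    explicit generator(std::coroutine_handle<promise_type> h) : h(h) {}
    generator(const generator&) = delete;
    ~generator() { if (h) h.destroy(); }
    bool next() { h.resume(); return !h.done(); }  // false when finished
    const std::string& value() const { return h.promise().current; }
};

generator times() {
    for (int hour = 0; hour < 24; ++hour)
        for (int min = 0; min < 60; ++min) {
            std::ostringstream os;
            os << hour << ":" << min;
            co_yield os.str();        // suspends here, resumed by next()
        }
}

int main() {
    auto g = times();
    while (g.next()) std::cout << g.value() << '\n';   // 0:0 ... 23:59
}

===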

Alf P. Steinbach

Aug 5, 2019, 5:10:09 PM
On 05.08.2019 22:26, Christian Gollwitzer wrote:
> [snip]
As I understand it this generates a new coroutine instance, with
dynamically allocated state, for each call.

It needs to do that (modulo optimization) to represent the call stack
for the recursive calls.

Thus it's convenient, but possibly/probably inefficient.


> If you're patient, search for videos from Gor Nishanov, he has provided
> some examples in C++ (tree traversal is one of them).
>
>     Christian
>
>
> (*) I haven't tried C++ coroutines, so the syntax may be off. I'm using
> them in Python and Tcl only

Same for me, except I'm only using my head. That's not intelligent,
because the C++20 coroutines are modeled on the Visual C++ coroutines,
except they've changed the keywords to more unreadable ones and tried to
make the documentation read like impenetrable machine code. I wish I were
more intelligent, at least enough to just use the VC compiler I have...

Anyway, to avoid the dynamic allocation inefficiency I'd probably do
something like this:


auto dftraverse( Node* root )
    -> optional<Data>
{
    stack<Node*> nodes;
    nodes.push( root );
    while( not nodes.empty() ) {
        const auto p = nodes.top();
        nodes.pop();
        if( p == nullptr ) {
            co_return {};
        }
        co_yield p->data;
        if( p->right ) { nodes.push( p->right ); }
        if( p->left ) { nodes.push( p->left ); }
    }
}


Except that from cppreference's page it seems that the trailing return
type syntax isn't supported for coroutines. If so then that's an
inconsistency that will probably be fixed later.

I guess that at some point someone is going to publish a guideline,
"Recursive coroutines considered harmful". For efficiency. Then someone
else is going to counter that with ""Recursive coroutines considered
harmful" considered harmful", for convenience and development time. And
then there will maybe be some discussion. In forums of the mainly
willfully ignorant, those who post either naturally recursive or
unnaturally iterative coroutines will sometimes get blasts of anonymous
downvotes. For not following the perceived best practice. And so on. :)


Cheers!,

- Alf

Chris M. Thomasson

Aug 5, 2019, 5:34:33 PM
Yeah. IOCP is the way to do things on Windows wrt creating scalable
servers. I am fond of the following function:

https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile

It works well with IOCP. Also,

https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nc-mswsock-lpfn_transmitpackets

There is a neat way to wait on IOCP that can dequeue several events in
one shot:

https://docs.microsoft.com/en-us/windows/win32/fileio/getqueuedcompletionstatusex-func

IIRC, certain Windows OSes limited TransmitFile to a max of two
concurrent in-flight calls at any time. The server versions of the OS
would allow as many as the system could handle at a time.

Chris M. Thomasson

Aug 5, 2019, 5:45:04 PM
> performancewise it's detrimental. [...]

Actually, single-producer/single-consumer queues are pretty nice. The
consumer thread can work on things without bothering the producer
thread, and vice versa. The sync can be implemented without using any
atomic RMW.
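For illustration, a minimal sketch of such a queue (the classic
single-producer/single-consumer ring buffer; the names and the
fixed-capacity design are mine): all synchronization is plain
acquire/release loads and stores, with no atomic read-modify-write
anywhere.

#include <atomic>
#include <cstddef>

template<class T, std::size_t N>       // holds at most N-1 elements
struct spsc_queue {
    T buf[N];
    std::atomic<std::size_t> head{0};  // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail{0};  // next slot to write; written only by the producer

    bool push(const T& v) {            // producer thread only
        std::size_t t = tail.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire))
            return false;              // full
        buf[t] = v;
        tail.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    bool pop(T& out) {                 // consumer thread only
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;              // empty
        out = buf[h];
        head.store((h + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
};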

Sam

Aug 5, 2019, 8:14:22 PM
Now look what you've done. You confused him with facts. Shame on you.

Chris M. Thomasson

Aug 6, 2019, 2:44:08 AM
Damn. I thought they were trying to get something like a standard
getcontext/setcontext, or fibers.

leigh.v....@googlemail.com

Aug 6, 2019, 7:13:45 AM
How do threads help with high volume disk I/O, dear?

leigh.v....@googlemail.com

Aug 6, 2019, 7:23:54 AM
Of course by that I actually meant how do *lots* of threads help with high volume disk I/O, dear?

Bonita Montero

Aug 6, 2019, 7:58:33 AM
> One way to think about coroutines is like threads, but
> cooperatively multitasking instead of preemptive multitasking.

It's not really multitasking since the coroutine context can only
live as long as the calling function. With cooperative multitasking,
like with fibers on Windows, a fiber can end without stopping other
contexts.

David Brown

Aug 6, 2019, 9:06:59 AM
The explanation here calls them "stackless":

<https://en.cppreference.com/w/cpp/language/coroutines>

But really, they are only "stackless" in the sense that C++ does not
actually need a stack. They have a local state - you can have normal
local variables, and a normal call chain of normal functions. This will
typically end up on a heap somewhere, but is effectively a local stack
for the coroutine execution state. It is entirely possible, given
simple enough coroutines and smart enough compilers, for this all to be
boiled down to a single allocatable block.

David Brown

Aug 6, 2019, 9:19:07 AM
After reading some of Alf's posts in this thread, it looks like I was
wrong here and C++ coroutines will be more restricted than general
coroutines. If they can only yield from within the coroutine function
itself, not arbitrary external functions, then the compiler can figure
out the size of the local stack frame needed and generate it as a single
block, thus avoiding any variable sized stack.


A simple and lightweight version of this has been in use in embedded
systems in C for a long time. For those who think Duff's device is cool,
macros are a joy, and the non-structured nature of "switch" is a great
idea, have a look at this:

<https://en.wikipedia.org/wiki/Protothread>
<http://dunkels.com/adam/pt/>
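The trick in miniature (my own sketch, not the actual pt.h API): a
switch on a saved line number turns a function into a resumable state
machine.

// One CR_YIELD per source line; "locals" must live in the object,
// since the real stack frame dies between calls.
#define CR_BEGIN()   switch (state) { case 0:
#define CR_YIELD(v)  do { state = __LINE__; return (v); case __LINE__:; } while (0)
#define CR_END()     }

struct counter {
    int state = 0;
    int i = 0;

    int next() {
        CR_BEGIN();
        for (i = 0; i < 3; i++)
            CR_YIELD(i);   // the next call resumes right here
        CR_END();
        return -1;         // exhausted
    }
};

// counter c; c.next() yields 0, 1, 2, then -1.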


Manfred

Aug 6, 2019, 10:24:59 AM

Excellent explanation, thanks!

On 8/5/2019 1:11 AM, Alf P. Steinbach wrote:
> On 04.08.2019 11:37, Juha Nieminen wrote:
>> C++20 will introduce support for coroutines. To this day, for reasons
>> that are completely mysterious even to me, I have *absolutely no idea*
>> what they are. I just don't get it.
>>
>> I go to the Wikipedia page "Coroutine"... and it tells me nothing.
>
> [snip]
>
> I'll address the conceptual only.
>
> When I saw this posting earlier today I thought I'd whip up a concrete
> example, like I implemented coroutines in the mid 1990's. However, I
> discovered that I would then be late for a dinner, so I put it on hold.
>
> Coroutines are just cooperative multitasking, multitasking with a single
> thread of execution but multiple call stacks. When you CALL a coroutine
> you create a new instance, with its own stack. There is nothing like
> that in the standard library. When a coroutine TRANSFERs to another
> coroutine it's like a `longjmp`, except that `longjmp` was designed to
> jump up to an earlier point in a call chain, while a coroutine transfer
> jumps to a separate call chain.
>
> As I recall you're familiar with 16-bit Windows programming.
>
> In 16-bit Windows (Windows 3.x in the early 1990s) each program
> execution was a coroutine. When a program was launched, a stack area was
> allocated for it. That's a call of a coroutine. When the program called
> `GetMessage` or `Yield`, some other program instance would get a chance
> to continue to run (after waiting for /its/ call to `GetMessage` or
> `Yield` to return). That's a coroutine transfer.
>
> The 16-bit Windows program executions were /huge/, heavy coroutines.
>
> However, the main advantage of coroutines in ordinary programming is
> with coroutines as a very light-weight multitasking solution. Since
> there's only one thread of execution there are fewer synchronization
> issues. In particular one doesn't have to worry about whether one
> coroutine sees the memory changes effected by some other coroutine.
>
>
> Cheers & hht.,
>
> - Alf

Chris Vine

Aug 6, 2019, 5:19:33 PM
No, you have fallen victim to a classic misdirection play by Scott. The
claim made in response to a posting of mine, originally by someone by
the name of "Sam", was that native OS threads on Linux (but not
Windows) can deal equally well with the kind of i/o things that
coroutines are good at, and are easier to use. Both these propositions
are in my opinion wrong, but that is the argument as it stands.

High volume disk i/o is irrelevant to this issue, because coroutines
aren't well suited to high volume disk i/o, nor are poll/select/epoll.
Native threads on the other hand are reasonably well suited to this
workload, because operations on high volume disks tend to be cpu-bound
and not i/o-bound. And apart from anything else, block devices
representing hard disks are always signalled as ready (at any rate they
are with select and poll; I do not know about epoll), so poll and select
are useless with them.

The real answer to Scott is "how do coroutines help with high volume
disk i/o"? I agree with what seems to be his suggestion - that they
don't - so they are irrelevant to your posting about epoll.

Chris M. Thomasson

Aug 6, 2019, 6:20:03 PM
AIO should be fine; it's POSIX, but oh well:

http://man7.org/linux/man-pages/man7/aio.7.html

https://pubs.opengroup.org/onlinepubs/009695399/basedefs/aio.h.html

IOCP can be used with files, even memory mapped ones.
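For anyone who hasn't met it, a minimal aio_read sketch (error handling
trimmed; a real program would do useful work or call aio_suspend()
instead of spinning, and older glibc needs -lrt):

#include <aio.h>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("/etc/hostname", O_RDONLY);   // any readable file
    if (fd < 0) return 1;

    char buf[256];
    aiocb cb{};                                 // zero the control block
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) return 1;           // queue the request, return at once

    while (aio_error(&cb) == EINPROGRESS) {}    // (busy-)wait for completion
    ssize_t n = aio_return(&cb);                // harvest the result
    if (n > 0) std::fwrite(buf, 1, static_cast<std::size_t>(n), stdout);
    close(fd);
}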

Alf P. Steinbach

Aug 6, 2019, 6:52:42 PM
On 07.08.2019 00:38, Stefan Ram wrote:
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>> I'll address the conceptual only.
>
> Not wanting to contradict anything,
> just as an extension:
>
> A coroutine is a function that contains
> either a coroutine-return-statement or
> an await-expression or a yield-expression.

You mean, "A C++20 coroutine is...".

General implementations of coroutines do not have that or any
corresponding restriction.

Even if the restriction is a special case of coroutines, I think it can
be argued that "coroutine" is a misnomer for the C++20 thing. :-)
Because it implies much that isn't there.

---

By the way, I learned a bit more, by looking around at tutorials, and
I'm appalled at all the machinery it seems one must define for the
return type of even the simplest C++20 coroutine.

It's akin to the infamous 600+ line Microsoft OLE technology "Hello,
world!"...

Hopefully they'll get around to providing some default machinery before
it's accepted in the standard.
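To give a flavour of the machinery in question, a minimal generator
type, just barely enough to make co_yield work (a sketch using the
<coroutine> names that were eventually standardized; at the time of this
thread they still lived in std::experimental, and copying and exception
propagation are glossed over):

#include <coroutine>
#include <exception>
#include <utility>

template<class T>
struct generator {
    struct promise_type {
        T value;

        generator get_return_object() {
            return generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T v) {
            value = std::move(v);   // stash the yielded value for the caller
            return {};
        }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> h;

    explicit generator(std::coroutine_handle<promise_type> h) : h{h} {}
    generator(generator&& other) noexcept : h{std::exchange(other.h, {})} {}
    ~generator() { if (h) h.destroy(); }

    bool next() { h.resume(); return !h.done(); }  // false once the body finished
    const T& current() const { return h.promise().value; }
};

All of that so that a function body may contain a single co_yield.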


Cheers!,

- Alf

Chris Vine

Aug 6, 2019, 7:01:21 PM
On Tue, 6 Aug 2019 15:19:48 -0700
"Chris M. Thomasson" <invalid_chris_t...@invalid.com>
wrote: [snip]
Yes as I understand it AIO works for asynchronous i/o with block devices
representing hard disks, even though you can't use select/poll and I
think epoll for them. But AIO is not my area and I have to say that I
can't really understand the point of it with disks: the issue with such
block devices is that in polling terms they are always available to
read or write in the absence of an error. For reading in particular,
you don't do it for the fun of it: you do it to process the data you
obtain, which is likely to require use of the CPU. So why not just use
threads without AIO? You are not dealing with network latency.

I notice incidentally that the documentation for glibc's implementation
of AIO says: "This has a number of limitations, most notably that
maintaining multiple threads to perform I/O operations is expensive
and scales poorly". All the more reason to do that particular work
synchronously I would have thought. The alternative seems to be to use
signals.

Do you have any experience of AIO? (As I say, I don't.)

Scott Lurndal

Aug 6, 2019, 7:49:24 PM
Indeed, and that's what I used in the Oracle RDBMS OS-dependent code I wrote.
(Well, technically, lio_listio(), because we'd submit hundreds of
aiocbs with a single request - we had 128 physical disk drives on
64 SCSI controllers running Oracle Parallel Server on 64 tightly coupled
P6 processors running a distributed single-system-image micro-kernel
based operating system (written in C++, using cfront)).

Let the OS I/O subsystem do the scheduling, don't try to do it in user mode.

Note that modern network workloads that require high packet throughput
generally use DPDK, which can leverage any hardware acceleration or
offloading capabilities to improve I/O performance. This uses threads
for parallelism.

https://www.dpdk.org/

Responding to Sam's assertion regarding the TOP500: they, for the most
part, use the InfiniBand RDMA Verbs interface for communications between
nodes, not Ethernet. And it's all buried under OpenMP, PVM or other
standard communication library APIs integrated with C and C++.

Scott Lurndal

Aug 6, 2019, 7:52:56 PM
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:
>On Tue, 6 Aug 2019 15:19:48 -0700
>"Chris M. Thomasson" <invalid_chris_t...@invalid.com>
>wrote: [snip]
>> AIO should be fine, its POSIX but oh well:
>>
>> http://man7.org/linux/man-pages/man7/aio.7.html
>>
>> https://pubs.opengroup.org/onlinepubs/009695399/basedefs/aio.h.html

>I notice incidentally that the documentation for glibc's implementation
>of AIO says: "This has a number of limitations, most notably that
>maintaining multiple threads to perform I/O operations is expensive
>and scales poorly". All the more reason to do that particular work
>synchronously I would have thought. The alternative seems to be to use
>signals.
>
>Do you have any experience of AIO? (As I say, I don't.)

I used it on real Unix extensively (mainly lio_listio()). It's a good
way to let the operating system I/O scheduler handle the I/O in the
most efficient way, and it was quite efficient, particularly with
multiple disk drives on multiple controllers. Granted, the application
was the Oracle RDBMS, for which the I/O requirements are not typical
of most applications.

Öö Tiib

Aug 6, 2019, 7:57:15 PM
On Wednesday, 7 August 2019 02:01:21 UTC+3, Chris Vine wrote:
> But AIO is not my area and I have to say that I
> can't really understand the point of it with disks: the issue with such
> block devices is that in polling terms they are always available to
> read or write in the absence of an error. For reading in particular,
> you don't do it for the fun of it: you do it to process the data you
> obtain, which is likely to require use of the CPU. So why not just use
> threads without AIO?

It is better to use either async or blocking I/O, but not to mix them.
Translation between the two can be done with a thread. What
difference is there if such translation from blocking to async I/O
is made by glibc or by ourselves (other than less work for
ourselves when glibc does it)?

Scott Lurndal

Aug 7, 2019, 10:41:47 AM
Öö Tiib <oot...@hot.ee> writes:
>On Wednesday, 7 August 2019 02:01:21 UTC+3, Chris Vine wrote:
>> But AIO is not my area and I have to say that I
>> can't really understand the point of it with disks: the issue with such
>> block devices is that in polling terms they are always available to
>> read or write in the absence of an error. For reading in particular,
>> you don't do it for the fun of it: you do it to process the data you
>> obtain, which is likely to require use of the CPU. So why not just use
>> threads without AIO?
>
>It is better to use either async or blocking I/O, but not to mix them.

For the same file descriptor, or for the program? Why?

Use whatever is necessary to provide the required functionality
and performance.

>Translation between the two can be done with a thread. What
>difference is there if such translation from blocking to async I/O
>is made by glibc or by ourselves (other than less work for
>ourselves when glibc does it)?

What's wrong with using blocking I/O for some files and async I/O for
others? (or even both on the same file, for that matter).

Öö Tiib

Aug 8, 2019, 10:11:02 AM
On Wednesday, 7 August 2019 17:41:47 UTC+3, Scott Lurndal wrote:
> Öö Tiib <oot...@hot.ee> writes:
> >On Wednesday, 7 August 2019 02:01:21 UTC+3, Chris Vine wrote:
> >> But AIO is not my area and I have to say that I
> >> can't really understand the point of it with disks: the issue with such
> >> block devices is that in polling terms they are always available to
> >> read or write in the absence of an error. For reading in particular,
> >> you don't do it for the fun of it: you do it to process the data you
> >> obtain, which is likely to require use of the CPU. So why not just use
> >> threads without AIO?
> >
> >It is better to use either async or blocking I/O, but not to mix them.
>
> For the same file descriptor or for the program? why?

I meant for a single stream, regardless of whether it is a file or
something else.

> Use whatever is necessary to provide the required functionality
> and performance.

Exactly.
With tiny files whose size is measured in kilobytes, it is cheapest
and most robust to read the whole file and then process the data.
If that starts to affect performance, the issue is keeping data
in too many tiny files.
With bigger files, when either reading or processing is much more
expensive than the other, or both are cheap enough for what is
expected, a linear, blocking read-and-process cycle is a good option.
When it is a large file and both the cost of reading and the cost of
processing affect performance, it is better to keep input asynchronous
and so do the two things in parallel.

> >Translation between the two is possible to do with a thread. What
> >difference there is if such translation from blocking to async I/O
> >was made by glibc or ourselves (other than less work for
> >ourselves when glibc did it)?
>
> What's wrong with using blocking I/O for some files and async I/O for
> others? (or even both on the same file, for that matter).

I know of no example where it was optimal for the same file.
In my experience, attempts to mix the two for the same file have
resulted in something pointlessly fragile and complicated. If there
is a point in reading parts of a file asynchronously from the
processing, then switching between asynchronous and blocking mode has
a cost and adds complication. There has to be switching, since the
file descriptors I've seen are not meant to do both at the same time.
Pointlessly complex logic is then easier to break during maintenance.
So it is better to use blocking reads and to process the parts where
processing is expensive in separate threads instead.
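A sketch of that last pattern (the names and the chunking policy are
made up for illustration): plain blocking reads, with the next read
overlapped against processing of the previous chunk.

#include <cstddef>
#include <fstream>
#include <future>
#include <vector>

static void process(const std::vector<char>& chunk) {
    volatile std::size_t sum = 0;                  // stand-in for real work
    for (char c : chunk) sum = sum + static_cast<unsigned char>(c);
}

void read_and_process(const char* path) {
    std::ifstream in(path, std::ios::binary);
    constexpr std::size_t chunk_size = 1 << 20;

    auto read_chunk = [&in] {
        std::vector<char> buf(chunk_size);
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        buf.resize(static_cast<std::size_t>(in.gcount()));
        return buf;
    };

    std::vector<char> current = read_chunk();
    while (!current.empty()) {
        auto next = std::async(std::launch::async, read_chunk);  // overlap the next read
        process(current);                                        // ...with this chunk's work
        current = next.get();
    }
}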

Tim Rentsch

Aug 9, 2019, 9:12:18 PM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> On 07.08.2019 00:38, Stefan Ram wrote:
>
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>
>>> I'll address the conceptual only.
>>
>> Not wanting to contradict anything,
>> just as an extension:
>>
>> A coroutine is a function that contains
>> either a coroutine-return-statement or
>> an await-expression or a yield-expression.
>
> You mean, "A C++20 coroutine is...".
>
> General implementations of coroutines do not have that or
> corresponding restriction.
>
> Even if the restriction is a special case of coroutines, I think
> it can be argued that "coroutine" is a misnomer for the C++20
> thing. :-) [...]

Yes, and it's an easy argument to make. The C++ construct I have
seen described bears almost no resemblance to coroutines as they
were originally described (I first read about them in Knuth v. 1).
A C++20 "coroutine" is more like a detachable, resumable function,
not at all like the symmetric arrangement of mutually calling
functions described in Knuth.

To be fair, it's probably worth mentioning that in those days (and
especially in Knuth's TAOCP), call linkage was a lot more static,
and what we now take for granted in the way of automatic storage
was often nowhere to be seen in many program environments. Call
stack? Local variables? Reentrant functions? These elements are
pretty much not found in environments where coroutines were
applied, and vice versa. It's hard to unify these two kinds of
control structures. So it isn't surprising that "coroutine" takes
on a rather different form in C++, compared to what coroutines
were originally.

Tim Rentsch

Aug 9, 2019, 10:50:07 PM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

>> [...]
>
> Anyway, to avoid [a recursive coroutine] I'd probably do
> something like this:
>
>
> auto bftraverse( Node* root )
>     -> optional<Data>
> {
>     stack<Node*> nodes;
>     nodes.push( root );
>     while( not nodes.empty() ) {
>         const auto p = nodes.top();
>         nodes.pop();
>         if( p == nullptr ) {
>             co_return {};
>         }
>         co_yield p->data;
>         if( p->right ) { nodes.push( p->right ); }
>         if( p->left )  { nodes.push( p->left ); }
>     }
> }

The function body can be simplified thus:

    stack<Node*> nodes;
    nodes.push( root );
    while( not nodes.empty() ) {
        const auto p = nodes.top();
        nodes.pop();
        if( !p ) continue;
        co_yield p->data;
        nodes.push( p->right );
        nodes.push( p->left );
    }

Alf P. Steinbach

Aug 9, 2019, 11:14:05 PM
Thanks.

Don't know why I wrote that so complicated, with no null pointers pushed
except possibly the root. But I guess I was just transforming the
original code, which had those conditional pushes.


Cheers!,

- Alf

Christian Gollwitzer

Aug 10, 2019, 2:23:22 AM
On 05.08.19 at 23:09, Alf P. Steinbach wrote:
> On 05.08.2019 22:26, Christian Gollwitzer wrote:
>> Coroutine solution:
>>
>> void bftraverse(Tree* root) {
>>     co_yield(root->content);
>>     if (root->left) bftraverse(root->left);
>>     if (root->right) bftraverse(root->right);
>> }
>>
>> voilà.
>
> As I understand it this generates a new coroutine instance, with
> dynamically allocated state, for each call.

yes, of course.

> It needs to do that (modulo optimization) to represent the call stack
> for the recursive calls.
>
> Thus it's convenient, but possibly/probably inefficient.
>

>
> auto bftraverse( Node* root )
>     -> optional<Data>
> {
>     stack<Node*> nodes;
>     nodes.push( root );
>     while( not nodes.empty() ) {
>         const auto p = nodes.top();
>         nodes.pop();
>         if( p == nullptr ) {
>             co_return {};
>         }
>         co_yield p->data;
>         if( p->right ) { nodes.push( p->right ); }
>         if( p->left )  { nodes.push( p->left ); }
>     }
> }
>

This code now manages the stack manually and is half-way to the
transformation into the iterative version which you would need to write
*without* coroutines.

I'm not sure my message was received, so I'll try again. Consider: you
decide to go from breadth-first traversal to in-order traversal.

Coroutine:

void bftraverse(Tree* root) {
    if (root->left) [<-] bftraverse(root->left);
    co_yield(root->content);
    if (root->right) [<-] bftraverse(root->right);
}

The only thing that has changed is the order of the yield instructions.
(Plus I added the forgotten [<-] thing as shown in the TR paper, which
means the same as "yield from" in Python.)


How to transform your unfolded code? Let's replace:

>          co_yield p->data;
>          if( p->right ) { nodes.push( p->right ); }
>          if( p->left )  { nodes.push( p->left ); }

by

>          if( p->right ) { nodes.push( p->right ); }
>          co_yield p->data;
>          if( p->left )  { nodes.push( p->left ); }

and surprisingly, it is still doing breadth-first search. Transforming
this correctly means either a different stack management, or a special
algorithm invented by geniuses which walks along the tree leaves.

The point of coroutines is not to maximize efficiency, but to write code
*as if* it were the original simple, clear algorithm, but resulting in a
state machine. For some cases this can actually improve efficiency, e.g.
for asynchronous network code, because the manual async thing is so
complex that no one would do it.

The price to pay is actually not that high. Your stack uses one Node*
for the coroutine state. The compiler-generated version uses all local
variables (just Node*) plus an instruction pointer, so yes it is more,
but not that much. Gor Nishanov actually claimed in his talk that VC++
coroutines are efficient enough for asynchronous access to *main memory*.

https://www.youtube.com/watch?v=j9tlJAqMV7U

Christian

Chris M. Thomasson

Aug 10, 2019, 3:19:50 AM
On 8/5/2019 1:26 PM, Christian Gollwitzer wrote:
> On 05.08.19 at 10:42, Juha Nieminen wrote:
[...]
> Coroutine solution:
>
> void bftraverse(Tree* root) {
>    co_yield(root->content);
>    if (root->left) bftraverse(root->left);
>    if (root->right) bftraverse(root->right);
> }

Looks nice to me.

Alf P. Steinbach

Aug 10, 2019, 4:43:57 AM
I think you mean go from pre-order to in-order.


> Coroutine:
>
> void bftraverse(Tree* root) {
>      if (root->left) [<-] bftraverse(root->left);
>      co_yield(root->content);
>      if (root->right) [<-] bftraverse(root->right);
> }
>
> The only thing that has changed is the order of the yield instructions.
> (Plus I added the forgotten [<-] thing as shown in the TR paper, which
> means the same as "yield from" in Python.)

I haven't yet stumbled upon that notation, so I think it must be
something that was once proposed but has since been rejected?

Anyway, I'm now aware that for C++20 coroutines one needs a return type
that implements a lot of stuff. With Visual C++ that's apparently
provided by `std::future`. I don't know about the standard.


> How to transform your unfolded code? Let's replace:
>
> >          co_yield p->data;
> >          if( p->right ) { nodes.push( p->right ); }
> >          if( p->left )  { nodes.push( p->left ); }
>
> by
>
> >          if( p->right ) { nodes.push( p->right ); }
> >          co_yield p->data;
> >          if( p->left )  { nodes.push( p->left ); }
>
> and surprisingly, it is still doing breadth-first search. Transforming
> this correctly means either a different stack management, or a special
> algorithm invented by geniuses which walks along the tree leaves.

Iterative in-order isn't that hard.

After having output (or whatever) the left subtree of node N, one
outputs N.data and registers a sufficient subset of nodes in the right
subtree in the data structure, which can be a stack, so that the same
general process of popping out a node, displaying it, and registering
some more, can work.

What's necessary then is just that the first node popped out, and thus
output next, should be the leftmost leaf of the right subtree. To do
that it's necessary to register all the nodes down to that leaf. I.e., a
loop to register (push) the nodes down the chain to that leaf node.

void process_tree_node( const Node& node )
{
    cout << node.data;
}

void process_tree_nodes_inorder( const Type_<const Node*> p_root )
{
    stack<const Node*> nodes;
    const Node* p_current = p_root;
    for( ;; ) {
        while( p_current ) {
            nodes.push( p_current );
            p_current = p_current->left;
        }
        if( is_empty( nodes ) ) {
            break;
        }
        p_current = popped_top( nodes );
        process_tree_node( *p_current );
        p_current = p_current->right;
    }
}


> The point of coroutines is not to maximize efficiency, but to write code
> *as if* it were the original simple, clear algorithm, but resulting in a
> state machine. For some cases this can actually improve efficiency, e.g.
> for asynchronous network code, because the manual async thing is so
> complex that no one would do it.
>
> The price to pay is actually not that high. Your stack uses one Node*
> for the coroutine state. The compiler-generated version uses all local
> variables (just Node*) plus an instruction pointer, so yes it is more,
> but not that much. Gor Nishanov actually claimed in his talk that VC++
> coroutines are efficient enough for asynchronous access to *main memory*.
>
> https://www.youtube.com/watch?v=j9tlJAqMV7U


Cheers!,

- Alf

Bonita Montero

Aug 10, 2019, 4:50:13 AM
> From all the text and discussion out there I'm getting the picture that
> coroutines are a bit like lambda functions (ie. essentially stateful
> functors), with the difference that the coroutine can be "called" again
> in such a manner that it continues from where it "returned" last time,
> rather than always from the beginning (which would happen with lambdas).

The difference is that the coroutine can resume its work after
returning, and it maintains its state between returns.
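Concretely, with a generator type along the lines sketched earlier in
the thread, each resumption continues the body where the last co_yield
left off, locals intact, whereas calling a lambda always starts its
body from the top:

generator<int> counter() {
    for (int i = 0; ; ++i)
        co_yield i;        // suspends here; i survives until the next resume
}

// g.next(); g.current() == 0; g.next(); g.current() == 1; ...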

Christian Gollwitzer

Aug 11, 2019, 4:19:30 PM
On 10.08.19 at 10:43, Alf P. Steinbach wrote:
> On 10.08.2019 08:23, Christian Gollwitzer wrote:
>> and surprisingly, it is still doing breadth-first search. Transforming
>> this correctly means either a different stack management, or a special
>> algorithm invented by geniuses which walks along the tree leaves.
>
> Iterative in-order isn't that hard.
>

By the algorithm "invented by geniuses" for this particular case I was
referring to Morris traversal, which walks a tree in in-order sequence
with O(1) space and time. See e.g.
https://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion-and-without-stack/


It modifies the tree along the way, but restores it when you do the full
traversal. That would be the most efficient method, at the expense of
some head-scratching, and of the fact that you can't stop it in the
middle without ruining your data structure.
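For reference, the trick looks like this (a sketch with a hypothetical
Node; each in-order predecessor's null right pointer is temporarily
pointed back at the current node, then restored):

struct Node {
    int data;
    Node* left;
    Node* right;
};

template<class Visit>
void morris_inorder(Node* root, Visit visit) {
    Node* cur = root;
    while (cur) {
        if (!cur->left) {
            visit(cur);                      // no left subtree: emit, go right
            cur = cur->right;
        } else {
            Node* pred = cur->left;          // find the in-order predecessor
            while (pred->right && pred->right != cur)
                pred = pred->right;
            if (!pred->right) {
                pred->right = cur;           // thread: remember the way back
                cur = cur->left;
            } else {
                pred->right = nullptr;       // unthread: the tree is restored
                visit(cur);
                cur = cur->right;
            }
        }
    }
}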

Christian

Juha Nieminen

Aug 11, 2019, 5:16:25 PM
Christian Gollwitzer <auri...@gmx.de> wrote:
> Coroutine solution:
>
> void bftraverse(Tree* root) {
>     co_yield(root->content);
>     if (root->left) bftraverse(root->left);
>     if (root->right) bftraverse(root->right);
> }

Since, as far as I understand, C++20 coroutines are stackless, I don't
understand how that can work. What happens when a coroutine calls itself
recursively?

Does it create a new set of local variables with operator new every time
it calls itself recursively? That sounds like it would be horrendously
inefficient in all kinds of ways.

Tim Rentsch

Aug 14, 2019, 11:26:13 AM
Christian Gollwitzer <auri...@gmx.de> writes:

> On 10.08.19 at 10:43, Alf P. Steinbach wrote:
>
>> On 10.08.2019 08:23, Christian Gollwitzer wrote:
>>
>>> and surprisingly, it is still doing breadth-first
>>> search. Transforming this correctly means either a different stack
>>> management, or a special algorithm invented by geniuses which
>>> walks along the tree leaves.
>>
>> Iterative in-order isn't that hard.
>
> By the algorithm "invented by geniuses" for this particular case I was
> referring to Morris traversal, which walks a tree in in-order sequence
> with O(1) space and time. [...]

Of course you meant to say O(1) space and O(n) time.