[boost] [Fibers] Performance


Hartmut Kaiser

Jan 10, 2014, 8:43:49 PM1/10/14
to bo...@lists.boost.org
Oliver,

Do you have some performance data for your fiber implementation? What is the
(amortized) overhead introduced for one fiber (i.e. the average time
required to create, schedule, execute, and delete one fiber which runs an
empty function, when executing a larger number of those, perhaps 500,000
fibers)? It would be interesting to see this number when giving 1..N cores
to the scheduler.
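In code, the measurement meant here is roughly the following (a minimal
sketch; the header name and the boost::fibers API are assumptions about the
library under review):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <boost/fiber/all.hpp> // assumed header of the proposed library

void empty_fn() {}

int main() {
    std::size_t const n = 500000;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        boost::fibers::fiber f(empty_fn); // create + schedule
        f.join();                         // execute + delete
    }
    auto d = std::chrono::steady_clock::now() - start;
    std::cout << "amortized overhead per fiber: "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(d).count() / n
              << " ns\n";
    return 0;
}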

Thanks!
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu




Oliver Kowalke

Jan 11, 2014, 4:31:30 AM1/11/14
to boost
2014/1/11 Hartmut Kaiser <hartmut...@gmail.com>

> Oliver,
>
> Do you have some performance data for your fiber implementation? What is
> the
> (amortized) overhead introduced for one fiber (i.e. the average time
> required to create, schedule, execute, and delete one fiber which runs an
> empty function, when executing a larger number of those, perhaps 500,000
> fibers)? It would be interesting to see this number when giving 1..N cores
> to the scheduler.
>

Unfortunately I have no performance tests yet - maybe I'll write one after
some optimizations (such as replacing the
STL containers with a singly linked list of intrusive_ptr).

I'm not sure what a fiber should execute within such a test. Should the
fiber-function have an empty body
(i.e. execute nothing)? Or should it at least yield once?
If the code executed by the fiber does nothing, then the execution time will
be determined by the memory allocation
algorithm of the C library, the context switches for resuming and
suspending the fiber, and the time
required to insert and remove the fiber from the ready-queue inside the
fiber-scheduler.
This queue is currently an STL container and will be replaced by a
singly linked list of intrusive_ptrs.
A context switch (suspending/resuming a coroutine) needs ca. 80 CPU cycles
on an Intel Core2 Q6700 (64-bit Linux).
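For illustration, such a ready-queue could look like the sketch below (the
fiber_base type and its layout are invented for the example and are not the
library's actual internals):

#include <boost/intrusive_ptr.hpp>
#include <cstddef>

// Illustrative node type: the fiber itself carries the refcount and the
// 'next' link, so pushing/popping never allocates.
struct fiber_base {
    std::size_t use_count = 0;  // plain counter; fine for one scheduler thread
    fiber_base* next = nullptr; // intrusive hook for the ready-queue
    virtual ~fiber_base() {}
};

inline void intrusive_ptr_add_ref(fiber_base* p) { ++p->use_count; }
inline void intrusive_ptr_release(fiber_base* p) { if (--p->use_count == 0) delete p; }

// FIFO ready-queue: a singly linked list of intrusive_ptr, no STL container.
class ready_queue {
    fiber_base* head_ = nullptr;
    fiber_base* tail_ = nullptr;
public:
    void push(boost::intrusive_ptr<fiber_base> f) {
        fiber_base* p = f.detach(); // the queue now owns one reference
        p->next = nullptr;
        if (tail_) tail_->next = p; else head_ = p;
        tail_ = p;
    }
    boost::intrusive_ptr<fiber_base> pop() {
        fiber_base* p = head_;
        if (p) { head_ = p->next; if (!head_) tail_ = nullptr; }
        return boost::intrusive_ptr<fiber_base>(p, false); // adopt the reference
    }
    bool empty() const { return head_ == nullptr; }
};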

Andreas Schäfer

Jan 11, 2014, 2:11:40 PM1/11/14
to bo...@lists.boost.org
On 10:31 Sat 11 Jan, Oliver Kowalke wrote:
> Unfortunately I have no performance tests yet - maybe I'll write one after
> some optimizations (such as replacing the STL containers with a singly
> linked list of intrusive_ptr).

My suggestion: write the performance tests first. There's no better
tool to drive code optimization.

Best
-Andreas


--
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!

Hartmut Kaiser

Jan 11, 2014, 3:12:59 PM1/11/14
to bo...@lists.boost.org
Oliver,

> > Do you have some performance data for your fiber implementation? What
> > is the
> > (amortized) overhead introduced for one fiber (i.e. the average time
> > required to create, schedule, execute, and delete one fiber which runs
> > an empty function, when executing a larger number of those, perhaps
> > 500,000 fibers)? It would be interesting to see this number when
> > giving 1..N cores to the scheduler.
> >
>
> Unfortunately I have no performance tests yet - maybe I'll write one after
> some optimizations (such as replacing the STL containers with a singly
> linked list of intrusive_ptr).

I'd write the test before starting to do optimizations.

> I'm not sure what a fiber should execute within such a test. Should the
> fiber-function have an empty body (i.e. execute nothing)? Or should it at
> least yield once?

Well, those are two separate performance tests already :-P
However, having it yield just adds two more context switches and a
scheduling cycle, thus I'd expect not too much additional insight from this.

While you're at it, I'd suggest also writing a test measuring the overhead
of using futures.

For an idea of what such tests could look like, you might want to glance here:
https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance.

> If the code executed by the fiber does nothing, then the execution time
> will be determined by the memory allocation algorithm of the C library, the
> context switches for resuming and suspending the fiber, and the time
> required to insert and remove the fiber from the ready-queue inside the
> fiber-scheduler.

Those are assumptions which are by no means conclusive. From our
experience with HPX (https://github.com/STEllAR-GROUP/hpx), the overheads for
a fiber (which is a hpx::thread in our case) are determined by many more
factors than just the memory allocator. Things like contention caused by the
work stealing, or NUMA effects such as when you start stealing across NUMA
domains, usually overshadow the memory allocation costs. Additionally, the
quality of the scheduler implementation affects things gravely.

> This queue is currently an STL container and will be replaced by a singly
> linked list of intrusive_ptrs.

If you had a performance test you'd immediately see whether this improves
your performance. Doing optimizations based on gut feeling is most of the
time not very effective; you need measurements to support your work.

> A context switch (suspending/resuming a coroutine) needs ca. 80 CPU cycles
> on an Intel Core2 Q6700 (64-bit Linux).

Sure, but this does not tell you how much time is consumed by executing
those. The actual execution time will be determined by many factors, such as
caching effects, TLB misses, memory bandwidth limitations and other
contention effects.

IMHO, for this library to be accepted, it has to prove to be of high quality
which implies best possible performance. You might want to compare the
performance of your library with other existing solutions (for instance TBB,
qthreads, openmp, HPX). The link I provided above will give you a set of
trivial tests for those. Moreover, we'd be happy to add an equivalent test
for your library to our repository.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Hartmut Kaiser

Jan 13, 2014, 8:37:52 AM1/13/14
to bo...@lists.boost.org

<snip>

> IMHO, for this library to be accepted, it has to prove to be of high
> quality which implies best possible performance. You might want to compare
> the performance of your library with other existing solutions (for
> instance TBB, qthreads, openmp, HPX). The link I provided above will give
> you a set of trivial tests for those. Moreover, we'd be happy to add an
> equivalent test for your library to our repository.

Ping? Any news? I'd make my vote depend on the outcome of the performance
tests.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Oliver Kowalke

Jan 13, 2014, 11:43:42 AM1/13/14
to boost
2014/1/13 Hartmut Kaiser <hartmut...@gmail.com>

> > IMHO, for this library to be accepted, it has to prove to be of high
> > quality which implies best possible performance. You might want to
> compare
> > the performance of your library with other existing solutions (for
> > instance TBB, qthreads, openmp, HPX). The link I provided above will give
> > you a set of trivial tests for those. Moreover, we'd be happy to add an
> > equivalent test for your library to our repository.
>
> Ping? Any news? I'd make my vote depend on the outcome of the performance
> tests.
>

I'll add performance tests, but I doubt that I can finish implementing the
tests by Wednesday.
I have to look through the tests provided by the HPX project, select some,
and try to understand what they are doing.

Brian Wood

Jan 13, 2014, 5:33:50 PM1/13/14
to bo...@lists.boost.org
Oliver Kowalke wrote:
2014/1/13 Hartmut Kaiser <hartmut...@gmail.com>

>>
>> Ping? Any news? I'd make my vote depend on the outcome of the performance
>> tests.
>>
>
> I'll add performance tests, but I doubt that I can finish implementing the
> tests by Wednesday.
> I have to look through the tests provided by the HPX project, select some,
> and try to understand what they are doing.

Maybe a few weeks could be given to Oliver to do the testing.


--
Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net

Oliver Kowalke

Jan 13, 2014, 10:51:48 PM1/13/14
to boost
2014/1/11 Hartmut Kaiser <hartmut...@gmail.com>

> It would be interesting to see this number when giving 1..N cores
> to the scheduler.



> Things like contention caused by the
> work stealing, or NUMA effects such as when you start stealing across NUMA
> domains, usually overshadow the memory allocation costs. Additionally, the
> quality of the scheduler implementation affects things gravely.
>


> You might want to compare the
> performance of your library with other existing solutions (for instance
> TBB,
> qthreads, openmp, HPX). The link I provided above will give you a set of
> trivial tests for those. Moreover, we'd be happy to add an equivalent test
> for your library to our repository.


After re-reading I have the impression that there is a misunderstanding.
boost.fiber is a thin wrapper over coroutines (each fiber contains one
coroutine)
- the library schedules and synchronizes fibers (as requested on the
developer list
in 2013) in one thread.
The fibers in this lib are agnostic of threads - I've only added some
support so that the
classes (mutex, condition_variable) can be used in a multi-threaded
context.
Combining fibers with threads should be done in another, more sophisticated
library (at a higher level).

I believe you can't and shouldn't compare fibers with qthreads, TBB or
openmp.
I'll write a test measuring the overhead of a fiber running in one thread
(as already described above) first.

Antony Polukhin

Jan 14, 2014, 2:05:16 AM1/14/14
to boost@lists.boost.org List
2014/1/14 Oliver Kowalke <oliver....@gmail.com>
<...>

> I believe you can't and shouldn't compare fibers with qthreads, TBB or
> openmp.
> I'll write a test measuring the overhead of a fiber running in one thread
> (as already described above) first.
>

How about comparing fiber construction and joining with thread construction
and joining? This will help users decide whether it is beneficial to start
a new thread or a fiber.

A few ideas for tests:
* compare construction+join of a single thread and construction+join of a
single fiber (empty functors in both cases)
* compare construction+join of multiple threads and construction+join of
multiple fibers (empty functors in both cases)
* compare construction of a thread and construction of a fiber (empty
functors in both cases)

Pseudocode (fleshed out here so that it compiles; the boost::fibers names
are an assumption about the proposed API):

#include <chrono>
#include <iostream>
#include <boost/fiber/all.hpp> // assumed header of the proposed library
#include <boost/thread/thread.hpp>

void foo() {}
const unsigned N = 1000;

typedef std::chrono::steady_clock clock_type;

// milliseconds elapsed since t0
double ms_since(clock_type::time_point t0) {
    return std::chrono::duration<double, std::milli>(
        clock_type::now() - t0).count();
}

int main() {
    // Test #1: construct+join one at a time
    auto t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::fibers::fiber f(&foo);
        f.join();
    }
    std::cout << "Fibers: " << ms_since(t0) << " ms\n";

    t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::thread f(&foo);
        f.join();
    }
    std::cout << "Threads: " << ms_since(t0) << " ms\n";

    // Test #2: construct+join five at a time
    t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::fibers::fiber f1(&foo), f2(&foo), f3(&foo), f4(&foo), f5(&foo);
        f1.join(); f2.join(); f3.join(); f4.join(); f5.join();
    }
    std::cout << "Fibers: " << ms_since(t0) << " ms\n";

    t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::thread f1(&foo), f2(&foo), f3(&foo), f4(&foo), f5(&foo);
        f1.join(); f2.join(); f3.join(); f4.join(); f5.join();
    }
    std::cout << "Threads: " << ms_since(t0) << " ms\n";

    // Test #3: construction only (detach immediately)
    t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::fibers::fiber(&foo).detach();
    }
    std::cout << "Fibers: " << ms_since(t0) << " ms\n";

    t0 = clock_type::now();
    for (unsigned i = 0; i < N; ++i) {
        boost::thread(&foo).detach();
    }
    std::cout << "Threads: " << ms_since(t0) << " ms\n";

    return 0;
}

--
Best regards,
Antony Polukhin

Oliver Kowalke

Jan 14, 2014, 6:10:48 AM1/14/14
to boost
2014/1/14 Antony Polukhin <anto...@gmail.com>

>
> How about comparing fiber construction and joining with thread construction
> and joining? This will help users decide whether it is beneficial to start
> a new thread or a fiber.
>
> A few ideas for tests:
> * compare construction+join of a single thread and construction+join of a
> single fiber (empty functors in both cases)
>

== compares the construction overhead of a fiber vs. a thread


> * compare construction+join of multiple threads and construction+join of
> multiple fibers (empty functors in both cases)
> * compare construction of a thread and construction of a fiber (empty
> functors in both cases)
>

I believe this is not valid, because you compare the execution time of N
fibers running the test function (concurrently but not in parallel) in *one*
thread with the execution time of N threads (running in parallel), while
each single thread runs the test function once.

Fibers do *not* introduce parallelism, i.e. using fibers does not gain the
benefits of multi-core systems at first glance.

Of course you could combine threads and fibers, but this is not the focus of
boost.fiber; this should be done by another library.

Oliver Kowalke

Jan 14, 2014, 6:41:31 AM1/14/14
to boost
2014/1/14 Antony Polukhin <anto...@gmail.com>
I did a quick hack and the code using fibers is 2-3 times faster than the
threads.
boost.fiber does not yet contain the suggested optimizations (such as
replacing the STL containers).

Antony Polukhin

Jan 14, 2014, 8:13:08 AM1/14/14
to boost@lists.boost.org List
2014/1/14 Oliver Kowalke <oliver....@gmail.com>

> 2014/1/14 Antony Polukhin <anto...@gmail.com>
> > * compare construction+join of multiple threads and construction+join
> > of multiple fibers (empty functors in both cases)
> > * compare construction of a thread and construction of a fiber (empty
> > functors in both cases)
> >
>
> I believe this is not valid, because you compare the execution time of N
> fibers running the test function (concurrently but not in parallel) in
> *one* thread with the execution time of N threads (running in parallel),
> while each single thread runs the test function once.
>

Not exactly. The test function is *empty*, so you'll see the influence of an
*additional* fiber/thread (the change in overhead because of already spawned
fibers/threads).

In other words: threads require synchronization and OS context switches.
As the number of threads grows, those overheads may grow. Fibers must be free
from such effects, however they can be less CPU-cache friendly (in theory).

--
Best regards,
Antony Polukhin

Hartmut Kaiser

Jan 14, 2014, 9:03:13 AM1/14/14
to bo...@lists.boost.org
> > It would be interesting to see this number when giving 1..N cores to
> > the scheduler.
>
> > Things like contention caused by the
> > work stealing, or NUMA effects such as when you start stealing across
> > NUMA domains, usually overshadow the memory allocation costs.
> > Additionally, the quality of the scheduler implementation affects
> > things gravely.
>
> > You might want to compare the
> > performance of your library with other existing solutions (for
> > instance TBB, qthreads, openmp, HPX). The link I provided above will
> > give you a set of trivial tests for those. Moreover, we'd be happy to
> > add an equivalent test for your library to our repository.
>
> After re-reading I have the impression that there is a
> misunderstanding.

I hope not.

> boost.fiber is a thin wrapper over coroutines (each fiber contains one
> coroutine)
> - the library schedules and synchronizes fibers (as requested on the
> developer list in 2013) in one thread.
> The fibers in this lib are agnostic of threads - I've only added some
> support so that the classes (mutex, condition_variable) can be used in a
> multi-threaded context.
> Combining fibers with threads should be done in another, more
> sophisticated library (at a higher level).
>
> I believe you can't and shouldn't compare fibers with qthreads, TBB or
> openmp.
> I'll write a test measuring the overhead of a fiber running in one thread
> (as already described above) first.

I beg to disagree. Surely, you run fibers on top of OS-threads (in your case
using the coroutines mechanism). However, every fiber is semantically
indistinguishable from a std::thread (if implemented properly). It has a
dedicated function to execute, it represents a context of execution, you can
synchronize it with other fibers, etc. In fact, nothing in the C++ Standard
implies that a std::thread has to be implemented using OS (kernel) threads,
which is why we decided to name our lightweight tasks 'hpx::thread'; they
expose 100% of the mandated interface of std::thread.
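To make that point concrete, here is a small sketch: code written against
the std::thread interface compiles unchanged for any type exposing it
(boost::fibers::fiber is an assumption about the proposed API):

#include <thread>
// #include <boost/fiber/all.hpp> // assumed header of the proposed library

// Compiles for std::thread, hpx::thread, or a fiber type that mirrors
// the std::thread interface - the semantics are the same.
template <typename Thread>
void run_and_join() {
    Thread t([]() { /* dedicated function to execute */ });
    t.join();
}

// run_and_join<std::thread>();
// run_and_join<boost::fibers::fiber>(); // assumed fiber type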

If you run on several cores (OS-threads), you start executing your fibers
concurrently. AFAIU, your library is clearly designed for this, otherwise
you wouldn't have implemented special, fiber-oriented synchronization
primitives or work stealing capabilities.

To clarify, I'm not talking about measuring the performance of (kernel)
threads; rather, I would like you to give us performance data for
Boost.Fiber so we can understand what overheads are imposed by using
fibers in the first place.

To get more than quantitative numbers, which do not mean anything beyond a
single machine, I was suggesting running equivalent performance benchmarks
using other, similar libraries, such as TBB, openmp, HPX, etc., as this
would give a qualitative picture regardless of the machine the tests are
run on. And the libraries I listed clearly implement a semantically
equivalent idiom: lightweight parallelism (be it a task in TBB, a fiber in
Boost.Fiber, an hpx::thread, or a qthread, etc.).

Hope this clarifies what I had in mind.
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Hartmut Kaiser

Jan 14, 2014, 9:05:11 AM1/14/14
to bo...@lists.boost.org

> > How about comparing fiber construction and joining with thread
> > construction and joining? This will help users decide whether it is
> > beneficial to start a new thread or a fiber.
> >
> > A few ideas for tests:
> > * compare construction+join of a single thread and construction+join
> > of a single fiber (empty functors in both cases)
> >
>
> == compares the construction overhead of a fiber vs. a thread
>
>
> > * compare construction+join of multiple threads and
> > construction+join of multiple fibers (empty functors in both cases)
> > * compare construction of a thread and construction of a fiber (empty
> > functors in both cases)
> >
>
> I believe this is not valid, because you compare the execution time of N
> fibers running the test function (concurrently but not in parallel) in *one*
> thread with the execution time of N threads (running in parallel), while
> each single thread runs the test function once.
>
> Fibers do *not* introduce parallelism, i.e. using fibers does not gain the
> benefits of multi-core systems at first glance.
>
> Of course you could combine threads and fibers, but this is not the focus
> of boost.fiber; this should be done by another library.

If you constrain the std::threads to execute on one core you'd get comparable
results. OTOH, if you allow the fibers to run concurrently on more than one
core you'd get comparable results again. I fail to understand why this
shouldn't be viable.
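For the 'constrain to one core' half of such a comparison, a Linux-only
sketch using pthread_setaffinity_np (error handling omitted):

#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin a std::thread to a single CPU so kernel threads and fibers are
// measured under the same single-core conditions (Linux/glibc-specific).
void pin_to_cpu(std::thread& t, int cpu) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);
    pthread_setaffinity_np(t.native_handle(), sizeof(cpuset), &cpuset);
}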

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Hartmut Kaiser

Jan 14, 2014, 1:42:53 PM1/14/14
to bo...@lists.boost.org
I'd be disappointed if the overheads imposed by Boost.Fiber are only 2-3
times smaller than for kernel threads. I'd expect it to impose at least 10
times, if not 15-20 times less overheads than kernel threads (at least
that's the numbers we're seeing from HPX).

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Oliver Kowalke

Jan 14, 2014, 2:00:18 PM1/14/14
to boost
2014/1/14 Hartmut Kaiser <hartmut...@gmail.com>

> I'd be disappointed if the overheads imposed by Boost.Fiber are only 2-3
> times smaller than for kernel threads. I'd expect it to impose at least 10
> times, if not 15-20 times less overheads than kernel threads (at least
> that's the numbers we're seeing from HPX).
>

Yes - I'm disappointed too. But, as I already explained, I was focused on
boost.asio's async-result
and on providing a way to synchronize coroutines with an interface similar
to std::thread.

Tuning seems to need more attention from my side - first I have to
identify the bottlenecks.

Antony Polukhin

Jan 15, 2014, 5:44:38 AM1/15/14
to boost@lists.boost.org List
2014/1/14 Oliver Kowalke <oliver....@gmail.com>

> I did a quick hack and the code using fibers is 2-3 times faster than the
> threads.
> boost.fiber does not yet contain the suggested optimizations (such as
> replacing the STL containers).
>

Not bad! Did the tests run on Linux or Windows?

--
Best regards,
Antony Polukhin

Oliver Kowalke

Jan 15, 2014, 6:04:23 AM1/15/14
to boost
2014/1/15 Antony Polukhin <anto...@gmail.com>

> 2014/1/14 Oliver Kowalke <oliver....@gmail.com>
>
> > I did a quick hack and the code using fibers is 2-3 times faster than the
> > threads.
> > boost.fiber does not contain the suggested optimizations (like replacing
> > stl containers)
> >
>
> Not bad! Did the tests run on Linux or Windows?
>

I've tested it on Linux (32-bit/64-bit); Windows will follow this evening.
Because the code is not optimized (memory allocations in context/coroutine
and fiber) it will
not be competitive with qthreads, tbb, hpx yet.
But this was not the main aim - the lib tries to integrate with other Boost
libs and to support
new use-cases (for instance code written like its synchronous counterpart
but using asynchronous operations).

Niall Douglas

Jan 15, 2014, 6:36:06 AM1/15/14
to bo...@lists.boost.org
On 14 Jan 2014 at 12:42, Hartmut Kaiser wrote:

> > I did a quick hack and the code using fibers is 2-3 times faster than the
> > threads.
> > boost.fiber does not yet contain the suggested optimizations (such as
> > replacing the STL containers)
>
> I'd be disappointed if the overheads imposed by Boost.Fiber are only 2-3
> times smaller than for kernel threads. I'd expect it to impose at least 10
> times, if not 15-20 times less overheads than kernel threads (at least
> that's the numbers we're seeing from HPX).

Like most C++, Boost.Fiber probably makes many malloc calls per context
switch. It adds up. If I ever had a willing employer, I could get
clang to spit out far more malloc-optimal C++ at the cost of a new
ABI, but I never could get an employer to bite.

I think coming within 50% of the performance of Windows Fibers would
be more than plenty. After all, Boost.Fiber "does more" than Windows
Fibers.

Niall

--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/



Hartmut Kaiser

Jan 15, 2014, 8:18:58 AM1/15/14
to bo...@lists.boost.org

> > > I did a quick hack and the code using fibers is 2-3 times faster
> > > than the threads.
> > > boost.fiber does not yet contain the suggested optimizations (such
> > > as replacing the STL containers)
> >
> > I'd be disappointed if the overheads imposed by Boost.Fiber are only
> > 2-3 times smaller than for kernel threads. I'd expect it to impose at
> > least 10 times, if not 15-20 times less overheads than kernel threads
> > (at least that's the numbers we're seeing from HPX).
>
> Like most C++, Boost.Fiber probably makes many malloc calls per context
> switch. It adds up.

I don't think that things like a context switch require any memory
allocation. All you do is flush the registers, flip the stack pointer,
and load the registers from the new stack.
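The POSIX <ucontext.h> API makes this concrete: once the stacks exist, the
switch itself only saves and restores register state - a minimal sketch:

#include <cstdio>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;

static void fiber_fn() {
    std::puts("in fiber");
    // save our registers, restore main's - no allocation involved
    swapcontext(&fiber_ctx, &main_ctx);
}

int main() {
    static char stack[64 * 1024]; // the stack is allocated once, up front
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = sizeof(stack);
    fiber_ctx.uc_link = &main_ctx;
    makecontext(&fiber_ctx, fiber_fn, 0);
    swapcontext(&main_ctx, &fiber_ctx); // the context switch proper
    std::puts("back in main");
    return 0;
}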

> If I ever had a willing employer, I could get clang to
> spit out far more malloc optimal C++ at the cost of a new ABI, but I never
> could get an employer to bite.

Sorry for sidestepping, but are you sure compilers do memory allocation as
part of their way of conforming to ABIs? I was always assuming memory
allocation is done only when explicitly requested by user code.

> I think coming within 50% of the performance of Windows Fibers would be
> more than plenty. After all Boost.Fiber "does more" than Windows Fibers.

It might be sufficient for you but not for everybody else. It wouldn't be
sufficient for us, for instance. If you build systems relying on fine grain
parallelism, then efficiently implemented fibers are the only way to go. If
you need to create billions of threads (fibers), then every microsecond of
overhead counts billion-fold.

Oliver Kowalke

Jan 15, 2014, 8:25:47 AM1/15/14
to boost
2014/1/15 Hartmut Kaiser <hartmut...@gmail.com>

> > Like most C++, Boost.Fiber probably makes many malloc calls per context
> > switch. It adds up.
>
> I don't think that things like a context switch require any memory
> allocation. All you do is flush the registers, flip the stack pointer,
> and load the registers from the new stack.
>

The context switch itself does not require memory allocation, but the
function/functor
to be executed in the fiber must be stored (type-erased) inside the fiber.
The current implementation of fiber does allocate internally an object
holding the fiber-function.
I think it is possible to store the function/functor on top of the stack
used by the fiber and thus avoid
the need for a memory allocation to hold the function/functor.
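A sketch of that idea (purely illustrative - the names and layout are
invented here, and note that std::function may itself allocate for large
callables):

#include <cstdint>
#include <cstdlib>
#include <functional>
#include <new>
#include <utility>

struct fiber_stack {
    void* base;                // bottom of the raw allocation
    void* top;                 // usable stack top, just below the functor
    std::function<void()>* fn; // lives inside the stack allocation itself
};

// Carve space for the type-erased fiber-function out of the high end of the
// freshly allocated stack, so no separate heap allocation is needed.
fiber_stack make_fiber_stack(std::size_t size, std::function<void()> f) {
    void* base = std::malloc(size);
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(base) + size;
    p -= sizeof(std::function<void()>);
    p &= ~static_cast<std::uintptr_t>(alignof(std::function<void()>) - 1);
    void* slot = reinterpret_cast<void*>(p);
    std::function<void()>* fn = new (slot) std::function<void()>(std::move(f));
    return fiber_stack{ base, slot, fn }; // the context then uses [base, slot)
}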

Hartmut Kaiser

unread,
Jan 15, 2014, 8:26:27 AM1/15/14
to bo...@lists.boost.org

> >> Ping? Any news? I'd make my vote depend on the outcome of the
> >> performance tests.
> >>
> >
> > I'll add performance tests, but I doubt that I can finish implementing
> > the tests by Wednesday.
> > I have to look through the tests provided by the HPX project, select
> > some, and try to understand what they are doing.
>
> Maybe a few weeks could be given to Oliver to do the testing.

Sure, but I have to cast my vote today and not a couple of weeks down the
road...

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Niall Douglas

Jan 15, 2014, 9:19:24 AM1/15/14
to bo...@lists.boost.org
On 15 Jan 2014 at 7:18, Hartmut Kaiser wrote:

> > Like most C++, Boost.Fiber probably makes many malloc calls per context
> > switch. It adds up.
>
> I don't think that things like a context switch require any memory
> allocation. All you do is flush the registers, flip the stack pointer,
> and load the registers from the new stack.

A fiber implementation would also need to maintain work and sleep
queues. They're all STL containers at present.

> > I think coming within 50% of the performance of Windows Fibers would be
> > more than plenty. After all Boost.Fiber "does more" than Windows Fibers.
>
> It might be sufficient for you but not for everybody else. It wouldn't be
> sufficient for us, for instance. If you build systems relying on fine grain
> parallelism, then efficiently implemented fibers are the only way to go. If
> you need to create billions of threads (fibers), then every microsecond of
> overhead counts billion-fold.

Firstly I think you underestimate how quick Windows Fibers are - they
have been highly tuned to win SQL Server benchmarks. Secondly,
Boost.Fiber does a ton load more work than Windows Fibers, so no one
can reasonably expect it to be as quick.

> > If I ever had a willing employer, I could get clang to
> > spit out far more malloc optimal C++ at the cost of a new ABI, but I never
> > could get an employer to bite.
>
> Sorry for sidestepping, but are you sure compilers do memory allocation as
> part of their way of conforming to ABIs? I was always assuming memory
> allocation is done only when explicitly requested by user code.

This is very off topic for this mailing list. However, one of the
projects I proposed at BlackBerry before I was removed was to solve
the substantial Qt allocation overhead because of PIMPL by getting
clang to replace much use of individual operator new's for temporary
objects with a single alloca() at the base of the call stack. This
broke ABI because you need to generate an additional copy of every
constructor, one which uses the new purely stack based allocation
mechanism for temporary dynamic memory allocations (also, we'd need
to spit out additional metadata to help the link and LTCG layer
assemble the right code). Anyway the idea was deemed too weird to see
any business case, and then of course I was eliminated shortly
thereafter anyway. I should mention that this idea was one of mine
long before joining BlackBerry, and therefore nothing proprietary is
being leaked.

Hartmut Kaiser

Jan 15, 2014, 4:02:53 PM1/15/14
to bo...@lists.boost.org

> > > Like most C++, Boost.Fiber probably makes many malloc calls per
> > > context switch. It adds up.
> >
> > I don't think that things like a context switch require any memory
> > allocation. All you do is flush the registers, flip the stack
> > pointer, and load the registers from the new stack.
>
> A fiber implementation would also need to maintain work and sleep queues.
> They're all STL containers at present.

I can see that. You explicitly referred to the context switch, thus my
request for clarification.

> > > I think coming within 50% of the performance of Windows Fibers would
> > > be more than plenty. After all Boost.Fiber "does more" than Windows
> Fibers.
> >
> > It might be sufficient for you but not for everybody else. It wouldn't
> > be sufficient for us, for instance. If you build systems relying on
> > fine grain parallelism, then efficiently implemented fibers are the
> > only way to go. If you need to create billions of threads (fibers),
> > then every microsecond of overhead counts billion-fold.
>
> Firstly I think you underestimate how quick Windows Fibers are - they have
> been highly tuned to win SQL Server benchmarks. Secondly, Boost.Fiber does
> a ton load more work than Windows Fibers, so no one can reasonably expect
> it to be as quick.

Whatever the speed of Boost.Fiber, all I would like to see is a measure of
its imposed overheads, which would allow everybody to decide whether the
implementation performs sufficiently well for a particular use case. That's
what I was asking for in the very beginning. At the same time, our own
implementation in HPX (on the Windows platform) is using Windows Fibers for
our lightweight thread implementation, so I understand perfectly what
their imposed overheads are.

I also understand that Boost.Fiber does more than the Windows Fibers, which
are used just for the underlying context switch operation. Still, my main
incentive for voting YES in this review, and for considering using this
library as a replacement for HPX's thread implementation, would be superior
performance. This is even more true as I know (and have evidence) that it
is possible to come close to the Windows Fibers' performance for
lightweight threads exposing the same API as std::thread does (see HPX).

IMHO, Boost.Fiber is a library which - unlike other Boost libraries - has
not been developed as a prototype for a particular API (in which case I'd be
all for accepting subpar performance). It clearly has been developed to
provide a higher performing implementation for an existing API. That means
that if Oliver is not able to demonstrate superior performance over existing
implementations, I wouldn't see any point in having the library in Boost in
the first place.

> > > If I ever had a willing employer, I could get clang to spit out far
> > > more malloc optimal C++ at the cost of a new ABI, but I never could
> > > get an employer to bite.
> >
> > Sorry for sidestepping, but are you sure compilers do memory allocation
> > as part of their way of conforming to ABIs? I was always assuming memory
> > allocation is done only when explicitly requested by user code.
>
> This is very off topic for this mailing list. However, one of the projects
> I proposed at BlackBerry before I was removed was to solve the substantial
> Qt allocation overhead because of PIMPL by getting clang to replace much
> use of individual operator new's for temporary objects with a single
> alloca() at the base of the call stack. This broke ABI because you need to
> generate an additional copy of every constructor, one which uses the new
> purely stack based allocation mechanism for temporary dynamic memory
> allocations (also, we'd need to spit out additional metadata to help the
> link and LTCG layer assemble the right code). Anyway the idea was deemed
> too weird to see any business case, and then of course I was eliminated
> shortly thereafter anyway. I should mention that this idea was one of mine
> long before joining BlackBerry, and therefore nothing proprietary is being
> leaked.

Thanks for this explanation.

Oliver Kowalke

Jan 15, 2014, 4:13:30 PM1/15/14
to boost
2014/1/15 Hartmut Kaiser <hartmut...@gmail.com>

> IMHO, Boost.Fiber is a library which - unlike other Boost libraries - has
> not been developed as a prototype for a particular API (in which case I'd
> be
> all for accepting subpar performance). It clearly has been developed to
> provide a higher performing implementation for an existing API. That means
> that if Oliver is not able to demonstrate superior performance over
> existing
> implementations, I wouldn't see any point in having the library in Boost in
> the first place.
>

As I explained several times in this review - boost.fiber aims to provide a
way
to synchronize/coordinate coroutines as it was requested on the dev-list
some months ago.
-> boost.asio

Hartmut Kaiser

Jan 15, 2014, 4:31:00 PM1/15/14
to bo...@lists.boost.org

> > IMHO, Boost.Fiber is a library which - unlike other Boost libraries -
> > has not been developed as a prototype for a particular API (in which
> > case I'd be all for accepting subpar performance). It clearly has been
> > developed to provide a higher performing implementation for an
> > existing API. That means that if Oliver is not able to demonstrate
> > superior performance over existing implementations, I wouldn't see any
> > point in having the library in Boost in the first place.
>
> As I explained several times in this review - boost.fiber aims to provide
> a way to synchronize/coordinate coroutines as it was requested on the dev-
> list some months ago.
> -> boost.asio

In that case you might be surprised to learn that libraries often take on a
life of their own, which opens up unexpected opportunities way beyond
whatever you might have imagined. IMHO it is a mistake to constrain
Boost.Fiber to just what you said, as this is only a minor use case (as
convenient as it might be) for such a library.

Threading with minimal overheads supporting fine grain parallelism is the
future. Building convenient means for managing that parallelism is the
future. Application scalability, which today might be just a problem for
high end computing problems, gains footprint in everyday computing at an
exceptionally high rate. Two years from now, my desktop will support 288
concurrent threads (Intel Knight's Landing [1]). Massive multi-threading is
here to stay.

At the same time, application scalability is limited by the 4 horsemen of
the apocalypse [2]: Starvation, Latencies, Overheads, and Waiting for
contention resolution (SLOW). IOW, minimizing overheads is one of the
critical pieces of the puzzle. Libraries such as Boost.Fiber are critical to
solve the problems of insufficient scalability and parallel efficiency
achievable by using existing technologies only.

Wake up Oliver - you're at the forefront of parallel computing and you
don't realize it!

Boost has to define the future of C++ libraries (as imposed on us by the
computer architectures to come); it must not focus on covering the past.
Let's not let this opportunity slip.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu


[1] http://www.extremetech.com/extreme/171678-intel-unveils-72-core-x86-knights-landing-cpu-for-exascale-supercomputing
[2] http://stellar.cct.lsu.edu/2012/01/is-the-free-lunch-over-really/

Nat Goodspeed

Jan 15, 2014, 4:31:20 PM1/15/14
to bo...@lists.boost.org
On Wed, Jan 15, 2014 at 4:02 PM, Hartmut Kaiser
<hartmut...@gmail.com> wrote:

> IMHO, Boost.Fiber is a library which - unlike other Boost libraries - has
> not been developed as a prototype for a particular API (in which case I'd be
> all for accepting subpar performance). It clearly has been developed to
> provide a higher performing implementation for an existing API. That means
> that if Oliver is not able to demonstrate superior performance over existing
> implementations, I wouldn't see any point in having the library in Boost in
> the first place.

Strongly disagree with your assumption. To me it's the semantics of
Boost.Fiber that matter.

Before launching any code on a new thread, both Boost.Thread and
std::thread require that you sanitize that code against potential
race conditions. With a large, ancient code base, that sanitizing
effort becomes almost prohibitive. Running one thread with multiple
fibers is guaranteed to introduce no new race conditions.

Emulating the std::thread API is intended to minimize coder confusion
-- not to provide a drop-in replacement.

Nat Goodspeed

Jan 15, 2014, 4:34:55 PM1/15/14
to bo...@lists.boost.org
On Wed, Jan 15, 2014 at 4:31 PM, Hartmut Kaiser
<hartmut...@gmail.com> wrote:

> Wake up Oliver - you're at the forefront of parallel computing and you
> don't realize it!
>
> Boost has to define the future of C++ libraries (as imposed on us by the
> computer architectures to come), it has not to focus on covering the past.
> Let's not let this opportunity slip.

I can absolutely agree with this. My divergence is with your previous
remark that Boost.Fiber can *only* be justified by its performance.

Boost.Fiber has *potential future* performance benefits. It has
*present* semantic benefits.

Hartmut Kaiser

Jan 15, 2014, 6:03:13 PM1/15/14
to bo...@lists.boost.org

> > IMHO, Boost.Fiber is a library which - unlike other Boost libraries -
> > has not been developed as a prototype for a particular API (in which
> > case I'd be all for accepting subpar performance). It clearly has been
> > developed to provide a higher performing implementation for an
> > existing API. That means that if Oliver is not able to demonstrate
> > superior performance over existing implementations, I wouldn't see any
> > point in having the library in Boost in the first place.
>
> Strongly disagree with your assumption. To me it's the semantics of
> Boost.Fiber that matter.

The semantics are well defined by the C++11 Standard, no news here.

> Before launching any code on a new thread, both Boost.Thread and
> std::thread require that you sanitize that code against potential
> race conditions. With a large, ancient code base, that sanitizing effort
> becomes almost prohibitive. Running one thread with multiple fibers is
> guaranteed to introduce no new race conditions.

If that's the case, then why does Boost.Fiber provide synchronization
primitives to synchronize between fibers, and support for work stealing? Or
why is it implemented using atomics all over the place? Your statement does
not make sense to me, sorry.

> Emulating the std::thread API is intended to minimize coder confusion
> -- not to provide a drop-in replacement.

That's only one particular use case. Clearly we're talking about two
different viewpoints where mine focusses on a very broad application for
this kind of library, while yours is trying to limit it to a small set of
use cases. While I accept the validity of these use cases I think it's not
worth having a full Boost library just for those.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Nat Goodspeed

Jan 15, 2014, 6:21:54 PM1/15/14
to bo...@lists.boost.org
On Wed, Jan 15, 2014 at 6:03 PM, Hartmut Kaiser
<hartmut...@gmail.com> wrote:

>> Running one thread with multiple fibers is
>> guaranteed to introduce no new race conditions.

> If that's the case, then why does Boost.Fiber provide synchronization
> primitives to synchronize between fibers, and support for work stealing? Or
> why is it implemented using atomics all over the place? Your statement does
> not make sense to me, sorry.

Because the library is more general than either of our individual use
cases. It addresses the scenario in which you need to coordinate a
fiber in one thread with a fiber (perhaps the one and only fiber)
running in another thread.

>> Emulating the std::thread API is intended to minimize coder confusion
>> -- not to provide a drop-in replacement.

> That's only one particular use case. Clearly we're talking about two
> different viewpoints where mine focusses on a very broad application for
> this kind of library, while yours is trying to limit it to a small set of
> use cases. While I accept the validity of these use cases I think it's not
> worth having a full Boost library just for those.

Hmm! With respect, it sounds to me as though you're saying: my use
case is important, yours is not.

You're saying that for your use case, performance is critical, and
performance would be the only reason you would choose Boost.Fiber over
other libraries available to you. I can respect that.

I'm saying that for my use case, performance is not critical, and
there is nothing in the standard library or presently in Boost that
addresses my semantic requirements. That sounds like reason enough to
have a Boost library.

Hartmut Kaiser

Jan 15, 2014, 6:29:08 PM1/15/14
to bo...@lists.boost.org
> > That's only one particular use case. Clearly we're talking about two
> > different viewpoints where mine focusses on a very broad application
> > for this kind of library, while yours is trying to limit it to a small
> > set of use cases. While I accept the validity of these use cases I
> > think it's not worth having a full Boost library just for those.
>
> Hmm! With respect, it sounds to me as though you're saying: my use case is
> important, yours is not.

Please don't start splitting hairs. I said 'While I accept the validity of
these use cases' after all.

> You're saying that for your use case, performance is critical, and
> performance would be the only reason you would choose Boost.Fiber over
> other libraries available to you. I can respect that.
>
> I'm saying that for my use case, performance is not critical, and there is
> nothing in the standard library or presently in Boost that addresses my
> semantic requirements. That sounds like reason enough to have a Boost
> library.

Fine. So you vote YES and I vote NO. What's the problem?

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Nat Goodspeed

Jan 15, 2014, 6:34:01 PM1/15/14
to bo...@lists.boost.org
On Wed, Jan 15, 2014 at 6:29 PM, Hartmut Kaiser
<hartmut...@gmail.com> wrote:

> Please don't start splitting hairs. I said 'While I accept the validity of
> these use cases' after all.

I apologize. I do not wish to be irritating.

james

Jan 16, 2014, 1:39:55 AM1/16/14
to bo...@lists.boost.org, Hartmut Kaiser
On 15/01/2014 23:29, Hartmut Kaiser wrote:
> Fine. So you vote YES and I vote NO. What's the problem?
It depends on whether the vote is just selfish, or reflects that:
- the presence of an additional not-quite-what-you-need doesn't
negatively impact you
- it is valuable for others and technically reasonable

I would hope that votes are not entirely selfish.

Under the circumstances I would think you could abstain but
unless there is some reason for its presence to cause a problem,
why would you Nack?

Oliver Kowalke

Jan 16, 2014, 2:26:07 AM1/16/14
to boost
2014/1/15 Hartmut Kaiser <hartmut...@gmail.com>

> Wake up Oliver - you're at the forefront of parallel computing and you
> don't realize it!
>

Why do you allege that?

For me, the semantics and usability of boost.fiber are more important than
performance.
Performance tuning is mostly an issue of implementation details and can be
done
after the API is stable and proven.
But this might not apply to all developers - you are free to choose the
tool of your preference.

Hartmut Kaiser

Jan 16, 2014, 7:19:16 AM1/16/14
to bo...@lists.boost.org
> > Wake up Oliver - you're at the forefront of parallel computing and
> > you don't realize it!
>
> Why do you allege that?

I was hoping to raise your awareness that you're up to something much bigger
than 'just' fibers. I didn't mean to offend and I apologize if I did.

> For me, the semantics and usability of boost.fiber are more important than
> performance.
> Performance tuning is mostly an issue of implementation details and can be
> done after the API is stable and proven.

I still don't get it. There is no API stability question. The API has been
well defined for over two years now in the C++11 Standard (and even longer
in Boost.Thread). So performance is the main incentive for such a library
(what else could there be?). If you don't need the extra performance - use
std::thread.

Boost.Fiber does not add any new semantics beyond what the Standard
mandates. Instead, it adds more constraints to the context where the API can
be used (somebody mentioned interaction with Asio, and single-threaded
legacy applications) - thus it narrows down existing semantics.

> But this might not apply to all developers - you are free to choose the
> tool of your preference.

Sure, that's beyond question. My concern is that we're about to add a Boost
library targeting some minor use cases only, while it has the potential to
change the way we do parallel computing.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Hartmut Kaiser

Jan 16, 2014, 7:26:59 AM1/16/14
to bo...@lists.boost.org

> On 15/01/2014 23:29, Hartmut Kaiser wrote:
> > Fine. So you vote YES and I vote NO. What's the problem?
> It depends on whether the vote is just selfish, or reflects that:
> - the presence of an additional not-quite-what-you-need doesn't
> negatively impact you
> - it is valuable for others and technically reasonable
>
> I would hope that votes are not entirely selfish.
>
> Under the circumstances I would think you could abstain but unless there
> is some reason for its presence to cause a problem, why would you Nack?

With all due respect, I have been contributing to Boost for over 10 years
now, and I have 4 major libraries in Boost I'm authoring/contributing to. I
have managed numerous Boost reviews in the past. I'm a member of the Boost
steering committee, and I'm one of the main organizers of BoostCon and
C++Now. I have invested more time into Boost than you can even start to
imagine. Nobody has so far managed to allege I could be selfish with regard
to Boost. I'm very much inclined to ask you to rethink what you wrote.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Oliver Kowalke

Jan 16, 2014, 7:35:07 AM1/16/14
to boost
2014/1/16 Hartmut Kaiser <hartmut...@gmail.com>

> I still don't get it. There is no API stability question. The API has been
> well defined for over two years now in the C++11 Standard (and even longer
> in Boost.Thread).


I could have chosen a different API for fibers - but I think developers
are
more familiar with the std::thread/boost::thread API.


> So performance is the main incentive for such a library (what
> else could there be?).


With fibers you can suspend your execution context while keeping the thread
running (it might execute something else). This is not possible with
threads: if they are suspended (yield(), waiting on a
mutex/condition_variable), the thread blocks too.

This feature of fibers enables you to write (again asio, even if you don't
care about this
use case):

for (;;) {
    ...
    boost::asio::async_read( socket_, buffer, yield[ec]);
    ...
}

async_read() suspends the current execution context (not the thread
itself) and
resumes it when all data have been read. Without fibers you can't write
code like
the above (the for-loop, for instance).
Within the thread itself you can have more than one fiber running such
for-loops.

With plain threads you would have to pass a callback to async_read() and
you could not
invoke it inside a for-loop.

The example directory of boost.fiber contains several asio examples
demonstrating
this feature.
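A close analogue that compiles against stock Boost.Asio uses its stackful
coroutines (boost::asio::spawn / yield_context, built on Boost.Coroutine);
the snippet above assumes the fiber-aware yield token from the library's
examples, so take this only as an approximation of the same shape:

#include <array>
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>

// Echo loop written like synchronous code: async_read_some suspends only
// this coroutine; the thread keeps running other handlers meanwhile.
void echo(boost::asio::ip::tcp::socket& socket, boost::asio::yield_context yield) {
    std::array<char, 1024> buf;
    boost::system::error_code ec;
    for (;;) {
        std::size_t n = socket.async_read_some(boost::asio::buffer(buf), yield[ec]);
        if (ec) break;
        boost::asio::async_write(socket, boost::asio::buffer(buf.data(), n), yield[ec]);
        if (ec) break;
    }
}

// started with:
// boost::asio::spawn(io_service,
//     [&](boost::asio::yield_context yield) { echo(socket, yield); });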


> If you don't need the extra performance - use
> std::thread.
>

I could have chosen a different API.


> Boost.Fiber does not add any new semantics beyond what the Standard
> mandates.


It adds suspend/resume of an execution context while the hosting thread is
not suspended.


> Instead, it adds more constraints to the context where the API can
> be used (somebody mentioned interaction with Asio, and single-threaded
> legacy applications) - thus it narrows down existing semantics.
>

I think this statement is false.


> Sure, that's beyond question. My concern is that we're about to add a Boost
> library targeting some minor use cases only, while it has the potential to
> change the way we do parallel computing.
>

Be assured that performance is on my agenda after this discussion!
I've already started to write code for performance measurements.

Giovanni Piero Deretta

Jan 16, 2014, 7:51:48 AM1/16/14
to bo...@lists.boost.org
[sorry for joining the discussion so late]

On Thu, Jan 16, 2014 at 12:35 PM, Oliver Kowalke
<oliver....@gmail.com> wrote:
I think Hartmut's point is that you can very well use threads for the
same thing. In this particular case you would just perform a synchronous
read. Yes, to maintain the same level of concurrency you need to spawn tens
of thousands of threads, but that's feasible on a modern OS/hardware pair.
The point of using fibers (i.e. M:N threading) is almost purely
performance.

-- gpd

Dean Michael Berris

Jan 16, 2014, 2:29:29 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 6:26 PM, Oliver Kowalke
<oliver....@gmail.com> wrote:
> 2014/1/15 Hartmut Kaiser <hartmut...@gmail.com>
>
>> Wake up Oliver - you're at the forefront of parallel computing and you
>> don't realize it!
>>
>
> Why do you allege that?
>
> For me, the semantics and usability of boost.fiber are more important than
> performance.
> Performance tuning is mostly an issue of implementation details and can be
> done
> after the API is stable and proven.
> But this might not apply to all developers - you are free to choose the
> tool of your preference.
>

I realize I'm jumping in on a conversation that I'm not involved in
but as mostly a user of Boost libraries, I for one would encourage you
to think about the users and their needs more than what you the
developer think is important. This is one critical piece of feedback
you're getting, and I seriously hope you consider prioritizing this in
your continued development of Boost.Fiber.

Oliver Kowalke

Jan 16, 2014, 8:11:50 AM1/16/14
to boost
2014/1/16 Dean Michael Berris <dbe...@google.com>

> I realize I'm jumping in on a conversation that I'm not involved in
> but as mostly a user of Boost libraries, I for one would encourage you
> to think about the users and their needs more than what you the
> developer think is important. This is one critical piece of feedback
> you're getting, and I seriously hope you consider prioritizing this in
> your continued development of Boost.Fiber.
>

Did I say that I'll ignore Hartmut's concerns?
For Hartmut only speed matters, and I've agreed that I'll address
this issue.

Hartmut Kaiser

Jan 16, 2014, 8:22:28 AM1/16/14
to bo...@lists.boost.org

> > I still don't get it. There is no API stability question. The API has
> > been well defined for over two years now in the C++11 Standard (and
> > even longer in Boost.Thread).
>
> I could have chosen a different API for fibers - but I think developers
> are more familiar with the std::thread/boost::thread API.

But you have not (and for a good reason!). So this argument is moot.

> > So performance is the main incentive for such a library (what else
> > could there be?).
>
> With fibers you can suspend your execution context while keeping the
> thread running (it might execute something else). This is not possible
> with threads: if they are suspended (yield(), waiting on a
> mutex/condition_variable), the thread blocks too.
>
> This feature of fibers enables you to write (again asio, even if you
> don't care about this use case):
>
> for (;;) {
>     ...
>     boost::asio::async_read( socket_, buffer, yield[ec]);
>     ...
> }
>
> async_read() suspends the current execution context (not the thread
> itself) and
> resumes it when all data have been read. Without fibers you can't write
> code like the above (the for-loop, for instance).
> Within the thread itself you can have more than one fiber running such
> for-loops.
>
> With plain threads you would have to pass a callback to async_read() and
> you could not invoke it inside a for-loop.
>
> The example directory of boost.fiber contains several asio examples
> demonstrating this feature.

The only benefit you're getting from using fibers for this (and you could
achieve the same semantics using plain ol' threads as well; Boost.Asio has
been doing it for years, after all) is - now guess - performance. So please
make up your mind. Are you trying to improve performance or what?

> > If you don't need the extra performance - use std::thread.
> >
>
> I could have chosen a different API.

As said, you didn't. So this is moot.

> > Boost.Fiber does not add any new semantics beyond what the Standard
> > mandates.
>
> It adds suspend/resume of an execution context while the hosting thread
> is not suspended.

Who cares about this if not for performance reasons? However, I still
believe that fibers _are_ threads. You can't do anything with them that you
couldn't do with std::thread directly (semantically).

> > Instead, it adds more constraints to the context where the API can be
> > used (somebody mentioned interaction with Asio, and single-threaded
> > legacy applications) - thus it narrows down existing semantics.
>
> I think this statement is false.

Care to elaborate? Why is this false? You're implementing the Standard's API
with the Standard's semantics and constraining the usability of the outcome
to a couple of minor use cases. Sorry, still no new semantics.

> > Sure, that's out of question. My concern is that we're about to add a
> > Boost library targeting some minor use cases only, while it has the
> > potential to change the way we do parallel computing.
>
> Be assured that performance is on my agenda after this discussion!
> I've already started to write code for performance measurements.

Performance is never an afterthought. Your implementation quality so far is
not up to Boost standards (as others have pointed out), and the performance
of the library is not convincing either. I'd suggest you withdraw your
submission at this point, rework the library, and try for another review once
that's achieved. We have more substandard code in Boost already than
necessary because of this 'let's fix it later' attitude. This 'later' never
happens, most of the time - sadly.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Daniel James

Jan 16, 2014, 8:27:14 AM1/16/14
to bo...@lists.boost.org
On 16 January 2014 07:29, Dean Michael Berris <dbe...@google.com> wrote:
>
> I realize I'm jumping in on a conversation that I'm not involved in
> but as mostly a user of Boost libraries, I for one would encourage you
> to think about the users and their needs more than what you the
> developer think is important. This is one critical piece of feedback
> you're getting, and I seriously hope you consider prioritizing this in
> your continued development of Boost.Fiber.

It isn't really user feedback, but feedback from a developer of
similar functionality, which is something very different.
Pedantically, the developer could also be a user of the library, but
their main point of view is as a developer of such functionality, and
their opinions are influenced by that. If they've put a lot of effort
into something, then it's likely that they will overvalue it. Feedback
from other developers is of course extremely useful, but the
difference should be appreciated.

Nat Goodspeed

Jan 16, 2014, 8:27:42 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 7:51 AM, Giovanni Piero Deretta
<gpde...@gmail.com> wrote:

> I think Hartmut's point is that you can very well use threads for the
> same thing. ...
> The point of using fibers (i.e. M:N threading) is almost purely
> performance.

Again, for a large class of use cases, fibers and threads are not the same.

Writing thread-safe code remains something of an art, a specialty
within the already-rarefied realm of good C++ coding. With care, code
review and testing, it is of course possible to produce good
thread-safe code when you are writing it from scratch.

But retrofitting existing single-threaded code to be thread-safe can
be extremely costly. At this moment in history, we have a very large
volume of existing code whose developers (perhaps unconsciously)
relied on having exclusive access to certain in-process resources.
Some of us do not have the option to discard it and rewrite from
scratch.

Yes, this is a subset of the possible use cases of the Fiber library.
It is an important subset because threads provide no equivalent.

Yes, I also want a Boost library that will concurrently process very
large numbers of tasks, with each of a number of threads running very
many fibers. I think the Fiber library gives us a foundation on which
to build that support. But even with its present feature set, with
Oliver responding to the community, it has great value. I feel
frustrated when people dismiss the very real benefit of cooperative
context switching as irrelevant to them.

Hartmut Kaiser

unread,
Jan 16, 2014, 8:28:10 AM1/16/14
to bo...@lists.boost.org

> > I realize I'm jumping in on a conversation that I'm not involved in
> > but as mostly a user of Boost libraries, I for one would encourage you
> > to think about the users and their needs more than what you the
> > developer think is important. This is one critical piece of feedback
> > you're getting, and I seriously hope you consider prioritizing this in
> > your continued development of Boost.Fiber.

Thanks Michael!

> Did I say that I'll ignore Hartmut's concerns?
> For Hartmut only speed matters, and I've agreed that I'll address this
> issue.

I've never said that it's only speed that matters. I said that, because the
API is set, performance is the only criterion that could be used to decide
whether your library is a worthy addition to Boost (besides implementation
quality, which is sub-standard, as others have pointed out).

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Oliver Kowalke

unread,
Jan 16, 2014, 8:33:09 AM1/16/14
to boost
2014/1/16 Giovanni Piero Deretta <gpde...@gmail.com>

> I think that Hartmut's point is that you can very well use threads for the
> same thing. In this particular case you would just perform a synchronous
> read. Yes, to maintain the same level of concurrency you need to spawn tens
> of thousands of threads, but that's feasible on a modern os/hardware pair.
> The point of using fibers (i.e. M:N threading) is almost purely
> performance.
>

In the context of the C10K problem and using the one-thread-per-client
pattern, I doubt that this would scale (even on modern hardware). Do you
have some data showing the performance of a modern operating system and
hardware with increasing thread count?

Hartmut Kaiser

unread,
Jan 16, 2014, 8:41:58 AM1/16/14
to bo...@lists.boost.org

> > I think that Hartmut's point is that you can very well use threads for
> > the same thing. ...
> > The point of using fibers (i.e. M:N threading) is almost purely
> > performance.
>
> Again, for a large class of use cases, fibers and threads are not the
> same.
>
> Writing thread-safe code remains something of an art, a specialty within
> the already-rarefied realm of good C++ coding. With care, code review and
> testing, it is of course possible to produce good thread-safe code when
> you are writing it from scratch.
>
> But retrofitting existing single-threaded code to be thread-safe can be
> extremely costly. At this moment in history, we have a very large volume
> of existing code whose developers (perhaps unconsciously) relied on having
> exclusive access to certain in-process resources.
> Some of us do not have the option to discard it and rewrite from scratch.
>
> Yes, this is a subset of the possible use cases of the Fiber library.
> It is an important subset because threads provide no equivalent.

If the main target of Boost.Fiber is this use case (support
'multi-threading' in single threaded applications), then the way it's
implemented does not make sense to me. Why would you need a single atomic if
all you have is a single thread? And the source code has atomics all over
the place - thus I gather this use case was not what Oliver had in mind.

> Yes, I also want a Boost library that will concurrently process very large
> numbers of tasks, with each of a number of threads running very many
> fibers. I think the Fiber library gives us a foundation on which to build
> that support. But even with its present feature set, with Oliver
> responding to the community, it has great value. I feel frustrated when
> people dismiss the very real benefit of cooperative context switching as
> irrelevant to them.

Why accept a library which is over-engineered for the advertised use case
(see above) and not (yet) fit for the broader one?

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Hartmut Kaiser

unread,
Jan 16, 2014, 8:42:05 AM1/16/14
to bo...@lists.boost.org

> > I realize I'm jumping in on a conversation that I'm not involved in
> > but as mostly a user of Boost libraries, I for one would encourage you
> > to think about the users and their needs more than what you the
> > developer think is important. This is one critical piece of feedback
> > you're getting, and I seriously hope you consider prioritizing this in
> > your continued development of Boost.Fiber.
>
> It isn't really user feedback, but feedback from a developer of similar
> functionality, which is something very different.

Yes, it guarantees that the viewpoint expressed by that developer can be
assumed to be well-educated, as that developer understands the issues
perfectly, probably much better than any user could.

> Pedantically, the developer could also be a user of the library, but their
> main point of view is as a developer of such functionality, and their
> opinions are influenced by that. If they've put a lot of effort into
> something, then it's likely that they will overvalue it. Feedback from
> other developers is of course extremely useful, but the difference should
> be appreciated.

Why is it that there is again this 'selfishness' being silently alleged?
Do you imply that just because I claim to have a better understanding of the
applicability of a particular idiom/library than many others (because I've
been working in the field for years), my opinion is too biased to be
considered useful?

However, I have to admit that I very much would like to replace some of our
code with an external library to lessen our maintenance burden. But alas, as
it seems it will not happen this time.

But I'm tired of this pointless discussion. I'm outa here.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Niall Douglas

unread,
Jan 16, 2014, 8:42:27 AM1/16/14
to bo...@lists.boost.org
On 16 Jan 2014 at 7:22, Hartmut Kaiser wrote:

> Performance is never an after-thought. Your implementation quality so far is
> not up to Boost standards (as others have pointed out), the performance of
> the library is not convincing either. I'd suggest you withdraw your
> submission at this point, rework the library and try for another review once
> that's achieved. We have more substandard code in Boost already than
> necessary because of this 'let's fix it later' attitude. This 'later' never
> happens, most of the time - sadly.

I think it isn't unreasonable for a library to enter Boost if it has
good performance *scaling* to load (e.g. O(log N)), even if
performance in the absolute or comparative-to-near-alternatives sense
is not great.

Absolute performance can always be incrementally improved later,
whereas poor performance scaling to load usually means the design is
wrong and you're going to need a whole new library with new API.

This is why I really wanted to see performance scaling graphs. If
they show O(N log N) or worse, then the design is deeply flawed and
the library must not enter Boost. Until we have such a graph, we
can't know as there is no substitute for empirical testing.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/



Oliver Kowalke

unread,
Jan 16, 2014, 8:44:09 AM1/16/14
to boost
2014/1/16 Hartmut Kaiser <hartmut...@gmail.com>

>
> > > I still don't get it. There is no API stability question. The API is
> > > well defined for over 2 years now in the C++11 Standard (and even
> > > longer in Boost.Thread).
> >
> > I could have chosen a different API for fibers - but I think the
> > developers are more familiar with std::thread/boost::thread API.
>
> But you have not (and for a good reason!). So this argument is moot.
>

What I tried to say is that the Boost community could have come to the
conclusion that the chosen API (the thread API or any other) is not
appropriate for the suggested semantics. What the review established is
that the thread API would be accepted by the reviewers - and that is what
I was referring to with 'stable API for boost.fiber'.


> The only benefit you're getting from using fibers for this (and you could
> achieve the same semantics using plain ol'threads as well, Boost.Asio is
> doing it for years after all) is - now guess - performance. So please make
> up your mind. Are you trying to improve performance or what?
>

As I wrote before - with threads you would have to scatter your code with
callbacks.
With fibers you don't - you can write the code as if it consisted of
synchronous operations.
That makes the code easier to read and understand.
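
For illustration, here is the shape such code takes (a sketch only - shown
with Boost.Asio's existing stackful-coroutine support, boost::asio::spawn
and yield_context; the fiber integration would read the same way, and
'session' would be launched once per accepted connection):

#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <cstddef>

using boost::asio::ip::tcp;

// one session per client, written top to bottom as if it were blocking;
// each async call suspends only this coroutine/fiber, not the thread
void session(tcp::socket sock, boost::asio::yield_context yield)
{
    char buf[1024];
    boost::system::error_code ec;
    for (;;) {
        std::size_t n = sock.async_read_some(boost::asio::buffer(buf), yield[ec]);
        if (ec) break;  // peer closed or error - no callback chain needed
        boost::asio::async_write(sock, boost::asio::buffer(buf, n), yield[ec]);
        if (ec) break;
    }
}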

Andreas Schäfer

unread,
Jan 16, 2014, 8:50:31 AM1/16/14
to bo...@lists.boost.org
On 13:27 Thu 16 Jan , Daniel James wrote:
> It isn't really user feedback, but feedback from a developer of
> similar functionality, which is something very different.

Yes and no. Usually someone who implemented similar code did just that
because he wanted to use it for himself. At least in an Open Source
environment. Thus the developer becomes the first user.

> Pedantically, the developer could also be a user of the library, but
> their main point of view is as a developer of such functionality, and
> their opinions are influenced by that. If they've put a lot of effort
> into something, then it's likely that they will overvalue it. Feedback
> from other developers is of course extremely useful, but the
> difference should be appreciated.

Let me try to rephrase that: said developer's point of view might be
biased, thus his arguments carry less weight. Is that what you're
saying? I'd then add to the discussion that his experience also makes
him a domain expert, which reinforces his authority. This road is
called "ad hominem" and doesn't lead anywhere. Lets get back to the
facts.

I, as a potential user of user-level threads a.k.a. fibers, would only
use them if they allowed me to do something std::thread can't do for
me: many, many fine-grained threads, which relieve me of the burden of
having to adapt the decomposition of my compute problem. And this
again boils down to performance: if it's not going to be much faster,
why shouldn't I hand the problem over to the OS?

Being a library developer myself, I can assure you that performance is
not something you can easily bolt on afterwards. Rather, it has to be
built-in from the beginning. Otherwise you'll end up reimplementing
class after class. Just my 0.02€.

Cheers
-Andreas


--
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!

Oliver Kowalke

unread,
Jan 16, 2014, 8:52:26 AM1/16/14
to boost
2014/1/16 Hartmut Kaiser <hartmut...@gmail.com>

> I've never said that it's only speed that matters.


but I got this impression from your last postings - sorry if I'm wrong


> I said that, because the API is set, performance is the only criterion
> that could be used to decide whether your library is a worthy addition
> to Boost


this is your opinion and I disagree - it is not the only one


> (besides
> implementation quality, which is sub-standard as others have pointed out).
>

not very kind of you
copy-and-paste errors happen

Daniel James

unread,
Jan 16, 2014, 8:55:18 AM1/16/14
to bo...@lists.boost.org
On 16 January 2014 13:42, Hartmut Kaiser <hartmut...@gmail.com> wrote:
>>
>> Pedantically, the developer could also be a user of the library, but their
>> main point of view is as a developer of such functionality, and their
>> opinions are influenced by that. If they've put a lot of effort into
>> something, then it's likely that they will overvalue it. Feedback from
>> other developers is of course extremely useful, but the difference should
>> be appreciated.
>
> Why is it that there is again this 'selfishness' being silently alleged. Do
> you imply that just because I claim to have a better understanding of the
> applicability of a particular idiom/library then many others (because I'm
> working in the field for years) my opinion is too biased to be considered
> useful?

I said, "Feedback from other developers is of course extremely
useful". I have no idea how you managed to interpret that as saying
that you're opinion isn't useful.

Peter Dimov

unread,
Jan 16, 2014, 8:58:14 AM1/16/14
to bo...@lists.boost.org
Oliver Kowalke wrote:
> In the context of the C10K problem and using the one-thread-per-client
> pattern, I doubt that this would scale (even on modern hardware). Do you
> have some data showing the performance of a modern operating system and
> hardware with increasing thread count?

Spawning 10000 threads, each executing Sleep(100), takes about 350 ms for
me; waiting for them to finish adds another 100 ms. Not sure how relevant
this benchmark is, though. I was just curious.

Thomas Heller

unread,
Jan 16, 2014, 9:18:35 AM1/16/14
to bo...@lists.boost.org
On 01/16/2014 02:52 PM, Oliver Kowalke wrote:
> 2014/1/16 Hartmut Kaiser <hartmut...@gmail.com>
>
>> I've never said that it's only speed that matters.
>
>
> but I got this impression from your last postings - sorry if I'm wrong
>
>
>> I said that, because the API is set, performance is the only criterion
>> that could be used to decide whether your library is a worthy addition
>> to Boost
>
>
> this is your opinion and I disagree - it is not the only one

It should be one of the major criteria in the decision whether the library
gets accepted, for the reasons Hartmut brought up.

>
>
>> (besides
>> implementation quality, which is sub-standard as others have pointed out).
>>
>
> not very kind of you
> copy-and-paste errors happen

It might not be very kind, but it reflects the current state of the
library. In addition, the library is not useful on the advertised
platforms. The PPC64 implementation of Boost.Context is not tested and
does not work (sure it's not the fault of Fiber per se), for example.

Andreas Schäfer

unread,
Jan 16, 2014, 9:04:09 AM1/16/14
to bo...@lists.boost.org
On 14:33 Thu 16 Jan , Oliver Kowalke wrote:
> In the context of the C10K problem and using the one-thread-per-client
> pattern, I doubt that this would scale (even on modern hardware). Do you
> have some data showing the performance of a modern operating system and
> hardware with increasing thread count?

Here are two peer-reviewed publications with extensive performance
data on various modern architectures. So yes: this can go very fast,
if done right.

http://stellar.cct.lsu.edu/pubs/isc2012.pdf
http://stellar.cct.lsu.edu/pubs/scala13.pdf

HTH

Giovanni Piero Deretta

unread,
Jan 16, 2014, 9:03:52 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 1:33 PM, Oliver Kowalke <oliver....@gmail.com>wrote:

> 2014/1/16 Giovanni Piero Deretta <gpde...@gmail.com>
>
> > I think that Hartmut's point is that you can very well use threads for the
> > same thing. In this particular case you would just perform a synchronous
> > read. Yes, to maintain the same level of concurrency you need to spawn tens
> > of thousands of threads, but that's feasible on a modern os/hardware
> > pair.
> > The point of using fibers (i.e. M:N threading) is almost purely
> > performance.
> >
>
> In the context of the C10K problem and using the one-thread-per-client
> pattern, I doubt that this would scale (even on modern hardware). Do you
> have some data showing the performance of a modern operating system and
> hardware with increasing thread count?
>
>
I do not have hard numbers (do you?), but consider that the C10K page is
quite antiquated today.

In a previous life I worked on relatively low-latency applications that
handled many thousands of requests per second per machine. We never
bothered with anything but the one-thread-per-connection model. This
was on Windows, on, IIRC, octa-core 64-bit machines (today you can
"easily" get 24 cores or more on a standard Intel server-class machine).


Now, if we were talking about hundreds of thousands of threads or millions
of threads, it would be interesting to see numbers for both threads and
fibers...

Thomas Heller

unread,
Jan 16, 2014, 9:27:59 AM1/16/14
to bo...@lists.boost.org
On 01/16/2014 02:27 PM, Nat Goodspeed wrote:
> On Thu, Jan 16, 2014 at 7:51 AM, Giovanni Piero Deretta
> <gpde...@gmail.com> wrote:
>
>> I think that Hartmut's point is that you can very well use threads for the
>> same thing. ...
>> The point of using fibers (i.e. M:N threading) is almost purely
>> performance.
>
> Again, for a large class of use cases, fibers and threads are not the same.
>
> Writing thread-safe code remains something of an art, a specialty
> within the already-rarefied realm of good C++ coding. With care, code
> review and testing, it is of course possible to produce good
> thread-safe code when you are writing it from scratch.
>
> But retrofitting existing single-threaded code to be thread-safe can
> be extremely costly. At this moment in history, we have a very large
> volume of existing code whose developers (perhaps unconsciously)
> relied on having exclusive access to certain in-process resources.
> Some of us do not have the option to discard it and rewrite from
> scratch.

Even in the context of a Boost.Fiber-like library, you have to take
extra care to secure your data structures against concurrent access. Even
though it is not necessarily running any threads in parallel, a fiber
can suspend while being in a critical section. BTW, from our experience
with HPX, such behavior (suspending a user-level thread while a lock
is held) is very dangerous and often leads to deadlocks.
That being said, even when you decide to use fibers with your legacy
code, the cost to make it safe is not really negligible.
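
A minimal sketch of that hazard (illustrative only; assumes Boost.Fiber's
boost::this_fiber::yield and two fibers scheduled on the same thread):

#include <boost/fiber/all.hpp>
#include <mutex>

std::mutex m;  // an OS-level mutex, e.g. inherited from legacy code

void fiber_a() {
    std::lock_guard<std::mutex> hold(m);  // lock taken...
    boost::this_fiber::yield();           // ...and the fiber suspends
}                                         // while still holding it

void fiber_b() {
    std::lock_guard<std::mutex> hold(m);  // if the scheduler now resumes
}                                         // this fiber on the same thread,
                                          // the thread deadlocks on itself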

>
> Yes, this is a subset of the possible use cases of the Fiber library.
> It is an important subset because threads provide no equivalent.
>
> Yes, I also want a Boost library that will concurrently process very
> large numbers of tasks, with each of a number of threads running very
> many fibers. I think the Fiber library gives us a foundation on which
> to build that support. But even with its present feature set, with
> Oliver responding to the community, it has great value. I feel
> frustrated when people dismiss the very real benefit of cooperative
> context switching as irrelevant to them.

No one said it's irrelevant. The point was that performance should be the
major criterion for accepting the library.

Oliver Kowalke

unread,
Jan 16, 2014, 9:19:07 AM1/16/14
to boost
2014/1/16 Thomas Heller <thom....@gmail.com>

> It might not be very kind, but it reflects the current state of the
> library. In addition, the library is not useful on the advertised
> platforms. The PPC64 implementation of Boost.Context is not tested and does
> not work (sure it's not the fault of Fiber per se), for example.
>

boost.context is irrelevant in this discussion.

Do you think I have a machine for each architecture at home? I can only
write the code if requested by some users, and I have to rely on the
willingness of community members to test the code on the specific hardware.

You asked me about Boost.Context support for PPC64 and I told you that the
code is untested from my side and that boost regression tests do not exist
for PPC64.
But you did not respond to my email. As I did with other users, I hoped
that we could fix the problem together, but I didn't get any feedback from
you - don't blame me.

Andreas Schäfer

unread,
Jan 16, 2014, 9:22:50 AM1/16/14
to bo...@lists.boost.org
Hi,

On 15:19 Thu 16 Jan , Oliver Kowalke wrote:
> do you think I've a machine for each architecture at home? I can only write
> the code if requested by some users and I have to rely on the willing of
> community members to test the code on the specific hardware.
>
> You asked me about boost.context support of PPC64 and I told you that the
> code is untested from my side and boost-regression tests do not exist for
> PPC64.

I can get you ssh access to a PPC64 node if you're interested. Just
send me a private mail.

Nat Goodspeed

unread,
Jan 16, 2014, 9:23:07 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 9:27 AM, Thomas Heller <thom....@gmail.com> wrote:

> Even in the context of a Boost.Fiber-like library, you have to take extra
> care to secure your data structures against concurrent access. Even though
> it is not necessarily running any threads in parallel, a fiber can suspend
> while being in a critical section. BTW, from our experience with HPX, such
> behavior (suspending a user-level thread while a lock is held) is very
> dangerous and often leads to deadlocks.
> That being said, even when you decide to use fibers with your legacy code,
> the cost to make it safe is not really negligible.

Your point is well-taken. Introducing a new fiber -- let's say "a
cooperatively-concurrent thread" -- into code *partially* retrofitted
for kernel threads is more dangerous than in code which has always
before run on a single thread; in other words, code with no
kernel-thread synchronization constructs.

You can grep for kernel-thread synchronization constructs, though.
Being certain that you have located and adequately defended every
potential access to a process-global resource is significantly harder.

Hartmut Kaiser

unread,
Jan 16, 2014, 9:32:30 AM1/16/14
to bo...@lists.boost.org

> > 2014/1/16 Giovanni Piero Deretta <gpde...@gmail.com>
> >
> > > I think that Hartmut's point is that you can very well use threads for
> > > the same thing. In this particular case you would just perform a
> > > synchronous read. Yes, to maintain the same level of concurrency you
> > > need to spawn tens of thousands of threads, but that's feasible on a
> > > modern os/hardware
> > pair.
> > > The point of using fibers (i.e. M:N threading) is almost purely
> > > performance.
> > >
> >
> > In the context of the C10K problem and using the one-thread-per-client
> > pattern, I doubt that this would scale (even on modern hardware). Do
> > you have some data showing the performance of a modern operating
> > system and hardware with increasing thread count?
> >
> >
> I do not have hard numbers (do you?), but consider that the C10K page is
> quite antiquated today.
>
> On a previous life I worked on relatively low-latency applications that
> did handle multiple thousands requests per second per machine. We never
> bothered with anything but with the one thread per connection model. This
> was on windows, on, IIRC, octa-core 64 bits machines (today you can
> "easily" get 24 cores or more on a standard intel server class machine).
>
> Now, if we were talking about hundreds of thousands of threads or milions
> of threads, it would be interesting to see numbers for both threads and
> fibers...

FWIW, the use cases I'm seeing (and trust me those are very commonplace at
least in scientific computing) involve not just hundreds or thousands of
threads, but hundreds of millions of threads (billions of threads a couple
of years from now).

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Thomas Heller

unread,
Jan 16, 2014, 9:49:21 AM1/16/14
to bo...@lists.boost.org
On 01/16/2014 03:19 PM, Oliver Kowalke wrote:
> 2014/1/16 Thomas Heller <thom....@gmail.com>
>
>> It might not be very kind, but it reflects the current state of the
>> library. In addition, the library is not useful on the advertised
>> platforms. The PPC64 implementation of Boost.Context is not tested and does
>> not work (sure it's not the fault of Fiber per se), for example.
>>
>
> boost.context is irrelevant in this discussion.

I don't think so. Boost.Fiber builds on Context, and without a working
context implementation, Fiber is useless.

>
> Do you think I have a machine for each architecture at home? I can only
> write the code if requested by some users, and I have to rely on the
> willingness of community members to test the code on the specific hardware.

Absolutely. But Context ships with code for PPC64, so users will assume
that it works.

>
> You asked me about boost.context support of PPC64 and I told you that the
> code is untested from my side and boost-regression tests do not exist for
> PPC64.
> But you did not respond to my email. As I did with other users, I hoped
> that we could fix the problem together, but I didn't get any feedback from
> you - don't blame me.

Yes, I got sidetracked. I am not blaming you for not delivering a PPC64
Boost.Context implementation. When I get back to the project where I need
the context switch for PPC64, I will certainly get back to you. I was just
trying to point to a case where Fiber is not working (with no indication in
the docs or elsewhere). Sorry if you got the wrong impression.

Oliver Kowalke

unread,
Jan 16, 2014, 9:49:40 AM1/16/14
to boost
2014/1/16 Thomas Heller <thom....@gmail.com>

> I don't think so. Boost.Fiber builts on context and without a working
> context implementation, the fiber is useless.


Because I get no feedback about the implementation for an architecture,
and no regression tests exist, I should throw away the work?
Of course I leave it in the library and wait for someone to test it and
report bugs; otherwise it would be impossible to get any feedback.


>> Do you think I have a machine for each architecture at home? I can only
>> write the code if requested by some users, and I have to rely on the
>> willingness of community members to test the code on the specific
>> hardware.
>>
>
> Absolutely. But context is shipped with code for PPC64 so it should be
> assumed it works.


...


> Yes, I got side tracked. I am not blaming you for not delivering a PPC64
> Boost.Context implementation. When i get back to the project where i need
> the context switch for PPC64, i will certainly get back to you. I was just
> trying to point to a case where Fiber is not working (with no indication in
> the docs or elsewhere). Sorry if you got the wrong impression.
>

After reading the postings from you and other members of your group, I get
the impression you are really trying to diss me.

Please keep in mind that I do this work only in my spare time (besides, I
have a family too); I don't have the time you have in your daily work at
the university.
And then you and your fellows tell me that I'm too stupid - besides, some
of the criticised issues are questionable - copy-and-paste errors happen.

Giovanni Piero Deretta

unread,
Jan 16, 2014, 9:51:58 AM1/16/14
to bo...@lists.boost.org
On a single machine? That would be impressive!

-- gpd

Daniel James

unread,
Jan 16, 2014, 10:12:25 AM1/16/14
to bo...@lists.boost.org
On 16 January 2014 13:50, Andreas Schäfer <gen...@gmx.de> wrote:
> On 13:27 Thu 16 Jan , Daniel James wrote:

>> Pedantically, the developer could also be a user of the library, but
>> their main point of view is as a developer of such functionality, and
>> their opinions are influenced by that. If they've put a lot of effort
>> into something, then it's likely that they will overvalue it. Feedback
>> from other developers is of course extremely useful, but the
>> difference should be appreciated.
>
> Let me try to rephrase that: said developer's point of view might be
> biased, thus his arguments carry less weight. Is that what you're
> saying?

No, of course it isn't.

> I'd then add to the discussion that his experience also makes
> him a domain expert, which reinforces his authority. This road is
> called "ad hominem" and doesn't lead anywhere.

Your response is called a "straw man argument", or an "Aunt Sally".

Thomas Heller

unread,
Jan 16, 2014, 10:29:54 AM1/16/14
to bo...@lists.boost.org
On 01/16/2014 03:49 PM, Oliver Kowalke wrote:
> 2014/1/16 Thomas Heller <thom....@gmail.com>
>
>> I don't think so. Boost.Fiber builts on context and without a working
>> context implementation, the fiber is useless.
>
>
> Because I get no feedback about the implementation for an architecture,
> and no regression tests exist, I should throw away the work?
> Of course I leave it in the library and wait for someone to test it and
> report bugs; otherwise it would be impossible to get any feedback.

No objection to keeping it in the develop branch. But I think it is bad
practice to release code which is not tested.

>
>
>>> Do you think I have a machine for each architecture at home? I can only
>>> write the code if requested by some users, and I have to rely on the
>>> willingness of community members to test the code on the specific
>>> hardware.
>>>
>>
>> Absolutely. But context is shipped with code for PPC64 so it should be
>> assumed it works.
>
>
> ...
>
>
>> Yes, I got sidetracked. I am not blaming you for not delivering a PPC64
>> Boost.Context implementation. When I get back to the project where I need
>> the context switch for PPC64, I will certainly get back to you. I was just
>> trying to point to a case where Fiber is not working (with no indication in
>> the docs or elsewhere). Sorry if you got the wrong impression.
>>
>
> After reading the postings from you and other members of your group, I get
> the impression you are really trying to diss me.

I am not trying to diss you. My apologies if anything i said offended
you personally.

>
> Please keep in mind that I do this work only in my spare time (besides, I
> have a family too); I don't have the time you have in your daily work at
> the university.
> And then you and your fellows tell me that I'm too stupid - besides, some
> of the criticised issues are questionable - copy-and-paste errors happen.

Sure, they happen. They happen to everyone. Again, no one said you're
stupid; we are just giving feedback about your work. As a side effect of
our daily work we happen to have gained some experience with the kind of
library you propose, and in addition we think that performance should be an
important and critical feature of your library. That's all. No offense
intended.

Bjorn Reese

unread,
Jan 16, 2014, 11:19:10 AM1/16/14
to bo...@lists.boost.org
On 01/16/2014 01:51 PM, Giovanni Piero Deretta wrote:

> I think that Hartmut's point is that you can very well use threads for the
> same thing. In this particular case you would just perform a synchronous
> read. Yes, to maintain the same level of concurrency you need to spawn tens

Let me add two use cases that cannot be handled reasonably that way.

First, many third-party libraries have callbacks as their primary
interaction mechanism, and unlike Asio, they do not provide a
synchronous alternative for the interaction. In this case fibers can
be of great help.

Second, when decoding/parsing streaming data (data that is received
piecemeal) that is separated by delimiters, you have to start decoding
to see if you have received the delimiter. If not, then you have to
receive more data and decode again. Rather than having to decode from
the beginning every time, it is preferable to remember how far you got
and continue from there. This can be done by integrating fibers with
the decoder.

In these use cases performance is of secondary importance.
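
To sketch the second case (await_more_data is a hypothetical suspension
point; the decoder runs on a fiber and keeps its scan position across
suspensions instead of restarting from the beginning):

#include <cstddef>
#include <string>

std::string input;       // appended to piecemeal by the receive path

void await_more_data();  // hypothetical: suspends this fiber until the
                         // I/O layer has appended more bytes to 'input'

// runs on its own fiber: when it runs out of input it suspends, and on
// resumption continues scanning from 'pos'
std::string decode_frame()
{
    std::size_t pos = 0;
    for (;;) {
        for (; pos != input.size(); ++pos)
            if (input[pos] == '\n')              // delimiter found
                return input.substr(0, pos);
        await_more_data();
    }
}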

Giovanni Piero Deretta

unread,
Jan 16, 2014, 11:28:13 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 4:19 PM, Bjorn Reese <bre...@mail1.stofanet.dk>wrote:

> On 01/16/2014 01:51 PM, Giovanni Piero Deretta wrote:
>
>> I think that Hartmut's point is that you can very well use threads for the
>> same thing. In this particular case you would just perform a synchronous
>> read. Yes, to maintain the same level of concurrency you need to spawn tens
>>
>
> Let me add two use cases that cannot be handled reasonably that way.
>
> First, many third-party libraries have callbacks as their primary
> interaction mechanism, and unlike Asio, they do not provide a
> synchronous alternative for the interaction.


You do not need to sell me the advantage of using continuations for
managing callback hell :) ...


> In this case fibers can
> be of great help.
>
>
... but we already have boost.coroutine for that ...


> Second, when decoding/parsing streaming data (data that is received
> piecemeal) that is separated by delimiters, you have to start decoding
> to see if you have received the delimiter. If not, then you have to
> receive more data and decode again. Rather than having to decode from
> the beginning every time, it is preferable to remember how far you got
> and continue from there. This can be done by integrating fibers with
> the decoder.
>

... also a perfect match for coroutines.


>
> In these use cases performance is of secondary importance.
>
>
What boost.fibers adds is a scheduler and a compatibility layer for
boost/std thread, including locks, condvars and futures. You do not really
need these if you just need to thread your callbacks.

-- gpd

Hartmut Kaiser

unread,
Jan 16, 2014, 11:44:33 AM1/16/14
to bo...@lists.boost.org
> > > Now, if we were talking about hundreds of thousands of threads or
> > > milions of threads, it would be interesting to see numbers for both
> > > threads and fibers...
> >
> > FWIW, the use cases I'm seeing (and trust me those are very
> > commonplace at least in scientific computing) involve not just
> > hundreds or thousands of threads, but hundreds of millions of threads
> > (billions of threads a couple of years from now).
> >
> >
> On a single machine? That would be impressive!

Well, it depends on the size of the machine, doesn't it? The no. 1 machine
on the top 500 list [1] (Tianhe-2 [2]) has 3120000 cores (in 16,000 compute
nodes).

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu

[1] http://www.top500.org/
[2] http://www.top500.org/list/2013/11/

Giovanni Piero Deretta

unread,
Jan 16, 2014, 11:56:57 AM1/16/14
to bo...@lists.boost.org
On Thu, Jan 16, 2014 at 4:44 PM, Hartmut Kaiser <hartmut...@gmail.com>wrote:

> > > > Now, if we were talking about hundreds of thousands of threads or
> > > > millions of threads, it would be interesting to see numbers for both
> > > > threads and fibers...
> > >
> > > FWIW, the use cases I'm seeing (and trust me those are very
> > > commonplace at least in scientific computing) involve not just
> > > hundreds or thousands of threads, but hundreds of millions of threads
> > > (billions of threads a couple of years from now).
> > >
> > >
> > On a single machine? That would be impressive!
>
> Well, it depends on the size of the machine, doesn't it? The no. 1 machine
> on the top 500 list [1] (Tianhe-2 [2]) has 3120000 cores (in 16,000 compute
> nodes).
>
>
Oh, right!

Do they usually present a single OS image to the application? I.e., do all
the cores share a single memory address space, or do nodes communicate via
message passing (MPI, I presume)? std::thread-like scaling is relevant in
the first case, less so in the latter.

-- gpd

Thomas Heller

unread,
Jan 16, 2014, 12:19:36 PM1/16/14
to Boost mailing list
Am 16.01.2014 17:57 schrieb "Giovanni Piero Deretta" <gpde...@gmail.com>:
>
> On Thu, Jan 16, 2014 at 4:44 PM, Hartmut Kaiser <hartmut...@gmail.com
>wrote:
>
> > > > > Now, if we were talking about hundreds of thousands of threads or
> > > > > millions of threads, it would be interesting to see numbers for
> > > > > both threads and fibers...
> > > >
> > > > FWIW, the use cases I'm seeing (and trust me those are very
> > > > commonplace at least in scientific computing) involve not just
> > > > hundreds or thousands of threads, but hundreds of millions of
> > > > threads (billions of threads a couple of years from now).
> > > >
> > > >
> > > On a single machine? That would be impressive!
> >
> > Well, it depends on the size of the machine, doesn't it? The no. 1
> > machine on the top 500 list [1] (Tianhe-2 [2]) has 3120000 cores (in
> > 16,000 compute nodes).
> >
> >
> Oh, right!
>
> Do they usually present a single OS image to the application? I.e. do all
> the cores share a single memory address space or nodes communicate via
> message passing (MPI I presume)? std::thread-like scaling is relevant for
> the first case, less so for the later.

If you decide to program with MPI, that's certainly true. However, HPX [1]
provides the ability to spawn threads remotely, completely embedded in a
standard-conforming API. For those remote procedure calls, a small overhead
is crucial in order to efficiently utilize your whole machine. We
demonstrated the capability to do exactly that [2].

[1]: http://stellar.cct.lsu.edu
[2]: http://stellar.cct.lsu.edu/pubs/scala13.pdf

Hartmut Kaiser

unread,
Jan 16, 2014, 12:30:43 PM1/16/14
to bo...@lists.boost.org
> > > > > Now, if we were talking about hundreds of thousands of threads
> > > > > or millions of threads, it would be interesting to see numbers
> > > > > for both threads and fibers...
> > > >
> > > > FWIW, the use cases I'm seeing (and trust me those are very
> > > > commonplace at least in scientific computing) involve not just
> > > > hundreds or thousands of threads, but hundreds of millions of
> > > > threads (billions of threads a couple of years from now).
> > > >
> > > >
> > > On a single machine? That would be impressive!
> >
> > Well, it depends on the size of the machine, doesn't it? The no. 1
> > machine on the top 500 list [1] (Tianhe-2 [2]) has 3120000 cores (in
> > 16,000 compute nodes).
> >
> >
> Oh, right!
>
> Do they usually present a single OS image to the application? I.e. do all
> the cores share a single memory address space or nodes communicate via
> message passing (MPI I presume)? std::thread-like scaling is relevant for
> the first case, less so for the later.

The conventional way it's done is to use MPI.
However, if you used HPX, you'd see one global address space (for the
things that are sharable).

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



Gavin Lambert

unread,
Jan 17, 2014, 12:54:19 AM1/17/14
to bo...@lists.boost.org
On 17/01/2014 02:44, Quoth Oliver Kowalke:
> As I wrote before - with threads you would have to scatter your code with
> callbacks.
> With fibers you don't - you can write the code as if it consisted of
> synchronous operations.
> That makes the code easier to read and understand.

Boost.Asio already supports using Boost.Coroutine for that purpose; an
extra library seems unnecessary if that is your target.

My understanding is that the new thing that Fibers tries to bring to the
table is std::thread-like cross-fiber synchronisation. Which is
something that only matters if you have fibers running in multiple
threads, and does not seem related to the use case you're mentioning
above, unless I'm missing something.

So I'm a little confused as to what you're trying to focus on.

Bjorn Reese

unread,
Jan 17, 2014, 2:06:17 AM1/17/14
to bo...@lists.boost.org
On 01/16/2014 05:28 PM, Giovanni Piero Deretta wrote:

> ... but we already have boost.coroutine for that ...

Yes, but once we want to coordinate between such continuations, whether
through condition variables or message queues, then we are right back
in fiber-land.

Oliver Kowalke

unread,
Jan 17, 2014, 2:19:13 AM1/17/14
to boost
2014/1/17 Bjorn Reese <bre...@mail1.stofanet.dk>

> On 01/16/2014 05:28 PM, Giovanni Piero Deretta wrote:
>
> ... but we already have boost.coroutine for that ...
>>
>
> Yes, but once we want to coordinate between such continuations, whether
> through condition variables or message queues, then we are right back
> in fiber-land.
>

correct - and such features were already requested on the developer mailing
list in 2013

Nat Goodspeed

unread,
Jan 17, 2014, 10:01:34 AM1/17/14
to bo...@lists.boost.org
On Fri, Jan 17, 2014 at 12:54 AM, Gavin Lambert <gav...@compacsort.com> wrote:

> On 17/01/2014 02:44, Quoth Oliver Kowalke:
>
>> As I wrote before - with threads you would have to scatter your code with
>> callbacks.
>> With fibers you don't - you can write the code as if it consisted of
>> synchronous operations.
>> That makes the code easier to read and understand.

> Boost.Asio already supports using Boost.Coroutine for that purpose; an extra
> library seems unnecessary if that is your target.

What if you're using an asynchronous API that's not Boost.Asio?

What if you're using several different async APIs?

Wouldn't you want something like future and promise to interface
between your coroutine and an arbitrary asynchronous API?

Then there's the lifespan question. In a classic coroutine scenario,
you instantiate a coroutine object, you chat with it for a bit, then
you destroy it. But launching a "cooperatively context-switched
thread" is more of a fire-and-forget operation. Who owns the object?
Who cleans up when it's done?

Then there's control flow. A coroutine has a caller. When it
context-switches away, it specifically resumes that caller. What if
you have several different coroutines you're using as cooperative
threads, and you want to run whichever of them is ready next?

Clearly all of this can be done with coroutines, yes. (Fiber does
build it on coroutines!) But it's a whole additional abstraction
layer. Must every developer facing this kind of use case build that
layer by hand?

> My understanding is that the new thing that Fibers tries to bring to the
> table is std::thread-like cross-fiber synchronisation. Which is something
> that only matters if you have fibers running in multiple threads, and does
> not seem related to the use case you're mentioning above, unless I'm missing
> something.

Consider a producer fiber obtaining input from some async source,
pumping items into a queue. That queue is consumed by several
different consumer fibers, each interacting with an async sink. All of
it is running on a single thread. That's just one example.
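
Sketched in code (Item, read_from_async_source and write_to_async_sink
are placeholders; the mutex and condition_variable are the fiber-level
ones, so waiting suspends only the calling fiber, never the thread):

#include <boost/fiber/all.hpp>
#include <deque>
#include <mutex>

struct Item { /* ... */ };               // placeholder payload
Item read_from_async_source();           // placeholder: suspends the fiber on I/O
void write_to_async_sink(const Item&);   // placeholder: suspends the fiber on I/O

std::deque<Item> queue;                  // plain queue: everything runs on one thread
boost::fibers::mutex mtx;
boost::fibers::condition_variable cv;

void producer() {
    for (;;) {
        Item item = read_from_async_source();
        {
            std::unique_lock<boost::fibers::mutex> lk(mtx);
            queue.push_back(item);
        }
        cv.notify_one();                 // wake one waiting consumer fiber
    }
}

void consumer() {
    for (;;) {
        std::unique_lock<boost::fibers::mutex> lk(mtx);
        cv.wait(lk, [] { return !queue.empty(); });  // suspends this fiber only
        Item item = queue.front();
        queue.pop_front();
        lk.unlock();
        write_to_async_sink(item);
    }
}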

Gavin Lambert

unread,
Jan 19, 2014, 5:46:13 PM1/19/14
to bo...@lists.boost.org
On 18/01/2014 04:01, Quoth Nat Goodspeed:
> Clearly all of this can be done with coroutines, yes. (Fiber does
> build it on coroutines!) But it's a whole additional abstraction
> layer. Must every developer facing this kind of use case build that
> layer by hand?

I asked that because Oliver seems to me to be focusing many of his
replies in this thread and elsewhere on "it makes Asio syntax cleaner",
which I don't feel is a sufficient justification for this library to
exist by itself, because Coroutine already does that.

I'm not saying that the library doesn't have merit for other reasons,
just that it's not being expressed very well.

> Consider a producer fiber obtaining input from some async source,
> pumping items into a queue. That queue is consumed by several
> different consumer fibers, each interacting with an async sink. All of
> it is running on a single thread. That's just one example.

If you are running on a single thread, you do not require any locks at
all on the queue, and barely any kind of synchronisation to have
consumers go to sleep when idle and be woken up when new work arrives.

Although granted this library would theoretically make life easier than
the alternatives if the consumers also needed to sleep on things other
than the queue itself -- though again, if you're running in one thread
you don't need locks, so there's not much you need to sleep on.

It's only really when you go to M:N that something like this becomes
especially valuable.

(Don't get me wrong -- I'm eagerly waiting for something like this,
because I *do* have a M:N situation in some of my code. But that code
is currently using Windows fibers, so I'm also interested in a
performance comparison.)

Oliver Kowalke

unread,
Jan 20, 2014, 2:07:22 AM1/20/14
to boost
2014/1/19 Gavin Lambert <gav...@compacsort.com>

> I asked that because Oliver seems to me to be focusing many of his replies
> in this thread and elsewhere to "it makes Asio syntax cleaner", which I
> don't feel is a sufficient justification for this library to exist by
> itself, because Coroutine already does that.
>

correct - but think of the one-thread-per-client pattern (which most
developers are familiar with), which is easier to write and understand than
using callbacks - both using async I/O.
one-thread-per-client -> one-fiber-per-client
With coroutines you can't use one-fiber-per-client, because you are
missing the synchronization classes.


> If you are running on a single thread, you do not require any locks at all
> on the queue, and barely any kind of synchronisation to have consumers go
> to sleep when idle and be woken up when new work arrives.
>

hmm - the library doesn't use locks in the sense of thread locks

fibers are a thin wrapper around coroutines, and you are able to do
something like:

void fn1() {}
fiber f1( fn1);            // launch the first fiber

void fn2() { f1.join(); }  // suspends this fiber until f1 has finished
fiber f2( fn2);

i.e. a fiber can join another fiber, as you know it from threads
(coroutines do not provide this feature; you would have to implement
yourself what boost.fiber already provides).


>
> Although granted this library would theoretically make life easier than
> the alternatives if the consumers also needed to sleep on things other than
> the queue itself -- though again, if you're running in one thread you don't
> need locks, so there's not much you need to sleep on.
>

A fiber does not sleep - it is suspended (its stack and registers are
preserved), and when the condition for which the fiber was suspended
becomes true, it is resumed, i.e. the registers are restored in the CPU
and the stack pointer is restored too. So it is not real locking as with
threads, but the fiber library provides classes matching the std::thread
API (though the internal implementation and the mechanisms used are
different).


> It's only really when you go to M:N that something like this becomes
> especially valuable.
>

Not really - even if you do cooperative scheduling == userland threads
(which can run concurrently in one thread), you need classes for
coordinating the fibers.

Gavin Lambert

unread,
Jan 20, 2014, 6:44:03 PM1/20/14
to bo...@lists.boost.org
On 20/01/2014 20:07, Quoth Oliver Kowalke:
> with coroutines you can't use the one-fiber-per-client because you are
> missing the synchronization classes.

You can if you don't require synchronisation. Something that's just
serving up read-only data (eg. basic in-memory web server) or handing
off complex requests to a worker thread via a non-blocking queue, would
be an example of that. Every fiber is completely independent of every
other -- they don't care what the others are up to.

The thing is though that the main advantage of thread-per-client is
handling multiple requests simultaneously. And you lose that advantage
with fiber-per-client unless you sprinkle your processing code with
fiber interruption points (either manually or via calls to the sync
classes you're proposing) -- and even then I think that only provides
much benefit for long-running-connection protocols (like IRC or telnet),
not request-response protocols (like HTTP), and where individual
processing time is very short.

For a system that has longer processing times but still wants to handle
multiple requests (where processing time is CPU bound, rather than
waiting on other fibers), the best design would be a limited size
threadpool that can run any of the per-client fibers. And AFAIK your
proposed library has no support for this scenario.

I'm not saying it necessarily *needs* this, but if you're going to talk
about fibers-as-useful-to-ASIO I think this case is going to come up
sooner rather than later, so it may be worthy of consideration in the
library design.

Another scenario that doesn't require fiber migration, but does require
cross-thread-fiber-synch, is:
- one thread running N client fibers
- M worker threads each running one fiber
If a client thread wants to make a blocking call (eg. database/file I/O)
it could post a request to a worker, which would do the blocking call
and then post back once it was done. This would allow the client fibers
to keep running but the system would still bottleneck once it had M
simultaneous blocking calls. (A thread per client system wouldn't
bottleneck there, but it loses performance if there are too many
non-blocked threads.)

Neither design seems entirely satisfactory. (An obvious solution is to
never use blocking calls, but that's not always possible.)
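
In code, the second scenario might look roughly like this (post_to_worker
is a hypothetical helper that runs the callable on one of the M worker
threads and fulfils a fiber-aware future; it assumes the library's future
can be satisfied from another thread):

#include <boost/fiber/all.hpp>

struct Result { /* ... */ };       // placeholder
Result blocking_database_query();  // placeholder for the blocking call

// hypothetical: executes fn on a worker thread, returns a fiber future
boost::fibers::future<Result> post_to_worker(Result (*fn)());

void client_fiber() {
    boost::fibers::future<Result> f = post_to_worker(&blocking_database_query);
    Result r = f.get();  // suspends only this fiber; the I/O thread keeps
                         // running the other N-1 client fibers meanwhile
    (void)r;             // ... use the result ...
}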

Hartmut Kaiser

unread,
Jan 20, 2014, 8:48:51 PM1/20/14
to bo...@lists.boost.org
Oliver,

> > I asked that because Oliver seems to me to be focusing many of his
> > replies in this thread and elsewhere on "it makes Asio syntax
> > cleaner", which I don't feel is a sufficient justification for this
> > library to exist by itself, because Coroutine already does that.
>
> correct - but think of the one-thread-per-client pattern (which most
> developers are familiar with), which is easier to write and understand
> than using callbacks - both using async I/O.
> one-thread-per-client -> one-fiber-per-client
> With coroutines you can't use one-fiber-per-client, because you are
> missing the synchronization classes.

After some more thinking, I believe I'm starting to understand the angle
you're coming from. The proposed library has been designed for the sole
purpose of complementing Boost.Asio (or similar asynchronous libraries),
allowing it to be used in a more straightforward way. I apologize for being
slow or dense.

Using the name Boost.Fiber implies a much broader use case (and that's what
got me confused). I think it would be sensible to choose another name for
this library.

BTW, if the author had referred to
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3747.pdf, all these
misunderstandings could have been avoided...

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu

>
>

Niall Douglas

unread,
Jan 21, 2014, 7:44:55 AM1/21/14
to bo...@lists.boost.org
On 20 Jan 2014 at 19:48, Hartmut Kaiser wrote:

> BTW, if the author would have referred to
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3747.pdf, all these
> misunderstandings could have been avoided...

That's a good paper, but I wish it didn't claim to be a *universal*
model for asynchronous operations because that model is completely
unsuitable for persistent storage i/o. I had that argument with
Nicholas @ Microsoft Research actually, and I think I may well have
persuaded him as they're seeing the same problems of fit.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/



Nat Goodspeed

unread,
Jan 21, 2014, 10:09:16 AM1/21/14
to bo...@lists.boost.org
On Tue, Jan 21, 2014 at 7:44 AM, Niall Douglas
<s_sour...@nedprod.com> wrote:

> On 20 Jan 2014 at 19:48, Hartmut Kaiser wrote:

>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3747.pdf

> That's a good paper, but I wish it didn't claim to be a *universal*
> model for asynchronous operations because that model is completely
> unsuitable for persistent storage i/o. I had that argument with
> Nicholas @ Microsoft Research actually, and I think I may well have
> persuaded him as they're seeing the same problems of fit.

Niall, would you be able to propose a more universal model? Please
read this as a simple invitation rather than a challenge. The goal of
the paper seems laudable: accepting an argument that allows the caller
to specify whether to provide results with a callback, a future or
suspend-and-resume.

Niall Douglas

unread,
Jan 21, 2014, 1:20:27 PM1/21/14
to bo...@lists.boost.org
On 21 Jan 2014 at 10:09, Nat Goodspeed wrote:

> >> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3747.pdf
>
> > That's a good paper, but I wish it didn't claim to be a *universal*
> > model for asynchronous operations because that model is completely
> > unsuitable for persistent storage i/o. I had that argument with
> > Nicholas @ Microsoft Research actually, and I think I may well have
> > persuaded him as they're seeing the same problems of fit.
>
> Niall, would you be able to propose a more universal model? Please
> read this as a simple invitation rather than a challenge. The goal of
> the paper seems laudable: accepting an argument that allows the caller
> to specify whether to provide results with a callback, a future or
> suspend-and-resume.

I think I effectively have, via AFIO, which uses an asynchronous
execution precedence graph model with explicit gathering for error
propagation, and which, unlike the proposed model, works as well with
seekable i/o as with fifo i/o. The problem with the AFIO model, though,
is that it is very heavily reliant on the performance of futures (and
therefore the memory allocator), as those are used to transport state
between closures, so for very low-latency, very-small-packet (e.g. UDP)
socket i/o it's unsuitable (which is of course pointed out in the WG21
paper). My opinion there is that callbacks can become a low-level
interface for those who need it, while the execution precedence graph
model is much easier to program against for everything else.

You're probably about to ask me for an N-paper, so I'll save an email
cycle by explaining my hesitation on that. Bjorn Reese has been
working extensively with me off-list to de-wart AFIO when combined
with ASIO, such that async socket and async disc i/o seamlessly
interoperate; specifically, he's persuaded me to supply async_io_op to
completion handlers, which is going to be a technical challenge for me
to implement in a safe and quick way, but I think I am capable, though
it's going to hurt. I will start that work item once I clear my maths
coursework (hopefully tomorrow), and file my consulting company's
annual accounts (hopefully end of this week). Also, I might actually
have a job interview soon, which amazes me as I had expected at least
three months to elapse before any jobs in C++ turned up (there are
about six a year total in this region).

Both Artur and Niklas who are leading out Microsoft's work on async
in C++ are aware of AFIO's design approach, and last time I heard
they found themselves coming ever closer to AFIO's design as they
find blocking issues and corner cases in their proposals. I think it
will be more productive for now for me to keep nagging them on their
N-papers rather than write one of my own. After all, being in Europe
and being unemployed makes it extremely tough to attend C++ standards
meetings, and I don't personally rate a N paper's chances if someone
isn't at the meetings to champion it i.e. present in the bar
afterwards to argue merits with people.

Hopefully this explains things. Do point out any problems in my
thoughts.

Bjorn Reese

unread,
Jan 22, 2014, 12:37:31 PM1/22/14
to bo...@lists.boost.org
On 01/21/2014 04:09 PM, Nat Goodspeed wrote:

> Niall, would you be able to propose a more universal model? Please
> read this as a simple invitation rather than a challenge. The goal of

A good place to start is to understand the limitations of the Asio
model.

The first limitation pertains to chained operations. In Asio you chain
operations by initiating the next operation when you receive a callback
from the previous operation. Although we can initiate several operations
at the same time, these operations are not chained. I am going to ignore
scatter I/O here because it has its own limitations (e.g. it allows either
multiple reads or multiple writes, but not combinations thereof).
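
Concretely, chaining two dependent reads in the Asio model means the
second can only be initiated from the first one's completion handler
(hdr and body are placeholder buffers):

#include <boost/asio.hpp>
#include <array>

using boost::asio::ip::tcp;

std::array<char, 8>    hdr;   // placeholder: fixed-size header
std::array<char, 1024> body;  // placeholder: payload buffer

void read_message(tcp::socket& sock)
{
    boost::asio::async_read(sock, boost::asio::buffer(hdr),
        [&sock](boost::system::error_code ec, std::size_t) {
            if (ec) return;
            // the dependent operation can only be chained here, inside
            // the previous operation's completion handler
            boost::asio::async_read(sock, boost::asio::buffer(body),
                [](boost::system::error_code ec, std::size_t) {
                    /* ... decode ... */
                });
        });
}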

The second limitation is about multiple types. The Asio model assumes
that there is a one-to-one correspondence between the type that I
request and the type I receive. This is perfectly fine for Asio because
it just deals with buffers. However, if you create an asynchronous RPC
server using the same kind of callback mechanism as Asio, then you want
to request a function of any type and receive a function of a specific
type. In this design you have multiple "return types" (received function
signatures).

It can be useful to put the second limitation into perspective. The RPC
example above fits best into the event listener mentioned below.
Inspired by a classification by Eric Meijer, we can say that:

1. If we have a single return value of a single type then we use T
(or expected<T>) in the sync case and future<T> in the async case.

2. If we have multiple return values of a single type then we use an
iterator in the sync case and an observer pattern (e.g. signals2 or
Asio callbacks) in the async case.

3. If we have multiple return values of multiple types then we use
a variant<T> visitor in the sync case and an event listener in the
async case.
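
To make the correspondence concrete, here is a rough sketch of the
signatures involved (all names are mine, purely illustrative):

#include <future>
#include <string>
#include <vector>
#include <boost/variant.hpp>

// 1. single value, single type
int read_one();                               // sync: T
std::future<int> async_read_one();            // async: future<T>

// 2. multiple values, single type
std::vector<int>::iterator next_values();     // sync: iterator
void async_values(void (*cb)(int));           // async: observer, one call per value

// 3. multiple values, multiple types
boost::variant<int, std::string> next_event();  // sync: variant + visitor
struct listener {                               // async: event listener
    virtual void on_int(int) {}
    virtual void on_string(const std::string&) {}
    virtual ~listener() {}
};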

> the paper seems laudable: accepting an argument that allows the caller
> to specify whether to provide results with a callback, a future or
> suspend-and-resume.

Definitely, and it is a significant step forward.

Niall Douglas

unread,
Jan 22, 2014, 6:55:57 PM1/22/14
to bo...@lists.boost.org
On 22 Jan 2014 at 18:37, Bjorn Reese wrote:

> > Niall, would you be able to propose a more universal model? Please
> > read this as a simple invitation rather than a challenge. The goal of
>
> A good place to start is to understand the limitations of the Asio
> model.

A great post Bjorn. You definitely explained it better than I. And my
public thanks to you for all your work with me on improving AFIO
(I'll be replying to your email soon, I ought to submit my maths
coursework tomorrow).

I should elaborate on the importance of easily chaining operations
into patterns as it's probably non-obvious. Storage i/o, unlike fifo
i/o, does not scale linearly with queue depth, because of highly
non-linear variance in callback latencies caused by OS caching
effects overlaid on mechanical motors. So making it easy for the
programmer to fiddle with operation chain patterns is paramount for
maximum performance, especially as it's mainly a trial and error
thing given the complexities of the systems which make up filing
systems etc. You basically design an access pattern you think ought
to be performant under both cold and warm cache scenarios, test it,
find yourself a bit wide of the mark, and then hunt around for the
goldilocks zone through repeated testing cycles. I haven't personally
found any better way than this yet, sadly; storage is so very
non-linear at the micro-level.

Doing this using ASIO style callbacks involves a ton of cutting and
pasting code around, several iterations of compilation to remedy the
resulting syntax errors, and more iterations of debugging because now
you've broken the storage etc. Alternatively, doing this using AFIO
style chained ops is far easier and quicker because you simply tweak
the op dependency graph you're sending to the async closure engine to
execute, and you let the engine figure out how best to send it to the
OS and ASIO.
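
By way of illustration only -- this is plain std::future, not AFIO's
interface -- the difference in effort looks something like this:

#include <cstdio>
#include <future>

int main() {
    // Two independent "reads" -- the runtime may run them in any order.
    std::future<int> r1 = std::async(std::launch::async, [] { return 1; });
    std::future<int> r2 = std::async(std::launch::async, [] { return 2; });

    // One dependent "write": the dependency is stated by consuming the
    // futures, not by nesting callbacks, so re-plumbing the pattern means
    // editing these few lines rather than restructuring handler code.
    std::future<int> w = std::async(std::launch::async,
        [&r1, &r2] { return r1.get() + r2.get(); });

    std::printf("%d\n", w.get());    // prints 3
}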

There is also an additional debugging option made available because
with a formal op dependency graph one could have code check that your
sequence of reads and writes is power-loss safe and race condition
free with other processes reading and writing the same files. Using
an ASIO style callback model that would be hard without additional
metadata being supplied.

As much as AFIO style async code is a bit heavy due to the use of
futures, for the kind of latencies you see with big chunks of data
being moved around it's an affordable overhead given the convenience.

Gavin Lambert

unread,
Jan 22, 2014, 7:26:30 PM1/22/14
to bo...@lists.boost.org
On 23/01/2014 06:37, Quoth Bjorn Reese:
> The first limitation pertains to chained operations. In Asio you chain
> operations by initiating the next operation when you receive a callback
> from the previous operation. Although we can initiate several operations
> at the same time, these operations are not chained. I am going to ignore
> scatter I/O here because it has its own limitations (e.g. it allows
> either multiple reads or multiple writes, but not combinations thereof.)

I'm having trouble understanding this. A chained operation must by
definition be one operation being called as some other operation
completes, and can never possibly refer to operations running in parallel.

You can certainly have multiple operations related in fashions other
than chains, either by giving them the same callback target object, or
by calling them through the same strand, or by calling them on some
object that has some internal policy about how concurrent operations are
managed, or by making a new composite operation that internally manages
sub-operations in a fashion invisible to the caller.

> The second limitation is about multiple types. The Asio model assumes
> that there is a one-to-one correspondence between the type that I
> request and the type I receive. This is perfectly fine for Asio because
> it just deals with buffers. However, if you create an asynchronous RPC
> server using the same kind of callback mechanism as Asio, then you want
> to request a function of any type and receive a function of a specific
> type. In this design you have multiple "return types" (received function
> signatures.)

Or each "function" is just a custom async operation. You don't request
a function and then try to interrogate it, you just execute operations.

It's pretty easy to define new I/O objects in the ASIO model and give
them whatever async functionality you want.

Granted, it's all in-process, so you'd have the added complication of
injecting some sort of serialisation and remoting to make it RPC, but
it's still fairly readily achievable, I think.
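
For instance, a minimal (and entirely hypothetical) custom I/O object
might look like this; a real one would also handle work tracking and
cancellation:

#include <boost/asio.hpp>
#include <functional>
#include <iostream>

// Custom "I/O object": exposes an operation in the usual ASIO
// initiate/callback shape, completing through the io_service.
class echo_source {
public:
    explicit echo_source(boost::asio::io_service& io) : io_(io) {}

    template <class Handler>  // Handler: void(boost::system::error_code, int)
    void async_echo(int value, Handler handler) {
        // A real I/O object would start an OS-level operation here;
        // the sketch just posts an immediate completion.
        io_.post(std::bind(handler, boost::system::error_code(), value));
    }

private:
    boost::asio::io_service& io_;
};

int main() {
    boost::asio::io_service io;
    echo_source src(io);
    src.async_echo(42, [](boost::system::error_code, int v) {
        std::cout << "got " << v << "\n";
    });
    io.run();
}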

Bjorn Reese

unread,
Jan 24, 2014, 7:13:43 AM1/24/14
to bo...@lists.boost.org
On 01/23/2014 01:26 AM, Gavin Lambert wrote:

> I'm having trouble understanding this. A chained operation must by
> definition be one operation being called as some other operation
> completes, and can never possibly refer to operations running in parallel.

Think of the execution of chained operations as analogous to the
execution of CPU instructions.

Niall has already explained the situation where all chained operations
should be passed to the scheduler to avoid latency. This is analogous
to avoiding a CPU pipeline flush.

You can also have chained operations that are commutative, so the
scheduler can reorder them for better performance. This is analogous
to out-of-order CPU execution.

> Or each "function" is just a custom async operation. You don't request
> a function and then try to interrogate it, you just execute operations.

Can you elaborate? If I have the following event listener, how would
it look and be used with your suggestion?

class gui_event {
public:
    virtual void on_key(int key);
    virtual void on_help(int x, int y);
};

Niall Douglas

unread,
Jan 24, 2014, 7:46:05 AM1/24/14
to bo...@lists.boost.org
On 24 Jan 2014 at 13:13, Bjorn Reese wrote:

> > I'm having trouble understanding this. A chained operation must by
> > definition be one operation being called as some other operation
> > completes, and can never possibly refer to operations running in parallel.
>
> Think of the execution of chained operations as analogous to the
> execution of CPU instructions.
>
> Niall has already explained the situation where all chained operations
> should be passed to the scheduler to avoid latency. This is analogous
> to avoiding a CPU pipeline flush.

That's a good analogy, but there are significant differences in
orders of scaling. Where a pipeline stall in a CPU may cost you 10x,
and a main memory cache line miss may cost you 200x, you're talking a
50,000x cost to a warm filing system cache miss. There are also very
different queue depth scaling differences, so for example the SATA
AHCI driver on Windows gets exponentially slow if you queue more than
a few hundred ops to it simultaneously, whereas the Windows FS cache
layer will happily scale to tens of thousands of simultaneous ops
without blinking. How many FS cache layer ops turn into how many SATA
AHCI driver ops is very non-trivial, and essentially it becomes a
statistical analysis of black box behaviour which I would assume is
not even static across OS releases.

> You can also have chained operations that are commutative, so the
> scheduler can reorder them for better performance. This is analogous
> to out-of-order CPU execution.

Indeed that is the very point of chaining: you can say to AFIO that
this group A of operations here can complete in any order and I don't
care, but I don't want this group B of operations here to occur
until the very last operation in group A completes. This affords
maximum scope to the OS kernel to reorder operations to complete as
fast as possible without losing data integrity/causing races. It's
this sort of metadata that the ASIO callback model simply doesn't
specify.
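
A toy rendering of that constraint with nothing but the standard
library (again, not AFIO's actual interface):

#include <cstdio>
#include <future>
#include <vector>

int main() {
    std::vector<std::future<void>> group_a;
    for (int i = 0; i < 4; ++i)                 // group A: any order is fine
        group_a.push_back(std::async(std::launch::async,
            [i] { std::printf("A%d\n", i); }));

    // The single dependency edge: B must not start until all of A is done.
    auto barrier = std::async(std::launch::async, [&group_a] {
        for (auto& f : group_a)
            f.wait();
        std::printf("group B may start now\n");
    });
    barrier.wait();
}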

It's actually really unfortunate that more of this stuff isn't
documented explicitly in OS documentation. If you're into filing
systems, then you know it, but otherwise people just assume that
reading and writing persistent data is just like any other kind of
i/o. The Unix abstraction of making fd's identical for any kind of
i/o when there are very significant differences underneath in
semantics is mainly to blame I assume.

Gavin Lambert

unread,
Jan 27, 2014, 5:16:16 PM1/27/14
to bo...@lists.boost.org
On 25/01/2014 01:13, Quoth Bjorn Reese:
>> Or each "function" is just a custom async operation. You don't request
>> a function and then try to interrogate it, you just execute operations.
>
> Can you elaborate? If I have the following event listener, how would
> it look and be used with your suggestion?
>
> class gui_event {
> public:
>     virtual void on_key(int key);
>     virtual void on_help(int x, int y);
> };

You flip it around. Instead of having an event listener object that is
registered on some event provider source, where the provider source
invokes the methods explicitly when an event arrives, you have anything
that is interested in events invoke an async request on the source
object. So it'd be something more like this:

class gui_source {
public:
    // actually using templates to make the callback more generic
    void async_key(void (*callback)(error_code ec, int key));
    void async_help(void (*callback)(error_code ec, int x, int y));
};

The code on the receiving side just handles a callback instead of
receiving an explicit call, but otherwise it's basically the same.
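
Fleshed out a little, purely as a sketch -- hypothetical,
single-threaded, and single-shot, which matters for the policy
questions below:

#include <boost/system/error_code.hpp>
#include <functional>
#include <vector>

class gui_source {
public:
    template <class Handler>   // Handler: void(error_code, int key)
    void async_key(Handler handler) {
        key_waiters_.push_back(handler);   // consumed on the next key event
    }

    // Invoked by the event loop when a key actually arrives.
    void deliver_key(int key) {
        std::vector<std::function<void(boost::system::error_code, int)>> waiters;
        waiters.swap(key_waiters_);        // single-shot: clear before calling
        for (auto& w : waiters)
            w(boost::system::error_code(), key);
    }

private:
    std::vector<std::function<void(boost::system::error_code, int)>> key_waiters_;
};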

You still have to externally define your threading model (e.g. GUI events
typically assume they're always called back on the same thread), the
policy on whether events are forwarded to all listeners or only
first-come-first-served, whether callbacks are supposed to be ordered in
some way, and whether callbacks are "persistent" (request once, called
many times) or "single-shot" (one callback per request, as in ASIO); if
the latter, you must also decide what happens if an event arrives when a
particular listener is in between listen calls.

Because of the extra complexity, it's definitely *easier* to use the
direct-notifier pattern, which is why most UI frameworks do that. But
it's *possible* to use these successfully even with single-shot
callbacks -- just look at AJAX long-polling for a real-world example.

Bjorn Reese

unread,
Jan 31, 2014, 4:28:39 AM1/31/14
to bo...@lists.boost.org
On 01/27/2014 11:16 PM, Gavin Lambert wrote:

> You flip it around. Instead of having an event listener object that is
> registered on some event provider source, where the provider source
> invokes the methods explicitly when an event arrives, you have anything
> that is interested in events invoke an async request on the source
> object. So it'd be something more like this:
>
> class gui_source {
> public:
>     // actually using templates to make the callback more generic
>     void async_key(void (*callback)(error_code ec, int key));
>     void async_help(void (*callback)(error_code ec, int x, int y));
> };
>
> The code on the receiving side just handles a callback instead of
> receiving an explicit call, but otherwise it's basically the same.

This is an interesting idea. Although it does involve a lot of plumbing,
I agree that it can be done.

As we are exploring the limitations of the Asio model, let me introduce
a use case that is difficult to do within this kind of parallel
initiator-callback paradigm.

Consider a secure RPC server, whose full API only can be used if the
client has the correct privileges. For simplicity, let us assume that
this is a single-client server. There are two modes: unauthenticated
and authenticated. In unauthenticated mode, the server should reject all
but the authentication requests. The way you typically would do this is
to have separate implementations of the API for each mode, and when the
client has been authenticated, the server will switch from the
unauthenticated to the authenticated implementation. This wholesale
replacement of the underlying implementation is much more difficult
to do with the parallel initiator-callback style. We could solve the
problem with another level of indirection, but that would effectively
re-introduce the event listener.
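
A sketch of the wholesale replacement I have in mind (all names
invented for illustration):

#include <memory>
#include <string>

struct request { std::string method; };
struct reply   { bool ok; };

struct rpc_api {                            // one implementation per mode
    virtual ~rpc_api() {}
    virtual reply handle(const request&) = 0;
};

struct unauthenticated_api : rpc_api {
    reply handle(const request& r) {        // reject all but authentication
        reply rep = { r.method == "authenticate" };
        return rep;
    }
};

struct authenticated_api : rpc_api {
    reply handle(const request&) {          // the full API lives here
        reply rep = { true };
        return rep;
    }
};

class server {
public:
    server() : impl_(new unauthenticated_api) {}

    reply on_request(const request& r) {
        reply rep = impl_->handle(r);
        if (r.method == "authenticate" && rep.ok)
            impl_.reset(new authenticated_api);   // the wholesale swap
        return rep;
    }

private:
    std::unique_ptr<rpc_api> impl_;
};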

Gavin Lambert

unread,
Feb 2, 2014, 5:05:30 PM2/2/14
to bo...@lists.boost.org
On 31/01/2014 22:28, Quoth Bjorn Reese:
> Consider a secure RPC server, whose full API only can be used if the
> client has the correct privileges. For simplicity, let us assume that
> this is a single-client server. There are two modes: unauthenticated
> and authenticated. In unauthenticated mode, the server should reject all
> but the authentication requests. The way you typically would do this is
> to have separate implementations of the API for each mode, and when the
> client has been authenticated, the server will switch from the
> unauthenticated to the authenticated implementation. This wholesale
> replacement of the underlying implementation is much more difficult
> to do with the parallel initiator-callback style. We could solve the
> problem with another level of indirection, but that would effectively
> re-introduce the event listener.

The way to handle that, I would think, would be to have the "public" API
be more limited in scope (not an identical copy that mostly returns
not-authenticated errors), and to provide an "async_authenticate"
request that calls back (single-shot) with an interface that provides
the complete API.
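
In outline, with hypothetical signatures:

#include <memory>
#include <boost/system/error_code.hpp>

class full_api;          // the complete, privileged interface
struct credentials {};   // whatever the authentication scheme needs

class public_api {
public:
    // Deliberately small surface: authentication is the only door.
    // Single-shot: on success the handler receives the complete API,
    // and all further requests go through that object instead.
    template <class Handler>  // Handler: void(error_code, std::shared_ptr<full_api>)
    void async_authenticate(credentials c, Handler handler);
};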

Async requests wouldn't carry over between the authenticated and
unauthenticated API, but you wouldn't want that anyway -- most clients
would authenticate first before making any other requests, and
one of the clients might be a "broker" that wants to maintain multiple
independently authenticated connections with different credentials.

Granted that's outside the scope of a single-client server, but the
design would be easier to scale if it turns out that single-client isn't
sufficient.


Again, I'm not saying that Asio-style callbacks are the "best" way of
implementing RPC or UI models. Just that it's not impossible to do so.
(But at some level, you need a serialisation+networking layer to
actually transfer requests between processes or machines. This can be
made completely transparent to both server and client, but something has
to be able to translate all the possible types of request.)

Bjorn Reese

unread,
Feb 6, 2014, 5:27:30 AM2/6/14
to bo...@lists.boost.org
On 02/02/2014 11:05 PM, Gavin Lambert wrote:

> The way to handle that, I would think, would be to have the "public" API
> be more limited in scope (not an identical copy that mostly returns
> not-authenticated errors), and to provide an "async_authenticate"
> request that calls back (single-shot) with an interface that provides
> the complete API.

Authentication is just one example of how the API may need to change its
operational mode dynamically. You could also have a maintenance mode,
a defensive mode (against denial-of-service attacks), a budget vs
premium mode, and so on.

How do I change from defensive mode back to normal mode?

> Again, I'm not saying that Asio-style callbacks are the "best" way of
> implementing RPC or UI models. Just that it's not impossible to do so.

It may just be my lack of imagination, but I cannot see how to do a
mode change with the Asio model.

> (But at some level, you need a serialisation+networking layer to
> actually transfer requests between processes or machines. This can be
> made completely transparent to both server and client, but something has
> to be able to translate all the possible types of request.)

Yes, and that applies to any model, so I am not holding that against
the Asio model.

Gavin Lambert

unread,
Feb 6, 2014, 5:07:06 PM2/6/14
to bo...@lists.boost.org
On 6/02/2014 23:27, Quoth Bjorn Reese:
> On 02/02/2014 11:05 PM, Gavin Lambert wrote:
>
>> The way to handle that, I would think, would be to have the "public" API
>> be more limited in scope (not an identical copy that mostly returns
>> not-authenticated errors), and to provide an "async_authenticate"
>> request that calls back (single-shot) with an interface that provides
>> the complete API.
>
> Authentication is just one example of how the API may need to change its
> operational mode dynamically. You could also have a maintenance mode,
> a defensive mode (against denial-of-service attacks), a budget vs
> premium mode, and so on.
>
> How do I change from defensive mode back to normal mode?

How are you imagining that the modes change?

For example if a client can dynamically upgrade from budget to premium
mode then that's just another case of authentication.

If it's a global server state change, then probably it would disconnect
all currently subscribed clients (calling them back with an error code)
and let them reconnect to its new API provider, which might refuse
certain operations entirely or place different limits on them.

Bjorn Reese

unread,
Feb 8, 2014, 8:14:00 AM2/8/14
to bo...@lists.boost.org
On 02/06/2014 11:07 PM, Gavin Lambert wrote:

> How are you imagining that the modes change?

As we are exploring the limitations of the Asio model, I would say that
the mode can change in any imaginable manner: it could alternate between
two modes on each request, or it could change only a subset of the
requests rather than all, just to name two. And this would be completely
transparent to the client, because in the general case it is not simply
a matter of accepting or rejecting requests, but about processing them
differently.

So I guess that it boils down to:

1. Can I replace a continuation after the function has been
initiated?
2. Can I group several such replacements so that they will be
replaced at the same time (or at least before the next event
occurs)?

Gavin Lambert

unread,
Feb 9, 2014, 7:20:42 PM2/9/14
to bo...@lists.boost.org
On 9/02/2014 02:14, Quoth Bjorn Reese:
> As we are exploring the limitations of the Asio model, I would say that
> the mode can change in any imaginable manner: it could alternate between
> two modes on each request, or it could change only a subset of the
> requests rather than all, just to name two. And this would be completely
> transparent to the client, because it the general case it is not simply
> a matter of accepting or rejecting requests, but about processing them
> differently.

I still think you're imagining a scenario that doesn't make sense in
practice.

> So I guess that it boils down to:
>
> 1. Can I replace a continuation after the function has been
> initiated?

You have to have something that is holding the continuation so that it
can be called later. There's no reason *in principle* why this cannot
be handed from one object to another as desired as long as the
conceptual operation is still in progress as far as the caller is
concerned. (In fact, this actually happens in ASIO -- a pending
operation is held by the I/O object until it completes, then is passed
to the generic scheduler to execute the user handler.)

If something significant happens that the caller is likely to want to
know about (such as changing license state etc), then it may be
worthwhile cancelling the connection (by calling back with an error
code) and getting the client to reconnect, because it might want to set
up a different set of pending operations / subscriptions given the new
state.

When there are multiple pending operations, there's no reason why you
can't cancel some, initiate others, and leave the rest going (depending
on what makes sense for the particular change in question).

(I said "in principle" above because the current ASIO model makes heavy
use of templates for performance and to enable auxiliary features such
as strands and custom allocation; the way it's implemented at the
moment, these can't survive a type-erasure boundary such as would be
required to pass through an RPC system. Instead you'd have to replicate
these on each side -- but you'd probably want that anyway.)

> 2. Can I group several such replacements so that they will be
> replaced at the same time (or at least before the next event
> occurs)?

In the context of "a thing that implements ASIO-like callbacks", sure.
As I said above, you'd just have to move the list of pending operations
from one implementation to the other, and the caller wouldn't know the
difference as long as whatever handle it uses to make async requests is
still valid. (Things get hairier if you're processing events on
multiple threads, though.)
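
Roughly like this, as a single-threaded sketch with invented names:

#include <deque>
#include <functional>
#include <memory>
#include <utility>

typedef std::function<void(int)> handler;

struct impl {                              // one per mode
    std::deque<handler> pending;           // continuations awaiting events
    virtual ~impl() {}
    virtual void deliver(int event) {
        while (!pending.empty()) {
            handler h = pending.front();
            pending.pop_front();
            h(event);
        }
    }
};

class facade {                             // what the caller holds
public:
    facade() : current_(new impl) {}

    void async_wait(handler h) { current_->pending.push_back(h); }

    // Mode change: hand the pending continuations to the new
    // implementation; the caller never notices the swap.
    void switch_mode(std::unique_ptr<impl> next) {
        next->pending = std::move(current_->pending);
        current_ = std::move(next);
    }

private:
    std::unique_ptr<impl> current_;
};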

In the context of ASIO specifically, not really, at least not once the
operations have been marked as complete and ready to execute. But I'm
still not really sure why you'd want to.