[boost] Review of proposed Boost.Async

Marcelo Zimbres Silva via Boost

Aug 13, 2023, 3:46:13 AM
to Boost, Marcelo Zimbres Silva
Hi, this is my review of the proposed Boost.Async. Before I issue my
final decision (ACCEPT/REJECT) I would like to ask the author some
questions.

First of all, I have plenty of reasons to want a library such as
Boost.Async into Boost. Some of them are

* A common concern I hear from my peers is that Asio is too difficult
to get started with. One reason is that its Universal Asynchronous
Model, built around the completion token, is too generic for most use
cases and in some cases even encourages bad practice, like the
use_future token. A library like Boost.Async that adds a thin,
coroutine-only layer over Asio, with corresponding facilities like
select, gather, etc., will lower the barrier to entry into Asio's
asynchronous world.

* Docs: Even Asio experts agree that Asio documentation is not easily
digestible (async_initiate I am looking at you) and I think it will be
difficult for its author alone to address users' complaints given how
low-level and generic Asio is. Also, regular users should not be
bothered with many of the details. Boost.Async documentation is linear
and to the point. It uses names that are more easily recognized by
people coming from other languages, like select and with.

* Node.js, Go, Tokio, etc. have shown us that successful networking
libraries flourish once a decent asynchronous environment is
available. We have more than that: Asio is robust and battle-tested.
But I am afraid it might not be the right ground on which high-level
abstractions should be prototyped and evolved; not every library needs
Duff's device to shave off a few milliseconds. As I mentioned above,
it is perhaps too generic and low-level, and a layer over it might be
necessary. Boost.Async looks like the correct step in that direction.
We will be one step closer to writing code that is as simple as
Python, Node.js, etc. but runs at C++ speed.

* It will help us gather more field experience with C++20 coroutines
in domains where C++ is extensively used, like high-performance
network servers.

* Boost.Async has good defaults, for example a single-threaded
context. Too many people play around with multi-threaded io_contexts
and strands without knowing that this might actually have a negative
impact on performance. This is also likely to play well with other
Boost networking libraries like Boost.Beast, Boost.MySql and
Boost.Redis. The automatic installation of signal handling is also a
good thing.

Q1: Default completion token for other Boost libraries
================================================================

I expect it will be very common for apps using Boost.Async to change
the default completion token to use_op or use_task, as you do in the
examples:

> using tcp_acceptor = async::use_op_t::as_default_on_t<tcp::acceptor>;
> using tcp_socket = async::use_op_t::as_default_on_t<tcp::socket>;

Could this be provided automatically, at least for other Boost
libraries? I know this would be a lot of work, but it would make user
code less verbose and users would not have to face the token concept
at first.

Q2: Lazy vs Eager
================================================================

> It’s an eager coroutine and recommended as the default;

Great, we can experiment with eagerness and laziness. But why is an
eager coroutine recommended as default?

Q3: async_ready looks great
================================================================

> We can however implement our own ops, that can also utilize the
> async_ready optimization. To leverage this coroutine feature, async
> provides an easy way to create a skipable operation:

I think I have a use case for this feature: my first implementation of
the RESP3 parser for Boost.Redis was based on Asio's async_read_until,
which, like every Asio async function, calls the completion as if by
post. The cost of this design is high in situations where the next
\r\n delimiter is already in the buffer when async_read_until is
called again: the resulting rescheduling with post is unnecessary and
greatly impacts performance, so being able to skip the post has clear
benefits. But what does *an easy way to create a skipable operation*
actually mean? Does it

- avoid a suspension point?
- avoid a post?
- act like a regular function call?

Q4: Synchronization primitives
================================================================

Will this library ever add synchronization primitives like async
mutexes, condition variables, barriers, etc.? Or are they supposed to
come from an external library like the proposed Boost.Sem?

Q5: Boost.Async vs Boost.Asio
================================================================

I use C++20 coroutines whenever I can but know very little about their
implementation. They just seem to work in Asio and look very flexible
with use_awaitable, deferred, use_promise and use_coro. What advantage
will Boost.Async bring over plain Asio with regard to coroutines?

NOTE1
================================================================

> Please state your experience with C++ coroutines and ASIO in your
> review, and how many hours you spent on the review.

I have spent a bit more than a day reading the docs, writing the
review and integrating Boost.Redis.

My experience with C++20 coroutines is limited to using them with Asio.

I would also like to thank Klemens for submitting Boost.Async and
Niall for offering to be its review manager.

Marcelo

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Klemens Morgenstern via Boost

Aug 13, 2023, 4:49:47 AM
to bo...@lists.boost.org, Klemens Morgenstern
On Sun, Aug 13, 2023 at 3:46 PM Marcelo Zimbres Silva via Boost
<bo...@lists.boost.org> wrote:
>
> Hi, this is my review of the proposed Boost.Async. Before I issue my
> final decision (ACCEPT/REJECT) I would like to ask the author some
> questions.

Thanks for the review and letting me address those questions first!

>
> Q1: Default completion token for other Boost libraries
> ================================================================
>
> I guess it will be a very common thing for all apps using Boost.Async
> to have to change the default completion token to use_op or use_task.
> Like you do in the examples
>
> > using tcp_acceptor = async::use_op_t::as_default_on_t<tcp::acceptor>;
> > using tcp_socket = async::use_op_t::as_default_on_t<tcp::socket>;
>
> Could this be provided automatically, at least for other Boost
> libraries? I know this would be a lot of work, but it would make user
> code less verbose and users would not have to face the token concept
> at first.

It could be, but I am also experimenting with how an async.io library
could look, i.e. one that is async-only (co_await stream.read()) and
banishes the asio complexity to the translation units. You can look at
my experiments here:
https://github.com/klemens-morgenstern/async/pull/8


>
> Q2: Lazy vs Eager
> ================================================================
>
> > It’s an eager coroutine and recommended as the default;
>
> Great, we can experiment with eagerness and laziness. But why is an
> eager coroutine recommended as default?

Because that's usually what I would want. If I have a coroutine
`async_do_the_thing()` I would expect it to do the thing even if I
don't await the result. I think this is intuitive, especially for
those unfamiliar with asio.

>
> Q3: async_ready looks great
> ================================================================
>
> > We can however implement our own ops, that can also utilize the
> > async_ready optimization. To leverage this coroutine feature, async
> > provides an easy way to create a skipable operation:
>
> I think I have a use case for this feature: My first implementation of
> the RESP3 parser for Boost.Redis was based on Asio's async_read_until,
> which like every Asio async function, calls completion as if by post.
> The cost of this design is however high in situations where the next
> \r\n delimiter is already in the buffer when async_read_until is
> called again. The resulting rescheduling with post is actually
> unnecessary and greatly impacts performance, being able to skip the
> post has performance benefits. But what does *an easy way to create a
> skipable operation* actually mean? Does it
>
> - avoid a suspension point?
> - avoid a post?
> - act like a regular function call?

There are two mechanisms at work here:

- if you provide a ready() function in a custom op, it will avoid
suspension altogether, and thus behave like a normal function call
- if immediate completion is awaitable, the coroutine will suspend,
but resume right away, thus avoiding a post

>
> Q4: Synchronization primitives
> ================================================================
>
> Will this library ever add synchronization primitives like async
> mutexes, condition variables, barriers etc. Or are they supposed to be
> used from an external library like proposed Boost.Sem.

It has channels that can do the mutex & barrier jobs. I don't like
using names like mutex et al., because they imply thread safety. Since
async is single-threaded you can actually use std::mutex if you need
to cross threads, btw.

If we only have one mechanism, channels are the most versatile, but I
think there is indeed a need for something condition-variable-like. I
just don't know what it is yet; I am thinking maybe a pub/sub utility,
like a multicast channel, might do the trick.

I would however like to base this on user experience.
The channel model has proven itself in other languages, so I am
confident enough.

>
> Q5: Boost.Async vs Boost.Asio
> ================================================================
>
> I use C++20 coroutines whenever I can but know very little about their
> implementation. They just seem to work in Asio and look very flexible
> with use_awaitable, deferred, use_promise and use_coro. What advantage
> will Boost.Async bring over using plain Asio in regards to coroutine?

It's open to any awaitable, i.e. a user can just co_await whatever
they want. asio prevents this by design because it can assume far less
about the environment. That is, asio::awaitable cannot await anything
other than itself and an async op, not even asio::experimental::coro.

Furthermore, all of those are meant for a potentially threaded
environment, so everything they do needs to be an async_op; e.g.
operator|| internally does multiple co_spawns through a `parallel_group`.

Because async can assume more about how it's run, it can provide a
loss-less select, which `operator||` cannot.

Likewise the channels work (mostly) without post, because they just
switch from one coro to the other, whereas asio's channels need to use
posting etc.

So in short: it's more efficient because it's optimized for
single-threaded environments, and it gives you better synchronization
mechanisms and better extensibility.

I don't see the asio coroutines as competition, they just solve a
different use-case.
If you were to ask me which to use in boost.redis, I'd tell you to go
for asio's.
But if you wanted to write a small chat server based on boost
libraries, I'd recommend async.

Christian Mazakas via Boost

Aug 14, 2023, 11:04:41 AM
to bo...@lists.boost.org, Christian Mazakas
> It's open to any awaitable, i.e. a user can just co_await whatever he wants.
> asio prevents this by design because it has way less it can assume
> about the environment.
> That is, asio::awaitable cannot await anything other than itself and
> an async op, not even asio::experimental::coro.

Where is this shown in the docs?

I'd like to see a fully working example of a custom awaitable and
I didn't see it in the docs on a cursory glance.

> I don't see the asio coroutines as competition, they just solve a
> different use-case.

In general, I'm not sure I see much compelling difference between this
library and whatever Asio provides.

- Christian

Marcelo Zimbres Silva via Boost

Aug 14, 2023, 3:35:31 PM
to Klemens Morgenstern, Marcelo Zimbres Silva, bo...@lists.boost.org
On Sun, 13 Aug 2023 at 10:49, Klemens Morgenstern
<klemensdavi...@gmail.com> wrote:
> > Q1: Default completion token for other Boost libraries
> > ================================================================
> >
> > I guess it will be a very common thing for all apps using Boost.Async
> > to have to change the default completion token to use_op or use_task.
> > Like you do in the examples
> >
> > > using tcp_acceptor = async::use_op_t::as_default_on_t<tcp::acceptor>;
> > > using tcp_socket = async::use_op_t::as_default_on_t<tcp::socket>;
> >
> > Could this be provided automatically, at least for other Boost
> > libraries? I know this would be a lot of work, but it would make user
> > code less verbose and users would not have to face the token concept
> > at first.
>
> If could be, but I am also experimenting with how an async.io
> library could look. I.e. one that is async only (co_await
> stream.read()) and banished the asio complexity to the translations
> units. You can look at my experiments here
> https://github.com/klemens-morgenstern/async/pull/8

This would be great, but then why not get this merged before the
Boost review? One of the parts of Boost.Async that I find most
valuable is that it is a user-friendly frontend to Asio. The PR above
would put even more weight on that. It looks like a very large PR
that should not be missed in the Boost review.

Also, merging at a later point loses an advertising opportunity, since
some people might not feel compelled to come back and look at the
release notes.

> > Q3: async_ready looks great
> > ================================================================
> >
> > > We can however implement our own ops, that can also utilize the
> > > async_ready optimization. To leverage this coroutine feature, async
> > > provides an easy way to create a skipable operation:
> >
> > I think I have a use case for this feature: My first implementation of
> > the RESP3 parser for Boost.Redis was based on Asio's async_read_until,
> > which like every Asio async function, calls completion as if by post.
> > The cost of this design is however high in situations where the next
> > \r\n delimiter is already in the buffer when async_read_until is
> > called again. The resulting rescheduling with post is actually
> > unnecessary and greatly impacts performance, being able to skip the
> > post has performance benefits. But what does *an easy way to create a
> > skipable operation* actually mean? Does it
> >
> > - avoid a suspension point?
> > - avoid a post?
> > - act like a regular function call?
>
> Theren are two mechanisms at work here:
>
> - if you provide a ready() function in a custom op, it will avoid
> suspension altogether, thus be like a normal function call
> - if immediate completion is awaitable, the coroutine will suspend,
> but resume rightaway. Thus avoid a post.

I think this needs more detailed examples for each individual
optimization possibility (act like a function call, skip a post). The
doc says

> Do the wait if we need to

but refers to

    void initiate(async::completion_handler<system::error_code> complete) override
    {
        tim.async_wait(std::move(complete));
    }

which clearly does not do any *if needed* check.

> While the above is used with asio, you can also use these handlers
> with any other callback based code.

IMO this statement also needs to be reformulated and perhaps given an example.

> > Q5: Boost.Async vs Boost.Asio
> > ================================================================
> >
> > I use C++20 coroutines whenever I can but know very little about their
> > implementation. They just seem to work in Asio and look very flexible
> > with use_awaitable, deferred, use_promise and use_coro. What advantage
> > will Boost.Async bring over using plain Asio in regards to coroutine?
>
> It's open to any awaitable, i.e. a user can just co_await whatever
> he wants. asio prevents this by design because it has way less it
> can assume about the environment. That is, asio::awaitable cannot
> await anything other than itself and an async op, not even
> asio::experimental::coro.

I am trying to make sense of this statement. Are you referring to
what is shown in example/delay_op.cpp? Do the docs teach how to write
an awaitable so that I can profit from using Boost.Async? How often
will users have to do that? Or are the default awaitables provided by
the library already good enough (as shown in the benchmarks)?

IIUC, this would be the strongest selling point of this library in
comparison to using plain Asio so the docs could put more weight on
that.

> Furthermore all of those are meant for potentially threaded
> environment so everything they do needs to be an async_op,
> i.e. operator|| internally does multiple co_spawns through a `parallel_group`.
>
> Because async can assume more things about how it's run, it can
> provide a loss-less select, which `operator||` cannot do.

What is a loss-less select? In any case, I find it great that we can
have so much performance improvement by being able to assume a
single-threaded environment.

I must say, however, that I am surprised that my plain-Asio code is
running slower than it could, although I am using single-threaded
io_contexts.

> Likewise the channels work (mostly) without post, because they just
> switch from one coro to the other, whereas asio's channels need to
> use posting etc.

Symmetric transfer?

> I don't see the asio coroutines as competition, they just solve a
> different use-case.

I don't see why it is not a competition. I only do single-threaded
contexts in Asio which is exactly what Boost.Async does. Please
elaborate.

Klemens Morgenstern via Boost

Aug 14, 2023, 7:20:58 PM
to bo...@lists.boost.org, Klemens Morgenstern
On Mon, Aug 14, 2023 at 11:04 PM Christian Mazakas via Boost
<bo...@lists.boost.org> wrote:
>
> > It's open to any awaitable, i.e. a user can just co_await whatever he wants.
> > asio prevents this by design because it has way less it can assume
> > about the environment.
> > That is, asio::awaitable cannot await anything other than itself and
> > an async op, not even asio::experimental::coro.
>
> Where is this shown in the docs?
>
> I'd like to see a fully working example of a custom awaitable and
> I didn't see it in the docs on a cursory glance.
>

There is no example in the docs, because this is part of the language.
I.e. `co_await std::suspend_never()` is a valid statement in most
coroutines.

It's in the python example:
https://github.com/klemens-morgenstern/async/blob/master/example/python.cpp#L234


> > I don't see the asio coroutines as competition, they just solve a
> > different use-case.
>
> In general, I'm not sure I see much compelling difference between this
> library and whatever Asio provides.

How much experience with "whatever asio provides" do you have?

Ruben Perez via Boost

Aug 15, 2023, 4:03:26 PM
to boost@lists.boost.org List, Ruben Perez
Hi all,

Thanks Klemens for submitting this library.

I've taken a different approach for the review this time.
I've tried to build something useful with it. I was already writing a web chat
application with a C++ backend. It was using Boost.Context coroutines
(the ones you get with boost::asio::spawn). I have rewritten the server
to use Boost.Async coroutines.

This email is not a review (I'll write it later), but just a summary of my
experience using the library, in case anyone finds it useful.

You can see the full source code here:
https://github.com/anarthal/servertech-chat/tree/async-rewrite/server

The application
===============

It is a super-simplistic chat application, where users can send
messages that get broadcast to other users in the same chat room. The
application uses an HTTP webserver to serve static files, websockets
for message broadcasting and Redis for message persistence. It uses
Boost.Asio, Boost.Beast, Boost.Redis and now Boost.Async (together
with other foundational Boost libraries).

My experience
=============

I've been able to rewrite it and make it work. I've had some problems
(which I describe below) but have been able to solve them. Control
flow was mostly identical to stackful coroutines, except in some key
places I describe below.

I've used the use_op token extensively to interoperate with Beast and Redis.
As proposed, it causes problems with Beast's websockets, as it's moving
the websocket's internal implementation. As a result, after the first initiation
that uses use_op, the websocket is rendered unusable. I could overcome it
by slightly modifying use_op's implementation. This is tracked by
https://github.com/klemens-morgenstern/async/issues/68. It's still under
discussion whether this is a bug in Async or in Beast.

use_op uses a per-operation, stack-based memory pool of 2KB internally. It
creates a completion handler with an associated allocator that will consume
this stack-based memory before calling operator new. The pool gets destroyed
once the co_await expression finishes execution. I had a problem with
Boost.Redis, where redis::connection::async_exec was calling complete() without
releasing the memory allocated in the operation, which led to crashes.
I think it's a problem in Boost.Redis, not in Async. Tracked by
https://github.com/boostorg/redis/issues/140.

By default, use_op converts error codes to exceptions. This is consistent
with Asio's use_awaitable, but it's a setting I personally don't like using.
However, it's easy to adapt it by using asio::as_tuple(async::use_op) as
completion token. This should probably be mentioned in docs.

I'm using async::with to clean-up Redis connections on application exit. I
think it's a very useful feature, and one that Asio doesn't offer. It would be
a nice addition to be able to return values from the with call though,
to make code less verbose. Tracked by
https://github.com/klemens-morgenstern/async/issues/69.

As any Beast-based HTTP server, my code has a TCP acceptor that listens
for connections and launches HTTP sessions. At first, I was running these
sessions as detached promises (by using async::promise::operator+).
This caused a deadlock on finalization. async::co_main does not cancel detached
promises, but does wait for them to finish after receiving a SIGINT.
I solved this by placing my promises in an async::wait_group instead of
detaching them. However, I think this behavior is non-obvious and can cause
problems for people less experienced with Asio. I've raised
https://github.com/klemens-morgenstern/async/issues/72 to track this.

I'm currently using detached promises to broadcast messages over websocket
sessions. Compared to asio::spawn, spawning detached tasks in Boost.Async
is significantly easier.

As you may know, when using Beast's websockets, you need to make sure
that the application doesn't perform two concurrent async reads or
writes on the same websocket. In this chat server, two concurrent
writes can happen if two messages need to be broadcast at the same
time. Thus, I was using a hand-rolled async_mutex, based on Asio
channels, to guarantee mutual exclusion (unfortunately, there is no
asio::mutex).

I tried re-writing my async_mutex using async::channel, but I couldn't, because
my implementation relied on asio::channel::try_send, which Async does not offer.
It would be great if this library could offer async locking primitives,
like mutex, condition_variable or event, like Python's asyncio does.
I don't think this is a requirement for inclusion, though.

I then decided to remove the mutex and base my strategy on channels. Websocket
reads and writes are now managed by a single coroutine. If any other session
needs to write through a different session's websocket, it writes a message
through a channel. The coroutine running the websocket session reads from
the websocket and the channel in parallel, using async::select.

My first, naive implementation used
async::select(websocket.async_read(..., use_op), channel.read()).
This does not work as intended because, if the channel read completes first,
select will cancel the websocket read, which will cause the websocket connection
to close. This can be overcome easily with generators, as they implement
the wait_interrupt mechanism. If the channel read completes first, the generator
can be resumed later, without losing information.

Although there exists an asio::experimental::parallel_group with similar
functionality, I don't think this channel-based strategy could be achieved using
plain Asio, since the non-destructive capabilities of the generator are vital.
parallel_group is also almost undocumented, which makes it difficult to use.

My only concern with channels as they are is that channel.read() will throw an
exception on cancellation. This means that
async::select(websocket_generator, channel.read())
will throw-and-catch an exception internally every time a message arrives
on the websocket. I believe that exceptions should be exceptional,
and it's not the case here. It makes debugging harder, and may have a
performance impact (which I haven't measured).
https://github.com/klemens-morgenstern/async/issues/76 tracks this.

I've been able to successfully build this project using clang-17 under Linux.
I had problems building wait_group under gcc-12
(https://github.com/klemens-morgenstern/async/issues/73).

Some other minor issues I found:
* https://github.com/klemens-morgenstern/async/issues/75
* https://github.com/klemens-morgenstern/async/issues/66
* https://github.com/klemens-morgenstern/async/issues/71


Some of these issues have already been fixed in the develop branch by Klemens.
If you're trying the library, I'd advise you to use this branch.

Questions
=========

Q1. Is it possible to somehow address the deadlock problem?
Q2. Is it possible to get non-throwing overloads of the channel functions?
Q3. The docs mention that MSVC compilers have broken coroutine
implementations. Which compilers and OSes does this library target,
and which has it been tested with?

Thanks Klemens for making the async world less painful.

Regards,
Ruben.