The context switch is a small part of the overhead. In Seastar the
switch is mediated by the scheduler and there is extra overhead due to
future/promise integration. I doubt that switching to boost.context will
bring a significant benefit.
Here's a perf report from thread_context_switch_test:
  6.62%  thread_context_  thread_context_switch_test  [.] seastar::reactor::run_tasks
  6.48%  reactor-1        thread_context_switch_test  [.] seastar::reactor::run_tasks
  5.27%  reactor-1        thread_context_switch_test  [.] seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::chrono::_V2::steady_clock>::wait
  4.49%  thread_context_  thread_context_switch_test  [.] seastar::basic_semaphore<seastar::semaphore_default_exception_factory, std::chrono::_V2::steady_clock>::wait
  4.43%  thread_context_  thread_context_switch_test  [.] seastar::reactor::add_task
  4.41%  reactor-1        thread_context_switch_test  [.] seastar::reactor::add_task
  4.13%  reactor-1        thread_context_switch_test  [.] seastar::noncopyable_function<void ()>::direct_vtable_for<context_switch_tester::_t1::{lambda()#1}>::call
  4.07%  reactor-1        thread_context_switch_test  [.] seastar::noncopyable_function<void ()>::direct_vtable_for<context_switch_tester::_t2::{lambda()#1}>::call
  3.98%  thread_context_  thread_context_switch_test  [.] seastar::noncopyable_function<void ()>::direct_vtable_for<context_switch_tester::_t1::{lambda()#1}>::call
  3.90%  thread_context_  thread_context_switch_test  [.] seastar::internal::future_base::do_wait
  3.86%  reactor-1        thread_context_switch_test  [.] seastar::internal::future_base::do_wait
  3.84%  thread_context_  thread_context_switch_test  [.] seastar::noncopyable_function<void ()>::direct_vtable_for<context_switch_tester::_t2::{lambda()#1}>::call
  3.22%  thread_context_  thread_context_switch_test  [.] seastar::(anonymous namespace)::thread_wake_task::run_and_dispose
  2.61%  reactor-1        thread_context_switch_test  [.] seastar::(anonymous namespace)::thread_wake_task::run_and_dispose
  1.93%  thread_context_  thread_context_switch_test  [.] seastar::memory::cpu_pages::allocate_small
  1.85%  reactor-1        libc-2.31.so                [.] __sigsetjmp
  1.74%  reactor-1        thread_context_switch_test  [.] seastar::memory::cpu_pages::allocate_small
  1.70%  thread_context_  libc-2.31.so                [.] __sigsetjmp
  1.68%  thread_context_  libpthread-2.31.so          [.] __GI___pthread_cleanup_upto
  1.63%  reactor-1        libpthread-2.31.so          [.] __GI___pthread_cleanup_upto
  1.45%  thread_context_  libc-2.31.so                [.] __libc_siglongjmp
  1.42%  reactor-1        libc-2.31.so                [.] __libc_siglongjmp
  1.38%  reactor-1        libc-2.31.so                [.] __longjmp
  1.35%  thread_context_  libc-2.31.so                [.] __longjmp
  1.28%  thread_context_  thread_context_switch_test  [.] seastar::internal::promise_base::clear
  1.26%  thread_context_  thread_context_switch_test  [.] seastar::internal::promise_base::promise_base
  1.26%  reactor-1        thread_context_switch_test  [.] seastar::internal::promise_base::clear
  1.24%  reactor-1        thread_context_switch_test  [.] seastar::internal::promise_base::promise_base
  1.04%  reactor-1        thread_context_switch_test  [.] seastar::memory::cpu_pages::free
  1.04%  reactor-1        thread_context_switch_test  [.] seastar::jmp_buf_link::switch_in
  1.03%  thread_context_  thread_context_switch_test  [.] seastar::memory::cpu_pages::free
  0.95%  thread_context_  thread_context_switch_test  [.] seastar::jmp_buf_link::switch_out
  0.92%  thread_context_  thread_context_switch_test  [.] seastar::jmp_buf_link::switch_in
  0.87%  reactor-1        libc-2.31.so                [.] _longjmp_unwind
  0.80%  thread_context_  libc-2.31.so                [.] _longjmp_unwind
  0.72%  reactor-1        thread_context_switch_test  [.] seastar::jmp_buf_link::switch_out
  0.63%  thread_context_  thread_context_switch_test  [.] operator new
  0.59%  thread_context_  libc-2.31.so                [.] __sigjmp_save
  0.57%  thread_context_  thread_context_switch_test  [.] seastar::memory::free
  0.55%  reactor-1        thread_context_switch_test  [.] operator new
  0.55%  reactor-1        thread_context_switch_test  [.] seastar::memory::free
  0.54%  reactor-1        libc-2.31.so                [.] __sigjmp_save
Admittedly the test is doing some needless work: there is memory
allocation that shouldn't be there. Still, it's clear that
setjmp/longjmp do not dominate.
> Would there still be reason to migrate to boost.context for seastar::threads now that c++20 coroutines are coming?
Threads might still be faster when there are frequent
potentially-blocking operations that rarely block. For example, writing
small buffers to an output_stream backed by a file. Due to write-behind,
the output_stream rarely needs to block.
Threads may be able to keep more state in registers in this case. Or the
compiler may be able to optimize coroutines in an equivalent way. We
still need to check.
>
> (According to my understanding, I should still use seastar::threads for implementing (non-tail) recursive algorithms, as c++20 coroutines do not support nested yielding/awaiting and non-tail recursion. Is my understanding indeed correct?)
Well, you shouldn't be recursing on the default 128k stack, but yes.
Threads and coroutines are more or less equivalent here, with coroutines
requiring less memory.
> And what would be approx. the impact of cache pollution of stackful coroutines on the running time overhead? Won’t those in the typical case dominate the context switching time itself? (In that case, a migration to boost.context would not pay off from an overall perspective)
It's very hard to quantify these things. I can hand-wave all day about
it, but in the end it's difficult to back that up with hard measurements.
For our application, we only use threads when concurrency is limited, so
most of the change will be from continuations to coroutines, and the
goal is to make the code simpler, not speed it up (though we expect some
speedup too).
> Kind regards,
> Niek
>