[boost] TSAN SEGFAULT in Boost.Beast CI


Richard Hodges via Boost

Aug 12, 2022, 2:25:20 PM
to boost@lists.boost.org List, Richard Hodges
Dear all,

You may have noticed that one of the Beast CI checks fails on the master
and develop branches.

The failure is a segfault in thread sanitizer. Running on my local machine
I get the stack trace embedded below.

This does not happen if Asio is reverted to version 1.79.0.

Chris has confirmed that the change in Asio that caused this is in
asio::spawn.

Note that at this stage there is no reason to suspect that Asio is actually
at fault. I believe it to be a false positive. However, I am an absolute
noob when it comes to the Dark Arts of thread sanitizer. I have no idea
what to blacklist.

This email has two aims:

1. To prevent multiple people wasting their time trying to get to the
bottom of this,
2. To encourage some sage TSAN-maester to emerge who may guide me to
enlightenment.

Here is the error and stack trace:

testing.capture-output
bin.v2/libs/beast/test/beast/http/read.test/gcc-12/debug/link-static/thread-sanitizer-norecover/threading-multi/visibility-hidden/read.run
====== BEGIN OUTPUT ======
beast.http.read
ThreadSanitizer:DEADLYSIGNAL
==1761540==ERROR: ThreadSanitizer: SEGV on unknown address
0x7f6ea78ff000 (pc 0x7f6eaaaba0d0 bp 0x000000000000 sp 0x7f6ea783d910
T1761542)
==1761540==The signal is caused by a READ memory access.
#0 __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1,
20>::Put(__sanitizer::StackTrace, bool*) <null> (libtsan.so.2+0xba0d0)
#1 __tsan::CurrentStackId(__tsan::ThreadState*, unsigned long)
<null> (libtsan.so.2+0x8c48f)
#2 __sanitizer::DD::MutexInit(__sanitizer::DDCallback*,
__sanitizer::DDMutex*) <null> (libtsan.so.2+0xac534)
#3 __tsan::DDMutexInit(__tsan::ThreadState*, unsigned long,
__tsan::SyncVar*) <null> (libtsan.so.2+0x9a3f8)
#4 __tsan::MetaMap::GetSync(__tsan::ThreadState*, unsigned long,
unsigned long, bool, bool) <null> (libtsan.so.2+0xa85dc)
#5 __tsan_atomic32_fetch_add <null> (libtsan.so.2+0x783e9)
#6 __gnu_cxx::__exchange_and_add(int volatile*, int)
/usr/include/c++/12/ext/atomicity.h:66 (read+0x41ad46)
#7 __gnu_cxx::__exchange_and_add_dispatch(int*, int)
/usr/include/c++/12/ext/atomicity.h:101 (read+0x41ad46)
#8 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use()
/usr/include/c++/12/bits/shared_ptr_base.h:187 (read+0x41ad46)
#9 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
/usr/include/c++/12/bits/shared_ptr_base.h:361 (read+0x40d87c)
#10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
/usr/include/c++/12/bits/shared_ptr_base.h:1071 (read+0x41b43c)
#11 std::__shared_ptr<boost::asio::detail::strand_executor_service::strand_impl,
(__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
/usr/include/c++/12/bits/shared_ptr_base.h:1524 (read+0x424db7)
#12 std::shared_ptr<boost::asio::detail::strand_executor_service::strand_impl>::~shared_ptr()
/usr/include/c++/12/bits/shared_ptr.h:175 (read+0x424de3)
#13 boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>,
0ul> const, void>::~invoker() <null> (read+0x486f4d)
#14 boost::asio::detail::executor_op<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>,
0ul> const, void>, boost::asio::detail::recycling_allocator<void,
boost::asio::detail::thread_info_base::default_tag>,
boost::asio::detail::scheduler_operation>::do_complete(void*,
boost::asio::detail::scheduler_operation*, boost::system::error_code
const&, unsigned long) <null> (read+0x49ff36)
#15 boost::asio::detail::scheduler_operation::complete(void*,
boost::system::error_code const&, unsigned long)
boost/asio/detail/scheduler_operation.hpp:40 (read+0x513742)
#16 boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&,
boost::asio::detail::scheduler_thread_info&, boost::system::error_code
const&) boost/asio/detail/impl/scheduler.ipp:492 (read+0x501be9)
#17 boost::asio::detail::scheduler::run(boost::system::error_code&)
boost/asio/detail/impl/scheduler.ipp:210 (read+0x5008af)
#18 boost::asio::io_context::run()
boost/asio/impl/io_context.ipp:63 (read+0x4f54d6)
#19 boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}::operator()() const <null> (read+0x41480d)
#20 void std::__invoke_impl<void,
boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}>(std::__invoke_other,
boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}&&) <null> (read+0x4bb4bc)
#21 std::__invoke_result<boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}>::type
std::__invoke<boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}>(boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}&&) <null> (read+0x4b8c78)
#22 void std::thread::_Invoker<std::tuple<boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) <null>
(read+0x4b5b00)
#23 std::thread::_Invoker<std::tuple<boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}> >::operator()() <null> (read+0x4b1f74)
#24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<boost::beast::test::enable_yield_to::enable_yield_to(unsigned
long)::{lambda()#1}> > >::_M_run() <null> (read+0x4ab02e)
#25 execute_native_thread_routine <null> (libstdc++.so.6+0xdbb72)
#26 __tsan_thread_start_func <null> (libtsan.so.2+0x393ef)
#27 start_thread <null> (libc.so.6+0x8ce2c)
#28 clone3 <null> (libc.so.6+0x1121af)

ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV (/lib64/libtsan.so.2+0xba0d0) in
__sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1,
20>::Put(__sanitizer::StackTrace, bool*)
==1761540==ABORTING

EXIT STATUS: 66
====== END OUTPUT ======




--
Richard Hodges
hodg...@gmail.com


William Linkmeyer via Boost

Aug 13, 2022, 10:49:04 AM
to bo...@lists.boost.org, William Linkmeyer
That’s interesting. I don’t see any reason why that lock would fail. Clang’s analyzers do come up with false positives sometimes.

If I can get around to it this weekend, I’ll run this under valgrind. If that comes back positive, I’ll run it under a couple of different compilers and their default standard libraries: Circle, Clang, Apple Clang, MSVC. (This tsan looks like it’s using gcc’s non-extended stdlib.)

But it’ll be a busy weekend; I’m trying to learn Haskell and put together a pet project. So, if anyone has the time, the notes above are what I’d do.

WL


Richard Hodges via Boost

Aug 13, 2022, 10:53:50 AM
to William Linkmeyer, Richard Hodges, bo...@lists.boost.org
On Sat, 13 Aug 2022 at 16:48, William Linkmeyer <wli...@gmail.com> wrote:

> That’s interesting. I don’t see any reason why that lock would fail.
> Clang’s analyzers do come up with false positives sometimes.
>
> If I can get around to it this weekend, I’ll run this under valgrind. If
> that comes back positive, I’ll run it under a couple different compilers
> and their default standard libraries; circle, clang, apple clang, msvc.
> (This tsan looks like it’s using gcc’s non-extended stdlib.)
>

Thank you William. Your support is greatly appreciated.
For what it's worth, our CI does run tests under valgrind, and they pass.
An example run can be found here:
https://drone.cpp.al/boostorg/beast/326/11/2

William Linkmeyer via Boost

Aug 13, 2022, 11:06:25 AM
to Richard Hodges, William Linkmeyer, bo...@lists.boost.org
If these CI tests are running under a container, could you please link to that containerfile? I have noticed that clang’s tsan in particular behaves differently on different operating systems, even with the same compiler, sysroots and stdlib.

WL


William Linkmeyer via Boost

Aug 13, 2022, 11:09:20 AM
to Richard Hodges, William Linkmeyer, bo...@lists.boost.org
(For example, with the same minimal sysroot, compiler and stdlib, tsan may pass on Ubuntu 18.x and fail on 20.x.)

WL


William Linkmeyer via Boost

Aug 13, 2022, 11:12:05 AM
to Richard Hodges, William Linkmeyer, bo...@lists.boost.org
(And fails in a very similar way to this, in my experience; otherwise stable behavior, segfault, abort.)

WL


Richard Hodges via Boost

Aug 13, 2022, 3:54:27 PM
to bo...@lists.boost.org, Richard Hodges
This is the drone CI script:
https://github.com/boostorg/beast/blob/develop/.drone/drone.sh

Using the drone images mentioned here:
https://github.com/boostorg/beast/blob/develop/.drone.star

But please note that I also get a SEGFAULT when running this tsan test on
my local machine, which is Fedora 36 running on Intel tin.

Command line to reproduce:
./b2 toolset=gcc thread-sanitizer=norecover link=static variant=debug libs/beast/test -q -d+2 -j1

Richard Hodges via Boost

Aug 16, 2022, 1:04:50 PM
to bo...@lists.boost.org, Richard Hodges
Just a quick note to say that thanks to Chris Kohlhoff's intervention, this
segfault has been worked around.

The issue is that TSAN becomes confused when tracking a fiber across
threads, so the code has been modified accordingly.
This seems to be a limitation in TSAN rather than poor Boost library code.
The code that manifested this was actually part of the test infrastructure,
not beast itself.

https://stackoverflow.com/a/73375092/2015579

Peter Dimov via Boost

Aug 16, 2022, 1:36:39 PM
to bo...@lists.boost.org, Peter Dimov
Richard Hodges wrote:
> Just a quick note to say that thanks to Chris Kohlhoff's intervention, this
> segfault has been worked around.
>
> The issue is that TSAN becomes confused when tracking a fiber across threads,
> so the code has been modified accordingly.
> This seems to be a limitation in TSAN rather than poor Boost library code.
> The code that manifested this was actually part of the test infrastructure, not
> beast itself.
>
> https://stackoverflow.com/a/73375092/2015579

Good that this is fixed, but I'm still of the opinion that it needs to be reported
to https://github.com/google/sanitizers.