Fibers "SEGV on unknown address"


Cynthia Coan

Sep 9, 2020, 12:39:30 PM
to thread-sanitizer
Hey All,

I'm currently trying to add sanitizer support to an internal Fibers library. I figured out how to get ASAN working, but with TSAN I'm getting errors:

```
ThreadSanitizer:DEADLYSIGNAL
==12==ERROR: ThreadSanitizer: SEGV on unknown address 0x60000212fff8 (pc 0x00000031dd2b bp 0x7fff3added50 sp 0x7fff3added28 T12)
==12==The signal is caused by a WRITE memory access.
ThreadSanitizer:DEADLYSIGNAL
ThreadSanitizer: nested bug in the same thread, aborting.
```

(I've put a full backtrace from lldb on a gist: https://gist.github.com/SecurityInsanity/3fb238de1140cc4202ef46218874d48c)

I'm unclear whether this is because I'm integrating incorrectly, or because what I'm doing isn't currently supported by TSAN. From reading https://reviews.llvm.org/D54889, it seems there is support for setjmp/longjmp, but some messages on that review thread also imply that support for what I'm doing wasn't merged.

To give an idea of what my code is doing: it uses `makecontext()`/`setcontext()` initially, and then once a fiber has been placed on a thread we use `_setjmp`/`_longjmp` for switches past that point.

As a result the switching code looks like (I've removed some of the things like the asan blocks to hopefully make it clearer):

```cpp
void FiberBase::switch_to(FiberBase* to) {
  if (likely(_setjmp(_env) == 0)) {
    __tsan_switch_to_fiber(to->_tsan_fiber, 0);

    // First switch into the thread uses stack prepared by makecontext.
    // After that we can use setjmp / longjmp for subsequent calls.
    if (!to->_initialized) {
      setcontext(&to->_ctxt);
    } else {
      _longjmp(to->_env, 1);
    }
  }
}
```

So is `_setjmp`/`_longjmp` actually supported? The review leaves that unclear. If it is supported, is there anything specifically wrong with the switching example above?

Thanks,
Cynthia

Cynthia Coan

Sep 11, 2020, 2:01:54 PM
to thread-sanitizer
Looking at the tests for compiler-rt, it seems they use `sigsetjmp`/`siglongjmp`, so I switched to those to see whether I'd get a more sensible error message, and I get:

```
ThreadSanitizer: can't find longjmp buf
FATAL: ThreadSanitizer CHECK failed: /home/nnelson/Documents/llvm-project/llvm/utils/release/rc2/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:532 "((0)) != (0)" (0x0, 0x0)
   #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /home/nnelson/Documents/llvm-project/llvm/utils/release/rc2/llvm-project/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp:47:25 (coven-fiber-tests+0x320ee5)
   #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /home/nnelson/Documents/llvm-project/llvm/utils/release/rc2/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:78:5 (coven-fiber-tests+0x337c8f)
   #2 LongJmp(__tsan::ThreadState*, unsigned long*) /home/nnelson/Documents/llvm-project/llvm/utils/release/rc2/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:532:3 (coven-fiber-tests+0x2b763d)
   #3 siglongjmp /home/nnelson/Documents/llvm-project/llvm/utils/release/rc2/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:632:3 (coven-fiber-tests+0x2b783a)
   #4 coven::base::context::FiberBase::switch_to(coven::base::context::FiberBase*) /proc/self/cwd/projects/coven/base/context_base.cc:92:7 (coven-fiber-tests+0x357a37)
   #5 coven::Fiber::yield() /proc/self/cwd/projects/coven/fiber.cc:146:8 (coven-fiber-tests+0x3512df)
```

This seems to be the same error message qemu was originally getting. I'm a bit curious why it wouldn't be able to find the longjmp buf. For reference, here's the code again with macros removed, now using `sigsetjmp`/`siglongjmp`:

```cpp
void FiberBase::switch_to(FiberBase* to) {
   if (likely(sigsetjmp(_env, 0) == 0)) {

#if defined(__SANITIZE_THREAD__)
     __tsan_switch_to_fiber(to->_tsan_fiber, 0);
#endif

     // First switch into the thread uses stack prepared by makecontext.
     // After that we can use setjmp / longjmp for subsequent calls.
     if (!to->_initialized) {
       setcontext(&to->_ctxt);
     } else {
       siglongjmp(to->_env, 1);
     }
   }
}
```

Dmitry Vyukov

Sep 16, 2020, 4:05:46 AM
to Cynthia Coan, thread-sanitizer, yu...@acronis.com
Hi Cynthia,

I have all of that already paged out from my memory, but I will try to
do my best. Also +Yuri who added the fibers change.

> ==12==ERROR: ThreadSanitizer: SEGV on unknown address 0x60000212fff8 (pc 0x00000031dd2b bp 0x7fff3added50 sp 0x7fff3added28 T12)

Addresses in the 0x60000... range are used for traces:
https://github.com/llvm/llvm-project/blob/a8a85166d81f573af7ff325fdf93dd8bdfdeddbf/compiler-rt/lib/tsan/rtl/tsan_platform.h#L39

And the trace object starts with a stack trace:
https://github.com/llvm/llvm-project/blob/a8a85166d81f573af7ff325fdf93dd8bdfdeddbf/compiler-rt/lib/tsan/rtl/tsan_trace.h#L47

The low bytes of fff8 suggest an underflow. So I would assume the
thread's stack trace underflowed, which suggests that tsan did not
understand some of the fiber switching code and added a frame on one
fiber and removed it on the wrong fiber.

> ThreadSanitizer: can't find longjmp buf

This is also an indication that tsan does not understand the longjmp,
which may lead to messed-up stack traces and a stack frame pop on the
wrong context.

> curious why it wouldn't be able to find the longjmp buf.

Tsan stores jmp_bufs per thread on setjmp, and then on longjmp it
tries to find the stored jmp_buf pointer in the per-thread context.
This message means it couldn't find it.
One reason may be that we stored the jmp_buf in one thread context, but
then tried to find it in another thread context (due to fiber switches).
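
To illustrate the lookup failure described above, here is a toy model in C++. It is not TSan's actual code: `ContextState`, `switch_context`, `record_setjmp`, and `find_on_longjmp` are hypothetical names invented for this sketch, and a jmp_buf is represented by a plain integer id. It only shows how a buffer recorded under one context becomes invisible once the "current" context is repointed by a fiber switch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the state a sanitizer keeps per thread/fiber context.
struct ContextState {
    std::vector<std::uintptr_t> jmp_bufs;  // ids of jmp_bufs recorded at setjmp time
};

// The context that setjmp/longjmp bookkeeping currently applies to.
static ContextState* g_current = nullptr;

// Models a fiber switch: bookkeeping is repointed to the target context.
void switch_context(ContextState* to) { g_current = to; }

// Models setjmp: remember the buffer in the *current* context only.
void record_setjmp(std::uintptr_t buf_id) {
    assert(g_current != nullptr);
    g_current->jmp_bufs.push_back(buf_id);
}

// Models longjmp: search the *current* context only.
bool find_on_longjmp(std::uintptr_t buf_id) {
    assert(g_current != nullptr);
    const auto& bufs = g_current->jmp_bufs;
    return std::find(bufs.begin(), bufs.end(), buf_id) != bufs.end();
}

// setjmp and longjmp under the same context: the buffer is found.
bool demo_lookup_same_context() {
    ContextState a;
    switch_context(&a);
    record_setjmp(0x1000);
    bool found = find_on_longjmp(0x1000);
    switch_context(nullptr);
    return found;
}

// setjmp under context a, longjmp after switching to context b:
// the buffer lives in a's list, so the lookup in b fails.
bool demo_lookup_after_switch() {
    ContextState a, b;
    switch_context(&a);
    record_setjmp(0x1000);
    switch_context(&b);
    bool found = find_on_longjmp(0x1000);
    switch_context(nullptr);
    return found;
}
```

In this model `demo_lookup_same_context()` returns true and `demo_lookup_after_switch()` returns false; the second case corresponds to the "can't find longjmp buf" situation, under the assumption that the bookkeeping really is keyed by the current context.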

> I'm a bit unclear though if this is because I'm perhaps integrating wrong, or if the things I'm doing aren't currently supported inside of TSAN.

setjmp/longjmp are supported for some limited use cases, namely the
intended use within a single thread. They are also supported for
anything covered by tests.
Try to add a test with the crux of what you are doing. If it works,
then it makes sense to add such a test so that it does not break in
the future. If it does not work, then by that fact it's not
supported.
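
As a possible starting point for such a test, here is a minimal sketch of a single-thread fiber ping-pong that uses only `makecontext`/`swapcontext` plus the TSan fiber annotations from `<sanitizer/tsan_interface.h>`. It assumes Linux/glibc `ucontext`; `HAVE_TSAN`, `switch_ctx`, `fiber_body`, and `run_ping_pong` are names made up for this sketch. It deliberately leaves out `sigsetjmp`/`siglongjmp`, so it is a baseline to extend rather than a reproduction of the failure in this thread.

```cpp
#include <ucontext.h>
#include <vector>

// Detect TSan under both Clang (__has_feature) and GCC (__SANITIZE_THREAD__).
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define HAVE_TSAN 1
#  endif
#endif
#if defined(__SANITIZE_THREAD__) && !defined(HAVE_TSAN)
#  define HAVE_TSAN 1
#endif
#ifdef HAVE_TSAN
#  include <sanitizer/tsan_interface.h>
#endif

static ucontext_t main_ctx, fiber_ctx;
static void* main_tsan = nullptr;   // TSan handle for the thread's own context
static void* fiber_tsan = nullptr;  // TSan handle for the fiber
static int steps = 0;

// Tell TSan about the switch first, then perform the real context switch.
static void switch_ctx(ucontext_t* from, ucontext_t* to, void* to_tsan) {
#ifdef HAVE_TSAN
    __tsan_switch_to_fiber(to_tsan, 0);
#else
    (void)to_tsan;
#endif
    swapcontext(from, to);
}

// Fiber entry point: count one step, then yield back to main, forever.
static void fiber_body() {
    for (;;) {
        ++steps;
        switch_ctx(&fiber_ctx, &main_ctx, main_tsan);
    }
}

// Resume the fiber `rounds` times and return how many steps it ran.
int run_ping_pong(int rounds) {
    std::vector<char> stack(256 * 1024);
    steps = 0;
#ifdef HAVE_TSAN
    main_tsan = __tsan_get_current_fiber();
    fiber_tsan = __tsan_create_fiber(0);
#endif
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack.data();
    fiber_ctx.uc_stack.ss_size = stack.size();
    fiber_ctx.uc_link = &main_ctx;  // not reached: fiber_body never returns
    makecontext(&fiber_ctx, fiber_body, 0);
    for (int i = 0; i < rounds; ++i) {
        switch_ctx(&main_ctx, &fiber_ctx, fiber_tsan);
    }
#ifdef HAVE_TSAN
    __tsan_destroy_fiber(fiber_tsan);
#endif
    return steps;
}
```

Calling `run_ping_pong(3)` from `main` and building with `-fsanitize=thread` exercises the annotated path; without the flag the same code runs as a plain `ucontext` ping-pong. Adding a `sigsetjmp`/`siglongjmp` fast path on top of this would be the next step toward the crux of the original code.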

Cynthia Coan

Sep 30, 2020, 12:07:48 AM
to Dmitry Vyukov, thread-sanitizer, yu...@acronis.com
Hey Dmitry,

Thanks for the message, this was really helpful for someone not very familiar with TSAN. I've attempted writing a test case (although it's not much bigger than some of my failing test cases); the key factor seems to be combining `setcontext` with `sigsetjmp`/`siglongjmp`. It sounds like this is what you described:

> Tsan stores jmp_buf's on setjmp per thread, and then on longjmp it
> tries to find the stored jmp_buf pointer in the per-thread context.
> This message means it couldn't find it.
> One reason may be that we stored jmp_buf in one thread context, but
> then trying to find in another thread context (due to fiber switches).

Although they are in separate "contexts", they are on the same thread, which based on my understanding is safe (and is why the code continues to work without actually seeing data races). However, it sounds like you're saying TSAN stores jmp_bufs in the actual context (does that make sense for TSAN?), instead of in, say, a `thread_local`. If so, is the answer to change jmp_bufs to be stored per thread rather than per thread context? Would that break anything?

Thanks,
Cynthia