Re: Deadlock with TSan on xpcshell tests


Dmitry Vyukov

Mar 20, 2020, 7:13:18 AM
to Christian Holler, thread-sanitizer
On Fri, Mar 20, 2020 at 11:28 AM Christian Holler <cho...@mozilla.com> wrote:
>
> Hi Dmitry,
>
> this morning, I was debugging intermittent timeouts in our CI and tried
> to reproduce them locally. I managed to provoke several hangs in our
> xpcshell tests. Attaching with GDB, it looks somewhat like this:
>
>
> > Program received signal SIGINT, Interrupt.
> > Lock () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:649
> > 649
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:
> > No such file or directory.
> > (gdb) bt
> > #0 0x000055f46b117c60 in Lock() () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_linux.cc:649
> > #1 0x000055f46b19efb2 in Lock () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_thread_registry.h:97
> > #2 0x000055f46b19efb2 in GenericScopedLock () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_mutex.h:183
> > #3 0x000055f46b19efb2 in ReportRace() () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cc:683
> > #4 0x000055f46b1a39ba in __tsan_report_race_thunk () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl_amd64.S:133
> > #5 0x000055f46b194468 in HandleRace () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:620
> > #6 0x000055f46b194468 in MemoryAccessImpl1 () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:696
> > #7 0x000055f46b194468 in MemoryAccess () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.cc:869
> > #8 0x000055f46b194468 in MemoryWrite () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_rtl.h:746
> > #9 0x000055f46b194468 in __tsan_write4() () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_inl.h:45
> > #10 0x00007fee8b0d8a76 in ForkedChild () at
> > /srv/repos/mozilla-central/security/nss/lib/softoken/pkcs11.c:543
> > #11 0x00007fef2bdb8bd2 in __libc_fork () at ../sysdeps/nptl/fork.c:204
> > #12 0x000055f46b13c5e2 in __interceptor_fork() () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:2106
> > #13 0x00007fef318ce8c1 in LaunchApp() () at
> > /srv/repos/mozilla-central/ipc/chromium/src/base/process_util_linux.cc:232
> > #14 0x00007fef31979e38 in DoLaunch() () at
> > /srv/repos/mozilla-central/ipc/glue/GeckoChildProcessHost.cpp:1199
> > #15 0x00007fef31977364 in PerformAsyncLaunch() () at
> > /srv/repos/mozilla-central/ipc/glue/GeckoChildProcessHost.cpp:957
> > #16 0x00007fef3199039f in applyImpl<mozilla::ipc::BaseProcessLauncher,
> > RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults,
> > mozilla::ipc::LaunchError, false> >
> > (mozilla::ipc::BaseProcessLauncher::*)()> ()
> > at
> > /srv/repos/mozilla-central/objdir-ff-fuzzing-tsan/dist/include/nsThreadUtils.h:1158
> > #17 0x00007fef3199039f in apply<mozilla::ipc::BaseProcessLauncher,
> > RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults,
> > mozilla::ipc::LaunchError, false> >
> > (mozilla::ipc::BaseProcessLauncher::*)()> ()
> > at
> > /srv/repos/mozilla-central/objdir-ff-fuzzing-tsan/dist/include/nsThreadUtils.h:1164
> > #18 0x00007fef3199039f in Invoke () at
> > /srv/repos/mozilla-central/objdir-ff-fuzzing-tsan/dist/include/mozilla/MozPromise.h:1329
> > #19 0x00007fef3199039f in Run() () at
> > /srv/repos/mozilla-central/objdir-ff-fuzzing-tsan/dist/include/mozilla/MozPromise.h:1349
> > #20 0x00007fef30ee7a19 in Run() () at
> > /srv/repos/mozilla-central/xpcom/threads/TaskQueue.cpp:207
> > #21 0x00007fef30ef4f38 in ProcessNextEvent() () at
> > /srv/repos/mozilla-central/xpcom/threads/nsThread.cpp:1220
> > #22 0x00007fef30ef9bb6 in NS_ProcessNextEvent() () at
> > /srv/repos/mozilla-central/xpcom/threads/nsThreadUtils.cpp:481
> > #23 0x00007fef319a9a2e in Run() () at
> > /srv/repos/mozilla-central/ipc/glue/MessagePump.cpp:332
> > #24 0x00007fef318d9aad in RunInternal () at
> > /srv/repos/mozilla-central/ipc/chromium/src/base/message_loop.cc:315
> > #25 0x00007fef318d9aad in RunHandler () at
> > /srv/repos/mozilla-central/ipc/chromium/src/base/message_loop.cc:308
> > #26 0x00007fef318d9aad in Run() () at
> > /srv/repos/mozilla-central/ipc/chromium/src/base/message_loop.cc:290
> > #27 0x00007fef30ef124c in ThreadFunc() () at
> > /srv/repos/mozilla-central/xpcom/threads/nsThread.cpp:464
> > #28 0x00007fef3fea6e85 in _pt_root () at
> > /srv/repos/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:201
>
>
> Does this make any sense to you? I believe the source lines are from
> LLVM release/9.x branch.
>
>
> Thanks in advance,
>
> Chris

+thread-sanitizer mailing list

Hi Chris,

This thread is just waiting on the ReportMutex. Some other thread should
be holding the mutex and not releasing it; that one is the real root
cause. 'thread apply all bt' may shed some light.

Christian Holler

Mar 20, 2020, 8:14:34 AM
to Dmitry Vyukov, thread-sanitizer
Hi :)


Thanks for the quick response.

Actually, there are no other threads running when this happens:

> (gdb) info threads
>   Id   Target Id         Frame
> * 1    Thread 0x7ff7f031f700 (LWP 30923) "IPC Launch"
> atomic_exchange<__sanitizer::atomic_uint32_t> () at
> /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h:67

Best,

Chris

Dmitry Vyukov

Mar 20, 2020, 12:07:03 PM
to Christian Holler, thread-sanitizer
On Fri, Mar 20, 2020 at 1:14 PM Christian Holler <cho...@mozilla.com> wrote:
>
> Hi :)
>
>
> thanks for the quick response.
>
> Actually, there are no other threads running when this happens:
>
> > (gdb) info threads
> > Id Target Id Frame
> > * 1 Thread 0x7ff7f031f700 (LWP 30923) "IPC Launch"
> > atomic_exchange<__sanitizer::atomic_uint32_t> () at
> > /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h:67

Ah, I see. It's our old friend -- fork. And ForkedChild is called
directly from fork: I see it's set up as a pthread_atfork callback in
libnss.
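For readers unfamiliar with the pattern: a child handler registered with pthread_atfork() runs in the new child process inside fork(), before fork() returns there, which is why ForkedChild appears directly beneath __libc_fork in the backtrace. The sketch below only illustrates that callback shape; the handler and the global (forked_child_handler, g_forked) are hypothetical names, and this is not the actual NSS code. TSan would only flag such a write as a race if the same memory had also been accessed by another thread in the parent without synchronization, which this minimal sketch does not set up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library-global state, standing in for whatever a real
 * library records after a fork. */
static bool g_forked = false;

/* Child handler: runs in the new child, inside fork(), before fork()
 * returns. Under TSan this plain store is an instrumented memory write. */
static void forked_child_handler(void) {
  g_forked = true;
}

static pthread_once_t g_once = PTHREAD_ONCE_INIT;

static void register_atfork_handler(void) {
  int rc = pthread_atfork(NULL, NULL, forked_child_handler);
  assert(rc == 0);
  (void)rc;
}

/* Forks once and returns the child's exit status: 0 if the child handler
 * ran in the child, 1 if it did not, -1 on error. The parent's copy of
 * g_forked is never modified, because the handler only runs in the child. */
int fork_and_check_child_handler(void) {
  pthread_once(&g_once, register_atfork_handler);
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) _exit(g_forked ? 0 : 1);
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```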

We need to figure out how this is different from:
https://github.com/llvm/llvm-project/blob/master/compiler-rt/test/tsan/pthread_atfork_deadlock.c

We probably need to enable all ignores around the call to fork... maybe...
What if you enable all ignores in ForkBefore and disable them again in
ForkParentAfter/ForkChildAfter? You can see how to do it in
ForkChildAfter.
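One way to see why ignores would help: assuming, as the backtrace suggests, that the runtime takes its internal locks (here the thread registry lock reached from ReportRace) before calling the real fork and releases them only after fork returns, any race report triggered by an atfork callback in between tries to re-acquire a lock that the same thread already holds. In the forked child that is the only thread, so it waits forever, which matches the single "IPC Launch" thread in the info threads output. The toy model below is a hypothetical illustration, not TSan code: toy_fork_before/toy_fork_after stand in for ForkBefore and ForkParentAfter/ForkChildAfter, and an error-checking pthread mutex turns the self-deadlock into an EDEADLK return instead of a hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Toy model only, NOT the TSan runtime. */
static pthread_mutex_t report_mtx;            /* stands in for runtime locks */
static _Thread_local int ignore_accesses = 0; /* per-thread ignore counter */

void toy_init(void) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  assert(rc == 0);
  /* ERRORCHECK: relocking by the owner fails with EDEADLK instead of hanging. */
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert(rc == 0);
  rc = pthread_mutex_init(&report_mtx, &attr);
  assert(rc == 0);
  pthread_mutexattr_destroy(&attr);
  (void)rc;
}

/* Instrumented store that "detects a race" and must report it. Returns 0 if
 * the report was produced or skipped, else the pthread_mutex_lock error. */
int toy_instrumented_write(int *p, int v) {
  *p = v;
  if (ignore_accesses > 0) return 0;        /* ignores on: no race check */
  int rc = pthread_mutex_lock(&report_mtx); /* like ReportRace taking a lock */
  if (rc != 0) return rc;
  /* ... the race report would be printed here ... */
  pthread_mutex_unlock(&report_mtx);
  return 0;
}

/* Hypothetical stand-ins for ForkBefore and ForkParentAfter/ForkChildAfter. */
void toy_fork_before(int with_ignores) {
  pthread_mutex_lock(&report_mtx);          /* locks taken before fork */
  if (with_ignores) ignore_accesses++;
}

void toy_fork_after(int with_ignores) {
  if (with_ignores) ignore_accesses--;
  pthread_mutex_unlock(&report_mtx);        /* released after fork returns */
}

/* Models an atfork child callback writing memory inside fork(), i.e. between
 * the two hooks. No real fork() is performed. */
int toy_fork(int with_ignores) {
  static int callback_state;
  toy_fork_before(with_ignores);
  int rc = toy_instrumented_write(&callback_state, 1);
  toy_fork_after(with_ignores);
  return rc;
}
```

After a single toy_init() call, toy_fork(0) returns EDEADLK, while toy_fork(1) returns 0 because the write is ignored and never reaches the lock.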

Christian Holler

Mar 20, 2020, 2:19:52 PM
to Dmitry Vyukov, thread-sanitizer
Hi,

On 20.03.20 17:06, Dmitry Vyukov wrote:
> We probably need to enable all ignores around the call to fork... maybe...
> What if you enable all ignores in ForkBefore and re-enable in
> ForkParentAfter/ForkChildAfter? You can see how to do it right in
> ForkChildAfter.

So you mean duplicating the code from ForkChildAfter into the other Fork
functions you mentioned? If you meant something different, can you send
a patch (or outline it further) so I can try it?


Thanks,

Chris

Dmitry Vyukov

Mar 21, 2020, 9:38:11 AM
to Christian Holler, thread-sanitizer
On Fri, Mar 20, 2020 at 7:19 PM Christian Holler <cho...@mozilla.com> wrote:
>
> Hi,
>
> On 20.03.20 17:06, Dmitry Vyukov wrote:
> > We probably need to enable all ignores around the call to fork... maybe...
> > What if you enable all ignores in ForkBefore and re-enable in
> > ForkParentAfter/ForkChildAfter? You can see how to do it right in
> > ForkChildAfter.
>
> So you mean duplicating the code from ForkChildAfter to the other Fork
> functions you mentioned? If you meant something different, can you send
> a patch (or outline it further), then I can try it.

Please try this:
https://github.com/llvm/llvm-project/commit/be41a98ac222f33ed5558d86e1cede67249e99b5

Christian Holler

Mar 23, 2020, 6:00:15 AM
to Dmitry Vyukov, thread-sanitizer
Hi Dmitry,


The hangs seem to be gone both locally and in CI, thanks! We will of
course monitor the situation further, because some of the hangs were
intermittent and infrequent. If anything remains reproducible, I'll let
you know :)

Thanks again for the quick help.


Cheers,

Chris