sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))"


Rik Prohaska

Aug 31, 2015, 2:57:09 PM
to thread-sanitizer
Hello,

I am seeing this failure:

FATAL: ThreadSanitizer CHECK failed: /build/buildd/llvm-toolchain-snapshot-3.5/projects/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)


when running the tests of some software built with clang 3.5 on Ubuntu 14.04.


Is there a bug fix or workaround for this failure?


Thanks

Dmitry Vyukov

Aug 31, 2015, 3:19:49 PM
to thread-s...@googlegroups.com
Hi Rik,

There is no workaround on the tsan side.

This CHECK fires when a thread tries to acquire more than 64 mutexes
recursively (one under another). Are you sure that you don't leak
locked mutexes? Maybe you forgot to unlock a mutex somewhere?
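To make the failure mode concrete, here is a simplified, hypothetical model of the per-thread bookkeeping that the CHECK implies: a fixed-size table of held locks (64 entries, matching the `(0x40, 0x40)` values in the report) that is checked before each new entry is added. It is not the actual code in sanitizer_deadlock_detector.h; the names `HeldLockTable`, `add` and `remove` are illustrative only. It shows how a lock that is never released keeps occupying a slot until the table is full.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a per-thread table of held locks with a fixed
// capacity, loosely mirroring the CHECK
//   n_all_locks_ < ARRAY_SIZE(all_locks_with_contexts_)
// from the report. This is NOT the real tsan implementation.
class HeldLockTable {
 public:
  static constexpr std::size_t kCapacity = 64;  // 0x40 in the CHECK message

  // Records a newly acquired lock. Returns false in the situation where
  // tsan would instead abort with "CHECK failed": the table is full.
  bool add(std::uint32_t lock_id) {
    if (n_ >= kCapacity) return false;
    locks_[n_++] = lock_id;
    return true;
  }

  // Forgets a released lock by moving the last entry into its slot.
  // Returns false if the lock was not recorded as held.
  bool remove(std::uint32_t lock_id) {
    for (std::size_t i = 0; i < n_; i++) {
      if (locks_[i] == lock_id) {
        locks_[i] = locks_[n_ - 1];
        n_--;
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return n_; }

 private:
  std::array<std::uint32_t, kCapacity> locks_{};
  std::size_t n_ = 0;
};
```

In this model, a thread that acquires 64 distinct locks without releasing any fills the table, and the next acquisition fails; a single missing unlock inside a loop is enough to leak slots until that point is reached.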

Rik Prohaska

Sep 1, 2015, 4:56:47 PM
to thread-sanitizer
I am trying to create a simpler test case. As far as I can tell, the problem occurs in a much larger code base with only 16 mutexes locked.

Dmitry Vyukov

Sep 2, 2015, 5:26:53 AM
to thread-s...@googlegroups.com
Hi Rik,

If you build a fresh clang from source following these instructions:
https://github.com/google/sanitizers/wiki/AddressSanitizerHowToBuild

you can then apply the following patch to the tsan runtime, and tsan
will dump the stacks of the locked mutexes when the CHECK fails. (The
second hunk makes tsan always record the lock stack, which the first
hunk prints.) This will let us understand why the thread holds that
many mutexes at once. You say that only 16 mutexes are locked, but
tsan thinks there are 64; the dump should show why.


```diff
Index: sanitizer_common/sanitizer_deadlock_detector.h
===================================================================
--- sanitizer_common/sanitizer_deadlock_detector.h (revision 245715)
+++ sanitizer_common/sanitizer_deadlock_detector.h (working copy)
@@ -28,6 +28,7 @@
 
 #include "sanitizer_common.h"
 #include "sanitizer_bvgraph.h"
+#include "sanitizer_stackdepot.h"
 
 namespace __sanitizer {
 
@@ -66,7 +67,13 @@
       recursive_locks[n_recursive_locks++] = lock_id;
       return false;
     }
-    CHECK_LT(n_all_locks_, ARRAY_SIZE(all_locks_with_contexts_));
+    if (n_all_locks_ >= ARRAY_SIZE(all_locks_with_contexts_)) {
+      Printf("Current thread holds too many mutexes\n");
+      for (uptr i = 0; i < ARRAY_SIZE(all_locks_with_contexts_); i++) {
+        StackDepotGet(all_locks_with_contexts_[i].stk).Print();
+      }
+      CHECK(0);
+    }
     // lock_id < BV::kSize, can cast to a smaller int.
     u32 lock_id_short = static_cast<u32>(lock_id);
     LockWithContext l = {lock_id_short, stk};
Index: sanitizer_common/sanitizer_deadlock_detector1.cc
===================================================================
--- sanitizer_common/sanitizer_deadlock_detector1.cc (revision 245715)
+++ sanitizer_common/sanitizer_deadlock_detector1.cc (working copy)
@@ -148,7 +148,7 @@
 void DD::MutexAfterLock(DDCallback *cb, DDMutex *m, bool wlock, bool trylock) {
   DDLogicalThread *lt = cb->lt;
   u32 stk = 0;
-  if (flags.second_deadlock_stack)
+  //if (flags.second_deadlock_stack)
     stk = cb->Unwind();
   // Printf("T%p MutexLock: %zx stk %u\n", lt, m->id, stk);
   if (dd.onFirstLock(&lt->dd, m->id, stk))
```
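If rebuilding the runtime is not practical, a rough cross-check can be done in the application itself: wrap the mutex so that each thread records which mutexes it currently holds, and print that list when the count looks suspicious. This is a hypothetical sketch (the `TrackedMutex` class and the `held_mutexes()` / `dump_held_mutexes()` helpers are not part of tsan or any library), and it only sees locks taken through the wrapper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <iterator>
#include <pthread.h>
#include <vector>

class TrackedMutex;

// Per-thread list of TrackedMutex objects currently held by this thread.
inline std::vector<const TrackedMutex*>& held_mutexes() {
  thread_local std::vector<const TrackedMutex*> held;
  return held;
}

// Hypothetical wrapper around pthread_mutex_t that records lock/unlock
// on the calling thread, so leaked locks can be listed while debugging.
class TrackedMutex {
 public:
  explicit TrackedMutex(const char* name) : name_(name) {
    pthread_mutex_init(&mu_, nullptr);
  }
  ~TrackedMutex() { pthread_mutex_destroy(&mu_); }
  TrackedMutex(const TrackedMutex&) = delete;
  TrackedMutex& operator=(const TrackedMutex&) = delete;

  void lock() {
    pthread_mutex_lock(&mu_);
    held_mutexes().push_back(this);
  }

  void unlock() {
    auto& held = held_mutexes();
    // Remove the most recent entry for this mutex; unlocks usually happen
    // in LIFO order, but out-of-order releases are handled as well.
    auto it = std::find(held.rbegin(), held.rend(), this);
    if (it != held.rend()) held.erase(std::next(it).base());
    pthread_mutex_unlock(&mu_);
  }

  const char* name() const { return name_; }

 private:
  pthread_mutex_t mu_;
  const char* name_;
};

// Prints every tracked mutex the calling thread currently holds.
inline void dump_held_mutexes() {
  const auto& held = held_mutexes();
  std::fprintf(stderr, "thread holds %zu tracked mutexes\n", held.size());
  for (const TrackedMutex* m : held) std::fprintf(stderr, "  %s\n", m->name());
}
```

Calling `dump_held_mutexes()` near the point where the CHECK fires shows whether the thread really holds about 16 mutexes or whether some were never released.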

Rik Prohaska

Sep 2, 2015, 8:34:18 AM
to thread-sanitizer
I generated a test case with 12258 mutexes initialized but only 17 locked at the time the assert hits. See:

https://s3.amazonaws.com/prohaska7-pub/mutex-limit-17.cc

Dmitry Vyukov

Sep 2, 2015, 8:39:53 AM
to thread-s...@googlegroups.com
Nice! Out of curiosity, did you write a generator program that tries
different lock/unlock orders?

Rik Prohaska

Sep 2, 2015, 9:20:22 AM
to thread-sanitizer
I captured all of the mutex operations in our application, which is a concurrent binary search tree, and then replayed that sequence in the test program. So, unfortunately, I do not have a general-purpose generator.
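Replaying a captured sequence like this can be done with a small driver. The sketch below assumes a hypothetical trace format (whitespace-separated `L <index>` / `U <index>` pairs) and replays it on a single thread over a fixed array of pthread mutexes, returning the maximum number held at once. It is not the actual test program from the URL above; `Op`, `parse_trace` and `replay_trace` are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <pthread.h>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Op {
  bool lock;          // true for "L", false for "U"
  std::size_t index;  // which mutex the operation applies to
};

// Parses "L <i>" / "U <i>" pairs and validates them up front: every unlock
// must match a held mutex and no mutex may be locked twice (the mutexes
// are non-recursive, so relocking would deadlock).
std::vector<Op> parse_trace(const std::string& trace, std::size_t num_mutexes) {
  std::vector<Op> ops;
  std::vector<bool> held(num_mutexes, false);
  std::istringstream in(trace);
  char c;
  std::size_t i;
  while (in >> c >> i) {
    if ((c != 'L' && c != 'U') || i >= num_mutexes)
      throw std::invalid_argument("bad trace entry");
    const bool lock = (c == 'L');
    if (held[i] == lock) throw std::invalid_argument("inconsistent lock state");
    held[i] = lock;
    ops.push_back({lock, i});
  }
  if (!in.eof()) throw std::invalid_argument("malformed trace");
  return ops;
}

// Replays the trace on the calling thread and returns the maximum number
// of mutexes held at the same time.
std::size_t replay_trace(const std::string& trace, std::size_t num_mutexes) {
  const std::vector<Op> ops = parse_trace(trace, num_mutexes);
  std::vector<pthread_mutex_t> mutexes(num_mutexes);
  for (auto& m : mutexes) pthread_mutex_init(&m, nullptr);

  std::vector<bool> held(num_mutexes, false);
  std::size_t depth = 0, max_depth = 0;
  for (const Op& op : ops) {
    if (op.lock) {
      pthread_mutex_lock(&mutexes[op.index]);
      held[op.index] = true;
      max_depth = std::max(max_depth, ++depth);
    } else {
      pthread_mutex_unlock(&mutexes[op.index]);
      held[op.index] = false;
      --depth;
    }
  }
  // Release anything the trace left locked before destroying the mutexes.
  for (std::size_t i = 0; i < num_mutexes; i++) {
    if (held[i]) pthread_mutex_unlock(&mutexes[i]);
    pthread_mutex_destroy(&mutexes[i]);
  }
  return max_depth;
}
```

Validating the whole trace before touching any mutex keeps the replay itself free of undefined behavior, and the returned maximum can be compared with the count that tsan reports.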

Dmitry Vyukov

Sep 2, 2015, 11:31:13 AM
to thread-s...@googlegroups.com
I cannot reproduce it with the latest clang:

$ clang++ /tmp/deadlock.cc -std=c++11 -fsanitize=thread -O1 -g && ./a.out
$ clang++ -v
clang version 3.8.0 (trunk 246647)

Rik Prohaska

Sep 2, 2015, 11:52:49 AM
to thread-sanitizer
Does the change in https://github.com/google/sanitizers/issues/594 have anything to do with this?

Dmitry Vyukov

Sep 2, 2015, 12:11:08 PM
to thread-s...@googlegroups.com
It seems to affect only the reporting code.

Henrik Skupin

Apr 5, 2024, 2:33:58 AM
to thread-sanitizer
Hi,

and sorry for replying to this old thread, but my feedback relates to the `recursive` part of an earlier message.

We currently have the same issue in Firefox TSan builds, where specific test jobs fail with exactly the same error message. While investigating it with the help of colleagues, we noticed that the deadlock detector actually fails not when more than 64 mutexes are acquired recursively, but when more than 63 mutexes in total are acquired on a thread.

Here is an example that reproduces it:

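```c
#include <pthread.h>
#include <stddef.h> /* NULL */

#define N 64

int main() {
  pthread_mutex_t mutexes[N];
  int i;

  /* Lock N distinct mutexes and never unlock them, so that all N are
     held by the main thread at the same time. */
  for (i = 0; i < N; i++) {
    pthread_mutex_init(&mutexes[i], NULL);
    pthread_mutex_lock(&mutexes[i]);
  }

  return 0;
}
```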

With only this one-line CHECK output, and with the LLVM sources having to be modified to get any more detail, it is quite hard to find the underlying cause. Could this be made more informative by default, so that at least all the held mutexes are printed?

Thanks,
Henrik