Why are simple ThreadSanitizer tests flaky?


Konstantin Serebryany

Jan 14, 2014, 4:42:13 AM
to thread-s...@googlegroups.com, Jakub Jelinek, max
ThreadSanitizer, both the GCC and LLVM variants, have lots of simple
tests like this one:
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/tsan/lit_tests/simple_race.c?view=markup
These tests pass if tsan finds a race and fail if tsan does not find a race.
Sometimes tsan fails to find the race.

I just tried to reproduce the problem on my machine by doing this:

clang -fsanitize=thread simple_race.c
while true; do ./a.out > /dev/null 2>&1 && echo "NO RACE DETECTED" ; done

And after quite some time I did get "NO RACE DETECTED".
Jakub says that on his machine this happens very frequently (about 1 in 5 runs).

So, what's going on?
The only idea we have is that we are hitting a known, by-design race in
the tsan state machine.
A more or less up-to-date description of the algorithm can be found here:
https://code.google.com/p/thread-sanitizer/wiki/Algorithm
(the exact description is the source: MemoryAccessImpl in tsan_rtl.cc)

The key loop in tsan's state machine looks like this:
for i in 1..4:
  UpdateOneShadowState(shadow_address, i, new_shadow_word, store_word)
i.e. we are updating 4 shadow slots that correspond to the given
application address independently. There is no synchronization here,
otherwise tsan would be 10x slower.
If the racy accesses in the application happen within a few cycles of
each other, the tsan state machine may actually hit its own internal
race and fail to find the user's race.
We believe that this race cannot lead to false positives, only to
false negatives.

Is this a problem for tsan users?
I think no. Happens-before race detection is probabilistic anyway;
there are many more reasons why tsan may fail to find a single race in
a single run.

Is this a problem for tsan developers? It looks like it.
Dmitry has been inserting sleep(1) statements in tests here and there
to make the tests less flaky.
The LLVM bots have been green lately (the sleeps are enough on our
machines), but the GCC bots fail often.

I would start by inserting sleep(1) statements in the tests that
frequently fail on the GCC bots.
This will not guarantee the absence of failures, but it should reduce
the failure rate dramatically. Like this:

void *Thread1(void *x) {
  sleep(1);
  Global++;
  return 0;
}

Interesting observation: if we make sure the racy accesses in the
application really happen close to each other, tsan will start missing
the race more often (this happens 1 in ~20 runs on my box):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int Global;
pthread_barrier_t B;

void *Thread1(void *x) {
  pthread_barrier_wait(&B); //<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  Global = 42;
  return NULL;
}

void *Thread2(void *x) {
  pthread_barrier_wait(&B); //<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  Global = 43;
  return NULL;
}

int main() {
  pthread_barrier_init(&B, 0, 2);
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
  return 0;
}


--kcc

tetr...@gmail.com

Jan 14, 2014, 4:55:33 AM
to thread-s...@googlegroups.com, Jakub Jelinek, max
> I would start by inserting sleep(1) statements in the tests that
> frequently fail on the GCC bots.
> This will not guarantee the absence of failures, but it should reduce
> the failure rate dramatically. Like this:

Can we use usleep? The GCC team has been complaining about the long runtimes of the TSan tests.

Jakub Jelinek

Jan 14, 2014, 5:08:31 AM
to tetr...@gmail.com, thread-s...@googlegroups.com, max
Well, even with sleep/usleep they would be flaky under load, so I'm not
entirely sure it is a good idea to include them in the testsuite at all.
After all, we are all happy that mudflap with its flaky testsuite has
been removed, and we don't want to replace it with something similarly racy.

On my bootstrap box (an oldish 4-socket Quad-Core AMD Opteron(tm) Processor 8354),
usleep (1000) in Thread1 of simple_race.c is still not enough even on an
idle box; it still occasionally fails to report the race. With usleep
(10000) I haven't seen a failure to report in ~100 runs, but that is still
an idle box, and regtests often happen on a loaded box.

So, if we really want to keep the test in the gcc testsuite, we'd want to
add some usleep (say the 10000), put the pthread_create/join calls into a
loop, and try say 100 times; in the hopefully common case we wouldn't hit
the race the first time and would do just one iteration. Perhaps we could
also dynamically increase the usleep time (start with 0, and increase up
to say a second by the 100th iteration).

Jakub

Jakub Jelinek

Jan 14, 2014, 5:16:10 AM
to tetr...@gmail.com, thread-s...@googlegroups.com, max
On Tue, Jan 14, 2014 at 11:08:31AM +0100, Jakub Jelinek wrote:
> So, if we really want to keep the test in gcc testsuite, we'd want to add
> some usleep (say the 10000), and put the pthread_creates/joins into a loop
> and try say 100 times, in the hopefully common case we wouldn't hit the race
> already the first time and would do just one iteration, and perhaps we could
> dynamically increase the usleep time (start with 0, and increase up to say a
> second in the 100th iteration).

For the iterations we'd need halt_on_error=true, of course.

Jakub

Konstantin Serebryany

Jan 14, 2014, 5:28:21 AM
to thread-s...@googlegroups.com, tetr...@gmail.com, max
usleep(10000) is probably not enough indeed.

I see no big trouble with sleep(1), at least in the LLVM tests.
We have ~150 tests, and ~90 of them use sleep(1); 'ninja check-tsan'
takes < 10 seconds on my box.

What tests are running slow in GCC suite?

--kcc

Yuri Gribov

Jan 14, 2014, 1:09:43 PM
to Konstantin Serebryany, thread-s...@googlegroups.com, max
Replying-to-all this time.

> What tests are running slow in GCC suite?

Here are the relevant emails:
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00942.html

-Y

Konstantin Serebryany

Jan 15, 2014, 12:18:03 AM
to Yuri Gribov, thread-s...@googlegroups.com, max
I've run this:

make -j 16 -C gcc check-g{cc,++} RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp'

It took 50 seconds, and I saw at most 4 processes running at any given
point, so there seems to be little parallelism in the test runner.

--kcc


Yuri Gribov

Jan 15, 2014, 12:44:39 AM
to Konstantin Serebryany, thread-s...@googlegroups.com, max
> there seems to be little parallelism in the test runner.

Absolutely. AFAIK this is a known gcc feature.
-Y

Dmitry Vyukov

Jan 16, 2014, 3:57:43 AM
to thread-s...@googlegroups.com, Jakub Jelinek, max
I have not yet seen flakes in the tests where I inserted sleep(1).
However, of course, it's somewhat dependent on the test environment. If
the machine is so extremely loaded that runnable threads get delayed
by 1+ seconds, then the flake is still possible.

Maxim Ostapenko

Jan 31, 2014, 6:52:54 AM
to Jakub Jelinek, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
Hi,

> So, if we really want to keep the test in gcc testsuite, we'd want to add
> some usleep (say the 10000), and put the pthread_creates/joins into a loop
> and try say 100 times, in the hopefully common case we wouldn't hit the race
> already the first time and would do just one iteration, and perhaps we could
> dynamically increase the usleep time (start with 0, and increase up to say a
> second in the 100th iteration).

I've finally had time and have fixed the flaky simple_race.c tsan test.
The problem with default_options.C is different, so I'll fix it separately.

Unfortunately, I couldn't reproduce the race condition in simple_race on
my machine.
Could someone check the attached patch?

-Maxim
fix_race.diff

Jakub Jelinek

Jan 31, 2014, 7:01:09 AM
to Maxim Ostapenko, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
On Fri, Jan 31, 2014 at 03:52:54PM +0400, Maxim Ostapenko wrote:
> 2014-01-31 Maxim Ostapenko <m.ost...@partner.samsung.com>
>
> * c-c++-common/tsan/simple_race.c: Make test less flaky.
>
> diff --git a/gcc/testsuite/c-c++-common/tsan/simple_race.c b/gcc/testsuite/c-c++-common/tsan/simple_race.c
> index 24b88e8..66089a2 100644
> --- a/gcc/testsuite/c-c++-common/tsan/simple_race.c
> +++ b/gcc/testsuite/c-c++-common/tsan/simple_race.c
> @@ -1,12 +1,33 @@
> /* { dg-do run } */
> +/* { dg-set-target-env-var TSAN_OPTIONS "halt_on_error=1" } */
> /* { dg-shouldfail "tsan" } */
>
> #include <pthread.h>
> #include <stdio.h>
> +#include <unistd.h>
> +
> +#define MAX_ITERATIONS_NUMBER 10

That looks too small to me, I'd use 100 instead.

> +unsigned int delay_time = 1;
> +
> +static inline void delay () {
> + sleep(delay_time);
> +}

My preference would be to use usleep instead, and gradually increase
the delay time (say, start with 1000 usec and go up to, say, 2 seconds
for iteration 99).

Jakub

Dmitry Vyukov

Jan 31, 2014, 7:04:46 AM
to thread-s...@googlegroups.com, Jakub Jelinek, Yury Gribov, Slava Garbuzov
Doesn't just sleep(1) help?
That's what we do in llvm and it seems to work well.

Yury Gribov

Jan 31, 2014, 7:17:55 AM
to Dmitry Vyukov, thread-s...@googlegroups.com, Jakub Jelinek, Slava Garbuzov
Dmitry wrote:
> Does not just sleep(1) help?
> That's what we do in llvm and it seems to work well.

That was my expectation but some people reported that this isn't enough
under heavy loads.

-Y

Yury Gribov

Jan 31, 2014, 7:22:08 AM
to Jakub Jelinek, Maxim Ostapenko, thread-s...@googlegroups.com, Slava Garbuzov
Jakub wrote:
> My preference would be to use usleep instead, and gradually increase
> the delay time (say start with 1000 usec and go to up to say 2 seconds
> for iteration 99).

The problem is that we'll then get a 100 sec worst case, which looks
like too much for one small test.

-Y

Jakub Jelinek

Jan 31, 2014, 7:33:27 AM
to Yury Gribov, Maxim Ostapenko, thread-s...@googlegroups.com, Slava Garbuzov
That is why I suggest not using a hardcoded sleep (1) all the time: go
with 1000 usec first, then 2000, 4000, 8000, etc. until you reach say 256000
usec, and then increase linearly, perhaps with a cap at say 4 seconds.
IMHO the longer worst case is still worth it to avoid the random FAILs.

Jakub

Konstantin Serebryany

Jan 31, 2014, 7:38:39 AM
to thread-s...@googlegroups.com, Yury Gribov, Maxim Ostapenko, Slava Garbuzov
Imho, this is a vast over-complication.
We use sleep(1) in 90 tests and all the tests run in < 9 seconds, thanks to parallelism.
My 2c; I do not have strong opinions about the state of the test suite in the GCC tree.

--kcc 


Maxim Ostapenko

Jan 31, 2014, 8:06:22 AM
to Jakub Jelinek, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
> That is why I suggest not to use hardcoded sleep (1) all the time, if you go
> with 1000 usec first, then 2000, 4000, 8000, etc. until you reach say 256000
> usec and then increase linearly, perhaps with a cap on say 4 seconds.
> IMHO the worst case is still worth the random FAILs.

Ok, got it.

Is it OK now?

-Maxim

fix_race.diff

Jakub Jelinek

Jan 31, 2014, 8:12:18 AM
to Maxim Ostapenko, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
On Fri, Jan 31, 2014 at 05:06:22PM +0400, Maxim Ostapenko wrote:
> 2014-01-31 Maxim Ostapenko <m.ost...@partner.samsung.com>
>
> * c-c++-common/tsan/simple_race.c: Made test less flaky.
>
> diff --git a/gcc/testsuite/c-c++-common/tsan/simple_race.c b/gcc/testsuite/c-c++-common/tsan/simple_race.c
> index 24b88e8..cb137b8 100644
> --- a/gcc/testsuite/c-c++-common/tsan/simple_race.c
> +++ b/gcc/testsuite/c-c++-common/tsan/simple_race.c
> @@ -1,12 +1,35 @@
> /* { dg-do run } */
> +/* { dg-set-target-env-var TSAN_OPTIONS "halt_on_error=1" } */
> /* { dg-shouldfail "tsan" } */
>
> #include <pthread.h>
> #include <stdio.h>
> +#include <unistd.h>
> +
> +#define MAX_ITERATIONS_NUMBER 100
> +#define SLEEP_STEP 44441600
> +
> +unsigned int delay_time = 1000;
> +
> +static inline void delay () {
> + usleep(delay_time);
> +}
> +
> +extern int main_1();
> +
> +int main() {
> + int i;
> + for (i = 0; i < MAX_ITERATIONS_NUMBER; i++) {
> + main_1();
> + delay_time += delay_time < 256000 ? delay_time : SLEEP_STEP;
> + }
> + return 0;

Yeah, like that, except I don't know where the SLEEP_STEP value comes
from. That value sounds way too big to me: 44441600 usec is 44 seconds,
so you start from 0.001 seconds, go up to 0.256 seconds, and then jump to
44.7 seconds, 89 seconds, etc.? I'd expect SLEEP_STEP to be, say, 128000.

Jakub

Maxim Ostapenko

Jan 31, 2014, 8:19:07 AM
to Jakub Jelinek, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
> Yeah, like that, except I don't know where the SLEEP_STEP value comes
> from. That value sounds way too big to me: 44441600 usec is 44 seconds,
> so you start from 0.001 seconds, go up to 0.256 seconds, and then jump to
> 44.7 seconds, 89 seconds, etc.? I'd expect SLEEP_STEP to be, say, 128000.

Oh, I'm sorry. Fixed.

Ok to commit?

-Maxim

fix_race.diff

Jakub Jelinek

Jan 31, 2014, 8:24:58 AM
to Maxim Ostapenko, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
Yes, thanks.

Jakub

Maxim Ostapenko

Jan 31, 2014, 8:46:03 AM
to Jakub Jelinek, Yury Gribov, thread-s...@googlegroups.com, Slava Garbuzov
Committed in 207344.

-Maxim

Yuri Gribov

Jan 31, 2014, 9:15:49 AM
to thread-s...@googlegroups.com, Jakub Jelinek, Yury Gribov, Slava Garbuzov, m.ost...@partner.samsung.com
> Committed in 207344.

I'd prefer someone to check that it really solves the race condition problems before committing...

-Y

Jakub Jelinek

Jan 31, 2014, 9:17:51 AM
to Yuri Gribov, thread-s...@googlegroups.com, Yury Gribov, Slava Garbuzov, m.ost...@partner.samsung.com
I'm already running a bootstrap/regtest with that in the tree, so will find
out in a few hours. That said, only weeks of testing will tell if it is
reliably gone.

Jakub