Is ThreadSanitizer's capability of detecting data race sound?

580 views
Skip to first unread message

Masataka Nishi

unread,
Jun 18, 2016, 11:58:55 AM6/18/16
to thread-s...@googlegroups.com
To: ThreadSanitizer-ML,

Hi. I am Masataka Nishi.

I ran ThreadSanitizer with "-fsanitize=thread" option using a statically
scheduled concurrent program, and then read the paper titled
"ThreadSanitizer - data race detection in practice" and an extended
paper submitted to RV2011.

I am working on a problem of verifying if a lock-free concurrent program
running on multi-core CPU is conflict-serializable or not.
To guarantee that the concurrent program never attempts a
non-deterministic program execution, a scheduler handles which
(pre-analyzed) threads can run concurrently. Each thread pre-fetches the
latest program state (i.e. view-serializable) and update the program
state atomically. A single mutex is used here to implement the atomic
update. I try to verify the guarantee by somehow, and I thought that a
race detector is a good tool.

Q1. Why does ThreadSanitizer falsely find data race in the code "6.4.1
Condition Variable"? Is it because the effect of mutex that prohibits
interleaving of Thread1 and Thread2 is ignored?

Q2. In what case ThreadSanitizer fail to catch data race (false negative)?

Q3. Once we remove all mutex in the threads except for the one used for
atomic update, is the proposed capability of detecting data race sound
in a sense that
a) the referred false positive is not reported,
b) all real data races are detected, and
c) no risk of false negative?
If not, what could be a major difficulty of failing to guarantee each one?

Best regards,
--
Masataka Nishi

Dmitry Vyukov

unread,
Jun 20, 2016, 4:21:13 AM6/20/16
to thread-s...@googlegroups.com
Hi Masataka,

The paper that you read is about an old version of ThreadSanitizer
(TSan). The old TSan could work in 2 modes: lock-set mode and
happens-before mode. The lock-set mode has less false negatives, but
requires strict locking discipline (it verifies that a particular
variable is protected by some set of mutexes). The lock-set mode has
false positives on condition variables and on all of lock-free code.
The new TSan (that is now in cland and gcc) works only in
happens-before mode. And this mode does not have false positives.
False positives in happens-before mode can happen due to 2 main
reasons: (1) unrelated synchronization that appear to synchronize the
accesses in a particular execution and (2) implementation limits on
amount of information about past that TSan can memorize. The good news
is that if you run a program enough times, it should catch all data
races eventually.

nishi.m...@gmail.com

unread,
Jun 20, 2016, 10:58:12 AM6/20/16
to thread-sanitizer
To: Dmitry-san,
I appreciate for the answer. ThreadSanitizer is now under stress-test using some complicated concurrent execution paths. For now indeed it doesn't raise false positives.
> run a program enough times
Then next, I need to find a criterion for deciding wether there remains missed cases.
Best regards,
Masataka Nishi 

Masataka Nishi

unread,
Jun 26, 2016, 9:36:45 AM6/26/16
to thread-s...@googlegroups.com
To:Dmitry-san,

Hi. Let me explain there may be a risk of missing real data race
using the following 2+1 processes. We firstly run proc0, then run
proc1 and proc2 concurrently. A pair of write(a[1]=3) in proc1 and
write(a[1]=4) in proc2 may satisfy race condition, but it depends on
what value proc0 writes on a[0]. If the value a[0] depends on an
undetermined input to a program, then we cannot deterministically
decide if data race could occur or not. This issue originates from
that we still need to examine if proc2 going through the first branching
path is feasible or not.

----
unsigned int a[3];
void *proc0(void *x){
a[0]=3;
//a[0]=1;
};
void *proc1(void *x){
a[1]=3;
};
void *proc2(void *x){
if(a[0]>2){
a[1]=4;
}else{
a[2]=4;
};
};

----
Q1. Does this example imply that we still need to explore all possible
parameter space of input to the program (here the space is
0<=a[0]<2^32-1 instead) to decide conclusively if there is a risk of
data race?
Q1'. In other word, do we still need to examine feasibility of all execution
paths in each process, combined with feasibility of concurrent scheduling
of processes of our interest?
Q1'' In another word, is computational complexity of correctly
deciding if there is no risk of missing real data race still
comparable with the
complexity of model-checking concurrent processes?
Q2. Do you have any idea of conservative design rule viable for eliminating
the mentioned risk of false-negative? I think the current
ThreadSanitizer does not miss any real data race using happens-before
checking, If there is no branching path.

Best regards,
_/_/_/_/_/_/_/_/_/
Masataka Nishi

Dmitry Vyukov

unread,
Jun 27, 2016, 6:58:00 AM6/27/16
to thread-s...@googlegroups.com
You are correct. ThreadSanitizer is a dynamic checker, it only detects
data races that actually happen in the current execution.

Re torelation: write better tests, run real systems under
ThreadSanitizer. That't the techniques we use.

Masataka Nishi

unread,
Jun 28, 2016, 7:20:44 PM6/28/16
to thread-s...@googlegroups.com
To:Dmitry-san,
> You are correct. ThreadSanitizer is a dynamic checker, it only detects
> data races that actually happen in the current execution.
> Re torelation: write better tests, run real systems under
> ThreadSanitizer. That't the techniques we use.

Exploring the entire parameter space for detecting data race,
or proving absence of data race, sounds like a greatly
inefficient way of verification work. "Happens-before"
symbolically matches R/W access to program variables
without exploring the parameter space, and it works if
concurrently running processes consist of sequential code only.

As far as all feasible (=schedulable) concurrent execution
paths are examined, then there remains no false negative.
Yet, it would be costly to prepare such test scenarios, as
lots of effort on system-level tests are required. It also raises
the cost of modifying the original code.

So I try to explore a better way of handling the branching
paths for upgrading the existing race detection method in
a way that absence of data race is guaranteed by design.

p.s.
Now I "struggle" to find evidently bad false positive.

Masataka Nishi

Dmitry Vyukov

unread,
Jun 29, 2016, 2:56:26 AM6/29/16
to thread-s...@googlegroups.com
On Wed, Jun 29, 2016 at 1:20 AM, Masataka Nishi
<nishi.m...@gmail.com> wrote:
> To:Dmitry-san,
>> You are correct. ThreadSanitizer is a dynamic checker, it only detects
>> data races that actually happen in the current execution.
>> Re torelation: write better tests, run real systems under
>> ThreadSanitizer. That't the techniques we use.
>
> Exploring the entire parameter space for detecting data race,
> or proving absence of data race, sounds like a greatly
> inefficient way of verification work. "Happens-before"

That's the best I can think of. If you have something better, I will
be happy to hear about it.


> symbolically matches R/W access to program variables
> without exploring the parameter space, and it works if
> concurrently running processes consist of sequential code only.
>
> As far as all feasible (=schedulable) concurrent execution
> paths are examined, then there remains no false negative.
> Yet, it would be costly to prepare such test scenarios, as
> lots of effort on system-level tests are required. It also raises
> the cost of modifying the original code.
>
> So I try to explore a better way of handling the branching
> paths for upgrading the existing race detection method in
> a way that absence of data race is guaranteed by design.
>
> p.s.
> Now I "struggle" to find evidently bad false positive.
>
> Masataka Nishi
>
> --
> You received this message because you are subscribed to the Google Groups "thread-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to thread-sanitiz...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages