GDB, backtrace from __tsan::PrintReport


yegor.de...@gmail.com

Nov 11, 2013, 4:14:39 PM
to thread-s...@googlegroups.com
Hi All,

I am trying to run a program under a debugger and stop it when a data
race is reported. I achieve this by putting a breakpoint at
__tsan::PrintReport. This works. However, when I call backtrace in GDB,
the topmost frame I see is that of a __tsan_{read,write}N function:

Breakpoint 1, 0x00007f454ed7ff40 in __tsan::PrintReport(__tsan::ReportDesc const*) ()
(gdb) bt
#0  0x00007f454ed7ff40 in __tsan::PrintReport(__tsan::ReportDesc const*) ()
#1  0x00007f454ed88dd9 in __tsan::OutputReport(__tsan::Context*, __tsan::ScopedReport const&, __tsan::ReportStack const*, __tsan::ReportStack const*) ()
#2  0x00007f454ed89624 in __tsan::ReportRace(__tsan::ThreadState*) ()
#3  0x00007f454ed86646 in __tsan_report_race_thunk ()
#4  0x00007f454ed8460c in __tsan_write4 ()
#5  0x0000000000000000 in ?? ()

Any ideas on why GDB fails to reconstruct the frames further up?

Notably, once __tsan_write4 is entered, GDB knows what is above it:

Breakpoint 2, 0x00007f08ae791140 in __tsan_write4 ()
(gdb) bt
#0  0x00007f08ae791140 in __tsan_write4 ()
#1  0x00007f08ae79a1ff in Thread2(void*) ()
#2  0x00007f08ae76b435 in __tsan_thread_start_func ()
#3  0x00007f08ae30de0e in start_thread (arg=0x7f08ae610040) at pthread_create.c:311
#4  0x00007f08ad41f9ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) info frame
Stack level 0, frame at 0x7f08ae5abe80:
 rip = 0x7f08ae791140 in __tsan_write4; saved rip 0x7f08ae79a1ff
 called by frame at 0x7f08ae5abeb0
 Arglist at 0x7f08ae5abe70, args:
 Locals at 0x7f08ae5abe70, Previous frame's sp is 0x7f08ae5abe80
 Saved registers:
  rip at 0x7f08ae5abe78

However, when __tsan_report_race_thunk is called, it does not:

(gdb) n
Single stepping until exit from function __tsan_write4,
which has no line number information.

Breakpoint 1, 0x00007f08ae793628 in __tsan_report_race_thunk ()
(gdb) bt
#0  0x00007f08ae793628 in __tsan_report_race_thunk ()
#1  0x00007f08ae79160c in __tsan_write4 ()
#2  0x0000000000000000 in ?? ()
(gdb) frame 1
#1  0x00007f08ae79160c in __tsan_write4 ()
(gdb) info frame
Stack level 1, frame at 0x7f08ae5aba80:
 rip = 0x7f08ae79160c in __tsan_write4; saved rip 0x0
 called by frame at 0x7f08ae5aba88, caller of frame at 0x7f08ae5aba68
 Arglist at 0x7f08ae5aba60, args:
 Locals at 0x7f08ae5aba60, Previous frame's sp is 0x7f08ae5aba80
 Saved registers:
  rbx at 0x7f08ae5aba68, r14 at 0x7f08ae5aba70, rip at 0x7f08ae5aba78

(Note that the frame address and the rip location are different.)

The problem is observed with GCC 4.8.2, Clang 3.3, and GDB 7.6.1 under
Debian Testing, on any kind of debugged program compiled with the
standard flags: -fsanitize=thread -fPIC -pie -pthread -g. The example in
the above dialogs was compiled with Clang.

Any comments?

Konstantin Serebryany

Nov 12, 2013, 2:43:30 AM
to thread-s...@googlegroups.com
My wild guess is that our implementation of __tsan_report_race_thunk
(lib/tsan/rtl/tsan_rtl_amd64.S) is not gdb-friendly.
Dmitry, WDYT?

Yegor, just curious, why are you trying to attach with gdb, i.e. what
kind of information is missing in the reports?

--kcc
> --
> You received this message because you are subscribed to the Google Groups
> "thread-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to thread-sanitiz...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

yegor.de...@gmail.com

Nov 12, 2013, 5:14:44 AM
to thread-s...@googlegroups.com
Hi Kostya,


> Yegor, just curious, why are you trying to attach with gdb, i.e. what
> kind of information is missing in the reports?
In short — the values of local variables in the stack frames above.

The full story is that I am trying to use ThreadSanitizer for detecting
data races in SPMD programs using PGAS APIs for communication. I launch
worker processes as threads within one process and apply ThreadSanitizer.

These APIs have methods like "copy block of memory on node X at address
Y of size Z to me at address W". Copying is done asynchronously, i.e.
in background (which is emulated by memcpy in a background thread).
Therefore, TSan reports the call stack of the background thread, rather
than the call stack of the place where the copy request was created
(which would be desirable).

One way to find out who created the copy request is to record this
information for every request created, and then look it up via GDB
(nice for a proof-of-concept check), or to add an external hook that
prints it together with TSan's report (most likely, the right way (tm)
of doing this), or to make TSan aware of these copy requests explicitly
(least favorite). I have started playing with the first option, which
led to this discussion topic.

Konstantin Serebryany

Nov 12, 2013, 5:55:50 AM
to thread-s...@googlegroups.com
As a workaround you may try building tsan run-time with -DTSAN_DEBUG=1
-- it should disable the tricky assembly call.

Dmitry Vyukov

Nov 14, 2013, 2:09:16 AM
to thread-s...@googlegroups.com
On Tue, Nov 12, 2013 at 1:14 AM, <yegor.de...@gmail.com> wrote:
Hi,

gdb cannot unwind through our 'hacky call' as is.
We need to fix the CFI directives for it to work properly.

I am not sure why the following was commented out before. Now, when I uncomment it, gdb properly unwinds from ReportRace and our presubmit script still passes. Can you test the following patch in your env?

llvm/projects/compiler-rt/lib/tsan$ svn diff
Index: rtl/tsan_rtl.h
===================================================================
--- rtl/tsan_rtl.h (revision 193877)
+++ rtl/tsan_rtl.h (working copy)
@@ -727,11 +727,11 @@
 // so we create a reserve stack frame for it (1024b must be enough).
 #define HACKY_CALL(f) \
   __asm__ __volatile__("sub $1024, %%rsp;" \
-                       "/*.cfi_adjust_cfa_offset 1024;*/" \
+                       ".cfi_adjust_cfa_offset 1024;" \
                        ".hidden " #f "_thunk;" \
                        "call " #f "_thunk;" \
                        "add $1024, %%rsp;" \
-                       "/*.cfi_adjust_cfa_offset -1024;*/" \
+                       ".cfi_adjust_cfa_offset -1024;" \
                        ::: "memory", "cc");
 #else
 #define HACKY_CALL(f) f()

Dmitry Vyukov

Nov 14, 2013, 2:16:54 AM
to thread-s...@googlegroups.com
On Tue, Nov 12, 2013 at 2:14 PM, <yegor.de...@gmail.com> wrote:
> Hi Kostya,
>
>> Yegor, just curious, why are you trying to attach with gdb, i.e. what
>> kind of information is missing in the reports?
> In short — the values of local variables in the stack frames above.
>
> The full story is that I am trying to use ThreadSanitizer for detecting
> data races in SPMD programs using PGAS APIs for communication. I launch
> worker processes as threads within one process and apply ThreadSanitizer.
>
> These APIs have methods like "copy block of memory on node X at address
> Y of size Z to me at address W". Copying is done asynchronously, i.e.
> in background (which is emulated by memcpy in a background thread).
> Therefore, TSan reports the call stack of the background thread, rather
> than the call stack of the place where the copy request was created
> (which would be desirable).


We see this problem everywhere. TSan works at the thread level; modern concurrent software works at the task level.

Would the tasking API solve the problem for you?
https://code.google.com/p/thread-sanitizer/issues/detail?id=37
(You will need to annotate your thread pool so that tsan understands task boundaries.)


 
> One option how to know who created the copy request is to record this
> information for every request created, and then look up this info via
> GDB (nice for proof-of-concept check), or add an external hook printing
> it together with TSan's report (most likely, the right way (tm) of doing
> this), or make TSan know about these copy requests explicitly
> (least favorite). I have started playing with the first option, which
> led to this discussion topic.

Dmitry Vyukov

Nov 15, 2013, 5:30:28 AM
to thread-s...@googlegroups.com
This patch seems to work, so I've submitted it.

yegor.de...@gmail.com

Nov 15, 2013, 1:17:53 PM
to thread-s...@googlegroups.com
Hi Dmitry,


> Would the tasking API solve the problem for you?
> https://code.google.com/p/thread-sanitizer/issues/detail?id=37
Yes. The interface from tsan_interface_task.h seems to be exactly what
is needed.

It would also be great to have an annotation mechanism for specifying
racy memory regions, e.g.:

__tsan_add_racy_region(void *addr, uptr size);
__tsan_remove_racy_region(void *addr);

Some scientific applications are racy on purpose, and the only way to
suppress reports there is by a function name, which is not very
fine-grained. Also, while support for tasks is not there, code
modifications in several places are required in order to suppress races
between tasks. I think you already had a similar interface in TSan1
(ANNOTATE_BENIGN_RACE_SIZED).

BTW, your patch fixes the problem with the backtrace. Konstantin's
workaround with -DTSAN_DEBUG=1 helped too.

Thank you all!

Alexey Samsonov

Nov 17, 2013, 5:28:47 AM
to thread-s...@googlegroups.com
On Fri, Nov 15, 2013 at 10:17 PM, <yegor.de...@gmail.com> wrote:
> Hi Dmitry,
>
>> Would the tasking API solve the problem for you?
>> https://code.google.com/p/thread-sanitizer/issues/detail?id=37
> Yes. The interface from tsan_interface_task.h seems to be exactly what
> is needed.
>
> It would also be great to have an annotation mechanism for specifying
> racy memory regions, e.g.:
>
> __tsan_add_racy_region(void *addr, uptr size);
> __tsan_remove_racy_region(void *addr);
>
> Some scientific applications are racy on purpose,

(comment from a peanut gallery)
That's what frustrates us and what we're trying to fight with.
 
> and the only way to
> suppress reports there is by a function name, which is not very
> fine-grained. Also, while the support for tasks is not there, code
> modifications in several places are required, in order to suppress races
> between tasks. I think, you already had a similar interface in TSan1
> (ANNOTATE_BENIGN_RACE_SIZED).
>
> BTW, your patch fixes the problem with the backtrace. Konstantin's
> workaround with -DTSAN_DEBUG=1 helped too.
>
> Thank you all!




--
Alexey Samsonov, MSK

Yegor Derevenets

Nov 17, 2013, 8:06:31 AM
to thread-s...@googlegroups.com
Hi Alexey,

On Sun, Nov 17, 2013 at 02:28:47PM +0400, Alexey Samsonov wrote:
> > It would be also great to have an annotation mechanism for specifying
> > racy memory regions, e.g.:
> >
> > __tsan_add_racy_region(void *addr, uptr size);
> > __tsan_remove_racy_region(void *addr);
> >
> > Some scientific applications are racy on purpose,
>
> (comment from a peanut gallery)
> That's what frustrates us and what we're trying to fight with.
> http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

> The correct way to express such a pattern is to use atomic operations.
Exactly. One can annotate local memory accesses with atomics. But what
about remote memory accesses?

What should the atomic version of e.g. gaspi_write() [1] do? Should it
write the whole block atomically? Should it know the structure of the
data being sent and write each field atomically? Is anything of that
efficiently implementable on the existing interconnect hardware? A quick
scan of Infiniband specification says: "No". And a study of existing
PGAS APIs indirectly confirms this: none of them has atomic bulk
operations.

One can say then: "You should synchronize". But synchronization means
more overhead, longer execution, higher power consumption, and,
therefore, costs. (I guess, you in Google know it best of all.)

I see two ways out: 1) making racy_gaspi_write() work exactly like the
usual one while ignoring the races it causes; 2) marking memory regions
as racy and ignoring races within them (which provides somewhat finer
control). In the end, one has to hope that the observed, formally
undefined behavior will not change in the future.

However, any other ideas are warmly welcome.

[1] http://www.gaspi.de/fileadmin/GASPI/pdf/GASPI-1.0.1.pdf

--
Yegor Derevenets

Dmitry Vyukov

Nov 19, 2013, 12:26:47 AM
to thread-s...@googlegroups.com
On Fri, Nov 15, 2013 at 10:17 PM, <yegor.de...@gmail.com> wrote:
> Hi Dmitry,
>
>
>> Would the tasking API solve the problem for you?
>> https://code.google.com/p/thread-sanitizer/issues/detail?id=37
> Yes. The interface from tsan_interface_task.h seems to be exactly what
> is needed.
>
> It would be also great to have an annotation mechanism for specifying
> racy memory regions, e.g.:
>
> __tsan_add_racy_region(void *addr, uptr size);
> __tsan_remove_racy_region(void *addr);
>
> Some scientific applications are racy on purpose, and the only way to
> suppress reports there is by a function name, which is not very
> fine-grained. Also, while the support for tasks is not there, code
> modifications in several places are required, in order to suppress races
> between tasks. I think, you already had a similar interface in TSan1
> (ANNOTATE_BENIGN_RACE_SIZED).


The annotations do work with new tsan. We do not particularly promote
their usage, though.

Yegor Derevenets

Nov 19, 2013, 10:59:33 AM
to thread-s...@googlegroups.com
On Tue, Nov 19, 2013 at 09:26:47AM +0400, Dmitry Vyukov wrote:
> > I think, you already had a similar interface in TSan1
> > (ANNOTATE_BENIGN_RACE_SIZED).
>
> The annotations do work with new tsan.
Thanks. Did not notice AnnotateBenignRaceSized before.

> We do not particularly promote their usage, though.
Yeah. All data races are evil and must be fixed (like compiler
warnings). However, in rare cases this is not possible.

BTW, are you sure WTFAnnotateBenignRaceSized should not use the sz argument?

--
Yegor Derevenets

Dmitry Vyukov

Nov 19, 2013, 11:07:26 AM
to thread-s...@googlegroups.com
Fixed in r195133.
Thanks!