Is there any downside to using libgcc's backtrace?

krisschumi

Dec 21, 2017, 1:48:10 PM
to gperftools
Hi Aliaksey,

We're unable to use libunwind (because it crashes all the time) and frame pointer (because it is causing a 3% performance degradation when compiled with -fno-omit-frame-pointer). Therefore, I want to configure and compile gperftools with the "--enable-stacktrace-by-backtrace" option. But then, I see a note in the configure file saying "No libunwind and no frame pointer, expect crashy profiler". Can you explain why the profiler is more likely to crash with this approach? Just wanted to let you know that there's not a single try catch block in all of our code.

There's no mention of any crashes with libgcc's backtrace in your wiki article below:

Aliaksei Kandratsenka

Dec 21, 2017, 4:51:12 PM
to krisschumi, gperftools
Hi. I think you're likely to have more luck with an updated libunwind than with libgcc or the backtrace() facility (glibc's backtrace() uses libgcc as well). That is, try getting their latest release, or even building it from their git master branch.

If you still hit problems: the libunwind project doesn't look dead, so it might be best to just report any crashes to them. One possible issue (though arguably no excuse for libunwind) is that some asm functions in glibc don't bother with unwind annotations. Are you crashing in something like memset/strlen/etc.? If so, and if the newest libunwind doesn't help, then please mention that in the libunwind bug report.

As for the weakness of libgcc: indeed, the main risk is getting a CPU profiler "tick" while an exception is being thrown. If you're certain that you don't throw any, it might be worth a try. But note that we have seen crash reports with backtrace()/libgcc as well. It could be that they're not expecting backtracing/unwinding from a signal handler, for example.
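
To make the signal-handler concern concrete, here is a minimal sketch in C of what a sampling profiler does on each tick: a SIGPROF handler that calls glibc's backtrace(). This is not gperftools code. The names on_prof_tick, start_sampling, stop_sampling, g_frames and g_depth are made up for illustration, and Linux with glibc is assumed. The warm-up call follows the backtrace(3) man page, which notes that libgcc is loaded dynamically on first use and that the load usually calls malloc.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

#define MAX_FRAMES 64

/* Hypothetical globals for this sketch: the most recently captured stack. */
static void *g_frames[MAX_FRAMES];
static volatile sig_atomic_t g_depth = 0;

/* SIGPROF handler: capture the stack of whatever code was interrupted. */
static void on_prof_tick(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig; (void)info; (void)ucontext;
    g_depth = backtrace(g_frames, MAX_FRAMES);
}

/* Install the handler and start a CPU-time interval timer.
 * Returns 0 on success, -1 on failure. */
static int start_sampling(long interval_usec)
{
    /* Call backtrace() once outside the handler so that libgcc is
     * already loaded when the first tick arrives. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_prof_tick;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer. Returns 0 on success, -1 on failure. */
static int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}
```

Calling start_sampling(1000) and then doing CPU-bound work should leave a non-zero g_depth after the first tick. Whether the unwinder copes with every interrupted frame (hand-written asm, an exception in flight) is exactly the open question here; the sketch does nothing to guard against it.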


Thanks,
Krishna

--
You received this message because you are subscribed to the Google Groups "gperftools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gperftools+unsubscribe@googlegroups.com.
To post to this group, send email to gperf...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gperftools/3975b99c-0a07-4daf-a332-38ddf92e0260%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

krisschumi

Dec 21, 2017, 6:26:45 PM
to gperftools
Thanks for the response, Aliaksey. I forgot to mention that I was indeed using the latest version of libunwind, which is 1.2.1. And I built it from their git master branch.

Below is the exact line in libunwind where it crashes, in file Ginit.c. I will submit a bug report as well. I will also try gperftools with libgcc backtrace. I already tried frame pointers and that works superbly -- no crashes. Our software is a large multi-threaded CAD program where performance is important. It also has a peak memory of ~200 GB because we have to store and manipulate circuit layout objects.

static int
access_mem (unw_addr_space_t as, unw_word_t addr, unw_word_t *val, int write,
            void *arg)
{
  if (unlikely (write))
    {
      Debug (16, "mem[%016lx] <- %lx\n", addr, *val);
      *(unw_word_t *) addr = *val;
    }
  else
    {
      /* validate address */
      const struct cursor *c = (const struct cursor *)arg;
      if (likely (c != NULL) && unlikely (c->validate)
          && unlikely (validate_mem (addr))) {
        Debug (16, "mem[%016lx] -> invalid\n", addr);
        return -1;
      }
      *val = *(unw_word_t *) addr;
      Debug (16, "mem[%016lx] -> %lx\n", addr, *val);
    }
  return 0;
}

Aliaksei Kandratsenka

Dec 21, 2017, 6:51:01 PM
to krisschumi, gperftools
Very interesting. It looks like they're trying to write to some memory, which is odd. They do seem to have some code to validate memory that is read, but I cannot imagine that simply capturing a backtrace needs to write anything. Could it be that this comes from the app actually unwinding the stack as part of throwing an exception or, say, longjmp-ing? I ask because libunwind (imho sadly) takes over those as well. Basically, do you see the CPU profiler capturing a stack trace in those crashes? And what is the function just below the signal frame? Could it be some kind of memset/strlen function that is coded in asm in glibc?

 

krisschumi

Dec 21, 2017, 7:06:08 PM
to gperftools
Yes - I consistently saw crashes in that exact line of code. Below is the stack trace (in reverse order) dumped by our CAD program (we have handlers for SIGSEGV so we can dump out a stack trace). I am not showing all of the crash call stack, as it contains some of our functions which I do not want to share. Also, the crash is random, i.e. I ran the same test case several times and it crashes randomly.

We definitely don't have exceptions in our code. I am not sure about longjmp-ing. What is longjmp-ing? And we don't have any assembly code either. The one thing I would note is that we're using a compiler from the dinosaur age (gcc 4.2.2) so that we're in-memory compatible with another legacy CAD application. Could this be a problem? We're moving to the latest compiler soon (before end of year) and I will give libunwind a shot again.

-SEGFAULT-Stack- (6) /lib64/libpthread.so.0(+0xf850) [0x7ffff79cb850]
-SEGFAULT-Stack- (5) ProfileHandler::SignalHandler(int, siginfo*, void*)
-SEGFAULT-Stack- (4) CpuProfiler::prof_handler(int, siginfo*, void*, void*)
-SEGFAULT-Stack- (3) GetStackTraceWithContext(void**, int, int, void const*)
-SEGFAULT-Stack- (2) libprofiler.so(+0x92cc) [0x7ffff68812cc]
-SEGFAULT-Stack- (1) libunwind.so.8(_ULx86_64_step+0x22b) [0x7ffff6662bfb]
-SEGFAULT-Stack- (0) libunwind.so.8(+0x3042) [0x7ffff6662042]

Aliaksei Kandratsenka

Dec 21, 2017, 7:12:07 PM
to krisschumi, gperftools
No stack "above" that libpthread entry? Also, how is this crash stack obtained? It doesn't seem to be coming from gdb, so perhaps something like breakpad? I think it might be useful to try to get a core dump, or gdb "stopped" at a crash like that, and see what gdb thinks of the stack trace.

longjmp is part of the setjmp/longjmp "facility" in the C language. man setjmp will tell you the story. Some implementations (I am not sure if glibc is one of them) do perform stack unwinding as part of longjmp.
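
As a quick illustration of the facility described above, here is a small C sketch of setjmp/longjmp used as a non-local jump out of nested calls. The function and variable names (run_with_recovery, middle, deep_failure, g_recover_point, g_error_code) are invented for the example.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical names for this sketch. */
static jmp_buf g_recover_point;    /* saved execution context */
static volatile int g_error_code;  /* value handed back across the jump */

/* Deepest call: abandon the current call chain and jump back. */
static void deep_failure(int code)
{
    g_error_code = code;
    longjmp(g_recover_point, 1);   /* does not return */
}

/* Intermediate frame that gets skipped by the jump. */
static void middle(int code)
{
    deep_failure(code);
    puts("not reached");           /* skipped: longjmp never returns here */
}

/* Returns 0 if no jump happened, otherwise the code passed down. */
static int run_with_recovery(int code)
{
    if (setjmp(g_recover_point) != 0)
        return g_error_code;       /* we arrived here via longjmp */
    if (code != 0)
        middle(code);
    return 0;
}
```

run_with_recovery(0) returns 0 without jumping, while run_with_recovery(42) returns 42: control goes straight back to the setjmp call site and the frames of middle and deep_failure are discarded.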

Aliaksei Kandratsenka

Dec 21, 2017, 7:13:20 PM
to krisschumi, gperftools
But in any case, it looks like we have enough info to report a bug to the libunwind people. We're clearly just capturing a backtrace, i.e. 100% read-only, which should not update any random parts of memory.

krisschumi

Dec 21, 2017, 7:24:23 PM
to gperftools
There is stack above the libpthread entry, but it consists of functions from our code, so I did not include them here in this public forum. The crash stack comes from calling the glibc backtrace and backtrace_symbols_fd functions from within our SIGSEGV handler. I also confirmed this stack by running our program (with profiling turned on) under gdb and got the same stack.
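
A handler of the kind described here might look roughly like the following C sketch. It is an assumption about the general shape, not the poster's actual code; dump_stack, on_segv and install_segv_handler are hypothetical names, and Linux with glibc is assumed. backtrace_symbols_fd() is used because, per the backtrace(3) man page, it writes straight to a file descriptor and does not call malloc.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd, one frame per line.
 * Returns the number of frames captured. */
static int dump_stack(int fd)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}

/* SIGSEGV handler: print a banner and the stack, then let the
 * process die with the original signal. */
static void on_segv(int sig)
{
    static const char banner[] = "-SEGFAULT-Stack-\n";
    if (write(STDERR_FILENO, banner, sizeof banner - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
    dump_stack(STDERR_FILENO);
    /* SA_RESETHAND already restored the default action, so this
     * re-raise terminates the process once the handler returns. */
    raise(sig);
}

/* Install the handler. Returns 0 on success, -1 on failure. */
static int install_segv_handler(void)
{
    /* Load libgcc up front so the handler itself does not trigger it. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Note that the frames printed by such a handler include the handler itself and the signal trampoline, which would be consistent with a libpthread.so entry appearing in a dump like the one posted earlier.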

Thanks for weighing in on this issue. I'll write to libunwind, but I definitely cannot provide them our test case. I'll have to figure out other ways to reproduce the crash and share a test case.

Also, since you have not commented on our compiler (gcc 4.2.2), I am assuming that that is not a problem.

Aliaksei Kandratsenka

Dec 21, 2017, 7:43:31 PM
to krisschumi, gperftools
gcc 4.2 is unlikely to be causing this, but I cannot exclude anything. As for the frames in your code, I don't need to see them. All I am asking is whether it looks like something odd is being done in your code, such as calls to pthread or to one of the string functions in glibc.

Can you at least reveal what that libpthread.so entry at the top does? My guess would be perhaps something mutex-ful?

 
krisschumi

Dec 21, 2017, 8:06:26 PM
to gperftools
Let me run the test case again to get the stack trace. But as far as I remember, the stack above libpthread was not doing any string operations. There are in fact virtually no string operations in our code other than during startup. I will try to answer your questions as soon as I get a chance to rerun my experiment.