debugging segfaults that occur in release mode only


Prasun Ratn

May 11, 2022, 3:49:21 AM
to DynamoRIO Users
I am seeing a segfault that only occurs in release mode (not every time but pretty frequently). To debug the crash I ran it in debug mode but couldn't reproduce the issue.

I tried running under gdb, but it didn't stop at the SIGSEGV (it did stop earlier, during the safe_read SEGV). Also, when I tried to break into the program using 'CTRL-C' in gdb (after the initial stuff was done), the program immediately terminated. This doesn't happen when I run the program by itself under gdb.

Any suggestions on how to go about debugging this?

sharma...@google.com

May 11, 2022, 10:21:28 AM
to DynamoRIO Users
Hi,

> I tried running under gdb but it didn't stop at SIGSEGV (it did stop earlier during the safe_read SEGV).

From your question, since the crash does not reproduce in some scenarios, it may be timing-sensitive. If I'm understanding correctly, the crash does happen when run under gdb, but gdb does not stop at the SIGSEGV?

Does the crash happen with a client or just plain DR?

Does it enter DR's main_signal_handler on the SIGSEGV? If it does, you can try looking at the siginfo there. 

For such crashes, I usually resort to printf debugging. I start by adding manual logs at dispatch enter and exit, to first figure out if the crash is in DR or the code cache.

Since log files won't be available in release mode, you could try modifying the source code to selectively convert some useful LOGs to dr_printf.
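A rough sketch of that idea in standalone C follows. It assumes nothing about DR's internals: release_trace, enter_runtime, and enter_code_cache are hypothetical stand-ins for an always-compiled print helper and the dispatch entry/exit points. Inside DR itself, the print would be dr_printf, as suggested above, rather than fprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Number of trace lines emitted so far (hypothetical bookkeeping). */
static unsigned long trace_count;

/* Always-on trace helper: unlike a debug-only LOG macro, it is compiled
 * into release builds. It flushes after every line so the last message
 * is not lost if the process crashes right afterwards. */
static void release_trace(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fflush(stderr);
    trace_count++;
}

/* Stand-in for the point where control returns to the runtime. */
static void enter_runtime(int thread_id)
{
    release_trace("[t%d] enter runtime\n", thread_id);
}

/* Stand-in for the point where control jumps into generated code. */
static void enter_code_cache(int thread_id)
{
    release_trace("[t%d] enter code cache\n", thread_id);
}
```

The last line printed for the crashing thread then indicates which side the fault happened on. Because the crash looks timing-sensitive, the extra output may itself change how often it reproduces.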

Maybe Derek might know of a more systematic way to debug in this situation.

Abhinav

Ratn, Prasun

May 11, 2022, 1:45:56 PM
to sharma...@google.com, DynamoRIO Users
> > I tried running under gdb but it didn't stop at SIGSEGV (it did stop
> > earlier during the safe_read SEGV).
>
> From your question, since the crash does not reproduce in some
> scenarios, it may be so that it's timing sensitive. If I'm
> understanding correctly, when run under gdb the crash indeed happens,
> just that gdb does not stop at the SIGSEGV?

Yes that is correct.

> Does the crash happen with a client or just plain DR?

This happens with drcachesim + my changes for #5199

> Does it enter DR's main_signal_handler on the SIGSEGV? If it does, you
> can try looking at the siginfo there.

I will check this. I wasn't sure where DR's sigsegv handler was.

> For such crashes, I usually resort to printf debugging. I start by
> adding manual logs at dispatch enter and exit, to first figure out if
> the crash is in DR or the code cache.

I see. Could you tell me which functions these would be? I might need
some fumbling around to locate these.

> Since log files won't be available in release mode, you could try
> modifying the source code to selectively convert some useful LOGs to
> dr_printf.

I'll try this.

sharma...@google.com

May 11, 2022, 2:13:19 PM
to DynamoRIO Users
Hi,

> I wasn't sure where DR's sigsegv handler was

> This happens with drcachesim + my changes for #5199

Okay, so that helps narrow it down somewhat. I assume it doesn't crash with drcachesim on the master branch?

> Could you tell me which functions these would be?

Try dispatch_enter_dynamorio and dispatch_enter_fcache (see the related dispatch_enter_fcache_stats and dispatch_exit_fcache_stats too).

Hope this helps.

Abhinav


Derek Bruening

May 11, 2022, 2:59:43 PM
to Ratn, Prasun, sharma...@google.com, DynamoRIO Users
On Wed, May 11, 2022 at 1:45 PM Ratn, Prasun <prasu...@gmail.com> wrote:
> > > I tried running under gdb but it didn't stop at SIGSEGV (it did stop
> > > earlier during the safe_read SEGV).
> >
> > From your question, since the crash does not reproduce in some
> > scenarios, it may be so that it's timing sensitive. If I'm
> > understanding correctly, when run under gdb the crash indeed happens,
> > just that gdb does not stop at the SIGSEGV?
>
> Yes that is correct.

It is strange if gdb was told to stop yet didn't, or if the kernel didn't deliver the signal via ptrace.
What is the output? Does gdb detect the inferior process exiting?
I would check the gdb version and try something more recent as one avenue.
 
The other thing I would do is get a core dump and then use gdb on that.
It is more limited than live but at least you can examine state at that point. 
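If no core file shows up, the core size limit may be zero. As a small sketch (the function name enable_core_dumps is made up here), a launcher process could raise its soft limit to the hard limit before starting the application; the shell equivalent is `ulimit -c unlimited`.

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit for this
 * process. Child processes inherit the limit. Returns 0 on success. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Where the core ends up depends on /proc/sys/kernel/core_pattern; on Ubuntu it is often piped to apport rather than written to the current directory.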

Prasun Ratn

May 13, 2022, 10:52:17 AM
to Derek Bruening, sharma...@google.com, DynamoRIO Users
I was able to generate a core dump. I opened it with `gdb app core`, where app is the program I am running under DR.

gdb shows si_code 128 (SI_KERNEL?) and dmesg shows a vsyscall-related message. Does this imply something went wrong earlier, or is this the problem?

(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0,
        sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
      si_addr = 0x0, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 0, si_fd = 0}}}

[99844.990366] app[48697] vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround ip:7f6d4d119d23 cs:33 sp:7fff40c40938 ax:ffffffffff600000 si:ffffffffff600000 di:7fff40c40960
[99958.820495] traps: app[51431] general protection fault ip:7f6d4cb7600b sp:7f6b096b85f0 error:0 in libc-2.31.so[7f6d4cb51000+178000]

I am using Ubuntu 20.04 with 5.14 kernel.
$ gdb --version
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
