Syscall stack switch and gdb reporting stack corrupted problem

1,021 views
Skip to first unread message

Waldek Kozaczuk

unread,
Jan 21, 2018, 6:08:31 PM1/21/18
to OSv Development
Last couple of days I was trying to investigate why gdb reports stack is corrupted when backtracing. This is related to golang support and is the same issue that Hawxchen reported when troubleshooting some side effect of his patch addressing #808 (see this email exchange - https://groups.google.com/forum/#!searchin/osv-dev/hawxchen%7Csort:date/osv-dev/Ol1zuoOwjzg/DD6g4YI6AgAJ).

The "corrupted stack" is reported like so by gdb:

(gdb) bt
#0  0x0000000000391182 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:24
#3  0x00000000002258f7 in abort (fmt=fmt@entry=0x69f91e "Aborted\n") at runtime.cc:132
#4  0x0000000000225920 in abort () at runtime.cc:98
#5  0x00000000003d89eb in syscall (number=number@entry=267) at linux.cc:416
#6  0x00000000003d8b54 in syscall_wrapper (number=267, p1=-100, p2=842350526864, p3=842350845952, p4=128, 
    p5=0, p6=0) at linux.cc:431
#7  0x000000000038abc8 in syscall_entry () at arch/x64/entry.S:261
#8  0x00001000010a861a in syscall.Syscall6 () at /home/wkozaczuk/tools/go/src/syscall/asm_linux_amd64.s:52
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

 The key thing that makes this slightly different from normal "corrupt stack" is "frame inner to this frame" phrase that caught my eye and I started digging/googling into that. There wasn't much but I eventually I found that this two - http://gdb-patches.sourceware.narkive.com/Sm8EGiYq/allow-switching-stacks and source code of frame.c of gdb - https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/frame.c;h=05a3be400c35ada0de48fd529e9cc83e0eaa941d;hb=db3a1dc7c9404e203c54576db3335e69d995e83e#l840.

The key comments in frame.c:
840 /* As a safety net to avoid unnecessary backtracing while trying
841 to find an invalid ID, we check for a common situation where
842 we can detect from comparing stack addresses that no other
843 frame in the current frame chain can have this ID. See the
844 comment at frame_id_inner for details. */

/* Safety net to check whether frame ID L should be inner to
722 frame ID R, according to their stack addresses.
724 This method cannot be used to compare arbitrary frames, as the
725 ranges of valid stack addresses may be discontiguous (e.g. due
726 to sigaltstack).
728 However, it can be used as safety net to discover invalid frame
729 IDs in certain circumstances. Assuming that NEXT is the immediate
730 inner frame to THIS and that NEXT and THIS are both NORMAL frames:
732 * The stack address of NEXT must be inner-than-or-equal to the stack
733 address of THIS.
735 Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
736 error has occurred.
738 * If NEXT and THIS have different stack addresses, no other frame
739 in the frame chain may have a stack address in between.
741 Therefore, if frame_id_inner (TEST, THIS) holds, but
742 frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
743 to a valid frame in the frame chain.
745 The sanity checks above cannot be performed when a SIGTRAMP frame
746 is involved, because signal handlers might be executed on a different
747 stack than the stack used by the routine that caused the signal
748 to be raised. This can happen for instance when a thread exceeds
749 its maximum stack size. In this case, certain compilers implement
750 a stack overflow strategy that cause the handler to be run on a
751 different stack. */
753 static int
754 frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)

This made me thinking that maybe when we switch stack during syscall_entry the beginning address of syscall stack might be higher then any address in the regular stack. This does not matter for the regular operation - stack in both areas grows in the direction of lower addresses. However when gdb tries to unwind the stack during backtrace it detects when tracing from last stack frame of syscall stack into next frame which is part of regular stack that the absolute addresses become lower which is why it reports as "previous frame inner to this frame (corrupt stack?)".

So I hacked some code in pthreads.cc to make syscal stack to be allocated before regular with a hope to make syscall stack lower than regular stack. Btw I realized that regular stack when using pthreads API is allocated using mmap ANONYMOUS which makes its address much lower than anything returned from alloc. What was the reason to use mmap vs regular malloc?

Here is an example of the stack addresses after my hack:
// Regularly allocated thread stack by malloc() in arch-switch.hh
...
***-> 166, Stack top: 0xffff80000205a040 // First number is thread id, second is a stack number
...
############################ -> pTHREAD allocate syscall stack to: 0x200000380000
############################ -> pTHREAD allocate to: 0x200000480000
***-> 199, Syscall stack top: 0x200000380000
***-> 199, Stack top: 0x200000480000

As I kind of expected now I can see full stack trace when I run bt (example):
(gdb) bt
#0  0x0000000000391182 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:24
#3  0x00000000002258f7 in abort (
    fmt=fmt@entry=0x6a0938 "Assertion failed: %s (%s: %s: %d)\n")
    at runtime.cc:132
#4  0x0000000000225939 in __assert_fail (
    expr=expr@entry=0x7303ae "!(e & ~SUPPORTED_EVENTS)", 
    file=file@entry=0x7303a0 "core/epoll.cc", line=line@entry=55, 
    func=func@entry=0x7304a0 <events_poll_to_epoll(unsigned int)::__func__> "events_poll_to_epoll") at runtime.cc:139
#5  0x0000000000408460 in events_poll_to_epoll (e=32) at core/epoll.cc:55
#6  epoll_file::process_activity (maxevents=128, events=0x20000047f880, 
    activity=std::unordered_set with 1 elements = {...}, this=
    0xffffa000029b7e00) at core/epoll.cc:199
#7  epoll_file::wait (this=0xffffa000029b7e00, 
    events=events@entry=0x20000047f880, maxevents=maxevents@entry=128, 
    timeout_ms=timeout_ms@entry=0) at core/epoll.cc:161
#8  0x0000000000406831 in epoll_wait (epfd=<optimized out>, 
    events=0x20000047f880, maxevents=128, timeout_ms=0) at core/epoll.cc:344
#9  0x00000000003d8236 in syscall (number=number@entry=232) at linux.cc:374
#10 0x00000000003d8b34 in syscall_wrapper (number=232, p1=4, 
    p2=35184376805504, p3=128, p4=0, p5=4, p6=17592204000280) at linux.cc:432
---Type <return> to continue, or q <return> to quit---
#11 0x000000000038abc8 in syscall_entry () at arch/x64/entry.S:261
#12 0x00001000010c5409 in runtime.epollwait ()
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/sys_linux_amd64.s:575
#13 0x0000100001096876 in runtime.netpoll (block=false, ~r1=0x0)
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/netpoll_epoll.go:74
#14 0x00001000010a5183 in runtime.sysmon ()
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/proc.go:3897
#15 0x000010000109def2 in runtime.mstart1 ()
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/proc.go:1182
#16 0x000010000109ddc6 in runtime.mstart ()
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/proc.go:1152
#17 0x000010000111fbe3 in crosscall_amd64 () at gcc_amd64.S:35
#18 0xffff800000f2c040 in ?? ()
#19 0xffff800000f2f798 in ?? ()
#20 0xffff800002e1e190 in ?? ()
#21 0xffffa000014a31c0 in ?? ()
#22 0x000000c420000900 in ?? ()
#23 0x000010000109dd60 in runtime.startTheWorldWithSema ()
    at /home/wkozaczuk/tools/go-1.9.2/go/src/runtime/proc.go:1070
#24 0x000010000111f73b in threadentry (v=<optimized out>)
    at gcc_linux_amd64.c:102
#25 0x0000000000448816 in pthread_private::pthread::<lambda()>::operator() (
    __closure=0xffffa000029b9600) at libc/pthread.cc:115
---Type <return> to continue, or q <return> to quit---
#26 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/5/functional:1871
#27 0x00000000003e67c7 in sched::thread_main_c (t=0xffff800002e19040)
    at arch/x64/arch-switch.hh:198
#28 0x000000000038ab53 in thread_main () at arch/x64/entry.S:113

Given all that I think we have 3 options:
  1. Somehow change logic of allocating syscall stack to make its address always be lower than other stack (what if we do not even have control of that in case of Golang goroutines). I am not sure how easy it is to achieve.
  2. Add some CFI trickery to entry.S to make gdb ignore non-contiguous stack when stack unwinding.
  3. Implement osv backtrace like other gdb extensions in scripts/loader.py
This also means that original Hawxchen did not break any CFI logic.

Being able to see full backtraces is actuall super handy when trying to troubleshoot various golang problems.

Regards,
Waldek

Nadav Har'El

unread,
Jan 28, 2018, 12:03:23 PM1/28/18
to Waldek Kozaczuk, OSv Development
Interesting. Perhaps the problem is that we try to tell GDB, via CFI, that the system call is a "normal" call, and it finds it strage that if it's a normal call, why does the stack pointer changes in this unexpected manner.
The alternative could be to tell GDB that this is a "signal frame" with .cfi_signal_frame - this GDB knows has a separate stack. I don't remember exactly what this entails, I haven't touched this CFI stuff in a while, you can see how we used it in entry.S, maybe it requires some changes to how/where we save stuff on the stack during the system call. But if we use this, the gdb backtrace will show a "signal handler called" message as part of the backtrace (I wanted to avoid that...) at the point the syscall was called - but shouldn't print those "inner frame" messages.
 

So I hacked some code in pthreads.cc to make syscal stack to be allocated before regular with a hope to make syscall stack lower than regular stack. Btw I realized that regular stack when using pthreads API is allocated using mmap ANONYMOUS which makes its address much lower than anything returned from alloc. What was the reason to use mmap vs regular malloc?

With mmap(), you can further modify the permissions of part of the pages - namely, to make the first page an no-permission "stack guard" page (so if reached, we get an immediate crash).
You can't do that with malloc (you're not supposed to call mprotect() on memory not returned by mmap() - although in Linux it probably works...).
We could allocate the two of those together, and then give the higher half to the syscall stack. But it might be ugly (I'm not sure we want to allocate these stacks at the same time, and at some point I even hoped we won't need to allocate a new syscall stack for each thread until actually used...).
 
  1. Add some CFI trickery to entry.S to make gdb ignore non-contiguous stack when stack unwinding.
So I thought of .cfi_signal_frame, but maybe there are more sophisticated CFI tricks I'm not familiar with (if we could just get gdb not to be over-smart in this case, that would be enough...).
 
  1. Implement osv backtrace like other gdb extensions in scripts/loader.py
Possible, but would be sad, considering the normal "backtrace" does work most of the time, so people won't know to look for something else.
Reply all
Reply to author
Forward
0 new messages