Using the bt-akaros script

Barret Rhoden

Jun 12, 2015, 3:28:00 PM
to aka...@googlegroups.com
For those curious about how to use the new backtrace script, here are
the steps I went through while debugging a warning we get while
running netperf.

If you run netperf, you'll see something like:

/ $ netperf -H 10.0.2.2
TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) () port 0 AF_INET
netperf: create_data_socket: SO_REUSEADDR failed 92
[kernel] sys_close failed: proc 25 fd 5. Check your rets.

The last line is a warning from the kernel: someone tried to close
an FD twice. That can be pretty disastrous, since you could
accidentally close another socket/file. We had a brutal bug like that
a while ago, which is why I put in the warning.

Anyway, head into kern/src/syscall.c and add a backtrace:

if (retval < 0) {
	/* no one checks their retvals. a double close will cause problems. */
	printk("[kernel] sys_close failed: proc %d fd %d. Check your rets.\n",
	       p->pid, fd);

	/* add this */
	backtrace_user_ctx(current_ctx);
}

Note that current_ctx might not be the context that issued the
syscall! If the syscall blocked, then the original context has
actually restarted and moved on, and current_ctx points somewhere
else. This is all due to our asynchronous syscalls. Regardless, this
is still a useful trick.

At runtime, you'll see:

[kernel] sys_close failed: proc 25 fd 5. Check your rets.
User context backtrace:
Offsets only matter for shared libraries
#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4

Those addresses and offsets need to be resolved to the functions
containing them. This is where bt-akaros.sh comes in.
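
As a quick sanity check on how the two columns relate: subtracting the
offset from the address gives the base the object was loaded at. Here is
a small sketch using shell arithmetic on frames from the trace above
(load_base is a helper name made up for this example, not something
bt-akaros.sh provides):

```shell
# Hypothetical helper: load base = runtime address - offset within the object
load_base() {
	printf '0x%x\n' $(( $1 - $2 ))
}

load_base 0x000040000035f12e 0x00000000000cf12e   # libc-2.19.so, frame #01: prints 0x400000290000
load_base 0x000000000040c511 0x000000000000c511   # netperf, frame #03: prints 0x400000
```

All three libc frames give the same base, and the netperf frames all come
out at 0x400000, which fits the kernel's note that offsets only matter
for shared libraries.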

First, you'll need to edit your local copy of bt-akaros.sh so that the
script can find your shared libraries and binaries and knows what a
shared library looks like. You can have these point anywhere, but I
point mine right into KFS:

SOLIBS_PREFIX=~/akaros/ros-kernel/kern/kfs/lib/
SO_REGEX=.*so$
BIN_PREFIX=~/akaros/ros-kernel/kern/kfs/bin/

Then you just pipe your backtrace into the script:

$ echo '#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4
' | ~/scripts/bt-akaros.sh
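
If the trace is buried in a longer console log, one option is to filter
out just the frame lines before piping. This is only a sketch:
console.log and bt_frames are made-up names, and whether the script
needs this filtering at all is an assumption, not something checked
against bt-akaros.sh.

```shell
# Hypothetical helper: keep only lines shaped like backtrace frames,
# e.g. "#01 Addr 0x... is in libc-2.19.so at offset 0x..."
bt_frames() {
	grep -E '^#[0-9]+ Addr 0x[0-9a-f]+ is in .+ at offset 0x[0-9a-f]+'
}

# Usage (console.log is a placeholder for wherever you saved the output):
#   bt_frames < console.log | ~/scripts/bt-akaros.sh
```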

and get your results:

#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e __ros_syscall_errno@@GLIBC_2.2.5+0x6e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111 close@@GLIBC_2.2.5+0x21
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511 send_tcp_stream+0x654
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35 main+0x105
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2 __libc_start_main@@GLIBC_2.2.5+0xa2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4 _start+0x124


It doesn't give you line numbers (it just looks at the symtab), but it
does a decent job and is faster than disassembling and manually looking
for an address.
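
To give a feel for what a symtab lookup amounts to, here is a minimal
sketch of the idea: take the symbol with the highest address that is
still at or below the target, and report the distance from it. This is
not the actual code in bt-akaros.sh; nearest_symbol is a hypothetical
helper that reads `nm`-style lines ("<hex addr> <type> <name>") on
stdin.

```shell
# Hypothetical helper, not part of bt-akaros.sh.
# $1 is the target address (or offset, for a shared library) in 0x... form.
nearest_symbol() {
	target=$(( $1 ))
	best_name=""
	best_addr=0
	while read -r addr type name; do
		# Undefined symbols have no address column, so name ends up empty
		[ -z "$name" ] && continue
		val=$(( 0x$addr ))
		# Keep the highest symbol address that does not exceed the target
		if [ "$val" -le "$target" ] && [ "$val" -ge "$best_addr" ]; then
			best_addr=$val
			best_name=$name
		fi
	done
	[ -n "$best_name" ] && printf '%s+0x%x\n' "$best_name" $(( target - best_addr ))
}

# Usage, with the path from the config above:
#   nm ~/akaros/ros-kernel/kern/kfs/bin/netperf | nearest_symbol 0x402f35
```

For a shared library you would pass the offset from the trace rather
than the address. Like the script, this only gets you symbol+offset, not
a line number.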

With that, we can find the close() call in send_tcp_stream(). Even
though it's a huge function, there's only one close() call, and it
happens right after a shutdown(). And there's our problem. Our
implementation of shutdown() actually does a close(), which isn't
right. By the time netperf calls close() itself, the FD is already
gone. Hence the double-close.

Barret

