If you run netperf, you'll see something like:
/ $ netperf -H 10.0.2.2
TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) () port 0 AF_INET
netperf: create_data_socket: SO_REUSEADDR failed 92
[kernel] sys_close failed: proc 25 fd 5. Check your rets.
The last line is a warning from the kernel: someone tried to close
an FD twice. That can be pretty disastrous, since you could
accidentally close another socket/file. We had a brutal bug like that
a while ago, which is why I put in the warning.
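To make the danger concrete, here is a minimal sketch of how a double close can tear down someone else's descriptor. It assumes a POSIX-like host where open() hands out the lowest free descriptor number and /dev/null exists. It is ordinary userspace C, not Akaros kernel code, and the function name is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the stale second close() took out an unrelated descriptor. */
int double_close_hits_other_fd(void)
{
	int a = open("/dev/null", O_WRONLY);

	assert(a >= 0);
	close(a);	/* first, legitimate close */

	/* open() returns the lowest free descriptor, so b reuses a's number
	 * (true single-threaded; another thread could grab the slot first) */
	int b = open("/dev/null", O_WRONLY);

	assert(b == a);

	close(a);	/* stale second close: this really closes b */

	/* b's owner now gets EBADF on a descriptor it never closed */
	return write(b, "x", 1) < 0 && errno == EBADF;
}
```

The second close() itself succeeds, which is why this kind of bug is so quiet: the failure only shows up later, in whatever code owns the recycled descriptor.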
Anyway, head into kern/src/syscall.c and add a backtrace:
	if (retval < 0) {
		/* no one checks their retvals. a double close will cause problems. */
		printk("[kernel] sys_close failed: proc %d fd %d. Check your rets.\n",
		       p->pid, fd);
		/* add this */
		backtrace_user_ctx(current_ctx);
	}
Note that current_ctx might not be the context that issued the
syscall! If the syscall blocked, then the original context has
actually restarted and moved on, and current_ctx points somewhere
else. This is all due to our asynchronous syscalls. Regardless, this
is still a useful trick.
At runtime, you'll see:
[kernel] sys_close failed: proc 25 fd 5. Check your rets.
User context backtrace:
Offsets only matter for shared libraries
#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4
Those addresses and offsets need to be resolved to the functions
containing them. This is where bt-akaros.sh comes in.
First, you'll need to edit your local copy of bt-akaros.sh so that the
script can find your shared libraries and binaries and knows what a
shared library looks like. You can have these point anywhere, but I
point mine right into KFS:
SOLIBS_PREFIX=~/akaros/ros-kernel/kern/kfs/lib/
SO_REGEX=.*so$
BIN_PREFIX=~/akaros/ros-kernel/kern/kfs/bin/
Then you just pipe your backtrace into the script:
$ echo '#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4
' | ~/scripts/bt-akaros.sh
and get your results:
#01 Addr 0x000040000035f12e is in libc-2.19.so at offset 0x00000000000cf12e __ros_syscall_errno@@GLIBC_2.2.5+0x6e
#02 Addr 0x0000400000357111 is in libc-2.19.so at offset 0x00000000000c7111 close@@GLIBC_2.2.5+0x21
#03 Addr 0x000000000040c511 is in netperf at offset 0x000000000000c511 send_tcp_stream+0x654
#04 Addr 0x0000000000402f35 is in netperf at offset 0x0000000000002f35 main+0x105
#05 Addr 0x00004000002ae1b2 is in libc-2.19.so at offset 0x000000000001e1b2 __libc_start_main@@GLIBC_2.2.5+0xa2
#06 Addr 0x0000000000402ce4 is in netperf at offset 0x0000000000002ce4 _start+0x124
It doesn't give you line numbers (it just looks at the symtab), but it
does a decent job and is faster than disassembling and manually looking
for an address.
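Conceptually, resolving an offset against a symbol table means finding the last symbol that starts at or below it and reporting the difference. The sketch below illustrates that idea in C. It is not how bt-akaros.sh is implemented, and the struct, table, and function names are invented for the example. The three start offsets are back-computed from the resolved frames above (for example, 0x2f35 - 0x105 = 0x2e30 for main).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sym {
	uint64_t start;		/* offset where the symbol begins */
	const char *name;
};

/* Three netperf symbols, sorted by start offset. The starts are back-computed
 * from the resolved backtrace above (frame offset minus the "+0x..." part). */
static const struct sym netperf_syms[] = {
	{ 0x2bc0, "_start" },
	{ 0x2e30, "main" },
	{ 0xbebd, "send_tcp_stream" },
};
#define NR_SYMS (sizeof(netperf_syms) / sizeof(netperf_syms[0]))

/* Binary search for the last symbol with start <= off; NULL if none. */
const struct sym *lookup_sym(const struct sym *tab, size_t nr, uint64_t off)
{
	size_t lo = 0, hi = nr;

	assert(tab || !nr);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].start <= off)
			lo = mid + 1;	/* candidate; look for a later one */
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}

/* Print "name+0xdelta" for off, or "??" if it is below every symbol. */
void print_sym(uint64_t off)
{
	const struct sym *s = lookup_sym(netperf_syms, NR_SYMS, off);

	if (s)
		printf("%s+0x%llx\n", s->name,
		       (unsigned long long)(off - s->start));
	else
		printf("??\n");
}
```

Calling print_sym(0x2f35) prints main+0x105, matching frame #04. Since the table carries no symbol sizes, an offset past the end of a function still resolves to that function, so treat a large +0x delta with some suspicion.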
With that, we can find the close() call in send_tcp_stream(). Even
though it's a huge function, there's only one close() call, and it
happens right after a shutdown(). And there's our problem. Our
implementation of shutdown() actually does a close(), which isn't
right. Hence the double-close.
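For reference, here is a small sketch of the semantics netperf is relying on: shutdown() stops traffic in one direction but leaves the descriptor open, so the close() that follows is the first and only one. This assumes a POSIX host with AF_UNIX socketpair() support. It is a userspace check, not the Akaros fix, and the function name is made up.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the fd survives shutdown() and the single close() succeeds. */
int shutdown_then_close(void)
{
	int sv[2];
	char c;
	int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

	assert(rc == 0);
	/* stop sending: the peer will see EOF, but sv[0] stays in the fd table */
	rc = shutdown(sv[0], SHUT_WR);
	assert(rc == 0);

	int still_open = fcntl(sv[0], F_GETFD) != -1;
	int peer_saw_eof = read(sv[1], &c, 1) == 0;
	/* the one real close; it would fail with EBADF if shutdown() had closed */
	int close_ok = close(sv[0]) == 0;

	close(sv[1]);
	return still_open && peer_saw_eof && close_ok;
}
```

On a system where shutdown() also closes the descriptor, still_open and close_ok would both come back 0, which is the same symptom the kernel warning caught.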
Barret