What follows is the analysis of one strace (strace -fv -s 100) attached to the docker daemon.
The strace log file shows the creation of a chain of processes: dockerd forks containerd forks containerd-shim forks runc forks a command that runs inside the container (the command is '/usr/bin/find .'). This is also expected.
When the I/O of the process /usr/bin/find in the docker container is blocked, strace shows that the Golang schedulers are still active:
(line 298015)
[pid 2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} <unfinished ...>
[pid 2264] sched_yield( <unfinished ...>
[pid 2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 0
[pid 2264] sched_yield( <unfinished ...>
[pid 2270] swapcontext(0, 0x214dda80 <unfinished ...>
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 558750336
[pid 2264] sched_yield( <unfinished ...>
[pid 2270] swapcontext(0, 0x2190a580 <unfinished ...>
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 563127680
[pid 2264] sched_yield( <unfinished ...>
[pid 2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 0
...
(line 298134)
[pid 2266] <... _newselect resumed>) = 0 (Timeout)
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 563127680
[pid 2266] epoll_wait(4, <unfinished ...>
[pid 2264] sched_yield( <unfinished ...>
[pid 2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid 2266] <... epoll_wait resumed>[], 128, 0) = 0
[pid 2264] <... sched_yield resumed>) = 0
[pid 2270] <... swapcontext resumed>) = 0
[pid 2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} <unfinished ...>
TIDs 2264, 2266 and 2270 belong to the process runc. The strace log has similar straces for the other processes (dockerd, containerd, containerd-shim), so I assume also their goroutine schedulers were active. I am actually wondering how to relate the arguments listed in the strace file to the Golang or C code.
Just before the container gets blocked it runs the command 'find .' that should produce output to the terminal (but there is no output at first, that is the problem). The data is visible in the strace log through the 'write()' system call:
[pid 2204] execve("/usr/bin/find", ["/usr/bin/find", "."], ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] <unfinished ...>
...
[pid 2204] lstat64(".", <unfinished ...>
...
[pid 2204] write(1, ".\n", 2 <unfinished ...>
...
[pid 2204] lstat64("./var", <unfinished ...>
...
[pid 2204] write(1, "./var\n", 6 <unfinished ...>
...
The 'find' process writes 626 lines to stdout (18778 characters, this seems to be reproducible). The last lines are:
...
[pid 2204] lstat64("./sys/kernel/slab/rpc_buffers", <unfinished ...>
...
[pid 2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 <unfinished ...>
All the write() system calls except the last one are successfully completed. The last one remains blocked. At that time PID 2204 hangs for a long time
runc / dockerd / containerd / containerd-shim have continuous activity as in the first strace that I showed above.
When I terminate the docker process with a signal SIGINT, the characters written by PID 2204 are suddenly flushed to the terminal.
Are there anything specific things to look for in the strace files (eg. specific epoll() calls). Is there a way to map the arguments and return values of swapcontext() to goroutines, or would this be a useless thing to try to do?
This is the analysis of one trial. Some of the other trials did not start the full chain of processes, it looks like the behaviour of the bug is also timing dependent.