Port to powerpc 440fpu


Hugo Cornelis

Jun 18, 2020, 4:17:30 PM
to golang-nuts

Hi all,

Does anyone have experience with porting Go applications to the 32-bit powerpc 440fpu?

We have a team that is porting the Docker tool suite to a device that uses this CPU. We generated the system call bindings and compiled the Docker tool suite without many problems.

Most of Docker seems to be working fine on the device; however, interactive terminal input/output is not working.

Investigation shows that two specific goroutines, responsible for forwarding the I/O between two Docker-related processes, are not scheduled (they don't receive CPU cycles) until Docker terminates these processes with signal 15 (TERM). At that point the two goroutines are suddenly scheduled and all of the output buffers are flushed to the terminal.

We have looked at the terminal settings and flags applied to the file descriptors and these seem all fine (although I must admit that the code flows inside Docker and its tools are complicated).

We suspect there may be a problem with one or more of the system call bindings, for instance that there may be a system call declared with //sysnb where it should be just //sys, if that makes sense.  I would actually not know how to distinguish between these two flags.

We would now like to inspect the status of the two goroutines to understand what they are waiting for, and why the scheduler does not schedule them.

Debugging with GODEBUG=schedtrace=1000,scheddetail=1 helps somewhat, but we have no idea how to relate the output of the scheduler state to the two goroutines (if that would make sense at all).

Does anyone have any experience debugging this type of problem?  How would we look at where exactly these processes are blocked without developing core knowledge about the Docker tool suite?

We have been working on this for several weeks now, any help would be greatly appreciated.

Thanks!

Hugo


Ian Lance Taylor

Jun 18, 2020, 9:34:09 PM
to Hugo Cornelis, golang-nuts
The standard Go distribution doesn't support 32-bit PPC, so I feel
like there is some missing background information here.

If the problem is with making system calls, then it often helps to
look at the "strace -f" output to see what is going on at the system
call level.

Ian

Hugo Cornelis

Jun 29, 2020, 4:01:59 AM
to Ian Lance Taylor, golang-nuts


Hi,

The standard Go distribution doesn't support 32-bit PPC.

To compile Golang code to 32-bit PPC we first built a proof of concept based on docker-cli using the gccgo packages for Ubuntu.  We got this working without too much effort.  Afterwards we integrated this type of cross-compilation into Buildroot to compile the entire Docker tool suite for use on an embedded system.

Most of Docker seems to be working fine on the embedded device, however local interactive terminal input / output with a running container is not working.

What we observe is similar to what is described here: https://github.com/moby/moby/issues/39461

Investigation shows that two specific goroutines in Container daemon that are responsible for forwarding the input and output from the container to the user are not scheduled (they don't receive CPU cycles) until after Docker terminates.

These two goroutines use the functions io.CopyBuffer() and ReadFrom() / WriteTo() to forward the traffic (the method used is demonstrated in recvtty.go at https://github.com/opencontainers/runc/blob/master/contrib/cmd/recvtty/recvtty.go).
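The forwarding pattern can be sketched roughly as follows. This is not the containerd or runc code; it is a minimal illustration using an in-memory pipe, and the names forward and relay are made up for the example:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// forward copies src to dst until EOF. io.CopyBuffer delegates to
// src.WriteTo or dst.ReadFrom when one of them is implemented and
// only falls back to buf otherwise.
func forward(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 32*1024)
	return io.CopyBuffer(dst, src, buf)
}

// relay pushes payload through an in-memory pipe and returns what the
// forwarding goroutine delivered on the other side.
func relay(payload string) string {
	pr, pw := io.Pipe()
	var out bytes.Buffer
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		forward(&out, pr) // blocks in Read until the writer closes
	}()

	io.WriteString(pw, payload)
	pw.Close() // signals EOF to the forwarding goroutine
	wg.Wait()
	return out.String()
}

func main() {
	fmt.Printf("%q\n", relay("./var\n"))
}
```

If the forwarding goroutine is never woken up, the writer side blocks as soon as the buffer between the two is full.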

When Docker terminates it sends signal 15 (TERM) to these processes.  This somehow allows the two goroutines to be scheduled which flushes the output buffers to the terminal.

This may be due to wrong system call bindings for 32-bit PPC in the unix package, however inspection of these bindings did not reveal any problem so far.

We have been working on this for several weeks now, any help would be greatly appreciated.

Thanks!

Hugo


Ian Lance Taylor

Jun 29, 2020, 3:10:35 PM
to Hugo Cornelis, golang-nuts
On Mon, Jun 29, 2020 at 1:01 AM Hugo Cornelis
<hugo.c...@essensium.com> wrote:
>
> The standard Go distribution doesn't support 32-bit PPC.
>
> To compile Golang code to 32-bit PPC we first built a proof of concept based on docker-cli using the gccgo packages for Ubuntu. We got this working without too much effort. Afterwards we integrated this type of cross-compilation into Buildroot to compile the entire Docker tool suite for use on an embedded system.
>
> Most of Docker seems to be working fine on the embedded device, however local interactive terminal input / output with a running container is not working.
>
> What we observe is similar to what is described here: https://github.com/moby/moby/issues/39461
>
> Investigation shows that two specific goroutines in Container daemon that are responsible for forwarding the input and output from the container to the user are not scheduled (they don't receive CPU cycles) until after Docker terminates.
>
> These two goroutines use the functions io.CopyBuffer() and ReadFrom() / WriteTo() to forward the traffic (the used method to forward traffic is demonstrated in recvtty.go at https://github.com/opencontainers/runc/blob/master/contrib/cmd/recvtty/recvtty.go)
>
> When Docker terminates it sends signal 15 (TERM) to these processes. This somehow allows the two goroutines to be scheduled which flushes the output buffers to the terminal.
>
> This may be due to wrong system call bindings for 32-bit PPC in the unix package, however inspection of these bindings did not reveal any problem so far.
>
> We have been working on this for several weeks now, any help would be greatly appreciated.
>
> Thanks!


Thanks for the background.

Earlier I suggested looking at the output of "strace -f" for the
programs that fail. Does that show anything of interest?

Ian

Hugo Cornelis

Jul 3, 2020, 8:54:44 AM
to Ian Lance Taylor, golang-nuts

Thanks for your answer.

On Mon, Jun 29, 2020 at 9:10 PM Ian Lance Taylor <ia...@golang.org> wrote:
> Thanks for the background.
>
> Earlier I suggested looking at the output of "strace -f" for the
> programs that fail.  Does that show anything of interest?

What follows is the analysis of one strace (strace -fv -s 100) attached to the docker daemon.

The strace log file shows the creation of a chain of processes: dockerd forks containerd forks containerd-shim forks runc forks a command that runs inside the container (the command is '/usr/bin/find .').  This is also expected.

When the I/O of the process /usr/bin/find in the docker container is blocked, strace shows that the Golang schedulers are still active:

(line 298015)
[pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} <unfinished ...>
[pid  2264] sched_yield( <unfinished ...>
[pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
[pid  2264] sched_yield( <unfinished ...>
[pid  2270] swapcontext(0, 0x214dda80 <unfinished ...>
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 558750336
[pid  2264] sched_yield( <unfinished ...>
[pid  2270] swapcontext(0, 0x2190a580 <unfinished ...>
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 563127680
[pid  2264] sched_yield( <unfinished ...>
[pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
...
(line 298134)
[pid  2266] <... _newselect resumed>)   = 0 (Timeout)
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 563127680
[pid  2266] epoll_wait(4,  <unfinished ...>
[pid  2264] sched_yield( <unfinished ...>
[pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
[pid  2266] <... epoll_wait resumed>[], 128, 0) = 0
[pid  2264] <... sched_yield resumed>)  = 0
[pid  2270] <... swapcontext resumed>)  = 0
[pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} <unfinished ...>

TIDs 2264, 2266 and 2270 belong to the process runc.  The strace log has similar straces for the other processes (dockerd, containerd, containerd-shim), so I assume also their goroutine schedulers were active.  I am actually wondering how to relate the arguments listed in the strace file to the Golang or C code.

Just before the container gets blocked it runs the command 'find .' that should produce output to the terminal (but there is no output at first, that is the problem).  The data is visible in the strace log through the 'write()' system call:

[pid  2204] execve("/usr/bin/find", ["/usr/bin/find", "."], ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] <unfinished ...>
...
[pid  2204] lstat64(".",  <unfinished ...>
...
[pid  2204] write(1, ".\n", 2 <unfinished ...>
...
[pid  2204] lstat64("./var",  <unfinished ...>
...
[pid  2204] write(1, "./var\n", 6 <unfinished ...>
...

The 'find' process writes 626 lines to stdout (18778 characters, this seems to be reproducible).  The last lines are:

...
[pid  2204] lstat64("./sys/kernel/slab/rpc_buffers",  <unfinished ...>
...
[pid  2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 <unfinished ...>

All the write() system calls except the last one complete successfully.  The last one remains blocked, and from that point on PID 2204 hangs for a long time.

runc / dockerd / containerd / containerd-shim have continuous activity as in the first strace that I showed above.

When I terminate the docker process with a signal SIGINT, the characters written by PID 2204 are suddenly flushed to the terminal.

Are there any specific things to look for in the strace files (e.g. specific epoll() calls)?  Is there a way to map the arguments and return values of swapcontext() to goroutines, or would this be a useless thing to try to do?

This is the analysis of one trial.  Some of the other trials did not start the full chain of processes, so it looks like the behaviour of the bug is also timing dependent.

Hugo

Ian Lance Taylor

Jul 3, 2020, 3:36:40 PM
to Hugo Cornelis, golang-nuts
That looks like the process is writing to a pipe that nothing is reading from.
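To illustrate the effect: a writer can only push as much data as the kernel buffer holds before write() stops making progress. A minimal sketch, assuming Linux and a plain pipe (the container in this thread may be using a pty instead, whose buffer size differs); fillPipe is a made-up helper that uses non-blocking mode so that the program reports the limit instead of hanging:

```go
package main

import (
	"fmt"
	"syscall"
)

// fillPipe writes to a pipe that nobody reads until the kernel buffer
// is full and returns the number of bytes the kernel accepted.
func fillPipe() (int, error) {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	// In blocking mode the final write would hang, like PID 2204 does.
	if err := syscall.SetNonblock(fds[1], true); err != nil {
		return 0, err
	}

	chunk := make([]byte, 1024)
	total := 0
	for {
		n, err := syscall.Write(fds[1], chunk)
		if n > 0 {
			total += n
		}
		if err == syscall.EAGAIN {
			return total, nil // buffer full: a blocking writer would stall here
		}
		if err != nil {
			return total, err
		}
	}
}

func main() {
	n, err := fillPipe()
	fmt.Println("bytes accepted before the pipe was full:", n, "error:", err)
}
```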

Ian

Hugo Cornelis

Jul 9, 2020, 4:09:41 AM
to Ian Lance Taylor, golang-nuts
On Fri, Jul 3, 2020 at 9:36 PM Ian Lance Taylor <ia...@golang.org> wrote:
> That looks like the process is writing to a pipe that nothing is reading from.

Yes, that is correct.  The question is: why doesn't the reader read from the pipe?  And why does it suddenly start reading when the Docker daemon process is terminated?

At first I would believe this to be a starvation problem, but our investigation is still inconclusive.

Here is what we know about the reader process / goroutine:

- It is a goroutine that becomes active when Docker terminates.

- This same goroutine gets stuck at io.CopyBuffer(epollConsole, in, *bp) before Docker terminates.  During this time the writer writes 18778 characters (and then gets stuck).

- All the configurations we tested gave this or similar behaviour.  However, the behaviour is slightly timing dependent, i.e. inserting logging statements may result in small changes to this behaviour.


During the last few days of investigation we found one race in containerd and several bugs in our system call bindings for ppc (Ftruncate, Truncate, Fstatfs, Statfs, Lstat).

We have fixed these, but the problem with the reader not reading / blocked I/O persists.

It may be a case of starvation, or a race, or something else.

More investigation is required.  We are now looking further into the system call bindings, debugging the code of Docker and its tools, and the gccgo runtime.

Thanks for your reply.

Hugo


Hugo Cornelis

Aug 10, 2020, 5:00:27 AM
to Ian Lance Taylor, golang-nuts, Atilla Filiz

Hi,

Bottom line: Docker works reliably on powerpc 440fpu 32 bit using gccgo as the compiler.  We will likely soon start working on powerpc e6500 in 32bit mode.

After a fix in the structures used by the epoll system calls, the problem disappeared.  I assume the problem was starvation similar to

https://github.com/moby/moby/issues/39461

We had to correct the system call numbers used for the fstat system call family, and for sendfile, fadvise, ftruncate, truncate and fcntl.

We also had to fix the alignment of some of the structures used by these functions and of the EpollEvent structure (i.e. the generator did not always generate correct structures).  The fix for EpollEvent also fixed the I/O starvation problem.

It remains unclear why the generator did not generate correct structures.  We updated the post processor mkpost.go to fix the structures (alignment + member names), but did not look further into the underlying problem.

Hugo



David Riley

Aug 10, 2020, 9:48:11 AM
to Hugo Cornelis, golang-nuts
On Aug 10, 2020, at 4:59 AM, Hugo Cornelis <hugo.c...@essensium.com> wrote:
>
>
> Hi,
>
> Bottom line: Docker works reliably on powerpc 440fpu 32 bit using gccgo as the compiler. We will likely soon start working on powerpc e6500 in 32bit mode.
>
> After a fix in the structures used by the epoll system calls, the problem disappeared. I assume the problem was a starvation similar to
>
> https://github.com/moby/moby/issues/39461
>
> We had to correct the used system call numbers for the fstat system call family, for sendfile, fadvise, ftruncate, truncate and fcntl.
>
> We also had to fix the alignment of some of the structures used by these functions and for the EpollEvent structure (ie. the generator did not always generate correct structures). The fix for EpollEvent also fixed the I/O starvation problem.
>
> It remains unclear why the generator did not generate correct structures. We updated the post processor mkpost.go to fix the structures (alignment + member names), but did not look further into the underlying problem.

Glad to see updates! I hope there's a chance to mainline this; I would welcome running Go in 32-bit PPC on my Net/OpenBSD machines that run on that platform.


- Dave
