What can cause a process launched via os/exec to sporadically die from SIGPIPE

Marcin Romaszewicz

Jul 5, 2020, 1:55:01 PM
to golang-nuts
Hi All,

I'm hitting a problem using os/exec's Cmd.Start to run a process.

I'm setting Cmd.Stdout and Cmd.Stderr to the same instance of an io.Pipe, and spawn a goroutine to consume the pipe reader until I reach EOF. I then call cmd.Start(), do some additional work, and call cmd.Wait(). The runtime of the executable I launch is 15-30 minutes, and stdout/stderr output is minimal, a few tens of kB during this 15-30 minute run.

When the pipe reaches EOF or errors out, I close the pipe reader, exit the goroutine reading the pipe, and that's when cmd.Wait() returns, exactly as documented.
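For readers following along, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the helper name `runAndCapture` and the `sh` command line are hypothetical, and closing the `PipeWriter` after `cmd.Wait()` is an assumption about how EOF reaches the reader.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
)

// runAndCapture (hypothetical helper) runs a command with stdout and stderr
// both attached to the write side of one io.Pipe, and collects everything
// the reading goroutine sees.
func runAndCapture(name string, args ...string) (string, error) {
	pr, pw := io.Pipe()
	cmd := exec.Command(name, args...)
	cmd.Stdout = pw
	cmd.Stderr = pw // same writer, so output is funneled through one stream

	var buf bytes.Buffer
	done := make(chan struct{})
	go func() {
		defer close(done)
		io.Copy(&buf, pr) // consume the pipe reader until EOF
	}()

	if err := cmd.Start(); err != nil {
		pw.Close()
		<-done
		return "", err
	}
	// ... additional work could happen here ...
	waitErr := cmd.Wait()

	// Assumption: nothing else closes the writer, so close it here to
	// deliver EOF to the reading goroutine, then wait for it to finish.
	pw.Close()
	<-done
	return buf.String(), waitErr
}

func main() {
	out, err := runAndCapture("sh", "-c", "echo out; echo err 1>&2")
	fmt.Printf("output=%q err=%v\n", out, err)
}
```

The reading goroutine keeps consuming until EOF and is joined before the captured output is used, so no write from the Go side is ever abandoned midway.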

This works exactly as described about 70% of the time. The remaining 30% of the time, cmd.Wait() returns an error, which stringifies as "signal: broken pipe". I'm running thousands of copies of this executable across thousands of instances in AWS, so I have a big data set here. The broken pipe error happens at the very end when my exec'd executable is exiting, so as far as I can tell, it's run successfully and is hitting this error on exit.
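One way to tell a SIGPIPE death apart from other failures, rather than matching on the error string, is to inspect the wait status carried by the error. The sketch below is Unix-only; the helper names `killedBySignal` and `selfSIGPIPE` are hypothetical, and the shell command exists only to simulate a child that dies from SIGPIPE.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// killedBySignal reports whether err, as returned by cmd.Wait or cmd.Run,
// means the child process was terminated by sig.
func killedBySignal(err error, sig syscall.Signal) bool {
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		return false
	}
	ws, ok := exitErr.Sys().(syscall.WaitStatus)
	return ok && ws.Signaled() && ws.Signal() == sig
}

// selfSIGPIPE runs a stand-in child: a shell that sends itself SIGPIPE.
func selfSIGPIPE() error {
	return exec.Command("sh", "-c", "kill -s PIPE $$").Run()
}

func main() {
	err := selfSIGPIPE()
	fmt.Printf("err=%v sigpipe=%v\n", err, killedBySignal(err, syscall.SIGPIPE))
}
```

Logging this per run makes it easier to correlate the failing fraction with other conditions on the machine.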

I realize that SIGPIPE and EPIPE are common ways that processes clean each other up, and that shells do a lot of work hiding them, so I've also tried using exec.Cmd to spawn bash, which in turn runs my executable, but I still get a lot of these deaths due to SIGPIPE.

I've tried to reproduce this with simple commands - like `cat <longfile.txt>`, and none of these simple commands ever result in the broken pipe, and I capture all their output without issue. The command I'm running differs in that it uses quite a lot of resources and the machine is doing significant work when the executable is exiting. However, the SIGPIPE is being received by the application, not my Go code, implying that the Go side is closing the pipe. I can't find where this is happening.

Any tips on how to chase this down?

Thanks,
-- Marcin


Ian Lance Taylor

Jul 5, 2020, 4:05:30 PM
to Marcin Romaszewicz, golang-nuts
The executable is dying due to receiving a SIGPIPE signal. As you
know, that means that it made a write system call to a pipe that had
no open readers. If you're confident that you are reading all the
data from the pipe in the Go program, then the natural first thing to
check is the other possible pipe: if you are reading from stdout,
check what happens on stderr, and vice-versa.
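
The condition described above (a write to a pipe with no open readers) can be reproduced in isolation. This is a small sketch with hypothetical helper names. Note one difference from a typical C program: a Go program that has not asked to be notified of SIGPIPE is not killed by a broken-pipe write on a descriptor other than 1 or 2; the write simply fails with EPIPE, which is what the sketch observes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// writeAfterReaderClosed creates an OS pipe, closes its read end, writes to
// the write end, and returns the error from that write.
func writeAfterReaderClosed() error {
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	defer w.Close()
	r.Close() // from here on, the pipe has no open readers
	_, err = w.Write([]byte("hello\n"))
	return err
}

// isEPIPE reports whether err wraps the EPIPE errno.
func isEPIPE(err error) bool {
	return errors.Is(err, syscall.EPIPE)
}

func main() {
	err := writeAfterReaderClosed()
	fmt.Printf("err=%v epipe=%v\n", err, isEPIPE(err))
}
```

A child process with the default SIGPIPE disposition would be terminated by the same write instead of seeing the error.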

That probably won't help, but since you can reproduce it with some
reliability, try running the whole system under strace -f. That will
show you the system calls both of your program and of the subprocess,
and should let you determine exactly which write is triggering the
SIGPIPE, and let you verify that the read end of the pipe has been
closed.

And if that doesn't help, perhaps you can modify the subprocess to
catch SIGPIPE and get a stack trace, again with the goal of finding
out exactly what write is failing.

Hope this helps.

Ian

Marcin Romaszewicz

Jul 5, 2020, 7:14:33 PM
to Ian Lance Taylor, golang-nuts
Thanks for the tips.

The comment on Stdout and Stderr on cmd says:
// If Stdout and Stderr are the same writer, and have a type that can
// be compared with ==, at most one goroutine at a time will call Write.

Using an io.Pipe shared between these two should result in both being drained correctly, right?

Brian Candler

Jul 6, 2020, 8:42:23 AM
to golang-nuts
This sounds to me like a sequencing issue, but the way you've described it sounds correct. Can you share a trimmed-down bit of code?

The key point is, in the goroutine that reads from the pipe, keep reading until EOF and *then* close the pipe. If you close the pipe outside this goroutine, then make sure you wait for the goroutine to finish first, e.g. with a sync.WaitGroup, although I'd have thought after cmd.Wait would be safe enough.

"When the pipe reaches EOF or errors out, I close the pipe reader". Can you distinguish between these cases - i.e. are you definitely seeing io.EOF every time, or is there sometimes an error, and if so what's the error? I am just thinking that if you see an error and then close the pipe, but the sender has more to send, then the sender will get EPIPE. But I can't think of an actual situation where that might happen, nor where it would be useful to continue reading from a pipe after an error.

What's the Go version?  I don't suppose GODEBUG=asyncpreemptoff=1 makes any difference? (Ref)

Ian Lance Taylor

Jul 6, 2020, 6:16:03 PM
to Marcin Romaszewicz, golang-nuts
I guess I don't see how that affects what I said one way or another.

Although, let me back up a second: are you really using an io.Pipe
rather than an os.Pipe? An io.Pipe shouldn't lead to a SIGPIPE
signal.

Ian

Marcin Romaszewicz

Jul 6, 2020, 7:43:11 PM
to Ian Lance Taylor, golang-nuts
Yes, I am using io.Pipe, and passing in the PipeWriter side of it as Stdout and Stderr on Cmd.

I figured out the problem, though. The executable which I am launching is doing a write on a non-existent file descriptor upon shutdown, which generates a SIGPIPE, per strace:

write(75, "\25\3\3\0\32,v\352\5!.v\3536\205\246N\33Y*\233t\354\246t\356Jf5\377\366", 31) = -1 EPIPE (Broken pipe)

That file descriptor had never been opened - it seems to be a random value due to memory corruption.


Ian Lance Taylor

Jul 6, 2020, 7:46:14 PM
to Marcin Romaszewicz, golang-nuts
On Mon, Jul 6, 2020 at 4:42 PM Marcin Romaszewicz <mar...@gmail.com> wrote:
>
> Yes, I am using io.Pipe, and passing in the PipeWriter side of it as Stdout and Stderr on Cmd.

I'm glad you figured out the problem, but I want to note that you
almost certainly want to be using os.Pipe rather than io.Pipe.
io.Pipe is intended for communication between goroutines. os.Pipe is
used between processes. If you use an io.Pipe as exec.Cmd.Stdout or
exec.Cmd.Stderr, then the os/exec package will quietly create a pipe
using os.Pipe, pass that to the child, and start a goroutine to read
from that os.Pipe and write to the io.Pipe.
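
A minimal sketch of that suggestion, assuming the goal is still to capture stdout and stderr as one stream: the helper name `runWithOSPipe` and the `sh` command line are hypothetical. Because `cmd.Stdout` and `cmd.Stderr` are an `*os.File` here, the child gets the write end directly and no intermediate copying goroutine is involved.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// runWithOSPipe (hypothetical helper) runs a command with stdout and stderr
// attached to the write end of an OS pipe and returns the combined output.
func runWithOSPipe(name string, args ...string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	cmd := exec.Command(name, args...)
	cmd.Stdout = w
	cmd.Stderr = w

	if err := cmd.Start(); err != nil {
		w.Close()
		return "", err
	}
	// The child holds its own copy of the write end now. Close the parent's
	// copy so the reader sees EOF once every writer has exited.
	w.Close()

	// Read to EOF before calling Wait, so the read end stays open for as
	// long as the child may still write.
	var buf bytes.Buffer
	_, copyErr := io.Copy(&buf, r)

	waitErr := cmd.Wait()
	if waitErr == nil {
		waitErr = copyErr
	}
	return buf.String(), waitErr
}

func main() {
	out, err := runWithOSPipe("sh", "-c", "echo out; echo err 1>&2")
	fmt.Printf("output=%q err=%v\n", out, err)
}
```

Forgetting to close the parent's copy of the write end is the usual pitfall with this approach, since the read would then never reach EOF.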

Ian

Marcin Romaszewicz

Jul 6, 2020, 10:02:37 PM
to Ian Lance Taylor, golang-nuts
Thanks for the tip, I wasn't aware that happened! I'll use os.Pipe.

-- Marcin
