Is there a way to use the gnu debugger to trace the cause of a SIGPIPE? If
not, is there *any* way to determine where, in a program, an operation is
occurring that leads to the OS throwing a SIGPIPE?
Thanks in advance,
Rob D.
> Is there a way to use the gnu debugger to trace the cause of a SIGPIPE? If
> not, is there *any* way to determine where, in a program, an operation is
> occurring that leads to the OS throwing a SIGPIPE?
The usual advice is to use sigaction() or signal() to ignore SIGPIPE
and check for system calls that fail with errno set to EPIPE.
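That advice can be sketched in C as follows, assuming a POSIX system. `ignore_sigpipe()` and `write_all()` are illustrative names, not library functions. Once SIGPIPE is ignored, a write to a pipe or socket with no reader returns -1 with errno set to EPIPE, which the caller can test like any other error.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process. Afterwards, writing to a pipe or
 * socket with no reader fails with EPIPE instead of killing the program. */
static int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Illustrative helper: write all len bytes, retrying after partial writes
 * and EINTR. Returns 0 on success, or -1 with errno set on failure. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;                  /* errno == EPIPE: the reader is gone */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would then do something like `if (write_all(fd, buf, n) < 0 && errno == EPIPE)` and close that descriptor, instead of working out after the fact where a signal came from.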
If your program was compiled with debugging information, then you
could try something like this:
% gdb program-name
.
.
(gdb) run program-arguments
.
.
Program received signal SIGPIPE, Broken pipe.
0x2809e4d8 in sendto () from /usr/lib/libc.so.4
(gdb) bt
#0 0x2809e4d8 in sendto () from /usr/lib/libc.so.4
#1 0x280c68db in send () from /usr/lib/libc.so.4
#2 0x8048e5e in start_server (portstr=0xbfbff753 "12345") at foo.c:230
#3 0x8048af9 in main (argc=2, argv=0xbfbff600) at foo.c:86
#4 0x8048966 in _start ()
In this example, the program died in the start_server() function
in the file foo.c at line 230.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
The SIGPIPE is ignored....
> If your program was compiled with debugging information, then you
> could try something like this:
>
> % gdb program-name
> .
> .
> (gdb) run program-arguments
> .
> .
> Program received signal SIGPIPE, Broken pipe.
> 0x2809e4d8 in sendto () from /usr/lib/libc.so.4
> (gdb) bt
> #0 0x2809e4d8 in sendto () from /usr/lib/libc.so.4
> #1 0x280c68db in send () from /usr/lib/libc.so.4
> #2 0x8048e5e in start_server (portstr=0xbfbff753 "12345") at foo.c:230
> #3 0x8048af9 in main (argc=2, argv=0xbfbff600) at foo.c:86
> #4 0x8048966 in _start ()
>
> In this example, the program died in the start_server() function
> in the file foo.c at line 230.
>
I've done this, but the backtrace shows me the call stack as coming from the
kernel's signal thrower (I'm compiling the code on a FreeBSD system).
> Is there a way to use the gnu debugger to trace the cause of a SIGPIPE? If
> not, is there *any* way to determine where, in a program, an operation is
> occurring that leads to the OS throwing a SIGPIPE?
The kernel will post a SIGPIPE to a process when it writes to a pipe, fifo, or
network endpoint that is no longer open for reading. The primary reason for
this behavior is to expeditiously terminate processes that write to their
stdout stream but do not check the return value of the write() call. For
programs that are well behaved (i.e., those which check the status of calls to
write(2)) it is best to simply set SIGPIPE to be ignored.
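A minimal sketch of that default behavior, assuming a POSIX system: `writer_killed_by()` (a hypothetical name) forks a child that writes to a pipe whose read end is already closed, as a careless filter would after its reader exits, and reports which signal terminated it. With SIGPIPE at its default disposition the child never gets to see write() return.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes to a pipe with no reader while SIGPIPE has its
 * default disposition. Returns the number of the signal that killed the
 * child, 0 if it exited normally, or -1 if setup failed. */
static int writer_killed_by(void)
{
    int fds[2];
    int status;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);                      /* nobody will ever read this pipe */

    pid = fork();
    if (pid < 0) {
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        ssize_t r;
        signal(SIGPIPE, SIG_DFL);       /* default action: terminate */
        r = write(fds[1], "x", 1);      /* raises SIGPIPE, killing the child */
        (void)r;
        _exit(0);                       /* reached only if SIGPIPE was not fatal */
    }

    close(fds[1]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```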
I can't speak to how to use gdb(1) to trace the fault but using a system call
trace tool (e.g., truss(1) or strace(1)) you should be able to identify the
write(2) call that resulted in the SIGPIPE and from that deduce where in your
code the write occurred.
Jon Hess
RD> Is there a way to use the gnu debugger to trace the cause of a SIGPIPE? If
Yes.
SIGPIPE is a "synchronous" signal, which means it is delivered while the
system call that caused it is still executing.
Read `info gdb' and go to Stopping -> Signals. You'll find a general HOWTO on
working with signals. Type
handle SIGPIPE stop
(I think this is the default for SIGPIPE, but it is better to say so explicitly)
continue
and on SIGPIPE control will drop back into the debugger. Type `bt', `f 0' and
`list', and the exact stopping place and the stack will be shown.
Using other gdb commands (for examining data) you can see any other details
you need.
RD> not, is there *any* way to determine where, in a program, an operation is
RD> occurring that leads to the OS throwing a SIGPIPE?
Unless it is sent explicitly with kill(2), SIGPIPE is raised when you write to
a pipe, or to a socket, whose other end has been closed or disconnected.
Generally you have *no* need to debug the reason for a SIGPIPE. If your program
is a simple filter, SIGPIPE is a convenient way to stop it without any explicit
work by the programmer. If it is a daemon dealing with many connections, it
should ignore SIGPIPE. If it is a filter with atexit() actions (e.g. removing
temporary files), it should handle SIGPIPE and exit gracefully.
Again, debugging the reason for a SIGPIPE is pointless in almost all
applications.
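For the filter-with-cleanup case, a minimal sketch assuming POSIX: the handler only records that SIGPIPE arrived (in a volatile sig_atomic_t, since calling exit() or removing files inside a handler is not async-signal-safe), the failing write() then returns -1 with errno set to EPIPE, and the caller leaves through exit() so that atexit() cleanups run. `catch_sigpipe()`, `emit()` and `got_sigpipe` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; a volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t got_sigpipe = 0;

static void on_sigpipe(int sig)
{
    (void)sig;
    got_sigpipe = 1;    /* only record it; cleanup happens outside */
}

/* Install the handler with every sigaction field initialized. */
static int catch_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Illustrative output step for a filter (partial writes are not retried,
 * for brevity). Returns 0 on success, -1 on error; errno == EPIPE means
 * the downstream reader has exited. */
static int emit(int fd, const char *s)
{
    return write(fd, s, strlen(s)) < 0 ? -1 : 0;
}
```

In the filter's main loop this might look like `if (emit(STDOUT_FILENO, line) < 0 && errno == EPIPE) exit(EXIT_FAILURE);`, with the temporary-file removal registered earlier via atexit().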
-netch-
> > The usual advice is to use sigaction() or signal() to ignore SIGPIPE
> > and check for system calls that fail with errno set to EPIPE.
>
> The SIGPIPE is ignored....
Please post the code that you're using to ignore SIGPIPE.
Could another part of the program be changing the SIGPIPE handler?
> > If your program was compiled with debugging information, then you
> > could try something like this:
> >
> > % gdb program-name
> > .
> > .
> > (gdb) run program-arguments
> > .
> > .
> > Program received signal SIGPIPE, Broken pipe.
> > 0x2809e4d8 in sendto () from /usr/lib/libc.so.4
> > (gdb) bt
> > #0 0x2809e4d8 in sendto () from /usr/lib/libc.so.4
> > #1 0x280c68db in send () from /usr/lib/libc.so.4
> > #2 0x8048e5e in start_server (portstr=0xbfbff753 "12345") at foo.c:230
> > #3 0x8048af9 in main (argc=2, argv=0xbfbff600) at foo.c:86
> > #4 0x8048966 in _start ()
> >
> > In this example, the program died in the start_server() function
> > in the file foo.c at line 230.
>
> I've done this, but the backtrace shows me the call stack as coming from the
> kernel's signal thrower (I'm compiling the code on a FreeBSD system).
Please post the gdb output.
How reproducible is this problem? Do certain circumstances always
cause it to happen or is it unpredictable?
> Thanks everyone, problem diagnosed. Related question - is there any way to
> check the status of a socket before writing to it (or reading from it,
> although that's not the problem here)?
So what was the problem? The solution may be useful to future
generations, or we might discover that the solution still has
problems. If it was a programming bug, then 'fess up so others
can learn not to make the same mistake :-)
The topic of checking socket status has come up before. What
exactly do you want this check to tell you that you can't
determine by performing the read or write and seeing if an
error occurred? Such checks typically have race conditions:
1. You check the socket status; the socket is OK.
2. Something changes the socket status from OK to NOT OK.
3. You try to do something based on the assumption that the socket is OK.
4. Oops.
You could use getsockopt() to get the SO_ERROR socket option,
but this would be susceptible to the race condition described
above. Others have suggested doing a non-blocking recv(),
possibly with the MSG_PEEK flag set -- if it fails with an error
other than EAGAIN/EWOULDBLOCK then the connection is no longer
alive, but again you could have a race condition.
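A sketch of those two probes, assuming a POSIX system. It uses getsockopt() with SO_ERROR, and swaps the non-blocking recv() for a zero-timeout poll() followed by recv() with MSG_PEEK, which avoids changing the descriptor's blocking mode. `socket_looks_alive()` and `drop_if_dead()` are hypothetical names. As noted above, the answer can be stale by the time send() runs, so this is only a hint and the send() itself must still be checked for EPIPE.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if fd still looks usable, 0 if an error is
 * pending or the peer has closed its end. Only a hint: the peer can go away
 * right after this returns 1. */
static int socket_looks_alive(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    struct pollfd pfd;
    ssize_t n;
    char c;
    int r;

    /* Reading SO_ERROR also clears the pending error, if any. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return 0;

    /* Zero timeout: ask whether anything is readable right now. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    r = poll(&pfd, 1, 0);
    if (r < 0)
        return 0;                   /* cannot tell: treat as dead */
    if (r == 0)
        return 1;                   /* no data, no hangup reported */

    /* Readable or hung up, so this recv() will not block. MSG_PEEK leaves
     * any data in the buffer for the real reader. */
    n = recv(fd, &c, 1, MSG_PEEK);
    return n > 0;                   /* 0: peer closed; -1: socket error */
}

/* Close the socket and mark the slot unused if the probe says it is dead. */
static void drop_if_dead(int *fd)
{
    if (*fd >= 0 && !socket_looks_alive(*fd)) {
        close(*fd);
        *fd = -1;
    }
}
```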
What are you trying to do?
>Thanks everyone, problem diagnosed. Related question - is there any way to
>check the status of a socket before writing to it (or reading from it,
>although that's not the problem here)?
SIGPIPE / EPIPE tells you when the socket is no longer writable; that's what
the signal is for. But it tells you after the fact; there is no way to
check beforehand.
--
bringing you boring signatures for 17 years
>
> So what was the problem? The solution may be useful to future
> generations, or we might discover that the solution still has
> problems. If it was a programming bug, then 'fess up so others
> can learn not to make the same mistake :-)
To explain it briefly, the code is a networking daemon that implements a
user-level protocol. Part of the protocol requires that a certain class of
packets, if received from a given host, must be re-broadcast to all other
hosts that the daemon is connected to. So there is a for() loop to do
this, with a send() call. However, only one socket is processed at a time,
and apparently every so often a connection on a different socket is
terminated while the daemon is processing the current socket. If that
occurs and the daemon has to broadcast a packet from the current socket to
the one that just died, then a SIGPIPE is raised when it attempts to
write to the dead socket. That's the problem. The solution, which is what
I did in the first place, is to just write a handler that ignores the
SIGPIPE (I actually wrote a handler, instead of just using SIG_IGN).
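A sketch of how such a rebroadcast loop could drop dead sockets as it goes, assuming SIGPIPE is set to SIG_IGN so a vanished peer shows up as send() failing with EPIPE (or ECONNRESET). `net_init()`, `broadcast()` and the `socks` array are hypothetical, not Rob's actual code; in his multithreaded daemon the array would also need a lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Call once at startup so a dead peer shows up as EPIPE, not a signal. */
static void net_init(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Hypothetical rebroadcast: send pkt to every open socket except `from`.
 * A socket whose peer has gone away is closed and its slot set to -1, so
 * later passes skip it. Returns the number of sockets dropped. */
static int broadcast(int *socks, size_t nsocks, int from,
                     const void *pkt, size_t len)
{
    int dropped = 0;

    for (size_t i = 0; i < nsocks; i++) {
        int fd = socks[i];
        ssize_t n;

        if (fd < 0 || fd == from)
            continue;
        do {
            n = send(fd, pkt, len, 0);      /* short sends not retried here */
        } while (n < 0 && errno == EINTR);

        if (n < 0 && (errno == EPIPE || errno == ECONNRESET)) {
            close(fd);                      /* peer disconnected */
            socks[i] = -1;
            dropped++;
        }
        /* other send() errors are ignored in this sketch */
    }
    return dropped;
}
```

Because the check happens on the send() itself, this stays correct even if a peer disconnects between any earlier status check and the send(), which is exactly the race window discussed in this thread.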
> The topic of checking socket status has come up before. What
> exactly do you want this check to tell you that you can't
> determine by performing the read or write and seeing if an
> error occurred? Such checks typically have race conditions:
I'd like to know if the connection is dead so I can remove the socket before
attempting to write to it and getting the inevitable SIGPIPE. However,
it's certainly not application critical, as I have a handler in place that
more or less ignores the SIGPIPE.
> 1. You check the socket status; the socket is OK.
> 2. Something changes the socket status from OK to NOT OK.
> 3. You try to do something based on the assumption that the socket is OK.
> 4. Oops.
Since the check would occur immediately before the send(), I think this race
condition would be fairly rare given the almost nonexistent delay between
the check and the send(), but I definitely see the problem, especially
since this is multithreaded code: if the thread were scheduled out
after the check but before the send(), I could see a problem.
> You could use getsockopt() to get the SO_ERROR socket option,
> but this would be susceptible to the race condition described
> above. Others have suggested doing a non-blocking recv(),
> possibly with the MSG_PEEK flag set -- if it fails with an error
> other than EAGAIN/EWOULDBLOCK then the connection is no longer
> alive, but again you could have a race condition.
>
> What are you trying to do?
>
Mostly I was just checking the code, and seeing if I could find a way to
prevent the SIGPIPE. Since it seems that the signal is really no big deal,
however, then I guess I'll just leave the handler in place and not worry
about it.
-Rob D.
Somebody please correct me if I'm mistaken, but as far as I know,
if you set a signal's handler to SIG_IGN then you should never
receive that signal (except for unblockable signals such as SIGKILL
and SIGSTOP). If you were correctly setting the SIGPIPE handler
to SIG_IGN and you weren't changing it elsewhere in the code, and
yet you were still receiving SIGPIPE signals, then perhaps you've
uncovered a bug in your implementation. But before we draw that
conclusion, could you show us the code that you were using to ignore
the signal? I've seen problems when a programmer used sigaction()
but failed to initialize sa_flags, resulting in garbage SA_* flags
being passed to sigaction().
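A sketch of ignoring SIGPIPE with sigaction() while initializing every field, so no garbage SA_* flags reach the kernel. `ignore_sigpipe_sa()` and `sigpipe_is_ignored()` are hypothetical names; the second one reads back the current disposition, which is one way to check whether another part of the program has replaced the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Ignore SIGPIPE with sigaction(), initializing every field so that no
 * stray SA_* bits from uninitialized stack memory are passed in. */
static int ignore_sigpipe_sa(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);     /* clears sa_flags and any other fields */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               /* explicit: no SA_SIGINFO, SA_RESETHAND, ... */
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Return 1 if SIGPIPE is currently ignored, 0 if not, -1 on error. */
static int sigpipe_is_ignored(void)
{
    struct sigaction cur;

    if (sigaction(SIGPIPE, NULL, &cur) < 0)
        return -1;
    return cur.sa_handler == SIG_IGN;
}
```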
> > The topic of checking socket status has come up before. What
> > exactly do you want this check to tell you that you can't
> > determine by performing the read or write and seeing if an
> > error occurred? Such checks typically have race conditions:
>
> I'd like to know if the connection is dead so I can remove the socket before
> attempting to write to it and getting the inevitable SIGPIPE. However,
> it's certainly not application critical, as I have a handler in place that
> more or less ignores the SIGPIPE.
>
> > 1. You check the socket status; the socket is OK.
> > 2. Something changes the socket status from OK to NOT OK.
> > 3. You try to do something based on the assumption that the socket is OK.
> > 4. Oops.
>
> Since the check would occur immediately before the send(), I think this race
> condition would be fairly rare given the almost nonexistent delay between
> the check and the send(), but I definitely see the problem, especially
> since this is multithreaded code: if the thread were scheduled out
> after the check but before the send(), I could see a problem.
The infrequency of errors caused by a race condition is precisely
what can make them difficult to diagnose. The timing has to be
just right, which means that the conditions that caused the error
can be difficult or impossible to reproduce during testing. Until
you figure out what's going on, you're stuck with a program that
occasionally misbehaves for reasons unknown.
If you're aware of a race condition and account for it then it may
not cause a problem. If your code has checks to catch errors that
slip in during the race condition "window of opportunity" then you
may be safe.
> > You could use getsockopt() to get the SO_ERROR socket option,
> > but this would be susceptible to the race condition described
> > above. Others have suggested doing a non-blocking recv(),
> > possibly with the MSG_PEEK flag set -- if it fails with an error
> > other than EAGAIN/EWOULDBLOCK then the connection is no longer
> > alive, but again you could have a race condition.
> >
> > What are you trying to do?
>
> Mostly I was just checking the code, and seeing if I could find a way to
> prevent the SIGPIPE. Since it seems that the signal is really no big deal,
> however, then I guess I'll just leave the handler in place and not worry
> about it.
I'm still curious about why you were receiving SIGPIPE signals even
though you had the handler set to SIG_IGN; as far as I know, this
shouldn't happen. All I can think of is that either your code
wasn't setting the handler correctly, or that there's a bug in your
implementation, or that I've misunderstood signal semantics. Has
anybody else seen this?
Robert> That's the problem. The solution, which is what I did in the
Robert> first place, is to just write a handler that ignores the
Robert> SIGPIPE (I actually wrote a handler, instead of just using
Robert> SIG_IGN).
There's almost never any point in that; just set it to SIG_IGN.
--
Andrew.
comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
> My apologies if I was unclear - what I meant was, previous to this, I had a
> handler in place for SIGPIPE. However, based on the advice given to me
> here, I have modified the code now - there is no handler for SIGPIPE, which
> is set to SIG_IGN. No more SIGPIPEs, on to the next bug :)
One of your messages said that "The SIGPIPE is ignored...."; now
you're saying that you had a handler in place. This is why I asked
you to post the code that you were using to ignore SIGPIPE: I wanted
to see what you were really doing instead of relying on what you
said you were doing. It's harder for us to help if you don't post
accurate information and if you don't give us the information we
ask for.