
question: TCP connections


Stuart Wachsberg

Sep 24, 1995
Programming in C under UNIX, how can I determine if a TCP
connection has been torn down by the other side of the
connection? I do not want to have to resort to writing to the
socket and catching a SIGPIPE signal.

Thanks, Stuart Wachsberg


Todd Sandor

Sep 28, 1995
In article <4476ov$3...@b30news.b30.ingr.com>,
Alan Barksdale <alan@afbarksd@ingr.com> wrote:
>In article <446bne$9...@dg-rtp.dg.com> sc...@dg-rtp.dg.com (John Scott) writes:
>>Stuart Wachsberg (sbwa...@neumann.uwaterloo.ca) wrote:
>>: Programming in C under UNIX, how can I determine if a TCP
>>: connection has been torn down by the other side of the
>>1) read() returns zero bytes.
>>2) write() returns -1 and errno is EPIPE (this also generates SIGPIPE).
>>In both cases select() [or poll()] will return ready for reading and
>>writing.
>>SIGPIPE is NOT generated when the connection is closed; only when an
>>attempt is made to write on a closed connection. You can receive SIGIO
>>if you have set the socket to generate SIGIO.
>
>If you're using select() with a NULL timeout and if the other side of the
>connection is on a machine that has been turned off without normal shutdown,
>select() waits indefinitely. You can use the BSD function setsockopt() to
>set socket option SO_KEEPALIVE, which will cause periodic tests on whether
>idle sockets still have good connections. If a connection has gone bad,
>select() will return. On UNIX machines, processes using the socket will
>receive SIGPIPE.
>
>A question on SO_KEEPALIVE:
>
>Does anyone know how to change how long a socket must be idle before the
>"keep alive" test is made in a UNIX environment? On NT, one sets the
>REG_DWORD registry value
>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime
>to the desired number of milliseconds. I'd like to make the delay on our
>several types of UNIX to be a minute or so.

Appendix E of Richard Stevens' "TCP/IP Illustrated, Volume 1" (ISBN 0-201-63346-9)
specifies the configurable TCP/IP options for various Unix flavors (BSD/386,
SunOS 4.1.3, S5R4, Solaris 2.x, AIX 3.2.2 and 4.4BSD). It includes information on
keepalive settings; how you go about it depends on the "UNIX environment" you're
talking about. For example, on SunOS you change the tcp_keepidle variable in
tcp_keepidle/in_proto.c, rebuild the kernel, and reboot with it. On Solaris 2.x you
use ndd: "ndd /dev/tcp tcp_keepalive_interval" shows the current setting and
"ndd -set /dev/tcp tcp_keepalive_interval <value>" changes it.

The more important question is whether you really want to set it to a minute or
so. As far as I know it was never intended to be set this low, and it may cause
other TCP state-machine related problems. I remember looking into this with help
from Sun support (for SunOS 4.1.x), and they basically recommended not setting it so
low. They said it may not work (something about how, even if you set it to
around 1 minute, it would take at least 10 minutes before the keepalive
functionality kicked in; I can't remember the details).
Also, one side effect of doing this (if it works) is that it affects all
applications that use the TCP keepalive feature. For example, if a PC user has
a telnet session open and is editing a file when there is an Ethernet
failure (someone accidentally trips over the cable and it comes loose),
it would only be 1 minute before the telnet session was terminated,
and the user might lose all the edits.

If an application has to find out that fast when a connection dies abnormally
(Ethernet cable pulled, machine powered off, etc.), that should be
designed and implemented into the application or into a protocol that the
application uses (a protocol that sits on top of TCP). I know it
can be done, since we have implemented such a protocol using TCP and
UDP: we use UDP for sending and responding to probes at
fixed (configurable) intervals, and if a certain number of probes are missed,
the TCP connection is brought down and the application finds out about
it. It's similar to FTP in that two sockets are used; the difference is
that FTP uses two TCP connections, while our protocol uses one
TCP and one UDP socket. I'm not saying it's trivial to do, just that it
can be done.
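The bookkeeping at the heart of such a probe scheme is small. A minimal sketch, assuming a hypothetical struct probe_state and a caller that runs probe_tick() once per probe interval (just before sending a probe over whatever socket it uses) and probe_reply() whenever an answer arrives. Sending the probes and tearing down the TCP connection are left to the caller, and none of these names come from the implementation described above.

```c
#include <stdbool.h>

/* Hypothetical per-connection liveness state for an application-level
 * probe protocol; how probes travel is up to the caller. */
struct probe_state {
    unsigned max_missed;    /* probes allowed to go unanswered */
    unsigned outstanding;   /* probes sent since the last reply */
};

void probe_init(struct probe_state *ps, unsigned max_missed)
{
    ps->max_missed = max_missed;
    ps->outstanding = 0;
}

/* Call once per probe interval, right before sending a probe.
 * Returns false once max_missed probes in a row have gone
 * unanswered: the caller should then close the TCP connection. */
bool probe_tick(struct probe_state *ps)
{
    if (ps->outstanding >= ps->max_missed)
        return false;
    ps->outstanding++;
    return true;
}

/* Call whenever a probe reply arrives: the peer is alive. */
void probe_reply(struct probe_state *ps)
{
    ps->outstanding = 0;
}
```

With a limit of 3, the connection is declared dead on the fourth tick after the last reply, i.e. after three probe intervals with no answer. A real protocol would probably also tag probes with a sequence number so that a very late reply is not mistaken for a fresh one.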
--
Todd Sandor Newbridge Networks: Kanata, Ontario Canada
to...@newbridge.com (613)591-3600 ext. 1011
There are always alternatives.
-- Spock, "The Galileo Seven," stardate 2822.3.

Alan Barksdale

Oct 2, 1995

Thanks for taking the time to write this useful reply. I'll try to get that
book. We're still discussing what value we want to use for the waiting
period. FWIW, here's a summary of what we've found on several systems:

HP-UX: On version 9.0, use the keepalive script from the HP Response Center.
On version 10.0, use the nettune command.

Intergraph CLIX: Get the UNIXCFG product. Modify tcp_kp_idl in the
master.d/*/dod file appropriate to your system. Build yourself a new kernel.

OpenVMS: Probably use UCX$C_TCP_DROP_IDLE or UCX$C_TCP_PROBE_IDLE with
UCX$INETDEF definition file.

WinNT: Use registry value
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime.
______________________________________________________________________________
| If you lie to the compiler, it will get its revenge. --- Henry Spencer |
| --- Alan Barksdale --- afba...@ingr.com --- 205-730-3764 --- |

Stuart Wachsberg

Oct 3, 1995
to sc...@dg-rtp.dg.com
sc...@dg-rtp.dg.com (John Scott) wrote:
>Stuart Wachsberg (sbwa...@neumann.uwaterloo.ca) wrote:
>: Programming in C under UNIX, how can I determine if a TCP
>: connection has been torn down by the other side of the
>: connection. I do not want to have to resort to writing to the
>: socket and catching a SIGPIPE signal.
>
>: Thanks, Stuart Wachsberg
>
>
>1) read() returns zero bytes.
>
>2) write() returns -1 and errno is EPIPE (this also generates SIGPIPE).
>
>In both cases select() [or poll()] will return ready for reading and
>writing.
>
>SIGPIPE is NOT generated when the connection is closed; only when an
>attempt is made to write on a closed connection. You can receive SIGIO
>if you have set the socket to generate SIGIO.
>
>--
>--
>John A. Scott Data General
>Phone: +1 919 248 5995 62 TW Alexander Drive
>Email: sc...@rtp.dg.com Research Triangle Park, NC 27709


I find that write()ing to a socket where the connection has been
broken by the other side does not give me an error.

Furthermore, I get a SIGPIPE signal only after the 2nd attempt to write()
following a broken connection.

Any advice?

Thanks, Stuart
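What Stuart describes is normal TCP behavior. The peer's close() only sends a FIN, which says "no more data from me". The first write() afterwards is accepted into the local send buffer, so it succeeds; when that data reaches the peer, which has closed its socket, the peer's TCP answers with a reset (RST). Only a write() made after the RST has arrived fails with EPIPE, and that is the one that raises SIGPIPE. A minimal sketch that reproduces the sequence over the loopback interface, assuming a present-day POSIX system: the function name write_after_peer_close and the 100 ms pauses are illustrative assumptions, and some systems may report ECONNRESET rather than EPIPE for the second write.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical demo: connect to a listener on 127.0.0.1, let the
 * accepting side close, then write twice. Stores both write() results
 * and the errno of the second one (meaningful only if it returned -1).
 * Returns 0, or -1 if the loopback setup fails (descriptors are not
 * cleaned up on that path, for brevity). */
int write_after_peer_close(ssize_t *first_rc, ssize_t *second_rc,
                           int *second_errno)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct timespec delay = { 0, 100L * 1000 * 1000 };  /* 100 ms */
    int lfd, cfd, afd;

    signal(SIGPIPE, SIG_IGN);   /* process-wide: failed writes return EPIPE */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel choose a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        return -1;

    close(afd);                 /* "the other side" tears the connection down */
    nanosleep(&delay, NULL);    /* give the FIN time to arrive */

    *first_rc = write(cfd, "x", 1);   /* accepted into the send buffer */
    nanosleep(&delay, NULL);          /* give the peer's RST time to arrive */
    *second_rc = write(cfd, "y", 1);  /* now fails */
    *second_errno = errno;

    close(cfd);
    close(lfd);
    return 0;
}
```

This is why the advice in the thread is to learn about the close from read() returning 0 before writing at all, and to ignore SIGPIPE so that the failing write() simply returns -1.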


Warner Losh

Oct 5, 1995
In article <44udf9$l...@kannews.ca.newbridge.com>,
Todd Sandor <to...@Newbridge.COM> wrote:
>An issue you have to deal with if your application has > 1 TCP
>connection when you get a SIGPIPE is to determine which socket
>caused the signal. The ioctl() suggested above can be used
>when you service the SIGPIPE signal by setting up the select() for
>all TCP connections, and doing the ioctl() for each socket the select()
>indicated there is data available (read) on. The one that answers
>with 0 is the one that has gone down (as specified above).
>(I haven't tried this method, but it is supposed to work.)

You are very limited in what you can do in a signal handler.

>A cleaner approach is to simply ignore the SIGPIPE signal, i.e.
>#ifdef SOLARIS
> sigset(SIGPIPE, SIG_IGN);
>#endif
>#ifdef SUNOS_4
> signal(SIGPIPE, SIG_IGN);
>#endif

#ifdef IGNORE_SIGPIPE
signal(SIGPIPE, SIG_IGN);
#endif

would be cleaner :-).

>and use the techniques previously described in this thread to determine
>when the socket goes down, namely:
>1) read() returns zero bytes. (do reads when the select() indicates there
>is something to read).

This doesn't always work. If you have a non-blocking socket whose connect()
hasn't completed yet, this method will fail on some systems. If it always
worked, my life would have been much easier.

>2) write() returns -1 and errno is EPIPE (this also generates SIGPIPE).
>
>Using this method you wouldn't have to wait until you do a write(), get the
>SIGPIPE, service the signal, etc. before finding out that
>the socket has gone down; you would find out via select() and read()
>returning 0 (#1).

This is true for most sockets on most systems.

Warner
--
Warner Losh "VMS Forever" home: i...@village.org
Cyberspace Development, Inc work: i...@marketplace.com
Makers of TIA, The Internet Adapter. http://marketplace.com/

Todd Sandor

Oct 5, 1995
In article <450ujl$f...@rover.village.org>, Warner Losh <i...@village.org> wrote:
>In article <44udf9$l...@kannews.ca.newbridge.com>,
>Todd Sandor <to...@Newbridge.COM> wrote:
>>An issue you have to deal with if your application has > 1 TCP
>>connection when you get a SIGPIPE is to determine which socket
>>caused the signal. The ioctl() suggested above can be used
>>when you service the SIGPIPE signal by setting up the select() for
>>all TCP connections, and doing the ioctl() for each socket the select()
>>indicated there is data available (read) on. The one that answers
>>with 0 is the one that has gone down (as specified above).
>>(I haven't tried this method, but it is supposed to work.)
>
>You are very limited in what you can do in a signal handler.
>
>>A cleaner approach is to simply ignore the SIGPIPE signal, i.e.
>>#ifdef SOLARIS
>> sigset(SIGPIPE, SIG_IGN);
>>#endif
>>#ifdef SUNOS_4
>> signal(SIGPIPE, SIG_IGN);
>>#endif
>
>#ifdef IGNORE_SIGPIPE
> signal(SIGPIPE, SIG_IGN);
>#endif
>
>would be cleaner :-).

Actually, that's not entirely true under Solaris 5.x: if you
use signal(), the default signal handler is re-enabled after you get
the first signal. Thus with Solaris 5.x, if you use signal(SIGPIPE, SIG_IGN),
your process will exit the second time it gets the SIGPIPE signal.
You must use sigset() under Solaris 5.x to get the same behavior as
signal() under SunOS 4.1.x.
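Whichever of the two is right about signal() on Solaris, the POSIX sigaction() interface avoids the question: a disposition installed without SA_RESETHAND stays in place until the program changes it, and an ignored signal is simply discarded. A minimal sketch, assuming a POSIX system; the helper name ignore_sigpipe is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: ignore SIGPIPE for the whole process so that a
 * write() on a broken connection returns -1 with errno == EPIPE
 * instead of terminating the process. sa_flags is 0 (no SA_RESETHAND),
 * so the setting is not reset by the kernel.
 * Returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}
```

Call it once at startup; afterwards every failed write() on a broken connection is reported through its return value and errno.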

>
>>1) read() returns zero bytes. (do reads when the select() indicates there
>>is something to read).
>
>This doesn't always work. If you have a non-blocking socket whose connect()
>hasn't completed yet, this method will fail on some systems. If it always
>worked, my life would have been much easier.
>

Hmmm... Non-blocking socket connect() fun... You may want to try using
the #1 technique only after the connection has been established.
Knowing when the non-blocking connect() completes is more TCP/IP-implementation-specific
fun (the first call returns -1 with errno=EINPROGRESS). One
way would be to implement a mechanism to call connect()
again after a couple of seconds (use a timer mechanism of some sort),
and when a subsequent connect() succeeds, add the socket
descriptor to your read mask (for #1). (I'm not sure how portable this is...)

There are other techniques that can be used for determining when the non-blocking
connect() completes, but they don't work on all systems. For example, you
could put the pending-connect socket descriptor in the select() write mask, and
when select() indicates it is writable you should be able to do the second
connect() (or do a getpeername(): if you get the remote address, you're
connected) and it should complete. This doesn't work on all systems (e.g. SunOS).

Todd Sandor


--
Todd Sandor Newbridge Networks: Kanata, Ontario Canada
to...@newbridge.com (613)591-3600 ext. 1011

Warner Losh

Oct 6, 1995
In article <451kqf$s...@kannews.ca.newbridge.com>,
Todd Sandor <to...@Newbridge.COM> wrote:
>Actually, not entirely true under Solaris 5.x where if you
>use signal(), the default signal handler is re-enabled after you get
>the first signal. Thus with Solaris 5.x if you use signal(SIGPIPE, SIG_IGN)
>the second time your process gets the SIGPIPE signal it will exit.
>You must use sigset() under Solaris 5.x to get the same behavior as
>signal() under SunOS 4.1.x....

Are you sure about this? Other SYS V.2 boxes don't seem to do this,
and I could have sworn that I have done this on Solaris without any
problems... Actually, Solaris is the only OS we've encountered
that gives SIGPIPE when you repeat connect() on a socket whose non-blocking
connect is still pending, so nothing would surprise me. My Solaris box is down
at the moment, so I can't test this one way or the other :-(.

>Hmmm... Non-blocking socket connect() fun... You may want to try
>using the #1 technique only after the connection has been
>established. Now, knowing when the non-blocking connect() completes
>is more tcp/ip implementation specific fun (the first one returns
>error with errno=EINPROGRESS). One way would be to implement a
>mechanism to re-perform the connect() again after a couple of seconds
>(use a timer mechanism of some sort) and when the subsequent
>connect() is successful, add the socket descriptor to your read-mask
>(for #1). (I'm not sure how portable this is...).

This is *VERY* portable. It is about the most portable way of doing
things. You may need to substitute poll for select on some systems,
but I've not come across those. It works on SunOS 4.x, 5.x, FreeBSD
1.x, 2.x, BSDi 1.x, 2.x, AIX 3.2.5, HP/UX 9.05, OSF/1 3.0, 3.2, Linux,
Ultrix 4.3, Unixware 1.1.4 (Sys V.2 Unix on Intel), IRIX 4.x, 5.x,
6.x, and OpenVMS (!) AXP 6.2 (and maybe a couple that I've neglected
to recall).
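The re-connect() approach can be sketched as a few small helpers. On a socket whose non-blocking connect() is still in progress, a repeated connect() fails with EALREADY; once the connection is up it either returns 0 (some systems report completion this way) or fails with EISCONN; any other errno is the reason the attempt failed. A minimal sketch under those assumptions: the names make_ipv4, start_connect and connect_poll are hypothetical, IPv4 is assumed for brevity, EINPROGRESS on the repeat is treated as "still pending" as a precaution, and the caller is expected to drive connect_poll() from its own timer. Given Warner's remark that Solaris can raise SIGPIPE on the repeated connect(), SIGPIPE should already be ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill in an IPv4 address; host and port are in host byte order. */
void make_ipv4(struct sockaddr_in *sin, in_addr_t host, in_port_t port)
{
    memset(sin, 0, sizeof *sin);
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = htonl(host);
    sin->sin_port = htons(port);
}

/* Create a non-blocking TCP socket and start connecting it.
 * Returns the descriptor (connection possibly still pending) or -1. */
int start_connect(const struct sockaddr_in *sin)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int flags;

    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        (connect(fd, (const struct sockaddr *)sin, sizeof *sin) < 0 &&
         errno != EINPROGRESS)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Ask again by repeating connect() with the same address.
 * Returns 1 if connected, 0 if still in progress, -1 if the attempt
 * failed (errno holds the reason; close the socket, don't poll it). */
int connect_poll(int fd, const struct sockaddr_in *sin)
{
    if (connect(fd, (const struct sockaddr *)sin, sizeof *sin) == 0)
        return 1;               /* some systems report completion this way */
    switch (errno) {
    case EISCONN:
        return 1;               /* already connected */
    case EALREADY:
    case EINPROGRESS:
        return 0;               /* handshake still under way */
    default:
        return -1;              /* e.g. ECONNREFUSED or ETIMEDOUT */
    }
}
```

Once connect_poll() returns 1, the descriptor can go into the read mask for technique #1. After -1 it should be closed rather than polled again, since on some systems a further connect() may start a brand-new attempt.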

>There are other techniques that can be used for determining when the
>non-blocking connect() completes but they don't work on all systems.
>For example, you could set the pending connect socket descriptor in
>the select() write mask, and when it indicates it is writable you
>should be able to do the second connect (or do a getpeername() and if
>you get the remote address you're connected) and it should complete.
>This doesn't work on all systems (e.g. SunOS).

There are severe problems with this technique, the least of which is
portability. This sort of code will compile and make you think it is
working, but will in fact fail to run correctly. It is best
avoided unless you need to know why the connect failed (fortunately I
don't, so I use the more portable technique). There is also a
variation on this technique that has you call getsockopt() and extract
SO_ERROR to see why things died or not, but that too has its problems.
