Getting out of FIN-WAIT-2

Phil Earnhardt

Jan 15, 1992, 7:31:45 PM

How does a TCP host get a connection out of the FIN-WAIT-2 state, assuming
that the remote endpoint has gone away?

(where "gone away" means the remote end is a DOS box that crashed and
has not been rebooted.)

Phil Earnhardt p...@netwise.com
Netwise, Inc. Boulder, CO (303) 442-8280

Phil Karn

Jan 15, 1992, 8:48:49 PM

In article <1992Jan16.0...@netwise.com>, p...@netwise.com (Phil Earnhardt) writes:
|> How does a TCP host get a connection out of the FIN-WAIT-2 state, assuming
|> that the remote endpoint has gone away?
|>
|> (where "gone away" means the remote end is a DOS box that crashed and
|> has not been rebooted.)

You can't -- not without risking the clearing of a connection that
appears to be dead, but is still alive because of a temporary network
outage. FIN-WAIT-2 is just like ESTABLISHED state, except that the
connection is only half open. Both are stable states, and they should
be allowed to exist indefinitely (unless cleared by a RST from a host
that has since rebooted).

Something tells me that the "keepalive wars" are about to start again...

Phil

John R. Lyman

Jan 16, 1992, 9:33:14 AM

This is a very real problem. There are a couple approaches that I've used
in the past. Telnet can avoid this problem by sending the NOP command
whenever the connection has been idle in both directions for 10 minutes
(or whatever you feel is appropriate). Unfortunately, Telnet is only one
of many ULPs above TCP! In practice the only other ULP that seems to
cause any problem is FTP. I was never able to come up with a scheme
where FTP could keepalive connections to avoid the FIN-WAIT-2 problem.

This leaves us with two choices (IMHO), a straight timeout approach (which
is not allowed by the standard, but might still be used), and of course,
the keepalive. IMHO keepalives are great when used correctly, but let's
not start the wars (again).

The bottom line is that PC (and workstation) users turn off or reboot
their machines without closing their connections. This can cause host
resources to be lost, until the host is rebooted. There needs to be
a way to timeout TCP connections, and these are the only ways I've
been able to come up with.

------------------------------------------------------------------------
John R. Lyman III Network System Corp.

don provan

Jan 16, 1992, 6:40:10 PM

In article <1992Jan16....@ns.network.com> ly...@anduril.network.com (John R. Lyman) writes:
>This is a very real problem. There are a couple approaches that I've used
>in the past. Telnet can avoid this problem by sending the NOP command

>whenever the connection has been idle in both directions for 10 minutes...

Oops! Can't send a NOP in FIN-WAIT-2: you've already closed the
connection for output.

Applications which don't want to waste resources for extended periods are
free to timeout an idle connection even if it's confirmed up and
functional. Heck, an application could even abort an active connection if
it felt it had something better to do with the resources. The treatment
of FIN-WAIT-2 connections is just a special case: even applications that
don't care how long an ESTABLISHED connection is idle may want to get
impatient with idle FIN-WAIT-2 connections and blow them away, since
there's no way to confirm from the application level whether they're still
valid. That all seems real obvious from the application's point of view.
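
A minimal sketch of that kind of impatience, assuming the application half-closes with shutdown() and keeps its descriptor rather than calling close(). The function name `half_close_with_deadline` and the deadline value are hypothetical, and the linger structure layout (two C ints) is an assumption that holds on common Unix-like systems:

```python
import socket
import struct


def half_close_with_deadline(sock, deadline_seconds):
    """Send our FIN, then wait a bounded time for the peer's FIN.

    Returns True if the peer closed cleanly, False if we gave up and
    aborted. The deadline is an application policy choice, not something
    TCP itself specifies.
    """
    sock.shutdown(socket.SHUT_WR)       # our FIN goes out; reads still work
    sock.settimeout(deadline_seconds)   # applies to each recv() below
    try:
        while True:
            data = sock.recv(4096)
            if not data:                # empty read: the peer's FIN arrived
                sock.close()
                return True
            # Late data from the peer is simply discarded in this sketch.
    except socket.timeout:
        # The peer stayed silent. A linger time of 0 makes close() abort
        # the connection instead of leaving it parked in FIN-WAIT-2.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        sock.close()
        return False
```

Because the timeout restarts on every recv(), a peer that keeps sending data is never cut off; only a silent one is.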

The real problem, of course, is that in the BSD socket interface,
FIN-WAIT-2 connections are typically not attached to the application any
longer. Normally a socket application issues a close and forgets about
the connection and its handle. Only later will the connection get into
FIN-WAIT-2. In this case, an application level solution isn't possible.
Perhaps performing a special "FIN keepalive" might be a good idea.

>I was never able to come up with a scheme
>where FTP could keepalive connections to avoid the FIN-WAIT-2 problem.

Since the FTP command connection runs over TELNET, theoretically you
should be able to send the same NOP. I don't know how many FTP client
programs would choke on it, though....
don provan
do...@novell.com

Robert Elz

Jan 17, 1992, 6:17:45 AM

ka...@qualcom.qualcomm.com (Phil Karn) writes:

>Something tells me that the "keepalive wars" are about to start again...

If they do, could contributors possibly separate the issue from the
mechanism?

Debating whether the way BSD times out connections is right or not
has its uses, but isn't what is mostly at issue.

The real debate is between users of connections that can break without
the end points dying, who want connections to be able to stay alive,
pending, through potentially very long network outages, and those more
concerned with resource utilization in (usually server) hosts, who
prefer to get rid of connections that are useless as soon as possible,
and surmise that most connections with no reachable other end are
likely to never return, so want to get rid of those.

Note: it really makes no difference at all whether the implementation
of "dead connection detection" is done in the applications, or in the
kernel, or in the host administrator's fingers (typing commands to
delete processes with no other end), or anything else - if connections
that look useless for an extended period are deleted the same result
is achieved. It's also mostly irrelevant what the definition of "useless"
is; or rather, it is irrelevant until you have decided that killing useless
connections is OK. Then you need to decide which ones in particular
to go after (ftp connections not transferring data that have been idle
15 minutes might be useless to one person, while to another they're
harmless, but mailer processes that haven't received an SMTP command
for 5 minutes are a disaster).

kre

David L Stevens

Jan 17, 1992, 8:58:39 AM

To be in FIN_WAIT_2, you have to have done a close(), which for most
user interfaces means you no longer have a reference to the connection at the
application level. Even with a "half close," you shouldn't be sending any more
data after sending a FIN. So you don't have any way of sending a TELNET no-op.
I think the "problem" (quoted, because I hate keep-alives) you're
trying to solve is when an intermediate gateway goes down.

The problem with FIN_WAIT_2 is when your side does a close(), and
gets the ACK for your FIN, but the other side crashes before closing. If you
put a timer on it, you lose, because it's perfectly legitimate for the other
side to do some arbitrary computation (and send more data, if you did a half-
close!) before doing his close.
But if you don't put a timer and the other side crashes, you hang
(literally, in the protocol sense) forever. You have no unacked data and
no timer running to make you send anything again.
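
The hang can be seen in a toy model of the closing side's states. This is a simplified sketch of the RFC 793 transitions for the end that closes first; the event names and the `is_stuck_without_peer` helper are made up for illustration, and some transitions (such as a FIN that also acknowledges our FIN) are left out:

```python
# (state, event) -> next state, for the end that closes first.
TRANSITIONS = {
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",       # we send our FIN
    ("FIN_WAIT_1", "recv_ack_of_fin"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",            # simultaneous close
    ("CLOSING", "recv_ack_of_fin"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "timer_2msl"): "CLOSED",
}

# States where a local timer is running: the retransmission timer for an
# unacknowledged FIN, or the 2MSL timer.
STATES_WITH_TIMER = {"FIN_WAIT_1", "CLOSING", "TIME_WAIT"}


def step(state, event):
    """Apply one event; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def is_stuck_without_peer(state):
    """True if only a segment from the peer can move us out of this state."""
    events = [ev for (st, ev) in TRANSITIONS if st == state]
    only_peer_events = bool(events) and all(ev.startswith("recv_") for ev in events)
    return only_peer_events and state not in STATES_WITH_TIMER
```

In this model FIN_WAIT_2 is the only state that has no local timer and no local event to leave it, which is exactly the situation described above.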

Most user interfaces won't allow you to do anything at the application
level for this, because you don't have a writable file descriptor at that
point.
You can fix it with "keep-alives" from within TCP, but that has the
(bogus, IMHO) side-effect of forcing idle connections to go away. If I'm not
using a connection at the time some intermediate gateway crashes, why should
I lose all my state (eg, login session), when, without keep-alives, I can resume
where I left off as long as both ends remain up and the intermediate gateway
that crashed has recovered? That's one of the great advantages of
packet-switching.
Maybe implementations should allow system administrators to force
keep-alives on particular connections if it appears they're hung. I hate any
fixed-timer solution, myself.
--
+-DLS (d...@mentor.cc.purdue.edu)

Johnny Eriksson

Jan 17, 1992, 7:01:38 AM

ka...@chicago.qualcomm.com writes:

! You can't -- not without risking the clearing of a connection that
! appears to be dead, but is still alive because of a temporary network
! outage. FIN-WAIT-2 is just like ESTABLISHED state, except that the
! connection is only half open. Both are stable states, and they should
! be allowed to exist indefinitely (unless cleared by a RST from a host
! that has since rebooted).
!
! Something tells me that the "keepalive wars" are about to start again...

The correct term for it ought to be "killalive", since that is what
it does...

! Phil

--Johnny

"When in doubt -- hesitate!"

Phil Karn

Jan 18, 1992, 6:40:07 PM

In article <1992Jan16....@ns.network.com> ly...@anduril.network.com (John R. Lyman) writes:

>This leaves us with two choices (IMHO), a straight timeout approach (which
>is not allowed by the standard, but might still be used), and of course,
>the keepalive. IMHO keepalives are great when used correctly, but let's
>not start the wars (again).

There's a third choice: ignore the problem. Are modern timesharing
hosts so memory starved that they cannot tolerate even a few orphaned
user tasks?

I have a dialup SLIP link from my house. I frequently stay logged in
while puttering elsewhere around the house, so the link often stays up
for hours at a time (it's a local phone call). But recently I bought a
FAX machine that shares my data line, so I'm thinking about having my
home gateway dial the SLIP link on demand, much as the Telebit
Netblazer does. That would keep the line clear for faxes or other
possible uses.

But keepalives would break this scheme. Chances are they would just
keep my SLIP link dialed up unnecessarily. But if the gateway did in
fact drop the link, keepalives from hosts on the other end would
gratuitously abort my TCP connections because I don't want the
Netblazer at work to call me. I prefer to place all of the calls from
my end, so I can be assured of having control over the line. Also,
incoming data calls would confuse my FAX machine.

As long as I don't have to contend with keepalives, this scheme will
work great. I stop typing long enough from home, and the gateway drops
the line. I start typing again, and the gateway redials the line; my
own TCP is patient enough to wait for this to happen. Since the UNIX
hosts I telnet to generally don't generate traffic on their own (I
don't run biff) they'll never know the difference.

The Internet is no longer a collection of dedicated point-to-point
links. It now includes links that are established on demand, and links
that come and go for other reasons (radio users move, routers crash
and reboot, routes flap, etc). The architecture should not include
gratuitous features that prevent it from accommodating these unreliable
and/or non-traditional network paths.

>The bottom line is that PC (and workstation) users turn off or reboot
>their machines without closing their connections. This can cause host
>resources to be lost, until the host is rebooted. There needs to be
>a way to timeout TCP connections, and these are the only ways I've
>been able to come up with.

If you really *are* resource-starved on your host, then you might time
out idle users ONLY AS NEEDED to free them up for new users. But if an
occupied resource is not immediately needed for reuse, why
gratuitously free it by bumping an idle user?
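
One way to picture "only as needed" is a connection table that evicts the longest-idle entry when, and only when, a newcomer finds the table full. This is an illustrative sketch; the `ConnectionTable` class and its method names are hypothetical and not taken from any real system:

```python
import time


class ConnectionTable:
    """Tracks connections and bumps an idle one only when space runs out."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.last_active = {}   # connection id -> time of last activity

    def touch(self, conn_id):
        """Record activity on an existing connection."""
        self.last_active[conn_id] = self.clock()

    def admit(self, conn_id):
        """Add a connection; return the id evicted to make room, or None."""
        evicted = None
        is_new = conn_id not in self.last_active
        if is_new and len(self.last_active) >= self.capacity:
            # Nobody is bumped until the resource is actually needed.
            evicted = min(self.last_active, key=self.last_active.get)
            del self.last_active[evicted]
        self.last_active[conn_id] = self.clock()
        return evicted
```

An idle user on an unloaded host is never disturbed under this policy, however long the idle period.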

Last I heard, 1-megabyte SIMMs were down to $36...

Phil

Phil Earnhardt

Jan 18, 1992, 9:35:27 PM

In article <1992Jan18....@qualcomm.com> ka...@chicago.qualcomm.com (Phil Karn) writes:
>In article <1992Jan16....@ns.network.com> ly...@anduril.network.com (John R. Lyman) writes:
>>This leaves us with two choices (IMHO), a straight timeout approach (which
>>is not allowed by the standard, but might still be used), and of course,
>>the keepalive. IMHO keepalives are great when used correctly, but let's
>>not start the wars (again).
>
>There's a third choice: ignore the problem. Are modern timesharing
>hosts so memory starved that they cannot tolerate even a few orphaned
>user tasks?

Maybe I should have said more at the start. The problem behind the problem...

The problem isn't FIN-WAIT-2 sockets staying around forever. The problem is
that I can't re-register my listener on the port as long as there are
connections associated with that port hanging around in the FIN-WAIT-2 state.

I'm running on a Sys V box that doesn't let me do the setsockopt() magic.
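
For readers on systems that do have it: the "setsockopt() magic" is presumably the SO_REUSEADDR option. That is an assumption, since the post does not name the option. A minimal sketch of a listener that sets it before binding, with `make_listener` as a hypothetical helper name:

```python
import socket


def make_listener(port, host="127.0.0.1", backlog=5):
    """Create a listening TCP socket that tolerates leftover connections."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Has to be set before bind(). It lets bind() succeed while old
    # connections on the same local port still exist in some state.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s
```

The exact rules for when a rebind is allowed differ between stacks, so behaviour on any particular system should be checked rather than assumed.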

Hope I'm not going to re-open a second religious war in one week...

Richard Conto

Jan 20, 1992, 3:51:46 PM

In article <1992Jan16....@ns.network.com>, ly...@anduril.network.com (John R. Lyman) writes:
|> In article <1992Jan16.0...@qualcomm.com> ka...@chicago.qualcomm.com writes:
|> >In article <1992Jan16.0...@netwise.com>, p...@netwise.com (Phil Earnhardt) writes:
|> >|> How does a TCP host get a connection out of the FIN-WAIT-2 state, assuming
|> >|> that the remote endpoint has gone away?
|> >|>
|> >|> (where "gone away" means the remote end is a DOS box that crashed and
|> >|> has not been rebooted.)
|> >
|> >You can't -- not without risking the clearing of a connection that
|> >appears to be dead, but is still alive because of a temporary network
|> >outage. FIN-WAIT-2 is just like ESTABLISHED state, except that the
|> >connection is only half open. Both are stable states, and they should
|> >be allowed to exist indefinitely (unless cleared by a RST from a host
|> >that has since rebooted).

...
|> >Phil
|>
|> This is a very real problem. There are a couple approaches that I've used
|> in the past. Telnet can avoid this problem by sending the NOP command
|> whenever the connection has been idle in both directions for 10 minutes
|> (or whatever you feel is appropriate). Unfortunately, Telnet is only one
|> of many ULPs above TCP! In practice the only other ULP that seems to
|> cause any problem is FTP. I was never able to come up with a scheme
|> where FTP could keepalive connections to avoid the FIN-WAIT-2 problem.

What is the real problem? What resources are being consumed by a TCP connection
being in FIN-WAIT-2? Memory? TCP port numbers? You must want to be in FIN-WAIT-2,
it doesn't happen just because a gateway or a host somewhere crashes.

|> The bottom line is that PC (and workstation) users turn off or reboot
|> their machines without closing their connections. This can cause host
|> resources to be lost, until the host is rebooted. There needs to be
|> a way to timeout TCP connections, and these are the only ways I've
|> been able to come up with.

As others have said, the only time a TCP connection would be in FIN-WAIT-2
would be after the host had decided to tear down the connection and deallocate
resources. Determining when to tear down the connection is not the issue here
(and should be application dependent in my opinion.)

If the "close()"-equivalent blocks until the TCP connection is torn down,
then perhaps the host application should release any other resources before
the call to "close()". After all, what would recovering from a "close()"
error entail anyway?

In short, I'm arguing that the resources consumed by a TCP connection in
FIN-WAIT-2 should be negligible. Even if a whole campus of PCs goes offline
because of a power-hit, a properly designed application shouldn't be consuming
excessive resources once it's tried to close the connections.

--- Richard Conto Merit Computer Network
r...@merit.edu MichNet Engineering and Developement

peter da silva

Jan 21, 1992, 11:59:25 AM

In article <1992Jan20....@terminator.cc.umich.edu> r...@merit.edu (Richard Conto) writes:
> In short, I'm arguing that the resources consumed by a TCP connection in
> FIN-WAIT-2 should be negligible. Even if a whole campus of PCs goes offline
> because of a power-hit, a properly designed application shouldn't be consuming
> excessive resources once it's tried to close the connections.

Well, if a properly designed application does a chdir() to /, and closes all
other file descriptors, before closing the port so it's not holding a mounted
file system busy... yeh. It should also background itself so that it doesn't
leave you unable to get to the shell, and close stdin, stdout, and stderr so
it doesn't keep DTR up on the serial port...

In practice shutting down a running process so it doesn't cause *any* loss of
capability for the users of a system isn't quite as trivial as that.
--
-- Peter da Silva, Ferranti International Controls Corporation
-- Sugar Land, TX 77487-5012; +1 713 274 5180
-- "Have you hugged your wolf today?"

Bill Quigley

Jan 22, 1992, 11:55:18 PM

In article <1992Jan18....@qualcomm.com> ka...@chicago.qualcomm.com (Phil Karn) writes:

I'm confused. I thought that, from the FIN-WAIT-2 state, it was
impossible to get back to ESTABLISHED. So a timer started in
FIN-WAIT-2 can only drop a connection that couldn't communicate
anyway, right? It sounds like Phil is talking about timing out
established connections, while I think the original problem is timing
out connections in fin-wait-2 that never receive a fin.

What are keepalives, and how do they work, anyway?

--
Flower Cake - Trends in food come and go, but
the popularity of cookies never wanes - easy to make, easy to eat.
Bill Quigley
Amdahl - UTS Communications Product Support
w...@charon.amdahl.com

Adnan Yaqub

Jan 24, 1992, 3:34:58 AM

In article <e7a702K...@amdahl.uts.amdahl.com> w...@uts.amdahl.com (Bill Quigley) writes:
> I'm confused. I thought that, from the FIN-WAIT-2 state, it was
> impossible to get back to ESTABLISHED. So a timer started in
> FIN-WAIT-2 can only drop a connection that couldn't communicate
> anyway, right? It sounds like Phil is talking about timing out
> established connections, while I think the original problem is timing
> out connections in fin-wait-2 that never receive a fin.
>
> What are keepalives, and how do they work, anyway?

The basic idea behind keepalives is to send a message with a sequence
number equal to SND.UNA-1 to force the remote peer to reply. You do
this if there has been no activity on a connection for a while. If,
after a time, no reply is received, you drop the connection.
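
A rough model of that logic, not any particular stack's implementation. The sequence number is already acknowledged, so a live peer sees the probe as a duplicate and answers with an ACK. The class name, the three tunables, and the "idle"/"probe"/"drop" return values are all assumptions chosen for illustration:

```python
SEQ_MOD = 2 ** 32   # TCP sequence numbers are 32 bits wide and wrap around


def keepalive_probe_seq(snd_una):
    """Sequence number for a probe: one below the oldest unacked byte."""
    return (snd_una - 1) % SEQ_MOD


class KeepaliveTimer:
    """Decides when to probe an idle connection and when to give up."""

    def __init__(self, idle_limit, probe_interval, max_probes):
        self.idle_limit = idle_limit
        self.probe_interval = probe_interval
        self.max_probes = max_probes
        self.last_heard = 0.0
        self.probes_sent = 0

    def heard_from_peer(self, now):
        """Any segment from the peer proves it is alive and resets the count."""
        self.last_heard = now
        self.probes_sent = 0

    def poll(self, now):
        """Return 'idle' (do nothing), 'probe' (send one), or 'drop'."""
        due = (self.last_heard + self.idle_limit
               + self.probes_sent * self.probe_interval)
        if now < due:
            return "idle"
        if self.probes_sent >= self.max_probes:
            return "drop"
        self.probes_sent += 1
        return "probe"
```

Note that the drop decision depends only on silence, so a long enough network outage looks the same to this timer as a crashed peer.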
--
Adnan Yaqub (ad...@sgtech.uucp)
Star Gate Technologies
29300 Aurora Rd, Solon, OH, 44139, USA, +1 216 349 1860

Phil Karn

Jan 27, 1992, 5:27:59 PM

In article <e7a702K...@amdahl.uts.amdahl.com>, w...@uts.amdahl.com (Bill Quigley) writes:
|> I'm confused. I thought that, from the FIN-WAIT-2 state, it was
|> impossible to get back to ESTABLISHED. So a timer started in
|> FIN-WAIT-2 can only drop a connection that couldn't communicate
|> anyway, right? It sounds like Phil is talking about timing out
|> established connections, while I think the original problem is timing
|> out connections in fin-wait-2 that never receive a fin.

You're correct in that you can't get back to ESTABLISHED from
FIN-WAIT-2, but it is NOT true that you can't communicate on a TCP
connection in the latter state -- you could continue receiving data
indefinitely. You may be confusing TCP's functionality with its most
common user interface, the close() call in UNIX, which closes both
directions of a socket simultaneously. But nothing *requires* you to
do this.

In fact BSD UNIX derivatives provide a shutdown() call that maps very
nicely into TCP's half-close operations. If you say shutdown(s,1),
you'll send a FIN on the connection but it will remain open
indefinitely (in FIN-WAIT-2 state) for reading.
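
A small loopback sketch of the half-close, written with Python's socket module, which wraps the same calls; `socket.SHUT_WR` has the value 1, matching shutdown(s,1). The function names and the sign-off message are made up for illustration:

```python
import socket
import threading


def signoff_server(listener):
    """Read until the client's FIN, then send a sign-off and close."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:            # empty read: the client sent its FIN
                break
            received += chunk
        conn.sendall(b"bye, got %d bytes" % len(received))


def half_close_client(addr, payload):
    """Send payload, half-close, and keep reading until the server closes."""
    with socket.create_connection(addr) as s:
        s.sendall(payload)
        s.shutdown(socket.SHUT_WR)   # FIN goes out; the read side stays open
        reply = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:            # server closed its side too
                break
            reply += chunk
        return reply
```

The client reads the server's reply after its own FIN has been sent, which is the behaviour described above.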

Phil

Mark Boolootian

Jan 29, 1992, 4:03:23 PM

In article <1992Jan27.2...@qualcomm.com> ka...@chicago.qualcomm.com writes:
>
>In fact BSD UNIX derivatives provide a shutdown() call that maps very
>nicely into TCP's half-close operations. If you say shutdown(s,1),
>you'll send a FIN on the connection but it will remain open
>indefinitely (in FIN-WAIT-2 state) for reading.
>

Although shutdown() does allow for the closing of one side of a connection, I
was under the impression that once shutdown() was called, you could no longer
use read() on the file descriptor, thus making shutdown() useless. Am I
right?

mb
---

--
Mark Boolootian boo...@llnl.gov +1 510-423-1948

Roy Smith

Jan 29, 1992, 12:00:19 PM

ka...@chicago.qualcomm.com writes:
> In fact BSD UNIX derivatives provide a shutdown() call that maps very
> nicely into TCP's half-close operations. If you say shutdown(s,1),
> you'll send a FIN on the connection but it will remain open
> indefinitely (in FIN-WAIT-2 state) for reading.

OK, but why would you want to? What do you gain by doing a
half-close that you wouldn't by closing both sides of the connection? Do
you save any significant network bandwidth, or host resources, or get
improved performance or reliability, or what? In other words, is
deliberately generating a half-closed connection something the tcp
architects decided was important for some applications and thus made sure
the protocol supported, or is it just happenstance that you can do that?
Are there actually any applications that use shutdown() instead of close(),
or their equivalents on other systems?
--
r...@alanine.phri.nyu.edu (Roy Smith)
Public Health Research Institute
455 First Avenue, New York, NY 10016, USA
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"

Ian Dickinson

Jan 30, 1992, 10:25:25 AM

In article <116...@lll-winken.LLNL.GOV> boo...@framsparc.ocf.llnl.gov (Mark Boolootian) writes:
>Although shutdown() does allow for the closing of one side of a connection, I
>was under the impression that once shutdown() was called, you could no longer
>use read() on the file descriptor, thus making shutdown() without use. Am I
>right?

It does mean you can no longer use read() on the descriptor, but I believe it
does mean that anything using read() on the other end will see an 'EOF' whilst
still allowing the other half of the connection to remain open.

Cheers,
--
\/ato if (!take(joke))
Ian Dickinson - NIC handle: ID17 em = fork();
va...@csv.warwick.ac.uk ...!mcsun!uknet!warwick!vato
@c=GB@o=University of Warwick@ou=Computing Services@cn=Ian Dickinson

Phil Karn

Jan 31, 1992, 3:59:53 PM

In article <1992Jan29....@phri.nyu.edu>, r...@alanine.phri.nyu.edu (Roy Smith) writes:
|> OK, but why would you want to? What do you gain by doing a
|> half-close that you wouldn't by closing both sides of the connection? Do
|> you save any significant network bandwidth, or host resources, or get
|> improved performance or reliability, or what? In other words, is
|> deliberately generating a half-closed connection something the tcp
|> architects decided was important for some applications and thus made sure
|> the protocol supported, or is it just happenstance that you can do that?
|> Are there actually any applications that use shutdown() instead of close(),
|> or their equivalents on other systems?

Half-closed connections are a deliberate feature of TCP. There are
many cases where they're useful, especially in allowing applications
to close more gracefully than they might otherwise if a close simply
chopped both paths simultaneously. For example, a client can do the
logical equivalent of a "UNIX shell ^D" (i.e., send a FIN and go into
FIN-WAIT state) and the server on the other end can send a signoff
message with some reasonable assurance that it got back to the client
before it too closes its side of the connection. It's relatively simple
and quite elegant.

Phil