close_wait states

Robin Tuttle

Nov 17, 2000, 3:00:00 AM

Folks,

I am having a problem with CLOSE_WAIT states lingering on my Tru64
Unix v4.0F system. We are running Oracle Application Server v8.0.5.
This is causing a problem with all of the sockets being used, so that
no one can connect to the Web server. What specific Unix parameters
should I be looking at? Any recommendations?

Many thanks in advance!

Robin Tuttle
University of New Hampshire

robin....@unh.edu

Barry Margolin

Nov 17, 2000, 3:00:00 AM

In article <3a15574d...@news.unh.edu>,

Robin Tuttle <Robin....@unh.edu> wrote:
>I am having a problem with CLOSE_WAIT states lingering on my Tru64
>Unix v4.0F system. We are running Oracle Application Server v8.0.5.
>This is causing a problem with all of the sockets being used, so that
>no one can connect to the Web server. What specific Unix parameters
>should I be looking at? Any recommendations?

This has nothing to do with Unix parameters. CLOSE_WAIT means that the
client has closed its end of the connection, but the server hasn't yet
closed its end. If these persist for a long time, it's a bug in the server
software.

In the case of a database, one way that these could occur without requiring
a bug is if a client submits a query that takes a long time to complete,
and he gets impatient waiting for a response so he cancels it. That could
close the connection, but the server won't notice this until it finishes
processing the query and tries to send the results back. But if you're
getting lots of connections like this, it can't be explained by a handful
of aborted queries.
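
As a rough illustration of the point above, here is a minimal Python sketch of a handler that closes its end as soon as it sees the client's close. This is not how OAS is implemented; `serve_one` is a hypothetical name. An empty read means the peer's FIN has arrived and the local socket is in CLOSE_WAIT, where it stays until the application closes it.

```python
import socket

def serve_one(listener: socket.socket) -> int:
    """Accept one connection, read until the peer closes, then close our end.

    Returns the number of bytes received. Hypothetical helper for illustration.
    """
    conn, _addr = listener.accept()
    total = 0
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                # Empty read: the peer sent its FIN, so this socket is now in
                # CLOSE_WAIT. It leaves that state only when we close it.
                break
            total += len(chunk)
    finally:
        conn.close()  # sends our FIN: CLOSE_WAIT -> LAST_ACK
    return total
```

A server that skips the close() in the `finally` branch, or never reaches it, leaves one CLOSE_WAIT entry behind per connection.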

--
Barry Margolin, bar...@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

netch

Nov 18, 2000, 3:00:00 AM

Hi!

>This has nothing to do with Unix parameters. CLOSE_WAIT means that the
>client has closed its end of the connection, but the server hasn't yet
>closed its end. If these persist for a long time, it's a bug in the server
>software.

Does it really? I worked with a cache machine running Linux 2.2 with 4000
clients accessing it. I also reached the maximum number of open sockets, so I
changed some of the /proc/sys/net/ipv4/ parameters to lower the TIME_WAIT
timeout. I also raised TCP_MAX_SYN_BACKLOG. These changes made a radical
difference in performance. I have never worked with Tru64, though, and I could
be wrong; maybe this doesn't affect the CLOSE_WAIT state of TCP.

Mvh
Martin Svensson
System Administrator Netch Technologies AB
Tel: +46-(0)46-2724046
Email: martin....@netch.se
------------Your-planning-failures-are-not-my-emergencies---------------

Jefferson Ogata

Nov 18, 2000, 3:00:00 AM

netch wrote:
>
> Hi!
>
[ you were quoting Barry Margolin here; please retain attributions. ]

> >This has nothing to do with Unix parameters. CLOSE_WAIT means that the
> >client has closed its end of the connection, but the server hasn't yet
> >closed its end. If these persist for a long time, it's a bug in the server
> >software.
>
> Does it really? I worked with a cache machine running Linux 2.2 with 4000
> clients accessing it. I also reached the maximum number of open sockets, so I
> changed some of the /proc/sys/net/ipv4/ parameters to lower the TIME_WAIT
> timeout. I also raised TCP_MAX_SYN_BACKLOG. These changes made a radical
> difference in performance. I have never worked with Tru64, though, and I could
> be wrong; maybe this doesn't affect the CLOSE_WAIT state of TCP.

CLOSE_WAIT is a completely stable state, and properly shouldn't be considered a
bug, nor should it have a timeout. It simply means the incoming data stream has
been shut down, while the outgoing data stream remains open. FIN_WAIT_2 is the
corresponding stable state for the other end of the connection (FIN_WAIT_1 lasts
only until that end's FIN has been acknowledged).

That said, it *usually* does imply a bug somewhere, since there are few
services that operate in this mode. But don't assume just from the fact that a
socket is in CLOSE_WAIT or FIN_WAIT_2 that something is wrong. E.g. a logging
service might accept TCP connections and shut down its outgoing stream, leaving
the client's socket in a perpetual CLOSE_WAIT state; the client may continue to
transmit data to the server in this state indefinitely.
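
The logging-service case can be reproduced on one machine. The following is a small Python sketch over the loopback interface; `half_close_demo` and the log lines are made up for illustration. The server shuts down only its sending side, the client sees end-of-stream, and the client then keeps writing.

```python
import socket

def half_close_demo() -> bytes:
    """Server shuts down its sending side; the client keeps sending anyway."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _addr = listener.accept()
    try:
        # Server sends its FIN but keeps its receiving side open.
        server.shutdown(socket.SHUT_WR)
        # Client sees end-of-stream; its socket is now in CLOSE_WAIT.
        if client.recv(1) != b"":
            raise RuntimeError("expected end-of-stream from server")
        # The client can still transmit in this state.
        client.sendall(b"log line 1\n")
        client.sendall(b"log line 2\n")
        client.shutdown(socket.SHUT_WR)
        # Server reads until the client's FIN arrives.
        received = b""
        while True:
            chunk = server.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        client.close()
        server.close()
        listener.close()
```

Running netstat between the two client sendall() calls should show the client's socket in CLOSE_WAIT while data is still flowing.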

--
Jefferson Ogata : Internetworker, Antibozo
<og...@antibozo-u-spam-u-die.net> http://www.antibozo.net/ogata/
whois: jo...@whois.networksolutions.com

Jefferson Ogata

Nov 18, 2000, 3:00:00 AM

Robin Tuttle wrote:
>
> Folks,

>
> I am having a problem with CLOSE_WAIT states lingering on my Tru64
> Unix v4.0F system. We are running Oracle Application Server v8.0.5.
> This is causing a problem with all of the sockets being used, so that
> no one can connect to the Web server. What specific Unix parameters
> should I be looking at? Any recommendations?

Is there a packet-filtering device between the OAS machine and the clients
named in the remote address of the sockets left in CLOSE_WAIT? If so, it may be
dropping packets critical to the socket shutdown negotiation.

If you run netstat on the client machine, what state is the corresponding entry
left in? Is there anything in the SendQ or RecvQ for the socket at either end?
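
To answer those questions for many sockets at once, the netstat output can be summarized with a short script. This Python sketch assumes the common `netstat -an` column order (protocol, Recv-Q, Send-Q, local address, remote address, state) and IPv4 addresses; the exact layout on Tru64 may differ, and `close_wait_summary` is a hypothetical name.

```python
from collections import Counter

def _host(addr: str) -> str:
    # Linux prints host:port, BSD-derived systems print host.port (IPv4 assumed).
    sep = ":" if ":" in addr else "."
    return addr.rsplit(sep, 1)[0]

def close_wait_summary(netstat_text: str):
    """Count CLOSE_WAIT sockets per remote host and list those with queued data.

    Assumed columns: proto, Recv-Q, Send-Q, local address, remote address, state.
    """
    per_remote = Counter()
    with_queued_data = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue  # header lines, UDP entries, anything unexpected
        _proto, recv_q, send_q, local, remote, state = fields[:6]
        if state != "CLOSE_WAIT":
            continue
        if not (recv_q.isdigit() and send_q.isdigit()):
            continue  # column layout is not the one assumed above
        per_remote[_host(remote)] += 1
        if int(recv_q) or int(send_q):
            with_queued_data.append((local, remote, int(recv_q), int(send_q)))
    return per_remote, with_queued_data
```

Many entries from a single remote host, or entries with a non-empty send queue, point at the network path rather than at the application simply forgetting to close.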

Barry Margolin

Nov 20, 2000, 3:00:00 AM

In article <8v6o5u$6ot$1...@bob.news.rcn.net>,

Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
>That said, it *usually* does imply a bug somewhere, since there are few
>services that operate in this mode.

CLOSE_WAIT is to TCP sockets as <defunct> is to processes: they're part of
the normal scheme of things, and there's nothing that automatically cleans
them up, but if a process leaves lots of them around, it probably indicates
a bug in the process. And in both cases, killing the process that spawned
them will cause them to go away.

Barry Margolin

Nov 20, 2000, 3:00:00 AM

In article <8v6pi0$egs$1...@bob.news.rcn.net>,

Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
>Robin Tuttle wrote:
>>
>> Folks,
>>
>> I am having a problem with CLOSE_WAIT states lingering on my Tru64
>> Unix v4.0F system. We are running Oracle Application Server v8.0.5.
>> This is causing a problem with all of the sockets being used, so that
>> no one can connect to the Web server. What specific Unix parameters
>> should I be looking at? Any recommendations?
>
>Is there a packet-filtering device between the OAS machine and the clients
>named in the remote address of the sockets left in CLOSE_WAIT? If so, it may be
>dropping packets critical to the socket shutdown negotiation.

I don't think so. The process goes into CLOSE_WAIT state as a result of
receiving a FIN segment from the remote machine, so obviously that FIN
wasn't dropped. When the process calls close(), the socket changes from
CLOSE_WAIT to LAST_ACK state, and a FIN is sent. If the FIN or the
corresponding ACK is dropped the socket will hang in this state, not
CLOSE_WAIT.
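
The sequence described above is the passive-close path of the TCP state machine. As a sketch, it can be written as a small lookup table in Python; the event strings and the `walk` helper are made up for illustration.

```python
# (state, event) -> next state, for the passive-close path only.
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # peer closed; we ACK its FIN
    ("CLOSE_WAIT", "app close()"): "LAST_ACK",   # we send our own FIN
    ("LAST_ACK", "recv ACK of FIN"): "CLOSED",   # peer acknowledged our FIN
}

def walk(state, events):
    """Apply events in order; an event with no transition leaves the state as is."""
    for event in events:
        state = PASSIVE_CLOSE.get((state, event), state)
    return state
```

For example, `walk("ESTABLISHED", ["recv FIN", "app close()"])` ends in LAST_ACK: a lost FIN or ACK after close() strands the socket there, while a socket that stays in CLOSE_WAIT never saw the close() event at all.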

Jefferson Ogata

Nov 20, 2000, 3:00:00 AM

Barry Margolin wrote:
>
> In article <8v6o5u$6ot$1...@bob.news.rcn.net>,

> Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
> >That said, it *usually* does imply a bug somewhere, since there are few
> >services that operate in this mode.
>
> CLOSE_WAIT is to TCP sockets as <defunct> is to processes: they're part of
> the normal scheme of things, and there's nothing that automatically cleans
> them up, but if a process leaves lots of them around, it probably indicates
> a bug in the process. And in both cases, killing the process that spawned
> them will cause them to go away.

Well, on Linux 2.2 at least, I've seen sockets stuck in CLOSE_WAIT with data in
the SendQ hang around indefinitely after the associated process is killed,
thereby preventing the service in question from being restarted by hanging onto
the local port number. So killing the process isn't always sufficient to get
rid of them. I don't recall seeing this on Unix hosts, but I wouldn't be at all
surprised.

Jefferson Ogata

Nov 20, 2000, 3:00:00 AM

Barry Margolin wrote:
> In article <8v6pi0$egs$1...@bob.news.rcn.net>,

> Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
> >Robin Tuttle wrote:
> >>
> >> Folks,
> >>
> >> I am having a problem with CLOSE_WAIT states lingering on my Tru64
> >> Unix v4.0F system. We are running Oracle Application Server v8.0.5.
> >> This is causing a problem with all of the sockets being used, so that
> >> no one can connect to the Web server. What specific Unix parameters
> >> should I be looking at? Any recommendations?
> >
> >Is there a packet-filtering device between the OAS machine and the clients
> >named in the remote address of the sockets left in CLOSE_WAIT? If so, it may be
> >dropping packets critical to the socket shutdown negotiation.
>
> I don't think so. The process goes into CLOSE_WAIT state as a result of
> receiving a FIN segment from the remote machine, so obviously that FIN
> wasn't dropped. When the process calls close(), the socket changes from
> CLOSE_WAIT to LAST_ACK state, and a FIN is sent. If the FIN or the
> corresponding ACK is dropped the socket will hang in this state, not
> CLOSE_WAIT.

Consider the following scenario:

Client is coming through a stateful packet filter that has an idle timeout, or
a timeout on CLOSE_WAIT and FIN_WAIT states. Client connects to OAS and
transmits a query, then shuts down its write side. Server is now in CLOSE_WAIT.
OAS goes off for a few minutes cranking away on the query, during which time
the packet filter's timeout expires and the filter drops the connection from
its state table. The server finishes the query and proceeds to transmit a full window of
response to the client. This forces it to block waiting for acknowledgements
that will never come, because the packet filter is now dropping the server's
traffic. Thus, the server is stuck in CLOSE_WAIT and unable to reach the code
that calls close() or shutdown().

This is why I asked the OP to check the SendQ.
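
One way an application can avoid blocking forever in that situation is to put a deadline on the send and close the socket when it expires. The Python sketch below is only an illustration of that idea, not something OAS is known to do; `send_or_give_up` is a hypothetical name, and it relies on Python 3.5+ behaviour where the socket timeout bounds the total duration of sendall().

```python
import socket

def send_or_give_up(conn: socket.socket, payload: bytes, timeout_s: float) -> bool:
    """Try to send payload; if the peer stalls, close the socket and return False."""
    # With a timeout set, sendall() raises if it cannot finish within
    # timeout_s in total (Python 3.5 and later).
    conn.settimeout(timeout_s)
    try:
        conn.sendall(payload)
        return True
    except OSError:
        # socket.timeout is a subclass of OSError, so this covers both a
        # stalled peer and an outright connection failure.
        conn.close()
        return False
```

Closing releases the descriptor, which is the resource the application runs out of; the kernel may still hold the connection for a while if unsent data remains queued.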

Andrew Moar

Nov 21, 2000, 1:25:13 AM

In article <3a15574d...@news.unh.edu>,
Robin....@unh.edu (Robin Tuttle) writes:

> I am having a problem with CLOSE_WAIT states lingering on my Tru64
> Unix v4.0F system. We are running Oracle Application Server v8.0.5.
> This is causing a problem with all of the sockets being used, so that
> no one can connect to the Web server. What specific Unix parameters
> should I be looking at? Any recommendations?

We're running a similar setup here and have had the same problems. We've
changed somaxconn and sominconn to 32767 (in the socket subsystem) and
tcp_keepalive_default=1 and tcp_keepidle=1200 in the inet subsystem. All
these can be changed via dxkerneltuner. See the online docs (via
www.tru64.org) for more information.
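
For comparison, keepalive can also be requested per socket by the application rather than system-wide. This is a minimal Python sketch using the standard SO_KEEPALIVE socket option; `enable_keepalive` is a hypothetical name, and the probe timing is still governed by system settings such as the tcp_keepidle value mentioned above.

```python
import socket

def enable_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to probe an idle peer so a dead connection eventually errors out."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
```

Keepalive only helps the kernel notice peers that have gone away; the application still has to close the socket once the error is reported.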

Regards,

Andrew
--
Andrew Moar : Ph +61 3 9479 1505 email A.M...@latrobe.edu.au
Unix Environment Specialist, Information Technology Services
La Trobe University, Bundoora

Anthony W. Youngman

Nov 21, 2000, 3:00:00 AM

I'm sure I've seen terminal servers stuck in the CLOSE_WAIT state. Real
bummer if it's the printer port and none of the monkeys cares to let you
know for a day that the printer isn't working...

-----Original Message-----
From: Jefferson Ogata [mailto:og...@antibozo-u-spam-u-die.net]
Posted At: 21 November 2000 02:40
Posted To: admin
Conversation: close_wait states
Subject: Re: close_wait states

Barry Margolin wrote:
>
> In article <8v6o5u$6ot$1...@bob.news.rcn.net>,

> Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
> >That said, it *usually* does imply a bug somewhere, since there are few
> >services that operate in this mode.
>
> CLOSE_WAIT is to TCP sockets as <defunct> is to processes: they're part of
> the normal scheme of things, and there's nothing that automatically cleans
> them up, but if a process leaves lots of them around, it probably indicates
> a bug in the process. And in both cases, killing the process that spawned
> them will cause them to go away.

Well, on Linux 2.2 at least, I've seen sockets stuck in CLOSE_WAIT with data in
the SendQ hang around indefinitely after the associated process is killed,
thereby preventing the service in question from being restarted by hanging onto
the local port number. So killing the process isn't always sufficient to get
rid of them. I don't recall seeing this on Unix hosts, but I wouldn't be at all
surprised.

--

Barry Margolin

Nov 21, 2000, 3:00:00 AM

In article <8vcn9k$lcq$1...@bob.news.rcn.net>,

Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
>Well, on Linux 2.2 at least, I've seen sockets stuck in CLOSE_WAIT with data in
>the SendQ hang around indefinitely after the associated process is killed,
>thereby preventing the service in question from being restarted by hanging onto
>the local port number. So killing the process isn't always sufficient to get
>rid of them. I don't recall seeing this on Unix hosts, but I wouldn't be at all
>surprised.

It actually makes sense. If the send window is closed, the socket has to
stick around so it can keep retransmitting the data.