I am having a problem with CLOSE_WAIT states lingering on my Tru64
Unix v4.0F system. We are running Oracle Application Server v8.0.5.
This is causing a problem with all of the sockets being used, so that
no one can connect to the Web server. What specific Unix parameters
should I be looking at? Any recommendations?
Many thanks in advance!
Robin Tuttle
University of New Hampshire
This has nothing to do with Unix parameters. CLOSE_WAIT means that the
client has closed its end of the connection, but the server hasn't yet
closed its end. If these persist for a long time, it's a bug in the server
software.
In the case of a database, one way that these could occur without requiring
a bug is if a client submits a query that takes a long time to complete,
and he gets impatient waiting for a response so he cancels it. That could
close the connection, but the server won't notice this until it finishes
processing the query and tries to send the results back. But if you're
getting lots of connections like this, it can't be explained by a handful
of aborted queries.
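To illustrate the point about the server not closing its end, here is a minimal Python sketch of a connection handler that does the right thing. The function name handle_connection and the echo behaviour are made up for the example; this is not how OAS works internally. The key step is that close() runs once recv() reports end-of-file.

```python
import socket

def handle_connection(conn: socket.socket) -> int:
    """Echo data until the peer closes, then close our end too.

    Returns the number of bytes echoed.
    """
    total = 0
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # recv() returning b"" means the peer has sent its FIN.
                # For a TCP socket, our end is now in CLOSE_WAIT and stays
                # there until we close it.
                break
            conn.sendall(data)
            total += len(data)
    finally:
        # Omitting this close() is the typical server bug: the socket
        # would sit in CLOSE_WAIT for as long as the process keeps the
        # descriptor open.
        conn.close()
    return total
```

A server that leaks CLOSE_WAIT sockets usually has some code path (an error branch, a long-running request) where the equivalent of that final close() is never reached.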
--
Barry Margolin, bar...@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
>This has nothing to do with Unix parameters. CLOSE_WAIT means that the
>client has closed its end of the connection, but the server hasn't yet
>closed its end. If these persist for a long time, it's a bug in the server
>software.
Does it really? I worked with a cache machine running Linux 2.2 with 4000
clients accessing it. I also reached the maximum number of open sockets, so I
changed some of the /proc/sys/net/ipv4/ parameters to lower the TIME_WAIT
timeout. I also raised TCP_MAX_SYN_BACKLOG. These changes made some radical
performance differences. I have never worked with Tru64, though, so I could be
wrong; maybe this doesn't affect the CLOSE_WAIT state of TCP.
Mvh
Martin Svensson
System Administrator Netch Technologies AB
Tel: +46-(0)46-2724046
Email: martin....@netch.se
------------Your-planning-failures-are-not-my-emergencies---------------
CLOSE_WAIT is a completely stable state, and properly shouldn't be considered a
bug, nor should it have a timeout. It simply means the incoming data stream has
been shut down, while the outgoing data stream remains open. FIN_WAIT_2 is the
corresponding stable state for the other end of the connection.
That said, it *usually* does imply a bug somewhere, since there are few
services that operate in this mode. But don't assume just from the fact that a
socket is in CLOSE_WAIT or FIN_WAIT_2 that something is wrong. E.g. a logging
service might accept TCP connections and shut down its outgoing stream, leaving
the client's socket in a perpetual CLOSE_WAIT state; the client may continue to
transmit data to the server in this state indefinitely.
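The logging example can be reproduced over loopback with a short Python sketch. The function name half_close_demo and the log lines are invented for illustration; the socket calls are the standard ones. The server shuts down only its sending side, the client sees end-of-file on read, and the client-to-server direction keeps working.

```python
import socket

def half_close_demo() -> bytes:
    """Server half-closes its sending side; the client keeps sending."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    # Server announces it will never send anything: this sends a FIN.
    server.shutdown(socket.SHUT_WR)

    # Client reads end-of-file; its socket is now in CLOSE_WAIT.
    assert client.recv(1) == b""

    # The client-to-server direction is still open and usable.
    client.sendall(b"log line 1\n")
    client.sendall(b"log line 2\n")
    client.close()

    # Server drains everything the client sent after the half-close.
    received = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        received += chunk
    server.close()
    return received
```

Running netstat between the client's recv() and its close() should show the client socket in CLOSE_WAIT, which in this case is entirely healthy.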
--
Jefferson Ogata : Internetworker, Antibozo
<og...@antibozo-u-spam-u-die.net> http://www.antibozo.net/ogata/
whois: jo...@whois.networksolutions.com
Is there a packet-filtering device between the OAS machine and the clients
named in the remote address of the sockets left in CLOSE_WAIT? If so, it may be
dropping packets critical to the socket shutdown negotiation.
If you run netstat on the client machine, what state is the corresponding entry
left in? Is there anything in the SendQ or RecvQ for the socket at either end?
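If there are too many sockets to check by eye, a small script can tally the states and pick out CLOSE_WAIT entries with data queued. This is only a sketch: it assumes netstat -an prints TCP lines as "proto recv-q send-q local foreign state", which is common but not universal, so the column positions may need adjusting for your system. The function name summarize_netstat is made up.

```python
from collections import Counter

def summarize_netstat(text: str):
    """Count TCP states and list CLOSE_WAIT sockets with queued data.

    Assumed line layout (verify against your own netstat output):
      proto recv-q send-q local-address foreign-address state
    """
    states = Counter()
    queued = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, non-TCP lines, and lines without a state column.
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue
        recv_q, send_q, local, remote, state = fields[1:6]
        if not (recv_q.isdigit() and send_q.isdigit()):
            continue
        states[state] += 1
        if state == "CLOSE_WAIT" and (recv_q != "0" or send_q != "0"):
            queued.append((local, remote, int(recv_q), int(send_q)))
    return states, queued
```

Feed it the captured output of netstat -an. A non-empty second result points at sockets where data is stuck in a queue, as opposed to sockets the application has simply forgotten to close.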
CLOSE_WAIT is to TCP sockets as <defunct> is to processes: they're part of
the normal scheme of things, and there's nothing that automatically cleans
them up, but if a process leaves lots of them around, it probably indicates
a bug in the process. And in both cases, killing the process that spawned
them will cause them to go away.
I don't think so. The socket goes into CLOSE_WAIT state as a result of
receiving a FIN segment from the remote machine, so obviously that FIN
wasn't dropped. When the process calls close(), the socket changes from
CLOSE_WAIT to LAST_ACK state, and a FIN is sent. If the FIN or the
corresponding ACK is dropped the socket will hang in this state, not
CLOSE_WAIT.
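The sequence described above is the passive-close path of the TCP state diagram in RFC 793. Written out as a lookup table in Python, purely as an illustration (the event labels and the walk helper are my own naming):

```python
# (state, event) -> next state, for the passive-close side only (RFC 793).
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "app close()"): "LAST_ACK",   # our own FIN is sent here
    ("LAST_ACK", "recv ACK of FIN"): "CLOSED",
}

def walk(state: str, events: list) -> str:
    """Apply events in order; an event that does not apply leaves the state alone."""
    for event in events:
        state = PASSIVE_CLOSE.get((state, event), state)
    return state
```

The only way out of CLOSE_WAIT in this table is the application calling close(). A lost FIN or ACK can stall the LAST_ACK step, but nothing the network does moves a socket out of CLOSE_WAIT.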
Well, on Linux 2.2 at least, I've seen sockets stuck in CLOSE_WAIT with data in
the SendQ hang around indefinitely after the associated process is killed,
thereby preventing the service in question from being restarted by hanging onto
the local port number. So killing the process isn't always sufficient to get
rid of them. I don't recall seeing this on Unix hosts, but I wouldn't be at all
surprised.
Consider the following scenario:
Client is coming through a stateful packet filter that has an idle timeout, or
a timeout on CLOSE_WAIT and FIN_WAIT states. Client connects to OAS and
transmits a query, then shuts down its write side. Server is now in CLOSE_WAIT.
OAS goes off for a few minutes cranking away on the query, during which time
the packet filter's timeout kicks in and drops the connection from its state
table. The server finishes the query and proceeds to transmit a full window of
response to the client. This forces it to block waiting for acknowledgements
that will never come, because the packet filter is now dropping the server's
traffic. Thus, the server is stuck in CLOSE_WAIT and unable to reach the code
that calls close() or shutdown().
This is why I asked the OP to check the SendQ.
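One way an application can avoid getting wedged in this scenario is to put a timeout on the send, so that the code path that calls close() is always reached. A minimal Python sketch, assuming a blocking socket; send_with_deadline is a hypothetical helper, not anything OAS provides:

```python
import socket

def send_with_deadline(conn: socket.socket, payload: bytes, timeout: float) -> bool:
    """Send payload, giving up after `timeout` seconds; always close.

    Returns True if everything was handed to the kernel, False on timeout.
    """
    conn.settimeout(timeout)
    try:
        conn.sendall(payload)
        return True
    except socket.timeout:
        # The peer stopped acknowledging (or reading), so the send
        # buffer filled up and the send could not finish in time.
        return False
    finally:
        # Reached on success, timeout, and any other error alike.
        conn.close()
```

Note that close() only releases the application's descriptor. Data already queued in the kernel may still be retransmitted for a while, so the socket can remain visible in netstat after the call returns.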
> I am having a problem with CLOSE_WAIT states lingering on my Tru64
> Unix v4.0F system. We are running Oracle Application Server v8.0.5.
> This is causing a problem with all of the sockets being used, so that
> no one can connect to the Web server. What specific Unix parameters
> should I be looking at? Any recommendations?
We're running a similar setup here and have had the same problems. We've
changed somaxconn and sominconn to 32767 (in the socket subsystem) and
tcp_keepalive_default=1 and tcp_keepidle=1200 in the inet subsystem. All
these can be changed via dxkerneltuner. See the online docs (via
www.tru64.org) for more information.
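For comparison, tcp_keepalive_default=1 turns keepalive on system-wide for sockets that did not ask for it. An application that can be modified could request the same thing per socket. A minimal Python sketch (enable_keepalive is just an illustrative name; SO_KEEPALIVE is the standard socket option):

```python
import socket

def enable_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to probe an idle connection's peer periodically."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
```

How long the connection must be idle before probing starts is still governed by the system setting (tcp_keepidle above) unless the platform offers a per-socket override. Whether keepalive clears a given CLOSE_WAIT socket depends on the peer being unreachable or answering the probe with a reset, so treat it as a mitigation and not a fix for a server that fails to close.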
Regards,
Andrew
--
Andrew Moar : Ph +61 3 9479 1505 email A.M...@latrobe.edu.au
Unix Environment Specialist, Information Technology Services
La Trobe University, Bundoora
-----Original Message-----
From: Jefferson Ogata [mailto:og...@antibozo-u-spam-u-die.net]
Posted At: 21 November 2000 02:40
Posted To: admin
Conversation: close_wait states
Subject: Re: close_wait states
Barry Margolin wrote:
>
> In article <8v6o5u$6ot$1...@bob.news.rcn.net>,
> Jefferson Ogata <og...@antibozo-u-spam-u-die.net> wrote:
> >That said, it *usually* does imply a bug somewhere, since there are few
> >services that operate in this mode.
>
> CLOSE_WAIT is to TCP sockets as <defunct> is to processes: they're part of
> the normal scheme of things, and there's nothing that automatically cleans
> them up, but if a process leaves lots of them around, it probably indicates
> a bug in the process. And in both cases, killing the process that spawned
> them will cause them to go away.
Well, on Linux 2.2 at least, I've seen sockets stuck in CLOSE_WAIT with data in
the SendQ hang around indefinitely after the associated process is killed,
thereby preventing the service in question from being restarted by hanging onto
the local port number. So killing the process isn't always sufficient to get
rid of them. I don't recall seeing this on Unix hosts, but I wouldn't be at all
surprised.
--
It actually makes sense. If the send window is closed, the socket has to
stick around so it can keep retransmitting the data.