Sockets question - TIME_WAIT (FAQ 2.7 clarification request)

Kenneth Brody

Sep 5, 2001, 6:17:31 PM

(Referring to FAQ section 2.7 -- Please explain the TIME_WAIT state.)

I have a daemon process that is currently in testing stages. Everything
is working, with the exception that the original socket that is opened,
bound, listened to, and accepted, remains in the TIME_WAIT state for a
while upon being closed.

I read section 2.7 of the FAQ and (I think) I have a basic understanding
of the TIME_WAIT state. However, the original socket is not connected to
anything else, so I don't understand why it remains in the TIME_WAIT
state upon being closed, as there is nothing at the other end that has
to acknowledge the close.

In this test phase, the daemon accepts a single connection, and the work
is done via the accepted socket. When the job is done, it shuts down the
accepted socket and closes it, which immediately disappears from netstat.
However, the original socket remains in TIME_WAIT after being closed.
Since it was never talking to another socket, I don't understand why the
TIME_WAIT state is necessary for this socket.

Am I correct in my interpretation of the FAQ that basically "that's the way
it works... live with it"? The problem is that you need to wait a while
before restarting the daemon to re-test it. (bind fails with "address
in use".)

--

+---------+----------------------------------+-----------------------------+
| Kenneth | kenb...@bestweb.net | "The opinions expressed |
| J. | | herein are not necessarily |
| Brody | http://www.bestweb.net/~kenbrody | those of fP Technologies." |
+---------+----------------------------------+-----------------------------+
GCS (ver 3.12) d- s+++: a C++$(+++) ULAVHSC^++++$ P+>+++ L+(++) E-(---)
W++ N+ o+ K(---) w@ M@ V- PS++(+) PE@ Y+ PGP-(+) t+ R@ tv+() b+
DI+(++++) D---() G e* h---- r+++ y?

David Schwartz

Sep 5, 2001, 6:34:13 PM

Kenneth Brody wrote:

> I have a daemon process that is currently in testing stages. Everything
> is working, with the exception that the original socket that is opened,
> bound, listened to, and accepted, remains in the TIME_WAIT state for a
> while upon being closed.

I'm 99.9% sure that's not the socket that's remaining in the TIME_WAIT
state, it's the one that was created in the 'accept' call.

> In this test phase, the daemon accepts a single connection, and the work
> is done via the accepted socket. When the job is done, it shuts down the
> accepted socket and closes it, which immediately disappears from netstat.
> However, the original socket remains in TIME_WAIT after being closed.

Are you sure you aren't confusing the two sockets? Can you post a
'netstat' output showing an unconnected socket in the TIME_WAIT state?
I'd love to see that.

DS

Barry Margolin

Sep 5, 2001, 6:47:08 PM

In article <3B96A47B...@bestweb.net>,

Kenneth Brody <kenb...@bestweb.net> wrote:
>Am I correct in my interpretation of the FAQ that basically "that's the way
>it works... live with it"? The problem is that you need to wait a while
>before restarting the daemon to re-test it. (bind fails with "address
>in use".)

Not if you set the SO_REUSEADDR socket option like you're supposed to.
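
A minimal sketch of setting the option on an IPv4 TCP listener. The helper name make_listener and its backlog parameter are illustrative and not taken from the FAQ or this thread. The one essential point is that setsockopt() is called before bind().

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listener on `port` (host byte order,
 * 0 lets the kernel choose) with SO_REUSEADDR set.
 * Returns the listening descriptor, or -1 on error. */
int make_listener(unsigned short port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* Set before bind(): allows bind() to succeed while old connections
     * on this port are still sitting in TIME_WAIT. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == -1) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A daemon on port 5000, as in the original post, would call make_listener(5000, 16) at startup.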

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Kenneth Brody

Sep 5, 2001, 11:13:46 PM

David Schwartz wrote:
>
> Kenneth Brody wrote:
>
> > I have a daemon process that is currently in testing stages. Everything
> > is working, with the exception that the original socket that is opened,
> > bound, listened to, and accepted, remains in the TIME_WAIT state for a
> > while upon being closed.
>
> I'm 99.9% sure that's not the socket that's remaining in the TIME_WAIT
> state, it's the one that was created in the 'accept' call.

[...]

> Are you sure you aren't confusing the two sockets? Can you post a
> 'netstat' output showing an unconnected socket in the TIME_WAIT state?
> I'd love to see that.

I think you may be right...

When I do a "netstat -a" while connected, I see two sockets on the same
port:

tcp 0 0 fptunix.5000 216.179.4.117.1675 ESTABLISHED
tcp 0 0 *.5000 *.* LISTEN

Once I disconnect, I get:

tcp 0 0 fptunix.5000 216.179.4.117.1675 TIME_WAIT

For some reason, I always thought that the accept'ed socket would be
assigned a new port number. When I later saw port 5000 in TIME_WAIT,
I was thinking that it must be the original listen socket, not the
accepted socket.

Thanks for making me re-examine the data.
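
The same observation can be made from code rather than netstat. The following is a sketch only, assuming an IPv4 loopback interface is available; local_port and compare_ports are hypothetical helper names. It listens on a kernel-chosen port, connects to itself, and reads back the local port of both the listening socket and the accepted socket with getsockname().

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the local port (host byte order) of an IPv4 socket, or -1. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}

/* Listen on an ephemeral loopback port, connect to it, accept the
 * connection, and report the local port of the listening socket and of
 * the accepted socket. Returns 0 on success, -1 on any failure. */
int compare_ports(int *listen_port, int *accepted_port)
{
    assert(listen_port != NULL && accepted_port != NULL);

    int rc = -1;
    int afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    if (lfd == -1 || cfd == -1)
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd == -1)
        goto out;

    *listen_port = local_port(lfd);
    *accepted_port = local_port(afd);
    rc = 0;
out:
    if (afd != -1) close(afd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return rc;
}
```

Both out-parameters come back holding the same port number: accept() creates a new socket, not a new port. The kernel tells the sockets apart by the remote address and port.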

Kenneth Brody

Sep 5, 2001, 11:24:51 PM

Barry Margolin wrote:
>
> In article <3B96A47B...@bestweb.net>,
> Kenneth Brody <kenb...@bestweb.net> wrote:
> >Am I correct in my interpretation of the FAQ that basically "that's the way
> >it works... live with it"? The problem is that you need to wait a while
> >before restarting the daemon to re-test it. (bind fails with "address
> >in use".)
>
> Not if you set the SO_REUSEADDR socket option like you're supposed to.

Thanks. That did it.

(Perhaps the FAQ should have a "see also SO_REUSEADDR" under the TIME_WAIT
section?)

David Schwartz

Sep 5, 2001, 11:29:40 PM

Kenneth Brody wrote:

> For some reason, I always thought that the accept'ed socket would be
> assigned a new port number. When I later saw port 5000 in TIME_WAIT,
> I was thinking that it must be the original listen socket, not the
> accepted socket.

Err, what? The port number of a TCP connection never changes -- if it
did it would be a different connection.

DS

Joerg Schmitz-Linneweber

Sep 6, 2001, 2:48:37 AM

Hi!

Kenneth Brody wrote:
> Thanks. That did it.
>
> (Perhaps the FAQ should have a "see also SO_REUSEADDR" under the TIME_WAIT
> section?)

If I remember right, Stevens' "UNIX Network Programming" mentions
SO_REUSEADDR _nearly *every* time_ the term TIME_WAIT appears in the
text :-)

And BTW: you aren't trying to do socket programming without paying respect
to Stevens, are you? ;-)

Salut,
Jörg

--
Jörg Schmitz-Linneweber
mailto:schmitz-l...@aston-technologie.de
ASTON GmbH, Ruhrorter Straße 9, 46049 Oberhausen, Germany
Tel. +49 (208) 6201930, FAX +49 (208) 6201950,
http://www.aston-technologie.de

Barry Margolin

Sep 6, 2001, 11:07:17 AM

In article <3B96EC83...@bestweb.net>,

Kenneth Brody <kenb...@bestweb.net> wrote:
>Barry Margolin wrote:
>>
>> In article <3B96A47B...@bestweb.net>,
>> Kenneth Brody <kenb...@bestweb.net> wrote:
>> >Am I correct in my interpretation of the FAQ that basically "that's the way
>> >it works... live with it"? The problem is that you need to wait a while
>> >before restarting the daemon to re-test it. (bind fails with "address
>> >in use".)
>>
>> Not if you set the SO_REUSEADDR socket option like you're supposed to.
>
>Thanks. That did it.
>
>(Perhaps the FAQ should have a "see also SO_REUSEADDR" under the TIME_WAIT
>section?)

Probably a good idea, although if you search the FAQ for TIME_WAIT you'll
see it mentioned in the first sentence of the section on SO_REUSEADDR.

I don't know whether anyone is actually maintaining the Sockets FAQ these
days, as it hasn't been updated since 1998. But it can't hurt for you to
send the suggestion to the maintainer, v...@acm.org.

Kenneth Brody

Sep 6, 2001, 1:32:13 PM

Obviously a misconception from years ago, which never got challenged until
now.

Kenneth Brody

Sep 6, 2001, 1:34:57 PM

Joerg Schmitz-Linneweber wrote:
>
> Hi!
>
> Kenneth Brody wrote:
> > Thanks. That did it.
> >
> > (Perhaps the FAQ should have a "see also SO_REUSEADDR" under the TIME_WAIT
> > section?)
> If I remember right, Stevens "UNIX Network Programming" is mentioning
> SO_REUSEADDR _nearly *every* time_ the term TIME_WAIT is to be seen in the
> text :-)
>
> And BTW: You don't try to do socket programming without giving respect to
> Stevens don't you? ;-)

I have "Internetworking with TCP/IP" volumes II and III, by Comer and
Stevens, on my bookshelf. I haven't referred to them in quite some time.
Perhaps now that I'm doing "real" work with TCP/IP, I should crack them
open again. (I previously played around with them, writing a simple
chat server and the like. Now we're doing real daemon stuff.)

Kenneth Brody

Sep 6, 2001, 1:37:58 PM

Barry Margolin wrote:
>
> In article <3B96EC83...@bestweb.net>,
> Kenneth Brody <kenb...@bestweb.net> wrote:

[...]

> >(Perhaps the FAQ should have a "see also SO_REUSEADDR" under the TIME_WAIT
> >section?)
>
> Probably a good idea, although if you search the FAQ for TIME_WAIT you'll
> see it mentioned in the first sentence of the section on SO_REUSEADDR.

That's what happens when you only read the answer to the question you're
looking for. ;-)

> I don't know whether anyone is actually maintaining the Sockets FAQ these
> days, as it hasn't been updated since 1998. But it can't hurt for you to
> send the suggestion to the maintainer, v...@acm.org.

Thanks for the tip.

David Schwartz

Sep 6, 2001, 4:55:33 PM

Kenneth Brody wrote:

> > Err, what? The port number of a TCP connection never changes -- if it
> > did it would be a different connection.

> Obviously a misconception from years ago, which never got challenged until
> now.

It's a good example of one of those things that a lot of people carry
around in their heads that can't stand a second's scrutiny. There are
many of these, and as soon as someone points out to you that it makes no
sense, you realize it and wonder how you could ever have believed it.

Another good example is that clouds going up a mountain tend to produce
rain because the cooler air can't hold as much water. Umm, the air isn't
holding the water, they're a mixture of gasses and don't affect each
other in any way. The water vapor would behave exactly the same if the
air wasn't there at all.

Another is that electrons pick up energy in a battery, drop it off at a
lamp, and then return 'empty' to the battery. The electrons don't even
have to move around the circuit for the energy to flow, nor do the
electrons in one wire differ from the electrons in the other. A better
way to view it is a motor and a flywheel with friction with a belt going
around them.

DS

nos...@please.thankyou

Sep 6, 2001, 5:33:31 PM

Kenneth Brody <kenb...@bestweb.net> writes:

> David Schwartz wrote:
> >
> > Kenneth Brody wrote:
> >
> > > For some reason, I always thought that the accept'ed socket would be
> > > assigned a new port number. When I later saw port 5000 in TIME_WAIT,
> > > I was thinking that it must be the original listen socket, not the
> > > accepted socket.
> >
> > Err, what? The port number of a TCP connection never
> > changes -- if it did it would be a different connection.
>
> Obviously a misconception from years ago, which never got challenged
> until now.

I'll bite, once a connection is established, how does the port number
(of either end) change?

A TCP connection is defined as protocol, local address, local port,
remote address, remote port. If one of those things changes, it's a
different connection.

nos...@please.thankyou

Sep 6, 2001, 5:37:39 PM

nos...@please.thankyou writes:

> Kenneth Brody <kenb...@bestweb.net> writes:
>
> > David Schwartz wrote:
> > >
> > > Kenneth Brody wrote:
> > >
> > > > For some reason, I always thought that the accept'ed socket would be
> > > > assigned a new port number. When I later saw port 5000 in TIME_WAIT,
> > > > I was thinking that it must be the original listen socket, not the
> > > > accepted socket.
> > >
> > > Err, what? The port number of a TCP connection never
> > > changes -- if it did it would be a different connection.
> >
> > Obviously a misconception from years ago, which never got challenged
> > until now.
>
> I'll bite, once a connection is established, how does the port number
> (of either end) change?

Err, never mind. I just realized you were referring to what the OP
said, not what David said.

Barry Margolin

Sep 6, 2001, 5:49:50 PM

In article <m3y9nso...@please.thankyou>, <nos...@please.thankyou> wrote:
>Kenneth Brody <kenb...@bestweb.net> writes:
>
>> David Schwartz wrote:
>> >
>> > Kenneth Brody wrote:
>> >
>> > > For some reason, I always thought that the accept'ed socket would be
>> > > assigned a new port number. When I later saw port 5000 in TIME_WAIT,
>> > > I was thinking that it must be the original listen socket, not the
>> > > accepted socket.
>> >
>> > Err, what? The port number of a TCP connection never
>> > changes -- if it did it would be a different connection.
>>
>> Obviously a misconception from years ago, which never got challenged
>> until now.
>
>I'll bite, once a connection is established, how does the port number
>(of either end) change?

Just so people know that the OP's misconception isn't totally outlandish,
there *are* network protocols that work that way. I think NCP, the
protocol used on the original Arpanet prior to TCP/IP, used one port (which
they called a "socket" in those days) just for making the initial
connection to a server, and part of the initial handshaking involved
telling the client a new socket to use from then on.

The TFTP protocol does something similar with UDP ports. The client sends
its initial request to the well-known TFTP port, and the response contains
a new port number to use to send the file. It's done this way because UDP
doesn't have connections that are identified by a pair of port numbers.

Christopher

Sep 19, 2001, 3:50:44 PM

"Barry Margolin" <bar...@genuity.net> wrote in message
news:MXxl7.54$nH5.453@burlma1-snr2...

> In article <3B96A47B...@bestweb.net>,
> Kenneth Brody <kenb...@bestweb.net> wrote:
> >Am I correct in my interpretation of the FAQ that basically "that's the way
> >it works... live with it"? The problem is that you need to wait a while
> >before restarting the daemon to re-test it. (bind fails with "address
> >in use".)
>
> Not if you set the SO_REUSEADDR socket option like you're supposed to.
>

Different question concerning TIME_WAIT and CLOSE_WAIT. I've written a
small server program (on Linux 2.4) that spins one thread for each
connection - very simple. All the thread does is a recv(), send(), and then
close() (all non-blocking, with error checking). My problem isn't so much
with my program (I hope) as it is with TCP.

I run my program and the following perl script:
---
#!/usr/bin/perl
# Fill in the two placeholders below before running.
my $ip = '<my server ip address>';
my $port = '<my server port>';
while (1) {
    system "echo test | telnet $ip $port 1>/dev/null 2>&1";
}
---

When the program runs, I only get more than one connection thread running
when the server begins to fill up with CLOSE_WAIT or TIME_WAIT states.

As you may surmise, I'm flooding the server program with connections (a
DOS attempt). Well, the DOS works (not what I want!). The server buffer
fills completely and leaves itself stuck in TIME_WAIT/CLOSE_WAIT states
until I kill the process (or cause it to exit from buffer or 'too many files
open' errors). From what I've gathered from other sources, TCP is working
just as it was designed. Also, any solutions seem to require changes to the
client side of things (ie. making the client responsible for the TIME_WAIT
state or having the client send an RST to the server). This doesn't stop
malicious clients.

How can a server protect itself from bad clients? What are some
strategies for high traffic servers to prevent themselves from 'denying
service' because of how TCP 'locks up' sockets?

Luis Rojas G.

Sep 20, 2001, 7:47:08 PM

Christopher <cwe...@adtel.com> escribió en el mensaje de noticias
oG6q7.3098$L47.1...@news0.telusplanet.net...

If you have sockets in the CLOSE_WAIT state, it is because of your
application, not the client's.
If the client closes the socket (executing a "close(fd)"), its system sends
a FIN segment and moves its socket to FIN_WAIT_1. Your system receives the
FIN and acknowledges it, moving from ESTABLISHED to CLOSE_WAIT. The remote
(client) system then moves its socket to FIN_WAIT_2, waiting for your system
to send a FIN to close the other side of the connection. That happens when
YOUR APPLICATION calls close(fd).

If the socket on your side is in CLOSE_WAIT, it will be in FIN_WAIT_2 on
the client's side.
The cause is that your application is not detecting that the client has
ended the connection.
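
In code, the server side of that sequence might look like the sketch below. drain_and_close is a hypothetical name, and the socket is assumed to be blocking. The point it illustrates is that recv() returning 0 is how the application learns the client's FIN has arrived, and that the socket stays in CLOSE_WAIT until the application calls close().

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read and discard data until the peer closes its side, then close ours.
 * Returns the number of bytes received before end-of-stream, or -1 on a
 * recv() error. The descriptor is closed in both cases. */
long drain_and_close(int fd)
{
    assert(fd >= 0);

    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            break;              /* peer's FIN seen: socket is in CLOSE_WAIT */
        } else if (errno != EINTR) {
            total = -1;         /* a real error, not just an interruption */
            break;
        }
    }
    close(fd);                  /* sends our FIN, so we leave CLOSE_WAIT */
    return total;
}
```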

Best regards.


Christopher

Sep 21, 2001, 12:28:23 PM

Thank you for the reply, Luis.

"Luis Rojas G." <albi...@pobox.com> wrote in message
news:3baa...@news.ifxnw.cl...

>
>
> Christopher <cwe...@adtel.com> escribió en el mensaje de noticias
> oG6q7.3098$L47.1...@news0.telusplanet.net...
> >

I get the server sockets in CLOSE_WAIT when the server sends data to the
client, then closes. If I remove the code that sends the data (the server
only receives data), then the server follows through into TIME_WAIT after
close(). In the first case, however, there is no problem with sockets
staying in CLOSE_WAIT or TIME_WAIT until after the server has been flooded
with several hundred requests.

So I guess this discussion is twofold. One, I think the server chokes
because the 'client' connections are coming in too fast and too many for the
server to handle (?). Two, the sockets are being held in either FIN_WAIT_2,
CLOSE_WAIT or TIME_WAIT. The latter is TCP behaving like it should. I
understand this and realise I'll just have to deal with it in order to play
nice with the rest of the internet - But how does one deal with this? What
about clients that don't play nice?

I've read articles where clients send RST to the server (many disagree
with doing this). I've also read a newsgroup message where somebody modified
the Linux kernel to just bypass the TIME_WAIT state altogether (ACK!).
Fortunately, it was for a custom application not meant for the Internet:
http://groups.google.com/groups?hl=en&rnum=12&selm=_DSo5.3%241e.5564%40news.pacbell.net

If my application is not detecting the end of the connection by the
client, how do I recover from that?

-
wedman

Rick Jones

Sep 21, 2001, 2:28:43 PM

In comp.protocols.tcp-ip Christopher <cwe...@adtel.com> wrote:

> I get the server sockets in CLOSE_WAIT when the server sends data
> to the client, then closes. If I remove the code that sends the

Um, if the server calls close, its end of the connection should not be
in CLOSE_WAIT. If the server's end of the connection is in CLOSE_WAIT,
it implies that the server has not yet called close. It could also
mean that the close was "lost" I suppose.

> data (the server only receives data), then the server follows
> through into TIME_WAIT after close(). In the first case, however,

This implies that the server initiated connection close before the
client. The first end of the TCP connection to initiate shutdown is
responsible for the required TIME_WAIT state.

> there is no problem with sockets staying in CLOSE_WAIT or TIME_WAIT
> until after the server has been flooding with several hundred
> requests.

CLOSE_WAIT endpoints could still be holding FDs allocated in the
server. TIME_WAIT endpoints should not. I suppose it is possible to have
an "attached" TIME_WAIT if the code called shutdown and not close.

> So I guess this discussion is two fold. One, I think the server
> chokes because the 'client' connections are coming in too fast and
> too many for the server to handle (?). Two, the sockets are being

If connections are arriving faster than the server can handle them, a
number of things would happen:

*) The server app would exhaust its available file descriptors.

*) the listen queue would fill and connection requests (SYN segments)
would start to be dropped.

> held in either FIN_WAIT_2, CLOSE_WAIT or TIME_WAIT. The latter is
> TCP behaving like it should. I understand this and realise I'll
> just have to deal with it in order to play nice with the rest of the
> internet - But how does one deal with this? What about clients that
> don't play nice?

CLOSE_WAIT is dealt with by having the application call
close. FIN_WAIT_2 is dealt with by having the application enable TCP
keepalives; some stacks (HP-UX 11) will enable TCP keepalives
automagically when the connection becomes "detached" (the app calls
close).

> I've read articles where clients send RST to the server (many disagree
> with doing this). I've also read a newsgroup message where somebody modified
> the Linux kernel to just bypass the TIME_WAIT state altogether (ACK!).

Both are kludges.

> Fortunately, it was for a custom application not meant for the Internet:
> http://groups.google.com/groups?hl=en&rnum=12&selm=_DSo5.3%241e.5564%40news.pacbell.net

> If my application is not detecting the end of the connection by
> the client, how do I recover from that?

By finding the app bug and fixing it. Somewhere, perhaps, it is
misinterpreting a read/recv return of zero.

rick jones
--
If you do not carry on, you let the bastards win.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...

David Schwartz

Sep 21, 2001, 4:17:13 PM

Rick Jones wrote:

> Um, if the server calls close, its end of the connection should not be
> in CLOSE_WAIT. If the server's end of the connection is in CLOSE_WAIT,
> it implies that the server has not yet called close. It could also
> mean that the close was "lost" I suppose.

To be precise, you don't mean "call close", you mean "call close on the
last file descriptor referencing that end of the connection". The
difference could be *very* important if you 'fork' or 'dup'.
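
A small sketch of that distinction. It uses an AF_UNIX socketpair() as a stand-in for a TCP connection, on the assumption that descriptor reference counting works the same way for both; peer_has_closed and dup_close_demo are hypothetical names. The peer sees end-of-stream only after the last descriptor referring to the other end has been closed.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer of fd has closed its end (recv would return 0),
 * 0 if the connection is still open, -1 on error.
 * Does not block and does not consume any data. */
int peer_has_closed(int fd)
{
    assert(fd >= 0);

    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, 0);
    if (r == -1)
        return -1;
    if (r == 0)
        return 0;               /* nothing readable, so no end-of-stream */

    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    if (n == -1)
        return -1;
    return n == 0;
}

/* Returns 0 if the peer saw end-of-stream only after the LAST descriptor
 * for the other end was closed, -1 otherwise. */
int dup_close_demo(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int copy = dup(sv[0]);      /* second reference to the same endpoint */
    if (copy == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[0]);                               /* one reference remains */
    int after_first = peer_has_closed(sv[1]);
    close(copy);                                /* last reference is gone */
    int after_last = peer_has_closed(sv[1]);
    close(sv[1]);

    return (after_first == 0 && after_last == 1) ? 0 : -1;
}
```

The same thing happens after fork(): if the parent keeps its copy of an accepted socket open, the child's close() alone does not shut the connection down.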

DS

Christopher

Sep 21, 2001, 5:26:12 PM

"Rick Jones" <f...@bar.baz.invalid> wrote in message
news:9og0sr$bsh$3...@web1.cup.hp.com...

> In comp.protocols.tcp-ip Christopher <cwe...@adtel.com> wrote:
>
> If connections are arriving faster than the server can handle them, a
> number of things would happen:
>
> *) The server app would exhaust its available file descriptors.
>
> *) the listen queue would fill and connection requests (SYN segments)
> would start to be dropped.

Would a full listen queue cause a thread to freeze? It seems that after a
while, the connection threads freeze and the server keeps on taking new
connections (and they 'freeze' too) until the server aborts on "accept(): No
buffer space available". So, I never get to the point of running out of
file descriptors.

> > If my application is not detecting the end of the connection by
> > the client, how do I recover from that?
>
> Finding the app bug and fixing it. Somewhere, perhaps it is
> misinterpreting a read/recv return of zero.

Following is the source code for the thread:

/* Begin source code */
void set_nonblocking( int sock ) {
    int flags = fcntl( sock, F_GETFL );
    if( flags == -1 ) {
        perror( "fcntl() F_GETFL" );
    } else {
        if( fcntl( sock, F_SETFL, flags | O_NONBLOCK ) != 0 ) {
            perror( "fcntl() F_SETFL" );
        }
    }
    return;
}

void* my_thread( void* arg ) {
    int sock = *((int*)arg);
    int length = 0;
    char* message_in;
    char* message_out;
    int flag = 1;
    fd_set fds;

    message_in = malloc( BUFSIZ );
    message_out = malloc( BUFSIZ );
    strncpy( message_out, "Success\n\0", 9);

    set_nonblocking( sock );

    FD_ZERO( &fds );
    FD_SET( sock, &fds );

    if( select( sock+1, &fds, NULL, NULL, NULL ) <= 0 ) {
        perror("select()");
    } else {
        length = recv( sock, message_in,BUFSIZ, 0 );
        if( length <= 0 )
            perror("recv()");
        else {
            message_in[length] = '\0';
            length = send( sock, message_out, BUFSIZ, 0 );
            if( length <= 0 )
                perror("send()");
        }
    }

    FD_CLR( sock, &fds );
    if( close(sock) == -1 ) {
        fprintf(stderr,"\n");
        perror("close()");
    }

    free( message_in );
    free( message_out );

    pthread_exit(NULL);
    return NULL;
}
/* End source code */

> CLOSE_WAIT is dealt with by having the application call
> close. FIN_WAIT_2 is dealt with by having the application enable TCP
> keepalives, some stacks (HP-UX 11) will enable TCP keepalives
> automagically when the connection becomes "detached" (the app calls
> close).

I tried setting SO_KEEPALIVE to no avail, as it seems that the thread
execution has stopped somewhere along the line. I set the socket
non-blocking... Do the sockets revert back to blocking when overloaded?

Thanks for the help!

-
wedman

Rick Jones

Sep 21, 2001, 7:28:52 PM

In comp.protocols.tcp-ip Christopher <cwe...@adtel.com> wrote:
> Would a full listen queue cause a thread to freeze?

I would not think so. A full listen queue is dealt with entirely
(iirc) within the sockets/TCP/IP code.

> It seems that after a while, the connection threads freeze and the

If the connection threads are thinking that something is still going
to happen on a connection - say one that is in CLOSE_WAIT - resulting
from some bug in the app, they could eventually all be sitting there
waiting for something on a CLOSE_WAIT connection. A system call trace
might be in order.

> server keeps on taking new connections (and they 'freeze' too) until
> the server aborts on "accept(): No buffer space available". So, I
> never get to the point of running out of file descriptors.

Hmm, I suppose that if the stack on which this is running has a
maximum number of TCP connections or somesuch, or if the system does
run out of RAM, you might get an ENOBUFS.

However, my understanding is that a well-written app should not abort upon
receipt of an ENOBUFS errno from an accept() call. ENOBUFS is (always?)
a transient error that does not mean that the listen socket is toast.

Specifically, I know that under HP-UX 11, accept() can return
ENOBUFS when the remote client has terminated the connection before the
app has gotten around to calling accept(). The accept for "that"
connection then returns ENOBUFS. (It was an existing, non-fatal
error return, so it was used instead of another that was not
already returned by accept - precisely to try to avoid having
applications erroneously abort...)

>> > If my application is not detecting the end of the connection by
>> > the client, how do I recover from that?
>>
>> Finding the app bug and fixing it. Somewhere, perhaps it is
>> misinterpreting a read/recv return of zero.

> void* my_thread( void* arg ) {

> int sock = *((int*)arg);
> int length = 0;
> char* message_in;
> char* message_out;
> int flag = 1;
> fd_set fds;

> message_in = malloc( BUFSIZ );
> message_out = malloc( BUFSIZ );
> strncpy( message_out, "Success\n\0", 9);

> set_nonblocking( sock );

> FD_ZERO( &fds );
> FD_SET( sock, &fds );

> if( select( sock+1, &fds, NULL, NULL, NULL ) <= 0 ) {
> perror("select()");

Given that you pass in no timeout for the select, if the remote were
to simply disappear off the face of the net, your thread would sit
here forever (unless you also set SO_KEEPALIVE). Or, if the remote
stayed there but simply never sent you anything, you would stay there
forever as well.

I think that sockets for just-disconnected connections usually are
considered "readable" but it is entirely possible that a stack might
only notify you if you have passed-in exception FD's. This could be
the source of your CLOSE_WAITs.

> } else {
> length = recv( sock, message_in,BUFSIZ, 0 );
> if( length <= 0 )
> perror("recv()");

A return of zero bytes on recv() is not really an error. Errno will
not be set, so the perror() output would be rather uninteresting. It
is simply a signal that the remote has said it will send no more
data. Also, how are you handling requests that do not arrive in a
single TCP segment? There is no guarantee that requests arriving from
remote clients will not trickle-in one byte at a time...
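
One way to cope with requests that trickle in is to keep calling recv() until a complete request has arrived. The sketch below assumes a blocking socket and a newline-terminated request, which is an assumption about the protocol and not something stated in the thread; recv_line is a hypothetical name. It treats a return of 0 as end-of-stream and always leaves room for the terminating NUL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read until a '\n' has arrived, the buffer is full, or the peer closes.
 * Stores a NUL-terminated string in buf (cap must be at least 2).
 * Returns the number of bytes stored, not counting the NUL; 0 means the
 * peer closed before sending anything; -1 means a recv() error.
 * Bytes that arrive after the newline in the same recv() are kept too. */
ssize_t recv_line(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap >= 2);

    size_t used = 0;
    while (used < cap - 1) {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n > 0) {
            int saw_newline = memchr(buf + used, '\n', (size_t)n) != NULL;
            used += (size_t)n;
            if (saw_newline)
                break;
        } else if (n == 0) {
            break;              /* orderly shutdown by the peer, not an error */
        } else if (errno != EINTR) {
            return -1;
        }
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```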

> else {
> message_in[length] = '\0';
> length = send( sock, message_out, BUFSIZ, 0 );
> if( length <= 0 )
> perror("send()");

If you ever try to do a non-blocking send of > socket buffer size, or
several smaller ones in a row, send might not take all the data you send
in the one call when you are marked non-blocking.... Only BUFSIZ being
rather small is keeping you from hitting that bug in your code.
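
The usual remedy is a loop that keeps calling send() until everything has been accepted. A sketch, with send_all as a hypothetical name; it waits with poll() when a non-blocking socket reports EAGAIN, and it also works on a blocking socket.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes on fd, which may be blocking or non-blocking.
 * Returns 0 once everything has been handed to the kernel, -1 on error. */
int send_all(int fd, const char *data, size_t len)
{
    assert(data != NULL);

    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n >= 0) {
            sent += (size_t)n;          /* possibly only part of the data */
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Socket buffer is full: wait until there is room again. */
            struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;
    }
    return 0;
}
```

For the reply in the posted thread function, a call such as send_all(sock, "Success\n", 8) would send just the eight bytes of the message, where the posted code passes BUFSIZ as the length.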

Don't take this the wrong way, but if you do not already have a copy
of W. Richard Stevens' UNIX Network Programming you should probably get
and read it.

> I tried setting SO_KEEPALIVE to no avail, as it seems that the

The default time before TCP starts sending keepalive probes is two
hours, and even then if it does terminate the connection, there is
still the question of whether or not your TCP/IP/sockets stack
requires you to register interest in "exception" events to get back
out of select().
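
A sketch of enabling keepalives and, where the platform allows it, shortening the idle time. TCP_KEEPIDLE is available on Linux; other systems may use a different option name or none at all, which is why it is guarded with #ifdef. enable_keepalive is a hypothetical helper name.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive probes on a TCP socket. Where TCP_KEEPIDLE exists,
 * also set the idle time before the first probe to idle_seconds.
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_seconds)
{
    assert(idle_seconds > 0);

    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;

#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_seconds, sizeof idle_seconds) == -1)
        return -1;
#else
    (void)idle_seconds;         /* no per-socket idle setting available */
#endif
    return 0;
}
```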

> thread execution has stopped somewhere along the line. I set the
> socket non-blocking... Does the sockets revert back to blocking when
> overloaded?

They should not. Also, given the way your code looks above, I see no
point in having marked the socket non-blocking in the first place.

Christopher

Sep 24, 2001, 12:14:28 PM

"Rick Jones" <f...@bar.baz.invalid> wrote in message

news:9ogifk$gsm$3...@web1.cup.hp.com...

> Hmm, I suppose that if the stack on which this is running has a
> maximum number of TCP connections or somesuch, or if the system does
> run-out of RAM you might get an ENOBUFS.
>
> However, my understanding is a well-written app should not abort upon
> receipt of an ENOBUFS errno from an accept() call. ENOBUFS is (always?)
> a transient error that does not mean that the listen socket is toast.

Well, don't worry. This isn't something that'll appear on freshmeat
anytime soon. :)

> > if( select( sock+1, &fds, NULL, NULL, NULL ) <= 0 ) {
> > perror("select()");
>
> Given that you pass-in no timeout for the select, if the remote were
> to simply disappear off the face of the net, your thread would sit
> here forever (unless you also set SO_KEEPALIVE). Or, if the remote
> stayed there but simply never sent you anything you would stay there
> for ever as well.

I had put in a timeout value before, but it didn't affect the problems I
had.

> I think that sockets for just-disconnected connections usually are
> considered "readable" but it is entirely possible that a stack might
> only notify you if you have passed-in exception FD's. This could be
> the source of your CLOSE_WAITs.

> > length = recv( sock, message_in,BUFSIZ, 0 );

> > if( length <= 0 )
> > perror("recv()");
>
> A return of zero bytes on recv() is not really an error. Errno will
> not be set, so the perror() output would be rather uninteresting. It
> is simply a signal that the remote has said it will send no more
> data. Also, how are you handling requests that do not arrive in a
> single TCP segment? There is no guarantee that requests arriving from
> remote clients will not trickle-in one byte at a time...

I did that just to see if I would get any message there regardless (which
I didn't).

> > else {
> > message_in[length] = '\0';
> > length = send( sock, message_out, BUFSIZ, 0 );
> > if( length <= 0 )
> > perror("send()");
>
> If you ever try to do a non-blocking send of > socket buffer size, or
> several smaller ones in a row, send might not take all the data you send
> in the one call when you are marked non-blocking.... Only BUFSIZ being
> rather small is keeping you from hitting that bug in your code.
>
> Don't take this the wrong way, but if you do not already have a copy
> of W. Richard Stevens' UNIX Network Programming you should probably get
> and read it.

Well, this is my first crack at sockets programming, so yeah, I should.
At the same time, I'm glad that you and others reading the newsgroups have
taken time to help me out. You're not going to offend me. Anyway, I think
I'm well aware of the consequences of posting my own source code to a
newsgroup. :)

Even if you were trying to offend me: I've got thick skin. :P

> > thread execution has stopped somewhere along the line. I set the
> > socket non-blocking... Does the sockets revert back to blocking when
> > overloaded?
>
> They should not. Also, given the way your code looks above, I see no
> point in having marked the socket non-blocking in the first place.

The code is for my own learning experience. I read somewhere else on the
'net that not setting the socket non-blocking will cause the socket to
block other sockets in the program (but I haven't tried it yet). But now
I'm aware of the other potential problem you mentioned above.

All in all, your (plural) replies and a weekend _away_ from the computer
have caused me to take a fresh look at this and try some more. Thanks a
bunch - I'll take a look at that book.

-
wedman