
Socket performance: stalls


pmi...@my-deja.com

Nov 17, 1999

I'm using Solaris 2.6 and gcc 2.95.1, trying to optimize a program
that acts as a network server. I have a program that can run locally
or across a network to simulate numbers of clients and transactions.
By changing thread and I/O strategies in the server, I can see
differences in throughput and responsiveness...almost.

My problem is that from time-to-time socket connections get "lost".
The client program makes a valid connection, but the subsequent read
takes forever (actually, times out after 15 seconds). The frequency
of this happening varies: sometimes it goes for 30 seconds with no
stalls and sometimes it happens multiple times in a row. For this
client/server combination, 30 seconds is 1000..1500 connections.

The other oddity is that the problem is only happening on a 4 CPU
system. Using the same code, also on Solaris 2.6, running on a
single processor workstation, I've never seen the symptom. I've
also never seen it under normal loads, only during stress testing.

I understand that the problem is most likely in the application code,
but I thought it might be worthwhile to ask whether the description
sounds like a familiar problem to anyone.

Thanks in advance for any ideas or solutions.


David Schwartz

Nov 19, 1999

Odds are you have one of two problems:

1) You didn't follow the POSIX memory visibility rules and played funky
games with 'volatile' or 'lockless' schemes.

2) You didn't set your sockets non-blocking and simply assumed that
because the socket was ready for write/read/accept when select/poll
returned, it must be ready now.
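
In practice, fixing (2) means something like this -- an untested sketch,
helper names are mine; the point is just O_NONBLOCK plus treating EAGAIN
as "go back to select":

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Even after select() reports readable, be prepared for EAGAIN. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;   /* not ready after all; go back to select() */
    return n;        /* > 0 data, 0 EOF, -1 real error */
}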

DS

pmi...@my-deja.com

Nov 19, 1999

Although I sincerely appreciate your reply, I'm not sure I understand
either suggestion well enough to do anything about them.

In article <38350EEB...@webmaster.com>,
David Schwartz <dav...@webmaster.com> wrote:
>
> Odds are you have one of two problems:
>
> 1) You didn't follow the POSIX memory visibility rules and played funky
> games with 'volatile' or 'lockless' schemes.

If this refers to pthread issues, I suspect it's not the problem. That
opinion is based on the fact that the symptom survives major changes
in thread creation and work allocation strategies. If you have an
example of the kind of "funkiness" that would cause this situation,
I'd be able to say better whether I've engaged in it. :-)

> 2) You didn't set your sockets non-blocking and simply assumed that
> because the socket was ready for write/read/accept when select/poll
> returned, it must be ready now.

It's true that the sockets returned from accept() on the server are
not non-blocking. Also, they are handled by select() and, if not an
error and not a timeout, then recv(). But since only one thread talks
to a particular socket, I don't get how the above is a workable
strategy on a single processor and fails on a multi-processor system.
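
To be concrete, each connection is handled roughly like this (a
much-simplified sketch of the real code; buffer size and names are
invented):

#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-connection thread body; 'sock' came from accept() and is left
   in its default blocking mode. */
static void handle_connection(int sock)
{
    char buf[4096];

    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        tv.tv_sec = 15;          /* the 15-second timeout mentioned above */
        tv.tv_usec = 0;

        rc = select(sock + 1, &rfds, NULL, NULL, &tv);
        if (rc <= 0)
            break;               /* error, or timeout: the "stall" case */

        n = recv(sock, buf, sizeof buf, 0);  /* blocking socket: can hang */
        if (n <= 0)
            break;               /* error or EOF */

        /* ... process the request and send the reply ... */
    }
    close(sock);
}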

I'd be happy to receive any pointers to a better way of handling this.

> DS

Andrew Gierth

Nov 19, 1999

>>>>> "pmills" == pmills <pmi...@my-deja.com> writes:

>> 2) You didn't set your sockets non-blocking and simply assumed
>> that because the socket was ready for write/read/accept when
>> select/poll returned, it must be ready now.

pmills> It's true that the sockets returned from accept() on the
pmills> server are not non-blocking. Also, they are handled by
pmills> select() and, if not an error and not a timeout, then recv().
pmills> But since only one thread talks to a particular socket, I
pmills> don't get how the above is a workable strategy on a single
pmills> processor and fails on a multi-processor system.

I've noticed, also using Solaris 2.6, a significant problem with
select() (or poll) returning read ready for a socket, and a subsequent
read() or recv() call blocking indefinitely. This didn't even involve
threads... just a straightforward process-per-connection server spawned
from inetd.

Also, on other servers (also non-threaded) running on Solaris that
_do_ set non-blocking mode, I often see errors logged when EAGAIN was
unexpectedly returned from a read() call on a socket that had already
selected as ready...

These are clearly OS bugs, though I've not checked to see if any
patches are available. (In my current project I've simply abandoned
poll() entirely, instead using blocking read() calls with timeouts
handled via pthread_kill.)
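
(Roughly, that scheme looks like the sketch below -- invented names, the
choice of SIGUSR1 is arbitrary, and real code must also cancel the
watchdog when a read completes in time:)

#include <signal.h>
#include <pthread.h>
#include <unistd.h>

static void wakeup(int sig)
{
    (void)sig;            /* no-op; exists only to interrupt read() */
}

/* Install once at startup. sa_flags deliberately omits SA_RESTART, so
   a read() blocked in the signalled thread fails with -1/EINTR instead
   of being transparently restarted. */
static void install_wakeup_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);
}

struct timeout_arg {
    pthread_t reader;     /* thread blocked in read() */
    unsigned seconds;     /* how long to let it block */
};

/* Watchdog thread: after the timeout, poke the reader out of read(). */
static void *watchdog(void *p)
{
    struct timeout_arg *a = p;

    sleep(a->seconds);
    pthread_kill(a->reader, SIGUSR1);   /* reader's read() gets EINTR */
    return NULL;
}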

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Message has been deleted

David Schwartz

Nov 20, 1999

Andrew Gierth wrote:
>
> I've noticed, also using Solaris 2.6, a significant problem with
> select() (or poll) returning read ready for a socket, and a subsequent
> read() or recv() call blocking indefinitely. This didn't even involve
> threads... just a straightforward process-per-connection server spawned
> from inetd.

This is only a problem for broken code. It is incorrect to assume that
because a socket was ready for write at one point, it will be again at a
future point. If you are using 'select', that indicates that you don't
want your socket operations to block, and you should tell the kernel
this by making your sockets non-blocking.

It is not possible to use accept or poll to assure that socket
operations never block. And to attempt to do so is an error.

> Also, on other servers (also non-threaded) running on Solaris that
> _do_ set non-blocking mode, I often see errors logged when EAGAIN was
> unexpectedly returned from a read() call on a socket that had already
> selected as ready...

That's normal. It was ready, now it's not. Things change.

> These are clearly OS bugs, though I've not checked to see if any
> patches are available. (In my current project I've simply abandoned
> poll() entirely, instead using blocking read() calls with timeouts
> handled via pthread_kill.)

It is not at all clear that they are OS bugs. There is no rule that
requires that a socket that was once ready for I/O remain so at some
future time. The OS has every right to do things that change this.

DS

Message has been deleted

Andrew Gierth

Nov 21, 1999

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> This is only a problem for broken code. It is
David> incorrect to assume that because a socket was ready for write
David> at one point,

Read, not write.

The only way a socket should select as ready for reading is if either:

- data is available
- the connection is closed
- an error condition has occurred

None of these conditions are temporary; they all persist at least
until the next read or write call on the socket. This is especially
true of TCP, where the only possible error conditions involve the
connection being aborted.

David> it will be again at a future point. If you are using 'select',
David> that indicates that you don't want your socket operations to
David> block,

nonsense

Using select() indicates nothing more than that you wish to determine
the current state of your sockets.

David> It is not possible to use accept or poll to assure
David> that sockets operations never block. And to attempt to do so
David> is an error.

This is an error _in some broken implementations_.

I don't see anything in the standards that permits this behaviour.

Indeed, I can't find any indication that this was ever an issue prior to
1995, when it was reported as a bug against Solaris 2.4. INN, which is
probably one of the most widely used servers to rely heavily on select
and nonblocking sockets, still contains this comment:

#ifdef POLL_BUG
    /* return of -2 indicates EAGAIN, for SUNOS5.4 poll() bug workaround */
    if (errno == EAGAIN) {
        return -2;
    }
#endif

POLL_BUG seems not to be defined in the code anymore, even on Solaris,
which is probably why my INN logs are full of 'cant read: Resource
temporarily unavailable' errors.

>> Also, on other servers (also non-threaded) running on Solaris that
>> _do_ set non-blocking mode, I often see errors logged when EAGAIN was
>> unexpectedly returned from a read() call on a socket that had already
>> selected as ready...

David> That's normal. It was ready, now it's not. Things change.

How can a socket legitimately go from ready to non-ready for reading
without having had a read call performed on it?

>> These are clearly OS bugs, though I've not checked to see if any
>> patches are available. (In my current project I've simply abandoned
>> poll() entirely, instead using blocking read() calls with timeouts
>> handled via pthread_kill.)

David> It is not at all clear that they are OS bugs. There is
David> no rule that requires that a socket that was once ready for
David> I/O remain so at some future time. The OS has every right to
David> do things that change this.

and your authority for this statement is....?

David Schwartz

Nov 22, 1999

All that is required of select is that the operation be possible to
complete _without_blocking_ at the time that 'select' returns.

To give a hypothetical, suppose that some data was ready for reading,
but then the operating system detected a condition that assured it that
the connection was going to break. At the moment, however, the operating
system cannot determine whether the connection is going to close
normally or abort, or, if it will abort, which error code is the
correct one to return to the user.

In this case, you tell me what 'read' should return.

DS

Andrew Gierth

Nov 22, 1999

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> All that is required of select is that the operation
David> be possible to complete _without_blocking_ at the time that
David> 'select' returns.

Yes. _Exactly_. This is the bug that occurs with Solaris - sometimes
it will return from select() even though a subsequent read _will_
block (or return EAGAIN if the socket is in nonblocking mode).

And the fact that it is well-known and frequently worked-around does
not make it any less of a bug.

David> To give a hypothetical, suppose that some data was
David> ready for reading, but then the operating system detected a
David> condition that assured it that the connection was going to
David> break. At the moment, however, the operating system cannot
David> determine whether the connection is going to close normally
David> or abort, or, if it will abort, which error code is the
David> correct one to return to the user.

David> In this case, you tell me what 'read' should return.

It should return the data that is currently available. If the
connection is going to close normally then it would have to do so in
any case, and since the connection has not yet aborted it is correct
to return the data received so far.

If there were no data, or the rules of the protocol in question
precluded it from delivering the data to the application, then it
should not have marked the socket ready until the true state was
known.

Joe Halpin

Nov 22, 1999

David Schwartz <dav...@webmaster.com> writes:

> Andrew Gierth wrote:
> >
> > I've noticed, also using Solaris 2.6, a significant problem with
> > select() (or poll) returning read ready for a socket, and a subsequent
> > read() or recv() call blocking indefinitely. This didn't even involve
> > threads... just a straightforward process-per-connection server spawned
> > from inetd.
>
> This is only a problem for broken code. It is incorrect to assume that
> because a socket was ready for write at one point, it will be again at a
> future point. If you are using 'select', that indicates that you don't
> want your socket operations to block, and you should tell the kernel
> this by making your sockets non-blocking.

If select() says a socket is ready for reading, then by definition it
will not block if you call read() on it. I rarely put sockets in
non-blocking mode because there's no need to when using select().

> It is not possible to use accept or poll to assure that socket
> operations never block. And to attempt to do so is an error.

If you meant to say "select or poll", then the error is in your
understanding. If they say a socket is ready to read, then it won't
block when you call read on it. If it does, then the implementation is
in error, not the attempt to do so.

> > Also, on other servers (also non-threaded) running on Solaris that
> > _do_ set non-blocking mode, I often see errors logged when EAGAIN was
> > unexpectedly returned from a read() call on a socket that had already
> > selected as ready...
>

> That's normal. It was ready, now it's not. Things change.

No, they don't. If there's data available to be read, it's not going
to go away without reading it. See the TCP specs.

> It is not at all clear that they are OS bugs. There is no rule that
> requires that a socket that was once ready for I/O remain so at some
> future time. The OS has every right to do things that change this.

You're talking about Windows now, right?

In Unix, the OS is required to do what it's supposed to do. It may be
normal for Windows to throw data away, I wouldn't know, but TCP
doesn't allow data to just go away like this. A conforming TCP
implementation, once it has data available for a connection, will keep
that data there until the application calls read(), or the connection
is closed somehow.

The OS has no right to change that.

Joe

David Schwartz

Nov 23, 1999

Joe Halpin wrote:
> If select() says a socket is ready for reading, then by definition it
> will not block if you call read() on it. I rarely put sockets in
> non-blocking mode because there's no need to when using select().

It would not block had you called read on it the instant select said it
was safe to read on. The operating system makes no future guarantees.

> > It is not possible to use accept or poll to assure that socket
> > operations never block. And to attempt to do so is an error.
>
> If you meant to say "select or poll", then the error is in your
> understanding. If they say a socket is ready to read, then it won't
> block when you call read on it. If it does, then the implementation is
> in error, not the attempt to do so.

Umm, no. It wouldn't have blocked had you called read on it at that
particular point in time. Any future attempt is open to question,
since things can change.

> > That's normal. It was ready, now it's not. Things change.
>
> No, they don't. If there's data available to be read, it's not going
> to go away without reading it. See the TCP specs.

Select can report a socket ready for read for reasons other than data
being available. It can report readiness if there's an error, for
example. At a later point, it may not be clear what the correct error
code is to return.

There is no requirement that a return from select indicating readiness
for any particular operation assures such readiness in the future. Using
select on blocking sockets will bite you eventually, on some platform.

> > It is not at all clear that they are OS bugs. There is no rule that
> > requires that a socket that was once ready for I/O remain so at some
> > future time. The OS has every right to do things that change this.
>
> You're talking about Windows now, right?
>
> In Unix, the OS is required to do what it's supposed to do. It may be
> normal for Windows to throw data away, I wouldn't know, but TCP
> doesn't allow data to just go away like this. A conforming TCP
> implementation, once it has data available for a connection, will keep
> that data there until the application calls read(), or the connection
> is closed somehow.
>
> The OS has no right to change that.

There is no guarantee that data is available for reading. It's entirely
possible that the implementation detected an error condition. It's later
possible that the operating system cannot immediately tell you what the
error condition is exactly -- perhaps it cannot allocate memory
necessary to do so at the instant.

DS

Colin Walters

Nov 23, 1999

David Schwartz <dav...@webmaster.com> writes:

> There is no guarantee that data is available for reading. It's entirely
> possible that the implementation detected an error condition.

On my system (Debian GNU/Linux), select() will return a set of
filehandles which are ready for reading, writing, and also a set of
file descriptors on which errors occured.

--
Colin Walters <lev...@verbum.org>
http://web.verbum.org/levanti
(1024D/C207843A) A580 5AA1 0887 2032 7EFB 19F4 9776 6282 C207 843A

David Schwartz

Nov 23, 1999

Colin Walters wrote:
>
> David Schwartz <dav...@webmaster.com> writes:
>
> > There is no guarantee that data is available for reading. It's entirely
> > possible that the implementation detected an error condition.
>
> On my system (Debian GNU/Linux), select() will return a set of
> filehandles which are ready for reading, writing, and also a set of
> file descriptors on which errors occurred.

You are confused. The set of filehandles ready for reading includes
those on which an error has occurred. The third set is for 'exceptional'
conditions.
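
To illustrate the distinction (a made-up fragment, not from any real
program):

#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <stdio.h>
#include <unistd.h>

/* A pending error makes the socket readable; the third ('except') set
   reports exceptional conditions such as TCP out-of-band data, not
   ordinary errors. */
static void wait_then_read(int sock)
{
    char buf[512];
    fd_set rfds, efds;

    FD_ZERO(&rfds);
    FD_ZERO(&efds);
    FD_SET(sock, &rfds);     /* data, EOF, and pending errors land here */
    FD_SET(sock, &efds);     /* OOB data etc. */

    if (select(sock + 1, &rfds, NULL, &efds, NULL) > 0 &&
        FD_ISSET(sock, &rfds)) {
        ssize_t n = read(sock, buf, sizeof buf);
        if (n == -1)
            perror("read");  /* e.g. ECONNRESET, reported via readfds */
    }
}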

DS

David Schwartz

Nov 23, 1999

Andrew Gierth wrote:

> You still haven't given an authority for this.

I don't have to. You are the one asserting that the interface makes a
guarantee. I'm asserting that there is no such guarantee. I have no
obligation to prove a negative (and it's impossible -- how can I prove
that no such guarantee exists?). If you wish to maintain that such a
guarantee is made, you should present it.

DS

Andrew Gierth

Nov 24, 1999

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> It would not block had you called read on it the
David> instant select said it was safe to read on. The operating
David> system makes no future guarantees.

You still haven't given an authority for this.

I maintain that there are no cases in which a socket can legitimately
go from the state "read will not block" to "read will block" without
intervention from the application.

David> There is no requirement that a return from select
David> indicating readiness for any particular operation assures such
David> readiness in the future. Using select on blocking sockets will
David> bite you eventually, on some platform.

It will bite you, yes, but because there exist buggy implementations
such as Solaris, not because this is the intended behaviour of
select()/poll().

Andrew Gierth

Nov 24, 1999

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

> Andrew Gierth wrote:
>> You still haven't given an authority for this.

David> I don't have to. You are the one asserting that the
David> interface makes a guarantee. I'm asserting that there is no
David> such guarantee. I have no obligation to prove a negative (and
David> it's impossible -- how can I prove that no such guarantee
David> exists?). If you wish to maintain that such a guarantee is
David> made, you should present it.

From the description of poll() in both the Single Unix Spec and the
Solaris manpages:

    POLLIN    Data other than high-priority data may be read
              without blocking.

There are also a good number of examples in vol.1 of Stevens' UNP in
which select is used on blocking sockets. Likewise the example of INN,
which prior to the Solaris 2.4 bug referred to earlier did not behave
gracefully when read calls returned EAGAIN after a successful select.
Furthermore there is the fact that _no_ manpage or formal description
(that I am aware of) of the select and poll functions suggests that
they can return spurious "ready" indications (contrast this with, for
example, pthread_cond_wait which specifically allows for such).

Your move.

sti...@my-deja.com

Nov 24, 1999

In article <87aeo4w...@erlenstar.demon.co.uk>,
Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:

<snip>

> There are also a good number of examples in vol.1 of Stevens' UNP in
> which select is used on blocking sockets.

The details from Stevens (paraphrased):

(Note his use of "will" -- future tense and quite definite)

A socket is ready for reading (according to select) in these four cases:

1. There is data in the socket receive buffer greater than the socket
   low-water mark. A read *will* not block and will return that data.
2. The read-half of the connection is closed. A read *will* not
   block and will return zero.
3. A listening socket has a pending connection. Accept should
   not block, but under certain odd timing conditions (which Stevens
   details) it could block. A non-blocking socket is called for in
   this one case.
4. An error is pending, in which case a read *will* not block and will
   return a value of -1.

So, Stevens clearly supports the use of blocking sockets with
select, except in the case of a listening socket.
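
For case 3, the shape is something like this (untested sketch; it
assumes O_NONBLOCK was already set on the listening socket with fcntl):

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/* Only the *listening* socket is non-blocking, so a connection that
   vanishes between select() and accept() yields a retryable error
   instead of blocking the server. */
static int accept_ready(int listenfd)
{
    int conn = accept(listenfd, NULL, NULL);

    if (conn == -1 &&
        (errno == EWOULDBLOCK || errno == EAGAIN ||
         errno == ECONNABORTED || errno == EINTR))
        return -1;           /* harmless; just go back to select() */

    return conn;             /* a connection, or -1 with a real errno */
}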

Regarding the prior example of what happens if an error occurs
after the select has returned with data, it is simple: the read
will return the data in the buffer just fine. Then, when the select
loop is entered again, the socket will be marked ready again, and
this time the read will return the error condition.

--
Jeffrey L. Straszheim
sti...@my-dejanews.com

Andrew Gierth

Nov 24, 1999

>>>>> "stimuli" == stimuli <sti...@my-deja.com> writes:

stimuli> So, Stevens clearly supports the use of blocking sockets
stimuli> with select, except in the case of a listening socket.

One of my small contributions to the book :-)

(though Rich cited it as a private communication, it was actually a
courtesy email copy of my article <uflod0m...@zen.microlise.co.uk>
posted to this very newsgroup :-)

Joe Halpin

Nov 24, 1999

David Schwartz <dav...@webmaster.com> writes:

> Joe Halpin wrote:

> > If select() says a socket is ready for reading, then by definition
> > it will not block if you call read() on it. I rarely put sockets
> > in non-blocking mode because there's no need to when using
> > select().
>

> It would not block had you called read on it the instant
> select said it was safe to read on. The operating system makes no
> future guarantees.

Really? How long is an instant? Where is this documented?

TCP is a well defined protocol, which is to say that a conforming
implementation is guaranteed to act as specified in the RFCs (that's
what 'conforming' means). If the operating system implements TCP
correctly, it won't act as you say. If data is available in TCP's
buffers, it will stay there until the application reads it, or the
connection is closed.

The read system call is well defined in the man pages. They specify
how it will act under the relevant conditions, which is to say that
the OS which implements read() guarantees how it will work. Same for
select.

Where do any of these specifications say that data can just go away?
Where is the timeout specified? Where are you getting this?

> There is no guarantee that data is available for reading. It's
> entirely possible that the implementation detected an error
> condition. It's later possible that the operating system cannot
> immediately tell you what the error condition is exactly -- perhaps
> it cannot allocate memory necessary to do so at the instant.

And your basis for this is? Why does the system need to allocate
memory to return an error code? Socket errors are maintained in a
structure that exists for the life of the connection, so why does
memory need to be allocated? Can you describe a scenario in which the
system could detect an error, return -1 from read(), and not be able
to set errno to tell you what happened?

If read is called on a socket descriptor that has an error condition,
it will return immediately with -1, and set errno to indicate the
error. It won't block.

If the connection is closed, read will return 0 immediately; it won't
block.

Joe

David Schwartz

Nov 24, 1999

Andrew Gierth wrote:
>
> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:
>
> > Andrew Gierth wrote:
> >> You still haven't given an authority for this.
>
> David> I don't have to. You are the one asserting that the
> David> interface makes a guarantee. I'm asserting that there is no
> David> such guarantee. I have no obligation to prove a negative (and
> David> it's impossible -- how can I prove that no such guarantee
> David> exists?). If you wish to maintain that such a guarantee is
> David> made, you should present it.
>
> From the description of poll() in both the Single Unix Spec and the
> Solaris manpages:
>
> POLLIN Data other than high-priority data may be read without
> blocking.

Yes, but _when_?

> There are also a good number of examples in vol.1 of Stevens' UNP in
> which select is used on blocking sockets.

So? You can use select on blocking sockets -- it can just block.

> Likewise the example of INN,
> which prior to the Solaris 2.4 bug referred to earlier did not behave
> gracefully when read calls returned EAGAIN after a successful select.
> Furthermore there is the fact that _no_ manpage or formal description
> (that I am aware of) of the select and poll functions suggests that
> they can return spurious "ready" indications (contrast this with, for
> example, pthread_cond_wait which specifically allows for such).

They are not spurious. At the time the indication was made, the
operation was possible.

These functions do not, and cannot, come with future guarantees.

DS

David Schwartz

Nov 24, 1999

Joe Halpin wrote:
>
> David Schwartz <dav...@webmaster.com> writes:
>
> > Joe Halpin wrote:
>
> > > If select() says a socket is ready for reading, then by definition
> > > it will not block if you call read() on it. I rarely put sockets
> > > in non-blocking mode because there's no need to when using
> > > select().
> >
> > It would not block had you called read on it the instant
> > select said it was safe to read on. The operating system makes no
> > future guarantees.
>
> Really? How long is an instant? Where is this documented?

An instant is as long as it remains possible for the operating system
to complete the operation without blocking.

> TCP is a well defined protocol, which is to say that a conforming
> implementation is guaranteed to act as specified in the RFCs (that's
> what 'conforming' means). If the operating system implements TCP
> correctly, it won't act as you say. If data is available in TCP's
> buffers, it will stay there until the application reads it, or the
> connection is closed.

It will stay in the buffers. That doesn't mean a read will complete
without blocking. Suppose the operating system has swapped out the
buffers. (I realize that no current operating system does this, but the
TCP specification hardly prohibits it.)

> The read system call is well defined in the man pages. They specify
> how it will act under the relevant conditions, which is to say that
> the OS which implements read() guarantees how it will work. Same for
> select.

Yes, the call will not block if it was set non-blocking, and may block
otherwise -- for as long as it takes the operating system to complete
the operation. The sequence of steps required to complete the operation
could be complex, and could block for any number of reasons.

If you can find a guarantee that an operation that could complete
without blocking at time X must complete without blocking at time X+Y,
that would be another story.

> Where do any of these specifications say that data can just go away?
> Where is the timeout specified? Where are you getting this?

I'm not saying data has gone away. I'm saying that the operating system
cannot complete the operation without blocking at time X+Y, even though
it could have at time X. Perhaps it no longer has enough memory to
allocate a buffer it wants to copy the data into. Perhaps the connection
structure is locked in some way. Who knows.

Absent an actual guarantee, if you are trying to make portable code,
you should not assume you have one.

> > There is no guarantee that data is available for reading. It's
> > entirely possible that the implementation detected an error
> > condition. It's later possible that the operating system cannot
> > immediately tell you what the error condition is exactly -- perhaps
> > it cannot allocate memory necessary to do so at the instant.
>
> And your basis for this is? Why does the system need to allocate
> memory to return an error code? Socket errors are maintained in a
> structure that exists for the life of the connection, so why does
> memory need to be allocated?

Where in the specification is this guaranteed?

> Can you describe a scenario in which the
> system could detect an error, return -1 from read(), and not be able
> to set errno to tell you what happened?

It doesn't matter whether I can describe one or not. There is no
guarantee that no such scenario exists. Defensive code, portable code,
well-written code, should be coded to cope with any scenario that is not
guaranteed not to occur. The guarantee in question here does not exist.
There is no guarantee that the operation will continue to be possible
without blocking.

> If read is called on a socket descriptor that has an error condition,
> it will return immediately with -1, and set errno to indicate the
> error. It won't block.
>
> If the connection is closed, read will return 0 immediately; it won't
> block.

Just because the operating system knows that the connection is going to
close/abort doesn't mean it knows which error code to return. And just
because the operating system at one time intended to return one
particular error code doesn't mean it has to return that same error code
later. Perhaps the socket structure is locked in some way, and the
operating system can't access it at that second.

It is not good enough to say that no current platform does this.
Portable code that follows standards should not be written based upon
how current implementations of the standard happen to behave.

DS

Andrew Gierth

Nov 25, 1999

>>>>> "Russ" == Russ Allbery <r...@stanford.edu> writes:

>> POLL_BUG seems not to be defined in the code anymore, even on
>> Solaris, which is probably why my INN logs are full of 'cant read:
>> Resource temporarily unavailable' errors.

Russ> It used to be set in config.data and got lost with the autoconf
Russ> rewrite; it'll be turned on by default as of the next major
Russ> release.

Personally, I'd prefer it if Sun fixed the damn bug instead....

Andrew Gierth

Nov 25, 1999

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> It will stay in the buffers. That doesn't mean a read
David> will complete without blocking. Suppose the operating system
David> has swapped out the buffers. (I realize that no current
David> operating system does this, but the TCP specification hardly
David> prohibits it.)

In which case read() waits for the buffers to be swapped in again.
You're interpreting "blocking" too strongly. For example, read() or
write() on a nonblocking socket will still wait for pager i/o if the
user's buffer happens to be paged out.

While I don't know of any systems that page out actual network data,
there do exist systems that can page out the file table entry
referring to the socket (though not the socket structure itself).
Likewise, this does not affect the visible behaviour.

David> It doesn't matter whether I can describe one or
David> not. There is no guarantee that no such scenario
David> exists. Defensive code, portable code, well-written code,
David> should be coded to cope with any scenario that is not
David> guaranteed not to occur.

The trouble with that approach is that even scenarios which are
guaranteed not to occur do in fact occur in practice. If you look at
the source for any widely-used networking application, you'll find it
littered with workarounds for this or that O/S bug.

There are Solaris systems in which the sigwait() function, for
example, simply does not work at all. Does that mean that you
summarily abandon the use of it in all your code, rather than
installing the necessary patch to _make_ it work?

David> Just because the operating system knows that the
David> connection is going to close/abort doesn't mean it knows which
David> error code to return.

In which case it should not have marked the socket as readable in the
first place.

Besides, real-life protocols do not behave that way - if an error
occurs, it is unambiguous. This is especially true of TCP.

David> And just because the operating system at one time intended to
David> return one particular error code doesn't mean it has to return
David> that same error code later. Perhaps the socket structure is
David> locked in some way, and the operating system can't access it
David> at that second.

See comments above about paging; same considerations apply.

David> It is not good enough to say that no current platform
David> does this.

Who said that? My first post to this thread specifically said that
Solaris 2.6 (and probably every version since 2.4) _does_ return ready
on select/poll and then block forever (or return EAGAIN if in
non-blocking mode) in some circumstances; and that I consider this to
be a bug in Solaris as the behaviour is not justified by either
standards or (more importantly) documentation. Indeed Sun appear to
have acknowledged it as a bug themselves, according to the 1995 report
I found on Sunsolve; it's perhaps disappointing that it has apparently
reappeared, but this is hardly unprecedented.

David> Portable code that follows standards should not be written
David> based upon how current implementations of the standard happen
David> to behave.

Code that runs on real systems doing real work had better take account
of how those systems behave....

(As an aside, one could argue that one disadvantage of the open-source
approach to development (for non-open-source O/Ss) is that O/S bugs
are more likely to be worked around rather than actually fixed,
leading to the accumulation, in some cases, of a ludicrous amount of
historical cruft that can easily become a maintenance nightmare.)

Joe Halpin

Nov 25, 1999

David Schwartz <dav...@webmaster.com> writes:

> Joe Halpin wrote:
> >
> > David Schwartz <dav...@webmaster.com> writes:

> > TCP is a well defined protocol, which is to say that a conforming
> > implementation is guaranteed to act as specified in the RFCs (that's
> > what 'conforming' means). If the operating system implements TCP
> > correctly, it won't act as you say. If data is available in TCP's
> > buffers, it will stay there until the application reads it, or the
> > connection is closed.
>

> It will stay in the buffers. That doesn't mean a read will complete
> without blocking. Suppose the operating system has swapped out the
> buffers. (I realize that no current operating system does this, but the
> TCP specification hardly prohibits it.)

Is this what you mean by blocking? If so, then all processes always
block on all system calls, because they have to wait for the switch to
kernel mode to take place, and they have to wait for the kernel to do
anything else needed to execute the system call.

The only definition I've ever heard of for blocking is that the
process is put to sleep, and the processor is not currently executing
code on its behalf. This wouldn't be the case if the OS is pulling
data in from paging space as part of the system call. That's just part
of the overhead of read() (if it happens at all).

Blocking is when, for example, there is no data available, and the
process is put to sleep until some arrives. The processor goes off and
executes code for some other process then. CPU time isn't charged to a
blocked process, it is charged to a process when the CPU is being used
to process a system call on its behalf.

If all you mean is that things don't happen immediately, and system
calls don't always execute in a deterministic amount of time, then
fine.

> > If the connection is closed, read will return 0 immediately; it
> > won't block.
>

> Just because the operating system knows that the connection is
> going to close/abort doesn't mean it knows which error code to
> return. And just

What does this mean? How would the operating system know that the
connection is *going* to close or abort? It can only know that it has
done one or the other. The fact that it knows one or the other means
that it knows what to return.

> because the operating system at one time intended to return one
> particular error code doesn't mean it has to return that same error
> code later. Perhaps the socket structure is locked in some way, and
> the operating system can't access it at that second.

You lost me here. Returning an error code is something that happens
when the OS is done with an operation. The error is returned and
that's the end of it. If another error occurs later, so what? When the
call returns, it's over.

If the system call terminates because the OS has found something
wrong, the OS returns an error code to describe why it terminated the
call. It doesn't keep going, looking for something else to happen. I
don't think I understand what you're saying here.

Joe

Andrew Gierth

Nov 25, 1999

>>>>> "Joe" == Joe Halpin <jha...@nortelnetworks.com.nospam> writes:

Joe> The only definition I've ever heard of for blocking is that the
Joe> process is put to sleep, and the processor is not currently
Joe> executing code on its behalf. This wouldn't be the case if the
Joe> OS is pulling data in from paging space as part of the system
Joe> call.

Actually it would. If pages need to be pulled in from disk, then the
process will sleep until the disk I/O is complete.

That's not the point though. "blocking" as applied to sockets doesn't
really mean the same thing as "sleeping" in the kernel sense, what it
means is: the operation will not wait for external or slow events.
Short-term sleeps like paging or internal kernel locks don't count.

Joe> Blocking is when, for example, there is no data available, and
Joe> the process is put to sleep until some arrives. The processor
Joe> goes off an executes code for some other process then. CPU time
Joe> isn't charged to a blocked process, it is charged to a process
Joe> when the CPU is being used to process a system call on its
Joe> behalf.

When a process is waiting for pages to be read in from disk, then the
CPU will indeed be off doing something else.

Joe Halpin

Nov 25, 1999

Andrew Gierth <and...@erlenstar.demon.co.uk> writes:

> >>>>> "Joe" == Joe Halpin <jha...@nortelnetworks.com.nospam> writes:
>
> Joe> The only definition I've ever heard of for blocking is that the
> Joe> process is put to sleep, and the processor is not currently
> Joe> executing code on its behalf. This wouldn't be the case if the
> Joe> OS is pulling data in from paging space as part of the system
> Joe> call.
>
> Actually it would. If pages need to be pulled in from disk, then the
> process will sleep until the disk I/O is complete.
>
> That's not the point though. "blocking" as applied to sockets doesn't
> really mean the same thing as "sleeping" in the kernel sense, what it
> means is: the operation will not wait for external or slow events.
> Short-term sleeps like paging or internal kernel locks don't count.

OK, I should have been more careful about how I said it.

If a socket is in non-blocking mode, and select says it's ready to
read because data is indeed available to be read, then read() will not
return until the data has been copied into the application buffer,
which may include a period of being put to sleep waiting for disk
I/O. This is no different than the case of a blocking socket when data
is available to be read.

So I'm claiming that if there's no difference between blocking and
non-blocking sockets in this respect, then blocking isn't what's
happening (or shouldn't be).

The difference in my mind is that I can't help having to put up with
system overhead. I can avoid lengthy periods of being blocked by
using select() to determine when descriptors will block and when they
won't (modulo the problems you mentioned with Solaris, et al).

Joe

Message has been deleted

David Schwartz

Nov 26, 1999

Joe Halpin wrote:
> > It will stay in the buffers. That doesn't mean a read will complete
> > without blocking. Suppose the operating system has swapped out the
> > buffers. (I realize that no current operating system does this, but the
> > TCP specification hardly prohibits it.)
>
> Is this what you mean by blocking? If so, then all processes always
> block on all system calls, because they have to wait for the switch to
> kernel mode to take place, and they have to wait for the kernel to do
> anything else needed to execute the system call.

I give up. Do you understand that it does not matter what any
particular platform does? Do you get that?

> The only definition I've ever heard of for blocking is that the
> process is put to sleep, and the processor is not currently executing
> code on its behalf. This wouldn't be the case if the OS is pulling
> data in from paging space as part of the system call. That's just part
> of the overhead of read() (if it happens at all).
>
> Blocking is when, for example, there is no data available, and the
> process is put to sleep until some arrives. The processor goes off an
> executes code for some other process then. CPU time isn't charged to a
> blocked process, it is charged to a process when the CPU is being used
> to process a system call on its behalf.
>
> If all you mean is that things don't happen immediately, and system
> calls don't always execute in a deterministic amount of time, then
> fine.

No, I mean that there is no guarantee that the operating system will
not need to block to complete the call. And I mean that the operating
system might be 'nice' to processes that request non-blocking behavior
by not blocking when it's possible not to do so.

> > > If the connection is closed, read will return 0 immediately; it
> > > won't block.
> >
> > Just because the operating system knows that the connection is
> > going to close/abort doesn't mean it knows which error code to
> > return. And just
>
> What does this mean? How would the operating system know that the
> connection is *going* to close or abort?

Perhaps a send times out, but it doesn't yet know whether the other
side will acknowledge its attempt to close or not. (Remember, it doesn't
matter here what actually _can_ happen, only what is not guaranteed not
to happen.)

> It can only know that it has
> done one or the other. The fact that it knows one or the other means
> that it knows what to return.

Where is that guaranteed?

> > because the operating system at one time intended to return one
> > particular error code doesn't mean it has to return that same error
> > code later. Perhaps the socket structure is locked in some way, and
> > the operating system can't access it at that second.
>
> You lost me here. Returning an error code is something that happens
> when the OS is done with an operation. The error is returned and
> that's the end of it. If another error occurs later, so what? When the
> call returns, it's over.

Not at all. ICMP messages, for example, might be treated as merely
advisory. And they might convince the kernel to return a 'network
unreachable' instead of a 'send timed out'.

> If the system call terminates because the OS has found something
> wrong, the OS returns an error code to describe why it terminated the
> call. It doesn't keep going, looking for something else to happen. I
> don't think I understand what you're saying here.

Where is that guaranteed? Which standard?

Remember, you are the one advocating _RELYING_ on this behavior for
your program to work.

DS

Joe Halpin

Nov 26, 1999

David Schwartz <dav...@webmaster.com> writes:

> Joe Halpin wrote:

> > > It will stay in the buffers. That doesn't mean a read will
> > > complete without blocking. Suppose the operating system has
> > > swapped out the buffers. (I realize that no current operating
> > > system does this, but the TCP specification hardly prohibits
> > > it.)
> >
> > Is this what you mean by blocking? If so, then all processes
> > always block on all system calls, because they have to wait for
> > the switch to kernel mode to take place, and they have to wait for
> > the kernel to do anything else needed to execute the system call.
>
> I give up. Do you understand that it does not matter what any
> particular platform does? Do you get that?

It seems to me that what a particular platform does is the only thing
that would matter for you. You have to know what a system actually
does, and why, before you can write code for it. Otherwise you have to
write code that takes an unspecified number of unspecified
possibilities into account. You would have to allow for everything
that isn't guaranteed not to happen, which is almost anything.

Look, you're a sharp guy, and I respect your expertise, I don't even
deny that I agree with you in theory. I'm not disputing that we need
to write robust code, that can handle system differences, but in real
life we expect standard behavior from systems that claim to conform to
standards, and deal with exceptional cases as needed. I take it that's
why we test things, and why we require conformance to standards from
our system vendors.

Joe

David Schwartz

Nov 26, 1999

Joe Halpin wrote:
>
> David Schwartz <dav...@webmaster.com> writes:
>
> > Joe Halpin wrote:
>
> > > > It will stay in the buffers. That doesn't mean a read will
> > > > complete without blocking. Suppose the operating system has
> > > > swapped out the buffers. (I realize that no current operating
> > > > system does this, but the TCP specification hardly prohibits
> > > > it.)
> > >
> > > Is this what you mean by blocking? If so, then all processes
> > > always block on all system calls, because they have to wait for
> > > the switch to kernel mode to take place, and they have to wait for
> > > the kernel to do anything else needed to execute the system call.
> >
> > I give up. Do you understand that it does not matter what any
> > particular platform does? Do you get that?
>
> It seems to me that what a particular platform does is the only thing
> that would matter for you. You have to know what a system actually
> does, and why, before you can write code for it.

No. This is what standards are for. You should be able to write code
that will work on future platforms with little to no adjustment required.

> Otherwise you have to
> write code that takes an unspecified number of unspecified
> possibilities into account.

Exactly, and that's precisely what you should do. You should assume
that anything that isn't certain not to happen can happen. You should
code for it, and handle it smoothly.

> You would have to allow for everything
> that isn't guaranteed not to happen, which is almost anything.

Exactly, everywhere possible, you should. You should try to prepare
your code to accept anything. You should not assume that the system is
static and that what's true at time X is true at time X+Y. It's foolish
to do so.

> Look, you're a sharp guy, and I respect your expertise, I don't even
> deny that I agree with you in theory. I'm not disputing that we need
> to write robust code, that can handle system differences, but in real
> life we expect standard behavior from systems that claim to conform to
> standards, and deal with exceptional cases as needed. I take it that's
> why we test things, and why we require conformance to standards from
> our system vendors.

Right. But the standard does not require the operation to continue to
be possible without blocking. And in many legitimate cases (especially
with 'accept/listen' but also with reading and writing) it's not easy
for the operating system to give this assurance.

If you don't want your sockets to block, you should make them
non-blocking. Otherwise, you cannot expect the operating system to know
what it is you want.

In any event, it usually doesn't matter, because most cases where you
use 'select', you wind up repeating the read/write operation until it
blocks (or you are 'done'), so you need to make the sockets non-blocking
anyway.
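
That is, the usual shape is something like this (sketch, invented names):

#include <errno.h>
#include <unistd.h>

/* Drain a non-blocking socket after select() reports it readable: keep
   reading until EAGAIN says there is nothing more right now. */
static void drain(int sock)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(sock, buf, sizeof buf);

        if (n > 0)
            continue;        /* ... consume n bytes, then read again ... */
        if (n == 0)
            break;           /* EOF */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;           /* drained; back to select() */
        if (errno == EINTR)
            continue;        /* interrupted; retry */
        break;               /* real error */
    }
}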

DS
