Is socket thread-safe?

alex

unread,

Nov 3, 2002, 8:35:00 PM11/3/02

to

If a socket is shared between two threads, is it thread if both said threads
send() data via such socket? How about recv()? While one thread gets part
of data in socket buffer sometimes and at other time the other thread will
consume part of the data stream?

Thanks in advance!

Eugene Mayevski

unread,

Nov 4, 2002, 1:46:19 AM11/4/02

to

Sockets are not thread-safe in Windows and if you ever try to do
anything with the socket from another thread, you will get mysterious
errors in Winsock later.

Sincerely yours,
Eugene Mayevski

Sujeet Varakhedi

unread,

Nov 4, 2002, 2:51:21 AM11/4/02

to

Hi! Eugene,
If u say that sockets are not thread safe in windows,the how are they thrad
safe in UNIX or Linux....just wanted to know

Sujeet
"Eugene Mayevski" <maye...@eldos.org> wrote in message
news:aq554s$2ssb$1...@news.dg.net.ua...

those who know me have no need of my name

unread,

Nov 4, 2002, 5:25:30 AM11/4/02

to

in comp.protocols.tcp-ip i read:

>If u say that sockets are not thread safe in windows,the how are they thrad
>safe in UNIX or Linux....just wanted to know

they aren't thread-safe anywhere. you need to serialize access yourself.

--
bringing you boring signatures for 17 years

Eugene Mayevski

unread,

Nov 4, 2002, 9:22:13 AM11/4/02

to

Sujeet Varakhedi wrote:

> If u say that sockets are not thread safe in windows,the how are they thrad
> safe in UNIX or Linux....just wanted to know

I think this is platform- and implementation-specific. On Windows we
tried to close the socket from other thread and this caused large
problems. So I know it's not thread-safe :)

Sincerely yours,
Eugene Mayevski

Phil Frisbie, Jr.

unread,

Nov 4, 2002, 12:42:02 PM11/4/02

to

That sounds more like bad program architecture :)

It is common to close a socket on one thread when there is another thread
blocking on a socket call. When the socket is closed, the blocking call will
unblock and set the error ENOTSOCK. Your thread should recognize that as a
signal to clean up and exit.

Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com

Phil Frisbie, Jr.

unread,

Nov 4, 2002, 12:45:21 PM11/4/02

to

It makes no sense to use multiple threads on the same TCP socket sending or
receiving. UDP maybe, but never TCP unless you are using some sort of message
boundaries and mutexes to block the other threads.

However, it IS thread safe, and common, to use one thread to send and another to
receive.

David Schwartz

unread,

Nov 4, 2002, 1:02:17 PM11/4/02

to

They are guaranteed thread-safe in Winsock2, provided you do things
that make sense. For UDP, you can perfectly legally send and receive
from any combination of threads. For TCP, you can send in one thread
while you receive in another.

DS

Casper H.S. Dik

unread,

Nov 4, 2002, 3:25:02 PM11/4/02

to

those who know me have no need of my name <not-a-rea...@usa.net> writes:

>in comp.protocols.tcp-ip i read:

>>If u say that sockets are not thread safe in windows,the how are they thrad
>>safe in UNIX or Linux....just wanted to know

>they aren't thread-safe anywhere. you need to serialize access yourself.

They're thread-safe in Solaris.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

those who know me have no need of my name

unread,

Nov 4, 2002, 4:04:05 PM11/4/02

to

in comp.protocols.tcp-ip i read:

>those who know me have no need of my name <not-a-rea...@usa.net>
>writes:
>>in comp.protocols.tcp-ip i read:

>>>If u say that sockets are not thread safe in windows,the how are they
>>>thrad safe in UNIX or Linux....just wanted to know
>
>>they aren't thread-safe anywhere. you need to serialize access yourself.
>
>They're thread-safe in Solaris.

by themselves they are safe on every platform i know of, but the
programmers invariably do something stupid that transforms them
into uselessness.

Eugene Mayevski

unread,

Nov 4, 2002, 4:18:24 PM11/4/02

to

Phil Frisbie, Jr. wrote:

>>I think this is platform- and implementation-specific. On Windows we
>>tried to close the socket from other thread and this caused large
>>problems. So I know it's not thread-safe :)
> That sounds more like bad program architecture :)
> It is common to close a socket on one thread when there is another thread

Ok, how would one explain that Winsock crashes when the socket is closed
in another thread from where it was created? There is no "program
architecture" at all. Just one socket transferring the data and another
which closes the socket.

I am not trying to defned my point of wview, but rather trying to find
out the truth. We had terible Aaccess violations and other strange
things after doing what I've described.

Sincerely yours,
Eugene Mayevski

David Schwartz

unread,

Nov 4, 2002, 4:29:16 PM11/4/02

to

Eugene Mayevski wrote:

> Ok, how would one explain that Winsock crashes when the socket is closed
> in another thread from where it was created? There is no "program
> architecture" at all. Just one socket transferring the data and another
> which closes the socket.

1) Were you using Winsock2?

2) Are you sure no other thread could possibly have been using that
socket when you closed it?

If your answer to both questions is "yes", you are seeing something
nobody else has reported. If your answer to either question is "no", why
not?!

DS

Phil Frisbie, Jr.

unread,

Nov 4, 2002, 7:38:12 PM11/4/02

to

Eugene Mayevski wrote:
>
> Phil Frisbie, Jr. wrote:
>
> >>I think this is platform- and implementation-specific. On Windows we
> >>tried to close the socket from other thread and this caused large
> >>problems. So I know it's not thread-safe :)
> > That sounds more like bad program architecture :)
> > It is common to close a socket on one thread when there is another thread
>
> Ok, how would one explain that Winsock crashes when the socket is closed
> in another thread from where it was created?

What was the thread doing? Blocking on a Winsock call or something else?

> There is no "program
> architecture" at all.

ALL programs have an architecture; it is simply how you coded it.

> Just one socket transferring the data and another
> which closes the socket.
>
> I am not trying to defned my point of wview, but rather trying to find
> out the truth. We had terible Aaccess violations and other strange
> things after doing what I've described.

Can you code up a single source code sample that shows the problem? Feel free to
post it here or email it directly to me to look at.

> Sincerely yours,
> Eugene Mayevski

alex

unread,

Nov 5, 2002, 1:38:00 AM11/5/02

to

David Schwartz <dav...@webmaster.com> wrote in message
news:3DC6B629...@webmaster.com...

If "They are guaranteed thread-safe in Winsock2", how about Winsock below 2?
Will Winsock1 be thread-safe?

In Winsock2, will send and receive in two thread be a problem?

Thanks for your help!

Alun Jones

unread,

Nov 5, 2002, 6:44:42 AM11/5/02

to

In article <3dc6d79e$0$46603$e4fe...@news.xs4all.nl>, Casper H.S. Dik

<Caspe...@Sun.COM> wrote:
>those who know me have no need of my name <not-a-rea...@usa.net> writes:
>>in comp.protocols.tcp-ip i read:
>
>>>If u say that sockets are not thread safe in windows,the how are they thrad
>>>safe in UNIX or Linux....just wanted to know
>
>>they aren't thread-safe anywhere. you need to serialize access yourself.
>
>They're thread-safe in Solaris.

"thread-safe" doesn't necessarily mean anything, and at the very least it
depends on what you're talking about.

There isn't a "thread-safe" implementation of sockets available on _any_
platform if what you're talking about is two threads sending TCP data on one
socket over the same brief time period. And, whether there was a proscription
against it or not, no sockets implementation is going to give you trouble if
you are using one thread to send and one thread to receive, otherwise that
implementation would be unusable for many applications.

So, yes, sockets are going to be thread safe for sane operations. No, sockets
can't make up for a lack of synchronisation in your own code, if the protocol
requires it (as TCP does). And those two statements will apply to every
sockets implementation - Windows, Unix, Linux, whatever. [Windows 3.1,
obviously, since it doesn't have threads, doesn't have to be thread safe]

Alun.
~~~~

[Please don't email posters, if a Usenet response is appropriate.]
--
Texas Imperial Software | Try WFTPD, the Windows FTP Server. Find us at
1602 Harvest Moon Place | http://www.wftpd.com or email al...@texis.com
Cedar Park TX 78613-1419 | VISA/MC accepted. NT-based sites, be sure to
Fax/Voice +1(512)258-9858 | read details of WFTPD Pro for XP/2000/NT.

Alun Jones

unread,

Nov 5, 2002, 6:44:43 AM11/5/02

to

In article <aq6o8f$1cq6$1...@news.dg.net.ua>, Eugene Mayevski

Look elsewhere for the causes of your access violations. What you have
described is a programming method that is used so often in Windows that if it
were as broken as you described, your newsreader wouldn't work long enough for
you to post your question.

When developing on Windows, it's tempting (and safe from ridicule) to blame
Windows for any and all crashes. It's important to remember, however, that
many people use Windows day in, day out, with no significant failures
whatever. If your application's crashing on you, and you're the developer of
that application, it's more than likely to be your fault.

Eugene Mayevski

unread,

Nov 5, 2002, 7:22:55 AM11/5/02

to

David Schwartz wrote:

> 2) Are you sure no other thread could possibly have been using that
> socket when you closed it?

Of course no. The socket is in blocking mode and is busy doing
something. I close the socket from other thread to cancel blocking call.

Sincerely yours,
Eugene Mayevski

Eugene Mayevski

unread,

Nov 5, 2002, 7:26:07 AM11/5/02

to

Alun Jones wrote:

>>Ok, how would one explain that Winsock crashes when the socket is closed
>>in another thread from where it was created? There is no "program

>>architecture" at all. Just one thread transferring the data and another
>>which closes the socket.

> Look elsewhere for the causes of your access violations. What you have
> described is a programming method that is used so often in Windows that if it
> were as broken as you described, your newsreader wouldn't work long enough for
> you to post your question.

This was a sample project which created a secondary thread. That thread
did extensive data transfer in blocking mode and the main thread closed
the socket. There is nothing there that can crash. Tested on WinXP home.

Sincerely yours,
Eugene Mayevski

Casper H.S. Dik

unread,

Nov 5, 2002, 10:41:31 AM11/5/02

to

al...@texis.com (Alun Jones) writes:

>There isn't a "thread-safe" implementation of sockets available on _any_
>platform if what you're talking about is two threads sending TCP data on one
>socket over the same brief time period.

Solaris sockets are thread-safe under that definition.

I.e., all the data gets out, it gets out in the order specified by
the oepration and no duplkicate data gets out nor do you get
OS crashes or application crashes. There's only a partial order of
data defined.

Whether these are useful semantics is unclear, but that's true for
many "thread safe" calls, including multiple writers to the same
fd.

Phil Frisbie, Jr.

unread,

Nov 5, 2002, 11:43:47 AM11/5/02

to

If you are so sure it is not your own coding error then why not either post the
code or take me up on my offer to look at at?

David Schwartz

unread,

Nov 5, 2002, 12:57:28 PM11/5/02

to

Eugene Mayevski wrote:

> This was a sample project which created a secondary thread. That thread
> did extensive data transfer in blocking mode and the main thread closed
> the socket. There is nothing there that can crash. Tested on WinXP home.

1) Were you using Winsock2?

2) When one thread closed the socket, are you absolutely positive that
no other thread could possibly have used that socket in any way shape or
form?

If the answer to either of these is "no", then that's your problem.

DS

David Schwartz

unread,

Nov 5, 2002, 1:28:22 PM11/5/02

to

Eugene Mayevski wrote:

> This was a sample project which created a secondary thread. That thread
> did extensive data transfer in blocking mode and the main thread closed
> the socket. There is nothing there that can crash. Tested on WinXP home.

Was the secondary thread still doing a data transfer when you closed
the socket from another thread? This can cause problems on *any*
platform.

Do not believe anyone who tells you that the blocking call is
guaranteeed to exit with some particular error indication. It is not.
Destroying a shared resource while it is in use is always an error.

DS

Casper H.S. Dik

unread,

Nov 5, 2002, 2:47:37 PM11/5/02

to

David Schwartz <dav...@webmaster.com> writes:

> Do not believe anyone who tells you that the blocking call is
>guaranteeed to exit with some particular error indication. It is not.
>Destroying a shared resource while it is in use is always an error.

The operating system should handle such a condition gracefully;
older versions of Solaris did not and you really don't want processes
hanging around in such cases waiting for fds that went away.

But for the apst couple of years, Solaris has handled this gracefully
and threads using file descriptors that are closed get notified,
usually using the EBADF error return for whatever they were doing.

Of course, this usually is a coding error, but some operating system
vendors believe that their OS should continue to run even if the
application progammer makes a mistake.

David Schwartz

unread,

Nov 5, 2002, 3:11:27 PM11/5/02

to

"Casper H.S. Dik" wrote:

> David Schwartz <dav...@webmaster.com> writes:

> > Do not believe anyone who tells you that the blocking call is
> >guaranteeed to exit with some particular error indication. It is not.
> >Destroying a shared resource while it is in use is always an error.

> The operating system should handle such a condition gracefully;

I don't believe that this is possible.

> older versions of Solaris did not and you really don't want processes
> hanging around in such cases waiting for fds that went away.

> But for the apst couple of years, Solaris has handled this gracefully
> and threads using file descriptors that are closed get notified,
> usually using the EBADF error return for whatever they were doing.

Really? Even if another thread gets the CPU, open another file
descriptor, and gets the same fd before the thread that was blocked gets
a chance to run?

> Of course, this usually is a coding error, but some operating system
> vendors believe that their OS should continue to run even if the
> application progammer makes a mistake.

I don't know of any OS where the OS itself actually crashes. However,
crashing the application is totally acceptible. IMO, this is a bug as
serious as one thread 'free'ing a chunk of memory while another thread
is using it.

DS

Casper H.S. Dik

unread,

Nov 5, 2002, 3:30:35 PM11/5/02

to

David Schwartz <dav...@webmaster.com> writes:

> Really? Even if another thread gets the CPU, open another file
>descriptor, and gets the same fd before the thread that was blocked gets
>a chance to run?

The close of that fd will not finish until the the fd is cleaned up.

Of course, threads that rememebr that the fd was really some other fd will
now have problems. But the application will continue onwards.

Eugene Mayevski

unread,

Nov 5, 2002, 3:40:23 PM11/5/02

to

Phil Frisbie, Jr. wrote:

> If you are so sure it is not your own coding error then why not either post the
> code or take me up on my offer to look at at?

1) It's Delphi code
2) It's a small part of the large product not yet released.

I am sure you can create such a test yourself -- connect a thread, a
socket in that thread and start sending data (lots of data). And another
thread will close the socket. After application is closed, an Access
Violation in the range of addresses that belong to Winsock, happens.

Sincerely yours,
Eugene Mayevski

Eugene Mayevski

unread,

Nov 5, 2002, 3:42:06 PM11/5/02

to

David Schwartz wrote:

And this is what I had :). There's an idea that a blocking call can be
cancelled from another thread by closing this socket. Now I know it's
not true :) (the idea was not mine, but written in FAQ for Delphi
library called Indy, which is now part of Delphi).

Sincerely yours,
Eugene Mayevski

Alun Jones

unread,

Nov 5, 2002, 3:49:50 PM11/5/02

to

In article <3dc82a6b$0$46605$e4fe...@news.xs4all.nl>, Casper H.S. Dik

<Caspe...@Sun.COM> wrote:
>David Schwartz <dav...@webmaster.com> writes:
>
>> Really? Even if another thread gets the CPU, open another file
>>descriptor, and gets the same fd before the thread that was blocked gets
>>a chance to run?
>
>The close of that fd will not finish until the the fd is cleaned up.

Non sequitur. David didn't say that the thread that was blocked was in a read
or write operation on the fd, therefore, the close will finish, the thread
will unblock, and will try to act on an fd that has been closed and then
reopened.

>Of course, threads that rememebr that the fd was really some other fd will
>now have problems. But the application will continue onwards.

.. and will now have one thread communicating merrily on the wrong socket /
file.

Because descriptors get re-used - and in most OS's, reused quickly - if more
than one thread gets to hear of a descriptor to the same object, and the
object isn't keeping a reference count of who has a descriptor to it, there's
always a possibility - a likelihood, even - that the descriptor will later
refer to an object that is not the right one.

Closing a socket in order to free up a potentially blocked socket operation
leads to code that leads to the above problem. The solution, if there is one,
is to ensure that no operation can possibly be in progress on a socket handle
at the time that you close it. You can do this by being single-threaded, or
you can do this by ensuring that your socket operations never block, or by
ensuring that every blocked operation is "timed out" at some point, and that
there is some signalling mechanism that prevents you from re-blocking on a
socket that's "about to close". The latter is significantately
complexificated.

Alun Jones

unread,

Nov 5, 2002, 3:49:48 PM11/5/02

to

In article <cHJx9.46662$Mb3.2...@bgtnsc04-news.ops.worldnet.att.net>,

"alex" <alex...@hotmail.com> wrote:
>If "They are guaranteed thread-safe in Winsock2", how about Winsock below 2?
>Will Winsock1 be thread-safe?
>
>In Winsock2, will send and receive in two thread be a problem?

Winsock 1 was specified before Windows had threads. Hence, it didn't consider
whether implementations should, or should not, be thread-safe.

As 32-bit threaded Windows came along, it was obvious that people would put
Winsock code into multi-threaded programs, but it was left up to the Winsock
implementation to choose whether to be thread-safe, or to put the onus back to
the application programmer. However, I'd be surprised if you were to find one
that didn't work for multi-threaded programs as you'd expect it to.

Alun Jones

unread,

Nov 5, 2002, 3:49:57 PM11/5/02

to

In article <aq9adc$1lu3$1...@news.dg.net.ua>, Eugene Mayevski

<maye...@eldos.org> wrote:
>Phil Frisbie, Jr. wrote:
>
>> If you are so sure it is not your own coding error then why not either post
> the
>> code or take me up on my offer to look at at?
>
>1) It's Delphi code

That's right, you're the only programmer here who understands Delphi. Oh,
sorry, I forgot the <sarcasm> tag.

>2) It's a small part of the large product not yet released.

As Phil says, either put up or go debug. If you can create a smallish sample
that demonstrates the problem, then the problem exists. Otherwise, it's with
your ancillary code.

>I am sure you can create such a test yourself -- connect a thread, a
>socket in that thread and start sending data (lots of data). And another
>thread will close the socket. After application is closed, an Access
>Violation in the range of addresses that belong to Winsock, happens.

As noted before, if this were the sum total of the flaw, then many other
applications than yours would be fragile beyond belief. Show us a sample and
we might believe you - or we might very well show you the error.

Either way, you win.

Right now, however, you're asking us to write a test for a behaviour that we
don't believe exists, when we've seen previous samples that would seem to
refute your suggestions. You won't persuade anyone to write code to
demonstrate something they believe is impossible!

Alun Jones

unread,

Nov 5, 2002, 3:49:54 PM11/5/02

to

In article <aq9agj$1lu3$2...@news.dg.net.ua>, Eugene Mayevski

<maye...@eldos.org> wrote:
>And this is what I had :). There's an idea that a blocking call can be
>cancelled from another thread by closing this socket. Now I know it's
>not true :) (the idea was not mine, but written in FAQ for Delphi
>library called Indy, which is now part of Delphi).

The idea of cancelling a blocking call by closing the socket is great for any
system where you can be sure that the socket's descriptor will not be
re-allocated before all references to the descriptor (in the application _and_
in the system) are dropped. Many client applications fit this profile (or can
be shoe-horned into it).

Blocking sockets may seem the obvious partner for multi-threaded code, but as
David has pointed out, there are some subtleties that can lead to migraines.

Alun Jones

unread,

Nov 5, 2002, 3:49:46 PM11/5/02

to

In article <3dc7e6ab$0$46602$e4fe...@news.xs4all.nl>, Casper H.S. Dik

<Caspe...@Sun.COM> wrote:
>al...@texis.com (Alun Jones) writes:
>
>>There isn't a "thread-safe" implementation of sockets available on _any_
>>platform if what you're talking about is two threads sending TCP data on one
>>socket over the same brief time period.
>
>Solaris sockets are thread-safe under that definition.
>
>I.e., all the data gets out, it gets out in the order specified by
>the oepration and no duplkicate data gets out nor do you get
>OS crashes or application crashes. There's only a partial order of
>data defined.

Let me get this straight:

If you have two threads, accessing the same TCP socket, and they both call
send(), the operating system will block the second thread that called send()
until it has queued up _all_ of the data from the first thread that called
send()?

>Whether these are useful semantics is unclear, but that's true for
>many "thread safe" calls, including multiple writers to the same
>fd.

It's thoroughly unclear that these are useful semantics, indeed, and one does
have to ask whether the question "are sockets thread-safe" wouldn't best be
answered with "if you need to know that, then your code is flawed".

Phil Frisbie, Jr.

unread,

Nov 5, 2002, 4:38:39 PM11/5/02

to

Eugene Mayevski wrote:
>
> Phil Frisbie, Jr. wrote:
>
> > If you are so sure it is not your own coding error then why not either post the
> > code or take me up on my offer to look at at?
>
> 1) It's Delphi code

Pascal was the first language I took in college, after I taught myself BASIC and
assembly in High School.

> 2) It's a small part of the large product not yet released.

A code sample should be fine.

> I am sure you can create such a test yourself -- connect a thread, a
> socket in that thread and start sending data (lots of data). And another
> thread will close the socket. After application is closed, an Access
> Violation in the range of addresses that belong to Winsock, happens.

Are you using Winsock API directly, or a wrapper?

> Sincerely yours,
> Eugene Mayevski

Phil Frisbie, Jr.

unread,

Nov 5, 2002, 4:51:11 PM11/5/02

to

David Schwartz wrote:
>
> Eugene Mayevski wrote:
>
> > This was a sample project which created a secondary thread. That thread
> > did extensive data transfer in blocking mode and the main thread closed
> > the socket. There is nothing there that can crash. Tested on WinXP home.
>
> Was the secondary thread still doing a data transfer when you closed
> the socket from another thread? This can cause problems on *any*
> platform.

Yes, that is part of the program architecture I tried to explain. That operation
is not normal, but closing the socket at other times on another thread is common
and trouble free.

> Do not believe anyone who tells you that the blocking call is
> guaranteeed to exit with some particular error indication. It is not.
> Destroying a shared resource while it is in use is always an error.

It depends on how you are using it (your program architecture).

For example, it is common to use multiple threads in a server application. You
might have one thread blocking for a few ms on select() waiting for received
data on multiple sockets, and another that sleeps for several seconds at a time
looking for stale connections to close. When that second thread closes a socket,
select() will unblock and return ENOTSOCK. There is no real error since you just
closed the socket and the handle is no longer valid.

Of course, there are many other ways to do this. You could simply flag the
socket to be closed and then close it later in your select() loop. Or you could
simply perform the stale connection check in that select() loop.

Casper H.S. Dik

unread,

Nov 6, 2002, 3:39:20 AM11/6/02

to

al...@texis.com (Alun Jones) writes:

>Let me get this straight:

>If you have two threads, accessing the same TCP socket, and they both call
>send(), the operating system will block the second thread that called send()
>until it has queued up _all_ of the data from the first thread that called
>send()?

That's not what I said; I said there's partial ordering defined.
(i.e., the data of the first wrte and second write are send in the proper
order, but no order is defined for the data of first and second write;
they can be interleaved.

For STREAMs, there's a guarantee that upto PIPE_MAX bytes are written
in one write.

>It's thoroughly unclear that these are useful semantics, indeed, and one does
>have to ask whether the question "are sockets thread-safe" wouldn't best be
>answered with "if you need to know that, then your code is flawed".

That might be one answer; but the idea of a monitor thread that closes
hanging connections has some appeal.

Fernando Gont

unread,

Nov 6, 2002, 5:56:20 AM11/6/02

to

On 06 Nov 2002 08:39:20 GMT, Casper H.S. Dik <Caspe...@Sun.COM>
wrote:

>>If you have two threads, accessing the same TCP socket, and they both call

>>send(), the operating system will block the second thread that called send()
>>until it has queued up _all_ of the data from the first thread that called
>>send()?
>That's not what I said; I said there's partial ordering defined.
>(i.e., the data of the first wrte and second write are send in the proper
>order, but no order is defined for the data of first and second write;
>they can be interleaved.

Sorry?

--
Fernando Gont
e-mail: fern...@ANTISPAM.gont.com.ar

[To send a personal reply, please remove the ANTISPAM tag]

Alun Jones

unread,

Nov 6, 2002, 7:23:54 PM11/6/02

to

In article <3dc8f1f...@News.CIS.DFN.DE>, arie...@softhome.net (Fernando

Gont) wrote:
>On 06 Nov 2002 08:39:20 GMT, Casper H.S. Dik <Caspe...@Sun.COM>
>wrote:
>
>>>If you have two threads, accessing the same TCP socket, and they both call
>>>send(), the operating system will block the second thread that called send()
>>>until it has queued up _all_ of the data from the first thread that called
>>>send()?
>>That's not what I said; I said there's partial ordering defined.
>>(i.e., the data of the first wrte and second write are send in the proper
>>order, but no order is defined for the data of first and second write;
>>they can be interleaved.
>
>Sorry?

I think he means that sockets is thread-safe unless you try to do something
that might cause them to be not so. :-)

Actually, he's illustrated exactly my point. It depends on what you mean by
"thread safe". If you're sending into one TCP socket on two threads, the data
may get interleaved - the socket stack is, therefore, not thread-safe. But
then, if you're going to do stupid stunts like that, then you're not looking
for any sane kind of thread-safety.

Notwithstanding Casper's obvious alliance to one stack, his definition of
"thread-safe" applies just as well to Windows as it does to his favoured
stack. It might not be the OP's definition of "thread-safe", though.

Alun Jones

unread,

Nov 6, 2002, 8:32:24 PM11/6/02

to

In article <3dc8d538$0$46608$e4fe...@news.xs4all.nl>, Casper H.S. Dik

<Caspe...@Sun.COM> wrote:
>al...@texis.com (Alun Jones) writes:
>
>>Let me get this straight:
>
>>If you have two threads, accessing the same TCP socket, and they both call
>>send(), the operating system will block the second thread that called send()
>>until it has queued up _all_ of the data from the first thread that called
>>send()?
>
>That's not what I said; I said there's partial ordering defined.

Actually, what you said was:

In article <3dc7e6ab$0$46602$e4fe...@news.xs4all.nl>, Casper H.S. Dik
<Caspe...@Sun.COM> wrote:
>al...@texis.com (Alun Jones) writes:
>
>>There isn't a "thread-safe" implementation of sockets available on _any_
>>platform if what you're talking about is two threads sending TCP data on one
>>socket over the same brief time period.
>
>Solaris sockets are thread-safe under that definition.

That sounds very much like what I posted. If you don't block, then data is
interleaved, and the sockets aren't "thread-safe".

>>It's thoroughly unclear that these are useful semantics, indeed, and one does
>>have to ask whether the question "are sockets thread-safe" wouldn't best be
>>answered with "if you need to know that, then your code is flawed".
>
>That might be one answer; but the idea of a monitor thread that closes
>hanging connections has some appeal.

There's plenty of things that have some appeal, but don't work.

Chris Pearson

unread,

Nov 7, 2002, 12:58:01 AM11/7/02

to

> Now I know it's not true :)

The closesocket doc says, "If this is the last reference to an underlying
socket, the associated naming information and queued data are discarded. Any
pending blocking, asynchronous calls issued by any thread in this process
are canceled without posting any notification messages."

The notions of reference counting, pending blocking calls and multiple
threads are all mentioned.

And how could you gracefully shutdown a (simplistic) server that used a
thread-per-blocking-socket architecture if you couldn't close the blocked
sockets from the SCM thread?

Run your program under a real debugger (one that can show the machine code,
registers, the call stack, and system symbols), wait for your crash, then
look at the stack -- your likely to find yourself staring at familiar code
:-)

-- CCP

"Eugene Mayevski" <maye...@eldos.org> wrote in message
news:aq9agj$1lu3$2...@news.dg.net.ua...

David Schwartz

unread,

Nov 7, 2002, 1:48:25 PM11/7/02

to

Chris Pearson wrote:

> The closesocket doc says, "If this is the last reference to an underlying
> socket, the associated naming information and queued data are discarded. Any
> pending blocking, asynchronous calls issued by any thread in this process
> are canceled without posting any notification messages."

It's not entirely clear what cancellation of a blocking socket call
entails.

> The notions of reference counting, pending blocking calls and multiple
> threads are all mentioned.

Yup.

> And how could you gracefully shutdown a (simplistic) server that used a
> thread-per-blocking-socket architecture if you couldn't close the blocked
> sockets from the SCM thread?

You can't. That you can is a myth. It's simply impossible. Consider a
thread doing this:

// some code
// a blocking socket call

There is no way another thread could possibly tell if the thread above
is in 'some code' or 'a blocking socket call'. There is no atomic 'set
flag and make blocking socket call' instruction.

So, if you close the socket in another thread just before the thread
above enters 'a blocking socket call', you cannot ensure that another
thread won't create a socket and get the same socket decriptor as the
thread above was about to perform a blocking socket call on. Now this
thread performs an operation on the *wrong* socket.

As I've said many times, it's an error to close a socket in one thread
while another thread is or might be using it. There is no way to do it
sanely.

> Run your program under a real debugger (one that can show the machine code,
> registers, the call stack, and system symbols), wait for your crash, then
> look at the stack -- your likely to find yourself staring at familiar code
> :-)

That's true.

DS

Vernon Schryver

unread,

Nov 7, 2002, 2:58:41 PM11/7/02

to

In article <3DCAB579...@webmaster.com>,
David Schwartz <dav...@webmaster.com> wrote:

>> The closesocket doc says, "If this is the last reference to an underlying
>> socket, the associated naming information and queued data are discarded. Any
>> pending blocking, asynchronous calls issued by any thread in this process
>> are canceled without posting any notification messages."
>
> It's not entirely clear what cancellation of a blocking socket call
>entails.

If you've ever looked at operating systems, it is entirely clear for
reasonable operating systems. Of necessity, the kernel or library code
that actually fiddles with a socket is single-threaded wherever the basic
common state of the socket is changed. You must protect the socket's
fundamental state from races. Whether the socket is "open" is its most
basic state. Part of that single-threading is almost certain to involve
every thread fiddling with a socket doing an equivalent to this dance:

socket_system_call()
{
if (validate_stuff() == failure)
return failure to caller;
loop: if (lock_socket() == failure
return failure to caller;
if (validate_more_stuff() == failure) {
unlock_socket();
return failure to caller;
}
case (try_to_do_stuff_like_connect_close_send_or_receive) {
failed: unlock_socket(); return failure to caller;
done: unlock_socket(); return success to caller;
partly_done:
unlock_socket();
block_on_something_or_other();
goto loop;
}
}

"Cancellation of a blocking socket call" is generally no more or less
than waking up any threads stuck in block_on_something(). If the
socket has disappeared out from under the thread, the normal validation
of the socket specifier (e.g. UNIX-style FD int or Winsock pointer)
will notice and return with some sort of error or other indication to
the application.

In other words, what do you suppose must happen when the entire
application is terminated by the operating system while some threads
are waiting for the socket? Somehow the system must shoot down all
of the threads, without leaving any orphan locks. The system's solution
to the problem of killing all threads in an application is likely to
imply a reasonable set of reasults for what must happen when one thread
closes a file descriptor or socket or otherwise trashes a resource in
use by another thread.

> ...

>> And how could you gracefully shutdown a (simplistic) server that used a
>> thread-per-blocking-socket architecture if you couldn't close the blocked
>> sockets from the SCM thread?
>
> You can't. That you can is a myth. It's simply impossible.

I think closing a socket out from under a blocked thread is nasty, but
"simply impossible" seems true only if you don't know how things work.

> Consider a
>thread doing this:
>
>// some code
>// a blocking socket call
>
> There is no way another thread could possibly tell if the thread above
>is in 'some code' or 'a blocking socket call'. There is no atomic 'set
>flag and make blocking socket call' instruction.

Actually, if you know how things work, you know that statement is usually
false. For example, if you can stop thread context switching
(e.g. pthread_setschedparam() in some POSIX threading), you might be
able to find other threads's stacks and see if they are in socket
system call wrappers. That's other than clean, but it is certainly
not "impossible," as reasonable debuggers demonstrate every time you
use them on threaded applications using blocking sockets.

> So, if you close the socket in another thread just before the thread
>above enters 'a blocking socket call', you cannot ensure that another
>thread won't create a socket and get the same socket decriptor as the
>thread above was about to perform a blocking socket call on. Now this
>thread performs an operation on the *wrong* socket.

That's true or false, depending on the the system. It's more likely
to be true than false, but there's no law of nature or programming
that mandates that socket specifiers (UNIX FDs or Winsock things) be
recycled. If FDs are not recycled, perhaps to minimize multi-CPU
locking in NUMA multi-processors, it's false. When it's true, it is
more likely to cause hard to find bugs elsewhere, such as when you
close resources hidden in libraries. Even when it is true, problems
due to such bugs are likely to be rare, because context switches are
unlikely to occur where they'll cause problems. A thread is more
likely to be blocked on a socket that is closed out from under it than
to be between uses of it.

> As I've said many times, it's an error to close a socket in one thread
>while another thread is or might be using it. There is no way to do it
>sanely.

> ...

If by "error" you mean "something any competent programmer spends a lot
of effort and significant CPU overhead to avoid," then you are right.
If you mean "impossible to make it work reliably," then you are wrong.

Vernon Schryver v...@rhyolite.com

Casper H.S. Dik

unread,

Nov 7, 2002, 4:02:12 PM11/7/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:

>That's true or false, depending on the the system. It's more likely
>to be true than false, but there's no law of nature or programming
>that mandates that socket specifiers (UNIX FDs or Winsock things) be
>recycled. If FDs are not recycled, perhaps to minimize multi-CPU
>locking in NUMA multi-processors, it's false. When it's true, it is
>more likely to cause hard to find bugs elsewhere, such as when you
>close resources hidden in libraries. Even when it is true, problems
>due to such bugs are likely to be rare, because context switches are
>unlikely to occur where they'll cause problems. A thread is more
>likely to be blocked on a socket that is closed out from under it than
>to be between uses of it.

I'm pretty sure some part of some standard requires the kernel to
return the lowest fd available; so yes, they have to be reused and
yues, they are reused quickly.

(As you can get at the lowest fd in O(log(n)) time, this really isn't
much of a problem)

Vernon Schryver

unread,

Nov 7, 2002, 4:13:49 PM11/7/02

to

In article <3dcad4d3$0$46599$e4fe...@news.xs4all.nl>,

Casper H.S. Dik <Caspe...@Sun.COM> wrote:

> ...

>I'm pretty sure some part of some standard requires the kernel to
>return the lowest fd available; so yes, they have to be reused and
>yues, they are reused quickly.

I'm interested in which standard says that.
I've seen plenty of code that assumes that the numerically smallest
available FD will be used on the next open() The major complications
guaranteeing it or even defining the notion of "the lowest FD available"
in a non-trivial multi-processor makes me reluctant to write such code
without a handy chapter and verse.

I found a reference in http://www.standardml.org/Basis/posix-io.html
for dup() yielding the "lowest one available" but no definitions
of either "lowest" or "available."
There are similar vague words in
http://www.opengroup.org/onlinepubs/007904975/functions/open.html

Does close() make an FD "available" after it has been returned by open()?
How soon is must that availability happen in a multi-process with weak
consistency? Contemplate races for "available" in an NUMA multi-processor.
It should be possible or at least desirable for a process with no open
files have open() to race on two CPUs, with one open() ultimately
returning -1 and other returning 1.

Vernon Schryver v...@rhyolite.com

James Carlson

unread,

Nov 8, 2002, 8:07:59 AM11/8/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:
> Does close() make an FD "available" after it has been returned by open()?

If you're into standards-based humor, consider the implications of
EINTR returned by close(2). If you get that error, is the fd still
open or has it been closed? The standards don't say. If it's still
open and you don't close it, you have an fd leak. If it's closed and
you try to reclose it, you can (and because of the 'lowest available'
logic, probably *will*) nuke the result of a simultaneous open by some
other thread.

You can't win. :-/

--
James Carlson, Solaris Networking <james.d...@east.sun.com>
SUN Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677

Eugene A. Zharkov

unread,

Nov 8, 2002, 10:05:04 AM11/8/02

to

David Schwartz <dav...@webmaster.com> wrote in message news:<3DCAB579...@webmaster.com>...

> As I've said many times, it's an error to close a socket in one thread
> while another thread is or might be using it. There is no way to do it
> sanely.

I completely agree with David.

But I have a question that has been bugging me for quite some
time and that is somewhat related to the original question.
The question is "why there is no call to abort a blocking
socket call ?". For example, is there a problem in requiring
"shutdown" in one thread to cancel blocking calls in another ?
I know that "shutdown" actually works on many platforms.
But that behavior does not seem to be documented, so I can
not be sure that it will keep working in future versions.
On Windows it does not work (at least it does not unblock
"select"). It looks like there was WSACancelBlockingCall
in Winsocks, which have been removed, I guess because it was
not possible to implement it correctly. But I think that
"shutdown" CAN be implemented. I.e., it should most likely
be a call after which normal operation on the socket is
no longer possible. That is just fine for many applications
that I worked on. How do people cancel blocking select calls ?
Using timeouts or an extra socket ? Or thread cancellation ?
All these methods seem overly complicated to me. So, back
to my question in the beginning of this message: why do you
think that there is no call to "abort" a blocking call in
another thread ? And if it is possible to implement one,
may be the "user" community can somehow encourage "operating
system developers" to implement/document such a call ?

Thanks,
Eugene

David Schwartz

unread,

Nov 8, 2002, 2:31:58 PM11/8/02

to

My news server lost a few posts, but I want to respond to this comment
by Vernon Shryver:

"Cancellation of a blocking socket call" is generally no more or less
than waking up any threads stuck in block_on_something(). If the
socket has disappeared out from under the thread, the normal validation
of the socket specifier (e.g. UNIX-style FD int or Winsock pointer)
will notice and return with some sort of error or other indication to
the application.

Your last sentence is horribly wrong and makes my point. The normal
validation of the socket specifier will fail if and only if the
identifier hasn't been reused by that point. Normally, there is no way
you could control that. So again, the actual behavior is undefined and
you could well wind up continuing the blocking call on the wrong
connection.

Please do not repeat the myth that you can close a socket in one thread
while it's in use in another thread and have any idea what's going to
happen.

DS

David Schwartz

unread,

Nov 8, 2002, 2:35:35 PM11/8/02

to

Again, I don't have the actual post, but I'm responding to Phil Frisbee
who writes:

>> Do not believe anyone who tells you that the blocking call is
>> guaranteeed to exit with some particular error indication. It is not.
>> Destroying a shared resource while it is in use is always an error.

>It depends on how you are using it (your program architecture).

>For example, it is common to use multiple threads in a server application. You
>might have one thread blocking for a few ms on select() waiting for received
>data on multiple sockets, and another that sleeps for several seconds at a time
>looking for stale connections to close. When that second thread closes a socket,
>select() will unblock and return ENOTSOCK. There is no real error since you just
>closed the socket and the handle is no longer valid.

It *might* unblock and return ENOTSOCK. Or before it notices, a new
socket might get the same descriptor. It isn't guaranteed to do anything
in particular and what it actually does will depend upon what else is
going on at the time.

>Of course, there are many other ways to do this. You could simply flag the
>socket to be closed and then close it later in your select() loop. Or you could
>simply perform the stale connection check in that select() loop.

You had better do something other than close the descriptor in one
thread while it's still in use in another thread. Otherwise, there will
be some combination of conditions under which your application will fail
horribly.

One should always use code that is guaranteed to work rather than code
that happens to work until the conditions change.

DS

Phil Frisbie, Jr.

unread,

Nov 8, 2002, 5:11:52 PM11/8/02

to

David Schwartz wrote:
>
> Again, I don't have the actual post, but I'm responding to Phil Frisbee
> who writes:
>
> >> Do not believe anyone who tells you that the blocking call is
> >> guaranteeed to exit with some particular error indication. It is not.
> >> Destroying a shared resource while it is in use is always an error.
>
> >It depends on how you are using it (your program architecture).
>
> >For example, it is common to use multiple threads in a server application. You
> >might have one thread blocking for a few ms on select() waiting for received
> >data on multiple sockets, and another that sleeps for several seconds at a time
> >looking for stale connections to close. When that second thread closes a socket,
> >select() will unblock and return ENOTSOCK. There is no real error since you just
> >closed the socket and the handle is no longer valid.
>
> It *might* unblock and return ENOTSOCK. Or before it notices, a new
> socket might get the same descriptor. It isn't guaranteed to do anything
> in particular and what it actually does will depend upon what else is
> going on at the time.

That is a good point, but I personally ONLY do this when shutting down the
application, so no more sockets are being accepted/created.

I agree that this is not something that should be done as a normal action.

> >Of course, there are many other ways to do this. You could simply flag the
> >socket to be closed and then close it later in your select() loop. Or you could
> >simply perform the stale connection check in that select() loop.
>
> You had better do something other than close the descriptor in one
> thread while it's still in use in another thread. Otherwise, there will
> be some combination of conditions under which your application will fail
> horribly.

Agreed. Perhaps I read more into the original post than was said. My remarks
were aimed at CLOSING down a multithreaded application. I never meant to imply
that is was a good or acceptable architecture to close a socket from another
thread just because it may be convenient programming.

> One should always use code that is guaranteed to work rather than code
> that happens to work until the conditions change.

I second that....

Vernon Schryver

unread,

Nov 8, 2002, 6:15:51 PM11/8/02

to

In article <3DCC112E...@webmaster.com>,
David Schwartz <dav...@webmaster.com> wrote:

>"Cancellation of a blocking socket call" is generally no more or less
>than waking up any threads stuck in block_on_something(). If the
>socket has disappeared out from under the thread, the normal validation
>of the socket specifier (e.g. UNIX-style FD int or Winsock pointer)
>will notice and return with some sort of error or other indication to
>the application.
>
> Your last sentence is horribly wrong and makes my point. The normal
>validation of the socket specifier will fail if and only if the
>identifier hasn't been reused by that point. Normally, there is no way
>you could control that. So again, the actual behavior is undefined and
>you could well wind up continuing the blocking call on the wrong
>connection.

You are again projecting your worse than naive misunderstanding of
how systems work to reach a false conclusion. Your misunderstanding
is worse than naive because you so consistently (not just in this
thread) refuse to consider the possibility that you are merely guessing.

Yes, if you naively implement the psuedo-code I offered, your scenario
might be possible. However, no one competent would write exactly what
I wrote; that's why it's called "psuedo-code." When you release and
re-acquire a lock on a resource, you always ensure that the state of
the resource is sufficently unchanged. For a file descriptor or
socket, that state is not only whether the FD or socket is still open,
but also that it specifies the same disk file or network connection.

A common trick in such code is to put a "generation number" in the
TSB, file control block, U-area, or whatever. Each time the resource
is (re)allocated, the generation number is changed (usually just
incremented). If the worst case re-allocation rate of the thing times
the worst case duration of a lock is less than the range of the
generation number, then you can be confident that the right thing will
happen and your scenario of using the wrong FD can't happen.

> Please do not repeat the myth that you can close a socket in one thread
>while it's in use in another thread and have any idea what's going to
>happen.

Please stop make authoritative statements beyond your competence.

As I wrote before, code that closes sockets out from under threads
should be strenuously avoided, but you are simply wrong to claim that
it is always wrong.

Vernon Schryver v...@rhyolite.com

Eugene A Zharkov

unread,

Nov 8, 2002, 9:18:28 PM11/8/02

to

Vernon Schryver wrote:
>
> Yes, if you naively implement the psuedo-code I offered, your scenario
> might be possible. However, no one competent would write exactly what
> I wrote; that's why it's called "psuedo-code." When you release and
> re-acquire a lock on a resource, you always ensure that the state of
> the resource is sufficently unchanged. For a file descriptor or
> socket, that state is not only whether the FD or socket is still open,
> but also that it specifies the same disk file or network connection.
>
> A common trick in such code is to put a "generation number" in the
> TSB, file control block, U-area, or whatever. Each time the resource
> is (re)allocated, the generation number is changed (usually just
> incremented). If the worst case re-allocation rate of the thing times
> the worst case duration of a lock is less than the range of the
> generation number, then you can be confident that the right thing will
> happen and your scenario of using the wrong FD can't happen.

Since no one seem to want to answer my question about using
shutdown to unblock select, I am going to waste some more of your
time with one more posting ...

What "lock" are you talking about ?

Suppose that we have 4 threads:
1 - very CPU intensive
2 - is about to make a blocking call
3 - is thinking that #2 already made the call and calls "close" to unblock #2
4 - is about to make an open call

A few seconds later, #3 gets lucky enough, manages to steal a few
CPU cycles from #1, and closes the socket.

Half an hour later, #4 gets lucky and opens something else with
the same FD.

One hour later, #2 finally makes the blocking system call.

Is there supposed to be some kind of lock held for an hour ?

Eugene

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 80,000 Newsgroups - 16 Different Servers! =-----

Vernon Schryver

unread,

Nov 8, 2002, 10:07:55 PM11/8/02

to

In article <3DCC7074...@vista-control.com>,

Eugene A Zharkov <zha...@vista-control.com> wrote:

> ...
>What "lock" are you talking about ?

I'm talking about the locks that the operating system uses to ensure
that threads do not trick the operating system into messing up the FD
by doing two or more things to the FD at once.

> ...

>One hour later, #2 finally makes the blocking system call.
>
>Is there supposed to be some kind of lock held for an hour ?

None of the talk I've seen has been about that sort of thing.
Instead, it has concerned #2 being stuck in the blocking system
call when some other thread closes the socket.

Of course the scenario you describe is broken and won't work. It
does not matter whether your four threads are stomping on a file
descriptor or some other resource. Your scenario is equally broken
whether the common value is FD 3, the number of threads, or the
answer the whole program is trying to compute.

The blocking socket case is special. In most other cases, nothing
thread #2 does with the shared resource will take long, so that all
4 threads can do the obvious, use a mutual exclusion mechanism (e.g.
"mutex" or "lock") to ensure that nothing bad happens. With blocking
sockets (and often with non-blocking sockets), individual operations
can take a long time, sometimes even an hour. You avoid holding locks
for milliseconds, not to mention an hour.

If one thread is stuck in a blocking call, how does another thread awaken
it to tell it something like "we need to close this connection?" Perhaps
the user has typed control-C or clicked a "CANCEL" button. You might use
a non-blocking socket, select or the Winsock alternatives, the whole pile
of implied code, and the bugs in any pile of code. On the other hand,
if your code always checks system call results and if you trust the
people who wrote your operating system to done the obvious and right,
you might just close the socket and let your error-handling sort it
out. If you are using UNIX instead of Windows, you might use signals
and hope against your experience that EINTR works right, but when you
get finished, that code looks a lot like just closing the socket out
from under a blocking call.

Based on painful experience with Winsock2 (not to mention other arenas),
I'm don't have much trust in code from Redmond. However, I do have
enough to bet that if the person with the crash would look, he would
find errors from Winsock calls being ignored. I bet that the crash
is not happening in blocking call where the socket is closed, but in
a later call.

Vernon Schryver v...@rhyolite.com

David Schwartz

unread,

Nov 8, 2002, 10:09:38 PM11/8/02

to

When I was a much younger and less experienced programmer, and
especially when I first started writing multithreaded code, it was fun
to see what you could get away with. If the program worked when I tested
it and could do what I needed it to do right then, I was completely
satisfied. If I could leave out a lock or a test and convince myself my
code performed better and still seemed to work, well, that was the
coolest thing in the world.

Now I am an older programmer, a wiser programmer, a more experienced
programmer, and most importantly, a professional programmer. I've
learned a painful lesson that things that 'just happen to work'
sometimes happen to not work. When you compile for another platform, a
different library version, a newer compiler version with better
optimizations, it breaks.

And it doesn't always break the first time you test it under test
conditions. It breaks when you really need it to work. When you're
demonstrating it on national television.

There are a bunch of myths going around that fall into the 'happen to
work when I tried it' category. They aren't true, and they don't work.
This is true no matter who says they work or what web page you read it
on or even how experienced the person who says it is.

Closing a socket in one thread to abort a blocking operation in another
thread is one of those things. It doesn't work. Period.

Some people find it endlessly entertaining to concoct crazy scenarios
in which they thing it work where you suspend all the threads in a
process and inspect their stacks. Some people say it will work so long
as other threads don't do specific things that they won't make their
threads do. Some people still like to play complex games with their code
until it happens to work.

This is mental masturbation. It never results in good, reliable code.
Worse, it always results in unmaintainable code. When someone opens a
socket in a code path that didn't open a socket before, suddenly the
code that happened to work before happens to not work anymore. Or a new
OS or library version comes out that opens a socket behind the scenes in
a function that didn't open one before. Now your code doesn't work, and
you have no idea why.

Just say no to using volatile to avoid a lock. Just say no to closing a
handle in one thread while another thread is (or may be) using it. Just
say no to assuming things are guaranteed to work because they happen to.

DS

David Schwartz

unread,

Nov 8, 2002, 10:13:09 PM11/8/02

to

Vernon Schryver wrote:

> If one thread is stuck in a blocking call, how does another thread awaken
> it to tell it something like "we need to close this connection?"

Programming 101, you don't make blocking socket calls unless you want
to block until the operation complets.

> Perhaps
> the user has typed control-C or clicked a "CANCEL" button. You might use
> a non-blocking socket, select or the Winsock alternatives, the whole pile
> of implied code, and the bugs in any pile of code.

At least in this case there only might be bugs. Your way 100%
guarantees them.

> On the other hand,
> if your code always checks system call results and if you trust the
> people who wrote your operating system to done the obvious and right,
> you might just close the socket and let your error-handling sort it
> out.

Alright smart guy, how do you ensure that you're actually in the
blocking socket call rather than about to make the blocking socket call?
I've already raised this exact point. Why do you keep advising people to
do things that it has already been clearly explained don't work?!

DS

David Schwartz

unread,

Nov 8, 2002, 10:26:01 PM11/8/02

to

David Schwartz wrote:
>
> Vernon Schryver wrote:
>
> > If one thread is stuck in a blocking call, how does another thread awaken
> > it to tell it something like "we need to close this connection?"
>
> Programming 101, you don't make blocking socket calls unless you want
> to block until the operation complets.

Oh, one other thing. You could 'shutdown' the connection from another
thread.

DS

Dan Swartzendruber

unread,

Nov 8, 2002, 10:47:59 PM11/8/02

to

In article <3DCC7C72...@webmaster.com>, dav...@webmaster.com
says...

*snip*

> And it doesn't always break the first time you test it under test
> conditions. It breaks when you really need it to work. When you're
> demonstrating it on national television.

*snip*

> This is mental masturbation. It never results in good, reliable code.
> Worse, it always results in unmaintainable code. When someone opens a
> socket in a code path that didn't open a socket before, suddenly the
> code that happened to work before happens to not work anymore. Or a new
> OS or library version comes out that opens a socket behind the scenes in
> a function that didn't open one before. Now your code doesn't work, and
> you have no idea why.
>
> Just say no to using volatile to avoid a lock. Just say no to closing a
> handle in one thread while another thread is (or may be) using it. Just
> say no to assuming things are guaranteed to work because they happen to.

Amen, brother. I maintain OS and comms software for a living. I've
seen the most amazingly bad coding practices causing bugs that never got
hit for 10 years or longer. Blind luck. I've also seen race conditions
that would only be hit if another thread/process hit you between two
specific instructions. All to be clever and avoid proper locking.

Vernon Schryver

unread,

Nov 8, 2002, 10:57:35 PM11/8/02

to

In article <3DCC7D45...@webmaster.com>,
David Schwartz <dav...@webmaster.com> wrote:

> ...

>> Perhaps
>> the user has typed control-C or clicked a "CANCEL" button. You might use
>> a non-blocking socket, select or the Winsock alternatives, the whole pile
>> of implied code, and the bugs in any pile of code.
>
> At least in this case there only might be bugs. Your way 100%
>guarantees them.

That is wrong in more than one aspect. One is what I would choose
as "my way."

>> On the other hand,
>> if your code always checks system call results and if you trust the
>> people who wrote your operating system to done the obvious and right,
>> you might just close the socket and let your error-handling sort it
>> out.
>
> Alright smart guy, how do you ensure that you're actually in the
>blocking socket call rather than about to make the blocking socket call?

You've already responded to an article in which I pointed out one
method.

Here is a second. If the thread that does the blocking socket call
is the only thread that calls socket() or open(), then it does not
matter whether another thread closes the FD. This restriction on the
FD closing thread is plausible for the examples I gave where a thread
or signal handler is dealing with a "CANCEL" button or SIGINT.

>I've already raised this exact point. Why do you keep advising people to
>do things that it has already been clearly explained don't work?!

Why do you keep saying things that have been clearly explained to
be obviously wrong?!

Do or did you sometimes sign yourself "skybuck"?
I'll assume so and make the necessary adjustments.

Vernon Schryver v...@rhyolite.com

Dan Lanciani

unread,

Nov 8, 2002, 11:13:08 PM11/8/02

to

In article <af0b5c61.0211...@posting.google.com>, ezha...@yahoo.com (Eugene A. Zharkov) writes:
| David Schwartz <dav...@webmaster.com> wrote in message news:<3DCAB579...@webmaster.com>...
|
| > As I've said many times, it's an error to close a socket in one thread
| > while another thread is or might be using it. There is no way to do it
| > sanely.
|
| I completely agree with David.
|
| But I have a question that has been bugging me for quite some
| time and that is somewhat related to the original question.
| The question is "why there is no call to abort a blocking
| socket call ?".

I think that you have answered your own question quite well. :)

| For example, is there a problem in requiring
| "shutdown" in one thread to cancel blocking calls in another ?

No, there is no problem and this was always understood to work in real
sockets API implementations. Even before there were multiple threads
per process this worked to unblock another process. Although I can't
recall seeing it explicitly documented, everyone who really understood
sockets knew that you had to implement it that way or people would be
unhappy.

| I know that "shutdown" actually works on many platforms.

Pretty much anything other than Windows.

| But that behavior does not seem to be documented, so I can
| not be sure that it will keep working in future versions.

I suspect it will keep working on everything except Windows.

| On Windows it does not work (at least it does not unblock
| "select").

Of course it works on my Winsock stack, but that's the exception. :) You
have to understand the history here. The folks who specified Winsock were
not really sockets API fans. Some of the companies involved had their own
sockets-like libraries for their proprietary stacks and others really didn't
care one way or the other. What emerged as Winsock was a compromise of those
existing sockets-like APIs. The intent was to have something that looked enough
like sockets to appeal to developers without making too much work for the
vendors. In the process a few subtle features that made sockets "complete"
were lost and a few gratuitous changes were made. (A few not-so-subtle features
like multi-homed support were lost too.)

The ability to use shutdown() to safely unblock was one of the subtle features
that was lost. At some point somebody noticed that there had to be some way
to get things unstuck if only at program termination and they pushed to make
close "work right" without really explaining what "right" means. Those of us
writing stacks had little choice but to try to make it "work right" at least
to the extent of not destroying the internal state of the stack. Since it
was documented to "work right," application writers used it without realizing
the contortions they should go through to use it "right." After all, it was
intuitively obvious that there had to be some way to unblock their calls, and
the documentation says that close is the way...

| It looks like there was WSACancelBlockingCall
| in Winsocks, which have been removed, I guess because it was
| not possible to implement it correctly.

That call causes a lot of confusion because it was meant only to cancel
a blocking call from the same thread that made the call in the first place.
This could happen only from within a blocking hook.

| But I think that
| "shutdown" CAN be implemented.

Yes, of course it can. But we are stuck without in in Winsock. And Winsock
continues to diverge from sockets, with restrictions on mixing protocols in
a select call making select less and less useful anyway. Besides, we have
been told that using select is lame...

Dan Lanciani
ddl@danlan.*com

Chris Pearson

unread,

Nov 9, 2002, 9:16:33 AM11/9/02

to

> So, if you close the socket in another thread just before the thread
> above enters 'a blocking socket call', you cannot ensure that another
> thread won't create a socket and get the same socket decriptor as the
> thread above was about to perform a blocking socket call on. Now this
> thread performs an operation on the *wrong* socket.

In a design where you have a main thread (i.e. the service control thread)
closing sockets for other threads, you first set a global signal (variable,
event, whatever) that tells other threads your shutting down, and thus not
to create sockets.

-- CCP

"David Schwartz" <dav...@webmaster.com> wrote in message
news:3DCAB579...@webmaster.com...

David Schwartz

unread,

Nov 9, 2002, 4:27:23 PM11/9/02

to

Vernon Schryver wrote:

> > Alright smart guy, how do you ensure that you're actually in the
> >blocking socket call rather than about to make the blocking socket call?

> You've already responded to an article in which I pointed out one
> method.

Okay, you're the kind of person who likes to dispense really bad advice
and then back it up with insanely arcane defenses.

> Here is a second. If the thread that does the blocking socket call
> is the only thread that calls socket() or open(), then it does not
> matter whether another thread closes the FD. This restriction on the
> FD closing thread is plausible for the examples I gave where a thread
> or signal handler is dealing with a "CANCEL" button or SIGINT.

There is no way you can ensure this. How do you know that a library
call won't create a thread or open a socket. Again, mental masturbation.
Your techniques can't be guaranteed to work and you do the readers of
this newsgroup a disservice by suggesting them.

> >I've already raised this exact point. Why do you keep advising people to
> >do things that it has already been clearly explained don't work?!
>
> Why do you keep saying things that have been clearly explained to
> be obviously wrong?!

I'm right, and you're wrong. You can't know what other threads might be
running or might open sockets. Does 'gethostbyname' open a socket?
Maybe. Maybe not. Does 'WSASend' create a new thread in the context of
your process? Maybe, maybe not.

> Do or did you sometimes sign yourself "skybuck"?
> I'll assume so and make the necessary adjustments.

Huh?!

DS

Dan Swartzendruber

unread,

Nov 9, 2002, 4:53:38 PM11/9/02

to

In article <3DCD7DBB...@webmaster.com>, dav...@webmaster.com
says...

> > Do or did you sometimes sign yourself "skybuck"?
> > I'll assume so and make the necessary adjustments.
>
> Huh?!

there was a thread recently from "skybuck flying" about tcp/ip
performance over his cable modem link (as i recall). vernon apparently
has a low opinion of him (e.g. this is an ad-hominem).

Chris Pearson

unread,

Nov 10, 2002, 5:10:48 AM11/10/02

to

> code that closes sockets out from under threads should be
> strenuously avoided

I maintain that, as far as the OS is concerned, it is perfectly acceptable
(and frankly rather unremarkable) to close a blocked socket from another
thread. Doing so releases any and all threads that are blocked on the
socket, in exactly the same manner as would any other asynchronous network
event (packet arrival, buffer availability, connection termination, etc.).

When you close the socket, a blocked connect() returns WSAENOTSOCK (or
possibly WSAECONNABORTED if partially connected), while a blocked recv()
returns WSAECONNABORTED.

As far as I can see, the only substantive issue mentioned in this entire
mail thread is a race condition at the application level, in which it is
possible (given sufficient blundering :-) to create a new socket having the
same handle as one just closed, before a work thread quits using the old
handle.

This is strictly an application design problem, and is easily solved through
basic thread synchronization. For instance, if you expect the work thread
to die after you've closed its socket handle, you simply wait 'till the
thread handle is signaled.

Closing a blocked socket from another thread may not be the cat's meow in
software design, but it's hardly a sin either :-) Particularly in the case
of app termination, I think it's a perfectly acceptable use.

FWIW, I've written a test app that endlessly spawns threads and closes their
connected sockets. It confirms both that Winsock does reallocate closed
socket handles and that this problem is easily dealt with. I'll be glad to
send the code to anyone who's interested.

-- CCP

"Vernon Schryver" <v...@calcite.rhyolite.com> wrote in message
news:aqhgj7$4hf$1...@calcite.rhyolite.com...

Stephen J. Bevan

unread,

Nov 10, 2002, 11:39:21 AM11/10/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:
> That's true or false, depending on the the system. It's more likely
> to be true than false, but there's no law of nature or programming
> that mandates that socket specifiers (UNIX FDs or Winsock things) be
> recycled.

Historically Unix has always returned the lowest numbered unused file
descriptor on a call to dup(2). Hence the common idiom after fork(2)
of doing something like the following to bind one end of a pipe(2) to
the standard output :-

close(1);
dup(fd[1]);
close(fd[1]);

where fd[1] is the writer side of a pair returned by pipe(2) just
before the fork(2). With the introduction of dup2(2), the semantics
required by dup(2) are not really needed but neverthless they are
still standard. Since a socket specifier under Unix is just a
file-descriptor then it plays buy the same rules with respect to
dup(2) i.e. always returning the lowest numbered unused one.

Vernon Schryver

unread,

Nov 10, 2002, 12:20:45 PM11/10/02

to

In article <m3isz55...@dino.dnsalias.com>,

Stephen J. Bevan <ste...@dino.dnsalias.com> wrote:

>> That's true or false, depending on the the system. It's more likely
>> to be true than false, but there's no law of nature or programming
>> that mandates that socket specifiers (UNIX FDs or Winsock things) be
>> recycled.
>
>Historically Unix has always returned the lowest numbered unused file
>descriptor on a call to dup(2).

Historically UNIX did not run on big multi-processors where it is very
expensive to synchronize all of the threads in a process. The WIntel
multi-processor architecture where things like the LOCK prefix can
work has been obsolete for more than 15 years. The speed of light,
not to mention practical bus (or switch fabric) speeds make close CPU
synchronization impossible except in trivial or less than trival
multi-processors.

> Hence the common idiom after fork(2)
>of doing something like the following to bind one end of a pipe(2) to
>the standard output :-
>
> close(1);
> dup(fd[1]);
> close(fd[1]);

> ...

Many common idioms from the old days are no long tolerable. That
one as well as

close(0);
open("/dev/null",O_RDWR);
dup2(0,1);
dup2(0,2);

strike me as only a little less bad than most uses of gets(), particularly
in threaded code.

> ... Since a socket specifier under Unix is just a

>file-descriptor then it plays buy the same rules with respect to
>dup(2) i.e. always returning the lowest numbered unused one.

In the absense of standards (e.g. POSIX) words saying as much, I think
those old idioms must be avoided. I've looked but not found any
standards words that say the following must set i==0:

close(0)
i = open("/dev/null,O_RDWR);

The standards words I've found say that i must be the lowest "available"
file descriptor, but do not define "available." Can a big multi-processor
delay making FD 0 "available" for a few 100,000 CPU cycles until the
inter-CPU message saying that one of the threads has closed FD 0 has
been heard and acknowledged by all CPUs?

Could the compiler and the kernel in that fragment notice that the
application does not care about the exit status of close() and let
the TCP FIN handshake continue in parallel with the open() of /dev/null?
In that case, FD 0 might not be "avaliable" several minutes.

Vernon Schryver v...@rhyolite.com

Stephen J. Bevan

unread,

Nov 10, 2002, 5:11:16 PM11/10/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:
> Historically UNIX did not run on big multi-processors where it is very
> expensive to synchronize all of the threads in a process.

I agree, much of the Unix API was not designed with threads in mind.
However, for backwards compatability much of it also has not changed
with the introduction of threads. Consequently in the absense of
threads the close&dup semantics must be honoured. Once threads are
in picture I have no idea what requirements and so whether they can be
different from the non-threaded case.

> Many common idioms from the old days are no long tolerable. That
> one as well as
>
> close(0);
> open("/dev/null",O_RDWR);
> dup2(0,1);
> dup2(0,2);
>
> strike me as only a little less bad than most uses of gets(), particularly
> in threaded code.

Unlike the close&dup example I gave, there is no requirement that the
close&open sequence work on a conforming POSIX system. It does happen
to work on various Unix systems (e.g. I remember accidentally making
use of it under SunOS some years ago) but it is not portable.

> In the absense of standards (e.g. POSIX) words saying as much, I think
> those old idioms must be avoided. I've looked but not found any
> standards words that say the following must set i==0:
>
> close(0)
> i = open("/dev/null,O_RDWR);

That's because IMHO there is no such requirement. Some flavours of
Unix do support this (e.g. SunOS), but POSIX requirement is only about
dup(2), not about open(2). The requirement for dup(2) is described at
<http://www.opengroup.org/onlinepubs/007904975/toc.htm> which is (now)
defined in terms of the behaviour of fcntl(2) with the F_DUPD argument
<http://www.opengroup.org/onlinepubs/007904975/toc.htm> where it notes :-

F_DUPFD
Return a new file descriptor which shall be the lowest
numbered available (that is, not already open) file
descriptor greater than or equal to the third argument, arg,
taken as an integer of type int. The new file descriptor
shall refer to the same open file description as the
original file descriptor, and shall share any locks.

> The standards words I've found say that i must be the lowest "available"
> file descriptor, but do not define "available."

The above wording for F_DUPFD is not sufficiently precise to answer
your question. However, as I noted the dup&close behaviour has been
around a long time. The oldest description I personally have is from
Advanced UNIX Programming, Marc J. Rochkind, Prentice Hall 1985 where
on page 129 the close&dup behaviour is explained with just the
semantics I described. The book was written targetting Bell Labs v7,
Sys III, Sys IV, BSD 4.2BSD, and Xenix. Similar descriptions and
examples are given in Stevens' Unix Network Programming from 1992, his
Advanced Programming in the Unix Networking Environment from 1994 and
Robbins&Robbin's Practical Unix Programming from 1996. If the
behaviour was to change then all these books would be wrong and code
based on them would fail.

[ questions about what a compiler and kernel can do on a
multi-processor snipped ]

As I noted, I don't know what the requirements are with respect to
close&dup in the presence of threads. In their absence, then the
file-descriptors must be recycled in a specific way, which is the only
point I wanted to clarify when I responded to you writing :-

Stephen J. Bevan

unread,

Nov 11, 2002, 2:05:30 AM11/11/02

to

ste...@dino.dnsalias.com (Stephen J. Bevan) writes:
> > In the absense of standards (e.g. POSIX) words saying as much, I think
> > those old idioms must be avoided. I've looked but not found any
> > standards words that say the following must set i==0:
> >
> > close(0)
> > i = open("/dev/null,O_RDWR);
>
> That's because IMHO there is no such requirement.

Well actually that deserves a qualifier ...

In v7 (1979) the manual pages for open(2) and dup(2) don't mention
anything about either returning the lowest numbered one available.
However, there are various examples in the v7 source of the close&dup
idiom being used implying that dup(2) returning the lowest available
descriptor is the expected behaviour. I didn't spot any examples of
close&open in the v7 source but I could easily have missed them so I
don't know if it is expected to work despite not being documented (it
certainly would work given the implementation of close&open in v7).

In SysIII (1982) the manual page for open(2) does not mention
anything about returning the lowest available descriptor. However,
the dup(2) manual page includes the sentence :-

The file descriptor returned is the lowest one available.

So clearly someone got around to documenting the close&dup behaviour
that various bits of code had been relying on for years. Nobody saw
fit to update the open(2) page with a similar comment. Perhaps this
was a simple omission or perhaps because nobody was using that idiom
and hence considered it required behaviour? Similarly while Rochkind
covers the close&dup idom I couldn't find any mention of the
close&open idiom.

Fast forward to the present day and the POSIX standard does require
that open(2) return the lowest numbered available descriptor
(http://www.opengroup.org/onlinepubs/007904975/functions/open.html).
Presumably at some point someone decided that there was enough code
that depended on this that the behaviour should be considered
standard.

Vernon Schryver

unread,

Nov 11, 2002, 2:22:07 AM11/11/02

to

In article <m34rao6...@dino.dnsalias.com>,

Stephen J. Bevan <ste...@dino.dnsalias.com> wrote:

> ...

>Fast forward to the present day and the POSIX standard does require
>that open(2) return the lowest numbered available descriptor
>(http://www.opengroup.org/onlinepubs/007904975/functions/open.html).

> ...

You'd hope that open() would not return an "unavailable" descriptor,
so the requirement that open() must yield the lowest available
desscriptor doesn't necessarily say whether {close(0);i=open("/dev/null"}
must set i=0 when descriptor 0 was not "available" at the start of
the block.

Again, does close(fd) make fd "available" when close() returns or can
that "availability" be delayed a few or even a few bazillion CPU
cycles? Could close(fd) return immediately if the application ignores
the close() exit status, letting the TCP FIN handshake continue and
the descriptor remain "unavailable" until the distant future?

Vernon Schryver v...@rhyolite.com

Stephen J. Bevan

unread,

Nov 11, 2002, 11:55:34 AM11/11/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:
> You'd hope that open() would not return an "unavailable" descriptor,
> so the requirement that open() must yield the lowest available
> desscriptor doesn't necessarily say whether {close(0);i=open("/dev/null"}
> must set i=0 when descriptor 0 was not "available" at the start of
> the block.

In the exact case above I'd argue it does require i=0, just as it
would if you changed it to close&dup.

> Again, does close(fd) make fd "available" when close() returns or can
> that "availability" be delayed a few or even a few bazillion CPU
> cycles?

Given the expection of close&dup and by implication close&open then
the close(2) call can take a few bazillion CPU cycles if necessary but
when it returns (a non-error status) it must have made the descriptor
available for further use. To do otherwise would break a lot of code.

> Could close(fd) return immediately if the application ignores
> the close() exit status, letting the TCP FIN handshake continue and
> the descriptor remain "unavailable" until the distant future?

close(2) can only return 3 errors, one of which is due to providing a
bad file descriptor. Therefore I'm not sure there is much lattitude
for the system to decide that because the error result is ignored it
can do something different. One possibility perhaps would be to have
close(2) return EINTR in this case and if the user didn't test for it
and re-try then it would be compliant not to make the descriptor
available for further use, though this would result in a
file-descriptor leak in the program. If however, EINTR is tested for
and close(2) is re-tried then that closes that loophole.

Casper H.S. Dik

unread,

Nov 11, 2002, 12:08:43 PM11/11/02

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:

>Again, does close(fd) make fd "available" when close() returns or can
>that "availability" be delayed a few or even a few bazillion CPU
>cycles? Could close(fd) return immediately if the application ignores
>the close() exit status, letting the TCP FIN handshake continue and
>the descriptor remain "unavailable" until the distant future?

Well, UNIX98 close(2) says:

The close() function will deallocate the file descriptor indicated by
fildes. To deallocate means to make the file descriptor available for
return by subsequent calls to open() or other functions that allocate
file descriptors

I don't think that that leaves much wiggle room.

In other words:

close(0);
dup(X);

most always return 0 (or -1) in a single threaded application.
(Because of dup's guarantee of returning the lowest numbered fd)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Vernon Schryver

unread,

Nov 11, 2002, 12:48:59 PM11/11/02

to

In article <3dcfe41b$0$46611$e4fe...@news.xs4all.nl>,
Casper H.S. Dik <Caspe...@Sun.COM> wrote:

> ...

>Well, UNIX98 close(2) says:
>
> The close() function will deallocate the file descriptor indicated by
> fildes. To deallocate means to make the file descriptor available for
> return by subsequent calls to open() or other functions that allocate
> file descriptors
>
>I don't think that that leaves much wiggle room.

> ...

Yes, I agree that resolves the issue. Thanks.

Vernon Schryver v...@rhyolite.com

JohnC

unread,

Nov 26, 2002, 6:05:57 PM11/26/02

to

Hello,

I am developing a network appliance that must differentiate between
operation in standalone mode - direct connection to a laptop/pc and
operation in a traditional network via the same RJ45 port.

Is there a method/protocol available.

Thanks
John C

Chris Pearson

unread,

Nov 27, 2002, 2:02:20 AM11/27/02

to

I don't know if there's a specific API (are you running WinCE?),
but if not, I think you could deduce standalone mode:

- the appliance and PC will be on the same subnet (get netmask
with SIO_GET_INTERFACE_LIST)

- you may have a self-assigned IP address (APIPA) 169.254.0.1 -
169.254.255.254 (but you might also have a static IP, so this
isn't conclusive)

- there will either be no default gateway (unless PC is running
ICS), or the PC will be the default gateway (unlikely, only
if PC is running a server OS)

- there will be no DNS server addresses configured (assuming
appliance supports DNS)

On WinCE, I think you can get the last two items from the
registry.

-- CCP

"JohnC" <no...@home.com> wrote in message
news:p7TE9.2650$fk5.2...@news0.telusplanet.net...