Boris Dorestand <bdore...@example.com> writes:
[...]
> So let me see if I myself understand your definition of blocking. You
> say that ``with portability in mind, [...] `blocking' means `waiting an
> indeterminate time for an event which may never happen'''. Operations
> on regular files are exceptions, despite processes having to wait for
> disk operations to complete.
>
> So, for example, if I am a process, I might call read() and read() might
> take a certain short time interval to return. In a certain sense, this
> short time interval is blocking me, but assuming the system guarantees
> the time interval is short relative to my perspective as a process, it
> does not qualify as blocking me.
>
> It is obvious that, no matter how short a time interval is, it is always
> a time I have to wait in the list of sequential operations which I must
> compute. (My use of ``sequential operations'' implies that I must wait
> for each operation to finish before I can start the next.) So, for
> example, when I request to read a value from main memory, it does take
> some time for the data to be fetched from main memory and be placed
> somewhere in the CPU where I can access it and compute with it. But we
> are not going to say this short time interval is blocking me.

It's not "my definition" but more or less a paraphrase from APUE. It's
also not a question of "having to wait for some time" but of a
particular mechanism for waiting: A process calls into the kernel in
order to perform "some I/O operation". The kernel determines that the
operation cannot complete immediately; hence, the process state is
changed from "runnable" to "waiting for some event" and the scheduler
selects another process to run. If "some event" actually occurs, the
process becomes runnable again ("is woken up"), will eventually be
scheduled to run, and picks up where it stopped.

That's the basic I/O wait mechanism, and it's used for all I/O waits
(and for other waits, e.g., waiting for a mutex or semaphore to become
available).
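
To make that mechanism concrete, here's a minimal sketch (my own
illustration, not from APUE): the parent's read() on an empty pipe puts
it into exactly this kind of sleep until the child's write() makes data
available.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        char buf[32];
        ssize_t n;

        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }

        if (fork() == 0) {            /* child: supplies input after a delay */
            close(fds[0]);
            sleep(2);
            write(fds[1], "hello", 5);
            return 0;
        }

        close(fds[1]);
        /* parent sleeps here ("waiting for some event") until the child writes */
        n = read(fds[0], buf, sizeof buf);
        printf("read returned %zd bytes\n", n);
        return 0;
    }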

NB: The following uses "input" instead of "I/O" for simplicity.

There are certain kinds of "input sources" in a system for which the
kernel cannot predict when (if ever) input will actually become
available. Originally (before networking was added to the system), these
were terminals and pipes; later on, sockets were added to this set as
well. Because a process sleeping in the kernel until input from such a
source becomes available might never be woken up, these waits use a
so-called "interruptible sleep": Should a signal become pending for such
a process, it will be woken up in order to handle the signal, and the
original I/O call will return with an EINTR error.
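
Here's a small sketch of what that looks like from the application side
(my own example; it assumes stdin is a terminal with nothing typed): the
handler is installed without SA_RESTART, so the blocked read() is not
restarted and fails with EINTR once the alarm fires.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void on_alarm(int sig)
    {
        (void)sig;                    /* nothing to do; just interrupt the read */
    }

    int main(void)
    {
        struct sigaction sa;
        char buf[64];
        ssize_t n;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;     /* no SA_RESTART: read() is not restarted */
        sigaction(SIGALRM, &sa, NULL);

        alarm(3);                     /* a signal becomes pending in 3 seconds */

        /* interruptible sleep, waiting for terminal input that may never come */
        n = read(STDIN_FILENO, buf, sizeof buf);
        if (n == -1 && errno == EINTR)
            printf("read() was interrupted by the signal (EINTR)\n");
        else
            printf("read() returned %zd\n", n);
        return 0;
    }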

It's possible to configure a file descriptor referring to such an input
source for "non-blocking mode" by setting the O_NONBLOCK flag. If this
is done, a read will fail immediately with an EAGAIN error (EWOULDBLOCK
on early BSD and BSD-derived systems) instead of blocking the
process. Further, a single process can handle input on a number of such
file descriptors by using one of the synchronous I/O multiplexing calls,
e.g., select or poll. It will then be woken up if input becomes
available on any of the descriptors in the set (or will receive an EINTR
error if a signal which had to be handled occurred first).
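
A minimal sketch of both techniques (again my own example, assuming
stdin is a terminal or a pipe): the first read() fails immediately with
EAGAIN, and poll() is then used to sleep until input is actually
available.

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        ssize_t n;
        struct pollfd pfd;

        /* put stdin (assumed to be a terminal or a pipe) into non-blocking mode */
        int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
        fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

        /* with no input pending, this fails at once with EAGAIN/EWOULDBLOCK */
        n = read(STDIN_FILENO, buf, sizeof buf);
        if (n == -1)
            perror("non-blocking read");

        /* sleep until input actually becomes available (or a handled signal
           interrupts the call, in which case poll returns -1 with EINTR) */
        pfd.fd = STDIN_FILENO;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, -1) == 1 && (pfd.revents & POLLIN)) {
            n = read(STDIN_FILENO, buf, sizeof buf);
            printf("read %zd bytes after poll\n", n);
        }
        return 0;
    }
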
====================
BIG LINE IN THE SAND
====================

x x x x x x x x x x -
                      \
x x x x x x x x x     /- barbed wire

(machine guns are not pictured)

On the opposite side of this hopefully invincible fortification, there's
disk I/O. In certain circumstances, a process will need to wait for data
from the disk as well. As the disk is a local device, the kernel knows
that this wait will only take a fairly short, finite amount of time.
Hence, the kernel just lies to the process: It puts it to sleep in the
above-mentioned way. There's no way the process can avoid this (with the
synchronous I/O calls), and signals which become pending during the
sleep will remain pending. The kernel basically pretends that the
process hasn't really been descheduled.

Hence, while this technically uses means similar to the "blocking I/O"
described above, that's an implementation detail which is hidden from
applications.
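
To illustrate the difference (my own sketch; "bigfile.dat" is just a
placeholder name): on typical implementations, O_NONBLOCK has no effect
on regular files, so the read() below still waits for the disk transfer
to complete and never fails with EAGAIN.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        ssize_t n;
        int fd = open("bigfile.dat", O_RDONLY | O_NONBLOCK);

        if (fd == -1) {
            perror("open");
            return 1;
        }

        /* no EAGAIN here: the call returns data (or 0 at end-of-file) once the
           disk transfer is done, exactly as if O_NONBLOCK had not been set */
        n = read(fd, buf, sizeof buf);
        printf("read returned %zd\n", n);

        close(fd);
        return 0;
    }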