Non-blocking I/O


Somnath Roy, Jul 16, 1998:

Hi,

Is there any non-blocking I/O available in Linux?

For example: an application issues a read request and does some other work ...
once the data is available, the driver wakes up (or sends some signal to) the
user application.

Thanx in advance,
- Somnath.

/~~~|~~~~|~~~~|~~~~|~~~~|~~~~|~~~~|~~~~|~~~~|~~~~|~~~~~/~~~|~\
|____|____|____|____|____|____|____|____|____|____|____/ |) \
| Somnath Roy, Wipro Ltd., E-ml:sr...@wipinfo.soft.net /_____|___\
/DivyaSree,30,Mission Rd.,Bl-27, Ph-2241730 (Ext-3305)///////| |
| |///////| |
| Journey of thousand miles begins with a single step |///////| |
~~~/~~~\~/~~~\~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/~~~\~~~~
\___/ \___/ \___/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html

Alan Cox, Jul 16, 1998:

> Is there any non-blocking I/O available in Linux?

Unix has had non-blocking I/O since 1978. Man fcntl, select and signal.

> For e.g., an application sends some read request and does some other work ...
> once the data is available, the drivers wakes up (or sends some signal) the
> user application.

Yep

Chris Wedgwood, Jul 17, 1998:

On Thu, Jul 16, 1998 at 02:43:18PM +0100, Alan Cox wrote:
> > Is there any non-blocking I/O available in Linux?
>
> Unix has had non-blocking I/O since 1978. Man fcntl, select and signal.

Yeah, but things like:

open("/mounts/scsi6/cache/04/D3/0004D396", O_RDONLY|O_NONBLOCK)

block.

This really sucks because it means we have to write smarter apps to put IO
like this in another thread or some such, and I'm a lazy bastard who would
like to be able to avoid having to do this.

I know a while back someone suggested we could have kernel threads do this,
which at the time I didn't like the sound of, but since then I've got lazier
and wouldn't be so opposed to having, say, 5 or so idle/blocked kernel threads
hanging around to do this when required.

Perhaps spawn more of these when the number of free kernel threads gets low
and kill excess threads when the idle numbers get too high. But 5 sounds
about right, as on a fairly busy squid, I usually don't see more than 5
threads in the D state waiting for disk seek or whatever.

Hmm... I guess more importantly, what does POSIX say here about the use of
O_NONBLOCK with open(2)?


-cw

Alan Cox, Jul 17, 1998:

> Yeah, but things like:
>
> open("/mounts/scsi6/cache/04/D3/0004D396", O_RDONLY|O_NONBLOCK)
>
> block.

The open call completes, yes, but it doesn't block on device ready

> This really sucks because it means we have to write smarter apps. to put IO
> like this in another thread or some such and I'm a lazy bastard who would
> like to be able to avoid having to do this.

So put it in a library, _NOT_ the kernel

Chris Wedgwood, Jul 17, 1998:

> > This really sucks because it means we have to write smarter apps. to put
> > IO like this in another thread or some such and I'm a lazy bastard who
> > would like to be able to avoid having to do this.
>
> So put it in a library _NOT_ the kernel

To be perfectly honest, I'm not really sure how to modify libc/open(3) to do
so correctly and make sure everything still works with regard to signals
and what have you.

I don't even know if it's worth it, because I think someone (Gooch?) has
already modified glibc2 so that the aio_* calls already do this.


One thing that might be an issue, but perhaps isn't because clone(2) is
obscenely fast, is that no 'IO threads' will exist on the first call to
open(3), so we have to create them and do the setup, which can take time.

I'm pretty much lost these days on how to make threaded applications work
properly with signals anyhow; at one point you needed a helper thread, now
I'm not so sure...

-cw

Chris Adams, Jul 17, 1998:

According to Chris Wedgwood <ch...@cybernet.co.nz>:

>Yeah, but things like:
>
> open("/mounts/scsi6/cache/04/D3/0004D396", O_RDONLY|O_NONBLOCK)
>
>block.
>
>Hmm... I guess more importantly, what does POSIX say here about the use of
>O_NONBLOCK with open(2)?

Here is what Unix98 says at
http://www.opengroup.org/onlinepubs/7908799/xsh/open.html

O_NONBLOCK
    When opening a FIFO with O_RDONLY or O_WRONLY set:
        If O_NONBLOCK is set:
            An open() for reading only will return without delay. An open()
            for writing only will return an error if no process currently
            has the file open for reading.
        If O_NONBLOCK is clear:
            An open() for reading only will block the calling thread until a
            thread opens the file for writing. An open() for writing only
            will block the calling thread until a thread opens the file for
            reading.
    When opening a block special or character special file that supports
    non-blocking opens:
        If O_NONBLOCK is set:
            The open() function will return without blocking for the device
            to be ready or available. Subsequent behaviour of the device is
            device-specific.
        If O_NONBLOCK is clear:
            The open() function will block the calling thread until the
            device is ready or available before returning.
    Otherwise, the behaviour of O_NONBLOCK is unspecified.
--
Chris Adams - cad...@ro.com
System Administrator - Renaissance Internet Services
I don't speak for anybody but myself - that's enough trouble.

Richard Gooch, Jul 17, 1998:

Chris Wedgwood writes:
> > > This really sucks because it means we have to write smarter apps. to put
> > > IO like this in another thread or some such and I'm a lazy bastard who
> > > would like to be able to avoid having to do this.
> >
> > So put it in a library _NOT_ the kernel
>
> To be perfectly honest, I'm not really sure how to modify libc/open(3) to do
> so correctly and make sure everything still works with regards to signals
> and what have you.
>
> I don't even know if its worth it, because I think someone (Gooch?) has
> already modified glibc2 so that the aio_* calls already do this.

Er, I don't think it was me: I haven't modified any glibc2
code. Perhaps you're referring to something else I've done/am working
on (such as the migrating FDs code)? I've not paid much attention to
this thread, so the context is lost on me. Do you have an executive
summary? ;-)

Regards,

Richard....

Chris Wedgwood, Jul 18, 1998:

On Fri, Jul 17, 1998 at 05:10:08PM +1000, Richard Gooch wrote:

> > I don't even know if its worth it, because I think someone (Gooch?) has
> > already modified glibc2 so that the aio_* calls already do this.
>
> Er, I don't think it was me: I haven't modified any glibc2
> code. Perhaps you're referring to something else I've done/am working
> on (such as the migrating FDs code)? I've not paid much attention to
> this thread, so the context is lost on me. Do you have an executive
> summary? ;-)

- Me bleats that O_NONBLOCK should work on regular files for open(2),
  read(2), and write(2)

- I suggest a dynamically sizing set of kernel threads for this because
  I'm too lazy to do it in userspace and have had trouble making it work
  right before

- Alan says it belongs in user space, and rightly it does, only I can't
  make it work right

- I suggested someone (you) had already done something along these lines
  for glibc2, which I guess must be wrong....

-cw

Chris Wedgwood, Jul 21, 1998:

On Tue, Jul 21, 1998 at 06:36:52PM +1000, Richard Gooch wrote:

> Nope, not me. I'm fiddling with something else: reducing the impact of
> select(2) and poll(2) when you have zillions of FDs.

select(2) I don't think needs to work for gobs of FDs. In fact, I think it's
reasonable for select(2) to fail when the number of FDs is greater than, say,
1024.

select(2) doesn't appear to be very scalable, and I don't see any reason to
try to make it so...

But poll(2) should be nice and fast, at worst O(n) for gobs and gobs
of FDs; I'd like to have poll(2) working for 250,000 FDs at some point.

> Note that glibc 2.1 has the aio_*() functions, which give you what you
> want with reading and writing.

I'm going to use them instead, only I'm not going to move to glibc, so I'll
backport the code or see if I can munge mine to the right API.

I think I need a new pthreads; it seems broken at the moment, with libc5
anyhow.

> Doesn't help with open(2) since the aio_*() functions need a FD to work.
> I've often wondered why we can't have non-blocking I/O with regular files.
> It doesn't seem like it would be impossibly hard to implement.

open(2) could reasonably be made to work with O_NONBLOCK.

O_NONBLOCK is meaningless for read/write because it's not clear what should
happen and how you would make the code work in practice. I think O_NONBLOCK
for read/write is asking for trouble, unless I can think of a sane API for
it.

Followups should probably be off the list.

Richard Gooch, Jul 21, 1998:

Chris Wedgwood writes:
> On Fri, Jul 17, 1998 at 05:10:08PM +1000, Richard Gooch wrote:
>
> > > I don't even know if its worth it, because I think someone (Gooch?) has
> > > already modified glibc2 so that the aio_* calls already do this.
> >
> > Er, I don't think it was me: I haven't modified any glibc2
> > code. Perhaps you're referring to something else I've done/am working
> > on (such as the migrating FDs code)? I've not paid much attention to
> > this thread, so the context is lost on me. Do you have an executive
> > summary? ;-)
>
> - Me bleats that O_NONBLOCK should work on regular files for open(2),
>   read(2), and write(2)
>
> - I suggest a dynamically sizing set of kernel threads for this because
>   I'm too lazy to do it in userspace and have had trouble making it work
>   right before
>
> - Alan says it belongs in user space, and rightly it does, only I can't
>   make it work right
>
> - I suggested someone (you) had already done something along these lines
>   for glibc2, which I guess must be wrong....

Nope, not me. I'm fiddling with something else: reducing the impact of
select(2) and poll(2) when you have zillions of FDs.

Note that glibc 2.1 has the aio_*() functions, which give you what you
want with reading and writing. Doesn't help with open(2) since the
aio_*() functions need a FD to work. I've often wondered why we can't
have non-blocking I/O with regular files. It doesn't seem like it
would be impossibly hard to implement.

Regards,

Richard....

Richard Gooch, Jul 21, 1998:

Chris Wedgwood writes:
> On Tue, Jul 21, 1998 at 06:36:52PM +1000, Richard Gooch wrote:
>
> > Nope, not me. I'm fiddling with something else: reducing the impact of
> > select(2) and poll(2) when you have zillions of FDs.
>
> select(2) I don't think needs to work for gobs of FDs. In fact, I think it's
> reasonable for select(2) to fail when the number of FDs is greater than, say,
> 1024.
>
> select(2) doesn't appear to be very scalable, and I don't see any reason to
> try to make it so...
>
> But poll(2) should be nice and fast, at worst O(n) for gobs and gobs
> of FDs; I'd like to have poll(2) working for 250,000 FDs at some point.

That O(n) behaviour is exactly the problem: as you get more FDs,
poll(2) takes longer and longer. For 250,000 FDs, you're talking
hundreds of milliseconds! That's each time you get activity on a
FD. This is no good for an HTTP server. I'm aiming for << O(n).

> > Doesn't help with open(2) since the aio_*() functions need a FD to work.
> > I've often wondered why we can't have non-blocking I/O with regular files.
> > It doesn't seem like it would be impossibly hard to implement.
>

> open(2) could reasonably be made to work with O_NONBLOCK.
>
> O_NONBLOCK is meaningless for read/write because it's not clear what should
> happen and how you would make the code work in practice. I think O_NONBLOCK
> for read/write is asking for trouble, unless I can think of a sane API for
> it.

How's that? We have non-blocking I/O for reading and writing right now
for ttys, pipes and sockets. The interface is simple and well
understood. How is extending that to regular files a conceptual
problem?

> Followups should probably be off the list.

Erm, still looks on-topic to me (because of the part about
non-blocking I/O for regular files).

Zachary Amsden, Jul 21, 1998:

-----Original Message-----
From: Richard Gooch <Richar...@atnf.CSIRO.AU>
To: Chris Wedgwood <ch...@cybernet.co.nz>
Cc: Alan Cox <al...@lxorguk.ukuu.org.uk>; sr...@wipinfo.soft.net <sr...@wipinfo.soft.net>; linux-...@vger.rutgers.edu <linux-...@vger.rutgers.edu>
Date: Tuesday, July 21, 1998 5:17 AM
Subject: Re: Non-blocking I/O


>> O_NONBLOCK is meaningless for read/write because it's not clear what
>> should happen and how you would make the code work in practice. I think
>> O_NONBLOCK for read/write is asking for trouble, unless I can think of a
>> sane API for it.
>
>How's that? We have non-blocking I/O for reading and writing right now
>for ttys, pipes and sockets. The interface is simple and well
>understood. How is extending that to regular files a conceptual
>problem?


Should be very easy indeed, just check for O_NONBLOCK when you are going to block a process waiting for disk I/O, and return the number of bytes read so far. Of course, all I/Os that you return EWOULDBLOCK on need to be scheduled so that at some point in the future they won't block. If the buffer cache locks in pages with pending transfer to userspace, I suppose it would also be wise to check for misbehaving processes chewing up a whole bunch of buffer cache with nonblocking I/O requests that they never service with some kind of timeout mechanism on the locks.

Zachary Amsden
ams...@andrew.cmu.edu

Chris Wedgwood, Jul 21, 1998:

On Tue, Jul 21, 1998 at 09:34:54AM -0400, Zachary Amsden wrote:

[I reformatted this because your evil nasty mta did bad things]

> Should be very easy indeed, just check for O_NONBLOCK when you are going
> to block a process waiting for disk I/O, and return the number of bytes
> read so far. Of course, all I/Os that you return EWOULDBLOCK on need to
> be scheduled so that at some point in the future they won't block. If the
> buffer cache locks in pages with pending transfer to userspace, I suppose
> it would also be wise to check for misbehaving processes chewing up a
> whole bunch of buffer cache with nonblocking I/O requests that they never
> service with some kind of timeout mechanism on the locks.

For it to be useful, it needs to be made to work with select(2) and poll(2)
much the same as sockets do.

This looks decidedly difficult to me.... <pause>.

Actually, maybe not. Right now, I'm using a wrapper around libc which uses a
pool of threads for the IO; I could also wrap select and poll in a similar
way...

hmm... what seemed very difficult now doesn't seem too hard. Will see later
perhaps.


-cw

Matti Aarnio, Jul 21, 1998:

Chris Wedgwood <ch...@cybernet.co.nz> wrote:
> On Tue, Jul 21, 1998 at 09:34:54AM -0400, Zachary Amsden wrote:
> [I reformatted this because you evil nasty mta did bad things]

Hmm... What do you mean by that? (I am curious, as one of my
larger hats is MTA writer. Reply privately, this is off-topic for
Linux.)

> > Should be very easy indeed, just check for O_NONBLOCK when you are going
> > to block a process waiting for disk I/O, and return the number of bytes
> > read so far. Of course, all I/Os that you return EWOULDBLOCK on need to
> > be scheduled so that at some point in the future they won't block. If the
> > buffer cache locks in pages with pending transfer to userspace, I suppose
> > it would also be wise to check for misbehaving processes chewing up a
> > whole bunch of buffer cache with nonblocking I/O requests that they never
> > service with some kind of timeout mechanism on the locks.
>
> For it to be useful, it needs to be made to work with select(2) and poll(2)
> much the same as sockets do.
>
> This looks decidedly difficult to me.... <pause>.
>
> Actually, maybe not. Right now, I'm using a wrapper around libc which uses a
> pool of threads for the IO; I could also wrap select and poll in a similar
> way...

That is probably the only way to handle things which are not
non-blockable. Especially open(), which can't return before the
directory lookups and the lowest-level file open succeed or fail.

While the network level can do connect() in fully non-blocked mode,
the same is not quite so easy for open().
Or could it be? Davem? Stephen?

File I/O on regular files is convertible into non-blocking. Consider
for example the current system of waiting for indirection blocks or data
blocks deep within the filesystem code. (Say, average random block access
on the disk takes about 5 milliseconds. Instead of blocking for that,
the system could retry the file I/O in non-blocking mode to wait
for the availability of the file block.)

To realize that does need changes in the filesystem code, though.
A thing to consider in the 2.3.* series.

> hmm... what seemed very difficult now doesn't seem to hard. Will see later
> perhaps.
>
> -cw

/Matti Aarnio <matti....@sonera.fi>

Chris Wedgwood, Jul 21, 1998:

On Tue, Jul 21, 1998 at 08:51:14PM +0300, Matti Aarnio wrote:

> Hmm... What do you mean with that ? (I am curious, as one of my
> larger hats is MTA writer.. Reply privately, this is off-topic to
> Linux. )

Line length >> 80 chars. Just a nit-pick. Technically nothing wrong with it.
I think you were using M$ Outlook or whatever, which would explain it.

> File-IO on regular files is convertable into non-blocking. Consider for
> example current system of waiting for indirection blocks or data-blocks
> deep within filesystem codes. (Say, average random block access on the
> disk takes about 5 milliseconds. Instead of blocking for that the system
> could do retrying of the file-io in non-block mode to wait for the
> availability of the file block.)

You need some kind of schedulable context for non-blocking file open, read,
write and close. Consider a process opening, and reading, 10MB of data off a
_really_ slow device, or NFS for example.

Also, what does non-blocking really mean? For network IO, it means the process
never sleeps waiting for data - something that can also happen with files
(e.g. NFS); however, for disk-based file IO, the process will enter
uninterruptible sleep - is this blocked?

A network call, when sleeping, can be aborted with a signal; that's not
necessarily possible with block-device sleeping IO.

And all this is very well, but if you can't tell when the IO finished in a
nice efficient way, it's not much good. This means changing the semantics of
select(2) and poll(2) for filesystem FDs, which may violate POSIX (Anyone?).

> To realize that does need changes in the filesystem codes, though.

Why?

> A thing to consider in 2.3.* series.

I'm not so sure any kernel changes are required. I started out thinking that
way, but now I think it may be possible to make do with libpthread and some
wrapping around libc. (Albeit somewhat suboptimal; that way you'll have to
pull apart fdsets and so on.)

-cw

Chris Wedgwood, Jul 22, 1998:

[OFFTOPIC - but I have had responses to this off the list]

On Wed, Jul 22, 1998 at 06:02:38AM +1200, Chris Wedgwood wrote:
> On Tue, Jul 21, 1998 at 08:51:14PM +0300, Matti Aarnio wrote:
>
> > Hmm... What do you mean with that ? (I am curious, as one of my
> > larger hats is MTA writer.. Reply privately, this is off-topic to
> > Linux. )
>
> Line length >> 80 chars. Just a nit-pick. Technically nothing wrong with it.
> I think you were using M$ Outlook or whatever, which would explain it.

I meant MUA.
