Using epoll with real files

Sebastian Hauer

Jan 29, 2010, 5:10:34 PM

Hi,
I was wondering if it is possible to use epoll with "real" (non-socket)
file descriptors. I am unsuccessfully trying to use it to do some
non-blocking file IO.

The code below fails to add the fd to the epoll event set (I am getting
an EPERM errno):

-----------------------
#include <sys/epoll.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>

int main(int argc, char *argv[]) {
    int fd = open("etest.log", O_CREAT | O_WRONLY | O_APPEND,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
    if (fd < 0) {
        perror("Failed to open file");
        return 1;
    }

    int efd = epoll_create(5);
    if (efd < 0) {
        perror("Failed to create epoll instance");
        return 1;
    }

    struct epoll_event ev = { 0, { 0 } };
    ev.events = EPOLLOUT;
    ev.data.fd = fd;
    /* This is the call that fails with EPERM for a regular file. */
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        perror("Failed to add fd to event set");
    }
    return 0;
}
-----------------------

Any help would be greatly appreciated.

Regards,
Sebastian

David Schwartz

Jan 29, 2010, 8:19:40 PM

On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:

> I was wondering if it is possible to use epoll with "real" (non-socket)
> file descriptors. I am unsuccessfully trying to use it to do some
> non-blocking file IO.
>
> The code below fails to add the fd to the epoll event set (I am getting
> an EPERM errno):

What you are trying to do doesn't make any sense.

For a socket, for example, "ready to read" can mean that there's data
that a "read" could return without blocking. However, with a file, the
system has no idea where in the file you'd like to read, so it cannot
know whether that "read" could return without blocking.

For a socket, for example, "ready to write" can mean that the other
side has acknowledged data or enlarged the window or it can mean
there's space in the local socket send buffer. For a file, the system
has no idea where you might want to write in the file and system cache
space is shared and so is unlikely to still be available by the time
you get around to calling write.

So even if you could add a regular file to an epoll set, there would
be no point. If you did ever get a 'ready to read' or 'ready to write'
indication, you would have no idea what it meant.

DS

Josef Moellers

Feb 1, 2010, 3:29:47 AM

David Schwartz wrote:
> On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:
>
>> I was wondering if it is possible to use epoll with "real" (non-socket)
>> file descriptors. I am unsuccessfully trying to use it to do some
>> non-blocking file IO.
>>
>> The code below fails to add the fd to the epoll event set (I am getting
>> an EPERM errno):
>
> What you are trying to do doesn't make any sense.
>
> For a socket, for example, "ready to read" can mean that there's data
> that a "read" could return without blocking. However, with a file, the
> system has no idea where in the file you'd like to read, so it cannot
> know whether that "read" could return without blocking.

IMHO this would make sense.
If the current file offset were smaller than the file size, a "read"
would return data. If it were not, the process could be suspended
until it was. That would make "tail" and programs that need to monitor
some log files a lot easier to write.

Josef
--
These are my personal views and not those of Fujitsu Technology Solutions!
Josef Möllers (Pinguinpfleger bei FTS)
If failure had no penalty success would not be a prize (T. Pratchett)
Company Details: http://de.ts.fujitsu.com/imprint.html

Sebastian Hauer

Feb 1, 2010, 9:46:58 AM

Hello David,

Thank you for your reply.

David Schwartz wrote:
> On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:
>
>> I was wondering if it is possible to use epoll with "real" (non-socket)
>> file descriptors. I am unsuccessfully trying to use it to do some

> What you are trying to do doesn't make any sense.
You are probably right, I might have been looking at this issue too
closely. Maybe AIO would have been a better way to accomplish what I
originally intended, which is to write a simple but fast "non-blocking"
log service. I did not want to use a blocking write from the logging
thread, which led me down this lane, and I started playing with
O_NONBLOCK and epoll.
But at this point my curiosity regarding epoll has simply taken over and
I am trying to understand it better.

> that a "read" could return without blocking. However, with a file, the
> system has no idea where in the file you'd like to read, so it cannot
> know whether that "read" could return without blocking.

Well, as Josef pointed out in his post, for reads I think O_NONBLOCK
would make sense. If you wanted to periodically check for new content to
read, using epoll seems reasonable, except that it does not work in my
tests either.

/* BEGIN: ---- epoll reader, does not work with files !!! ----- */
#include <sys/epoll.h>
#include <unistd.h>

#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>

/* A timeout of -1 makes epoll_wait block until an event arrives. */
static const int WaitForever = -1;
enum { BufSz = 256 };

int main(int argc, char *argv[]) {
    int fd = STDIN_FILENO;
    if (argc > 1)
        fd = open(argv[1], O_RDONLY);

    if (fd < 0) {
        perror("Failed to open file");
        return 1;
    }

    int efd = epoll_create(5);
    if (efd < 0) {
        perror("Failed to create epoll instance");
        return 1;
    }

    struct epoll_event ev = { 0, { 0 } };
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        perror("Failed to add fd to event set");
        return 2;
    }

    int wfds = epoll_wait(efd, &ev, 1, WaitForever);
    if (wfds > 0) {
        char buf[BufSz];
        /* Leave room for the terminator so no data byte is overwritten. */
        ssize_t sz = read(fd, buf, BufSz - 1);
        if (sz > 0) {
            buf[sz] = '\0';
            printf("Read: %s\n", buf);
        } else if (sz < 0) {
            perror("Error while reading");
            return 3;
        }
    } else if (wfds < 0) {
        perror("Failed while waiting");
        return 4;
    }
    return 0;
}
/* END: ---- epoll reader, does not work with files !!! ----- */

Interestingly it does work reading from tty stdin, it also works when
piping to stdin, but not when redirecting a file to stdin.

If I try the same thing using poll(2) it works with files.
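
The pipe-versus-file difference described above can be reproduced in isolation. What follows is only a minimal sketch, assuming Linux; try_epoll_add and try_epoll_add_file are hypothetical helper names introduced for illustration, not part of any API. Each one registers a descriptor with a throwaway epoll instance and hands back the errno value, if any.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Try to register fd for EPOLLIN in a throwaway epoll instance.
 * Returns 0 on success, otherwise the errno value that was reported. */
static int try_epoll_add(int fd)
{
    int efd = epoll_create1(0);
    if (efd < 0)
        return errno;

    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.fd = fd;

    int err = 0;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) < 0)
        err = errno;
    close(efd);
    return err;
}

/* Open (creating if necessary) a regular file at path and try to
 * register it. Returns 0 on success, otherwise an errno value. */
static int try_epoll_add_file(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return errno;
    int err = try_epoll_add(fd);
    close(fd);
    return err;
}
```

Called on the read end of a pipe, try_epoll_add should return 0. Called on a regular file, try_epoll_add_file should return EPERM, the same error reported in the first post.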

> For a socket, for example, "ready to write" can mean that the other
> side has acknowledged data or enlarged the window or it can mean
> there's space in the local socket send buffer. For a file, the system
> has no idea where you might want to write in the file and system cache
> space is shared and so is unlikely to still be available by the time
> you get around to calling write.

Yes intuitively this makes sense.

David Schwartz

Feb 1, 2010, 5:36:57 PM

On Feb 1, 12:29 am, Josef Moellers <josef.moell...@ts.fujitsu.com>
wrote:

> > For a socket, for example, "ready to read" can mean that there's data
> > that a "read" could return without blocking. However, with a file, the
> > system has no idea where in the file you'd like to read, so it cannot
> > know whether that "read" could return without blocking.

> IMHO this would make sense.
> If the current file offset were smaller than the file size, a "read"
> would return data. If it were not, the process could be suspended
> until it was. That would make "tail" and programs that need to monitor
> some log files a lot easier to write.

The problem is that non-blocking I/O isn't specified for regular files
either. So how many bytes have to be ready for the file to be "ready
for read"? If it woke the process with just one byte ready to read,
the process would block on the "read" for more than one byte.

The reason 'select', 'poll', and 'epoll' work so well for sockets is
because sockets have a well-defined non-blocking API. Generally,
people use these functions to discover when to attempt non-blocking
operations in designs where it's important the thread not block.

Because no such non-blocking API exists for local files (other than
AIO which doesn't require discovery anyway), a discovery mechanism
wouldn't really have any use if there was one.

So while "ready for read" could be defined for files in this way, it
would have almost no use. A way to discover that a file had been
modified would be much more useful, and so that is provided.
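
The post does not name the modification-discovery mechanism. On Linux, one candidate is inotify, and unlike a regular file, an inotify descriptor can be added to an epoll set. The following is a rough sketch under that assumption; watch_file, wait_for_change and append_text are hypothetical names chosen for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch path for modifications and register the inotify descriptor
 * with the epoll instance efd. Returns the inotify fd, or -1 on error. */
static int watch_file(int efd, const char *path)
{
    int ifd = inotify_init1(IN_NONBLOCK);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        close(ifd);
        return -1;
    }

    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.fd = ifd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, ifd, &ev) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Wait up to timeout_ms for a modification event.
 * Returns 1 if the file changed, 0 on timeout, -1 on error. */
static int wait_for_change(int efd, int ifd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(efd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n;

    /* Drain the queued events so the next wait blocks again. The
     * descriptor is non-blocking, so read() fails once it is empty. */
    char buf[4096];
    while (read(ifd, buf, sizeof buf) > 0)
        ;
    return 1;
}

/* Writer side, for demonstration only: append text to path,
 * creating the file if needed. Returns 0 on success, -1 on error. */
static int append_text(const char *path, const char *text)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

A "tail"-like program could call wait_for_change in a loop and read from its own descriptor for the file whenever it returns 1. The inotify event only says that the file changed; it carries none of the new data.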

DS

Josef Moellers

Feb 2, 2010, 5:24:25 AM

David Schwartz wrote:
> On Feb 1, 12:29 am, Josef Moellers <josef.moell...@ts.fujitsu.com>
> wrote:
>
>>> For a socket, for example, "ready to read" can mean that there's data
>>> that a "read" could return without blocking. However, with a file, the
>>> system has no idea where in the file you'd like to read, so it cannot
>>> know whether that "read" could return without blocking.
>
>> IMHO this would make sense.
>> If the current file offset were smaller than the file size, a "read"
>> would return data. If it were not, the process could be suspended
>> until it was. That would make "tail" and programs that need to monitor
>> some log files a lot easier to write.
>
> The problem is that non-blocking I/O isn't specified for regular files
> either. So how many bytes have to be ready for the file to be "ready
> for read". If it woke the process with just one byte ready to read,
> the process would block on the "read" for more than one byte.

Would it? IIRC a read on a regular file returns as many bytes as there
are available (up to the specified amount, obviously), returning 0 bytes
at EOF.

Rainer Weikusat

Feb 2, 2010, 7:24:55 AM

Sebastian Hauer <ha...@psicode.com> writes:
> Thank you for your reply.
> David Schwartz wrote:
>> On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:
>>
>>> I was wondering if it is possible to use epoll with "real" (non-socket)
>>> file descriptors. I am unsuccessfully trying to use it to do some
>
>> What you are trying to do doesn't make any sense.
> You are probably right, I might have been looking at this issue too
> closely. Maybe AIO would have been a better way to accomplish what I
> originally intended. Which is to write a simple but fast
> "non-blocking" log service. I did not want to use a blocking write
> from the logging thread which led me down this lane

There is no such thing as 'a blocking write to a regular file'.
Actually, there isn't even 'blocking I/O on regular files' for the
usual meaning of 'blocking' -- process waits until an unpredictable
event which has to be caused by an external source has occurred. A
process/thread which actually waits for disk I/O (that is, desires to
read data which isn't yet cached or desires to force cached data to be
written out to a backing store now) is put into a state which is
called 'uninterruptible sleep' and it is expected that it will always
leave this state shortly afterwards again (NFS doesn't really fit in
here, but let's omit that for now).

[...]

Poll is required to always return 'ready for anything' when being used
with regular files.
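
A small sketch of that behaviour, assuming a POSIX system; poll_now and poll_path_now are hypothetical helper names. They poll a single descriptor with a zero timeout and return whatever readiness bits come back.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Poll one descriptor for readability and writability without waiting.
 * Returns the revents mask, or -1 if poll() itself failed. */
static int poll_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT, .revents = 0 };
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}

/* Convenience wrapper: open path read-only and poll it once.
 * Returns the revents mask, or -1 on error. */
static int poll_path_now(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int revents = poll_now(fd);
    close(fd);
    return revents;
}
```

For a regular file the returned mask should have both POLLIN and POLLOUT set, even if the file is empty and a read would return 0 at EOF. That is why poll(2) "works" here yet says nothing about whether new data has arrived.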

David Schwartz

Feb 2, 2010, 7:28:43 AM

On Feb 2, 2:24 am, Josef Moellers <josef.moell...@ts.fujitsu.com>
wrote:

> > The problem is that non-blocking I/O isn't specified for regular files

> > either. So how many bytes have to be ready for the file to be "ready
> > for read". If it woke the process with just one byte ready to read,
> > the process would block on the "read" for more than one byte.

> Would it? IIRC a read on a regular file returns as many bytes as there
> are available (up to the specified amount, obviously), returning 0 bytes
> at EOF.

The problem is the definition of "available". What does it mean for
data to be "available" in a regular file? Does that mean it is not
past the EOF? Does it mean that it's in cache? Or what?

DS

David Schwartz

Feb 2, 2010, 7:34:11 AM

On Feb 2, 4:24 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:

> There is no such thing as 'a blocking write to a regular file'.
> Actually, there isn't even 'blocking I/O on regular files' for the
> usual meaning of 'blocking' -- process waits until an unpredictable
> event which has to be caused by an external source has occurred. A
> process/thread which actually waits for disk I/O (that is, desires to
> read data which isn't yet cached or desires to force cached data to be
> written out to a backing store now) is put into a state which is
> called 'uninterruptible sleep' and it is expected that it will always
> leave this state shortly afterwards again (NFS doesn't really fit in
> here, but let's omit that for now).

By this logic, sending on a UDP socket wouldn't be a blocking
operation since you're just waiting for space in the local network
card's transmit buffer, if at all. There are, actually, several
aspects to "blocking" that generally go together but don't always.
Usually, no one attribute is definitive.

Normal file I/O is blocking in the sense that the function will not
return, stalling the process as long as needed, until the operation
definitively succeeds or fails. It is expected that this will be
"soon", but that isn't always the case. However, it is not the typical
blocking behavior because the process is not interruptible.

DS

Rainer Weikusat

Feb 2, 2010, 7:37:30 AM

David Schwartz <dav...@webmaster.com> writes:
> On Feb 2, 4:24 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>> There is no such thing as 'a blocking write to a regular file'.
>> Actually, there isn't even 'blocking I/O on regular files' for the
>> usual meaning of 'blocking' -- process waits until an unpredictable
>> event which has to be caused by an external source has occurred. A
>> process/thread which actually waits for disk I/O (that is, desires to
>> read data which isn't yet cached or desires to force cached data to be
>> written out to a backing store now) is put into a state which is
>> called 'uninterruptible sleep' and it is expected that it will always
>> leave this state shortly afterwards again (NFS doesn't really fit in
>> here, but let's omit that for now).
>
> By this logic, sending on a UDP socket wouldn't be a blocking
> operation since you're just waiting for space in the local network
> card's transmit buffer, if at all.

Sending on a UDP socket isn't usually 'a blocking operation' precisely
because of this (or wasn't historically).

[...]

> Normal file I/O is blocking in the sense that the function will not
> return, stalling the process as long as needed, until the operation
> definitively succeeds or fails. It is expected that this will be
> "soon", but that isn't always the case. However, it is not the typical
> blocking behavior because the process is not interruptible.

Yes, and 'strlen' is a blocking operation in the sense that execution
of other code will be stalled until the length of the string has been
calculated. Consequently, why can't I use epoll to wait for strlen?

Sorry for the sarcasm, but the issue is complicated (apparently)
enough already, without diffusing all terms up to the point of
complete meaninglessness.

Josef Moellers

Feb 2, 2010, 9:46:47 AM

David Schwartz wrote:
> On Feb 2, 2:24 am, Josef Moellers <josef.moell...@ts.fujitsu.com>
> wrote:
>
>>> The problem is that non-blocking I/O isn't specified for regular files
>>> either. So how many bytes have to be ready for the file to be "ready
>>> for read". If it woke the process with just one byte ready to read,
>>> the process would block on the "read" for more than one byte.
>
>> Would it? IIRC a read on a regular file returns as many bytes as there
>> are available (up to the specified amount, obviously), returning 0 bytes
>> at EOF.
>
> The problem is the definition of "available".

You wrote "the process would block on the "read" for more than one
byte." which is definitely incorrect. If it woke the process with just
one byte ready to read, then the process would read just that one byte.

> What does it mean for
> data to be "available" in a regular file? Does that mean it is not
> past the EOF? Does it mean that it's in cache? Or what?

That, indeed, is to be defined. However, IMHO a suitable definition
could be "the offset of the open file is less than the file's size".
In other words: a read() on the file descriptor would return with a
non-zero, positive value.
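
That proposed definition can already be checked from user space. This is a minimal sketch, assuming a seekable regular file; has_unread_data and open_at_end are hypothetical names, and the check is a snapshot that another process can invalidate by truncating the file.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the descriptor's current offset is before end-of-file,
 * 0 if it is at or past EOF, and -1 on error (e.g. fd is a pipe). */
static int has_unread_data(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;
    return pos < st.st_size;
}

/* Open path read-only and position the offset at the current end of
 * the file, as a "tail"-like reader would. Returns the fd or -1. */
static int open_at_end(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, 0, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A reader opened with open_at_end sees has_unread_data return 0 until a writer appends to the file, after which it returns 1 until the reader has consumed the new bytes.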

David Schwartz

Feb 2, 2010, 10:59:51 AM

On Feb 2, 6:46 am, Josef Moellers <josef.moell...@ts.fujitsu.com>
wrote:

> > What does it mean for

> > data to be "available" in a regular file? Does that mean it is not
> > past the EOF? Does it mean that it's in cache? Or what?

> That, indeed, is to be defined. However, IMHO a suitable definition
> could be "the offset of the open file is less than the file's size".
> In other words: a read() on the file descriptor would return with a
> non-zero, positive value.

This would be really, really strange. It would only be useful if you
were trying to detect file modification, which there is already a
sensible way to accomplish. It is much more sensible for it to
indicate that the data can be read without having to wait for the
disk.

DS

David Schwartz

Feb 2, 2010, 11:58:47 AM

On Feb 2, 4:37 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:

> Sorry for the sarcasm, but the issue is complicated (apparently)
> enough already, without diffusing all terms up to the point of
> complete meaninglessness.

Right, but trying to sort the many, varied cases into two categories
isn't very helpful either.

There are operations that are purely blocking and there are operations
that are purely non-blocking. But most real-world operations have some
aspects of each. There are things that the operation will always wait
for, things the operation will never wait for, and things that the
operation will wait for unless it's interrupted.

I gather you want to use the term "blocking" only for operations that
remain indefinitely in an interruptible state until an external event
occurs? Honestly, I don't think that's helpful because it will make
people think that operations that are not "blocking" in that sense
will not "block" in the ordinary sense.

Telling the OP "you don't need epoll, files are always non-blocking"
would not have been useful.

Especially since that's not the complete answer. A key issue is there
is no clear semantic meaning to "ready to read" with respect to a
normal file. Some may want it to mean that their pointer is no longer
at the EOF (as seems to be the case here). Some may want it to mean
that the data will not need to be read from the disk (as most people
claim to want).

DS

Kaz Kylheku

Feb 2, 2010, 3:47:11 PM

On 2010-02-02, Rainer Weikusat <rwei...@mssgmbh.com> wrote:
> Sebastian Hauer <ha...@psicode.com> writes:
>> Thank you for your reply.
>> David Schwartz wrote:
>>> On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:
>>>
>>>> I was wondering if it is possible to use epoll with "real" (non-socket)
>>>> file descriptors. I am unsuccessfully trying to use it to do some
>>
>>> What you are trying to do doesn't make any sense.
>> You are probably right, I might have been looking at this issue too
>> closely. Maybe AIO would have been a better way to accomplish what I
>> originally intended. Which is to write a simple but fast
>> "non-blocking" log service. I did not want to use a blocking write
>> from the logging thread which led me down this lane
>
> There is no such thing as 'a blocking write to a regular file'.

When a regular file is opened, the O_SYNC flag can be supplied to the open
system call, whose intent is to arrange for blocking writes, which wait until
the data is actually on disk.

Moreover, even non-synchronous writes to the cache are not always non-blocking.
This is only the case when a buffer is immediately available for the write.

David Schwartz

Feb 3, 2010, 2:50:25 AM

On Feb 2, 12:47 pm, Kaz Kylheku <kkylh...@gmail.com> wrote:

> When a regular file is opened, the O_SYNC flag can be supplied to the open
> system call, whose intent is to arrange for blocking writes, which wait until
> the data is actually on disk.
>
> Moreover, even non-synchronous writes to the cache are not always non-blocking.
> This is only the case when a buffer is immediately available for the write.

Right, which explains why you can't use anything like 'epoll' to make
them non-blocking. In the O_SYNC case, it would defeat the point of
O_SYNC. In the "out of buffer space" case, there is no guarantee that
by the time you get back around to calling 'write' the buffer space is
still available and no way to have a write to a regular file that
won't wait for buffer space.

In sum, you'd need a lot of pieces to make this useful, and many of
those pieces do not exist. However, there is AIO, I/O threads, and
things like FAM, depending on what you really want.
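
The post does not spell out what an "I/O thread" design would look like for the log service. One possible shape is sketched below, with a helper process standing in for the thread so that no threading library is needed. All names are hypothetical. The logging side writes to a non-blocking pipe, which epoll can monitor, and only the helper ever waits on the disk.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a helper that copies everything arriving on a pipe into path.
 * Stores the helper's pid in *pid and returns the non-blocking write
 * end of the pipe, or -1 on error. */
static int start_log_writer(const char *path, pid_t *pid)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    *pid = fork();
    if (*pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (*pid == 0) {                 /* helper: allowed to wait on the disk */
        close(p[1]);
        int out = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
        if (out < 0)
            _exit(1);
        char buf[4096];
        ssize_t n;
        while ((n = read(p[0], buf, sizeof buf)) > 0) {
            if (write(out, buf, (size_t)n) != n)
                _exit(1);
        }
        _exit(n < 0);                /* 0 on clean EOF, 1 on read error */
    }

    close(p[0]);
    /* With O_NONBLOCK a full pipe yields EAGAIN instead of a stall. */
    int flags = fcntl(p[1], F_GETFL);
    if (flags < 0 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(p[1]);
        return -1;
    }
    return p[1];
}

/* Close the pipe so the helper sees EOF, then wait for it to finish.
 * Returns the helper's exit status, or -1 on error. */
static int stop_log_writer(int wfd, pid_t pid)
{
    int status;
    close(wfd);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The caller still has to decide what to do when write() on the pipe fails with EAGAIN: drop the message, buffer it, or wait for EPOLLOUT on the pipe. This sketch leaves that policy open.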

DS

Rainer Weikusat

Feb 3, 2010, 8:43:41 AM

Kaz Kylheku <kkyl...@gmail.com> writes:
> On 2010-02-02, Rainer Weikusat <rwei...@mssgmbh.com> wrote:
>> Sebastian Hauer <ha...@psicode.com> writes:
>>> Thank you for your reply.
>>> David Schwartz wrote:
>>>> On Jan 29, 2:10 pm, Sebastian Hauer <ha...@psicode.com> wrote:
>>>>
>>>>> I was wondering if it is possible to use epoll with "real" (non-socket)
>>>>> file descriptors. I am unsuccessfully trying to use it to do some
>>>
>>>> What you are trying to do doesn't make any sense.
>>> You are probably right, I might have been looking at this issue too
>>> closely. Maybe AIO would have been a better way to accomplish what I
>>> originally intended. Which is to write a simple but fast
>>> "non-blocking" log service. I did not want to use a blocking write
>>> from the logging thread which led me down this lane
>>
>> There is no such thing as 'a blocking write to a regular file'.
>
> When a regular file is opened, the O_SYNC flag can be supplied to the open
> system call, whose intent is to arrange for blocking writes, which wait until
> the data is actually on disk.

And the name of this flag is O_SYNC and not O_BLOCK because it is not
the opposite of O_NONBLOCK: The purpose is not to have the process
block until 'something of interest' has happened, whenever that might
be, but to write data to some 'backing store medium'
synchronously. SUS defines synchronized I/O completion as

The state of an I/O operation that has either been
successfully transferred or diagnosed as unsuccessful.

Additionally, this enforces an ordering on independent write
requests.

Much of the recurring confusion about this could be avoided if the
term 'blocking' were not used to describe two completely different
scenarios:

- blocking as opposite of O_NONBLOCK mode, meaning, the
process is put to sleep for an indefinite time in the kernel

- blocking in the sense of 'the process needs to wait for a
"slow", locally attached peripheral' which, except that it
is 'slow', is expected to behave in completely deterministic
ways

The kernel contains a considerable amount of code whose only purpose
is to avoid the second case whenever possible and generally,
applications are not supposed to be concerned with that, while they
necessarily have to be concerned with the first case.