Reading a directory asynchronously: getdents()?

Everton da Silva Marques

Apr 16, 2003, 2:25:54 PM

The task at hand is to read a directory's contents in an asynchronous way;
i.e. it's not acceptable to block on readdir() or the like.

I'm considering using open(), select(), getdents(), etc.

However, under Linux/gcc/glibc, my attempt to use getdents() fails to compile:

cc1: warnings being treated as errors
dir.c: In function `on_dir_read':
dir.c:88: warning: implicit declaration of function `getdents'
make: *** [dir.o] Error 1

The getdents() function is documented in the man pages, but
it seems it should not be used by applications.

How could directories be read asynchronously, other than by
using getdents()?
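
On the compile error itself: the getdents(2) man page notes that glibc provides no wrapper for this call, so the usual workaround is to go through syscall(). The following is a minimal sketch, assuming Linux and the 64-bit variant of the call. The struct layout is copied by hand from the man page, and count_entries() is a hypothetical helper name. It only shows how to get the call to compile and run; it does nothing about blocking.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout as documented in getdents(2); declared by hand because
 * no public header is assumed to provide it. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to next entry */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated name */
};

/* Hypothetical helper: count the entries in `path` (including "." and
 * ".." where the filesystem reports them). Returns -1 on error. */
static int count_entries(const char *path)
{
    _Alignas(struct linux_dirent64) char buf[4096];
    int count = 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    for (;;) {
        /* Raw system call; this blocks like any other directory read. */
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)           /* end of directory */
            break;
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            count++;
            off += d->d_reclen;
        }
    }
    close(fd);
    return count;
}
```

For ordinary application code, readdir() remains the portable interface; the raw call is Linux-specific.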

vi...@parcelfarce.linux.theplanet.co.uk

Apr 16, 2003, 5:14:39 PM

In article <cc8d795c.03041...@posting.google.com>,

Everton da Silva Marques <ever...@yahoo.com.br> wrote:
>The task at hand is to read a directory's contents in an asynchronous way;
>i.e. it's not acceptable to block on readdir() or the like.
>
>I'm considering using open(), select(), getdents(), etc.

getdents() is _not_ asynchronous.

Everton da Silva Marques

Apr 17, 2003, 2:21:07 PM

vi...@parcelfarce.linux.theplanet.co.uk wrote in message news:<b7kh3v$ako$1...@parcelfarce.linux.theplanet.co.uk>...

I used the term 'asynchronous' in the sense of
relying on event notification (select(), poll(), etc.)
of read availability on the file descriptor.

Is getdents() available to applications? Are there
other alternatives?

vi...@parcelfarce.linux.theplanet.co.uk

Apr 17, 2003, 10:17:28 PM

In article <cc8d795c.03041...@posting.google.com>,
Everton da Silva Marques <ever...@yahoo.com.br> wrote:
>vi...@parcelfarce.linux.theplanet.co.uk wrote in message news:<b7kh3v$ako$1...@parcelfarce.linux.theplanet.co.uk>...
>> In article <cc8d795c.03041...@posting.google.com>,
>> Everton da Silva Marques <ever...@yahoo.com.br> wrote:
>> >The task at hand is to read a directory's contents in an asynchronous way;
>> >i.e. it's not acceptable to block on readdir() or the like.
>> >
>> >I'm considering using open(), select(), getdents(), etc.
>>
>> getdents() is _not_ asynchronous.
>
>I used the term 'asynchronous' in the sense of
>relying on event notification (select(), poll(), etc.)
>of read availability on the file descriptor.

Again, getdents() is not asynchronous in that sense. Moreover, there is
no nonblocking mechanism for reading directory contents - never has
been and not likely to appear; too much locking-related PITA to implement
and no visible gain.

What are you actually trying to do?

Kasper Dupont

Apr 18, 2003, 4:11:51 AM

vi...@parcelfarce.linux.theplanet.co.uk wrote:
>
> Again, getdents() is not asynchronous in that sense. Moreover, there is
> no nonblocking mechanism for reading directory contents - never has
> been and not likely to appear; too much locking-related PITA to implement
> and no visible gain.

If you need it, you could fork a process that will do synchronous reading
of directories, and then do asynchronous communication with that process.
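
A minimal sketch of that idea, assuming a pipe as the communication channel and one name per line as the (made-up) wire format. spawn_dir_reader() and drain_with_select() are hypothetical names; the second merely stands in for a real select() event loop, which would have other descriptors to watch as well.

```c
#include <dirent.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that lists `path`, writing one name per line to a pipe.
 * Returns the read end of the pipe, or -1 on error. */
static int spawn_dir_reader(const char *path, pid_t *child)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: blocking is harmless here */
        DIR *dir;
        struct dirent *ent;

        close(fds[0]);
        dir = opendir(path);
        if (dir == NULL)
            _exit(1);
        while ((ent = readdir(dir)) != NULL) {
            if (write(fds[1], ent->d_name, strlen(ent->d_name)) < 0 ||
                write(fds[1], "\n", 1) < 0)
                _exit(1);
        }
        closedir(dir);
        _exit(0);
    }
    close(fds[1]);                  /* parent keeps only the read end */
    *child = pid;
    return fds[0];
}

/* Stand-in for the event loop: wait in select(), read until EOF.
 * Stores at most `cap` bytes in `out`; returns bytes stored, -1 on error. */
static long drain_with_select(int fd, char *out, size_t cap)
{
    size_t total = 0;

    for (;;) {
        fd_set rfds;
        char chunk[512];
        ssize_t n;
        size_t room, take;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;
        n = read(fd, chunk, sizeof chunk);
        if (n < 0)
            return -1;
        if (n == 0)                 /* child closed its end: listing done */
            return (long)total;
        room = cap - total;
        take = (size_t)n < room ? (size_t)n : room;
        memcpy(out + total, chunk, take);
        total += take;
    }
}
```

Unlike a directory descriptor, the pipe becomes readable only when the child has actually produced data, so select() on it is meaningful. The parent should still waitpid() for the child once the pipe reports EOF.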

--
Kasper Dupont -- who spends too much time on usenet.
For sending spam use mailto:aaa...@daimi.au.dk
for(_=52;_;(_%5)||(_/=5),(_%5)&&(_-=2))putchar(_);

David W Noon

Apr 18, 2003, 6:18:47 AM

On Friday 18 Apr 2003 09:11 in <3E9FB347...@daimi.au.dk>, Kasper Dupont
(kas...@daimi.au.dk) wrote:

> vi...@parcelfarce.linux.theplanet.co.uk wrote:
>>
>> Again, getdents() is not asynchronous in that sense. Moreover, there is
>> no nonblocking mechanism for reading directory contents - never has
>> been and not likely to appear; too much locking-related PITA to implement
>> and no visible gain.
>
> If you need it, you could fork a process that will do synchronous reading
> of directories, and then do asynchronous communication with that process.

A thread would be even simpler. Welcome to the world of OS/2!

This secondary thread approach has been /de rigueur/ in OS/2 development for
asynchronous I/O ever since IBM revealed that the API that provided this
facility in early (i.e. 16-bit) versions of OS/2 simply wrapped a secondary
thread.

There is no reason why Linux developers cannot use this same approach. The
threads even share virtual addresses, so there is no need to have a special
shared memory area for data exchange. Much simpler than forking out.

--
Regards,

Dave
======================================================
dwn...@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply via e-mail.
======================================================

Kasper Dupont

Apr 18, 2003, 8:43:51 AM

David W Noon wrote:
>
> On Friday 18 Apr 2003 09:11 in <3E9FB347...@daimi.au.dk>, Kasper Dupont
> (kas...@daimi.au.dk) wrote:
>
> > vi...@parcelfarce.linux.theplanet.co.uk wrote:
> >>
> >> Again, getdents() is not asynchronous in that sense. Moreover, there is
> >> no nonblocking mechanism for reading directory contents - never has
> >> been and not likely to appear; too much locking-related PITA to implement
> >> and no visible gain.
> >
> > If you need it, you could fork a process that will do synchronous reading
> > of directories, and then do asynchronous communication with that process.
>
> A thread would be even simpler. Welcome to the world of OS/2!

Sure, a thread has advantages:
- Speed
- Same current directory, which is nice if the directory-reading thread is reused
- No need to transfer data between threads, apart from a little synchronization.

But there are disadvantages as well:
- Portability
- Possible race conditions
- Possible non-reentrant library functions

vi...@parcelfarce.linux.theplanet.co.uk

Apr 18, 2003, 8:49:27 AM

In article <7105n-...@my-pc.ntlworld.com>,

David W Noon <dwn...@spamtrap.ntlworld.com> wrote:
>On Friday 18 Apr 2003 09:11 in <3E9FB347...@daimi.au.dk>, Kasper Dupont
>(kas...@daimi.au.dk) wrote:
>
>> vi...@parcelfarce.linux.theplanet.co.uk wrote:
>>>
>>> Again, getdents() is not asynchronous in that sense. Moreover, there is
>>> no nonblocking mechanism for reading directory contents - never has
>>> been and not likely to appear; too much locking-related PITA to implement
>>> and no visible gain.
>>
>> If you need it, you could fork a process that will do synchronous reading
>> of directories, and then do asynchronous communication with that process.
>
>A thread would be even simpler. Welcome to the world of OS/2!

Threads _are_ processes on Linux. OS/2 had been misdesigned badly
enough to require separation between these (mostly due to 286-induced
braindamage early in its history). Linux had avoided that [tc]rap.

It's not immediately obvious whether it's better to share address
spaces or to have them separate in this case - depends on the way
you implement passing the data, exclusion and notification mechanism.
Might be either way.

David W Noon

Apr 18, 2003, 2:51:00 PM

On Friday 18 Apr 2003 13:49 in
<b7os8n$ckd$1...@parcelfarce.linux.theplanet.co.uk>,
vi...@parcelfarce.linux.theplanet.co.uk
(vi...@parcelfarce.linux.theplanet.co.uk) wrote:

> In article <7105n-...@my-pc.ntlworld.com>,
> David W Noon <dwn...@spamtrap.ntlworld.com> wrote:
>>On Friday 18 Apr 2003 09:11 in <3E9FB347...@daimi.au.dk>, Kasper
>>Dupont (kas...@daimi.au.dk) wrote:
>>
>>> vi...@parcelfarce.linux.theplanet.co.uk wrote:
>>>>
>>>> Again, getdents() is not asynchronous in that sense. Moreover, there
>>>> is no nonblocking mechanism for reading directory contents - never has
>>>> been and not likely to appear; too much locking-related PITA to
>>>> implement and no visible gain.
>>>
>>> If you need it, you could fork a process that will do synchronous
>>> reading of directories, and then do asynchronous communication with
>>> that process.
>>
>>A thread would be even simpler. Welcome to the world of OS/2!
>
> Threads _are_ processes on Linux.

They have separate PIDs but share virtual storage. From a memory management
point of view, a thread is nowhere near as expensive as a forked process.

> OS/2 had been misdesigned badly
> enough to require separation between these (mostly due to 286-induced
> braindamage early in its history). Linux had avoided that [tc]rap.

That's a matter of opinion. If you look at the systems on which concurrent
subroutine execution (originally called "subtasking") arose, you will find
that none of them implement it the way Linux does. If that approach were
wrong then I would expect after 40 years it would have been fixed. [That's
right, mainframes were doing concurrent subtasks in the early-to-mid
1960s.]

I am not advocating OS/2 or mainframes here, and this is not an advocacy
newsgroup. I am not even saying that the Linux threading model is
intrinsically defective -- quite the opposite. The current Linux threading
model works well enough that it is generally preferable to forking when the
asynchronous activities are closely related.

[If you re-read my original post, you will see that my thrust was that the
use of threads has been a software design requirement for OS/2 developers
for a decade or more. Since I'm a mainframe dinosaur, perhaps I should
have mentioned OS/360 and OS/VS1 as well, but I expected few would
recognize those names and those systems did not impose the requirement of
concurrent execution the way OS/2 does. My post was about multi-threaded
application design, not subtasking/threading models as implemented by any
given OS.]

> It's not immediately obvious whether it's better to share address
> spaces or to have them separate in this case - depends on the way
> you implement passing the data, exclusion and notification mechanism.
> Might be either way.

The thread routine is, in practice, a subroutine of the calling program; it
just runs asynchronously. I cannot fathom how anybody could conclude that
address space separation is in any way beneficial in that scenario. But it
certainly makes data exchange slower.

My [rule-of-thumb] approach is to use fork() when I want to do something
completely different from the current task and the current task is not
dependent on the results. Otherwise I use a thread. Like most
rules-of-thumb, exceptions sometimes arise.

After all, this message thread is about reading a filesystem directory
asynchronously -- a trivial task -- not running a separate
DB2/Oracle/whatever subsystem (or anything else major). A simple thread
routine that wraps scandir() would probably be ideal.

Kai Henningsen

Apr 19, 2003, 8:20:00 AM

dwn...@spamtrap.ntlworld.com (David W Noon) wrote on 18.04.03 in <l1u5n-...@my-pc.ntlworld.com>:

> On Friday 18 Apr 2003 13:49 in
> <b7os8n$ckd$1...@parcelfarce.linux.theplanet.co.uk>,
> vi...@parcelfarce.linux.theplanet.co.uk
> (vi...@parcelfarce.linux.theplanet.co.uk) wrote:

> > OS/2 had been misdesigned badly
> > enough to require separation between these (mostly due to 286-induced
> > braindamage early in its history). Linux had avoided that [tc]rap.
>
> That's a matter of opinion. If you look at the systems on which concurrent
> subroutine execution (originally called "subtasking") arose, you will find
> that none of them implement it the way Linux does. If that approach were
> wrong then I would expect after 40 years it would have been fixed. [That's
> right, mainframes were doing concurrent subtasks in the early-to-mid
> 1960s.]

And it has been so fixed.

In Linux.

Which, incidentally, does also run on mainframes these days.

Of course, the real point might well be that Linux processes are often
faster than other OSes' threads ...

> [If you re-read my original post, you will see that my thrust was that the
> use of threads has been a software design requirement for OS/2 developers
> for a decade or more. Since I'm a mainframe dinosaur, perhaps I should
> have mentioned OS/360 and OS/VS1 as well, but I expected few would

As long as you don't mention (mainframe) DOS ...

> > It's not immediately obvious whether it's better to share address
> > spaces or to have them separate in this case - depends on the way
> > you implement passing the data, exclusion and notification mechanism.
> > Might be either way.
>
> The thread routine is, in practice, a subroutine of the calling program; it
> just runs asynchronously. I cannot fathom how anybody could conclude that
> address space separation is in any way beneficial in that scenario. But it
> certainly makes data exchange slower.

Well, the fact that it *is* a subroutine is already a design decision.

With DNS processing (related but even more likely to be slow), the separate
process model seems to be fairly widespread - though that may be a
portability issue.

> My [rule-of-thumb] approach is to use fork() when I want to do something
> completely different from the current task and the current task is not
> dependent on the results. Otherwise I use a thread. Like most
> rules-of-thumb, exceptions sometimes arise.

"Completely different" is, of course, subjective. You might, for example,
want a common directory reading daemon which is usable from a dozen
different tasks, and implements caching.

Or you might not.

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)

Everton da Silva Marques

Apr 22, 2003, 11:37:21 AM

vi...@parcelfarce.linux.theplanet.co.uk wrote in message news:<b7nn7o$d1b$1...@parcelfarce.linux.theplanet.co.uk>...

> In article <cc8d795c.03041...@posting.google.com>,
> Everton da Silva Marques <ever...@yahoo.com.br> wrote:
> >vi...@parcelfarce.linux.theplanet.co.uk wrote in message news:<b7kh3v$ako$1...@parcelfarce.linux.theplanet.co.uk>...
> >> In article <cc8d795c.03041...@posting.google.com>,
> >> Everton da Silva Marques <ever...@yahoo.com.br> wrote:
> >> >The task at hand is to read a directory's contents in an asynchronous way;
> >> >i.e. it's not acceptable to block on readdir() or the like.
> >> >
> >> >I'm considering using open(), select(), getdents(), etc.
> >>
> >> getdents() is _not_ asynchronous.
> >
> >I used the term 'asynchronous' in the sense of
> >relying on event notification (select(), poll(), etc.)
> >of read availability on the file descriptor.
>
> Again, getdents() is not asynchronous in that sense. Moreover, there is
> no nonblocking mechanism for reading directory contents - never has
> been and not likely to appear; too much locking-related PITA to implement
> and no visible gain.
>
> What are you actually trying to do?

I was trying to read a directory inside a select()-driven event loop,
and had assumed that monitoring the directory FD for readability was the
correct approach. Now I believe that monitoring a file's FD for read
availability is ineffective, as files are always "ready" to be read.
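
A small sketch that illustrates the point, assuming Linux behaviour: select() with a zero timeout reports a freshly opened directory as readable at once, whereas the read end of an empty pipe is not reported until data arrives. All three function names are hypothetical.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select() reports `fd` readable right now, 0 if not,
 * -1 on error. The zero timeout makes this a pure poll. */
static int reported_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Open a directory and ask select() about its descriptor. */
static int dir_reported_readable(const char *path)
{
    int fd = open(path, O_RDONLY);
    int r;

    if (fd < 0)
        return -1;
    r = reported_readable(fd);
    close(fd);
    return r;
}

/* Contrast case: an empty pipe has nothing to read yet. */
static int empty_pipe_reported_readable(void)
{
    int fds[2];
    int r;

    if (pipe(fds) < 0)
        return -1;
    r = reported_readable(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return r;
}
```

The "ready" answer for the directory says nothing about whether the subsequent read will have to wait for the disk, which is why select() cannot be used to avoid blocking there.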

Maybe multi-threading offers a better mechanism for reading files
(and directories) asynchronously.