Does directio(fd, DIRECTIO_ON) survive the file descriptor?

Paul Eggert

Apr 13, 2004, 6:22:36 PM
The directio(3C) man page says

"The advice argument is kept per file; the last caller of
directio() sets the advice for all applications using the file
associated with fildes."

I can see two ways to interpret this.

First, there's a per-file flag that any process can set or clear, and
this flag persists until the next reboot.

Second, if any process has an open file descriptor that has the
DIRECTIO_ON flag set, then all processes that read or write to the
corresponding file use directio semantics and bypass the buffer cache;
otherwise, they all use the default semantics.

Which interpretation is right? Or is there some other interpretation
that is even better?

Richard L. Hamilton

Apr 13, 2004, 7:28:22 PM
In article <7whdvno...@sic.twinsun.com>,

The flag recording whether directio is enabled for a particular file is
in the in-core ufs inode data structure (see sys/fs/ufs_inode.h) or in the
in-core nfs rnode data structure (see nfs/rnode.h). So it remains set as
long as the inode(rnode) remains in-core: while the file remains
continuously in use in any way by any process, and perhaps also while the
file is in use by the kernel itself (e.g. the current traditional process
accounting file), or while cached pages are still on the freelist even
with nobody using the file (but if the flag was set, would there be any?).
When the kernel frees the in-core inode(rnode) (because
nothing is using it), presumably the flag's state would be forgotten.

As an aside, remember that even if the flag is enabled, directio only
actually happens if the conditions of the particular read(2)/write(2)
are suitable. From directio(3c) (Solaris 9):

     DIRECTIO_OFF
           Applications get the default system behavior when
           accessing file data.
           [...]
           The system behavior for DIRECTIO_OFF can change
           without notice.

     DIRECTIO_ON
           The system behaves as though the application is not
           going to reuse the file data in the near future. In
           other words, the file data is not cached in the
           system's memory pages.

           When possible, data is read or written directly
           between the application's memory and the device when
           the data is accessed with read(2) and write(2)
           operations. When such transfers are not possible, the
           system switches back to the default behavior, but just
           for that operation. In general, the transfer is
           possible when the application's buffer is aligned on a
           two-byte (short) boundary, the offset into the file is
           on a device sector boundary, and the size of the
           operation is a multiple of device sectors.

           This advisory is ignored while the file associated
           with fildes is mapped (see mmap(2)).

           The system behavior for DIRECTIO_ON can change
           without notice.
     [...]
     USAGE
           Small sequential I/O generally performs best with
           DIRECTIO_OFF.

           Large sequential I/O generally performs best with
           DIRECTIO_ON, except when a file is sparse or is being
           extended and is opened with O_SYNC or O_DSYNC (see
           open(2)).

           The directio() function is supported for the ufs file
           system type (see fstyp(1M)).
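
The transfer conditions quoted above can be checked before issuing a
read(2) or write(2). This is only a sketch: directio_eligible is a
made-up helper name, and the sector size is supplied by the caller (512
is a common value, not something the man page promises). The man page
also warns that these conditions can change without notice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a transfer meets the conditions
   listed in the Solaris 9 directio(3C) text quoted above, else 0.
   sector_size is an assumption supplied by the caller. */
int
directio_eligible(const void *buf, off_t offset, size_t nbytes,
                  size_t sector_size)
{
    assert(sector_size > 0);
    if ((uintptr_t) buf % 2 != 0)       /* two-byte (short) boundary */
        return 0;
    if (offset < 0 || (uintmax_t) offset % sector_size != 0)
        return 0;                       /* offset on a sector boundary */
    if (nbytes % sector_size != 0)      /* whole number of sectors */
        return 0;
    return 1;
}
```

A transfer that fails this check is not an error; per the man page the
system just falls back to the default behavior for that one operation.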


I have reason to believe that directio may not actually happen if the file
is sparse (which might make sense - that could certainly complicate
things). I haven't looked at what happens in a write that would extend
the file. Also, it looks like it has at least some effect on nfs file
access as well as ufs file access, and I seem to recall that it may be
supported in reasonably recent versions of VxFS. (the man page says that
it _is_ supported for ufs, but doesn't say one way or the other about
other file system types; but the support is (necessarily) at the filesystem
layer and not at the generic layer, so it need not be implemented by all
filesystem types)

Remember, the man page is quite clear that directio(3c) is just _advice_;
depending on other conditions (which you might want to know about to get
the full benefit, but might change without notice in future releases),
that advice may or may not have any effect. So only use it as a
performance hint (and consider making your program run-time configurable
as to whether or not it actually uses directio(3c), in case circumstances
should change such that it's no longer advantageous); do _not_ use it to
achieve anything else, i.e. it's definitely _not_ something that provides
any other (or indeed any) result (such as the assurance that the
data is committed to storage that O_SYNC provides).
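
To illustrate the "run-time configurable" suggestion, here is a minimal
sketch. The environment variable name MYAPP_DIRECTIO and both function
names are invented for the example; the headers follow the directio(3C)
synopsis. The call is guarded by DIRECTIO_ON so the same source still
compiles where directio() does not exist.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/fcntl.h>

/* Hypothetical switch; any run-time setting would do. */
#define DIRECT_HINT_ENV "MYAPP_DIRECTIO"

/* Returns 1 if the user asked for the hint, 0 otherwise (default off). */
int
direct_hint_wanted(void)
{
    const char *v = getenv(DIRECT_HINT_ENV);
    return v != NULL && strcmp(v, "1") == 0;
}

/* Apply the hint if requested and available.  Returns 1 if directio()
   accepted it, 0 if the hint was skipped or refused.  A refusal is
   never treated as fatal, since this is only advice. */
int
apply_direct_hint(int fd)
{
    if (!direct_hint_wanted())
        return 0;
#ifdef DIRECTIO_ON
    return directio(fd, DIRECTIO_ON) == 0;
#else
    (void) fd;                  /* no directio() on this platform */
    return 0;
#endif
}
```

The caller carries on with ordinary read(2)/write(2) either way; nothing
about correctness should depend on the return value.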

--
mailto:rlh...@smart.net http://www.smart.net/~rlhamil

Jonathan Adams

Apr 13, 2004, 10:49:12 PM
Paul Eggert <egg...@twinsun.com> wrote in message news:<7whdvno...@sic.twinsun.com>...

> The directio(3C) man page says
>
> "The advice argument is kept per file; the last caller of
> directio() sets the advice for all applications using the file
> associated with fildes."
>
> I can see two ways to interpret this.
>
> First, there's a per-file flag that any process can set or clear, and
> this flag persists until the next reboot.
>
> Second, if any process has an open file descriptor that has the
> DIRECTIO_ON flag set, then all processes that read or write to the
> corresponding file use directio semantics and bypass the buffer cache;
> otherwise, they all use the default semantics.

The first is correct -- it's a per-file flag. It is not necessarily
persistent until reboot, since the system is free to throw away the
vnode once there are no active handles for the file (but it is under
no obligation to do so). At that point, the behavior would reset to
the default semantics.
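
A toy model may make that lifetime concrete. This is not Solaris kernel
code; the struct and the toy_* names are invented, and it only mimics the
behavior described above: last caller wins, and the flag is lost if (and
only if) the system chooses to discard the unused in-core node.

```c
#include <assert.h>

/* Toy stand-in for an in-core inode/vnode; not the real structure. */
struct toy_inode {
    int in_core;    /* 1 while the system still holds the node */
    int refs;       /* active handles on the file */
    int directio;   /* the per-file advice flag */
};

void
toy_open(struct toy_inode *ip)
{
    if (!ip->in_core) {         /* freshly loaded: default semantics */
        ip->in_core = 1;
        ip->directio = 0;
    }
    ip->refs++;
}

void
toy_close(struct toy_inode *ip)
{
    assert(ip->refs > 0);
    ip->refs--;
}

/* Last caller wins, for every process using the file. */
void
toy_directio(struct toy_inode *ip, int on)
{
    assert(ip->refs > 0);
    ip->directio = on;
}

/* The system MAY discard an unused node; it is not obliged to. */
void
toy_reclaim(struct toy_inode *ip)
{
    if (ip->refs == 0)
        ip->in_core = 0;
}
```

In this model, closing every handle leaves the flag set; only a later
toy_reclaim() followed by a fresh toy_open() resets it to the default.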

- jonathan

Paul Eggert

Apr 14, 2004, 11:02:03 AM
At Tue, 13 Apr 2004 23:28:22 -0000, Richard.L...@mindwarp.smart.net (Richard L. Hamilton) writes:

> When the kernel frees the in-core inode(rnode) (because nothing is
> using it), presumably the flag's state would be forgotten.

Thanks for the explanation.

I was thinking of adding support for "dd iflag=direct oflag=direct"
to GNU dd, when it's running on Solaris. These GNU extensions to "dd"
let the user specify direct I/O as a performance hint.

I'd like to do this:

1. Inspect the current status of the directio flag.
2. If it's currently off, call directio(fd, DIRECTIO_ON).
3. Do the I/O.
4. When done, call directio(fd, DIRECTIO_OFF) if the flag was originally
off, to restore its original state.

However, I see no way to do (1). Too bad.

Also, step (4) can't be done if "dd" is aborted with kill -9 or
something like that.

Further: if some other program does (1) and (2) while we're doing (3),
and then does its (4) after we do (4), then the state won't be
restored correctly. This is unfortunate too.

Ouch. It's kind of awkward all around.
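
The interleaving problem can be shown with a small simulation. It
pretends step (1) were possible by using an ordinary struct in place of
the per-file flag; all names here are made up for the illustration.

```c
/* Toy stand-in for the per-file flag.  Real code cannot read the flag,
   which is exactly the missing step (1). */
struct shared_flag { int on; };

/* What one program remembers between its steps (1) and (4). */
struct saver { int saved; };

void
hint_begin(struct saver *s, struct shared_flag *f)
{
    s->saved = f->on;   /* step 1 (hypothetical) */
    f->on = 1;          /* step 2 */
}

void
hint_end(const struct saver *s, struct shared_flag *f)
{
    f->on = s->saved;   /* step 4 */
}

/* A begins, B begins, A ends, B ends.  Returns the final flag. */
int
run_interleaved(void)
{
    struct shared_flag f = { 0 };       /* originally off */
    struct saver a, b;
    hint_begin(&a, &f);         /* a.saved = 0, flag on */
    hint_begin(&b, &f);         /* b.saved = 1, flag on */
    hint_end(&a, &f);           /* flag off while B is still doing I/O */
    hint_end(&b, &f);           /* flag "restored" to on */
    return f.on;
}

/* A begins, B begins, B ends, A ends.  Returns the final flag. */
int
run_nested(void)
{
    struct shared_flag f = { 0 };
    struct saver a, b;
    hint_begin(&a, &f);
    hint_begin(&b, &f);
    hint_end(&b, &f);
    hint_end(&a, &f);
    return f.on;
}
```

Only the strictly nested order gets back to the original state. The
interleaved order leaves the flag on, and in the middle of it B loses
direct I/O while it is still working.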

(Maybe Solaris could add support for open (... O_DIRECT), a la Linux?
It'd be more reliable, or at least more convenient. :-)


> definitely _not_ something that provides any other (or indeed any)
> result (such as the assurance that the data is committed to storage
> that O_SYNC provides).

That's a separate flag in GNU dd. You can use both flags at the same
time, e.g. "dd oflag=direct,sync".

FYI, here are all the options recently added to GNU dd.
Perhaps the Solaris dd implementers could do the same thing.

dd has new conversions for the conv= option:

nocreat do not create the output file
excl fail if the output file already exists
fdatasync physically write output file data before finishing
fsync likewise, but also write metadata

dd has new iflag= and oflag= options with the following flags:

append append mode (makes sense for output file only)
direct use direct I/O for data
dsync use synchronized I/O for data
sync likewise, but also for metadata
nonblock use non-blocking I/O
nofollow do not follow symlinks

Solaris doesn't have O_NOFOLLOW, but it does have the other open
options. (Solaris also has O_NDELAY but this is so close to
O_NONBLOCK that I'm not sure it's worth worrying about. And Solaris's
O_PRIV doesn't seem to be documented.)
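
Most of the iflag=/oflag= names above correspond to open(2) flags, so a
parser can be little more than a table. The following is a sketch under
that assumption, not GNU dd's actual code; flag_table and
parse_open_flags are invented names, and flags missing on a platform
(such as O_NOFOLLOW on Solaris) are simply left out of the table.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc exposes O_DIRECT only with this */
#endif
#include <fcntl.h>
#include <string.h>

struct flag_name { const char *name; int bit; };

static const struct flag_name flag_table[] = {
    { "append",   O_APPEND },
#ifdef O_DIRECT
    { "direct",   O_DIRECT },
#endif
#ifdef O_DSYNC
    { "dsync",    O_DSYNC },
#endif
    { "sync",     O_SYNC },
    { "nonblock", O_NONBLOCK },
#ifdef O_NOFOLLOW
    { "nofollow", O_NOFOLLOW },
#endif
};

/* Parse a comma-separated list such as "direct,sync" into open(2) bits.
   Returns 0 on success, -1 if a name is unknown or unsupported here. */
int
parse_open_flags(const char *list, int *bits)
{
    size_t n = sizeof flag_table / sizeof flag_table[0];
    int result = 0;
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current name */
        size_t i;
        for (i = 0; i < n; i++)
            if (strlen(flag_table[i].name) == len
                && memcmp(flag_table[i].name, p, len) == 0)
                break;
        if (i == n)
            return -1;
        result |= flag_table[i].bit;
        p += len;
        if (*p == ',')
            p++;
    }
    *bits = result;
    return 0;
}
```

On a system without O_DIRECT, "direct" would be rejected by this table;
that is where a directio() call would have to be substituted.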

Dragan Cvetkovic

Apr 14, 2004, 11:25:04 AM
Paul Eggert <egg...@twinsun.com> writes:

>
> (Maybe Solaris could add support for open (... O_DIRECT), a la Linux?
> It'd be more reliable, or at least more convenient. :-)

How can you do (or not do) file buffering based only on a file descriptor?
Does Linux implementation really work as advertised (i.e. if 2 processes do
open(...|O_DIRECT) and 3 do open(...) without O_DIRECT on the same file,
would the first 2 get non-cached file context whereas the other 3 get the
usual behaviour)? How much more information do you need to keep to establish
that?

Bye, Dragan

--
Dragan Cvetkovic,

To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer

!!! Sender/From address is bogus. Use reply-to one !!!

Jonathan Adams

Apr 15, 2004, 10:40:32 PM
Paul Eggert <egg...@twinsun.com> wrote:
> And Solaris's O_PRIV doesn't seem to be documented.

Probably because it doesn't do anything -- it appears to have
mystically appeared sometime in 1989, but the only references are the
definition itself, and the code in pfiles(1) which decodes it (!).

- jonathan

Paul Eggert

Apr 16, 2004, 3:44:00 AM
At Wed, 14 Apr 2004 11:25:04 -0400, Dragan Cvetkovic <m...@privacy.net> writes:

> How can you do (or not do) file buffering based only on a file descriptor?

I'd think that directio and O_DIRECT are supposed to be merely advice
to the system, so the operating system is free to do whatever it likes
so long as it preserves POSIX semantics. I.e., O_DIRECT should mean
"please don't bother to buffer this" but the OS is still free to
buffer if it wants to ignore your advice.
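
An application can treat O_DIRECT as advice in the same spirit: ask for
it, and quietly carry on without it if the request is refused. A minimal
sketch, assuming a system where open(2) reports an unsupported O_DIRECT
as EINVAL; open_with_direct_hint is a made-up name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc exposes O_DIRECT only with this */
#endif
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open an existing file, asking for direct I/O but
   treating the request as advice.  Do not pass O_CREAT (no mode
   argument is supplied).  *got_direct is set to 1 only if the open
   with O_DIRECT succeeded. */
int
open_with_direct_hint(const char *path, int flags, int *got_direct)
{
    *got_direct = 0;
#ifdef O_DIRECT
    {
        int fd = open(path, flags | O_DIRECT);
        if (fd >= 0) {
            *got_direct = 1;
            return fd;
        }
        if (errno != EINVAL)    /* a real error, not a refused hint */
            return -1;
    }
#endif
    return open(path, flags);   /* plain buffered open */
}
```

Even when the open succeeds with O_DIRECT, individual transfers may still
have alignment requirements, so the caller should be prepared for that.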

> Does Linux implementation really work as advertised

I doubt it. This issue got raised on the Linux kernel list recently
and there was some speculation about what the Linux kernel does;
nobody seemed to know, really.

So, as far as I can see, we have one system (the Linux kernel) that
has a much nicer API but a poorly-understood implementation, and
another system (Solaris) that has an extremely awkward API with an
implementation that (I hope and presume) actually works.

Too bad we can't have the best of both worlds.

Dragan Cvetkovic

Apr 16, 2004, 9:43:09 AM
Paul Eggert <egg...@twinsun.com> writes:

What about AIX (I know AIX = Ain't unIX)? From version 4.3 (I think) it
also supports open(...|O_DIRECT)?

Wrong group, I know.

Paul Eggert

Apr 16, 2004, 4:25:13 PM
At Fri, 16 Apr 2004 09:43:09 -0400, Dragan Cvetkovic <m...@privacy.net> writes:

> What about AIX (I know AIX = Ain't unIX)? From version 4.3 (I think) it
> also supports open(...|O_DIRECT)?

With AIX, O_DIRECT takes effect only if all processes that are
currently accessing the file are using O_DIRECT. This is a fairly
straightforward rule. I suspect that it's the direction that
GNU/Linux is headed, since (now that you mention it) GNU/Linux seems
to have imported the idea from AIX and/or JFS.
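
Stated as code, the AIX rule as I understand it needs only two counters
per file. This is a model of the rule as described here, not AIX source;
the struct and function names are invented.

```c
/* Per-file bookkeeping in the toy model. */
struct open_counts {
    int total;      /* processes currently accessing the file */
    int direct;     /* how many of those opened it with O_DIRECT */
};

/* Direct I/O is in effect only if every current opener asked for it. */
int
aix_style_direct_effective(const struct open_counts *c)
{
    return c->total > 0 && c->direct == c->total;
}
```

With two O_DIRECT openers and three ordinary ones (total 5, direct 2),
the function returns 0 and everybody gets buffered I/O. Once the three
ordinary openers close the file (total 2, direct 2), it returns 1.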
