
Solaris: Direct I/O


Joseph Blackette

Sep 16, 1996

Does anyone know how to implement Direct I/O (bypass filesystem buffers)
under Solaris? Under IRIX, for example, if you are accessing an EFS filesystem
you can set the O_DIRECT flag in the open(2) call.
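For reference, the IRIX-style usage Joe describes looks roughly like the sketch below. It assumes a system whose <fcntl.h> defines O_DIRECT (IRIX, or Linux with _GNU_SOURCE) and that a filesystem which cannot do direct I/O rejects the flag with EINVAL, as Linux does; the helper open_direct() is hypothetical and simply falls back to an ordinary buffered open.

```c
#define _GNU_SOURCE            /* glibc needs this to expose O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical helper: open path with O_DIRECT where the system and the
 * filesystem support it, otherwise fall back to an ordinary buffered
 * open. *is_direct reports which path was taken. */
int open_direct(const char *path, int flags, int *is_direct)
{
    assert(path != NULL && is_direct != NULL);
    *is_direct = 0;
#ifdef O_DIRECT
    int fd = open(path, flags | O_DIRECT);
    if (fd >= 0) {
        *is_direct = 1;
        return fd;
    }
    if (errno != EINVAL)       /* a real error, not "unsupported here" */
        return -1;
#endif
    return open(path, flags);  /* buffered fallback */
}
```

On a system without O_DIRECT the preprocessor removes the first branch, so the same call compiles everywhere and just loses the direct path.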

Thanks in advance.

Joe

Dave Johnson

Sep 17, 1996

By "bypass filesystem buffers," I assume you mean non-canonical
input mode, where you don't need a carriage return to input data
from stdin to the program. If so, then see ioctl(2) and
termio(7I). If you're still confused about how to turn off canonical
behavior, let me know, and I can dig up some old code for an
example.
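Although this reading of the question turns out to be mistaken, the technique itself is easy to show. The minimal sketch below switches a terminal to non-canonical mode using the POSIX tcgetattr()/tcsetattr() (termios) interface, swapped in for the raw ioctl(2) requests mentioned above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Turn off canonical (line-buffered) input and echo in a termios struct,
 * and make read() return as soon as one byte is available. */
void make_noncanonical(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(ICANON | ECHO);
    t->c_cc[VMIN] = 1;   /* return after at least 1 byte */
    t->c_cc[VTIME] = 0;  /* no inter-byte timeout */
}

/* Apply non-canonical mode to a terminal fd, saving the old settings in
 * *saved so the caller can restore them later with tcsetattr(). */
int set_noncanonical(int fd, struct termios *saved)
{
    struct termios t;
    if (tcgetattr(fd, &t) < 0)
        return -1;            /* e.g. fd is not a terminal */
    *saved = t;
    make_noncanonical(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```

To undo it, call tcsetattr(fd, TCSANOW, &saved) before the program exits.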
-Dave

Scott Byer

Sep 19, 1996

Dave Johnson writes:

Dave> Joseph Blackette wrote:
>> Does anyone know how to implement Direct I/O (bypass filesystem buffers)
>> under Solaris? Under IRIX, for example, if you are accessing an EFS
>> filesystem you can set the O_DIRECT flag in the open(2) call.
>>
>> Thanks in advance.
>>
>> Joe

Dave> By "bypass filesystem buffers," I assume you mean non-canonical input
Dave> mode, where you don't need a carriage return to input data from
Dave> stdin to the program. If so, then see ioctl(2) and termio(7I). If
Dave> you're still confused about how to turn off canonical behavior, let me
Dave> know, and I can dig up some old code for an example. -Dave

No, no, no. By "bypass filesystem buffers", he means just that. We'd like
the same damn thing. We want I/O that goes as directly to the hardware as
possible, _without_ having to have our own partition and drivers.


On the SGI, if you align your memory just so, make the request a
multiple of the disk block size, and open the file descriptor with O_DIRECT,
then the data gets DMAed _directly_ into your buffer from disk: no filesystem
cache, nothing. This would be effectively the equivalent of doing an mmap
with an ACCESS_SEQUENTIAL flag set, and mmap would be a valid replacement if
it didn't have other performance problems. Direct I/O is also a big win for
predictive I/O, since the kernel file cache is no longer a bottleneck for
multiple reads by multiple threads.
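The conditions Scott lists (aligned memory, a transfer size that is a multiple of the disk block size, and a descriptor opened with O_DIRECT) can be sketched as follows. The 512-byte block and page-size alignment used in the example are assumptions; on IRIX the real limits come from fcntl(F_DIOINFO), and all helper names are hypothetical.

```c
#define _GNU_SOURCE            /* for pread() and posix_memalign() */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Round n up to the next multiple of block (block must be a power of two). */
size_t round_up(size_t n, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (n + block - 1) & ~(block - 1);
}

/* Allocate a buffer aligned to `align` bytes whose size is rounded up to
 * a whole number of `block`-sized units; the final size goes in *got. */
void *alloc_direct_buffer(size_t want, size_t align, size_t block, size_t *got)
{
    void *buf = NULL;
    *got = round_up(want, block);
    if (posix_memalign(&buf, align, *got) != 0)
        return NULL;
    return buf;
}

/* Read len bytes (already block-rounded) at a block-aligned offset.
 * If fd was opened with O_DIRECT, the kernel may DMA straight into buf. */
ssize_t direct_pread(int fd, void *buf, size_t len, off_t off, size_t block)
{
    assert(len % block == 0 && off % (off_t)block == 0);
    return pread(fd, buf, len, off);
}
```

For example, alloc_direct_buffer(1000, 4096, 512, &got) returns a page-aligned buffer with got == 1024, ready to pass to direct_pread().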


In the lab, we had an SGI with XFS (which allows two outstanding I/Os per
device) and a couple of I/O threads running at real-time priority, cranking
through Photoshop data like you wouldn't believe, smooth as silk, darn near the
theoretical limit of the disks' I/O rate. We never got close to that on a Sun;
the stupid file cache kept getting in the way.

(This would all be irrelevant if we had plug-in pagers! ;-)


--
Scott Byer, Senior Computer Scientist    mailto:by...@adobe.com
Adobe Systems Incorporated, Mailstop W10
345 Park Avenue
San Jose, CA 95110-9704
These are my opinions, and do not necessarily reflect the opinions of my employer.

Cameron Simpson

Sep 20, 1996, to Dave Johnson

Dave Johnson <jsqu...@rain.org> writes:
| Joseph Blackette wrote:
| > Does anyone know how to implement Direct I/O (bypass filesystem buffers)
| > under Solaris? Under IRIX, for example, if you are accessing an EFS filesystem
| > you can set the O_DIRECT flag in the open(2) call.
| By "bypass filesystem buffers," I assume you mean non-canonical
| input mode, where you don't need a carriage return to input data
| from stdin to the program. If so, then see ioctl(2) and
| termio(7I). If you're still confused about how to turn off canonical
| behavior, let me know, and I can dig up some old code for an
| example.

That's not what he meant. He's not talking about ttys at all. Normally
written data are buffered by the kernel and written to disc as it
becomes ready. He wants to write direct to disc, or at least to block
until the write completes, or something of that ilk. He may want to
bypass the buffers because they're corrupt (like fscking the raw root
partition at boot time and rebooting without a sync). It may be for
performance reasons (bypassing data copies to get raw speed to disc,
perhaps for video).

It may be that O_{RSYNC,DSYNC,SYNC} will get him what he wants. If it's
efficiency, on some OSes a page-aligned I/O buffer never gets copied
at all, just remapped and DMAed direct. This combined with synchronous
I/O may be close to the O_DIRECT flag. (Alas, I no longer have an SGI
to hand, so I can't check out the actual semantics of O_DIRECT.)
- Cameron Simpson
cam...@research.canon.com.au, DoD#743
http://www.zip.com.au/~cs/
--
I am here by the will of the people ... and I will not leave until I get my
raincoat back. - Richard Kadrey, _Metrophage_

Matthias Kurz

Sep 20, 1996

>Dave Johnson <jsqu...@rain.org> writes:
>| Joseph Blackette wrote:
>| > Does anyone know how to implement Direct I/O (bypass filesystem buffers)
>| > under Solaris? Under IRIX, for example, if you are accessing an EFS filesystem
>| > you can set the O_DIRECT flag in the open(2) call.
>| By "bypass filesystem buffers," I assume you mean non-canonical
>| input mode, where you don't need a carriage return to input data
>| from stdin to the program. ...

>That's not what he meant. He's not talking about ttys at all. Normally
>written data are buffered by the kernel and written to disc as it
>becomes ready. He wants to write direct to disc, or at least to block
>until the write completes, or something of that ilk. He may want to
>bypass the buffers because they're corrupt (like fscking the raw root
>partition at boot time and rebooting without a sync). It may be for
>performance reasons (bypassing data copies to get raw speed to disc,
>perhaps for video).

>It may be that O_{RSYNC,DSYNC,SYNC} will get him what he wants. If it's
>efficiency, on some OSes a page-aligned I/O buffer never gets copied
>at all, just remapped and DMAed direct. This combined with synchronous
>I/O may be close to the O_DIRECT flag. (Alas, I no longer have an SGI
>to hand, so I can't check out the actual semantics of O_DIRECT.)

O_DIRECT
If set, all reads and writes on the resulting file descriptor will
be performed directly to or from the user program buffer, provided
appropriate size and alignment restrictions are met. Refer to the
F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for
information about how to determine the alignment constraints.
O_DIRECT is a Silicon Graphics extension and is only supported on
local EFS file systems.
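Following the man page's pointer to fcntl(2), a sketch of querying those constraints might look like this. The F_DIOINFO branch assumes IRIX's struct dioattr with its d_mem, d_miniosz and d_maxiosz fields and only compiles where F_DIOINFO is defined; elsewhere the hypothetical helper get_dio_limits() fills in assumed defaults (page-size alignment, a 512-byte minimum transfer, no known maximum).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical summary of the direct I/O restrictions for one descriptor. */
struct dio_limits {
    size_t mem_align;   /* required buffer address alignment */
    size_t min_io;      /* transfer sizes must be multiples of this */
    size_t max_io;      /* largest single transfer, 0 = unknown */
};

int get_dio_limits(int fd, struct dio_limits *lim)
{
    assert(lim != NULL);
#ifdef F_DIOINFO
    /* IRIX: ask the filesystem for its real constraints. */
    struct dioattr da;
    if (fcntl(fd, F_DIOINFO, &da) < 0)
        return -1;
    lim->mem_align = da.d_mem;
    lim->min_io = da.d_miniosz;
    lim->max_io = da.d_maxiosz;
#else
    /* Assumed defaults for systems without F_DIOINFO. */
    (void)fd;
    long page = sysconf(_SC_PAGESIZE);
    lim->mem_align = page > 0 ? (size_t)page : 4096;
    lim->min_io = 512;
    lim->max_io = 0;
#endif
    return 0;
}
```

Per the man page, O_DIRECT can also be switched on for an already-open descriptor with fcntl(fd, F_SETFL, ...), so these limits can be checked before committing to direct I/O.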

I think the goal is to speed up I/O, not slow it down with synchronous
reads and writes. As far as I know there is no such thing for the UFS filesystem.
I heard that something like direct I/O is supported by the Veritas
filesystem. The Veritas software comes with SSAs; you need to buy an extra
license to use it on non-SSA devices.

I'd be interested in _numbers_. What is the difference between doing the
job with O_DIRECT and without it? Is it really worth the effort?
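One way to get such numbers would be to time the same sequential read with and without O_DIRECT. The hypothetical harness below is only a measuring tool and implies no results: it assumes clock_gettime(CLOCK_MONOTONIC), a caller-chosen block size that satisfies the alignment rules, and a 4096-byte-aligned buffer. For a fair comparison the file should be larger than memory, or the cache flushed between runs.

```c
#define _GNU_SOURCE            /* O_DIRECT, posix_memalign, clock_gettime */
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read the whole file at path in `block`-byte chunks and return the
 * elapsed wall-clock seconds, or -1.0 on error (including a filesystem
 * that rejects O_DIRECT). */
double time_sequential_read(const char *path, int use_direct, size_t block)
{
    int flags = O_RDONLY;
#ifdef O_DIRECT
    if (use_direct)
        flags |= O_DIRECT;
#else
    if (use_direct)
        return -1.0;           /* no direct I/O on this system */
#endif
    int fd = open(path, flags);
    if (fd < 0)
        return -1.0;
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, block) != 0) {
        close(fd);
        return -1.0;
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n;
    while ((n = read(fd, buf, block)) > 0)
        ;                      /* discard the data; only the time matters */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    close(fd);
    if (n < 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Dividing the file size by the returned time gives a throughput figure for each mode.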

One might also take a look at aio_read/aio_write etc. Not me.
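For the aio_read/aio_write route, a minimal POSIX AIO sketch follows: queue a read, optionally overlap other work, then wait with aio_suspend() and collect the result with aio_return(). The helper aio_read_file() is hypothetical, error handling is reduced to the essentials, and with older glibc versions the program has to be linked with -lrt.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue an asynchronous read of len bytes at offset off from path,
 * wait for it to finish, and return the byte count (-1 on error). */
ssize_t aio_read_file(const char *path, void *buf, size_t len, off_t off)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_read(&cb) < 0) {
        close(fd);
        return -1;
    }

    /* The caller could overlap computation with the I/O here. */

    const struct aiocb *const list[1] = { &cb };
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        aio_suspend(list, 1, NULL);    /* sleep until the request completes */

    ssize_t n = aio_return(&cb);       /* always reap the finished request */
    close(fd);
    return err == 0 ? n : -1;
}
```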


(mk)

--
Matthias Kurz; Fuldastr. 3; D-28199 Bremen; VOICE +49 421 53 600 47
