posix_fadvise noreuse disables file caching


Tijl Coosemans

Jan 19, 2012, 11:39:42 AM
to freebsd...@freebsd.org
Hi,

I recently noticed that multimedia/vlc generates a lot of disk IO when
playing media files. For instance, when playing a 320kbps mp3 gstat
reports about 1250kBps (=10000kbps). That's quite a lot of overhead.

It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
O_DIRECT was specified during open(2), i.e. it disables all caching.
That means every 1028 byte read turns into a 32KiB read (new default
block size in 9.0) which explains the above numbers.
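A back-of-the-envelope check of these figures, assuming that each 1028-byte read triggers exactly one 32 KiB block transfer, that nothing else reaches the disk, and that "320kbps" means 320,000 bit/s. The helper name is made up for this sketch.

```c
/* Rough model of the no-caching case: every application read of
 * `read_size` bytes becomes one transfer of `block_size` bytes from disk.
 * Returns the implied disk bandwidth in bytes per second. */
static double disk_bytes_per_sec(double bitrate_bps, double read_size,
                                 double block_size)
{
    double payload = bitrate_bps / 8.0;         /* bytes/s the player consumes */
    double reads_per_sec = payload / read_size; /* application read(2) calls/s */
    return reads_per_sec * block_size;          /* bytes/s fetched from disk   */
}
```

With disk_bytes_per_sec(320000, 1028, 32768) this gives about 1,275,000 bytes/s, i.e. roughly 1245 KiB/s, an amplification of about 32x and in line with the ~1250 kB/s that gstat reported.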

I've copied the relevant vlc code below (modules/access/file.c:Open()).
It's interesting to see that on OSX it sets F_NOCACHE which disables
caching too, but combined with F_RDAHEAD there's still read-ahead
caching.

I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
still cache data (and even do read-ahead if F_RDAHEAD is specified),
and once data is fetched from the cache, it can be marked WONTNEED.

Is it possible to implement it this way, or if not to just ignore
the NOREUSE hint for now?


    /* Demuxers will need the beginning of the file for probing. */
    posix_fadvise (fd, 0, 4096, POSIX_FADV_WILLNEED);
    /* In most cases, we only read the file once. */
    posix_fadvise (fd, 0, 0, POSIX_FADV_NOREUSE);
#if defined(HAVE_FCNTL)
    /* We'd rather use any available memory for reading ahead
     * than for caching what we've already seen/heard */
# if defined(F_RDAHEAD)
    fcntl (fd, F_RDAHEAD, 1);
# endif
# if defined(F_NOCACHE)
    fcntl (fd, F_NOCACHE, 1);
# endif
#endif


John Baldwin

Jan 20, 2012, 2:12:13 PM
to freebsd...@freebsd.org, Tijl Coosemans
On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> Hi,
>
> I recently noticed that multimedia/vlc generates a lot of disk IO when
> playing media files. For instance, when playing a 320kbps mp3 gstat
> reports about 1250kBps (=10000kbps). That's quite a lot of overhead.
>
> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
> O_DIRECT was specified during open(2), i.e. it disables all caching.
> That means every 1028 byte read turns into a 32KiB read (new default
> block size in 9.0) which explains the above numbers.
>
> I've copied the relevant vlc code below (modules/access/file.c:Open()).
> It's interesting to see that on OSX it sets F_NOCACHE which disables
> caching too, but combined with F_RDAHEAD there's still read-ahead
> caching.
>
> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
> still cache data (and even do read-ahead if F_RDAHEAD is specified),
> and once data is fetched from the cache, it can be marked WONTNEED.

POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.

> Is it possible to implement it this way, or if not to just ignore
> the NOREUSE hint for now?

I think it would be good to improve NOREUSE, though I had sort of
assumed that applications using NOREUSE would do their own buffering
and read full blocks. We could perhaps reimplement NOREUSE by doing
the equivalent of POSIX_FADV_DONTNEED after each read to free buffers
and pages after the data is copied out to userland. I also have an
XXX about whether or not NOREUSE should still allow read-ahead as it
isn't very clear what the right thing to do there is. HP-UX (IIRC)
has an fadvise() that lets you specify multiple policies, so you
could specify both NOREUSE and SEQUENTIAL for a single region to
get read-ahead but still release memory once the data is read once.
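Until the kernel does this itself, an application can approximate "read once, keep read-ahead, then release" from userland. This is a minimal sketch, not the patch discussed in this thread: it assumes a system that provides posix_fadvise(2) (FreeBSD 9.0 and Linux do), and read_once() is a hypothetical name. Because each DONTNEED range is only as large as the read that preceded it, small unaligned reads may leave partially covered pages or blocks cached.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file sequentially in chunks of `chunk` bytes (at most
 * 65536), dropping each consumed range from the cache right after it has
 * been copied out.  SEQUENTIAL keeps read-ahead enabled; the per-read
 * DONTNEED approximates NOREUSE.  Returns the number of bytes read, or
 * -1 on error. */
static off_t read_once(int fd, size_t chunk)
{
    char buf[65536];
    off_t total = 0;

    if (chunk == 0 || chunk > sizeof buf)
        return -1;
    /* Advice is only a hint, so failures are deliberately ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        /* ... consume buf[0..n) here ... */
        (void)posix_fadvise(fd, total, (off_t)n, POSIX_FADV_DONTNEED);
        total += n;
    }
    return total;
}
```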

--
John Baldwin
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

John Baldwin

Jan 25, 2012, 11:29:22 AM
to freebsd...@freebsd.org, Tijl Coosemans

So I've come up with this untested patch. It uses
VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED doesn't
do any good really for writes (it only flushes clean buffers), so I've
left write(2) operations as using IO_DIRECT still. Does this sound
reasonable? I've not yet tested this at all:

Index: vfs_vnops.c
===================================================================
--- vfs_vnops.c (revision 230331)
+++ vfs_vnops.c (working copy)
@@ -519,6 +519,7 @@ vn_read(fp, uio, active_cred, flags, td)
 	int error, ioflag;
 	struct mtx *mtxp;
 	int advice, vfslocked;
+	off_t offset;
 
 	KASSERT(uio->uio_td == td, ("uio_td %p is not td %p",
 	    uio->uio_td, td));
@@ -558,19 +559,14 @@ vn_read(fp, uio, active_cred, flags, td)
 	switch (advice) {
 	case POSIX_FADV_NORMAL:
 	case POSIX_FADV_SEQUENTIAL:
+	case POSIX_FADV_NOREUSE:
 		ioflag |= sequential_heuristic(uio, fp);
 		break;
 	case POSIX_FADV_RANDOM:
 		/* Disable read-ahead for random I/O. */
 		break;
-	case POSIX_FADV_NOREUSE:
-		/*
-		 * Request the underlying FS to discard the buffers
-		 * and pages after the I/O is complete.
-		 */
-		ioflag |= IO_DIRECT;
-		break;
 	}
+	offset = uio->uio_offset;
 
 #ifdef MAC
 	error = mac_vnode_check_read(active_cred, fp->f_cred, vp);
@@ -587,6 +583,10 @@ vn_read(fp, uio, active_cred, flags, td)
 	}
 	fp->f_nextoff = uio->uio_offset;
 	VOP_UNLOCK(vp, 0);
+	if (error == 0 && advice == POSIX_FADV_NOREUSE &&
+	    offset != uio->uio_offset)
+		error = VOP_ADVISE(vp, offset, uio->uio_offset - 1,
+		    POSIX_FADV_DONTNEED);
 	VFS_UNLOCK_GIANT(vfslocked);
 	return (error);

Julian Elischer

Jan 25, 2012, 5:56:55 PM
to John Baldwin, freebsd...@freebsd.org, Tijl Coosemans
On 1/25/12 8:29 AM, John Baldwin wrote:
>
> So I've come up with this untested patch. It uses
> VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
> leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED doesn't
> do any good really for writes (it only flushes clean buffers), so I've
> left write(2) operations as using IO_DIRECT still. Does this sound
> reasonable?

That sounds like a good solution. If people want something from write they
can do it separately. For what it's worth, I would expect NOREUSE on write
to still do write clustering but to free the buffer once it is written.
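Something close to "free the buffer once it is written" can be done from userland today, given that DONTNEED only releases clean data: flush the written range first, then advise. A minimal sketch assuming posix_fadvise(2) is available; write_then_release() is a hypothetical name. Calling fdatasync() after every chunk makes the write synchronous, so it gives up some of the write clustering mentioned above unless the chunks are large.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes at offset `off`, make them clean with fdatasync() so
 * that a DONTNEED hint is able to release them, then give the hint.
 * Returns 0 on success, -1 on error. */
static int write_then_release(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    if (fdatasync(fd) != 0)   /* dirty data cannot be released */
        return -1;
    /* Advice is only a hint, so a failure here is ignored. */
    (void)posix_fadvise(fd, off, (off_t)len, POSIX_FADV_DONTNEED);
    return 0;
}
```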

Tijl Coosemans

Jan 29, 2012, 10:08:10 AM
to freebsd...@freebsd.org

The patch drastically improves vlc, but there's still a tiny overhead.
Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD buffer
size). With NOREUSE there's an extra transfer of 32KiB (block size).


John Baldwin

Jan 30, 2012, 9:36:45 AM
to Tijl Coosemans, freebsd...@freebsd.org

This is probably because vlc is not reading on block boundaries, so the
noreuse is throwing away partial blocks at the end of a read that then have to
be re-read. We could maybe fix this by making FADV_DONTNEED only throw
away completely-contained blocks rather than completely-contained pages.
However, this will probably result in NOREUSE not actually throwing away
anything at all if an app always reads sub-blocksize chunks.

We could perhaps make vlc's case work, though, by allowing an
extension where you can do 'posix_fadvise(SEQUENTIAL | NOREUSE)', and
in this case we could make the VOP_ADVISE(DONTNEED) in read() use an offset
of 0 rather than the start of the read request.
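The trade-off between the two variants can be checked with a little arithmetic. This sketch only models the byte ranges, not the kernel: it assumes DONTNEED releases only blocks lying entirely inside the advised range (the block-granular variant above) and a fixed read size. released_bytes() is a hypothetical name.

```c
#include <stdint.h>

/* Simulate `nreads` sequential reads of `rsize` bytes.  After each read a
 * DONTNEED range ending at the last byte read is rounded inward to whole
 * `bsize` blocks; it starts at the beginning of that read
 * (from_zero == 0) or at offset 0 (from_zero != 0).  Returns the total
 * bytes of whole blocks released, counting each block once. */
static int64_t released_bytes(int64_t rsize, int64_t nreads, int64_t bsize,
                              int from_zero)
{
    int64_t released = 0, next_free = 0;   /* first block not yet released */

    for (int64_t i = 0; i < nreads; i++) {
        int64_t start = from_zero ? 0 : i * rsize;
        int64_t end = (i + 1) * rsize - 1;            /* inclusive, as in the patch */
        int64_t first = (start + bsize - 1) / bsize;  /* round start up   */
        int64_t last = (end + 1) / bsize - 1;         /* round end+1 down */
        if (first < next_free)
            first = next_free;
        if (last >= first) {
            released += (last - first + 1) * bsize;
            next_free = last + 1;
        }
    }
    return released;
}
```

For 100 reads of 1028 bytes with 32 KiB blocks, the per-read variant releases nothing, since no 1028-byte range contains a whole block, while the offset-0 variant releases the first three blocks (98304 bytes). With block-sized reads both variants release every block.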

However, posix_fadvise() really is going to work best if the userland
application reads aligned FS blocks.
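An application that wants small records can still read aligned blocks by doing its own buffering, as assumed earlier in the thread. A minimal sketch: the blkreader type and br_* functions are hypothetical, st_blksize is assumed to reflect the filesystem block size, and pread() is assumed to return a full block except at end of file so the buffer offset stays block-aligned.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Callers ask for small chunks, but the file is only read in whole,
 * aligned blocks, and each block is released once fully consumed. */
struct blkreader {
    int fd;
    char *buf;
    size_t bsize;   /* preferred I/O size reported by fstat() */
    size_t len;     /* valid bytes currently in buf */
    size_t pos;     /* bytes of buf already handed to the caller */
    off_t boff;     /* file offset of buf[0], a multiple of bsize */
};

static int br_open(struct blkreader *br, int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    br->fd = fd;
    br->bsize = st.st_blksize > 0 ? (size_t)st.st_blksize : 4096;
    br->buf = malloc(br->bsize);
    br->len = br->pos = 0;
    br->boff = 0;
    return br->buf != NULL ? 0 : -1;
}

static void br_close(struct blkreader *br)
{
    free(br->buf);
    br->buf = NULL;
}

/* Copy up to `want` bytes into `dst`.  Returns the number of bytes
 * copied, 0 at end of file, or -1 on error. */
static ssize_t br_read(struct blkreader *br, char *dst, size_t want)
{
    if (br->pos == br->len) {
        if (br->len > 0) {
            /* The whole block has been consumed: drop it as a unit. */
            (void)posix_fadvise(br->fd, br->boff, (off_t)br->len,
                                POSIX_FADV_DONTNEED);
            br->boff += (off_t)br->len;
            br->len = br->pos = 0;
        }
        ssize_t n = pread(br->fd, br->buf, br->bsize, br->boff);
        if (n <= 0)
            return n;
        br->len = (size_t)n;
    }
    size_t avail = br->len - br->pos;
    size_t k = want < avail ? want : avail;
    memcpy(dst, br->buf + br->pos, k);
    br->pos += k;
    return (ssize_t)k;
}
```

A player like vlc could keep asking for 1028-byte chunks through br_read() while every actual disk read and every DONTNEED covers a whole block.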

Ulrich Spörlein

Jan 31, 2012, 12:21:07 PM
to John Baldwin, Tijl Coosemans, freebsd...@freebsd.org

I find it questionable in general that an application can tell the
system what to do wrt. caching. Perhaps I'm running 100s of VLC players
all on the same file and actually *do* want reads to be cached?

What happens if I seek back in the file? It has to do a potentially
high-latency read again. The system has a better overview of blocks that
are frequently being requested than any individual application.

I fully understand the intention, and in 99.99% of the cases, this data
*is* just being read once so there's no need to cache any reads for
actually requested data. But as the example shows, requested data is not
necessarily the data that lower layers have to fetch from the disk.

Perhaps talking to the VLC people about why they think this is useful, and
where it actually, measurably helped them, would be interesting.

Sorry if this is all perfectly obvious.
Uli

John Baldwin

Jan 31, 2012, 1:21:47 PM
to Ulrich Spörlein, Tijl Coosemans, freebsd...@freebsd.org

There are certainly cases where the user can choose to run specific apps in
such a way that this makes sense, so the OS needs this functionality. As
to whether or not specific apps should use these APIs or if they should make
use of these APIs configurable, that is a question for each app (e.g. vlc).
However, the OS should provide the tools.
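One way an application could make its use of these hints configurable is a small wrapper that skips them on request. This is a sketch only: the NO_FADVISE environment variable and maybe_fadvise() are invented for illustration, and posix_fadvise(2) is assumed to be available.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>

/* If the (hypothetical) NO_FADVISE environment variable is set to a
 * non-empty value other than "0", caching hints are skipped and the
 * kernel's default policy applies.  Returns the posix_fadvise() result
 * (0 or an error number), or 0 when the hint is skipped. */
static int maybe_fadvise(int fd, off_t off, off_t len, int advice)
{
    const char *v = getenv("NO_FADVISE");

    if (v != NULL && *v != '\0' && *v != '0')
        return 0;
    return posix_fadvise(fd, off, len, advice);
}
```

vlc could then call maybe_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) in place of the direct call, letting a user who runs many players on the same file keep normal caching.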

--
John Baldwin

Rick Macklem

Jan 31, 2012, 8:34:00 PM
to John Baldwin, Tijl Coosemans, freebsd...@freebsd.org
I'd agree. However, there might be an argument for a sysctl that tells the
OS to ignore the hints, so that a sysadmin can work around a case where an
app runs poorly in their environment due to the hint?

rick
