
[9fans] Persistent cache for cfs


Fariborz 'Skip' Tavakkolian

May 12, 2001, 12:34:07 AM
Any thoughts on how cfs could be made to work in disconnected
operations (i.e. persistent cache)?
Also, what limits the cache size?

pres...@plan9.bell-labs.com

May 12, 2001, 9:18:26 AM

I'm not sure what you mean. It is a persistent cache.
When booted stand alone, you can always start it up
pointing to a remote file system just by typing
'cfs -a il!fileserver!9fs /n/fileserver',
for example.

However it's not a file cache, it's a byte cache. It stores
only the exact bytes you read or write, no more. If you need
something that acts as a file system when not connected, a stash
if you will, cfs isn't even close. Not only will it not have
parts of files you didn't read recently, but it also knows
nothing about directory structure, storing cached data indexed
by qid and file offset.
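To make the distinction concrete, here is a minimal sketch of how a byte cache in the style described above might index cached blocks by (qid, offset) rather than by path. All names, sizes, and the hashing scheme are illustrative, not actual cfs code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a cfs-style byte cache: blocks are indexed by
 * (qid, offset), not by pathname, so the cache knows nothing
 * about directory structure. */

#define NBUCKET 256
#define BLKSZ   512

struct cblock {
	uint64_t qid;		/* unique file id from the file server */
	uint64_t off;		/* block-aligned byte offset in the file */
	char data[BLKSZ];
	struct cblock *next;
};

static struct cblock *bucket[NBUCKET];

static unsigned
chash(uint64_t qid, uint64_t off)
{
	return (unsigned)((qid ^ (off / BLKSZ)) % NBUCKET);
}

/* Look up a cached block; NULL means we must go to the server. */
struct cblock *
clookup(uint64_t qid, uint64_t off)
{
	struct cblock *b;

	for (b = bucket[chash(qid, off)]; b != NULL; b = b->next)
		if (b->qid == qid && b->off == off)
			return b;
	return NULL;
}

/* Record bytes fetched from the server. */
void
cinsert(uint64_t qid, uint64_t off, const char *data, size_t n)
{
	unsigned h = chash(qid, off);
	struct cblock *b = malloc(sizeof *b);

	b->qid = qid;
	b->off = off;
	memcpy(b->data, data, n < BLKSZ ? n : BLKSZ);
	b->next = bucket[h];
	bucket[h] = b;
}
```

Note that nothing here relates a qid back to a filename, which is exactly why such a cache cannot stand in for a file tree when disconnected.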

What plan 9 could use is a stash, something that stores files
you accessed recently, perhaps picking them up lazily. That way,
you could disconnect and keep working. You'd still need a
local file system like Unix's root file system because you need
some guaranteed files, or you could just provide a set of files
that the stash absolutely has to have, in addition to what it
caches. It would be a nice and useful project for someone to
do. There was a lot of good work some years ago at Columbia
and other places. No idea where it went.

Fariborz 'Skip' Tavakkolian

May 12, 2001, 1:42:34 PM
Stash is exactly what I was shooting for. There are a couple
of distributed fs options on the Linux/*BSD side with OpenAFS (IBM) and
Coda (CMU); both use a cache to continue operating when disconnected.
Their models have too much unnecessary overhead for Plan 9.

Matthew Weigel

May 16, 2001, 4:43:23 AM
This is my first message here, so bear with me. I don't run Plan 9 for
lack of hardware, but I worked on Steve Wynne's systems a little.

<9f...@cse.psu.edu> wrote:

>What plan 9 could use is a stash, something that stores files
>you accessed recently, perhaps picking them up lazily. That way,
>you could disconnect and keep working. You'd still need a
>local file system like Unix's root file system because you need
>some guaranteed files, or you could just provide a set of files
>that the stash absolutely has to have, in addition to what it
>caches. It would be a nice and useful project for someone to
>do. There was a lot of good work some years ago at Columbia
>and other places. No idea where it went.

I've been looking at this the last few days, and it seems to me that a
nice way to do this would be stacked on top of kfs and fs. A system
could be installed onto local disk, and all 'required' files could thus
be ensured to be local.

If my understanding is correct, stashfs could have visible to itself
(or to two cooperating processes, rather, similar to Russ Cox's
rot13fs) both the fs file tree and the kfs file tree, and the 'upper
layer' would show the fs version if connected, and the kfs version if
disconnected. Special cases, such as files that were edited both
locally and on the fs while the client was disconnected, could be
presented as file.kfs-<modification-date> and
file.fs-<modification-date>, so that the user could see both and decide
what to do.
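The resolution rule sketched above can be stated in a few lines of C. Everything here (the function name, the `last_sync` bookkeeping, the suffix format) is a hypothetical illustration of the proposal, not existing stashfs code:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the upper layer's choice between the fs and kfs copies
 * of a file. When both sides changed since the last sync, surface
 * both under suffixed names so the user can decide. */

enum { PICK_FS, PICK_KFS, CONFLICT };

int
stash_resolve(int connected, long fs_mtime, long kfs_mtime, long last_sync)
{
	if (!connected)
		return PICK_KFS;	/* disconnected: only the local copy exists */
	if (fs_mtime > last_sync && kfs_mtime > last_sync)
		return CONFLICT;	/* both sides changed while apart */
	return PICK_FS;			/* connected: show the server's copy */
}

/* Build one side's conflict name, e.g. "file.kfs-990000000". */
void
conflict_name(char *dst, size_t n, const char *file,
              const char *side, long mtime)
{
	snprintf(dst, n, "%s.%s-%ld", file, side, mtime);
}
```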

A simple configuration file could tell stashfs which hierarchies should
be completely pulled in (such as a home directory, or the directory of
your current projects), and which should not. Before copying a file to
the local stash, it could check for sufficient disk space, and if not
available, begin deleting least recently modified stashed files first.
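The first-pass eviction policy described above (delete least recently modified files until enough space is free) might look like this. The `sfile` record and the in-place `deleted` flag are illustrative stand-ins for whatever bookkeeping the stash would really keep:

```c
#include <stdlib.h>

/* Sketch of stash eviction: free at least `need` bytes by deleting
 * the least recently modified files first. */

struct sfile {
	long mtime;	/* last modification time */
	long size;	/* bytes this file occupies in the stash */
	int deleted;	/* set when chosen for eviction */
};

static int
by_mtime(const void *a, const void *b)
{
	long x = ((const struct sfile *)a)->mtime;
	long y = ((const struct sfile *)b)->mtime;

	return (x > y) - (x < y);
}

/* Returns the number of bytes actually freed. */
long
evict(struct sfile *f, int n, long need)
{
	long freed = 0;
	int i;

	qsort(f, n, sizeof f[0], by_mtime);	/* oldest first */
	for (i = 0; i < n && freed < need; i++) {
		f[i].deleted = 1;
		freed += f[i].size;
	}
	return freed;
}
```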

It should be possible, I think, to have multiple kfs instances serving
files from multiple local partitions, right? It seems that it would be
much simpler to specify a maximum stash size, and/or have multiple
stashes, by creating a partition for each stash. Additionally, if it's
left completely up to stashfs to set up the kfs and fs namespaces
itself, you could avoid potential weird areas like a client process
being able to see the fs or kfs namespace without translation (I'm not
entirely clear on whether this would be possible or not).

If my understanding looks correct to people, and there aren't issues
I'm forgetting to consider, I could begin trying to write this based on
Russ Cox's rot13fs. It would be an interesting exercise to see how
close I can get without being able to test anything :)
--
Matthew Weigel
Research Systems Programmer
mcwe...@cs.cmu.edu

Douglas A. Gwyn

May 18, 2001, 4:35:41 AM
Matthew Weigel wrote:
> ... for special cases such as files that were edited both
> locally and on the fs while the client was disconnected could be
> presented as file.kfs-<modification-date> and
> file.fs-<modification-date>, so that the user could see both and decide
> what to do.

More thought is needed before inventing warts like this.
This is just a concurrent-update issue, widely studied
already. It may be that nothing special needs to be done
about such occurrences; the last writer wins.

It seems to me that the only thing distinguishing a stash
from a normal filesystem from the user perspective is
that files have a new two-valued attribute, "temporary"
or "permanent", which is honored by the caching scheme.
Behind the scenes there must also be a mechanism for
specifying how to construct the stash filesystem, but
that can be as simple as an init rc script containing
some "copy permanent files into place" instructions.
You probably want additional specification capability
for what the stash needs to do when a nonexistent file
is accessed and also when the cache is full. Also
what to do upon termination, again possibly an rc
script that copies out updated files.

Douglas A. Gwyn

May 18, 2001, 10:50:57 AM
"Douglas A. Gwyn" wrote:
> ... files have a new two-valued attribute, "temporary"
> or "permanent" ...

It occurs to me that this could be generalized to a
"priority level" treated uniformly by the caching
subsystem, with low-priority files more subject to
replacement in the cache. The cache "scheduler"
could also take into account access pattern (time
since last access or something more clever). One
could play with the cache scheduling algorithm
without messing with the filesystem representation
once the priority-level attribute existed.
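One way to read the generalization above is as an eviction score that folds priority and recency together; tuning the scheduler then means changing only this function. The weighting and the pinned-file convention below are invented for illustration:

```c
/* Sketch of a priority-aware cache scheduler: higher score means a
 * better eviction candidate. By (hypothetical) convention, a negative
 * priority marks a "permanent" file that is never evicted; otherwise
 * older files score higher, discounted by priority level. */
long
evict_score(int priority, long now, long atime)
{
	if (priority < 0)
		return -1;			/* pinned: never evict */
	return (now - atime) / (priority + 1);
}
```

A two-valued temporary/permanent attribute is then just the special case of priorities -1 and 0.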

Matthew Weigel

May 21, 2001, 4:39:30 AM
Douglas A. Gwyn <DAG...@null.net> wrote:

>More thought is needed before inventing warts like this.
>This is just a concurrent-update issue, widely studied
>already. It may be that nothing special needs to be done
>about such occurrences; the last writer wins.

That might be fine, sure. I was more concerned with files accessed by
multiple people; I don't entirely like the idea of "shouldn't be a
problem, let's hope it isn't."

>You probably want additional specification capability
>for what the stash needs to do when a nonexistent file
>is accessed and also when the cache is full. Also
>what to do upon termination, again possibly an rc
>script that copies out updated files.

What nonexistent files? :)

The way I was seeing it, if you're not connected to an fs, then the
stashfs would only show files that existed in the stash.

When the stash is full, it would be a reasonable first pass to delete
the oldest files until sufficient space frees up; beyond that, looking
at other people's work in the area seems prudent.

Keeping a list of dirty files in the kfs seems reasonable - upon
termination, a script could remount the kfs volume and finish any
writes.
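The dirty-list bookkeeping suggested above could be as simple as the following. The names and the fixed-size table are illustrative; a real stashfs would persist this list in the kfs so the termination script can find it:

```c
#include <string.h>

/* Sketch of dirty-file tracking: record files with unsynced local
 * writes, then flush the list on termination or reconnect. */

#define MAXDIRTY 64

static char *dirty[MAXDIRTY];
static int ndirty;

void
mark_dirty(char *path)
{
	int i;

	for (i = 0; i < ndirty; i++)
		if (strcmp(dirty[i], path) == 0)
			return;		/* already recorded */
	if (ndirty < MAXDIRTY)
		dirty[ndirty++] = path;
}

/* On termination: a real stashfs would copy each listed file back
 * to the fs here. Returns the number of files flushed. */
int
flush_dirty(void)
{
	int done = ndirty;

	ndirty = 0;
	return done;
}
```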

This may be going through too many gymnastic routines to avoid writing
a different filesystem on disk, but it seems like this would be an
excellent example of where stacking single-purpose, user-level
filesystems can give useful filesystem behavior.

Ronald G Minnich

May 21, 2001, 11:06:58 AM
On Mon, 21 May 2001, Matthew Weigel wrote:

> When the stash is full, it would be a reasonable first pass to delete
> the oldest files until sufficient space frees up; beyond that, looking
> at other people's work in the area seems prudent.

you can check my web page for the autocacher
(http://www.acl.lanl.gov/~rminnich). This was a file cache.

but it was a readonly stash. It did however handle the cache full case,
and also the pathological 'that file is too big for the cache' case.


For consistency of stashes across a network of machines, you can check out
Intermezzo.

ron
