
adaptive replacement better than Working Set?


Paul Wilson

Aug 29, 1989, 03:34:36

I'm looking for information on adaptive virtual memory replacement algorithms,
like the "learning" replacement algorithm used in the Ferranti ATLAS.

That algorithm used observed reference patterns to give different pages
different caching weights, in hopes of caching regularly-accessed pages.

It seems to me that such an adaptive algorithm could perform significantly
better than the ubiquitous LRU and WS policies. The policy I have in
mind is *not* as loop-oriented as the ATLAS algorithm, and is based
on a more refined notion of frequency of access.

I realize that the ATLAS algorithm only worked better than LRU for a
minority of programs, but I think I know why, and my algorithm would
do better for typical programs.

My impression is that the only similar algorithm ever tested to any real degree
was in fact exactly the same algorithm (as published in IRE Transactions
in 1962(?)). The simulation was part of Belady's study of replacement
algorithms (IBM Syst. J., 1966(?)).

If different adaptive algorithms have been tested, so that the idea is
actually known to be wrongheaded, I would very much like to know about
it. If not, it seems that the idea was prematurely laid to rest after
testing only one flawed instance of a family of algorithms. In that
case some similar algorithms may in fact be better than LRU or WS, and
everybody's been going the wrong way for 15 or 20 years by assuming that
recency of reference is *the* best available predictor of future
referencing.

I know that working set works very well, but I think you could
do significantly better. Not dramatically better, mind you, but enough
to make it worthwhile. And even if WS is unbeatable for general-purpose
OS applications, adaptive algorithms may be useful
for programs that have wacky locality characteristics, like most
garbage collected heaps and persistent virtual memories used to
implement databases. One algorithm I've been toying with adaptively
shifts toward LRU for "normal" programs, and towards a DBMS-like "hot set"
strategy when locality suddenly goes to pot. That way, if you scan through
an array (or memory mapped file) bigger than memory, it doesn't
displace your whole working set for nothing.
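
To make the scan problem concrete, here's a toy simulation (Python as
executable pseudocode; the frame counts and page names are made up for
illustration, not taken from any real system). One sequential pass over
more distinct pages than will fit in memory flushes every page of the
real working set under plain fixed-space LRU:

from collections import OrderedDict

class LRUCache:
    """Toy fixed-space LRU: most-recently-used pages at the right end."""
    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()       # page -> None, ordered by recency
        self.faults = 0

    def touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)        # hit: just update recency
        else:
            self.faults += 1
            if len(self.resident) >= self.frames:
                self.resident.popitem(last=False)  # evict the least recently used
            self.resident[page] = None

cache = LRUCache(frames=100)
working_set = ['ws%d' % i for i in range(50)]

for _ in range(20):                # the program's real working set, reused heavily
    for p in working_set:
        cache.touch(p)

for i in range(200):               # one sequential scan bigger than memory
    cache.touch('scan%d' % i)

print('working-set pages still resident:',
      sum(p in cache.resident for p in working_set))    # prints 0

The point of the adaptive shift is that the scan pages, which are never
re-referenced, would get only a token amount of buffer space, and the
fifty genuinely reused pages would stay put.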

(By the way, I *have* read Denning's argument as to why nobody will ever
do much better than WS ("Working Sets Past and Present," IEEE Trans. on Software Engineering, 1980).
I'm not quite convinced yet, though. I wonder about the problem of
a single dominant process making WS degenerate into (fixed-space) LRU,
so that something has to be evicted before its time. And Denning
argues from data on programs with "marked" phase changes, where more
than half of all page faults come in the periods in between.)

Any info relevant to this would be *very* helpful. I'm wondering if
more tests were done on the ATLAS than were reported, particularly
tests with variant replacement algorithms that might not have the
same flaw.

For those who know the ATLAS algorithm, here's the flaw I'm talking
about. That system assumes that patterns of references to a given
page are roughly periodic, and that once a page has stopped being
referenced in the same period, it's probably not going to be
referenced for a long time. E.g., a page may be referenced once
every iteration of an outer loop, while several pages are scanned
by an inner loop, as when forming a Cartesian product. According
to Denning ["Virtual Memory," _Computing_Surveys_, 1970], this strategy
only worked well for stereotypical loop programs.

My hypothesis is similar, but somewhat more sophisticated. I also assume
that future reference patterns are likely to be similar to past
reference patterns, *but* with two differences: 1) "periodicity" is very
approximate, and 2) reference patterns have multiple frequency components,
only some of which are relevant to a replacement policy.

Take this example: suppose some program accesses some page of memory
every millisecond or so while it's executing, and that there's usually
an interval of a tenth of a second between calls to that program. You
may notice the 1000 Hz frequency and ditch the page if it goes unreferenced
for two milliseconds, on the assumption that the current clump of references
is finished. But that's a mistake, because you fail to notice the 10 Hz
component that would let you predict that it would be touched again
in about a tenth of a second. (Assuming here that pages typically
stay in memory for at least a tenth of a second.)

You want to key off of the lower frequencies, unless they're so low
that they indicate that the page won't be referenced for a very long
time, so that it's better to use the space some other way in the
meantime.
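
Here's a tiny sketch of the interreference gaps such a page would show
(Python as pseudocode, with numbers invented to match the example above).
There are two distinct gap sizes, and only the larger one is interesting
to a replacement policy:

# Reference times (in seconds) for one page, matching the example above:
# about 20 references a millisecond apart within each call to the program,
# and calls to the program about a tenth of a second apart.
times = []
for call in range(5):
    for ref in range(20):
        times.append(call * 0.1 + ref * 0.001)

gaps = [b - a for a, b in zip(times, times[1:])]
short = [g for g in gaps if g < 0.01]     # the 1000 Hz component (within a call)
long_ = [g for g in gaps if g >= 0.01]    # the ~10 Hz component (between calls)

print('within-call gaps : %d of them, about %.3f s each' % (len(short), short[0]))
print('between-call gaps: %d of them, about %.3f s each' % (len(long_), long_[0]))
# Keying off the short gaps says "evict after ~2 ms idle"; keying off the
# long ones says "it will be back in about a tenth of a second, keep it".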

The problem with the ATLAS algorithm (as published) is that it keys off of
the highest observable frequency, whose period is probably too short to be interesting
to a replacement policy. (By "observable frequency" I just mean the
resolution of the system's LRU approximation -- how often the use bits
are checked, or whatever. Smaller interreference gaps than that will not be
noticed, of course.) It also assumes that references are fairly literally
periodic, rather than merely having approximate frequency components; this
may also eject pages prematurely.


If anybody knows of any relevant experiments, or if I've misunderstood
the ATLAS algorithm, please drop me a note at the address below.

Thanks prematurely,

Paul

Paul R. Wilson
Software Systems Laboratory lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154) wil...@bert.eecs.uic.edu
Box 4348, Chicago, IL 60680

Paul Wilson

Aug 30, 1989, 14:34:12

A couple of people have asked for more info on the ATLAS:

The ATLAS "one-level" store was, I believe, the very first
demand-paged virtual memory design, and its replacement
algorithm was the very first VM replacement algorithm.

Here's a full citation, as a couple of people have requested:

Kilburn, T. et al., "One-level storage system," IRE Transactions on
Electronic Computers EC-11, 2 (April 1962), 223-235.

Note that this predated the terms "virtual memory" and "IEEE".
(Is IEEE the same thing as the Institute of Radio Engineers, expanded
and renamed?)

They called the system a "one-level" store, because that's the
abstraction it implements. It is of course a hierarchical memory.

Paul Wilson

Aug 30, 1989, 17:53:09

A couple of people have rather liked my idea for an adaptive replacement
policy, but expressed doubts that it could be implemented efficiently.
I think it can, without any special hardware (no use bits, even) and
with only a hundred or so statements' worth of extra bookkeeping overhead
at each page fault. The software would be a bit more complicated, maintaining
an extra queue or two, but with only a small time cost.

The implementation I'm thinking of would be a variant of the Babaoglu-Ferrari
LRU approximation. (That is, an initial FIFO queue that feeds into an
LRU queue which is access-protected; when a page in the LRU part is
accessed, an exception moves it to the head of the FIFO queue again. This
gives a good approximation of LRU without the need for use bits. The
access-protected LRU queue gives the LRU effect, while the initial
FIFO queue gives pages a "grace period" in which they don't incur
any more access-protection faults. [Babaoglu & Ferrari,
"Two-level replacement decisions in paging stores,"
IEEE Trans. on Computers, 32:12, December 1983].)
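
Here's roughly the structure I have in mind, written out as executable
pseudocode (Python; the class and method names are my own invention, and
a real kernel would drive this from page and protection faults rather
than explicit calls):

from collections import OrderedDict, deque

class TwoSegmentQueue:
    """Sketch of the FIFO + access-protected LRU approximation."""
    def __init__(self, fifo_frames, lru_frames):
        self.fifo_frames = fifo_frames
        self.lru_frames = lru_frames
        self.fifo = deque()            # unprotected "grace period" segment
        self.lru = OrderedDict()       # protected segment; LRU end at the left
        self.evicted = []              # pages pushed out of memory entirely

    def fault_in(self, page):
        """Page fault on a nonresident page: bring it in at the FIFO head."""
        self.fifo.appendleft(page)
        self._overflow()

    def protection_fault(self, page):
        """Reference to a protected page: promote it back to the FIFO head."""
        del self.lru[page]
        self.fifo.appendleft(page)
        self._overflow()

    def _overflow(self):
        while len(self.fifo) > self.fifo_frames:
            demoted = self.fifo.pop()          # falls off the FIFO tail...
            self.lru[demoted] = None           # ...into the protected segment
        while len(self.lru) > self.lru_frames:
            victim, _ = self.lru.popitem(last=False)   # true LRU page goes out
            self.evicted.append(victim)

References to pages still in the FIFO segment cost nothing -- that's the
grace period; only references to the protected segment cause promotion
faults, and those faults are where all the bookkeeping below gets done.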

With a scheme like this, it's easy to keep a record for each page in
memory saying approximately how long the longest gap between references
has been so far. Whenever a page goes unreferenced for a long time, it
tends to get moved into the LRU queue, and the longer it goes
unreferenced, the further it gets through the queue. So whenever
you have a fault on one of those pages, you update its record (if
it got further than it had before).
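
Extending that sketch, the record can be a high-water mark measured in
demotions into the protected segment, so keeping it costs a counter
increment and a compare at each promotion fault (again, all the names
are invented):

class GapTrackingQueue(TwoSegmentQueue):
    """Adds a per-page high-water mark approximating its longest observed
    interreference gap, measured in demotions into the protected segment."""
    def __init__(self, fifo_frames, lru_frames):
        super().__init__(fifo_frames, lru_frames)
        self.demotions = 0      # running count of FIFO -> protected demotions
        self.entered_at = {}    # page -> value of the counter when it was demoted
        self.max_gap = {}       # page -> largest gap seen so far (the record)

    def _overflow(self):
        while len(self.fifo) > self.fifo_frames:
            demoted = self.fifo.pop()
            self.demotions += 1
            self.entered_at[demoted] = self.demotions
            self.lru[demoted] = None
        while len(self.lru) > self.lru_frames:
            victim, _ = self.lru.popitem(last=False)
            self.entered_at.pop(victim, None)
            self.evicted.append(victim)

    def protection_fault(self, page):
        # How many pages were demoted behind this one before it was touched
        # again?  That's roughly how far it drifted through the queue.
        gap = self.demotions - self.entered_at.pop(page)
        if gap > self.max_gap.get(page, -1):
            self.max_gap[page] = gap           # it got further than ever before
        super().protection_fault(page)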

There are two ways you can use this information. One is to evict
some pages before they get to the end of the queue -- pages that
get pretty far into the queue without being referenced again
are good candidates for early eviction.

Another possibility is to keep these records for pages that have
*already* been evicted -- pages that tend to cause faults because
they are re-referenced shortly after eviction can be given
special treatment (more time in memory). That way, if the policy
screws up on a page once, it learns from it. Such pages can
be kept in a separate "troublemaker" LRU queue that presumably moves at a
lower speed. (Or pages could be marked so that they can run
through the main queue more than once, decrementing a count each time.)
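
At fault time the routing would look something like this (still a sketch
building on the ones above; the recently-evicted table here is just a
size-limited dictionary standing in for the queue-plus-hash-table
described below, and the limit of 1024 frames is an arbitrary number):

from collections import OrderedDict

RECENTLY_EVICTED_LIMIT = 1024     # roughly the number of page frames in memory

def handle_fault(page, regular, troublemaker, recently_evicted):
    """Route a faulting, nonresident page.  'regular' and 'troublemaker' are
    two queues in the style sketched above (each with fault_in() and an
    'evicted' list); 'recently_evicted' is an OrderedDict standing in for
    the bounded queue-plus-hash-table described below."""
    if page in recently_evicted:
        # Evicted not long ago and already back: a troublemaker.  Give it
        # more time in memory by putting it in the slower-moving queue.
        del recently_evicted[page]
        troublemaker.fault_in(page)
    else:
        regular.fault_in(page)
    # Remember whatever either queue just pushed out of memory, forgetting
    # the oldest evictions once the record gets as big as memory itself.
    for q in (regular, troublemaker):
        while q.evicted:
            recently_evicted[q.evicted.pop(0)] = None
            if len(recently_evicted) > RECENTLY_EVICTED_LIMIT:
                recently_evicted.popitem(last=False)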

This system could pose a problem because you have to keep records
for pages that aren't in memory, but a good approximation should be easy.
You don't actually have to keep this record for all pages, just
a queue of the last N pages, where N is comparable to the
number of pages in memory. (Any pages that are re-referenced over
a larger interval than that probably should have been evicted anyway,
since the space could be put to better use in the meantime. The
only problem with this approximation is that over the very long
term you may "forget" about some troublemaker pages and have to
learn about them again. But a page that is periodically re-referenced
at an interval, say, double the LRU queue time will incur one
extra page fault and thereafter be retained until it goes
a much longer period unreferenced.)

The overhead here seems pretty trivial. You keep an LRU queue
of recently-evicted pages. This queue is actually also a hash table
(you can implement such a thing so that queue and hash-table operations
are all constant time, a few statements each). In addition, for each
page in either the in-memory queues or recently-evicted queue,
you keep a behavior record which is just the longest observed
interreference gap. At each page fault, you just check to see if
the page is in the recently-evicted table/queue. If so, you put it
in the troublemaker queue instead of the regular queue (and
bump the LRU page out of the troublemaker queue).
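
For what it's worth, here's a sketch of the combined queue/hash table --
a doubly-linked list threaded through a hash table, nothing clever; every
operation is a handful of statements and constant time:

class EvictedQueue:
    """Bounded queue of recently evicted pages that is also a hash table:
    membership test, insert, delete, and drop-oldest are each O(1)."""
    def __init__(self, limit):
        self.limit = limit
        self.nodes = {}          # page -> [prev, next, page]
        self.head = None         # most recently evicted
        self.tail = None         # least recently evicted

    def __contains__(self, page):
        return page in self.nodes

    def add(self, page):
        node = [None, self.head, page]
        if self.head is not None:
            self.head[0] = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.nodes[page] = node
        if len(self.nodes) > self.limit:
            self.remove(self.tail[2])        # forget the oldest eviction

    def remove(self, page):
        prev, nxt, _ = self.nodes.pop(page)
        if prev is not None:
            prev[1] = nxt
        else:
            self.head = nxt
        if nxt is not None:
            nxt[0] = prev
        else:
            self.tail = prev

This is what the recently-evicted table in the fault-routing sketch above
would really be.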

A slight added complication: you want the regular queue to steal
pages from the troublemaker queue if the troublemaker pages are
going unreferenced, and vice versa if they're heavily referenced.
Then the system will tend to act like LRU in situations where
LRU works well, and like a "hot set" caching strategy otherwise.
(That is, only allocating limited "buffer space" to operations
that have such poor locality that more buffer space would do
no good, e.g. scanning once through more data than will fit in memory.)

(This inter-queue page stealing is easy enough to do in an approximate
way if both are really FIFO-LRU pairs of queues; if the rear of
a queue is not being hit and causing an exception pretty regularly,
the queue can afford to be a little smaller. You probably want
to rig it so that the troublemaker queue has a period some fixed
constant longer than the regular queue's.)
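
A crude version of the stealing rule, just to show the flavor (the hit
counts come from watching where promotion faults land over the last
window of page faults; the step size, minimum, and threshold are numbers
I pulled out of the air):

def rebalance(regular, troublemaker, tail_hits, step=1, min_frames=8,
              quiet_threshold=2):
    """Shift a frame or two between the queues.  'tail_hits' counts, over the
    last window of page faults, how many promotion faults landed on pages
    near the eviction end of each queue's protected segment."""
    queues = {'regular': regular, 'troublemaker': troublemaker}
    busy = max(queues, key=lambda name: tail_hits[name])
    quiet = 'troublemaker' if busy == 'regular' else 'regular'
    if (tail_hits[quiet] <= quiet_threshold
            and queues[quiet].lru_frames - step >= min_frames):
        # The quiet queue's tail isn't being referenced much lately, so it can
        # afford to be a little smaller; let the busier queue have the frames.
        # (The size change takes effect at the next overflow check.)
        queues[quiet].lru_frames -= step
        queues[busy].lru_frames += step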

These policies seem to me to be easy enough to implement, at least
in terms of fixed-space policies; I haven't thought much about
the equivalent modification to something like WS. The differential-
treatment schemes may also have problems that are similar to the
problems fixed-space LRU has relative to (variable-space) WS; at
different times, a given queue position corresponds to a different
length of time, because the rate of page faults varies.

On the other hand, it seems to me that this may be *exactly* what
you want, because it may adapt to aggregate locality changes.
When you switch working sets, you fault in a bunch of pages
of the new working set. This tends to move the old working
set toward the end of the regular queue. Any parts of the old working
set that are shared with the new working set will then tend to be
referenced in that part of the queue, and will be given
special treatment. Thus the pages that are common to multiple
working sets will tend to be preferentially retained in memory.


Any comments?

-- Paul
