Rethinking caching

Robert O'Callahan

unread,

Sep 18, 2006, 11:25:33 PM9/18/06

to

In the new text world, we'll be caching "gfxTextRun" objects that can
contain strings of positioned glyphs that we use to measure and paint
text. This is generally a good thing but we need a cache management
policy; there could be lots of these objects and they could be large.

Currently in the in-memory caches that I know about, we have a limit of
N bytes or K items, and evict items using a least-recently-used policy
or some approximation thereof. I've never liked this approach. Memory we
use to cache items that aren't being accessed could be used somewhere
else (either in Firefox or by other applications) to speed up other
things. It seems to me that the amount of memory we spend caching items
should be related to how often those items are actually being used. In
the limit, a quiescent Firefox should release all memory used by caches
to the operating system.

Now, that may actually happen in effect, if Firefox's caches get paged
out. But then we hit another issue, which is that paging in cache
contents is often more expensive than just taking a cache miss and
recreating the cached item. That will probably be true for text runs,
especially if we can't localize the textrun data in memory to minimize
seeks to the pagefile, which we can't because some of the cached data is
allocated by platform libraries.

I think the right way to think about this is that memory used for
caching has a cost proportional to time ... i.e., it's rented from the
system. We should bound that cost by evicting cached items after they
haven't been used for some time, even if the number of cached items is
small.

For the textruns, I'm thinking of starting eviction at around 5 seconds
of unuse, enough time to reflow and paint in most cases.

Rob

Axel Hecht

unread,

Sep 19, 2006, 12:48:50 AM9/19/06

to

If our cached data would be already paged out, evicting items would
actually page them back in first, right? I wonder if we have a good idea
on when OSes page out to get rid of our not-so-used items before that
happens.

Axel

Robert O'Callahan

unread,

Sep 19, 2006, 12:57:56 AM9/19/06

to

Axel Hecht wrote:
> If our cached data would be already paged out, evicting items would
> actually page them back in first, right? I wonder if we have a good idea
> on when OSes page out to get rid of our not-so-used items before that
> happens.

Most OSes don't support detection of page presence. Even if they did, it
would be hard to use. For caches where we can limit the lifetime of
unused items to no more than say ten seconds, we should mostly avoid
this problem.

Rob

Jonas Sicking

unread,

Sep 19, 2006, 7:49:16 PM9/19/06

to

Axel Hecht wrote:
> If our cached data would be already paged out, evicting items would
> actually page them back in first, right? I wonder if we have a good idea
> on when OSes page out to get rid of our not-so-used items before that
> happens.

I think if we were to just |free()| the cached items then they wouldn't
get paged in even if they were paged out. At least i'd hope that's the
case. However if the cached items are C++ objects that we |delete| and
in their dtor access a bunch of members then we'd probably end up
swapping in the memory.

/ Jonas

Neil

unread,

Sep 20, 2006, 4:41:03 AM9/20/06

to

Jonas Sicking wrote:

> I think if we were to just |free()| the cached items then they
> wouldn't get paged in even if they were paged out.

Doesn't the CRT track free blocks in a chain or some such? That would
involve writing to the freed block (or its header).

--
Warning: May contain traces of nuts.

Robert O'Callahan

unread,

Sep 20, 2006, 5:26:26 PM9/20/06

to

Neil wrote:
> Jonas Sicking wrote:
>> I think if we were to just |free()| the cached items then they
>> wouldn't get paged in even if they were paged out.
>
> Doesn't the CRT track free blocks in a chain or some such? That would
> involve writing to the freed block (or its header).

It does.

This is not really what I'm aiming for though. I want to free cached
stuff before it gets paged out. Even the pageout wastes resources.

Rob

Jonas Sicking

unread,

Sep 20, 2006, 6:16:04 PM9/20/06

to

Very true indeed.

I think for textruns it would be good to try to keep the data in memory
until the page is fully loaded. During loading I bet that we do a lot of
measuring since we are continuously reflowing and rerendering the page.
Given a timeout of 5 seconds I think we should accomplish that unless
the network is really slow, in which case the network will be the
limiting factor anyway.

Of course, we should keep in mind the other timers that we have that
throttles reflows and the likes during loading.

/ Jonas

Axel Hecht

unread,

Sep 21, 2006, 9:06:40 AM9/21/06

to

I have seen portals taking up to 7 seconds to load all their adds and stuff.

But, as roc said that time would start after an object hasn't been used,
to the timer would probably be constantly reset during a load.

I wonder if there's a way to find reoccurring objects used every now and
then. I bet in ajax applications, we could have those. Maybe clobbering
after load more aggressively and loosely later may be a strategy to
speed up DHTML apps that switch back and forth, like menus and the like.

Axel

Robert O'Callahan

unread,

Sep 21, 2006, 9:25:50 PM9/21/06

to

Axel Hecht wrote:
> But, as roc said that time would start after an object hasn't been used,
> to the timer would probably be constantly reset during a load.

In practice we won't be resetting timers all the time but doing some
approximation with a repeating timer and age bits, but yes, if an object
gets touched frequently, it won't be released.

> I wonder if there's a way to find reoccurring objects used every now and
> then. I bet in ajax applications, we could have those. Maybe clobbering
> after load more aggressively and loosely later may be a strategy to
> speed up DHTML apps that switch back and forth, like menus and the like.

We could count the number of resurrections and increase the age limit
for such objects, if that seems necessary.

Rob