|Upcoming Cache Work||Nick Hurley||1/24/12 4:02 PM|
In preparation for discussing and working on the disk cache during our
upcoming necko work week, I wanted to start some discussion with a
wider audience about directions we can go for making our disk cache
present- (and future-) compliant. Right now, I'm starting with a focus
on (1) making the cache more performant, and (2) increasing
stability/preventing data loss under certain circumstances. With that
in mind, here is my initial list of ideas (in no particular order) of
things we can/should do. I would love to hear what others have to say
in terms of different ideas, reasons my ideas suck (or are great), or
something that I've missed.
(A) Reduce lock contention. Currently, the entire cache is protected
by one big lock, which effectively makes our cache single threaded. My
first step idea for this is to start making our locking finer grained.
 is the first bug in this work, with more to follow as more
opportunities for finer grained locking are identified. Eventually, it
would be nice to make some of our locks into multiple reader/single
writer locks for data structures where that makes sense (the global
cache metadata, for example).
(B) Reduce round-trips between the main thread and the cache thread.
This would mean pageloads aren't pushed to the tail of the queue each
time they need to get more data from the cache (among other things).
Discussion on this particular topic has already started at .
(C) Increase parallelism within the cache. Currently all cache I/O
happens serially on one thread, even if we're trying to read/write
more than one element in the cache at a time. We should improve this
situation by having more than one thread available for reading/writing
cache entries (this is dependent on having finer-grained locking of
(D) Make it so changing the on-disk format of the cache is less likely
to require a wholesale deletion of the cache. Discussion has started
in  with some suggestions for strategies to achieve this goal,
which include versioning each cache entry independently of the cache
as a whole, keeping a bitmask of "options" used in the particular
cache entry (enabling us to ignore entries with options we don't
recognize or missing options we require), or making the vast majority
of the metadata more freeform (similar to the bitmask solution, but
text-based and allowing for key/value pairs).
(E) Make crash recovery more robust. This is a long-standing bug 
about the fact that we have precisely one dirty bit for the entire
cache, and if we ever crash (shutdown uncleanly), the entire cache is
deleted when the browser next starts up.
(F) More efficient storage of entries. One incremental improvement
might be to store an entry's metadata with that entry when the entry
is not stored in a block file. Currently, the entry and its metadata
are stored in 2 separate files, which can affect things like directory
seek times in a very full cache.
That's the list I've created for myself as a starting point. Of course
none of this is The One Final Answer, and priorities of these (and
other suggestions) are still up in the air. Comments are highly
desired so we can get our cache in way better shape.
|Re: Upcoming Cache Work||Patrick McManus||1/24/12 5:39 PM|
Nick - its an awesome list you have!
I've been keeping an evernote with some ideas about this - so let me
just dump my notes out here in their unfleshed-out glory as input to
your project. They might not be helpful to you - I won't be hurt, but I
did want to share them. I'll admit some of this is thinking-big instead
of just fixing a weakness but if its an area we can leap frog ahead
surely that's worth discussing.
* Reduce random access nature of I/O by replacing with block reads and
writes that can minimize seek and utilize read ahead.. instead of 1 IO
event per URI, maybe organize things temporally or by their embedded
object relationships. I like fifo data structures for this - they are
great at minimizing contention. Consider if a little duplication of
smallish things (js) is really so bad if it saves seek times.
* Similarly, apply the linked-object concept to replacement policy.
* bring concepts of cost and value into the replacement policy (both for
mem and disk) decision. by cost I mean that fetching a large .swf from
another continent has a different transfer time than fetching a 300
byte icon from the local LAN - LRU is probably not the right way to
compare those two things bidding for the last disk block. By value I
mean that a hit on some objects is simply more valuable than a hit on
other objects - html, css, and js for instance are imo more important to
the page load experience than images. (which is not to say that you want
to store js for rarely used pages at the cost of evicting images of
highly used ones, but that in all-other-things-being equal sense they
are more valuable).
* Consider reducing the size of the data under management (and therefore
bound the IO and memory) by only caching the first mumble bytes of image
resources - enough to give layout hints. Maybe do this as a L2 disk
cache after the full object has been evicted from the smaller full L1 cache.
* invent Self organizing cache sharing LAN protocols. A privacy
nightmare in a coffee shop, but a killer feature in a classroom setting.
IMO this absolutely has potential to radically step up performance in
* Spdy server push needs some way to create a cache scoped to a document
as temporary storage for the pushed data while reusing the "is this is
hit" logic for subresouces of the document - it would be great to have
an in-mem cache that allowed reuse for this kind of thing as well as
promotion of used URIs to the general cache. I can explain more in person.
> dev-tech-network mailing list
|Re: Upcoming Cache Work||Taras Glek||2/8/12 11:22 AM|
On 1/24/2012 5:39 PM, Patrick McManus wrote:I agree with everything that has been suggested(except for the cache
sharing idea, that can be done in an addon).
We absolutely need to get rid of hashtable-on-disk pattern of cache
block files. It has insane overhead. As mentioned elsewhere we should
change the format to instead have better data-locality and readahead.
We should stop assuming that disk cache is faster than network(Bug 725423)
|Re: Upcoming Cache Work||Josh Aas||2/8/12 12:19 PM|
A high-level summary of our plan from the work week:
First (now) we'll go after some high priority issues that can be resolved without changing the on-disk format. Main thread locking issues and bug 725423, for example. After that we'll make sure we have a good system in place for handling on-disk formatting version changes. Then we'll start making changes to the on-disk format.
|Re: Upcoming Cache Work||Henri Sivonen||2/20/12 3:48 AM|