
offline files


Robert O'Callahan

May 2, 2006, 7:16:50 PM
Quite some time ago we had a discussion on mozilla2.0 about APIs to
allow Web apps to pin files in the browser cache for offline use. I want
to revive that discussion with a proposal that I think might make
everyone who participated in that discussion happy.

The proposal is to do it all with one new "rel" attribute for the <link>
tag: "offline". Whenever the UA loads a document that has been selected
for offline use (how this happens is UA-specific), after the document's
onload event fires, all "offline" <link> URLs are prefetched as we do
with "prefetch". The UA makes a best-effort attempt to ensure that the
results are not evicted from cache*. The UA allows loads of cached
resources to succeed if it is offline; in particular it assumes that any
HTTP request for a cached offline resource returns 304 (Not Modified). A
load via "offline" should not replace the cached resource unless all
data is successfully received.

* Subject to UA-imposed policy on quotas and so forth. Also, the UA may
evict the results after pages cease to be selected for offline use or
pages selected for offline use are revisited and the resource is no
longer linked from the page.

Note that linking to a resource via "offline" does not necessarily make
that resource "selected for offline use", i.e., offline linking is not
transitive.
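The store semantics above (commit only on complete download, answer offline requests as if the server said 304, evict resources no longer linked) can be modeled in a few lines. This is a sketch under assumptions: the class and method names are hypothetical, not an existing Gecko interface, and bodies are plain strings.

```javascript
// Hypothetical model of the store behind rel="offline".
// OfflineStore, beginLoad, fetchOffline and retainOnly are illustrative names.
class OfflineStore {
  constructor() {
    this.entries = new Map(); // absolute URL -> cached body (string)
  }

  // Begin an "offline" load. Chunks are buffered and committed only when
  // the whole response has arrived, so an interrupted download never
  // replaces the copy that is already pinned.
  beginLoad(url) {
    const chunks = [];
    let done = false;
    return {
      write: (chunk) => { if (!done) chunks.push(chunk); },
      complete: () => {
        if (!done) this.entries.set(url, chunks.join(""));
        done = true;
      },
      abort: () => { done = true; }, // discard; the old entry is untouched
    };
  }

  // While offline, a request for a pinned resource is answered as if the
  // server had returned 304 Not Modified: the cached body is used as-is.
  // Returns null if the resource was never pinned.
  fetchOffline(url) {
    return this.entries.has(url)
      ? { status: 304, body: this.entries.get(url) }
      : null;
  }

  // Eviction per the footnote: drop entries that are no longer linked
  // from any page selected for offline use.
  retainOnly(stillLinked) {
    const keep = new Set(stillLinked);
    for (const url of [...this.entries.keys()]) {
      if (!keep.has(url)) this.entries.delete(url);
    }
  }
}
```

Quotas and UA-specific eviction policy are left out; `retainOnly` only covers the "no longer linked" case.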

You could use it like this:
<html>
<head>
<link rel="offline" href="foo.jpg"/>
<link rel="offline" href="bar.jpg"/>
</head>
<body>
<img id="i" src="foo.jpg" onclick="document.getElementById('i').src =
'bar.jpg'"/>
</body>
</html>
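On the UA side, the first step after onload is collecting the offline links. A minimal sketch, assuming links are modeled as plain `{ rel, href }` objects rather than DOM nodes; `offlineLinkUrls` is a hypothetical helper name.

```javascript
// Hypothetical helper: given the <link> elements of a page selected for
// offline use, return the absolute URLs to prefetch after onload.
function offlineLinkUrls(links, baseUrl) {
  const urls = new Set(); // dedupe repeated links to the same resource
  for (const link of links) {
    // rel is a space-separated, case-insensitive token list,
    // so rel="offline prefetch" also counts.
    const tokens = (link.rel || "").toLowerCase().split(/\s+/);
    if (tokens.includes("offline") && link.href) {
      urls.add(new URL(link.href, baseUrl).href);
    }
  }
  // Only this page's links are returned; the fetched resources are not
  // scanned in turn, since offline linking is not transitive.
  return [...urls];
}
```

For the page above served from `http://example.com/app/index.html`, this yields the absolute URLs of `foo.jpg` and `bar.jpg` in document order.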

For complicated stuff you could use the JAR protocol to great effect.
<html>
<head>
<link rel="offline" href="http://example.com/application.jar"/>
</head>
<body onload="document.getElementById('f').src = navigator.onLine ?
'app.html' : 'jar:http://example.com/application.jar!/app.html'">
<iframe id="f"></iframe>
</body>
</html>

I imagine that a simple UI approach would be to treat all bookmarked
pages as selected for offline use.

Rob

Doron Rosenberg

May 2, 2006, 10:34:56 PM
Robert O'Callahan wrote:
> For complicated stuff you could use the JAR protocol to great effect.
> <html>
> <head>
> <link rel="offline" href="http://example.com/application.jar"/>
> </head>
> <body onload="document.getElementById('f').src = navigator.onLine ?
> 'app.html' : 'jar:http://example.com/application.jar!/app.html'">
> <iframe id="f"></iframe>
> </body>
> </html>

The jar concept would work well for offline apps, plus one could cache a
.js/.xml file with all the user data.

Add the upcoming WHATWG offline storage work, which I am told is going to make
Firefox 2.0, and I think we should have everything for a first stab at
offline apps.

I am still unclear about when the caching would happen - each time we
see a rel="offline", or only when you go offline while viewing the page
that has the rel="offline"? Web apps would want some way to ensure they
get into the cache, also a way to make sure the cache is up to date.

Doron

Robert O'Callahan

May 3, 2006, 12:15:35 AM
Doron Rosenberg wrote:
> I am still unclear about when the caching would happen - each time we
> see a rel="offline", or only when you go offline while viewing the page
> that has the rel="offline"?

It has to happen every time you see a rel="offline". In general users
will "go offline" unexpectedly with no opportunity to download anything
before they're disconnected.

However, since the objects will generally already be in your cache, if
they haven't changed on the server the standard HTTP caching protocol
will ensure that you don't download them again.

> Web apps would want some way to ensure they
> get into the cache,

This is that way.

> also a way to make sure the cache is up to date.

My approach is that every time you load the root page, the cached
resources are revalidated if you're online. There is no way to update if
the user never visits the root page while online. I think this is OK.
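The revalidation on each online visit is ordinary HTTP conditional GET. A sketch, assuming `fetchFn` is a hypothetical synchronous stand-in for the network layer that takes a URL and request headers and returns `{ status, headers, body }` with lowercase header names.

```javascript
// Revalidate one pinned entry when the root page is loaded online.
// entry: { url, body, etag, lastModified } (illustrative shape)
function revalidate(entry, fetchFn) {
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;

  const res = fetchFn(entry.url, headers);
  if (res.status === 304) {
    return entry; // unchanged on the server: nothing is downloaded again
  }
  if (res.status === 200) {
    // Changed: replace the pinned copy with the fresh response.
    return {
      url: entry.url,
      body: res.body,
      etag: res.headers["etag"] || null,
      lastModified: res.headers["last-modified"] || null,
    };
  }
  return entry; // errors or anything unexpected: keep the pinned copy
}
```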

Rob

Doron Rosenberg

May 3, 2006, 10:52:20 AM

That is perfect, thanks!

Dan Mosedale

May 4, 2006, 6:25:27 PM
Robert O'Callahan wrote:
> Quite some time ago we had a discussion on mozilla2.0 about APIs to
> allow Web apps to pin files in the browser cache for offline use. I want
> to revive that discussion with a proposal that I think might make
> everyone who participated in that discussion happy.
>

This looks great; very straightforward. Someone (mconnor?) mentioned
the other night that the cache gets blown away after a crash. I assume
this behavior is likely to want some modification for this?

Dan

Robert O'Callahan

May 4, 2006, 6:28:59 PM

Yes, definitely.

Rob

Neil

May 4, 2006, 7:53:53 PM
Robert O'Callahan wrote:

>Dan Mosedale wrote:
>
>>Someone (mconnor?) mentioned the other night that the cache gets blown away after a crash. I assume this behavior is likely to want some modification for this?
>>
>>
>Yes, definitely.
>
>

We could use this for pinning site icons in cache too.

--
Warning: May contain traces of nuts.

Darin Fisher

May 5, 2006, 3:10:04 AM
to Robert O'Callahan, dev-pl...@lists.mozilla.org

That's a big job. We can probably do some cheap things to ensure that
the disk cache is more often in a consistent state instead of just
waiting to put it in a consistent state at shutdown, but I wouldn't
underestimate the amount of work involved.

-Darin

Mike Shaver

May 5, 2006, 8:44:05 AM
to Darin Fisher, dev-pl...@lists.mozilla.org, Robert O'Callahan
On 5/5/06, Darin Fisher <dar...@gmail.com> wrote:
> That's a big job. We can probably do some cheap things to ensure that
> the disk cache is more often in a consistent state instead of just
> waiting to put it in a consistent state at shutdown, but I wouldn't
> underestimate the amount of work involved.

Do recent improvements to sqlite warrant another pass at measuring the
performance hit from that approach? IIRC, it solved that problem as
well as the bogus-hash-collision one.

Mike

Darin Fisher

May 5, 2006, 11:43:41 AM
to dev-pl...@lists.mozilla.org
Good question. I have my doubts.

My attempt from before is here:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.{h,cpp}

-Darin

Mike Shaver

May 5, 2006, 2:02:41 PM
to Darin Fisher, dev-pl...@lists.mozilla.org
On 5/5/06, Darin Fisher <dar...@gmail.com> wrote:
> Good question. I have my doubts.
>
> My attempt from before is here:
> http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.{h,cpp}

Oh, you and your doubts...

Another question would be "how much performance hit are we willing to
take to fix the hashing collisions and corruption issues?", though I
suspect the answer is still "not much".

Mike

Darin Fisher

May 5, 2006, 3:53:43 PM
to Mike Shaver, dev-pl...@lists.mozilla.org
On 5/5/06, Mike Shaver <mike....@gmail.com> wrote:
> On 5/5/06, Darin Fisher <dar...@gmail.com> wrote:
> > Good question. I have my doubts.
> >
> > My attempt from before is here:
> > http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.{h,cpp}
>
> Oh, you and your doubts...

heh ;-)


> Another question would be "how much performance hit are we willing to
> take to fix the hashing collisions and corruption issues?", though I
> suspect the answer is still "not much".

I measured 5% Tp loss starting with an empty cache. Given our
experience with places, I would expect that performance would be more
impacted by starting with a full sqlite db. I'd be interested in the
results of such an investigation :-)

-Darin

Mike Shaver

May 5, 2006, 3:58:27 PM
to Darin Fisher, dev-pl...@lists.mozilla.org

With places, I thought that we were most recently seeing a 2% Tp hit
for the empty history, but a performance *gain* when operating on
larger history data sets. It wouldn't totally surprise me to discover
that the cache would see the same pattern, and optimizing for the
empty-cache case doesn't seem like the right thing.

Mike
(sure is the easier thing to measure, though!)

Darin Fisher

May 5, 2006, 4:15:18 PM
to Mike Shaver, dev-pl...@lists.mozilla.org
On 5/5/06, Mike Shaver <mike....@gmail.com> wrote:

With places we learned how easy it is to get bad performance from
sqlite. Brett ended up solving the nasty startup performance problem
by preloading the first 10 megs of the history and bookmarks sqlite db.
That results in great performance provided your db does not exceed
10 megs. I'm just skeptical that we should be using sqlite for the cache
because the cache already works quite well, and it doesn't need the
query power of sqlite.

The hash collision problem is a very minor problem for web content.
We can easily tweak the cache to reduce the frequency of collisions
and/or come up with a better conflict resolution scheme if needed.
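One conventional collision-resolution scheme, sketched here as a generic technique and not as the actual layout of the Mozilla disk cache: bucket records by a 32-bit hash of the key, but keep the full key in each record so two URLs with the same hash are told apart instead of one overwriting the other. FNV-1a is an arbitrary choice of hash for illustration.

```javascript
// FNV-1a over UTF-16 code units; any reasonable 32-bit hash would do.
function hash32(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash-bucketed index with chaining and a full-key check on lookup.
class HashedIndex {
  constructor(hashFn = hash32) {
    this.hashFn = hashFn;
    this.buckets = new Map(); // hash -> array of { key, value }
  }
  set(key, value) {
    const h = this.hashFn(key);
    const bucket = this.buckets.get(h) || [];
    const rec = bucket.find((r) => r.key === key);
    if (rec) rec.value = value;
    else bucket.push({ key, value }); // collision: chain, don't overwrite
    this.buckets.set(h, bucket);
  }
  get(key) {
    const bucket = this.buckets.get(this.hashFn(key)) || [];
    const rec = bucket.find((r) => r.key === key); // full-key comparison
    return rec ? rec.value : undefined;
  }
}
```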

Ultimately, I don't think the disk cache is the right datastructure for a
persistent datastore backing offline web apps. The cache is designed
as an optimization, not as a generic datastore.

-Darin

Brett Wilson

May 5, 2006, 4:18:27 PM

The performance gain with large histories is for Ts. I think the 2% is
mostly across the board, primarily because sqlite does a binary search
for finding visited URLs, and I assume Mork uses a hash table. I'm going
to work on this a bit.

However, with all the performance work we've done for places, I would
expect you would be able to get better performance than Darin measured,
but I'm not sure how it would be relative to the old cache.

Brett

Robert O'Callahan

May 7, 2006, 5:19:34 PM
Darin Fisher wrote:
> Ultimately, I don't think the disk cache is the right datastructure for a
> persistent datastore backing offline web apps. The cache is designed
> as an optimization, not as a generic datastore.

As it is currently designed, that's true. But I still think that HTTP
caching is the right model for managing offline content.

Then one way to go would be to have a separate offline store, which we
accept is slower, and implements the cache interface. Use it as a
fallback for HTTP cache lookups, only when we're offline.

If we ever get it to the point where it's actually faster than the
existing disk cache, then we can throw away the latter.

It doesn't have to be SQLite. It seems to me that making each cache
object a file and leveraging filesystem consistency should be workable,
the same philosophy as maildir format, but simpler in some ways.
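The maildir-style idea relies on writing each object to a temporary file and renaming it into place. On POSIX filesystems rename is atomic within one filesystem, so after a crash a reader sees either the old object or the new one, never a torn write. A sketch using Node's `fs`; the `tmp`/`cur` layout and function names are assumptions, and `name` is assumed to already be a safe file name (for example a hash of the URL).

```javascript
const fs = require("fs");
const path = require("path");

// Write one cached object atomically: tmp file, fsync, then rename.
function storeObject(dir, name, data) {
  const tmpDir = path.join(dir, "tmp");
  const curDir = path.join(dir, "cur");
  fs.mkdirSync(tmpDir, { recursive: true });
  fs.mkdirSync(curDir, { recursive: true });

  const tmpPath = path.join(tmpDir, `${name}.${process.pid}.${Date.now()}`);
  const fd = fs.openSync(tmpPath, "w");
  try {
    fs.writeSync(fd, data);
    fs.fsyncSync(fd); // flush the bytes before the rename publishes them
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmpPath, path.join(curDir, name)); // replaces any old copy
}

// Read a committed object, or null if it was never stored.
function loadObject(dir, name) {
  const p = path.join(dir, "cur", name);
  return fs.existsSync(p) ? fs.readFileSync(p, "utf8") : null;
}
```

This trades one file open per object for crash consistency; its cost relative to block files is not measured here.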

Rob

Darin Fisher

May 8, 2006, 1:34:36 AM
to Robert O'Callahan, dev-pl...@lists.mozilla.org

File open is really expensive (especially on windows it seems). The
cache block files (_CACHE_001_, etc.) were a huge perf boost over flat
files.

I agree that it makes sense to intercept at the HTTP layer and to
build a persistent store as a cache layer (or "device" in
nsCacheService terminology).

-Darin

Mike Shaver

May 8, 2006, 8:35:07 AM
to Darin Fisher, dev-pl...@lists.mozilla.org, Robert O'Callahan
On 5/8/06, Darin Fisher <dar...@gmail.com> wrote:
> File open is really expensive (especially on windows it seems). The
> cache block files (_CACHE_001_, etc.) were a huge perf boost over flat
> files.

That always seemed so weird to me, especially since IE uses a separate
file per cookie and its cached data. I wonder what they're doing that
we aren't...

Mike

Dan Mosedale

May 8, 2006, 2:10:54 PM
Darin Fisher wrote:
> I'm just skeptical that we should be using sqlite for the cache
> because the cache already works quite well, and it doesn't need the
> query power of sqlite.

The query power of sqlite could, however, be used to address the use
case of searching offline content, which would be interesting in the
case of help pages pinned to the offline cache.

Dan

Robert O'Callahan

May 8, 2006, 4:39:51 PM

Yeah, I've been wondering about that too.

Rob

Darin Fisher

May 10, 2006, 11:11:08 AM
to Mike Shaver, dev-pl...@lists.mozilla.org, Robert O'Callahan
On 5/8/06, Mike Shaver <mike....@gmail.com> wrote:
> On 5/8/06, Darin Fisher <dar...@gmail.com> wrote:
> > File open is really expensive (especially on windows it seems). The
> > cache block files (_CACHE_001_, etc.) were a huge perf boost over flat
> > files.
>
> That always seemed so weird to me, especially since IE uses a separate
> file per cookie and its cached data. I wonder what they're doing that
> we aren't...

And, their cache is known to be a dog perf-wise. Of course they make
up for it elsewhere ;-)

-Darin

Jonas Sicking

May 15, 2006, 5:53:05 AM

Do we know if their cache is actually saving stuff as separate files? Or
if it's just using explorer extensions to display stuff like that. I
know that you get very different names and dir trees if you look at that
directory from for example linux.

/ Jonas

Vladimir Vukicevic

May 18, 2006, 4:01:09 AM
Dan Mosedale wrote:

I'm not sure that it would -- sqlite has no full-text indexing engine or
anything more complex than LIKE and similar basic SQL queries. For searching
offline content you want a real full-text index, and you can build that on
top of the raw data being stored elsewhere (filesystem, say) just as well
as you can if it was stored inside sqlite.
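To illustrate that the index can live apart from the stored bodies, here is a minimal inverted index. It is a sketch, not a real full-text engine: tokenization is naive (lowercase ASCII alphanumeric runs), there is no ranking or stemming, and the class name is hypothetical.

```javascript
// Naive tokenizer: lowercase ASCII alphanumeric runs only.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Inverted index: term -> set of document URLs. Document bodies stay
// wherever the cache keeps them; only the postings live here.
class InvertedIndex {
  constructor() {
    this.postings = new Map();
  }
  add(url, text) {
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(url);
    }
  }
  // URLs containing every query term (AND semantics), sorted.
  search(query) {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    let result = null;
    for (const term of terms) {
      const docs = this.postings.get(term) || new Set();
      result = result === null
        ? new Set(docs)
        : new Set([...result].filter((u) => docs.has(u)));
    }
    return [...result].sort();
  }
}
```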

- Vlad


Vladimir Vukicevic

May 30, 2006, 3:10:19 AM
Darin Fisher wrote:
>>> This looks great; very straightforward. Someone (mconnor?)
>>> mentioned the other night that the cache gets blown away after a
>>> crash. I assume this behavior is likely to want some modification
>>> for this?
>>>
>> Yes, definitely.
>>
> That's a big job. We can probably do some cheap things to ensure that
> the disk cache is more often in a consistent state instead of just
> waiting to put it in a consistent state at shutdown, but I wouldn't
> underestimate the amount of work involved.

This might be a good reason to re-examine your sqlite-backed cache implementation,
especially now that we have some pretty significant sqlite performance improvements
from the async work. Since necko didn't have a dependency on mozStorage
and used sqlite directly, we'd probably have to pull a copy of the async
framework into necko, but it's just a single file. Definitely too late for
Fx2, but it might be worth running the perf tests again after a bit of work
for potential Gecko 1.9 inclusion.

For Fx2, could a secondary offline cache be created? I'm not sure if necko
can work with two disk caches, but having one that cached just the offline-flagged
content (even if the content was also placed in the other cache) would greatly
increase the chances of it being in a consistent state on a crash.

- Vlad


Darin Fisher

May 30, 2006, 10:30:59 AM
to Vladimir Vukicevic, dev-pl...@lists.mozilla.org
On 5/30/06, Vladimir Vukicevic <vlad...@pobox.com> wrote:
> Darin Fisher wrote:
> >>> This looks great; very straightforward. Someone (mconnor?)
> >>> mentioned the other night that the cache gets blown away after a
> >>> crash. I assume this behavior is likely to want some modification
> >>> for this?
> >>>
> >> Yes, definitely.
> >>
> > That's a big job. We can probably do some cheap things to ensure that
> > the disk cache is more often in a consistent state instead of just
> > waiting to put it in a consistent state at shutdown, but I wouldn't
> > underestimate the amount of work involved.
>
> This might be a good reason to re-examine your sqlite-backed cache implementation,
> especially now that we have some pretty significant sqlite performance improvements
> from the async work. Since necko didn't have a dependency on mozStorage
> and used sqlite directly

Actually, I converted it to use mozStorage:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.cpp


> , we'd probably have to pull a copy of the async
> framework into necko, but it's just a single file. Definitely too late for
> Fx2, but it might be worth running the perf tests again after a bit of work
> for potential Gecko 1.9 inclusion.

If someone wants to do so, I'd be very interested in the results.


> For Fx2, could a secondary offline cache be created?

Maybe.


> I'm not sure if necko
> can work with two disk caches, but having one that cached just the offline-flagged
> content (even if the content was also placed in the other cache) would greatly
> increase the chances of it being in a consistent state on a crash.

I'm not sure I completely follow your plan, but if you're talking
about instancing the current disk cache twice, then you'd still have
the problem that we don't put the disk cache in a consistent state
until the app shuts down.

I think it would be better for us to work on improving the current
disk cache instead of throwing it out for the sqlite-based one.

-Darin

Robert O'Callahan

May 30, 2006, 6:21:45 PM
I think the way to go is:
-- keep around an instance of a mozStorage-based cache, the "offline cache"
-- this cache needs a specialized implementation of eviction and quotas
-- "offline loads" use the offline cache
-- loads that fail in the regular cache while we're in "offline mode"*
fall back to the offline cache

* We may want to expand the definition of "offline mode" to include
situations where we're globally online, but the HTTP connection attempt
failed.

This scheme means we don't get sharing between the offline and online
caches, which is slightly unfortunate, but it seems relatively simple
and should impose no performance penalty.
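The lookup order in this scheme can be sketched as a single function. Assumptions: both caches are plain `Map`s, `network` is a hypothetical synchronous function that returns a body or throws when the connection fails, and the regular cache's freshness and revalidation logic is left out (the network call stands in for the whole online HTTP path).

```javascript
// opts.online: global online state; opts.offlineLoad: load triggered by rel="offline".
function load(url, opts, caches, network) {
  // "Offline loads" read and write only the offline cache, so the two
  // caches share nothing.
  const primary = opts.offlineLoad ? caches.offline : caches.regular;
  if (opts.online) {
    try {
      const body = network(url);
      primary.set(url, body);
      return body;
    } catch (err) {
      // Globally online but the connection failed: treat as offline mode.
    }
  }
  if (primary.has(url)) return primary.get(url);
  // Ordinary loads that miss in offline mode fall back to the offline cache.
  if (!opts.offlineLoad && caches.offline.has(url)) return caches.offline.get(url);
  throw new Error(`not available offline: ${url}`);
}
```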

Rob

Wladimir Palant

Jun 4, 2006, 10:51:36 AM
to Jonas Sicking
Jonas Sicking wrote:
> Do we know if their cache is actually saving stuff as separate files? Or
> if it's just using explorer extensions to display stuff like that.

Actually, it is both. It uses Explorer extensions to display files in
the "Temporary Internet Files" directory that aren't there - but they
really are separate files. This directory actually contains a
subdirectory Content.IE5 which stores the cached files in several
subdirectories. The cookies that are also displayed are separate files
as well but stored in an entirely different directory outside "Temporary
Internet Files". Luckily there is Total Commander that doesn't use
explorer extensions and displays the file system the way it really is...

Wladimir

Philip Chee

Jun 4, 2006, 11:27:26 AM
On Sun, 04 Jun 2006 16:51:36 +0200, Wladimir Palant wrote:

> Actually, it is both. It uses Explorer extensions to display files in
> the "Temporary Internet Files" directory that aren't there - but they
> really are separate files. This directory actually contains a
> subdirectory Content.IE5 which stores the cached files in several
> subdirectories. The cookies that are also displayed are separate files
> as well but stored in an entirely different directory outside "Temporary
> Internet Files". Luckily there is Total Commander that doesn't use
> explorer extensions and displays the file system the way it really is...

I use Total Commander (since it was Windows Commander) exclusively so I
never noticed the virtual "Temporary Internet Files".

Phil (fires up windows explorer to have a look. Oh now I know what
people are talking about).
--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.
[ ]NAVY: Never Again Volunteer Yourself
* TagZilla 0.059
