The proposal is to do it all with one new value for the "rel" attribute
of the <link> tag: "offline". Whenever the UA loads a document that has
been selected for offline use (how this happens is UA-specific), all
"offline" <link> URLs are prefetched after the document's onload event
fires, just as we do with "prefetch". The UA makes a best-effort attempt
to ensure that the results are not evicted from cache*. The UA allows
loads of cached resources to succeed if it is offline; in particular it
assumes that any HTTP request for a cached offline resource returns 304
(Not Modified). A load via "offline" should not replace the cached
resource unless all data is successfully received.
* Subject to UA-imposed policy on quotas and so forth. Also, the UA may
evict the results after pages cease to be selected for offline use or
pages selected for offline use are revisited and the resource is no
longer linked from the page.
Note that linking to a resource via "offline" does not necessarily make
that resource "selected for offline use", i.e., offline linking is not
transitive.
You could use it like this:
<html>
<head>
<link rel="offline" href="foo.jpg"/>
<link rel="offline" href="bar.jpg"/>
</head>
<body>
<img id="i" src="foo.jpg"
     onclick="document.getElementById('i').src = 'bar.jpg'"/>
</body>
</html>
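
Two of the rules above (offline linking is not transitive, and a load must not replace the cached copy unless all data arrived) can be sketched roughly as follows. This is only an illustration: the `linksByPage` table, the function names, and the length check used to detect a complete download are all assumptions, not part of the proposal.

```javascript
// Hypothetical model: for each page, the URLs it names via <link rel="offline">.
const linksByPage = {
  "http://example.com/index.html": [
    "http://example.com/foo.jpg",
    "http://example.com/sub.html",
  ],
  "http://example.com/sub.html": ["http://example.com/deep.jpg"],
};

// Collect the URLs to prefetch for the pages selected for offline use.
// Only direct links are taken; links inside a linked resource are not
// followed, because offline linking is not transitive.
function offlineLinksToPrefetch(selectedPages, links) {
  const urls = new Set();
  for (const page of selectedPages) {
    for (const url of links[page] || []) {
      urls.add(url);
    }
  }
  return urls;
}

// Replace a cached entry only when the whole body was received.
// Completeness is modeled (as an assumption) by comparing lengths.
function commitOfflineLoad(cache, url, body, expectedLength) {
  if (body.length !== expectedLength) {
    return false; // partial download: keep the old copy
  }
  cache.set(url, body);
  return true;
}
```

With only index.html selected, sub.html is prefetched but deep.jpg is not, since sub.html itself was never selected for offline use.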
For complicated stuff you could use the JAR protocol to great effect.
<html>
<head>
<link rel="offline" href="http://example.com/application.jar"/>
</head>
<body onload="document.getElementById('f').src = navigator.onLine ?
  'app.html' : 'jar:http://example.com/application.jar!/app.html'">
<iframe id="f"></iframe>
</body>
</html>
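
The onload logic above can also be pulled out into a small helper, which makes the jar: URL shape (jar:<archive-url>!/<entry-path>) explicit. The helper names here are made up for illustration.

```javascript
// Build a jar: URL of the form jar:<archive-url>!/<entry-path>.
function jarUrl(archiveUrl, entryPath) {
  // Drop leading slashes so the result has exactly one "/" after "!".
  return "jar:" + archiveUrl + "!/" + entryPath.replace(/^\/+/, "");
}

// Pick the live page when online, the copy inside the archive otherwise.
function chooseAppUrl(isOnline, archiveUrl, entryPath) {
  return isOnline ? entryPath : jarUrl(archiveUrl, entryPath);
}
```

In a page this would be called as chooseAppUrl(navigator.onLine, 'http://example.com/application.jar', 'app.html').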
I imagine that a simple UI approach would be to treat all bookmarked
pages as selected for offline use.
Rob
The jar concept would work well for offline apps, plus one could cache a
.js/.xml file with all the user data.
Add the upcoming WHATWG offline storage work, which I am told is going
to make Firefox 2.0, and I think we should have everything for a first
stab at offline apps.
I am still unclear about when the caching would happen - each time we
see a rel="offline", or only when you go offline while viewing the page
that has the rel="offline"? Web apps would want some way to ensure they
get into the cache, also a way to make sure the cache is up to date.
Doron
It has to happen every time you see a rel="offline". In general users
will "go offline" unexpectedly with no opportunity to download anything
before they're disconnected.
However, since the objects will generally already be in your cache, if
they haven't changed on the server the standard HTTP caching protocol
will ensure that you don't download them again.
> Web apps would want some way to ensure they
> get into the cache,
This is that way.
> also a way to make sure the cache is up to date.
My approach is that every time you load the root page, the cached
resources are revalidated if you're online. There is no way to update if
the user never visits the root page while online. I think this is OK.
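
A rough sketch of that revalidation pass, assuming cached entries keep their ETag and Last-Modified validators. The `send` callback is a hypothetical stand-in for the network layer, and the entry shape is an assumption too; only the If-None-Match / If-Modified-Since headers and the 304 status come from standard HTTP.

```javascript
// Build conditional request headers from a cached entry's validators.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Revalidate the offline resources of a root page when it is loaded.
// `send(url, headers)` is hypothetical and returns
// { status, body, etag, lastModified }.
// Returns the number of entries that were actually re-downloaded.
function revalidateAll(cache, urls, isOnline, send) {
  if (!isOnline) return 0; // offline: nothing can be updated
  let refreshed = 0;
  for (const url of urls) {
    const entry = cache.get(url);
    const response = send(url, entry ? conditionalHeaders(entry) : {});
    if (response.status === 304) continue; // cached copy is still current
    if (response.status === 200) {
      cache.set(url, {
        body: response.body,
        etag: response.etag,
        lastModified: response.lastModified,
      });
      refreshed += 1;
    }
  }
  return refreshed;
}
```

Unchanged resources cost one conditional request each and no body transfer, which is the "you don't download them again" case.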
Rob
That is perfect, thanks!
This looks great; very straightforward. Someone (mconnor?) mentioned
the other night that the cache gets blown away after a crash. I assume
this behavior is likely to want some modification for this?
Dan
Yes, definitely.
Rob
> Dan Mosedale wrote:
>> Someone (mconnor?) mentioned the other night that the cache gets blown away after a crash. I assume this behavior is likely to want some modification for this?
> Yes, definitely.
We could use this for pinning site icons in cache too.
--
Warning: May contain traces of nuts.
That's a big job. We can probably do some cheap things to ensure that
the disk cache is more often in a consistent state instead of just
waiting to put it in a consistent state at shutdown, but I wouldn't
underestimate the amount of work involved.
-Darin
Do recent improvements to sqlite warrant another pass at measuring the
performance hit from that approach? IIRC, it solved that problem as
well as the bogus-hash-collision one.
Mike
My attempt from before is here:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.{h,cpp}
-Darin
Oh, you and your doubts...
Another question would be "how much performance hit are we willing to
take to fix the hashing collisions and corruption issues?", though I
suspect the answer is still "not much".
Mike
heh ;-)
> Another question would be "how much performance hit are we willing to
> take to fix the hashing collisions and corruption issues?", though I
> suspect the answer is still "not much".
I measured 5% Tp loss starting with an empty cache. Given our
experience with places, I would expect that performance would be more
impacted by starting with a full sqlite db. I'd be interested in the
results of such an investigation :-)
-Darin
With places, I thought that we were most recently seeing a 2% Tp hit
for the empty history, but a performance *gain* when operating on
larger history data sets. It wouldn't totally surprise me to discover
that the cache would see the same pattern, and optimizing for the
empty-cache case doesn't seem like the right thing.
Mike
(sure is the easier thing to measure, though!)
With places we learned how easy it is to get bad performance from
sqlite. Brett ended up solving the nasty startup performance problem
by preloading the first 10 megs of the history and bookmarks sqlite db.
That results in great performance provided your db does not exceed
10 megs. I'm just skeptical that we should be using sqlite for the cache
because the cache already works quite well, and it doesn't need the
query power of sqlite.
The hash collision problem is a very minor problem for web content.
We can easily tweak the cache to reduce the frequency of collisions
and/or come up with a better conflict resolution scheme if needed.
Ultimately, I don't think the disk cache is the right data structure for a
persistent datastore backing offline web apps. The cache is designed
as an optimization, not as a generic datastore.
-Darin
The performance gain with large histories is for Ts. I think the 2% is
mostly across the board, primarily because sqlite does a binary search
for finding visited URLs, and I assume Mork uses a hash table. I'm going
to work on this a bit.
However, with all the performance work we've done for places, I would
expect you would be able to get better performance than Darin measured,
but I'm not sure how it would be relative to the old cache.
Brett
As it is currently designed, that's true. But I still think that HTTP
caching is the right model for managing offline content.
Then one way to go would be to have a separate offline store, which we
accept is slower, and implements the cache interface. Use it as a
fallback for HTTP cache lookups, only when we're offline.
If we ever get it to the point where it's actually faster than the
existing disk cache, then we can throw away the latter.
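
The lookup order described above might look something like this. It is a sketch only: both stores are modeled as plain Maps, and the function name is invented.

```javascript
// Look up a URL in the normal disk cache first; consult the (slower)
// offline store only as a fallback, and only when we are offline.
function lookup(url, isOnline, diskCache, offlineStore) {
  const hit = diskCache.get(url);
  if (hit !== undefined) {
    return { source: "disk", body: hit };
  }
  if (!isOnline && offlineStore.has(url)) {
    return { source: "offline", body: offlineStore.get(url) };
  }
  return null; // miss: when online, the caller goes to the network
}
```

Since the offline store is never touched on the online path, its slowness costs nothing in the common case.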
It doesn't have to be SQLite. It seems to me that making each cache
object a file and leveraging filesystem consistency should be workable,
the same philosophy as maildir format, but simpler in some ways.
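
As a sketch of the one-file-per-object idea, written for Node.js rather than for necko: each object is written to a temporary name and then renamed into place, so a reader sees either the old complete file or the new one. The directory layout ("tmp" and "cur", borrowed from maildir), the SHA-1 file naming, and the function names are assumptions; a real store would also need fsync calls for durability across a crash, which are left out here.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const crypto = require("crypto");

// Create the store layout: tmp/ for in-progress writes, cur/ for complete objects.
function openStore(root) {
  fs.mkdirSync(path.join(root, "tmp"), { recursive: true });
  fs.mkdirSync(path.join(root, "cur"), { recursive: true });
  return root;
}

// Map a URL to a safe file name (hypothetical naming scheme).
function keyFor(url) {
  return crypto.createHash("sha1").update(url).digest("hex");
}

// Write the whole object to tmp/, then rename it into cur/.
// A rename within one filesystem replaces the target in a single step on
// POSIX systems, so cur/ never holds a half-written object.
function putObject(root, url, data) {
  const name = keyFor(url);
  const tmpPath = path.join(root, "tmp", name + "." + process.pid);
  fs.writeFileSync(tmpPath, data);
  fs.renameSync(tmpPath, path.join(root, "cur", name));
}

// Read an object back, or return null if it was never stored.
function getObject(root, url) {
  try {
    return fs.readFileSync(path.join(root, "cur", keyFor(url)));
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}
```

Anything still sitting in tmp/ after a crash is an incomplete write and can simply be deleted at startup.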
Rob
File open is really expensive (especially on Windows, it seems). The
cache block files (_CACHE_001_, etc.) were a huge perf boost over flat
files.
I agree that it makes sense to intercept at the HTTP layer and to
build a persistent store as a cache layer (or "device" in
nsCacheService terminology).
-Darin
That always seemed so weird to me, especially since IE uses a separate
file per cookie and its cached data. I wonder what they're doing that
we aren't...
Mike
The query power of sqlite could, however, be used to address the use
case of searching offline content, which would be interesting in the
case of help pages pinned to the offline cache.
Dan
Yeah, I've been wondering about that too.
Rob
And, their cache is known to be a dog perf-wise. Of course they make
up for it elsewhere ;-)
-Darin
Do we know if their cache is actually saving stuff as separate files, or
if it's just using Explorer extensions to display stuff like that? I
know that you get very different names and dir trees if you look at that
directory from, for example, Linux.
/ Jonas
I'm not sure that it would -- sqlite has no full-text indexing engine or
anything more complex than LIKE and similar basic SQL queries. For searching
offline content you want a real full-text index, and you can build that on
top of the raw data being stored elsewhere (filesystem, say) just as well
as you can if it was stored inside sqlite.
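
To illustrate the point that the index can sit on top of data stored anywhere, here is a toy inverted index over a Map of URL to text. The tokenizer (lowercase ASCII letters and digits) and the AND semantics of the query are simplifying assumptions; a real full-text engine would do far more.

```javascript
// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build word -> set of URLs from documents stored elsewhere.
// `docs` is a Map of url -> text; where the text lives does not matter.
function buildIndex(docs) {
  const index = new Map();
  for (const [url, text] of docs) {
    for (const word of tokenize(text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(url);
    }
  }
  return index;
}

// Return the sorted URLs of documents containing every query word.
function search(index, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let result = null;
  for (const word of words) {
    const postings = index.get(word) || new Set();
    result = result === null
      ? new Set(postings)
      : new Set([...result].filter((url) => postings.has(url)));
  }
  return [...result].sort();
}
```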
- Vlad
This might be a good reason to re-examine your sqlite-backed cache implementation,
especially now that we have some pretty significant sqlite performance improvements
from the async work. Since necko didn't have a dependency on mozStorage
and used sqlite directly, we'd probably have to pull a copy of the async
framework into necko, but it's just a single file. Definitely too late for
Fx2, but it might be worth running the perf tests again after a bit of work
for potential Gecko 1.9 inclusion.
For Fx2, could a secondary offline cache be created? I'm not sure if necko
can work with two disk caches, but having one that cached just the offline-flagged
content (even if the content was also placed in the other cache) would greatly
increase the chances of it being in a consistent state on a crash.
- Vlad
Actually, I converted it to use mozStorage:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
> , we'd probably have to pull a copy of the async
> framework into necko, but it's just a single file. Definitely too late for
> Fx2, but it might be worth running the perf tests again after a bit of work
> for potential Gecko 1.9 inclusion.
If someone wants to do so, I'd be very interested in the results.
> For Fx2, could a secondary offline cache be created?
Maybe.
> I'm not sure if necko
> can work with two disk caches, but having one that cached just the offline-flagged
> content (even if the content was also placed in the other cache) would greatly
> increase the chances of it being in a consistent state on a crash.
I'm not sure I completely follow your plan, but if you're talking
about instancing the current disk cache twice, then you'd still have
the problem that we don't put the disk cache in a consistent state
until the app shuts down.
I think it would be better for us to work on improving the current
disk cache instead of throwing it out for the sqlite-based one.
-Darin
* We may want to expand the definition of "offline mode" to include
situations where we're globally online, but the HTTP connection attempt
failed.
This scheme means we don't get sharing between the offline and online
caches, which is slightly unfortunate, but it seems relatively simple
and should impose no performance penalty.
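
The expanded definition of "offline mode" in the footnote could be sketched like this, with a failed connection attempt handled the same way as being globally offline. The `connect` callback, which throws when the connection fails, and the other names are hypothetical.

```javascript
// Load a URL from the network when possible; fall back to the offline
// store when we are globally offline OR the connection attempt fails.
// `connect(url)` is hypothetical: it returns the body or throws.
function load(url, globallyOnline, connect, offlineStore) {
  if (globallyOnline) {
    try {
      return { source: "network", body: connect(url) };
    } catch (err) {
      // Connection failed: treat this request as if we were offline.
    }
  }
  if (offlineStore.has(url)) {
    return { source: "offline", body: offlineStore.get(url) };
  }
  throw new Error("unavailable: " + url);
}
```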
Rob
Actually, it is both. It uses Explorer extensions to display files in
the "Temporary Internet Files" directory that aren't there - but they
really are separate files. This directory actually contains a
subdirectory Content.IE5 which stores the cached files in several
subdirectories. The cookies that are also displayed are separate files
as well but stored in an entirely different directory outside "Temporary
Internet Files". Luckily there is Total Commander that doesn't use
explorer extensions and displays the file system the way it really is...
Wladimir
> Actually, it is both. It uses Explorer extensions to display files in
> the "Temporary Internet Files" directory that aren't there - but they
> really are separate files. This directory actually contains a
> subdirectory Content.IE5 which stores the cached files in several
> subdirectories. The cookies that are also displayed are separate files
> as well but stored in an entirely different directory outside "Temporary
> Internet Files". Luckily there is Total Commander that doesn't use
> explorer extensions and displays the file system the way it really is...
I use Total Commander (since it was Windows Commander) exclusively so I
never noticed the virtual "Temporary Internet Files".
Phil (fires up windows explorer to have a look. Oh now I know what
people are talking about).
--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.