This feature would require some sort of signature on the script, and it
would need to degrade gracefully in older browsers. I suggest something like
<script src="http://mysite.example.com/dojo.js"
shared="a023df234af23423daf"></script>
where the 'shared' attribute is a base64 signature of the script. That
would allow us to create a cache of .fasl files for scripts with this
attribute, because we know they are likely to be reused.
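To make the idea concrete, roughly what the browser would do (a sketch only;
the cache helpers and types here are made up, not an existing API):

  // Hypothetical types: CompiledScript stands in for whatever the engine's
  // fastload/.fasl representation of an already-compiled script would be.
  interface CompiledScript { run(): void; }

  interface SharedScriptCache {
    get(hash: string): CompiledScript | undefined;
    put(hash: string, script: CompiledScript): void;
  }

  declare function fetchText(url: string): Promise<string>;  // network fetch
  declare function compile(source: string): CompiledScript;  // parse + compile
  declare function hashOf(source: string): string;           // base64 digest

  // src and shared are the attribute values from the <script> tag above.
  async function loadSharedScript(
    cache: SharedScriptCache,
    src: string,
    shared: string,
  ): Promise<CompiledScript> {
    const hit = cache.get(shared);
    if (hit) return hit;                     // reuse: no refetch, no recompile

    const source = await fetchText(src);
    const script = compile(source);
    if (hashOf(source) === shared) {         // only share content that verifies
      cache.put(shared, script);
    }
    return script;
  }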
I'm interested in comments on the feasibility of this idea.
-Rob
From my experience with remote and local versions of dojo-based web
apps, the major speed problem is the synchronous loading of the js
files, not so much their compilation. As you would still need to load
the file completely, compute the checksum and then decide whether to
load a compiled version of the file or not, I wouldn't expect a major
improvement.
I would expect that caching flags like the ones discussed for offline
flagging would be of more help: create and publicize a standard HTTP
header that makes those files load from the cache regardless of cache
policy. Then we can look into fastloading those cache entries, too.
Axel
No, you wouldn't need to load them. If you had an appropriate entry
in the shared cache (keyed by the presented hash) then you would use it
right out of the cache instead of going out over the network.
Mike
What's the risk of hash collisions? Just assume that the person hosting
the script isn't stupid enough to search for a collision and then
complain that the browser still uses the cached version?
Chris
I don't think it's worth worrying about, if we pick a decent hash
algorithm to use.
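For a rough sense of scale (back-of-the-envelope, not a measurement): with an
n-bit hash, the chance of any accidental collision among k cached scripts is
about k^2 / 2^(n+1). Even a million distinct cached scripts (k ~ 2^20) under a
160-bit hash gives roughly 2^40 / 2^161, around 10^-36, so accidental
collisions are a non-issue; only deliberately constructed collisions matter.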
Mike
This feature is similar to the one proposed in
https://bugzilla.mozilla.org/show_bug.cgi?id=292481, with the main
difference being that 292481 is mainly intended to improve verification,
while this one might only be used to improve caching. I think they should
use the same attribute and code.
If this were implemented in a browser cache, it would not only save space,
but would also allow the browser to keep a few well-known libraries
parsed and in-memory, avoiding the overhead of initialisation. Since
the only extra thing that needs to be kept around is the relation of the
URI to the hash, plus its appropriate metadata (e.g., cacheability), it
should be workable to give it special status in the cache replacement
algorithm, perhaps going as far as pinning it in-cache.
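To put that bookkeeping in concrete terms (field names here are illustrative
only, not an existing cache structure):

  // One record per shared library the cache decides to treat specially.
  interface SharedEntry {
    uri: string;               // where the resource was fetched from
    hash: string;              // content hash presented in the markup
    cacheability: {            // the relevant response metadata
      maxAgeSeconds: number;
      etag?: string;
    };
    pinned: boolean;           // exempt from normal eviction
    compiled?: unknown;        // optionally, the parsed/initialised form
  }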
Yes, it would require a network fetch to first get it, but the upside
is that it's zero-impact on publishers; anything that requires people
to synchronise hashes in their HTML with the libraries they publish is
asking for trouble, IMO. It makes things too tightly-coupled.
WRT hash collisions, most caches use a hash of the URI as a key
anyway, so the risk is already there (and apparently manageable).
Cheers,
For the record, Mozilla uses the URI itself as the key (well, actually it also
uses some other info, but that's not really relevant).
I do think that if we use a known-good cryptographic hash we should be fine.
We're relying on those for cert comparisons, so if they're good enough there
they're good enough here.
-Boris
On Jan 29, 3:57 am, Boris Zbarsky <bzbar...@mit.edu> wrote:
> mnott...@gmail.com wrote:
> > WRT hash collisions, most caches use a hash of the URI as a key
> > anyway, so the risk is already there (and apparently manageable).
> For the record, Mozilla uses the URI itself as the key (well, actually
> it also uses some other info, but that's not really relevant).
If Firefox changed to use content from a cache even if that content came
from a different site, this would make it much easier to exploit things
if the hash algorithm were broken.
Using a standard Link Fingerprints implementation, if the hash algorithm
is broken, the attacker still has to break into the download servers and
upload their malicious code. In this scenario, they would just need to
have persuaded the target user to have visited any site where they had
put <img src="http://www.evilsite.com/do-nasty-stuff-on-target-site.js"
shared="cracked-hash-here">.
Of course, we might say that if the hash algorithm is broken, we have
far bigger problems to worry about. In that case, yes, let's use Link
Fingerprints as the mechanism for identifying identical resources
without downloading more copies.
Gerv
This is true. Perhaps we could alter it like so:
<script src="http://mysite.example.com/dojo.js"
shared="http://www.openjsan.org/2007/01/29/dojo.js#!md5!A4F329C4..."/>
and download from a known trusted site in the background.
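A small sketch of how a browser might split such a value (the
"#!algorithm!digest" form is just the syntax used in the example above,
not a settled format):

  // Splits e.g. ".../dojo.js#!md5!<hex digest>" into its URL, algorithm
  // and digest parts; returns null if the value doesn't parse.
  function parseSharedValue(value: string) {
    const match = /^(.+)#!([a-z0-9]+)!([0-9a-fA-F]+)$/.exec(value);
    if (!match) return null;
    return { url: match[1], algorithm: match[2], digest: match[3].toLowerCase() };
  }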
- Rob
Exactly what problem is this trying to solve? Just cross-site caching of
large scripts?
I wonder how well this would work in practice, since every version of a
script would have a different hash (not to mention people tweaking their
copy of a script). I don't think the user is going to see much benefit
unless lots of sites all start using the same version of the same large
scripts.
Even then, this would really only help with the first visit to such a
site. With proper caching, subsequent visits to the site should be using
the cached copy of the script anyway.
Justin
Talking of caching, we don't cache for https right now, right? Would it
be safe to make shared affect https copies?
Axel
Then why do we need the hash at all? The above is basically the same as:
<script src="http://www.openjsan.org/2007/01/29/dojo.js"/>
If it's in the cache from a previous site or session, it won't get
downloaded.
The only difference is that openjsan.org gets to pay the bandwidth bill.
If you wanted to avoid that, you could have something like:
<script src="http://mysite.example.com/dojo.js"
original="http://www.openjsan.org/2007/01/29/dojo.js"/>
i.e. "if you don't have this file but do have this other one, you can
use that instead". If the Dojo hosts put new versions in new dated
directories (or give the files names based on the version number, which
might be better), then there's no need for the hash.
The cache could keep files referenced in this way by a site for an
especially long time.
In fact, going one step further, why not just do:
<script src="http://www.openjsan.org/2007/01/29/dojo.js#!md5!A4F329C4..."/>
where the browser checks the MD5sum against the cached copy and uses it
without download if they match (and fixes it in the cache). openjsan.org
still pays a bandwidth bill, but it's much reduced as each browser only
downloads the file once. And the site has security against openjsan.org
getting hacked, because if the MD5 doesn't match, the browser won't
execute it.
But then, you need to have a way of getting the file if the MD5s don't
match. Otherwise your site goes down. So what you really want is:
<script
src="http://www.openjsan.org/2007/01/29/dojo.js"
original="http://www.mysite.example.com/dojo.js"
hash="A4F329C4..." />
So now the original attribute and MD5sum together form a unique key. The
browser does the following (a rough sketch follows the list below):
- Do I have a cached resource with source "original" and MD5 "hash"?
If so, use.
- If not, can I access "src"? If so, and it has MD5 "hash", store it
with the key "original"/"hash" and use.
- If not, can I access "original"? If so, and it has MD5 "hash", store
it with the key "original"/"hash" and use.
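A minimal sketch of that lookup order (the fetch and hashing helpers are
hypothetical stand-ins for whatever the network layer would really do):

  declare function md5Of(source: string): string;
  declare function fetchText(url: string): Promise<string | null>; // null on failure

  interface VerifiedCache {
    get(original: string, hash: string): string | undefined;
    put(original: string, hash: string, source: string): void;
  }

  async function resolveScript(
    cache: VerifiedCache,
    src: string,       // src=      : tried first over the network
    original: string,  // original= : fallback copy; also part of the cache key
    hash: string,      // hash=     : the expected MD5 of the content
  ): Promise<string | null> {
    // 1. Do I have a cached resource with source "original" and MD5 "hash"?
    const hit = cache.get(original, hash);
    if (hit) return hit;

    // 2. and 3. Otherwise try "src", then "original"; accept whichever
    // actually matches the hash, and store it under "original"/"hash".
    for (const url of [src, original]) {
      const body = await fetchText(url);
      if (body !== null && md5Of(body) === hash) {
        cache.put(original, hash, body);
        return body;
      }
    }
    return null; // nothing verified against the hash; don't execute anything
  }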
But then this has the "hash algorithm breaking" problem again!
Random ramblings - can you get something useful out of them? :-)
Gerv
We don't cache it on disk. We do cache it in the memory cache.
-Boris
That's the idea. It's a way to grow a standard library instead of
blessing one (by shipping in the browser). Look at the things listed here:
<http://www.sergiopereira.com/articles/prototype.js.html#Reference.Extensions>
Why does every script have to include its own copy of these? They take
space away from application code.
>
> Even then, this would really only help with the first visit to such a
> site. With proper caching, subsequent visits to the site should be using
> the cached copy of the script anyway.
It would help the most on the first visit, but that is a pretty
important advantage. At the moment, we don't create .fasl files for
scripts, even if they are cached.
-Rob
I still like this idea, but someone (Dylan Schliemann of Dojo, IIRC)
punctured my balloon-like hope by observing that a lot of Ajax apps
crunch and link-cull to make something more like an ASCII .swf file
than a standard, widely hosted and therefore cacheable .js file.
> It would help the most on the first visit, but that is a pretty
> important advantage. At the moment, we don't create .fasl files for
> scripts, even if they are cached.
I hear from some Web 2.0 devs that a ccache approach would win too, to
avoid the recompilation-from-source hit when crossing domain
boundaries and reloading the same prototype.js from the new domain. We
could do both. The ccache approach could be done without the shared=
attribute or the questions begged by the source-embedded crypto-hash.
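To spell the ccache idea out (hypothetical internals; compile() below just
stands in for the engine's real parse/compile step): key the compiled form by
a hash of the source text itself, so the same prototype.js arriving from a
second domain skips recompilation without any markup changes at all:

  declare function compile(source: string): object;   // engine compile step
  declare function sha256Of(source: string): string;  // content hash

  const compileCache = new Map<string, object>();

  function compileWithCache(source: string): object {
    const key = sha256Of(source);
    let compiled = compileCache.get(key);
    if (compiled === undefined) {
      compiled = compile(source);        // pay the parse/compile cost once
      compileCache.set(key, compiled);
    }
    return compiled;
  }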
Thanks for pushing this idea, it still seems worthwhile in spite of
the exceptions and complexities.
/be
> I still like this idea, but someone (Dylan Schliemann of Dojo, IIRC)
> punctured my balloon-like hope by observing that a lot of Ajax apps
> crunch and link-cull to make something more like an ASCII .swf file
> than a standard, widely hosted and therefore cacheable .js file.
>> It would help the most on the first visit, but that is a pretty
>> important advantage. At the moment, we don't create .fasl files for
>> scripts, even if they are cached.
Well, maybe there are a couple ways to beat this. The first is to pack
the shared files--that doesn't hurt reuse. Second, it occurs to me that
using a trusted source in the shared= attribute value makes link-culling
irrelevant.
<script src="/my/packed-and-culled-dojo.js"
shared="http://openjsan.org/2006/....dojo.js" />
would mean "use my packed version of dojo, but if you have this other
thing already, you can use that instead, even though it contains code I
don't use." This method also avoids the scary stuff Gerv mentioned, at
the cost of making sure openjsan.org or whatever is pretty sturdy. We
already do exactly this for search suggestions, updates, extensions, etc.
>> It would help the most on the first visit, but that is a pretty
>> important advantage. At the moment, we don't create .fasl files for
>> scripts, even if they are cached.
>
> I hear from some Web 2.0 devs that a ccache approach would win too, to
> avoid the recompilation-from-source hit when crossing domain
> boundaries and reloading the same prototype.js from the new domain. We
> could do both. The ccache approach could be done without the shared=
> attribute or the questions begged by the source-embedded crypto-hash.
Yeah, we should be able to try that first.
What do people think is a good heuristic? Do it for all scripts? Or only
save a fastload file when there's, say, 15k of script and an ETag?
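For concreteness, that threshold written as a predicate (the numbers and
field names are placeholders, not a worked-out policy):

  interface ScriptResponse {
    byteLength: number;
    etag?: string;   // an ETag suggests the resource is stably versioned
  }

  // Only persist a fastload file for scripts big enough to be worth the
  // disk space and cacheable enough to be seen again.
  function shouldSaveFastload(r: ScriptResponse): boolean {
    return r.byteLength >= 15 * 1024 && r.etag !== undefined;
  }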
-Rob
I think I was being overly-cautious. We already rely on the
unbreakability of hash algorithms in a lot of other security; if they
get broken, this is the least of our worries.
Another thing, though: if putting the URL of a JS file in the "shared"
attribute makes it cached for longer or load faster, why will all web
app authors not just do:
<script src="/my/scripts-i-wrote.js"
shared="/my/scripts-i-wrote.js" />
? If they do, do we care?
Do we accept only absolute URIs in "shared" to make sure the "key" we
are using is unique?
Gerv
This actually came up at the Baa Camp this weekend.
If there were fast-access sites for particular versions of popular
libraries like, say, Prototype or jQuery, then people might be more
inclined to use them instead of saving a few K by culling their own
custom version.
Personally I think that someone with the connectivity, trust and will
(e.g., Google) should just make the files available, promise to keep
them available, and people can just link to those URLs. Not a browser
issue. (Although Gerv's link hashes would reduce the trust
requirement.)
Rob
AOL does something like this for the Dojo Javascript library:
http://blog.dojotoolkit.org/2006/06/16/aol-hosting-dojo-031-cross-domain-build-in-a-cdn
http://article.gmane.org/gmane.comp.web.dojo.user/10085
Chris.
--
http://www.bluishcoder.co.nz