This feature would require some sort of signature on the script, and it
would need to degrade gracefully in older browsers. I suggest something like
<script src="http://mysite.example.com/dojo.js"
shared="a023df234af23423daf"></script>
where the 'shared' attribute is a base64 signature of the script. That
would allow us to create a cache of .fasl files for scripts with this
attribute, because we know they are likely to be reused.
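To make the idea concrete, roughly what the browser would do (a sketch only;
the cache helpers and types here are made up, not an existing API):

  // Hypothetical types: CompiledScript stands in for whatever the engine's
  // fastload/.fasl representation of an already-compiled script would be.
  interface CompiledScript { run(): void; }

  interface SharedScriptCache {
    get(hash: string): CompiledScript | undefined;
    put(hash: string, script: CompiledScript): void;
  }

  declare function fetchText(url: string): Promise<string>;  // network fetch
  declare function compile(source: string): CompiledScript;  // parse + compile
  declare function hashOf(source: string): string;           // base64 digest

  // src and shared are the attribute values from the <script> tag above.
  async function loadSharedScript(
    cache: SharedScriptCache,
    src: string,
    shared: string,
  ): Promise<CompiledScript> {
    const hit = cache.get(shared);
    if (hit) return hit;                     // reuse: no refetch, no recompile

    const source = await fetchText(src);
    const script = compile(source);
    if (hashOf(source) === shared) {         // only share content that verifies
      cache.put(shared, script);
    }
    return script;
  }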
I'm interested in comments on the feasibility of this idea.
-Rob
From my experience with remote and local versions of dojo-based web
apps, the major speed problem is the synchronous loading of the js
files, not so much their compilation. As you would still need to load
the file completely, compute the checksum and then decide whether to
load a compiled version of the file or not, I wouldn't expect a major
improvement.
I would expect that caching flags like the ones discussed for offline
flagging would be of more help: create and publicize a standard HTTP
header that makes those files load from the cache regardless of cache
policy. Then we can look into fastloading those cache entries, too.
Axel
No, you wouldn't need to load them. If you had an appropriate entry
in the shared cache (keyed by the presented hash) then you would use it
right out of the cache instead of going out over the network.
Mike
What's the risk of hash collisions? Just assume that the person hosting
the script isn't stupid enough to search for a collision and then
complain that the browser still uses the cached version?
Chris
I don't think it's worth worrying about, if we pick a decent hash
algorithm to use.
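For a rough sense of scale (back-of-the-envelope, not a measurement): with an
n-bit hash, the chance of any accidental collision among k cached scripts is
about k^2 / 2^(n+1). Even a million distinct cached scripts (k ~ 2^20) under a
160-bit hash gives roughly 2^40 / 2^161, around 10^-36, so accidental
collisions are a non-issue; only deliberately constructed collisions matter.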
Mike
This feature is similar to the one proposed in
https://bugzilla.mozilla.org/show_bug.cgi?id=292481, with the main
difference being that 292481 is mainly intended to improve verification,
while this one might only be used to improve caching. I think they should
use the same attribute and code.
If this were implemented in a browser cache, it would not only save space,
but would also allow the browser to keep a few well-known libraries
parsed and in-memory, avoiding the overhead of initialisation. Since
the only extra thing that needs to be kept around is the relation of the
URI to the hash, plus its appropriate metadata (e.g., cacheability), it
should be workable to give it special status in the cache replacement
algorithm, perhaps going as far as pinning it in-cache.
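To put that bookkeeping in concrete terms (field names here are illustrative
only, not an existing cache structure):

  // One record per shared library the cache decides to treat specially.
  interface SharedEntry {
    uri: string;               // where the resource was fetched from
    hash: string;              // content hash presented in the markup
    cacheability: {            // the relevant response metadata
      maxAgeSeconds: number;
      etag?: string;
    };
    pinned: boolean;           // exempt from normal eviction
    compiled?: unknown;        // optionally, the parsed/initialised form
  }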
Yes, it would require a network fetch to first get it, but the upside
is that it's zero-impact on publishers; anything that requires people
to synchronise hashes in their HTML with the libraries they publish is
asking for trouble, IMO. It makes things too tightly-coupled.
WRT hash collisions, most caches use a hash of the URI as a key
anyway, so the risk is already there (and apparently manageable).
Cheers,
For the record, Mozilla uses the URI itself as the key (well, actually it also
uses some other info, but that's not really relevant).
I do think that if we use a known-good cryptographic hash we should be fine.
We're relying on those for cert comparisons, so if they're good enough there
they're good enough here.
-Boris
On Jan 29, 3:57 am, Boris Zbarsky <bzbar...@mit.edu> wrote:
> mnott...@gmail.com wrote:
> > WRT hash collisions, most caches use a hash of the URI as a key
> > anyway, so the risk is already there (and apparently manageable).
> For the record, Mozilla uses the URI itself as the key (well, actually
> it also uses some other info, but that's not really relevant).
If Firefox changed to use content from a cache even if that content came
from a different site, this would make it much easier to exploit things
if the hash algorithm were broken.
Using a standard Link Fingerprints implementation, if the hash algorithm
is broken, the attacker still has to break into the download servers and
upload their malicious code. In this scenario, they would just need to
have persuaded the target user to have visited any site where they had
put <img src="http://www.evilsite.com/do-nasty-stuff-on-target-site.js"
shared="cracked-hash-here">.
Of course, we might say that if the hash algorithm is broken, we have
far bigger problems to worry about. In that case, yes, let's use Link
Fingerprints as the mechanism for identifying identical resources
without downloading more copies.
Gerv
This is true. Perhaps we could alter it like so:
<script src="http://mysite.example.com/dojo.js"
shared="http://www.openjsan.org/2007/01/29/dojo.js#!md5!A4F329C4..."/>
and download from a known trusted site in the background.
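A small sketch of how a browser might split such a value (the
"#!algorithm!digest" form is just the syntax used in the example above,
not a settled format):

  // Splits e.g. ".../dojo.js#!md5!<hex digest>" into its URL, algorithm
  // and digest parts; returns null if the value doesn't parse.
  function parseSharedValue(value: string) {
    const match = /^(.+)#!([a-z0-9]+)!([0-9a-fA-F]+)$/.exec(value);
    if (!match) return null;
    return { url: match[1], algorithm: match[2], digest: match[3].toLowerCase() };
  }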
- Rob
Exactly what problem is this trying to solve? Just cross-site caching of
large scripts?
I wonder how well this would work in practice, since every version of a
script would have a different hash (not to mention people tweaking their
copy of a script). I don't think the user is going to see much benefit
unless lots of sites all start using the same version of the same large
scripts.
Even then, this would really only help with the first visit to such a
site. With proper caching, subsequent visits to the site should be using
the cached copy of the script anyway.
Justin
Talking of caching, we don't cache for https right now, right? Would it
be safe to make shared affect https copies?
Axel
Then why do we need the hash at all? The above is basically the same as:
<script src="http://www.openjsan.org/2007/01/29/dojo.js"/>
If it's in the cache from a previous site or session, it won't get
downloaded.
The only difference is that openjsan.org gets to pay the bandwidth bill.
If you wanted to avoid that, you could have something like:
<script src="http://mysite.example.com/dojo.js"
original="http://www.openjsan.org/2007/01/29/dojo.js"/>
i.e. "if you don't have this file but do have this other one, you can
use that instead". If the Dojo hosts put new versions in new dated
directories (or give the files names based on the version number, which
might be better), then there's no need for the hash.
The cache could keep files referenced in this way by a site for an
especially long time.
In fact, going one step further, why not just do:
<script src="http://www.openjsan.org/2007/01/29/dojo.js#!md5!A4F329C4..."/>
where the browser checks the MD5sum against the cached copy and uses it
without download if they match (and fixes it in the cache). openjsan.org
still pays a bandwidth bill, but it's much reduced as each browser only
downloads the file once. And the site has security against openjsan.org
getting hacked, because if the MD5 doesn't match, the browser won't
execute it.
But then, you need to have a way of getting the file if the MD5s don't
match. Otherwise your site goes down. So what you really want is:
<script
src="http://www.openjsan.org/2007/01/29/dojo.js"
original="http://www.mysite.example.com/dojo.js"
hash="A4F329C4..." />
So now the original attribute and MD5sum together form a unique key. The
browser does the following (a rough sketch follows the list below):
- Do I have a cached resource with source "original" and MD5 "hash"?
If so, use.
- If not, can I access "src"? If so, and it has MD5 "hash", store it
with the key "original"/"hash" and use.
- If not, can I access "original"? If so, and it has MD5 "hash", store
it with the key "original"/"hash" and use.
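A minimal sketch of that lookup order (the fetch and hashing helpers are
hypothetical stand-ins for whatever the network layer would really do):

  declare function md5Of(source: string): string;
  declare function fetchText(url: string): Promise<string | null>; // null on failure

  interface VerifiedCache {
    get(original: string, hash: string): string | undefined;
    put(original: string, hash: string, source: string): void;
  }

  async function resolveScript(
    cache: VerifiedCache,
    src: string,       // src=      : tried first over the network
    original: string,  // original= : fallback copy; also part of the cache key
    hash: string,      // hash=     : the expected MD5 of the content
  ): Promise<string | null> {
    // 1. Do I have a cached resource with source "original" and MD5 "hash"?
    const hit = cache.get(original, hash);
    if (hit) return hit;

    // 2. and 3. Otherwise try "src", then "original"; accept whichever
    // actually matches the hash, and store it under "original"/"hash".
    for (const url of [src, original]) {
      const body = await fetchText(url);
      if (body !== null && md5Of(body) === hash) {
        cache.put(original, hash, body);
        return body;
      }
    }
    return null; // nothing verified against the hash; don't execute anything
  }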
But then this has the "hash algorithm breaking" problem again!
Random ramblings - can you get something useful out of them? :-)
Gerv
We don't cache it on disk. We do cache it in the memory cache.
-Boris
That's the idea. It's a way to grow a standard library instead of
blessing one (by shipping in the browser). Look at the things listed here:
<http://www.sergiopereira.com/articles/prototype.js.html#Reference.Extensions>
Why does every script have to include its own copy of these? They take
space away from application code.
>
> Even then, this would really only help with the first visit to such a
> site. With proper caching, subsequent visits to the site should be using
> the cached copy of the script anyway.
It would help the most on the first visit, but that is a pretty
important advantage. At the moment, we don't create .fasl files for
scripts, even if they are cached.
-Rob
I still like this idea, but someone (Dylan Schliemann of Dojo, IIRC)
punctured my balloon-like hope by observing that a lot of Ajax apps
crunch and link-cull to make something more like an ASCII .swf file
than a standard, widely hosted and therefore cacheable .js file.
> It would help the most on the first visit, but that is a pretty
> important advantage. At the moment, we don't create .fasl files for
> scripts, even if they are cached.
I hear from some Web 2.0 devs that a ccache approach would win too, to
avoid the recompilation-from-source hit when crossing domain
boundaries and reloading the same prototype.js from the new domain. We
could do both. The ccache approach could be done without the shared=
attribute or the questions begged by the source-embedded crypto-hash.
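To spell the ccache idea out (hypothetical internals; compile() below just
stands in for the engine's real parse/compile step): key the compiled form by
a hash of the source text itself, so the same prototype.js arriving from a
second domain skips recompilation without any markup changes at all:

  declare function compile(source: string): object;   // engine compile step
  declare function sha256Of(source: string): string;  // content hash

  const compileCache = new Map<string, object>();

  function compileWithCache(source: string): object {
    const key = sha256Of(source);
    let compiled = compileCache.get(key);
    if (compiled === undefined) {
      compiled = compile(source);        // pay the parse/compile cost once
      compileCache.set(key, compiled);
    }
    return compiled;
  }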
Thanks for pushing this idea, it still seems worthwhile in spite of
the exceptions and complexities.
/be
> I still like this idea, but someone (Dylan Schliemann of Dojo, IIRC)
> punctured my balloon-like hope by observing that a lot of Ajax apps
> crunch and link-cull to make something more like an ASCII .swf file
> than a standard, widely hosted and therefore cacheable .js file.
>> It would help the most on the first visit, but that is a pretty
>> important advantage. At the moment, we don't create .fasl files for
>> scripts, even if they are cached.
Well, maybe there are a couple ways to beat this. The first is to pack
the shared files--that doesn't hurt reuse. Second, it occurs to me that
using a trusted source in the shared= attribute value makes link-culling
irrelevant.
<script src="/my/packed-and-culled-dojo.js"
shared="http://openjsan.org/2006/....dojo.js" />
would mean "use my packed version of dojo, but if you have this other
thing already, you can use that instead, even though it contains code I
don't use." This method also avoids the scary stuff Gerv mentioned, at
the cost of making sure openjsan.org or whatever is pretty sturdy. We
already do exactly this for search suggestions, updates, extensions, etc.
>> It would help the most on the first visit, but that is a pretty
>> important advantage. At the moment, we don't create .fasl files for
>> scripts, even if they are cached.
>
> I hear from some Web 2.0 devs that a ccache approach would win too, to
> avoid the recompilation-from-source hit when crossing domain
> boundaries and reloading the same prototype.js from the new domain. We
> could do both. The ccache approach could be done without the shared=
> attribute or the questions begged by the source-embedded crypto-hash.
Yeah, we should be able to try that first.
What do people think is a good heuristic? Do it for all scripts? Or only
save a fastload file when there's, say, 15k of script and an ETag?
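For concreteness, that threshold written as a predicate (the numbers and
field names are placeholders, not a worked-out policy):

  interface ScriptResponse {
    byteLength: number;
    etag?: string;   // an ETag suggests the resource is stably versioned
  }

  // Only persist a fastload file for scripts big enough to be worth the
  // disk space and cacheable enough to be seen again.
  function shouldSaveFastload(r: ScriptResponse): boolean {
    return r.byteLength >= 15 * 1024 && r.etag !== undefined;
  }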
-Rob
I think I was being overly-cautious. We already rely on the
unbreakability of hash algorithms in a lot of other security; if they
get broken, this is the least of our worries.
Another thing, though: if putting the URL of a JS file in the "shared"
attribute makes it cached for longer or load faster, why will all web
app authors not just do:
<script src="/my/scripts-i-wrote.js"
shared="/my/scripts-i-wrote.js" />
? If they do, do we care?
Do we accept only absolute URIs in "shared" to make sure the "key" we
are using is unique?
Gerv
This actually came up at the Baa Camp this weekend.
If there were fast-access sites for particular versions of popular
libraries like, say, Prototype or jQuery, then people might be more
inclined to use them instead of saving a few K by culling their own
custom version.
Personally I think that someone with the connectivity, trust and will
(e.g., Google) should just make the files available, promise to keep
them available, and people can just link to those URLs. Not a browser
issue. (Although Gerv's link hashes would reduce the trust
requirement.)
Rob
AOL does something like this for the Dojo Javascript library:
http://blog.dojotoolkit.org/2006/06/16/aol-hosting-dojo-031-cross-domain-build-in-a-cdn
http://article.gmane.org/gmane.comp.web.dojo.user/10085
Chris.
--
http://www.bluishcoder.co.nz