
JSBC: JavaScript Start-up Bytecode Cache


Nicolas B. Pierron

Jun 13, 2017, 5:50:13 AM
The JavaScript Start-up Bytecode Cache⁰ is a project which aims to reduce
page load time by recording the bytecode generated during previous visits,
bypassing the JavaScript parser.

This project changes the way we process JavaScript script tags which are
fetched from the network and cached. After multiple visits¹, the bytecode
would be encoded incrementally², as soon as the bytecode emitter generates
it. Once we reach some idle time³, we save the incrementally encoded
content as alternate data in the cache⁴. The cache contains a compressed
version of the source, the bytecode of functions which were executed during
the start-up of the page, and all non-executed functions encoded as source
indexes⁵.

On follow-up visits, the script loader would load the alternate data⁶
instead of the source, and decode the bytecode either off-thread⁷ or on the
current thread. This is expected to replace the syntax checker and the
bytecode emitter for all recorded functions.
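The load path described above could be sketched roughly as follows. All names here (`getAlternateData`, `getSource`, `saveAlternateData`, `parseAndEmit`, `decodeBytecode`) are hypothetical stand-ins for Gecko internals, used only to illustrate the control flow, not the actual ScriptLoader API:

```javascript
// Hypothetical stand-ins for Gecko internals; they just tag their input so
// the control flow is visible.
const decodeBytecode = (data) => ({ from: "bytecode", data });
const parseAndEmit = (source) => ({ from: "source", data: source });

// Sketch of the script-loader decision described above (illustrative only;
// the real logic lives in Gecko's ScriptLoader, bug 900784).
async function loadScript(uri, cache, visitCount) {
  const alternate = await cache.getAlternateData(uri);
  if (alternate !== null) {
    // Follow-up visit: decode the cached bytecode, skipping the parser
    // and bytecode emitter for all recorded functions.
    return decodeBytecode(alternate);
  }
  // Early visits: parse the source and emit bytecode as usual.
  const result = parseAndEmit(await cache.getSource(uri));
  if (visitCount >= 4) {
    // Heuristic met: save the incrementally encoded bytecode as alternate
    // data once the page reaches idle time after onload.
    await cache.saveAlternateData(uri, result.data);
  }
  return result;
}
```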

This feature is currently pref-ed off and can be enabled by setting the
following preferences in about:config⁸:
- dom.script_loader.bytecode_cache.enabled = true
- dom.script_loader.bytecode_cache.strategy = 0
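Equivalently, these two preferences could be set from a `user.js` file in the profile directory (a sketch of the standard libpref mechanism, not something specific to this feature):

```javascript
// user.js sketch: enables the JavaScript Start-up Bytecode Cache with the
// default strategy (0) described in this thread.
user_pref("dom.script_loader.bytecode_cache.enabled", true);
user_pref("dom.script_loader.bytecode_cache.strategy", 0);
```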

If this optimization causes any issue, please file it as a blocker of Bug 900784.

In the upcoming days, I will add telemetry probes to better tune the
heuristics¹ for the web and monitor the known sources of fallback and
failures. In addition, I will request a pref-experiment, such that we can
get more data from Nightly users. At the moment, I expect to enable this
feature around mid-July.

⁰ https://bugzilla.mozilla.org/show_bug.cgi?id=900784
¹ These are heuristics which would be customized by running a
pref-experiment. (see https://bugzilla.mozilla.org/show_bug.cgi?id=1362114)
² We cannot do it off-thread, nor after the execution (see
https://bugzilla.mozilla.org/show_bug.cgi?id=1316081)
³ Currently set to the next cycle after the processing of the OnLoad event.
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1372207)
⁴ Thanks to Valentin Gosu for his work and support on the alternate data
interface as part of necko. (see
https://bugzilla.mozilla.org/show_bug.cgi?id=1231565)
⁵ https://bugzilla.mozilla.org/show_bug.cgi?id=917996
⁶ This forces us to store the compressed source as part of the encoded
bytecode, but prevents additional round-trips between the parent and child
processes.
⁷ https://bugzilla.mozilla.org/show_bug.cgi?id=1316078

⁸ http://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/modules/libpref/init/all.js#212-233

--
Nicolas B. Pierron

Ben Kelly

Jun 13, 2017, 9:39:33 AM
to Nicolas B. Pierron, dev-pl...@lists.mozilla.org
On Tue, Jun 13, 2017 at 5:50 AM, Nicolas B. Pierron <
nicolas....@mozilla.com> wrote:

> The JavaScript Start-up Bytecode Cache⁰ is a project which aims at
> reducing the page load time by recording the bytecode generated during the
> last visits and by-pass the JavaScript parser.
>
> This project changes the way we process JavaScript script tags which are
> fetched from the network, and cached. After multiple visits¹, the bytecode
> would be encoded incrementally², as soon as the bytecode emitter generates
> it. Once we reached some idle time³, we save the content encoded
> incrementally as an alternate data on the cache⁴. The cache contains a
> compressed version of the source, the bytecode of functions which got
> executed during the start-up of the page, and all non-executed functions
> encoded as source indexes⁵.
>
> On follow-up visits the script loader would load the alternate data
> instead⁶ of the source, and decode the bytecode either off-thread⁷ or on
> the current-thread. This is expected to replace the syntax checker and the
> bytecode emitter for all recorded functions.
>

Just an FYI for people following along at home:

1. We don't support this when loading worker scripts yet.
2. If a page script load is intercepted by a service worker then this
optimization is effectively disabled.

There are a number of follow-up bugs filed to fix those things, but it's a
non-trivial amount of work.

Ben

Dirkjan Ochtman

Jun 13, 2017, 11:01:08 AM
to Nicolas B. Pierron, dev-platform
On Jun 13, 2017 11:55, "Nicolas B. Pierron" <nicolas....@mozilla.com>
wrote:

The JavaScript Start-up Bytecode Cache⁰ is a project which aims at reducing
the page load time by recording the bytecode generated during the last
visits and by-pass the JavaScript parser.


So this is about content JS only? Has anyone thought about doing similar
things for chrome JS? It would seem like that would actually be easier
(since it's known to be immutable)?

Cheers,

Dirkjan

Nicolas B. Pierron

Jun 13, 2017, 11:37:54 AM
On 06/13/2017 03:00 PM, Dirkjan Ochtman wrote:
> On Jun 13, 2017 11:55, "Nicolas B. Pierron" <nicolas....@mozilla.com>
> wrote:
>
> The JavaScript Start-up Bytecode Cache⁰ is a project which aims at reducing
> the page load time by recording the bytecode generated during the last
> visits and by-pass the JavaScript parser.
>
>
> So this is about content JS only?

Yes, this is for content JS only, unless we are also using the script loader
for Gecko resources.

> Has anyone thought about doing similar
> things for chrome JS?

Yes, and this is already the case for some XUL functions.

Also, if we want to do that for chrome resources, then I wonder how many
functions are unused during the start-up of Firefox?
- If only a few are unused, then we might just store the complete bytecode.
- If a lot remain unused, then we might do the same as here.

Also, the chrome files are stored in the jar file (if I recall correctly),
and we might want to generate the bytecode ahead of time, such that users
don't have to go through the encoding phase.

> It would seem like that would actually be easier
> (since it's known to be immutable)?

The Alternate Data interface is good for abstracting over the eviction
conditions. So, from the ScriptLoader point of view, this is exactly as if
we were manipulating immutable scripts.

--
Nicolas B. Pierron

David Teller

Jun 13, 2017, 12:00:11 PM
to dev-pl...@lists.mozilla.org


On 6/13/17 5:37 PM, Nicolas B. Pierron wrote:
> Also, the chrome files are stored in the jar file (If I recall
> correctly), and we might want to generate the bytecode ahead of time,
> such that users don't have to go through the encoding-phase.

How large is the bytecode?

I suspect that if it's too large, we'll be okay with generating the
bytecode on the user's computer.

Cheers,
David

Boris Zbarsky

Jun 13, 2017, 12:11:06 PM
On 6/13/17 11:00 AM, Dirkjan Ochtman wrote:
> Has anyone thought about doing similar things for chrome JS?

We've been doing fastload for chrome JS (and indeed for entire chrome
XUL documents, including their scripts) for 15+ years now, no?

-Boris

Nicolas B. Pierron

Jun 13, 2017, 1:13:11 PM
On 06/13/2017 03:59 PM, David Teller wrote:
>
>
> On 6/13/17 5:37 PM, Nicolas B. Pierron wrote:
>> Also, the chrome files are stored in the jar file (If I recall
>> correctly), and we might want to generate the bytecode ahead of time,
>> such that users don't have to go through the encoding-phase.
>
> How large is the bytecode?

With the current implementation, it depends on the number of functions
which are executed.

If no script is executed at all, then we would mostly have the
chunk-compressed UCS-2-encoded source.

If all functions are used, then we would have the size of the bytecode,
which is roughly more than twice the size of the source. In addition, you
would need the compressed source if we want toSource to work properly,
which is currently mandatory for some add-ons (but not for long).

As for the JSBC, since this depends on the number of executed functions, I
am adding telemetry probes to monitor the size of the source and the size
of the encoded content, such that I can have an idea of this ratio.

--
Nicolas B. Pierron

Chris Peterson

Jun 13, 2017, 4:36:40 PM
Nicolas, when JSBC is enabled by default, should we change our test
procedure for our various page load tests (Talos and Softvision's manual
testing)? Since the first page load will be slower than subsequent page
loads (as you noted in the bug [1]), should we throw away the first page
load time or continue to average it with the subsequent page load times?

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=900784#c72

chris

Kris Maglione

Jun 13, 2017, 5:23:33 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
Yes and no. We do something similar to this for the module
loader and subscript loader, but only for the entire compiled
source, not for individual functions, and without any kind of
lazy compilation.

For <script> tags in XUL documents, we only use a bytecode cache
for inline scripts. Other scripts are compiled off-thread from
source, using the ordinary script loader logic. Or, at least,
that's my understanding after investigating whether we could use
the script precompiler for them, to improve first paint time.

William Lachance

Jun 13, 2017, 5:24:48 PM
On 2017-06-13 4:36 PM, Chris Peterson wrote:
> Nicolas, when JSBC is enabled by default, should we change our test
> procedure for our various page load tests (Talos and Softvision's manual
> testing)? Since the first page load will be slower than subsequent page
> loads (as you noted in the bug [1]), should we throw away the first page
> load time or continue to average it with the subsequent page load times?
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=900784#c72

For the aggregated Talos summary values we display and alert on in
Perfherder, we already throw out the first few page loads in most (all?)
tests. For example:

https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Will

Mike Hommey

Jun 13, 2017, 5:39:16 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Tue, Jun 13, 2017 at 12:10:58PM -0400, Boris Zbarsky wrote:
Back memory lane...

Fastload is 16 years old according to bugzilla (bug 68045), but was
removed 6 years ago (bug 654489) in favor of startupcache (bug 592943),
which, BTW, we're not generating at build time anymore as of Firefox 55
(bug 1351071) (but we're still generating it at startup time).

I don't remember what fastload was keeping around. Startupcache stores
pre-parsed JS, but I'm not sure of the details, i.e. whether that's some
AST or something else. Storing bytecode instead could still be a win.

Mike

Boris Zbarsky

Jun 13, 2017, 11:54:25 PM
On 6/13/17 5:23 PM, Kris Maglione wrote:
> Yes and no. We do something similar to this for the module loader and
> subscript loader, but only for the entire compiled source, not for
> individual functions, and without any kind of lazy compilation.

True.

> For <script> tags in XUL documents, we only use a bytecode cache for
> inline scripts. Other scripts are compiled off-thread from source, using
> the ordinary script loader logic. Or, at least, that's my understanding
> after investigating whether we could use the script precompiler for
> them, to improve first paint time.

Hmm. I guess we still cache the compiled script in the
nsXULPrototypeCache but no longer do XDR for out-of-line <script>s?
That's certainly a change from the last time I looked at that code...
We used to do XDR there.

Doing the per-function thing might make sense if (as I expect) most of
our chrome code is cold...

-Boris

Nicolas B. Pierron

Jun 14, 2017, 6:02:29 AM
On 06/13/2017 08:36 PM, Chris Peterson wrote:
> Nicolas, when JSBC is enabled by default, should we change our test
> procedure for our various page load tests (Talos and Softvision's manual
> testing)? Since the first page load will be slower than subsequent page
> loads (as you noted in the bug [1]), should we throw away the first page
> load time or continue to average it with the subsequent page load times?
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=900784#c72

These results [1] were with the eager encoding of the bytecode, while
running locally.

Since then, I added an intuition-based heuristic which, by luck, ends up
keeping the encoding time as part of the ignored set, as we encode as part
of the 5th visit. This gives the following results on tp5 [2].
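As a toy model of that visit-count heuristic (the threshold here is illustrative, not the real Gecko constant, and the function name is invented for this sketch):

```javascript
// Encode bytecode only once a script has been fetched enough times; with a
// threshold of 4 prior fetches, encoding happens on the 5th visit, which
// falls inside the page loads that the Talos harness already ignores.
const ENCODE_AFTER_FETCHES = 4; // illustrative, not the real Gecko value

function shouldEncodeBytecode(fetchCount) {
  return fetchCount >= ENCODE_AFTER_FETCHES;
}
```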

Depending on how the heuristics are tuned based on telemetry results [3],
we should either split the tp5 results into 2 or 3 sets, or increase the
size of the ignored set.

If this is not already the case, we should change the tp5 benchmark harness
to wait for the idle callback before moving to the next page. Otherwise, we
might repeatedly attempt to encode the bytecode of one page, but never stay
long enough on the page to save it.
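For instance, a harness could wait for an idle period after onload before navigating, so the browser has a chance to persist the encoded bytecode (a sketch under that assumption, not the actual Talos code; the fallback makes it runnable outside a browser):

```javascript
// Fall back to a timeout when requestIdleCallback is unavailable (e.g. in
// Node), so this sketch stays runnable outside a browser.
const requestIdle =
  typeof requestIdleCallback === "function"
    ? requestIdleCallback
    : (cb) => setTimeout(cb, 0);

// Navigate only after an idle period, giving the browser a chance to save
// the incrementally encoded bytecode as alternate data in the cache.
function navigateAfterIdle(navigate) {
  return new Promise((resolve) => {
    requestIdle(() => {
      navigate();
      resolve();
    });
  });
}
```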

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=900784#c72
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=900784#c113
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1362114

--
Nicolas B. Pierron

Nicolas B. Pierron

Jun 14, 2017, 8:21:56 AM
On 06/13/2017 09:38 PM, Mike Hommey wrote:
> On Tue, Jun 13, 2017 at 12:10:58PM -0400, Boris Zbarsky wrote:
>> On 6/13/17 11:00 AM, Dirkjan Ochtman wrote:
>>> Has anyone thought about doing similar things for chrome JS?
>>
>> We've been doing fastload for chrome JS (and indeed for entire chrome XUL
>> documents, including their scripts) for 15+ years now, no?
>
> I don't remember what fastload was keeping around, but startupcache
> stores pre-parsed JS, but I'm not sure of the details, whether that's
> some AST or something else. Storing bytecode instead could still be a
> win.

Looking at the implementation of nsXULPrototypeScript::Deserialize, we use
the same methods (XDR) as used by the JSBC, i.e. we encode/decode the bytecode.

--
Nicolas B. Pierron

Kris Maglione

Jun 14, 2017, 3:56:29 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Tue, Jun 13, 2017 at 11:54:17PM -0400, Boris Zbarsky wrote:
>On 6/13/17 5:23 PM, Kris Maglione wrote:
>>For <script> tags in XUL documents, we only use a bytecode cache for
>>inline scripts. Other scripts are compiled off-thread from source, using
>>the ordinary script loader logic. Or, at least, that's my understanding
>>after investigating whether we could use the script precompiler for
>>them, to improve first paint time.
>
>Hmm. I guess we still cache the compiled script in the
>nsXULPrototypeCache but no longer do XDR for out-of-line <script>s?
>That's certainly a change from the last time I looked at that code...
>We used to do XDR there.

We store XDR data for XBL functions and I believe inline <script> tags
in the prototype cache, but I'm pretty sure out-of-line scripts are
loaded from omnijar as normal scripts. There might be some special
casing in the prototype cache for the second and subsequent browser
window (I'm not sure), but I checked the loads for the first window with
a debugger to make sure I understood what was going on there.

I suspect that means we should be able to use the new bytecode cache
stuff pretty transparently for those scripts, though.

>Doing the per-function thing might make sense if (as I expect) most of
>our chrome code is cold...

I'm not sure. It may be helpful, but there are some trade-offs, so it
could go either way. A few points to consider:

1) For the module and subscript loaders, we currently throw away sources
to save memory, and define a lazy source hook to load them on demand for
the sake of add-ons. That means we can't do lazy function compilation
for those scripts, and would have to start retaining their sources to
allow it. On the other hand, we might actually use less memory with
in-memory compressed source and lazy function compilation, but that's
not entirely clear without testing.

2) We currently pre-compile those scripts on a background thread at
startup, and in practice, they're currently nearly always ready by the
time that they're needed. So lazy compilation might not save us anything
in terms of startup speed, except on single-core/non-hyper-threaded
machines that don't spend much startup time blocked on IO.

3) With the current script pre-loader startup cache, the XDR bytecode
cache is an uncompressed, memory-mapped file that's shared between all
content processes. As currently implemented, that doesn't save us much
memory, but I'm hoping that we can take advantage of it to store
pointers directly to the cached XDR data for lazy functions, rather than
storing XDR data or compressed source code per-process.

I suspect this would give us a much bigger win for chrome code than
we'd get from using the same logic as web content, given how much chrome
code is shared across content processes vs. web content code. But it's
entirely possible that some CDN scripts are common enough that it's
worth trying to do something similar for both chrome and content XDR
caches.

dmos...@mozilla.com

Nov 8, 2017, 2:50:44 PM
Does this cache bytecode for about: pages as well? As an example, caching bytecode for various JS scripts from resource: and chrome: for about:home might get interesting startup improvements...

Thanks,
Dan

Nicolas B. Pierron

Nov 9, 2017, 6:08:07 AM
On 11/08/2017 07:50 PM, dmos...@mozilla.com wrote:
> Does this cache bytecode for about: pages as well? As an example, caching bytecode for various JS scripts from resource: and chrome: for about:home might get interesting startup improvements...

I would not expect the JSBC to be used in such cases, because these
JavaScript files are loaded from the resource:// and chrome:// protocols,
which do not implement the nsICacheInfoChannel API as the necko cache does.

Thus, if we want to do so, we should probably pre-compile the content of
the JSBC and insert it in the OmniJar, and implement nsICacheInfoChannel
to let the ScriptLoader know about these pre-compiled resources.

--
Nicolas B. Pierron