Changes for JS Shrinking

Alon Zakai

Nov 27, 2017, 8:10:41 PM
to emscripten-discuss
Recent discussions about our JS size [1] led to a plan for shrinking it, and the first step along the plan [2] has a few PRs open for it. Since this will change some things, we thought it made sense to post to the mailing list about it.

The background is that for a medium to large project (like a game engine) we emit compact and efficient JS and asm.js/wasm. However, for a small project we could do better, especially on the JS size, in part because we've focused a lot on optimizing the compiled code (asm.js/wasm), but the non-compiled JS can be significant in a small project. And as wasm increases the interest in compiling to the web, we've been seeing more people thinking about small projects these days, so we should do better there.

For example, a small program using some libc stuff (printf, malloc, etc.) optimized for size is 5K of gzipped wasm and 9K of gzipped JS. The JS should be smaller! :)

The plan [3] for improving this will involve some breaking changes, since part of the problem is that we export a lot of our runtime by default, so it's emitted even if you don't use it. Breaking changes are never good, but we've thought carefully about how to minimize the risk and annoyance here. Feedback is very welcome. Overall, we hope to emit a compile-time error for breaking changes when possible, which should make any changes users need to make very simple. However, some things can't be checked at compile time. We want to minimize the harm for those as follows:

 * In builds with ASSERTIONS enabled, emit a stub for the thing that is being removed. Then if it is actually used, it will show an error message, something like "this is no longer exported by default, you need to export it yourself." It should be a simple fix given the message.

 * We already enable ASSERTIONS in -O0 builds by default, so the extra runtime explanations would appear there as well. Hopefully most people, when investigating something broken, will try either an unoptimized build or a build with ASSERTIONS (as we already recommend).

 * We'll document breaking changes in Changelog.markdown (which we really should use more).

 * I think we're pretty responsive on the issue tracker in general, but we can try to be extra-responsive about issues filed about these changes.

To be more concrete, for example we would like to stop exporting getValue and setValue by default [4]. The consequences of that change will be:

 * If you don't use getValue or setValue, nothing at all changes.

 * If you use Module['getValue'] then you must export it, using something like -s EXTRA_EXPORTED_RUNTIME_METHODS=["getValue"]. If you don't export it, you'll get the error message mentioned above at runtime, in -O0 or ASSERTIONS builds, which can help quickly fix things.

 * If you use getValue directly (not through Module), there are two cases. If you are inside code that the compiler optimizes - anything in a pre-js, post-js, or js-library - then it sees you are using it and will not remove it, so everything will still work. However, if you use it from another script tag in the HTML file, which emcc did not see, then getValue will not exist and you'll get an error. That never worked with the closure compiler, though, and it has also never been something we said should work, as only things exported on Module should be relied upon from the outside.
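
As a rough illustration of the stub behaviour described above, here is a minimal sketch in plain JavaScript. It is not emscripten's actual generated code: `makeModule`, its `exportedNames` and `assertions` parameters, and the toy heap-backed `getValue`/`setValue` bodies are all hypothetical and exist only to show the idea that an exported method works, a non-exported one becomes an explanatory stub in an assertions-style build, and is simply absent otherwise.

```javascript
// Hypothetical sketch, not emscripten's real output: models a Module object
// on which runtime methods are attached only when explicitly exported.
function makeModule(exportedNames, assertions) {
  // Toy stand-ins for runtime methods (assumption: they read/write a heap).
  const heap = new Int32Array(16);
  const runtime = {
    getValue: (ptr) => heap[ptr >> 2],
    setValue: (ptr, value) => { heap[ptr >> 2] = value; },
  };

  const Module = {};
  for (const name of Object.keys(runtime)) {
    if (exportedNames.includes(name)) {
      Module[name] = runtime[name];
    } else if (assertions) {
      // Assertions-style build: leave a stub that explains the problem.
      Module[name] = () => {
        throw new Error(
          "'" + name + "' is no longer exported by default; " +
          "you need to export it yourself"
        );
      };
    }
    // No export and no assertions: the method is simply not present.
  }
  return Module;
}

// Both methods exported: calls work as before.
const withExports = makeModule(['getValue', 'setValue'], true);
withExports.setValue(8, 42);
console.log(withExports.getValue(8)); // 42

// Nothing exported, assertions on: the stub throws a descriptive error.
const withoutExports = makeModule([], true);
let stubMessage = null;
try {
  withoutExports.getValue(8);
} catch (e) {
  stubMessage = e.message;
}
console.log(stubMessage);
```

The point of the stub is that the failure names the missing export, so the fix (adding the method to the export list) is obvious from the message alone.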

In conclusion, these changes may cause breakage if you use these internal runtime methods, but fixing the breakage is very simple, and we're trying hard to make the fix obvious. I think the risk is worth it for the benefit of emitting much more compact JS.

Thoughts?

- Alon

Floh

Nov 28, 2017, 1:41:54 AM
to emscripten-discuss
This is wonderful news :) In my smallest-possible WebGL demos (https://floooh.github.io/sokol-html5/index.html) the .js part is about 2-3x bigger than the .wasm part (after compression). For the triangle demo, for instance, the download size in Chrome is 11.4 KByte for the wasm and 31.8 KByte for the js. I think in those demos it's mostly the library_gl.js shim, though I'm not sure there's much room for improvement there.

Improvements in other areas are also highly appreciated of course :)

Cheers,
-Floh.

Alon Zakai

Nov 28, 2017, 1:34:42 PM
to emscripten-discuss
Yeah, I see similar things when compiling say tests/gles2_conformance.cpp (with something like -O2 -s WASM=1 --closure 1), the JS is over 2x bigger than the wasm. Looking in the code it's actually mostly SDL and browser integration that is the cause, less GL - we should look into that.

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-discuss+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Александр Гурьянов

Nov 29, 2017, 4:02:56 AM
to emscripte...@googlegroups.com
Hi. This discussion is very interesting, but what do you think about
solving this problem from the other side? I mean a solution like
ProGuard for Java: is it possible to create a tool that drops unused
functions from the generated JS? Maybe it's very hard to collect which
functions from the emscripten core are used; in that case we could create
profiles with whitelisted/blacklisted functions, like `tool -s
KEEP=gl,sdl,... generated.js`, or even better, a rich config file as in
ProGuard. I think this solution is safer, because we can stay
backward compatible. What do you think?

Floh

Nov 29, 2017, 8:28:07 AM
to emscripten-discuss
AFAIK emscripten has had fairly aggressive dead-code elimination for some time, at least on the C/C++ side: all unreferenced functions and data are dropped during linking (that's why EMSCRIPTEN_KEEPALIVE and -s EXPORTED_FUNCTIONS are necessary to keep unreferenced functions in).

I think (but am not 100% sure) that some sort of elimination also happens in the JS shims. There's also the optional closure compiler pass, which would remove unused JS functions, but using this was a bit hit-and-miss for me (usually the result wasn't much different).

Cheers,
-Floh.

Александр Гурьянов

Nov 29, 2017, 11:18:39 AM
to emscripte...@googlegroups.com
But if there are no functions to delete, how can the generated JS be
shrunk? I don't understand. BTW, for a Unity project I wrote a custom
tool that collects the functions used at runtime, and nearly 70% of the
functions are never called. I could reduce the size of the Unity project
from 26 MB to 12 MB without any in-game problems. I understand that this
is not related to shrinking the emscripten core, but this case illustrates
that compiler dead code elimination can be useless in the worst case.
ProGuard works in such a way that all functions are "dead" unless they are
kept by the configuration file or called from a function that is kept by
the configuration file. So if we talk about the emscripten core, we can say
that if "library_sdl.js" is not listed in the configuration file then we
drop it; this is the main idea. I admit that this may already be done in
the emscripten compiler, in which case I misunderstood something and will
go read the thread again :)
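
The custom tool mentioned in this message is not shown in the thread, so the following is only a generic sketch of the idea of recording which functions are called at runtime. The helper `recordCalls` and the toy `lib` object with its function names are hypothetical; the sketch wraps each function on an object and reports the ones never invoked during a run.

```javascript
// Hypothetical sketch: wrap every function on an object so that calls are
// recorded, then report which functions were never called during a run.
function recordCalls(target) {
  const called = new Set();
  for (const name of Object.keys(target)) {
    const original = target[name];
    if (typeof original !== 'function') continue;
    target[name] = function (...args) {
      called.add(name);
      return original.apply(this, args);
    };
  }
  return {
    called,
    // Functions present on the target that were never invoked.
    neverCalled: () =>
      Object.keys(target).filter(
        (name) => typeof target[name] === 'function' && !called.has(name)
      ),
  };
}

// Toy "library" standing in for generated code (names are made up).
const lib = {
  render: () => 'frame',
  physicsStep: () => 'step',
  playSound: () => 'beep',
};

const coverage = recordCalls(lib);
lib.render();
lib.render();
lib.playSound();

console.log(coverage.neverCalled()); // [ 'physicsStep' ]
```

Anything such a run reports as unused is only unused for that particular session, so removing it is a judgment call by the developer rather than something a compiler can prove.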

Alon Zakai

Nov 29, 2017, 11:30:13 AM
to emscripten-discuss
caiiiycuk and floooh, you're both correct here. We do already have quite a lot of optimization for removing dead code, both in compiled code (LLVM DCE, binaryen DCE) and in JS (JSDCE, closure). Those should remove almost everything that can be proven statically to not be used (but as mentioned above, static optimization may miss things, it can't tell if e.g. physics isn't used in Unity at runtime). So there should not be parts of libc or library_sdl.js in the output if the compiler sees they aren't necessary.

But we also need more. I didn't mention it yet since it's longer term and I'm not sure how it will work or how well, but since the question came up, my current thinking is that we need

1. To stop exporting so many things by default. That's the current focus. Exporting unnecessary things prevents all kinds of DCE, both current and future. Improving this with our current DCEs can get us pretty far: the open PRs already reduce the size of the first testcase's JS by 12%.

2. A new JS/wasm DCE, a DCE that spans both JS *and* wasm. Currently we have good DCE in each separately, but they can't collect cycles that cross the two worlds. This would need to look at the combined reachability graphs of both worlds at once. I'm not sure how much this will help: I've seen cases where it could, but I think they are mostly rare or minor. But it's hard to tell ahead of time.
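
To make the cross-world cycle problem in point 2 concrete, here is a small sketch of reachability over one combined call graph. It is not how binaryen or emscripten implement anything; the `reachable` helper, the node names, and the edges are hypothetical, and the per-world view rests on the stated assumption that a wasm-only analysis must treat every function referenced from JS as a root.

```javascript
// Hypothetical sketch of reachability over a combined JS + wasm call graph.
// Node names and edges are made up for illustration.
function reachable(graph, roots) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node)) continue;
    seen.add(node);
    for (const next of graph[node] || []) stack.push(next);
  }
  return seen;
}

const graph = {
  'js:main': ['wasm:_main'],
  'wasm:_main': ['js:print'],
  'js:print': [],
  // A cycle crossing the boundary that nothing else refers to.
  'js:helper': ['wasm:_callback'],
  'wasm:_callback': ['js:helper'],
};

// Combined view: only the real entry point is a root.
const combined = reachable(graph, ['js:main']);

// Wasm-only view (assumption): the analysis sees just the wasm nodes and the
// edges between them, and must root every function that JS refers to, since
// it cannot tell whether that JS is itself alive.
const wasmGraph = Object.fromEntries(
  Object.entries(graph)
    .filter(([node]) => node.startsWith('wasm:'))
    .map(([node, edges]) => [node, edges.filter((e) => e.startsWith('wasm:'))])
);
const wasmRoots = Object.keys(graph)
  .filter((node) => node.startsWith('js:'))
  .flatMap((node) => graph[node])
  .filter((node) => node.startsWith('wasm:'));
const wasmOnly = reachable(wasmGraph, wasmRoots);

console.log(combined.has('wasm:_callback')); // false: removable in the combined view
console.log(wasmOnly.has('wasm:_callback')); // true: kept by the wasm-only view
```

In this toy graph the `js:helper` / `wasm:_callback` pair keeps itself alive when each world is analysed separately, and only the combined traversal shows that neither is reachable from the entry point.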

- Alon




Charles Vaughn

Dec 18, 2017, 7:24:42 PM
to emscripten-discuss
Having done a bunch of work on minimizing asm.js footprint, I'd say DCE is great, but it can only act on what is known at compile time. You could have a function like:

if (localeFormat.isLTR) {
  formatTextLTR();
} else {
  formatTextRTL();
}

Then if you never run in an environment using RTL, you'll have the code included, but the branch never taken.


Jukka Jylänki

Dec 22, 2017, 8:37:32 AM
to emscripte...@googlegroups.com
Having a final pass of Closure or something similar is great,
and definitely recommended on top of the other optimizations
that are happening. A number of users have complained about "out of
the box" code size, and they would like -O0 to look tiny even when not
minified or Closured. This comes very much from an educational/mental
image perspective rather than an actual KBytes perspective: people
would like to be able to read the generated nonminified -O0 output and
understand it. Currently new users sometimes trip themselves up when
they do a printf("hello world\n"); and expect to get a simple
console.log("hello world"); as a result, but instead get 100KB of JS.
This expectation is unreasonable, as we well know, but it explains
why optimizing from all sides is equally important. There have been some
great contributions to Closure support as well, so we can do things on
both sides; they're not mutually exclusive.