ANN: Ayzim, a more memory-efficient, faster asm.js optimizer

Aidan Hobson Sayers
Nov 4, 2016, 11:09:02 PM
to emscripte...@googlegroups.com
Using Ayzim as a drop-in replacement for the Emscripten asm.js native optimizer when compiling an asm.js project of moderate or large size at `-O2` or `-O3` should give a ~50-75% reduction in memory usage and a ~25-50% speedup when running asm.js native optimizer passes (i.e. most of the "js opts" stage as seen in EMCC_DEBUG output).

To get it, download the compiled releases for Linux and Windows from the ayzim releases page, extract them and replace (after backing up!) the existing optimizer(.exe) binary in `emsdk/emscripten/incoming_optimizer_64bit/` (if you're not on `incoming` but still feel brave, take a look at your emscripten config file, usually at `$HOME/.emscripten`, which should point you to the right place).

--

Some background: when I was trying to port a large application to asm.js about 6 months ago I had serious problems with the Emscripten asm.js optimizer - it would split the 750MB .js file into chunks and promptly consume all 8GB of my RAM by trying to optimize the chunks in parallel, swapping everything else out of memory and grinding the machine to a halt. I tackled this problem by taking a brief(!) diversion to rewrite the optimizer in Rust to be more memory efficient. Along the way I added a few speedups.

Ayzim is probably an entry in the "well, this might have been useful two years ago" category of software (since asm.js is 'shortly' going to be made redundant by wasm), but someone may find a use for it. For example, people wanting to understand the structure of the Emscripten optimizer AST may want to look at this code and/or ask me, since I'm very familiar with it now :)

In time I may extend Ayzim to support wasm optimizations and move it to being more of a library, but that's for the future.

Aidan

Aidan Hobson Sayers
Nov 4, 2016, 11:34:01 PM
to emscripte...@googlegroups.com
One thing I wanted to quote from the Ayzim readme on the subject of stability and bugs: "As with any non-trivial rewrite it's probably a little buggy, but the enormous Emscripten test suite reports no issues of consequence so it should be reasonably correct."

Alon Zakai
Nov 5, 2016, 3:38:54 PM
to emscripten-discuss
Very cool!

For those interested in checking this out: you can just replace the existing optimizer executable, and if you want to go back, deleting the replacement will make emcc rebuild the original one.

Regarding those improvements to speed and memory use, I'm curious where they come from - what were the changes you made? Small things, or large structural changes to the AST? For comparison, the binaryen optimizer also has some major improvements to speed and memory compared to the emscripten asm.js one, and that's mostly from the redesigned AST - I'm curious if we ended up doing similar things to improve on the old optimizer. Also, do you think your choice of language had an effect here?

Have you verified this generates the same output as the asm.js one, btw? You mention it passes the test suite, but I'm also curious if it's literally generating the same code as well.

Aidan Hobson Sayers
Nov 6, 2016, 2:26:13 PM
to emscripte...@googlegroups.com
The memory improvements are basically down to the AST being more strongly typed in Ayzim - structurally it's the same (there's a simple 1-1 mapping between representations). For example, the uglifyjs AST representation of `-4096` is `["unary-prefix", "-", ["num", 4096]]`. This was translated faithfully to the C++ optimizer, so the arrays are `Value`s, which are dynamically typed by being a tagged union. The memory cost of a single `Value` is 16 bytes (`double` is the largest type in the union, and you pay that again to be able to hold the tag in the struct and pad it)...and then there's another `3*ptrlen` bytes to store the vector type somewhere if the `Value` is an array (typically to point to a child node...so just add this for every node since they're all children!). You then multiply the size of `Value` by the number of items in the array, which is two at minimum (ish). Overall, for an AST node you're paying `3*ptrlen + 32 + (16*N_additional_array_items)`.

However, you know that `X` in `["num", X]` is a double, and that each AST node has a limited set of possible tags, so you can treat whole AST nodes as tagged unions, rather than the individual AST node fields. In Ayzim you end up paying `ptrlen + 32` for any single AST node (ish).
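
To make that concrete, here's a minimal sketch of the two layouts - toy stand-in types of my own, not the actual cashew/Ayzim definitions - just to show where the bytes go:

```rust
use std::mem::size_of;

// Toy stand-in for the C++ optimizer's dynamically typed `Value`: every
// field is itself a tagged union, and array nodes own a heap vector.
#[allow(dead_code)]
enum DynValue {
    Str(String),
    Num(f64),
    Arr(Vec<DynValue>),
}

// Toy stand-in for a strongly typed AST node: the tag lives on the whole
// node, so `-4096` is one `UnaryPrefix` wrapping one `Num`.
#[allow(dead_code)]
enum AstNode {
    Num(f64),
    UnaryPrefix(&'static str, Box<AstNode>),
    // ...other node kinds elided
}

fn main() {
    // Exact sizes depend on target and compiler, so just print them. The
    // point: representing `-4096` as DynValues needs an outer 3-element
    // array plus an inner 2-element one, each element paying for the
    // largest union member plus a tag, while as AstNodes it's two nodes.
    println!("DynValue: {} bytes", size_of::<DynValue>());
    println!("AstNode:  {} bytes", size_of::<AstNode>());
}
```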

On top of this, the Emscripten optimizer leaks memory when replacing nodes (and elsewhere), probably because it's actually pretty tricky to keep track of what you're replacing, whether you've got another pointer to it hanging around somewhere and whether it's safe to deallocate. Leveraging the Rust ownership system made it quite tricky to translate parts of the C++ code, but the result is no memory leaks.
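
As a minimal sketch of why that falls out naturally (again a toy node type, not the real Ayzim AST): replacing a node through a mutable reference drops the old subtree as part of the assignment, so there's no manual bookkeeping to get wrong.

```rust
// Toy node type for illustration; not the real Ayzim AST.
#[derive(Debug)]
enum AstNode {
    Num(f64),
    UnaryPrefix(&'static str, Box<AstNode>),
}

// Fold `-(N)` into a plain `Num(-N)`, replacing the node in place.
fn fold_negation(node: &mut AstNode) {
    if let AstNode::UnaryPrefix("-", inner) = node {
        if let AstNode::Num(n) = **inner {
            // The old UnaryPrefix node (and the boxed child it owns) is
            // dropped right here by the assignment - nothing leaks and
            // nothing needs an explicit delete.
            *node = AstNode::Num(-n);
        }
    }
}

fn main() {
    let mut node = AstNode::UnaryPrefix("-", Box::new(AstNode::Num(4096.0)));
    fold_negation(&mut node);
    println!("{:?}", node); // Num(-4096.0)
}
```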

Speedups probably came mostly from a) better overall optimization of the AST node tagged enum, b) a more compact memory layout (so better cache behaviour), c) using a string interning library with some interning at compile time (thanks to the Servo project), so some value comparisons could be inlined rather than going via a pointer lookup, and d) a carefully chosen piece of low-hanging fruit in registerizeHarder.
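
On c), for anyone curious, the sketch below is roughly the idea, using the `string_cache` crate purely as an illustration of the Servo interning work (not necessarily the exact setup Ayzim uses). Interned atoms compare as small integers rather than walking string data, and `string_cache_codegen` can bake a known set of atoms in at compile time (omitted here).

```rust
// Assumes `string_cache` is declared as a dependency in Cargo.toml.
use string_cache::DefaultAtom as Atom;

fn main() {
    // Equal strings intern to the same atom, so equality is a cheap
    // integer comparison instead of a byte-by-byte string compare.
    let a = Atom::from("unary-prefix");
    let b = Atom::from("unary-prefix");
    let c = Atom::from("num");

    assert_eq!(a, b);
    assert_ne!(a, c);

    // The string is still there when you need to print or emit it.
    println!("{} vs {}", &*a, &*c);
}
```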

The language helped in that it gave me niceties that helped me succeed (e.g. 'first-class' tagged unions, exhaustiveness checking when matching on them), the library experience is good, and the whole memory safety thing is nice. Downsides were the compile times, the difficulty of translating highly unsafe C++ code, and a number of language papercuts (lexical lifetimes in particular). Someone with a full understanding of the optimizer and C++ would have been perfectly able to do all the macro-optimizations in C++, so Rust isn't more powerful, but as a fallible human it helped me a lot and I felt much less like I was juggling chainsaws than when making my previous changes to the C++ optimizer :) For example, I'd feel pretty optimistic about my ability to add the duplicate function eliminator to it.
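
(A tiny example of the exhaustiveness point, with a made-up node type: leave out a default arm and the compiler forces every pass to handle every node kind, including any variant you add later.)

```rust
// Toy node type, for illustration only.
enum AstNode {
    Num(f64),
    Name(String),
    Block(Vec<AstNode>),
}

fn count_nums(node: &AstNode) -> usize {
    // No `_ =>` arm: adding a new variant to AstNode turns this match into
    // a compile error until the new case is handled.
    match node {
        AstNode::Num(_) => 1,
        AstNode::Name(_) => 0,
        AstNode::Block(body) => body.iter().map(count_nums).sum(),
    }
}

fn main() {
    let ast = AstNode::Block(vec![
        AstNode::Num(4096.0),
        AstNode::Name("x".to_string()),
        AstNode::Num(0.0),
    ]);
    println!("{}", count_nums(&ast)); // 2
}
```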

I should have made my testing process clearer - during development I was diffing the output of optimizing the full sqlite and unity asm.js library files. The only remaining differences I'm aware of are a) better float representation from ayzim and b) the three test cases here (all three very minor - one that ayzim handles better, one that emscripten handles better, and one where ayzim gets the float representation very slightly wrong). If you spy other differences, let me know.

Alon Zakai
Nov 6, 2016, 2:50:10 PM
to emscripten-discuss
Very interesting, thanks!

In particular, Rust helping avoid leaks is something I hadn't thought of - very nice. However, aside from the complexity you mentioned, the C++ optimizer also leaks for another reason: to avoid memory allocation overhead, so that allocation is always just bumping a pointer and freeing is just ignoring it. If you're not leaking, then I guess you have lists of freed objects for reuse, or some such? I wonder if it's possible to measure that overhead. (For comparison, in the binaryen optimizer I've focused on reusing nodes; there is still leaking when that isn't possible, but it's relatively rare.)

Memory-wise, yeah, keeping the 1-1 mapping to the Uglify AST definitely hurt the C++ optimizer's memory usage. The binaryen optimizer does something similar to what you said, with properly strongly-typed nodes. So e.g. -4096 would be a Const node, currently taking 20 bytes (it could be 16, though).

Aidan Hobson Sayers
Nov 6, 2016, 4:13:18 PM
to emscripte...@googlegroups.com
I ended up not implementing the arena allocation used in the emscripten optimizer (it probably wouldn't be terribly tricky to add, there are notes in cashew.rs) because profiling never identified it as being a big deal. You can verify this just by looking at the primitive timing of the different phases of registerizeHarder and eliminate - overall, the vast majority of time is spent in phases that just analyse the AST. AST node replacement happens where possible, e.g. `*x = Num(0f64)`, but mostly it's likely that jemalloc (the default allocator in Rust, which may be considered cheating) is just pretty quick. I could make wild guesses at other causes (perhaps jemalloc is good at deallocating and reallocating 32-byte types and does something clever internally), but it was never high enough on the profiler to get my attention. That's not to say there aren't savings to be made, but I could spend time better elsewhere even with the current speed.
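
(For the curious, this is roughly the shape of the bump-allocation pattern Alon describes, sketched with the `typed_arena` crate purely as an illustration - not something Ayzim currently does:)

```rust
use typed_arena::Arena; // assumes the `typed_arena` crate as a dependency

// Toy node type, for illustration only.
enum AstNode<'a> {
    Num(f64),
    UnaryPrefix(&'static str, &'a AstNode<'a>),
}

fn main() {
    // Every alloc just bumps a pointer within the current chunk; nothing is
    // freed individually, and the whole arena is released in one go when it
    // goes out of scope.
    let arena = Arena::new();
    let num = arena.alloc(AstNode::Num(4096.0));
    let neg = arena.alloc(AstNode::UnaryPrefix("-", num));
    if let AstNode::UnaryPrefix(op, _) = *neg {
        println!("allocated a unary {} node", op);
    }
} // arena (and every node inside it) is dropped here
```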

Alon Zakai
Nov 6, 2016, 4:25:58 PM
to emscripten-discuss
Interesting. I saw malloc when profiling, but it wasn't very high, and it makes sense that a really good allocator like jemalloc might be able to make it negligible.

awt
Dec 1, 2016, 10:43:57 PM
to emscripten-discuss
Hi,

This looks very promising and I am interested in trying it out. I am currently on emsdk 1.36.3, so where should I place ayzim-opt.exe? The only optimizers I see in my build are in emsdk\clang\e1.36.3_64bit, where I have opt.exe and optimizer.exe.

I tried to replace optimizer.exe with ayzim-opt.exe and it crashes after a while.

Aidan Hobson Sayers
Dec 3, 2016, 4:30:48 AM
to emscripte...@googlegroups.com
Hmm, I'm not sure those are right. You can double-check as follows:

When you've entered the emscripten environment (via emsdk.bat, so emcc is runnable from the command line), try printing out the EM_CONFIG variable - it should be a path to a file. Inside this file should be an EMSCRIPTEN_NATIVE_OPTIMIZER variable, which tells emscripten where the optimizer is.

If you get through all this, have replaced that file (or it's the same file) and it still crashes, then run with EMCC_DEBUG=1 and paste the output into a github issue on ayzim - I'm eager to take a look. Or, if it's possible to share any part of the project (either so I can build from source, or just the final objects before the link -> .js step), then that's great too.

Jukka Jylänki
Dec 3, 2016, 8:25:18 PM
to emscripte...@googlegroups.com
opt.exe is LLVM's own optimizer, and optimizer.exe is the asm.js optimizer that Ayzim intends to replace.

Emsdk has two separate directory structures for precompiled and compile-from-source installations. In precompiled installations, opt.exe and optimizer.exe live in the same directory (== LLVM binaries directory). In from-source installations, optimizer.exe lives in its own CMake build directory. Like Aidan mentions, you can check the .emscripten file to see where the current Emscripten environment is looking for that tool.

It might be preferable to install ayzim not by replacing optimizer.exe, but by keeping it as a separate file, optimizer-ayzim.exe or similar, and editing the .emscripten file to point to that location. This way it's easier to revert back to the original asm.js optimizer.

When using emsdk, it's fine to edit the .emscripten file that emsdk generates. Just remember that it's the "emsdk activate" step which writes the .emscripten file, so any manual modifications to it will be lost if you later call "emsdk activate" - in that case, you'd be reverted back to the original Emscripten optimizer.

If you do replace optimizer.exe, there should be no need to change the .emscripten file. So I suppose it's whichever way feels simpler.


awt
Dec 5, 2016, 12:43:26 AM
to emscripten-discuss
Aidan and JJ, 

Thanks for your reply. I believe I have overwritten optimizer.exe in the correct location according to .emscripten. This is part of the log I got before the crash in Aidan's optimizer. Unfortunately, I cannot share my project, but I will copy this log into an issue on ayzim.

thread 'main' has overflowed its stack
DEBUG:root:EMCC_WASM_BACKEND tells us to use asm.js backend
splitting up js optimization into 124 chunks, using 40 cores  (total: 301.94 MB)
Traceback (most recent call last):
  File "E:\emsdk\emscripten\1.36.3\\em++", line 13, in <module>
    emcc.run()
  File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1731, in run
    JSOptimizer.flush()
  File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1643, in flush
    run_passes(chunks[i], 'js_opts_' + str(i), just_split='receiveJSON' in chunks[i], just_concat='emitJSON' in chunks[i])
  File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1613, in run_passes
    final = shared.Building.js_optimizer(final, passes, debug_level >= 4, JSOptimizer.extra_info, just_split=just_split, just_concat=just_concat)
  File "E:\emsdk\emscripten\1.36.3\tools\shared.py", line 1741, in js_optimizer
    ret = js_optimizer.run(filename, passes, NODE_JS, debug, extra_info, just_split, just_concat)
  File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in run
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, source_map, extra_info, just_split, just_concat))
  File "E:\emsdk\emscripten\1.36.3\tools\tempfiles.py", line 64, in run_and_clean
    return func()
  File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in <lambda>
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, source_map, extra_info, just_split, just_concat))
  File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 446, in run_on_js
    filenames = pool.map(run_on_chunk, commands, chunksize=1)
  File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 557, in get
    raise self._value
AssertionError: Error in optimizer (return code 255):
ninja: build stopped: subcommand failed.