wasm-opt is extremely slow

Александр Гурьянов

Nov 21, 2022, 2:19:11 PM
to emscripte...@googlegroups.com
Hi, I added dosbox-x as a backend for js-dos. Its size is ~12 MB, and wasm-opt needs 10-12 minutes to optimize it (-Oz). Do you have any advice on how to speed it up? If I build with -O0 or -O1, I get the following error in Chrome:


CompileError: WebAssembly.compile(): Compiling function #6485:"CPU_Core_Normal_Run()" failed: local count too large @+4790225

Alon Zakai

Nov 21, 2022, 3:39:52 PM
to emscripte...@googlegroups.com
First thing, I would try -O2 and -Os. They can be much faster than -O3 and -Oz.

Otherwise, 10-12 minutes sounds extreme even for 12MB, so this is likely hitting an unknown bad case. If you can run a system profiler on it that might show something useful. You can also run with BINARYEN_PASS_DEBUG=1 in the env, which will list times of passes - perhaps one particular pass will be slowest.
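
A minimal sketch of collecting that per-pass timing from a script, assuming wasm-opt is on PATH. The helper names and the file names in the usage comment are hypothetical; only the BINARYEN_PASS_DEBUG=1 variable comes from the advice above.

```python
import os
import subprocess


def pass_debug_env(base_env=None):
    """Return a copy of the environment with BINARYEN_PASS_DEBUG=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["BINARYEN_PASS_DEBUG"] = "1"
    return env


def run_with_pass_debug(cmd):
    """Run cmd with pass debugging enabled and return stdout and stderr combined."""
    result = subprocess.run(
        cmd,
        env=pass_debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture both streams in one log
        text=True,
        check=False,
    )
    return result.stdout


# Usage (hypothetical file names):
# log = run_with_pass_debug(["wasm-opt", "-Oz", "input.wasm", "-o", "output.wasm"])
# print(log)
```

The same environment can be passed to the whole link step instead of a direct wasm-opt call, since the variable is simply inherited by child processes.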

Thomas Lively

Nov 21, 2022, 3:39:57 PM
to emscripte...@googlegroups.com
That's really slow! Are you using the release binaries or building wasm-opt yourself? If you're building it yourself, are you sure you're building wasm-opt in release mode?

Can you run wasm-opt with the environment variable BINARYEN_PASS_DEBUG=1 set and share the results? That will show which optimization passes are taking the most time.

Александр Гурьянов

Nov 21, 2022, 4:27:18 PM
to emscripte...@googlegroups.com
I use the precompiled Binaryen from emsdk:
wasm-opt version 110 (version_101-1028-g590f63782)

The output with BINARYEN_PASS_DEBUG=1:

[PassRunner] running passes
[PassRunner]   running pass: generate-dyncalls...     0.0692962 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: legalize-js-interface... 0.0484083 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: strip-target-features... 0.000597624 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: strip-dwarf...           0.00404366 seconds.
[PassRunner]   (validating)
[PassRunner] passes took 0.122346 seconds.
[PassRunner] (final validation)
[PassRunner] running passes
[PassRunner]   running pass: strip-dwarf...                        0.00130085 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: post-emscripten...                    0.169116 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: duplicate-function-elimination...     1.00175 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: memory-packing...                     0.0121084 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: once-reduction...                     0.0510261 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: ssa-nomerge...                        1.748 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: dce...                                0.839832 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-names...                0.20939 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-brs...                  1.14575 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-names...                0.499286 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: optimize-instructions...              0.93456 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: pick-load-signs...                    0.136761 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: precompute-propagate...               2.21939 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: optimize-added-constants-propagate... 2.78762 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: code-pushing...                       0.210713 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: simplify-locals-nostructure...        1.38406 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: vacuum...                             0.963663 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: reorder-locals...                     0.180559 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-brs...                  0.601953 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: merge-locals...                       1.2248 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: coalesce-locals...                    8.24509 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: local-cse...                          0.665407 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: simplify-locals...                    1.9897 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: vacuum...                             0.959832 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: reorder-locals...                     0.15094 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: coalesce-locals...                    0.956974 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: reorder-locals...                     0.152977 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: vacuum...                             0.968055 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: code-folding...                       5.8405 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: merge-blocks...                       0.761645 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-brs...                  0.789087 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-names...                0.120671 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: merge-blocks...                       0.366825 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: precompute-propagate...               1.94412 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: optimize-instructions...              0.825021 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: rse...                                0.800074 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: vacuum...                             0.925147 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: dae-optimizing...                     4.91022 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: inlining-optimizing...                25.834 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: duplicate-function-elimination...     0.258486 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: duplicate-import-elimination...       0.00069257 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: merge-similar-functions...            0.169829 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: simplify-globals-optimizing...        0.21235 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: remove-unused-module-elements...      0.112781 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: reorder-globals...                    0.00203702 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: directize...                          0.0837749 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: generate-stack-ir...                  0.137842 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: optimize-stack-ir...                  8.89946 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: asyncify...                           286.814 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: strip-debug...                        0.00531564 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: strip-producers...                    0.00134905 seconds.
[PassRunner]   (validating)
[PassRunner] passes took 370.225 seconds.
[PassRunner] (final validation)
[PassRunner] running passes
[PassRunner]   running pass: print-function-map... 0.0223957 seconds.
[PassRunner]   (validating)
[PassRunner] passes took 0.0223957 seconds.
[PassRunner] (final validation)
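
A log this long is easier to read once the passes are ranked by time. The sketch below is one way to do that, assuming the line format shown above; the regular expression and function name are illustrative and not part of Binaryen. In the full log, asyncify accounts for 286.814 of the 370.225 reported seconds, roughly 77%.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [PassRunner]   running pass: asyncify...   286.814 seconds.
PASS_LINE = re.compile(r"running pass: ([\w-]+)\.\.\.\s+([0-9.eE+-]+) seconds")


def pass_totals(log_text):
    """Sum the reported seconds per pass name, slowest first."""
    totals = defaultdict(float)
    for name, seconds in PASS_LINE.findall(log_text):
        totals[name] += float(seconds)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few lines copied from the log above.
SAMPLE_LOG = """\
[PassRunner]   running pass: coalesce-locals...                    8.24509 seconds.
[PassRunner]   (validating)
[PassRunner]   running pass: coalesce-locals...                    0.956974 seconds.
[PassRunner]   running pass: inlining-optimizing...                25.834 seconds.
[PassRunner]   running pass: asyncify...                           286.814 seconds.
[PassRunner] passes took 370.225 seconds.
"""

for name, seconds in pass_totals(SAMPLE_LOG):
    print(f"{name:24s} {seconds:9.3f} s")
```

Passes that run more than once (coalesce-locals here) are summed under one name, so the ranking reflects total cost per pass rather than per invocation.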

It reports 370 seconds (~6 min), but the build actually took 13.5 min:

Screenshot 2022-11-22 at 00.23.35.png
Maybe it is somehow related to Asyncify. I want to add functions to the Asyncify list by name (without the signature), and to do this I use:
funcName(*)*

In any case, regular dosbox (not dosbox-x) finishes wasm-opt in 1-2 minutes with the same Asyncify lists (1.4 MB).
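
To illustrate what a pattern like funcName(*)* is meant to select, here is a small sketch. It assumes that `*` is the only wildcard and matches any run of characters. It is a model for reasoning about list entries, not Emscripten's or Binaryen's actual matcher, and the function name is hypothetical.

```python
import re


def wildcard_match(pattern, name):
    """Return True if name matches pattern, where '*' matches any run of characters."""
    # Escape everything literally, then let each '*' stand for '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


# The trailing '*' lets the entry cover suffixes after the parameter list.
print(wildcard_match("funcName(*)*", "funcName(int, char*)"))
print(wildcard_match("funcName(*)*", "funcName() const"))
print(wildcard_match("funcName(*)*", "otherName(int)"))
```

Under that assumption, a single entry covers every overload of the function, which is the effect described above of listing a function by name without its signature.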

Александр Гурьянов

Nov 21, 2022, 4:29:12 PM
to emscripte...@googlegroups.com
To clarify:
dosbox: wasm size 1.4 MB, link time 1-2 min
dosbox-x: wasm size 12 MB (code 7 MB, data 5 MB), link time 12-14 min

Both use the same Asyncify lists file.

Alon Zakai

Nov 21, 2022, 8:10:10 PM
to emscripte...@googlegroups.com
Ok, looks like all the time is in Asyncify. That's hard to avoid atm, as the pass does a lot of hard work.

It might be possible to optimize it, and there is an issue or two with ideas, like this:


That is worth looking into if someone has time.

In the long term, wasm stack switching should avoid running Asyncify entirely. That can be tested now (-sASYNCIFY=2) but it will be some time before all browsers ship it.
