Asm.js running too slowly

john....@gmail.com

Mar 2, 2017, 11:09:55 AM
to emscripten-discuss
Hi,

I have ported a C++ program to asm.js using Emscripten 1.37.
The program is running too slowly: in Chrome it is 15-30 times slower than the native program. Is this normal for C++ programs? I tried both -O3 and -Oz optimization for the LLVM and JS optimizers.
With the -Oz build, the profiler attributes 1 to 6 ms to each virtual function call itself (not to the code being called), while the same calls take 0 ms in the native build, and lots of those dyncalls appear in the hotspots. Other code is running slowly too.
I have heard that C programs can run 5-8X slower than native.
Is it possible to speed this up?

Floh

Mar 2, 2017, 11:38:52 AM
to emscripten-discuss
That's definitely not normal :) At least on Intel CPUs, asm.js shouldn't be more than about 1.5x slower than native. I'm seeing slightly worse perf on ARM, but definitely not that bad.

Is performance behaviour different on Firefox? FF is doing ahead-of-time compilation while Chrome might still hit a situation where it de-optimizes and recompiles code (although I am not sure if this is still the case in recent Chrome versions, but if your code triggers such behaviour it would definitely be worth filing a ticket).

How many lines of code is your program, and how big is the resulting asm.js file?

Is your code making massive use of exceptions, or 64-bit integer math? These are two areas where performance is slower in emscripten (but again, shouldn't be *that* much).

Virtual function calls aren't nice in an inner loop of course (in general, not just asm.js), but 1 to 6 milliseconds per call seems completely strange.
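
One way to sanity-check a per-call figure like this is to time a large batch of virtual calls in isolation and divide by the call count. The following is a minimal sketch, not code from the original program: `Shape`, `Square`, and `time_virtual_calls` are hypothetical names, and an optimizing compiler may devirtualize or inline the call, so the result is only a rough indication of dispatch cost.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical interface standing in for the program's real virtual calls.
struct Shape {
    virtual ~Shape() = default;
    virtual std::uint32_t weight(std::uint32_t i) const = 0;
};

struct Square : Shape {
    std::uint32_t weight(std::uint32_t i) const override { return i * i; }
};

struct CallTiming {
    double ns_per_call;      // average wall-clock time per call, in nanoseconds
    std::uint32_t checksum;  // returned so the loop's work is actually used
};

// Times `calls` virtual calls made through the base-class reference.
CallTiming time_virtual_calls(const Shape& shape, std::uint32_t calls) {
    using clock = std::chrono::steady_clock;
    std::uint32_t checksum = 0;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < calls; ++i) {
        checksum += shape.weight(i);  // vtable dispatch unless devirtualized
    }
    const auto stop = clock::now();
    const double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {calls ? total_ns / calls : 0.0, checksum};
}
```

Calling `time_virtual_calls(Square{}, 1000000)` in both the native and the Emscripten build gives directly comparable per-call numbers. A result anywhere near the millisecond range would suggest that the profiler is attributing other work to the call.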

Is there any output on the Javascript console in the browser devtools? For instance, if you are spamming messages to stdout or stderr, execution will be extremely slow.

Is the code public? Would be interesting to have a look what could be causing such bad performance.

Cheers,
-Floh.

john....@gmail.com

Mar 2, 2017, 12:34:13 PM
to emscripten-discuss
Thanks - replies inline


On Thursday, March 2, 2017 at 10:08:52 PM UTC+5:30, Floh wrote:
That's definitely not normal :) At least on Intel CPUs, asm.js shouldn't be more than about 1.5x slower than native. I'm seeing slightly worse perf on ARM, but definitely not that bad.
>> I am using an Intel quad-core CPU.

Is performance behaviour different on Firefox? FF is doing ahead-of-time compilation while Chrome might still hit a situation where it de-optimizes and recompiles code (although I am not sure if this is still the case in recent Chrome versions, but if your code triggers such behaviour it would definitely be worth filing a ticket).
>> I see deopt entries in the profile on Chrome (in the code sections of the heap allocation profile). On Firefox and with WebAssembly it is relatively faster, but still more than 9X slower than native at runtime.

How many lines of code is your program, and how big is the resulting asm.js file?
>> The resulting code is ~10 MB of asm.js and takes ~0.8 sec at startup. For my profiling and performance comparison, I start the profiler at a later stage and do not include the initial asm.js validation time.

Is your code making massive use of exceptions, or 64-bit integer math? These are two areas where performance is slower in emscripten (but again, shouldn't be *that* much).
>> No 64-bit math. There is some exception handling, but disabling it does not help much.
 
Virtual function calls aren't nice in an inner loop of course (in general, not just asm.js), but 1 to 6 milliseconds per call seems completely strange.
>> Different variants of dyn* and invoke_* (invoke_ii, invoke_viii, etc., plus all the function tables) are taking time and are the top bottleneck in the Oz build. In the native application they always show up as zero, so even an increase of 0.2 ms adds to the overhead.
 
Is there any output on the Javascript console in the browser devtools? For instance, if you are spamming messages to stdout or stderr, execution will be extremely slow.
>> I am not using the console or the filesystem at all.
 
Is the code public? Would be interesting to have a look what could be causing such bad performance.
>> Unfortunately the code contains legacy proprietary components. The bottlenecks are widespread: no single function takes a large share of the time, but the slowdown is observed in almost all functions (iterators, virtual calls, malloc, Boost, STL, and other ordinary code). The small increases in absolute time per function, compared to native, add up to a slowdown of this magnitude.

Alon Zakai

Mar 2, 2017, 3:09:48 PM
to emscripten-discuss
Those invoke_* calls are going outside of the compiled code into normal JS (then they call back into dyn*). That could explain the slowness you see. The two main reasons for having invokes are C++ exceptions and setjmp/longjmp. You say you disabled exceptions and didn't see much benefit, so perhaps it's setjmp/longjmp? (The overhead there can often be reduced with some code refactoring, as the main slowness is inside the function doing setjmp, so avoiding hard work there and avoiding inlining into it can help a lot).
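
The refactoring described here can be sketched as follows. This is an illustration under assumptions, not code from the thread: `g_recover`, `sum_squares`, and `checked_sum_squares` are hypothetical names, and `__attribute__((noinline))` is the Clang/GCC spelling (Emscripten compiles with Clang). The function that calls `setjmp` only sets the recovery point and makes one call, while the hot loop lives in a separate function that the compiler is told not to inline back into it.

```cpp
#include <csetjmp>
#include <cstddef>

// Hypothetical recovery point shared by the two functions below.
static std::jmp_buf g_recover;

// The hard work lives here. noinline keeps the loop out of the
// function that calls setjmp.
__attribute__((noinline))
long sum_squares(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0) {
            std::longjmp(g_recover, 1);  // error path: unwind to the wrapper
        }
        total += static_cast<long>(data[i]) * data[i];
    }
    return total;
}

// Thin wrapper: the only function that calls setjmp, and it does no
// heavy work of its own. Returns -1 if sum_squares bailed out.
long checked_sum_squares(const int* data, std::size_t n) {
    if (setjmp(g_recover) != 0) {
        return -1;
    }
    return sum_squares(data, n);
}
```

With this layout, only the single call made from `checked_sum_squares` should need to go through an invoke_* wrapper; the loop itself runs as ordinary compiled code. Whether that holds for a given build is best confirmed in the profile.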

john....@gmail.com

Mar 3, 2017, 1:06:49 PM
to emscripten-discuss
 
Removing exceptions helped a little on Firefox (performance increased by 20%).
But Chrome became 5% slower after disabling exceptions.
Does Chrome really support asm.js? Is there a whitepaper on Chrome's asm.js implementation (is it the same as Safari's FTL implementation)?

I noticed a strange pattern on Chrome:
- Chrome takes far too long at startup (when importing the asm.js file; perhaps parsing the file takes too much time, since Chrome does not do AOT compilation).
- Even after the script is imported and the main function has been called, the first major event loop of the asm.js application (which does the core processing) is 50 times slower than native (and 15 times slower than Firefox). Perhaps this means that asm.js optimization has not kicked in yet and Chrome is compiling functions as they are called. So when a function is called for the first time, it has to be compiled and stored, with some extra processing, making it 50 times slower.
- Subsequent event loops are then 2 or 3 times slower than Firefox (or 10 to 25 times slower than native). This suggests asm.js optimization is active at this stage and Chrome is using code from the cache (which is of course still slower than Firefox).

If this behavior is not by design, which mailing list should I use to discuss Chrome-specific issues?
Also, is there a bug logged for the deoptimize-and-recompile issue in Chrome that Floh mentioned?

Alon Zakai

Mar 4, 2017, 2:33:59 PM
to emscripten-discuss
After removing exceptions, are there any invoke* methods in the profile?

Chrome does have AOT for asm.js now, using the wasm infrastructure. I am not sure if it's on by default or not, check chrome://flags to see.

Rong Jie

Mar 5, 2017, 3:33:01 AM
to emscripten-discuss
Enable chrome://flags/#enable-asm-webassembly in Chrome Canary.

john....@gmail.com

Mar 5, 2017, 10:50:09 AM
to emscripten-discuss
Deopt:
Using node --trace_opt --trace_deopt, I found that there are no bailouts or deopts in the asm.js code.
Those functions are getting optimized by TurboFan (rarely by TurboFan OSR), and JS functions outside asm.js are getting optimized by Crankshaft.
BUT they are getting optimized AFTER THE FACT (when they have already run once or more and have already taken too much time):
[optimizing 00000104A55818B9 <JS Function yyy (SharedFunctionInfo 0000028F632A2459)> - took 68.801, 0.000, 0.000 ms]
And then the optimization itself will also take time (on the same thread or another one?). I am not sure how to interpret the line above: did optimizing a single function take 68 ms, or did a single run of the function take 68 ms (using unoptimized code from the generic/complete compiler)?

This would perhaps explain the 50X slowdown in the first event loop.

This also means that loading and parsing a huge script (even without AOT) is a very costly operation in Chrome, as Chrome takes a long time to execute the first line of code (there is no pre-js or significant init code in the asm.js module).

Since compilation is slow, WebAssembly startup will be slow, but at least the browser will take all of that load at startup (hopefully using multiple cores, on desktop at least), and even the first execution of each function will be fast.
So I will try the asm-to-WASM flag and plain WASM to measure the startup and runtime cost and check whether this is true, and then see if those can be improved based on the results.

And there seems to be a setjmp in one library function that is causing invoke* calls even with exceptions off (so removing it should help further in Firefox). I will remove it, but I guess it will be a long way to 2X performance in FF (and 4X in Chrome) relative to native.

Alon Zakai

Mar 5, 2017, 12:12:09 PM
to emscripten-discuss
> And there seems to be a setjmp in one library function that is causing invoke* calls even with exceptions off (so removing it should help further in Firefox). I will remove it, but I guess it will be a long way to 2X performance in FF (and 4X in Chrome) relative to native.

If it's still slower than 2x in firefox after removing the invoke* calls and switching to wasm, then that is surprising and you might be hitting a bug. You said the code isn't public, but perhaps you can narrow it down to a testcase you can submit?

john....@gmail.com

Mar 6, 2017, 12:40:40 PM
to emscripten-discuss
I will remove the setjmp today, then collect wasm numbers (with and without the allow memory growth option, which can now be used in WASM), and will report back with my findings.

john....@gmail.com

Mar 8, 2017, 1:22:57 PM
to emscripten-discuss
WebAssembly (esp. AOT) is magic!! We can change the thread title from "asm js too slow" to "wasm too fast"!!

The "50X slowdown of Chrome in asm.js" compared to native has now turned into "3X slower than native" for Chrome in WebAssembly.
Here is my account of WebAssembly:

Status on Chrome Canary
WASM is 3X slower than native at runtime (with exceptions disabled).
The performance is predictable for WASM (for asm.js there is a huge variance: 10X to 50X slower than native).
With exceptions enabled, it is 4X slower than native (faster than Firefox here). Does this mean "zero cost exceptions" have already been implemented in Chrome for WASM (but not in FF yet)? Interestingly, C++ exceptions did not make much difference in the asm.js version on Chrome either.

Status on FF Nightly
(The main FF release from yesterday lags a little behind the latest FF Nightly in performance.)
WASM is 2X slower than native at runtime (with exceptions disabled).
The performance is predictable for both asm.js and WASM (very little variance).
With exceptions enabled, FF is 6X slower than native (slower than Chrome here).

Allow Memory Growth
This was a big area of concern for asm.js developers, so I tested it with WASM.
Setting this option has zero performance overhead in both browsers (when the application does not actually try to grow the memory)!! In the asm.js world, just compiling with this option disabled many optimizations and hurt performance even if memory growth never occurred (and browsers removed support for it)!! But I did not find any effect with WASM in the latest browsers in my tests.
I did not thoroughly test what happens when memory growth is forced on the application, but it appeared to be very fast. In any case, I would guess that if you allocate too much memory you will hit browser limits, such as the "Aw, Snap" page in Chrome when allocating too much memory too quickly without yielding to the event loop, or the 2 GB browser limit.
It would be interesting to know how this problem was addressed. What are others' experiences with this option in WASM?

This was all desktop testing.
Perhaps I will write up my analysis of mobile performance too when I get to it. Should I expect a slowdown relative to native that follows the ratio of JetStream JS benchmark scores for those devices? Or a larger one, due to the size of the application (the JetStream test uses smaller JS files)?

Thanks

Alon Zakai

Mar 8, 2017, 4:21:04 PM
to emscripten-discuss
Great! Those numbers (2-3x compared to native) are similar to other codebases I've seen.

Regarding performance with exceptions, it's slower because of the calls out of and back into wasm. It's possible Chrome currently optimizes that path better than Firefox, which could explain the 4x vs 6x numbers you saw. (There is also a goal to add exception handling to wasm itself, but it's just at the idea stage so far.)
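
Since the cost described here comes from calls crossing the wasm boundary when exceptions are enabled, one common mitigation is to report expected failures in hot code through return values and keep `throw` for rare cases. A minimal sketch, assuming a hypothetical `accumulate_checked` helper and `Status` enum (neither is from the original program); whether it actually removes invoke_* calls in a given build has to be confirmed in the profile:

```cpp
#include <vector>

// Hypothetical status code used instead of throwing from hot code.
enum class Status { Ok, NegativeInput };

// noexcept plus a status return: callers need no try/catch around this
// call, so the hot loop involves no exception machinery.
// `total` is only written on success.
Status accumulate_checked(const std::vector<int>& values, long& total) noexcept {
    long sum = 0;
    for (int v : values) {
        if (v < 0) {
            return Status::NegativeInput;  // expected failure, no throw
        }
        sum += v;
    }
    total = sum;
    return Status::Ok;
}
```

A caller at the outer edge of the program can still translate `Status::NegativeInput` into an exception if the rest of the codebase expects one, which keeps the try/catch away from the inner loops.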

Yes, memory growth should be at 100% speed in wasm, it's a major benefit. Regarding when an allocation fails, emscripten can either do an abort() or let malloc return NULL (in which case your application may be able to handle it), see the ABORTING_MALLOC option in settings.js.
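
If the build lets `malloc` return NULL instead of aborting (the `ABORTING_MALLOC` option mentioned above), the application has to check for failure itself. A minimal sketch of one way to do that, assuming a hypothetical `allocate_with_fallback` helper that retries with smaller sizes; the names and the halving policy are illustrative and not part of Emscripten:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical result type: the buffer and the size actually obtained.
struct Allocation {
    void* data;
    std::size_t size;
};

// Tries `preferred` bytes first and halves the request on failure until it
// would drop below `minimum`. Returns {nullptr, 0} if nothing could be
// allocated. The caller releases a successful result with std::free.
Allocation allocate_with_fallback(std::size_t preferred, std::size_t minimum) {
    if (minimum == 0) {
        minimum = 1;  // malloc(0) may legally return NULL, so never ask for 0
    }
    for (std::size_t size = preferred; size >= minimum; size /= 2) {
        void* p = std::malloc(size);
        if (p != nullptr) {
            return {p, size};
        }
    }
    return {nullptr, 0};
}
```

An application that can work with a smaller buffer can degrade gracefully this way; one that cannot should still check for the `{nullptr, 0}` result and report the failure rather than dereference a null pointer.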
