Data size effect on performance (and are there JS vs. Emscripten vs. native C++ benchmarks)?


Boris Sergeev

Feb 18, 2017, 9:50:52 PM
to emscripten-discuss
Hi to all Emscripters!
;-)
I am looking for some performance tests comparing manually optimized JavaScript code vs. emscripted C++ vs. native C++. 

Some people in my company are dismissing Emscripten as too slow, and I want to identify the areas where it indeed is slow and those where it is as fast as native.

Some time ago, I found somebody's simple C++ code generating Pi to any number of digits. This code doesn't have a lot of data, but does a lot of number crunching on some limited data. To my surprise, emscripted code showed the same performance, within the error of measurement, as the native code (built on different platforms with different compilers).

Then, while working on a multi-platform CAD application, which is built for both desktop platforms and the Web, I observed emscripted code being 3 to 5 times slower than native code when operating on models taking up to 1 GB of memory. I used the Chrome profiler and was pleased to see that copying JS typed arrays to the Emscripten heap was relatively fast: this code took less than a second to copy(?) hundreds of megabytes of Uint8Array:

  // Allocate a matching buffer on the Emscripten heap.
  var length = javascriptArray.length;
  var geometry = Module._malloc(length);
  // Create a view into the Emscripten heap at the allocated offset
  // and copy the JS data into it in one bulk operation.
  var emscrHeap = new Uint8Array(Module.HEAPU8.buffer, geometry, length);
  emscrHeap.set(new Uint8Array(javascriptArray.buffer));
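For context, here is a sketch of what the C++ side consuming this buffer could look like (the function name and the checksum body are made up for illustration; the real code parses geometry out of the buffer):

```cpp
#include <cstddef>
#include <cstdint>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE  // no-op when building natively
#endif

// Hypothetical entry point: receives the pointer returned by Module._malloc
// and the byte length, and walks the data sitting on the Emscripten heap.
extern "C" EMSCRIPTEN_KEEPALIVE
uint64_t process_geometry(const uint8_t* data, std::size_t length) {
    // Placeholder "work": checksum the buffer; real code would decode
    // vertices and indices from it.
    uint64_t sum = 0;
    for (std::size_t i = 0; i < length; ++i) sum += data[i];
    return sum;
}
```

On the JS side one would then call Module._process_geometry(geometry, length), and eventually Module._free(geometry) once the data is no longer needed (the snippet above allocates but never frees).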

But the code performing geometry manipulations on these large models (originally C++, then emscripted) is several times slower than native.
Where does this slowness come from? Is accessing JS typed arrays really so much slower than operating with C pointers and C++ STL containers?

Thank you,
Boris

Alon Zakai

Feb 20, 2017, 6:34:05 PM
to emscripten-discuss
There is the emscripten benchmark suite which you can run with

python tests/runner.py benchmark

It's part of the test suite, so you can run an individual benchmark using, for example,

python tests/runner.py benchmark.test_fasta_float

Example output:

Running Emscripten benchmarks... [ including compilation | Mon Feb 20 15:28:26 2017 | em: commit bf47896da7caa7b92e6e67c47f23779a8c7ac36d | sm: changeset:   343850:d0462b0948e0 | llvm: /home/alon/Dev/fastcomp/build/bin ]
test_fasta_float (test_benchmark.benchmark) ...
        clang: mean: 10.887 (+-0.005) secs  median: 10.884  range: 10.882-10.894  (noise: 0.044%)  (3 runs)
     sm-asmjs: mean: 13.050 (+-0.018) secs  median: 13.039  range: 13.031-13.074  (noise: 0.137%)  (3 runs)   Relative: 1.20 X slower
      sm-simd: mean: 13.864 (+-0.015) secs  median: 13.856  range: 13.844-13.880  (noise: 0.107%)  (3 runs)   Relative: 1.27 X slower
      sm-wasm: mean: 11.675 (+-0.003) secs  median: 11.674  range: 11.672-11.678  (noise: 0.024%)  (3 runs)   Relative: 1.07 X slower
ok

That shows asm.js is around 20% slower than native clang, and wasm just 7% slower, on SpiderMonkey.

See the contents of tests/test_benchmark.py; you can choose which engines to benchmark by editing the top of the file.

Overall, there is some sandboxing overhead in asm.js and wasm, and there are some limitations in our toolchain, but speed is pretty close to native on JS engines that optimize asm.js and wasm. It's about as close to native speed as, say, gcc and clang are to each other (i.e., there is often some 50% difference). But some things, like threading and SIMD, are still in progress: you can experiment with them now, but until all browsers support them, native code can be much faster on the specific types of benchmarks that benefit from them.



Floh

Feb 21, 2017, 4:21:06 AM
to emscripten-discuss
Mostly anecdotal evidence rather than hard numbers, but:

Some of my Oryol samples are doing fairly expensive CPU stuff (e.g. the Bullet physics demos): 


And also this emulator, which is basically all CPU integer bit twiddling:


On the emulator you can press Tab to see the 'emulator frame time' in the upper right corner which gives you a good idea about pure asm.js performance differences between browsers and platforms.

All these demos can also be compiled as native versions to compare performance, and at least in desktop browsers I'm not seeing much difference from the native version (worst case about 1.5x slower).

On mobile CPU performance is a bit more problematic, especially on iOS Safari.

I'm trying to write my code in a more 'embedded systems style': minimizing dynamic memory allocation and pointer chasing, and avoiding 'advanced C++ features' like exceptions.
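For example, rather than a linked structure of individually heap-allocated nodes, I keep data in flat preallocated arrays. A minimal made-up sketch of what I mean:

```cpp
#include <cstddef>
#include <vector>

// Flat, preallocated storage: the asm.js/wasm heap is one linear typed
// array, so contiguous sequential access maps well onto it, while
// pointer-chasing through many small scattered allocations does not.
struct Particles {
    std::vector<float> posX, posY, velX, velY;  // structure-of-arrays
    explicit Particles(std::size_t n)
        : posX(n), posY(n), velX(n), velY(n) {}  // zero-initialized
};

// One tight loop over contiguous data: no allocation, no exceptions.
void step(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.posX.size(); ++i) {
        p.posX[i] += p.velX[i] * dt;
        p.posY[i] += p.velY[i] * dt;
    }
}
```

All sizes are fixed up front, so the hot loop never touches the allocator.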

There may be corner cases where asm.js code performs much worse than native, especially in browsers that don't AOT-compile the code and may re-compile functions at runtime, but I haven't seen really catastrophic performance behaviour so far.

Cheers,
-Floh.