memmove / bulk memory operations proposal status


Lilit Darbinyan

Jul 12, 2019, 12:01:45 PM
to emscripten-discuss
I have a benchmark where I insert into the front of a vector in a loop 1k times, which causes the vector to grow continuously. 

This was pretty slow with the Fastcomp-generated Wasm binary (~80ms), and sure enough profiling showed that the hot path was memmove.
I have now switched to the new LLVM upstream backend, and it's much faster (~20ms), but not as fast as I expected, and memmove still shows up as the most time-consuming thing.

I have inspected the generated wasm binary and don't see any of the new bulk memory operations there, so my questions are:

- Does the new LLVM upstream backend support bulk memory operations?
- If not, then why am I seeing this speedup by switching to the LLVM backend? 

The benchmark code can be found here: https://github.com/ldarbi/wasm-scratchpad/tree/master/memmove

Thomas Lively

Jul 12, 2019, 12:12:45 PM
to emscripte...@googlegroups.com
A memory.copy instruction should be emitted if you pass -mbulk-memory while using the LLVM backend. I’d be very interested in how that affects your benchmark.
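For reference, the flag goes on the emcc command line, something like this (an illustrative build invocation; the source file name is a placeholder):

```shell
# Build with the LLVM upstream backend, enabling the bulk-memory feature
# so memcpy/memmove can lower to the wasm memory.copy instruction.
emcc -O3 -mbulk-memory memmove_bench.cpp -o memmove_bench.html
```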
--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/emscripten-discuss/eb5441fc-02d2-4894-8559-6d7ea5ea1e61%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Lilit Darbinyan

Jul 12, 2019, 12:38:26 PM
to emscripten-discuss
Oops, typo - it's twice as slow WITH the flag.

On Friday, July 12, 2019 at 5:37:52 PM UTC+1, Lilit Darbinyan wrote:
Thanks!

Interestingly it's twice as slow without the flag. It's now 40ms instead of the 20ms. To summarize:

Fastcomp                  80ms
LLVM                      20ms
LLVM with -mbulk-memory   40ms


The profiler now shows this as the hot path instead of memmove:

std::__2::vector<double, std::__2::allocator<double> >::insert(std::__2::__wrap_iter<double const*>, unsigned long, double const&)

Alon Zakai

Jul 12, 2019, 6:34:11 PM
to emscripte...@googlegroups.com
Interesting - yes, I also see the bulk memory path as slower, though only by a little after I changed the benchmark to run 10x more iterations in the html and added -O3 to the emcc command (hoping to reduce noise and enable maximum optimizations). I see a 2-5% slowdown.

Talking to tlively, the difference might be that with bulk memory we depend on the browser for doing memmove etc., and perhaps the implementation there is not yet as optimized as what the toolchain was emitting inside the wasm (which was heavily optimized over time). But we're not sure.

The wasm binary is around 2% smaller with bulk memory though, which is nice (20,839 bytes).



Thomas Lively

Jul 16, 2019, 9:31:47 AM
to emscripte...@googlegroups.com
Hmm, yeah, I was worried that might happen. We're going to have to do some science to figure out where it is actually beneficial to use the bulk memory instructions. Alternatively, we can just pester engine implementers to optimize them better.