Passing TypedArrays to C/C++ Functions

Adi Shavit

Feb 27, 2017, 6:03:41 AM

to emscripten-discuss

Hi,

I've been trying to find the most efficient way to pass JS TypedArrays to C/C++.

There are several ways to do this described in different places and I wanted to sort out the fastest one.

Many of the ways seem to incur some hidden copying that is difficult to find.

I wrote several approaches to the problem based on various sources and put them all together here (live demo in dev console here).

Here's a sample run (slightly formatted for discussion):

01: C sum: 630 ms
02: C sum, buffer: 174 ms
03: C sum_c: 171 ms

04: sum_str_cpy: 2701 ms
05: sum_str_ref: 2829 ms
06: sum_str_cpy, buffer: 2764 ms
07: sum_str_ref, buffer: 2693 ms

08: C sum heapBytes: 2117 ms
09: C sum heapBytes.buffer: 173 ms
10: C sum_c heapBytes.buffer: 171 ms

The results are quite interesting and not all make complete sense.

First, there is a silent pessimization when passing a TypedArray instead of its .buffer (see 1 vs. 2 above, and also 8 vs. 9).
Using std::string is much more expensive than using char* (see 2-3 vs. 4-7 above).
There is no difference between passing a std::string by copy or by reference, which surprised me (see 4-7 above).
There is no significant difference between using a JS Uint8Array and a heap-allocated array! (see 2-3 vs. 9-10 above)
This was the most surprising to me, as Alon explained that only heap-allocated buffers can be passed without copying.
Either I am incurring a copy in both cases due to faulty usage in my examples (I would appreciate seeing the corrected code), or regular JS typed buffers are passed as-is without being copied (this would be great!).

Results are pretty much consistent across browsers and OSs.

Does this make sense?

Am I doing things wrong?

Thoughts?

Thanks,

Adi

Alon Zakai

Feb 27, 2017, 2:29:34 PM

to emscripten-discuss

That 4-8 are slower is due to embind overhead. Embind is expressive and convenient, but does add a bunch of layers.

Passing an ArrayBuffer to ccall shouldn't work - it expects an array when you tell it the type is 'array'. The array can be a JS array or an 8-bit typed array, see http://kripken.github.io/emscripten-site/docs/api_reference/preamble.js.html#ccall . So unless I am missing something when I read your code, I think in those cases it isn't doing the work you expect it to, so it appears fast. This appears to explain away 2, 3, 9, 10.

8 copies the data into the heap manually, but then ccall also does a copy - it doesn't know that the typed array it is given happens to be a view into the heap. It's odd this is slower than 1 - perhaps the aliasing makes the JIT less efficient? Or perhaps the malloc put the data in a less efficient position that is not in the cache?

What would be faster is to do one copy, then pass the pointer into the asm.js function. Specifically, take ptr from _arrayToHeap, and pass that as the first param to Module._sum (i.e. avoiding embind and ccall) - then you just pass an integer into a function and it then does the work, with no copying. But you do still have the initial copy there.
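
A minimal sketch of the copy-once, pass-a-pointer approach described above. It assumes the build exports _malloc, _free and _sum and exposes HEAPU8 on Module; makeMockModule and sumViaHeap are hypothetical names, and the mock exists only so the snippet runs without an Emscripten build.

```javascript
// Stand-in for an Emscripten Module so the sketch runs on its own.
// In a real build, HEAPU8, _malloc, _free and _sum come from the generated
// code; this mock only mimics their shape.
function makeMockModule(heapSize) {
  const HEAPU8 = new Uint8Array(heapSize);
  let top = 8; // bump allocator; address 0 is left unused as a null pointer
  return {
    HEAPU8,
    _malloc(n) { const p = top; top += n; return p; },
    _free(p) { /* no-op in the mock */ },
    // Plays the role of the C sum(ptr, len): reads bytes straight from the heap.
    _sum(ptr, len) {
      let s = 0;
      for (let i = 0; i < len; i++) s += HEAPU8[ptr + i];
      return s;
    },
  };
}

// One explicit copy into the heap, then only an integer pointer is passed.
function sumViaHeap(Module, bytes) {
  const ptr = Module._malloc(bytes.length);
  try {
    Module.HEAPU8.set(bytes, ptr);           // the single copy
    return Module._sum(ptr, bytes.length);   // no ccall, no embind
  } finally {
    Module._free(ptr);
  }
}

const Module = makeMockModule(1024);
const data = new Uint8Array([1, 2, 3, 4]);
console.log(sumViaHeap(Module, data)); // 10
```

If the same data is summed repeatedly, the malloc and copy can be hoisted out of the loop so that each call passes only the pointer and the length.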

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-discuss+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adi Shavit

Feb 27, 2017, 3:18:31 PM

to emscripte...@googlegroups.com

> That 4-8 are slower is due to embind overhead. Embind is expressive and convenient, but does add a bunch of layers.

OK. This is good to know, but the overhead does not seem fixed. It increases as I increase the buffer sizes, so I suspect there are multiple layers of copying.

> Passing an ArrayBuffer to ccall shouldn't work - it expects an array when you tell it the type is 'array'. The array can be a JS array or an 8-bit typed array, see http://kripken.github.io/emscripten-site/docs/api_reference/preamble.js.html#ccall . So unless I am missing something when I read your code, I think in those cases it isn't doing the work you expect it to, so it appears fast. This appears to explain away 2, 3, 9, 10.

But strangely it does seem to work, and on all the browsers I tried.

The code sums up the input array values, and all these cases give the correct result. They are also the fastest!

> 8 copies the data into the heap manually, but then ccall also does a copy - it doesn't know that the typed array it is given happens to be a view into the heap. It's odd this is slower than 1 - perhaps the aliasing makes the JIT less efficient? Or perhaps the malloc put the data in a less efficient position that is not in the cache?

With multiple calls and a different ordering, 1 and 8 become similar, as you expected.

I intentionally made the copy once outside the measurement, to NOT take it into account.

> What would be faster is to do one copy, then pass the pointer into the asm.js function. Specifically, take ptr from _arrayToHeap, and pass that as the first param to Module._sum (i.e. avoiding embind and ccall) - then you just pass an integer into a function and it then does the work, with no copying. But you do still have the initial copy there.

OK, so I added:

Module._sum(ptr, uint8Array.length);

It works, and takes just as much time as 2, 3, 9, and 10.

All 5 versions give the correct answer.

Here's a typical run (last column is array element sum, should be 1000001):

01: C sum: 812 ms 1000001
02: C sum, buffer: 62 ms 1000001
03: C sum_c: 62 ms 1000001
04: sum_str_cpy: 978 ms 1000001
05: sum_str_ref: 970 ms 1000001
06: sum_str_cpy, buffer: 977 ms 1000001
07: sum_str_ref, buffer: 982 ms 1000001
08: C sum heapBytes: 760 ms 1000001
09: C sum heapBytes.buffer: 61 ms 1000001
10: C sum_c heapBytes.buffer: 61 ms 1000001
11: direct asm.js call: 68 ms 1000001

Something's very fishy here.

Adi

Alon Zakai

Feb 27, 2017, 3:26:22 PM

to emscripten-discuss

That 2,3,9,10 happen to emit the same output may be due to luck - what I think happens is it ignores the ArrayBuffer entirely, but since we copied in the data from a *previous* run, the right data happens to still be at that location on the stack. So if you corrupt that memory in between, or use different data in each iteration, you'll see it's wrong.
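
The effect can be reproduced with a toy model. This is not Emscripten's actual ccall code: heap, SCRATCH and the toy* functions are hypothetical stand-ins, and the only assumption carried over is that an argument without a length property results in nothing being copied.

```javascript
// Toy model only: a fixed scratch region plays the role of ccall's stack copy.
const heap = new Uint8Array(64);
const SCRATCH = 16; // hypothetical fixed address reused by every call

// Copies array-like input into the scratch region. An ArrayBuffer has
// byteLength but no length, so nothing is copied for it.
function toyArrayToScratch(arg) {
  const n = arg.length === undefined ? 0 : arg.length;
  for (let i = 0; i < n; i++) heap[SCRATCH + i] = arg[i];
  return SCRATCH;
}

// Sums len bytes of the heap starting at ptr.
function toySum(ptr, len) {
  let s = 0;
  for (let i = 0; i < len; i++) s += heap[ptr + i];
  return s;
}

function toyCall(arg, len) {
  return toySum(toyArrayToScratch(arg), len);
}

const first = new Uint8Array([1, 2, 3, 4]);
const second = new Uint8Array([10, 20, 30, 40]);

console.log(toyCall(first, 4));         // 10: data was copied
console.log(toyCall(first.buffer, 4));  // 10: nothing copied, stale data matches by luck
console.log(toyCall(second.buffer, 4)); // 10: wrong, the real sum of second is 100
console.log(toyCall(second, 4));        // 100: correct once a typed array is passed
```

Changing the input between iterations, as in the third call, is what exposes the problem: the benchmark kept summing whatever an earlier valid call had left behind.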

Adi Shavit

Feb 27, 2017, 3:53:07 PM

to emscripten-discuss

> That 2,3,9,10 happen to emit the same output may be due to luck - what I think happens is it ignores the ArrayBuffer entirely, but since we copied in the data from a *previous* run, the right data happens to still be at that location on the stack. So if you corrupt that memory in between, or use different data in each iteration, you'll see it's wrong.

You're right. Indeed, the asm.js direct method is the fastest.

Don't you think some runtime type checks would be helpful for all the invalid (and inefficient) cases?

Alon Zakai

Feb 27, 2017, 5:55:41 PM

to emscripten-discuss

Yeah, it would make sense that ccall would check that array inputs have length >= 0, for example, when the ASSERTIONS option is on (that would have caught this error). The code is in src/preamble.js if someone wants to add that.
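
A rough sketch of what such a check might look like. checkArrayArg is a hypothetical helper, not the code in src/preamble.js, and it tests only for a non-negative numeric length, as suggested above.

```javascript
// Hypothetical assertion helper: rejects arguments that lack a usable length.
function checkArrayArg(arg) {
  const ok = arg != null && typeof arg.length === "number" && arg.length >= 0;
  if (!ok) {
    const kind = Object.prototype.toString.call(arg);
    throw new TypeError("array argument needs a length >= 0, got " + kind);
  }
  return arg;
}

checkArrayArg([1, 2, 3]);         // passes: plain JS array
checkArrayArg(new Uint8Array(4)); // passes: typed array

try {
  // An ArrayBuffer exposes byteLength, not length, so this is rejected.
  checkArrayArg(new Uint8Array(4).buffer);
} catch (e) {
  console.log(e.message);
}
```

A check this loose still accepts strings and wider typed arrays; a stricter version could also test for Uint8Array or Int8Array specifically.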

Adi Shavit

Feb 28, 2017, 2:05:38 AM

to emscripte...@googlegroups.com

Thanks. I'll take a look.
