Marshalling Data From C++ to JS Efficiently

Matthew Tamayo

Aug 15, 2015, 2:22:47 AM
to emscripten-discuss
I've been looking around Emscripten to try to figure out how to marshal large binary objects from C++ to JavaScript.

We want to:
  1. Pass binaryArray = (unsigned char *)&bigObject to JS as a Uint8Array
  2. Use XMLHttpRequest2 to send the Uint8Array to the server
  3. Read the binary data on the server in C++ as (BigObject *)binaryArray
The problem we're having is figuring out the most efficient way to get a Uint8Array out of Emscripten using embind. I've seen some threads that recommend emscripten::memory_view(output_size, output_ptr), but I don't know whether it requires passing in a JS function via emscripten::val or whether returning a memory view will be marshalled correctly.
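For step 1, the bytes a memory_view would wrap are simply the object's address and its sizeof. A minimal plain C++ sketch, assuming the object is trivially copyable; `BigObject`, `ByteRange` and `object_bytes` are hypothetical names, and no embind calls are used:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical POD standing in for the real BigObject.
struct BigObject {
    std::uint32_t header;
    std::uint8_t payload[60];
};

// Hypothetical non-owning (pointer, length) pair: the same information a
// memory_view<unsigned char> would carry across to JavaScript.
struct ByteRange {
    const unsigned char* data;
    std::size_t size;
};

// View any trivially copyable object as raw bytes without copying it.
template <typename T>
ByteRange object_bytes(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte views only make sense for trivially copyable types");
    return ByteRange{reinterpret_cast<const unsigned char*>(&obj), sizeof(T)};
}
```

In an Emscripten build, the resulting size and pointer are the two arguments you would hand to memory_view<unsigned char>(size, data).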

Does this seem like a sane approach? Are there any easier ways in the most recent version of Emscripten?

-mtr

Brion Vibber

Aug 15, 2015, 5:53:17 AM
to emscripten Mailing List
If you declare your C++ function to return an emscripten::val, a memory view should correctly marshal out and return a Uint8Array (or whichever type you selected).

Here's an example in a getter on a value object, but it should work fine on regular object methods too:


#include <emscripten/bind.h>
using namespace emscripten;

...

struct H264Plane {
        unsigned char *data;
        int stride;
        int height;
...
        val data_getter() const;
        void data_setter(val v);
};

...

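val H264Plane::data_getter() const
{
        // Wrap the plane's bytes in a typed-array view over the Emscripten heap.
        // No copy is made, so the view is only valid while `data` is.
        return val(memory_view<unsigned char>(stride * height, (unsigned char *)data));
}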

...

EMSCRIPTEN_BINDINGS(h264_decoder) {

        value_object<H264Plane>("H264Plane")
                .field("data", &H264Plane::data_getter, &H264Plane::data_setter)
...
                ;
...
}

-- brion


--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

mat...@kryptnostic.com

Aug 18, 2015, 10:17:40 PM
to emscripte...@googlegroups.com
Thanks Brion! This is super useful.

-mtr

Peng Hui How

Aug 21, 2015, 7:17:42 PM
to emscripten-discuss, ro...@kryptnostic.com, mat...@kryptnostic.com, ry...@kryptnostic.com, dr...@kryptnostic.com, pe...@kryptnostic.com
Hi Brion,

We are trying to retrieve a C++ data type that was previously converted into an emscripten val.
That is, suppose we had an instance of class X in C++ and converted it into an emscripten val using the method you described previously: how do we retrieve it as an instance of class X in C++?

It would be super helpful if you could look into this.

Best,
Peng Hui

Peng Hui How

Aug 21, 2015, 7:21:58 PM
to emscripten-discuss, ro...@kryptnostic.com, mat...@kryptnostic.com, ry...@kryptnostic.com, dr...@kryptnostic.com, pe...@kryptnostic.com
Hi Brion,

Please ignore the previous message as it was full of typos.

The question is: 
Suppose we have an instance of class X in C++ and convert it into an emscripten val using the method you described previously. How do we retrieve it as an instance of class X in C++?

It would be super helpful if you could look into this. Thanks a lot!

Best,
Peng Hui

Brion Vibber

Aug 21, 2015, 8:44:48 PM
to emscripten Mailing List
Bindings for a class instance should typically use a class or value object binding, not a raw memory view -- that mainly makes sense for binary buffers.

If you really need to pass raw object instance variable contents as typed array views into JavaScript and then send them back... well it probably works to receive the array in your binding as a std::string (there's a standard incoming mapping from Uint8Array to std::string), then get the string's data pointer and reinterpret_cast<> it.
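A minimal sketch of the receiving side, assuming the target type is trivially copyable. `rehydrate` and `Point3` are hypothetical names; the sketch also swaps the reinterpret_cast for a size check plus std::memcpy, which avoids depending on the string buffer's alignment:

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper: copy the bytes delivered in a std::string back into
// a trivially copyable C++ type, after checking the length matches exactly.
template <typename T>
T rehydrate(const std::string& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be rebuilt from raw bytes");
    if (bytes.size() != sizeof(T)) {
        throw std::invalid_argument("byte count does not match sizeof(T)");
    }
    T out;
    // data() gives the full buffer, including any zero bytes.
    std::memcpy(&out, bytes.data(), sizeof(T));
    return out;
}

// Hypothetical POD used for the round trip in the usage example.
struct Point3 {
    float x, y, z;
};
```

In a binding you would declare the bound function to take a std::string and call rehydrate<YourType>() on it.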

-- brion


To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "emscripten-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/emscripten-discuss/l8GkOxZ79Ks/unsubscribe.
To unsubscribe from this group and all its topics, send an email to emscripten-disc...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.

Matthew Tamayo

Aug 22, 2015, 6:16:06 PM
to emscripte...@googlegroups.com
Thanks, Brion! 

To add a little more context: we have POD types that are contiguous blocks of memory representing multivariate polynomial functions. On the server side we are marshalling the raw byte buffers through Java into JNI and back into the native C++ data type.

We're thinking the general flow should be:
  1. Grab binary buffer view of POD data type through a memory view or some kind of object binding.
  2. Post it as an arraybuffer type using XMLHttpRequest2 to a Java web service
  3. Rehydrate on the server by taking byte[] -> jbyteArray (JNI) -> jbyte * -> reinterpret_cast -> POD *
For security reasons we validate the size of the byte arrays server side. It's a self-describing data structure and is generated by filling it with random data in the first place, so as long as the size is correct we are safe.
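One way such a size check could look, assuming a hypothetical layout (a fixed header recording a record count and record size, followed by that many records). `PolyHeader` and `validate_buffer` are illustrative names, not the real format:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: a fixed header followed by num_terms records of term_size bytes.
struct PolyHeader {
    std::uint32_t num_terms;
    std::uint32_t term_size;
};

// Returns true only if the buffer is exactly as long as its header claims.
bool validate_buffer(const unsigned char* buf, std::size_t len) {
    if (len < sizeof(PolyHeader)) {
        return false;  // too short to even hold the header
    }
    PolyHeader h;
    std::memcpy(&h, buf, sizeof h);  // copy out rather than cast, to avoid alignment issues
    // 64-bit arithmetic so a hostile header cannot overflow the product.
    std::uint64_t expected = sizeof(PolyHeader) +
        static_cast<std::uint64_t>(h.num_terms) * h.term_size;
    return expected == len;
}
```

Only after validate_buffer() returns true would the server go on to cast or copy the bytes into the POD type.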

These objects can get pretty big (100 KB to 7 MB), so zero-cost serialization of the binary form would be ideal.

Given this additional context, is memory_view the correct approach?

-mtr

Brion Vibber

Aug 22, 2015, 7:41:53 PM
to emscripten Mailing List
Ok yeah, that actually makes sense. :)


Two warnings about using memory views here:

1) As with sending a raw pointer, lifetime management is left up to you. Make sure the underlying object doesn't change or get deallocated before you've used the array contents, or your data may be corrupted.

2) Beware of the difference between *typed arrays* and *array buffers* -- a typed array is always a view into an ArrayBuffer object, and when returning data as a memory_view that means you've got a Uint8Array (or other typed array subtype) that is associated with an ArrayBuffer that is the entire, live emscripten heap.
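A plain C++ analogy for both warnings, with a std::vector standing in for the Emscripten heap and `ByteView`/`snapshot` as hypothetical names: a view sees later writes to the underlying memory, while a copy taken earlier does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory_view: a raw pointer into storage someone else owns.
struct ByteView {
    const unsigned char* data;
    std::size_t size;
};

// Copy the viewed bytes into independently owned storage.
std::vector<unsigned char> snapshot(ByteView v) {
    return std::vector<unsigned char>(v.data, v.data + v.size);
}
```

The trade-off is the usual one: the view is free but tied to the owner's lifetime and later writes, while the snapshot costs one copy and is then independent.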

So if the API you're passing data to accepts a Uint8Array directly, memory_view is perfect. This is good for, say, uploading WebGL textures. However if your target API actually takes an ArrayBuffer as a single data chunk, then you may need to manually slice out a copy of that portion of the buffer before you can pass it in...

Documentation on MDN seems to indicate that XHR.send() should accept a typed array view directly <https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest#send%28%29> but the compatibility table is a bit spotty. Double-check that it works in your target browsers. :)


If it does turn out that you need to copy into a separate ArrayBuffer, you can do that on the JS side with something like this:

  var arr = myobj.myfunc();
  var buf = arr.buffer.slice(arr.byteOffset, arr.byteOffset + arr.byteLength);
  xhr.send(buf);
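In C++ terms, the JS slice above copies the window [byteOffset, byteOffset + byteLength) out of the much larger heap buffer. A sketch with a hypothetical `slice_copy` helper; note that it throws on an out-of-range window, whereas ArrayBuffer.prototype.slice silently clamps:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mirrors arr.buffer.slice(arr.byteOffset, arr.byteOffset + arr.byteLength):
// copy only the viewed window out of the backing buffer.
std::vector<unsigned char> slice_copy(const std::vector<unsigned char>& heap,
                                      std::size_t byte_offset,
                                      std::size_t byte_length) {
    // Written this way so the bounds check itself cannot overflow.
    if (byte_offset > heap.size() || byte_length > heap.size() - byte_offset) {
        throw std::out_of_range("window extends past the end of the buffer");
    }
    auto first = heap.begin() + static_cast<std::ptrdiff_t>(byte_offset);
    return std::vector<unsigned char>(first, first + static_cast<std::ptrdiff_t>(byte_length));
}
```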

Should be able to do the equivalent via emscripten::val directly in your binding function as well, though it's probably a bit verbose...

-- brion

Matthew Tamayo

Aug 24, 2015, 2:11:41 PM
to emscripte...@googlegroups.com
Sorry Brion, one more question! 

What's the easiest way to go in the reverse direction? That is, if we receive a Uint8Array from the server, what's the easiest way to pass it into a C++ call? Ideally the conversion back is just a reinterpret_cast of an unsigned char * to the correct C++ type.

-mtr

Brion Vibber

Aug 24, 2015, 2:41:27 PM
to emscripten Mailing List
When using embind, the simplest way to accept an incoming Uint8Array is to declare a binding function that takes a std::string -- the standard type conversions in the bindings will take care of importing the buffer into the emscripten heap space and you'll get a std::string instance that you can get a pointer out of, then do whatever casts are needed to get your C++ object.

I think this will allocate the std::string on the stack... not sure offhand if the actual buffer is inline or separately heap-allocated; should only make a difference if you need to keep the rehydrated object around after your function exits.


If you do need to retain the object in the emscripten heap after your function exits and want to minimize data copies, you may wish to control the binding more tightly.

I've tended to do this more with manual JS bindings wrapping C functions:
* allocate a heap buffer by calling Module._malloc() or other appropriate allocator
* copy the data in; you can do this manually by calling Module.HEAPU8.set() or through a nice wrapper Module.writeArrayToMemory(): https://kripken.github.io/emscripten-site/docs/api_reference/preamble.js.html#writeArrayToMemory
* pass the pointer in to the C++ function and let C++ manage the pointer lifetime & deallocation

Should be able to do roughly the same purely within embind by using emscripten::val calls.
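The C++ end of that manual approach might look like the sketch below. `adopt_buffer`, `release_buffer` and the globals are hypothetical; the JS side would call Module._malloc(), fill the block with Module.HEAPU8.set(), then call the exported function (which would need to be exported, e.g. via EXPORTED_FUNCTIONS). Since Module._malloc is the module's own malloc, std::free is the matching deallocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical owner for the retained buffer; a real program would keep it
// wherever the rehydrated object needs to live.
static unsigned char* g_retained = nullptr;
static std::size_t g_retained_len = 0;

// Hypothetical C-linkage entry point. JS passes a pointer obtained from
// Module._malloc(); from here on C++ owns the memory.
extern "C" void adopt_buffer(unsigned char* ptr, std::size_t len) {
    std::free(g_retained);  // release any previously adopted buffer
    g_retained = ptr;
    g_retained_len = len;
}

// Hypothetical cleanup entry point, freeing with the matching deallocator.
extern "C" void release_buffer() {
    std::free(g_retained);
    g_retained = nullptr;
    g_retained_len = 0;
}
```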

-- brion

Matthew Tamayo

Aug 24, 2015, 2:53:09 PM
to emscripte...@googlegroups.com
Are there any gotchas with std::string expecting its contents to be null-terminated?

-mtr

Brion Vibber

Aug 24, 2015, 3:27:28 PM
to emscripten Mailing List

Just don't try to get a C string out of it. :) Use the data() method not c_str() to get a pointer.
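A quick standalone check of why: the std::string keeps an explicit length, so size() and data() cover every byte, while anything that reads c_str() as a NUL-terminated C string stops at the first zero byte. `bytes_with_embedded_nul` is a hypothetical helper standing in for the buffer embind delivers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Build a std::string from raw bytes that contain an embedded zero,
// as binary data arriving from a Uint8Array easily might.
std::string bytes_with_embedded_nul() {
    const unsigned char raw[] = {0x01, 0x00, 0x02, 0x03};
    return std::string(reinterpret_cast<const char*>(raw), sizeof raw);
}
```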

-- brion