Recommended practice for transferring large C++ arrays from a Web Worker to JavaScript in an HTML page


Sohail Siadat

Jun 17, 2016, 6:26:34 AM
to emscripten-discuss
Efficient transfer of large arrays with an Emscripten C++ Web Worker: which JavaScript design is better?

I am having trouble deciding between the following three designs. My code already works well on the web (HTML and JavaScript) using a core algorithm implemented in C++, but I want to turn it into a Web Worker: the computation can be time-consuming, and I don't want it to block the 3D designer's functions and UI.


I have an Emscripten C++ algorithm. Which design is more efficient for transferring large data to a JavaScript program? Since a Web Worker clones and serialises data to pass it through the message system, there is some overhead here. Some code is also needed to translate the resulting data on the C++ side from HEAP32 into JavaScript arrays (C -> JS).

By efficient, I mean which design is faster, i.e. which design triggers fewer allocations and garbage collections (constructing and destroying JS objects). My Web Worker uses a core function written in C++ which returns large arrays: two arrays, float[V][3] and int[N][3], with N = V = 10000. They will be used to update a ThreeJS Geometry, and the function will be called tens of thousands of times over a long period on a web page. Apart from being slow, this may also cause the browser to slow down, freeze or crash.

Solution 1:
Write the Web Worker in JavaScript and have it import the JS code compiled by Emscripten. Cons: this option seems impossible, as the worker side needs to import the compiled JS file. Data exchange: C++ -> JS -> message (serialise) -> JS. Design: (C++)JS <-WW-> JS. Files: core_mc_algorithm.cpp, worker.js, main.js.

Solution 2:
Use a C++ Web Worker compiled with -s BUILD_AS_WORKER=1, write additional C++ code on the main side that receives the data, and convert the results from the heap to JS on the main side (the Web Worker data transfer is handled by Emscripten). Pros: efficient transfer, but it requires two conversions. Risk: on the C++ side, it requires multiple copies (from vector to array, etc.). Data exchange: C++ -> message (serialise) -> C++ -> JS. Design: (C++) <-WW-> C++(JS). Files: worker.cpp, main.cpp, main.js.

Solution 3:
Again a C++ Web Worker, but the worker's functions are called directly by the main program, which is written in JavaScript. Pros: the conversions/exchanges are not done twice. There is no separate exchange between C++ and JS; this conversion happens together with the Web Worker serialisation. Risks: the decoding may be difficult and messy (the protocol would need to be reimplemented, which itself requires multiple conversions and may not be very efficient). Also, the exchange may not actually be efficient and may not actually improve performance. Data exchange: C++ -> message (serialise) -> JS. Design: (C++) <-WW-> JS. Files: worker.cpp, main.js.

I have a function like this in C++ that I want to run in a Web Worker (this is not the exact prototype, just an example):

  void produce_object(REAL* verts_output, int number_of_vertices,
                      int* faces_output, int number_of_triangles)
  {
      // Run Marching Cubes, which produces a vector<int> and a vector<float>.
      // Fill in verts_output[] with coordinates (size: 3*number_of_vertices) and
      // faces_output[] with triangle vertex indices (size: 3*number_of_triangles),
      // using some numerical code which includes the Marching Cubes algorithm.
  }

I need the following JavaScript callback function, defined in an HTML file, to be called with the right parameters:

  function update_mesh_geometry_callback(verts, faces) {
      /* verts is a Float32Array of size 3*N and faces is an Int32Array of size 3*V.
         They are used to create the following object, which is added to the scene. */
      var geo = new THREE.Geometry(verts, faces); // in practice, a custom subclass
      scene.add(new THREE.Mesh(geo, mat /*, etc. */));
  }
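Under Solution 3, the main-thread side could be as small as the following sketch, which checks the incoming types and forwards the arrays to update_mesh_geometry_callback. makeMeshMessageHandler, the worker file name, and the {verts, faces} message shape are assumptions for illustration, not part of any library.

```javascript
// Hypothetical main-thread glue: builds an onmessage handler that expects
// { verts: Float32Array, faces: Int32Array } from the worker.
function makeMeshMessageHandler(callback) {
  return function onWorkerMessage(event) {
    const msg = event.data;
    if (!(msg.verts instanceof Float32Array) || !(msg.faces instanceof Int32Array)) {
      throw new TypeError("expected { verts: Float32Array, faces: Int32Array }");
    }
    // Both arrays hold xyz triples / triangle index triples.
    if (msg.verts.length % 3 !== 0 || msg.faces.length % 3 !== 0) {
      throw new RangeError("array lengths must be multiples of 3");
    }
    callback(msg.verts, msg.faces);
  };
}

// Usage in the page (hypothetical file name):
// const worker = new Worker("mesh_worker.js");
// worker.onmessage = makeMeshMessageHandler(update_mesh_geometry_callback);
```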

Typical sizes are at least number_of_vertices = N = 90000 and number_of_triangles = V = 8000.
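For scale, a rough per-update payload using the sizes given in this post (4-byte floats and ints, three components per vertex or triangle). meshPayloadBytes is a hypothetical helper; the point is that each message is well under a megabyte or two, so what matters is how often it is sent and whether extra copies are made.

```javascript
// Bytes per update for Float32 vertex coordinates plus Int32 triangle indices.
function meshPayloadBytes(numVertices, numTriangles) {
  const BYTES_PER_COMPONENT = 4; // Float32Array and Int32Array elements are 4 bytes
  const COMPONENTS = 3;          // xyz per vertex, 3 indices per triangle
  return COMPONENTS * BYTES_PER_COMPONENT * (numVertices + numTriangles);
}

meshPayloadBytes(10000, 10000); // 240000 bytes (about 0.23 MiB)
meshPayloadBytes(90000, 8000);  // 1176000 bytes (about 1.1 MiB)
```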


Thanks,
Sohail

Brion Vibber

Jun 17, 2016, 11:53:42 AM
to emscripten Mailing List

I have a comparable setup in the ogv.js media player:

On the main thread I have a JS front-end and an emscripten C module for the demuxer, which extracts packets of compressed data to be sent to Workers with additional emscripten C modules which decode the data and send back uncompressed video or audio to be handled by JS in the main thread (WebGL and Web Audio used directly).

I think this maps to scenario 3 in your mail.

It should be fairly easy to send a Float32Array and an Int32Array through worker messages; since they're backed by a buffer, it's mostly the cost of copying that backing buffer. (You can also send an ArrayBuffer directly and wrap it into a typed array on the other end. Shouldn't be much difference in performance.)
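A sketch of the ArrayBuffer variant: the sender posts bare buffers and the receiver re-wraps them, which costs no extra copy on the receiving side. toBufferMessage/fromBufferMessage and the field names are hypothetical, and the sketch assumes each typed array owns its whole buffer (a fresh copy, not a view into the Emscripten heap).

```javascript
// Sender: put the backing buffers themselves into the message.
// Assumes verts/faces each own their entire buffer (byteOffset 0, full length).
function toBufferMessage(verts, faces) {
  return { vertsBuffer: verts.buffer, facesBuffer: faces.buffer };
}

// Receiver: wrapping an ArrayBuffer in a typed array does not copy the bytes.
function fromBufferMessage(msg) {
  return {
    verts: new Float32Array(msg.vertsBuffer),
    faces: new Int32Array(msg.facesBuffer),
  };
}

// Sender:   worker.postMessage(toBufferMessage(verts, faces));
// Receiver: const { verts, faces } = fromBufferMessage(event.data);
```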

Main thing to watch out for: make sure you are not accidentally sending the entire emscripten heap buffer! If you extracted live heap views, then the copy may be very slow as it tries to copy the entire 16M or larger backing buffer. If you have extracted fresh buffers containing only the bytes needed, then they should copy cleanly.

If you have live buffers from the interface you're using, you can copy them using the copy constructor of the appropriate typed array:

  var newArr = new Float32Array(extractedArr);

and then send that copy instead of the original that referenced all of the heap.
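To make the heap caveat concrete, a small simulation: a plain 16 MiB ArrayBuffer stands in for the Emscripten heap (in a real build this would be the buffer behind Module.HEAPF32), and vertsPtr/vertsCount are hypothetical values that a C++ function might return. A subarray() view shares the heap's buffer, while the copy constructor allocates a buffer of exactly the needed size.

```javascript
// Stand-in for the Emscripten heap (assumption: 16 MiB, as in a default 2016-era build).
const heapBuffer = new ArrayBuffer(16 * 1024 * 1024);
const HEAPF32 = new Float32Array(heapBuffer);

// Hypothetical output location: byte pointer (4-byte aligned) and element count.
const vertsPtr = 1024;
const vertsCount = 3 * 4; // 4 vertices, xyz each

// Live view: no copy now, but liveView.buffer is the whole heap, so posting it
// would clone all 16 MiB.
const liveView = HEAPF32.subarray(vertsPtr >> 2, (vertsPtr >> 2) + vertsCount);

// Fresh copy: a new 48-byte buffer holding only the needed floats.
const freshCopy = new Float32Array(liveView);

console.log(liveView.buffer.byteLength);  // 16777216
console.log(freshCopy.buffer.byteLength); // 48
```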

You can also optimize the postMessage data transfer by using the 'transferList' parameter, putting the buffer property of each typed array in the list. This will avoid an extra copy of the smaller backing buffer, as long as you no longer need the array in the worker.

https://developer.mozilla.org/en-US/docs/Web/API/Worker/postMessage
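A sketch of the transfer-list variant on the worker side. postMeshResult is a hypothetical helper; the second argument of postMessage is the standard transfer list. The guard refuses any view that does not cover its whole buffer, because transferring such a view would try to hand over the entire buffer behind it, for example the Emscripten heap.

```javascript
// Post the result arrays and transfer (move) their buffers instead of copying.
// After a real postMessage, verts/faces are detached (length 0) in the worker.
function postMeshResult(target, verts, faces) {
  for (const arr of [verts, faces]) {
    if (arr.byteOffset !== 0 || arr.byteLength !== arr.buffer.byteLength) {
      throw new Error("copy the array first; it does not own its whole buffer");
    }
  }
  target.postMessage({ verts: verts, faces: faces }, [verts.buffer, faces.buffer]);
}

// In the worker: postMeshResult(self, vertsCopy, facesCopy);
```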

-- brion

> --
> You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Sohail Siadat

Nov 4, 2016, 12:51:46 PM
to emscripte...@googlegroups.com
Thank you. I started to implement based on your suggestion.

I compiled my C++ file using -s BUILD_AS_WORKER=1. On the JS side, I select the C++ function to call through a dictionary:

  var worker = new Worker('compiled.js');
  worker.postMessage({
      funcName: "c_func",
      callbackId: -1, // id used when a result is posted back
      data: 0
  });

However, I could not find out how to send multiple arguments to the C/C++ function. At the moment, only a function taking a single data buffer works:

  void c_func(char* data, int size);

If a single argument is mandatory, I need to encode my list of input arguments into a Uint8Array byte array. If so, what is the recommended encoding? My arguments include a string, a JSON object, a Uint32Array and a Float32Array.
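One possible encoding, as a sketch (not a format defined by Emscripten): a 12-byte little-endian header with three counts, then the JSON text (which can also carry plain strings) in UTF-8, padded to a multiple of 4 bytes, then the raw Float32Array and Uint32Array bytes. packArgs/unpackArgs are hypothetical helpers, and the C side would need a matching parser. The array bytes are copied in platform byte order, which is little-endian in mainstream browsers and in WebAssembly.

```javascript
// Layout: [u32 jsonLen][u32 floatCount][u32 uintCount][json, padded to 4][f32 data][u32 data]
const HEADER_BYTES = 12;
const pad4 = (n) => (n + 3) & ~3;

function packArgs(jsonValue, floats, uints) {
  const jsonBytes = new TextEncoder().encode(JSON.stringify(jsonValue));
  const out = new Uint8Array(HEADER_BYTES + pad4(jsonBytes.length) + floats.byteLength + uints.byteLength);
  const dv = new DataView(out.buffer);
  dv.setUint32(0, jsonBytes.length, true);
  dv.setUint32(4, floats.length, true);
  dv.setUint32(8, uints.length, true);
  out.set(jsonBytes, HEADER_BYTES);
  let offset = HEADER_BYTES + pad4(jsonBytes.length); // 4-byte aligned
  out.set(new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength), offset);
  offset += floats.byteLength;
  out.set(new Uint8Array(uints.buffer, uints.byteOffset, uints.byteLength), offset);
  return out;
}

function unpackArgs(bytes) {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const jsonLen = dv.getUint32(0, true);
  const floatCount = dv.getUint32(4, true);
  const uintCount = dv.getUint32(8, true);
  const json = JSON.parse(new TextDecoder().decode(bytes.subarray(HEADER_BYTES, HEADER_BYTES + jsonLen)));
  let offset = HEADER_BYTES + pad4(jsonLen);
  // slice() copies, so the results do not depend on the alignment of `bytes`.
  const floats = new Float32Array(bytes.slice(offset, offset + floatCount * 4).buffer);
  offset += floatCount * 4;
  const uints = new Uint32Array(bytes.slice(offset, offset + uintCount * 4).buffer);
  return { json, floats, uints };
}
```

Usage, assuming the BUILD_AS_WORKER glue copies `data` into the heap and calls c_func(pointer, length): `worker.postMessage({ funcName: "c_func", callbackId: 0, data: packArgs({ name: "part1", options: { resolution: 64 } }, floats, uints) });`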

I could not find the answer by looking at https://github.com/brion/ogv.js


--
sohale


Brion Vibber

Nov 4, 2016, 3:50:28 PM
to emscripten Mailing List
You should be able to pass the arguments as an array to support multiple arguments.

For calling a function with an arbitrary-length array of parameters on the worker end, try the Function.apply method:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/apply

  var func = mapOfFunctions[funcName];
  func.apply(Module, args); // Module is the 'this' param
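A slightly fuller version of the snippet above, as a sketch: the message shape {funcName, args} and the dispatchCall helper are hypothetical; only Function.prototype.apply is standard.

```javascript
// Worker-side dispatch: look up the function by name and spread the args array.
function dispatchCall(mapOfFunctions, thisArg, msg) {
  const func = mapOfFunctions[msg.funcName];
  if (typeof func !== "function") {
    throw new Error("unknown function: " + msg.funcName);
  }
  // apply() passes msg.args as positional parameters, with thisArg as `this`.
  return func.apply(thisArg, msg.args);
}

// Main thread: worker.postMessage({ funcName: "c_func", args: [ptrOrNumber, 42] });
// Worker:      self.onmessage = (e) => dispatchCall(mapOfFunctions, Module, e.data);
```

Exported C functions only take numbers (including pointers), so a typed-array argument still has to be copied into the Emscripten heap first; Module.ccall can do that conversion for 'string' and 'array' (byte array) argument types.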

-- brion




Sohail Siadat

Dec 1, 2016, 12:39:22 PM
to emscripten-discuss
My previous question was not clear. I have clarified it and posted it here:
http://stackoverflow.com/questions/40916582/how-to-interact-with-an-emscripten-web-worker-directly-from-a-javascript-front

I am looking for a way to communicate efficiently with an Emscripten-compiled Web Worker from plain JavaScript.

Jukka Jylänki

Dec 3, 2016, 8:31:45 PM
to emscripte...@googlegroups.com
With the upcoming SharedArrayBuffer specification, https://tc39.github.io/ecmascript_sharedmem/shmem.html, there will be a very efficient, no-copy way of communicating data between workers, by allowing multiple workers to access the same typed array simultaneously. The current Emscripten pthreads support ("-s USE_PTHREADS=1" build mode) uses this spec. However, until it lands in browsers, the best one can hope for is to postMessage() the data over.
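A minimal sketch of the no-copy layout described above, assuming one SharedArrayBuffer holds both arrays back to back. makeSharedMesh is a hypothetical helper. Posting `sab` to a worker shares the memory instead of copying it; current browsers only expose SharedArrayBuffer on cross-origin-isolated pages, and some synchronization (Atomics, or a small "done" message) is still needed before the main thread reads the results.

```javascript
// One shared buffer: 3*N float32 vertex coordinates followed by 3*V int32 indices.
function makeSharedMesh(numVertices, numTriangles) {
  const vertsBytes = 3 * 4 * numVertices; // multiple of 4, so the Int32Array offset is aligned
  const sab = new SharedArrayBuffer(vertsBytes + 3 * 4 * numTriangles);
  return {
    sab,
    verts: new Float32Array(sab, 0, 3 * numVertices),
    faces: new Int32Array(sab, vertsBytes, 3 * numTriangles),
  };
}

// Main thread: const mesh = makeSharedMesh(N, V); worker.postMessage(mesh.sab);
// Worker: build the same two views over e.data and write the results in place.
```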

In your Stack Overflow question, I think your code is already doing what you want with postMessage(), except that we don't currently have a strong model for how JS code in the main thread would call a C++ function in a worker via postMessage(). I'd recommend having a piece of JS code in the worker that receives the message from the main thread, extracts the data, and calls the C++ function that should receive the final call.
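A sketch of that worker-side JS for the produce_object function from the first post. Assumptions: produce_object is declared extern "C" and exported so it appears as Module._produce_object, REAL is float, the runtime has finished initializing, and Module._malloc, Module._free, Module.HEAPF32 and Module.HEAP32 are available (recent Emscripten versions may require exporting them explicitly). runProduceObject and the message fields are hypothetical.

```javascript
// Allocate output arrays in the heap, run the C++ code, copy the results out.
function runProduceObject(Module, numVertices, numTriangles) {
  const vertsPtr = Module._malloc(3 * 4 * numVertices);
  const facesPtr = Module._malloc(3 * 4 * numTriangles);
  try {
    Module._produce_object(vertsPtr, numVertices, facesPtr, numTriangles);
    // Read the heap views after the call (they may be replaced if memory grows).
    // slice() copies into small standalone buffers that are safe to transfer.
    const verts = Module.HEAPF32.slice(vertsPtr >> 2, (vertsPtr >> 2) + 3 * numVertices);
    const faces = Module.HEAP32.slice(facesPtr >> 2, (facesPtr >> 2) + 3 * numTriangles);
    return { verts, faces };
  } finally {
    Module._free(vertsPtr);
    Module._free(facesPtr);
  }
}

// Wiring inside the worker file (hypothetical names):
// importScripts("core_mc_algorithm.js");
// self.onmessage = function (e) {
//   const r = runProduceObject(Module, e.data.numVertices, e.data.numTriangles);
//   self.postMessage(r, [r.verts.buffer, r.faces.buffer]);
// };
```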
