Mapping of JS (Typed) Arrays into the Emscripten heap

Daniel Baulig

Nov 15, 2013, 3:21:30 PM
to emscripte...@googlegroups.com
Emscripten emulates the heap by allocating a large JS typed array and using pointers/memory addresses as indexes into this heap. This works very well and allows for very fast constant-time pointer operations. Most JS engines should be able to optimize that code so that the assembly generated at runtime looks pretty much exactly like the compiled C code would.
However, this direct use of addresses as array indexes has a significant limitation: it does not allow us to map arbitrary linear data into the Emscripten heap. E.g. if I have a typed array in JS-land and want to pass it to a function in C-land, I first need to copy that data into the Emscripten heap before I can get a pointer to it and pass it to the C function. Of course I can circumvent this by getting a subarray of the Emscripten heap up front and using that as my typed array in JS-land. The problem is that one of the most common use cases for this would be loading a file from disk or from a web request. The browser APIs (e.g. FileReader), however, do not allow you to specify the target ArrayBuffer/TypedArray to load the file into; they always allocate a new one. If I load a file using FileReader, there is no way to get a "pointer" to that array buffer, because it lives outside the Emscripten heap. I need to copy the entire file into the Emscripten heap to get a pointer to that file's data.
It would be nice to be able to map any ArrayBuffer/TypedArray into the Emscripten address space without having to copy the entire buffer. For functions we already have this functionality in the form of Runtime.addFunction. Something similar would be nice to have, especially for IO-intensive applications. It would require every access to the Emscripten heap to be wrapped in a function call that checks whether the address is a virtual address mapped to a buffer outside the Emscripten heap. This would almost certainly cause a performance regression in memory access speed, but I believe some applications could still benefit from it. Of course it could be controlled by a compiler flag and be switched off by default. What do you guys think?
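
For illustration, the copy described above can be sketched roughly as follows. This is only a sketch under stated assumptions: HEAPU8 here is a plain Uint8Array standing in for the real Emscripten heap, and fakeMalloc is a hypothetical bump allocator standing in for Module._malloc, so that the snippet runs on its own.

```javascript
// Stand-in for the Emscripten heap (assumption: a real build would use Module.HEAPU8).
const HEAPU8 = new Uint8Array(1024);

// Hypothetical bump allocator standing in for Module._malloc.
let nextFree = 8; // keep address 0 unused so it can act as NULL
function fakeMalloc(size) {
  const ptr = nextFree;
  nextFree += (size + 7) & ~7; // round up to keep 8-byte alignment
  if (nextFree > HEAPU8.length) throw new Error("out of memory");
  return ptr;
}

// Data that arrived in its own buffer, e.g. handed over by FileReader.
const fileData = new Uint8Array([1, 2, 3, 4]);

// The copy the post complains about: the bytes must be duplicated inside
// the heap before compiled code can address them through a pointer.
const ptr = fakeMalloc(fileData.length);
HEAPU8.set(fileData, ptr);

// The workaround for data that JS creates itself: a view into the heap.
// Writes through the view land directly at address ptr, with no copy.
const view = HEAPU8.subarray(ptr, ptr + fileData.length);
view[0] = 42;
```

The subarray view shares the heap's buffer, which is why it only helps when JS controls where the data is produced. It does not help when the browser allocates the buffer for you.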

Alon Zakai

Nov 15, 2013, 8:52:23 PM
to emscripte...@googlegroups.com
Yeah, this is certainly a concern. But what do you mean by "mapping" binary data into the Emscripten heap? Copying it, or doing something more dynamic?

In general, I think the solution is to get web APIs to be able to write into existing typed arrays. For example we should be able to do an XHR and tell it to write into this buffer at that position, and give us a callback when it's done. (I don't know if this has been proposed, but I'll try to find someone to ask.)

- Alon

Chad Austin

Nov 15, 2013, 8:52:57 PM
to emscripte...@googlegroups.com
Oh, that's a good idea.  That would be great for a lot of the stuff we're doing at IMVU. 


--
Chad Austin
Technical Director, IMVU

Daniel Baulig

Nov 15, 2013, 9:10:35 PM
to emscripte...@googlegroups.com
I do mean something more dynamic, something along the lines of Runtime.addFunction. E.g.:
  // Runtime.addTypedArray is the proposed API; it does not exist yet.
  var array = new Uint8Array(1);
  var pointer = Runtime.addTypedArray(array);
  Module._memset(pointer, 255, array.length);
  if (array[0] === 255) {
    console.log('This array is mapped into the Emscripten memory address space');
  }

Leo Meyerovich

Nov 16, 2013, 8:44:48 AM
to emscripte...@googlegroups.com
I don't know about the implementation, but the proposed API may also help approximate multithreading scenarios. Workers already support transferring ownership, so the next step is to get Emscripten + asm.js to support zero-copy transfers, similar to FFIs in other languages. Dynamic buffer import/export seems like a path that breaks the web less than changing all web APIs or introducing true multithreading ;-)

Alon Zakai

Nov 16, 2013, 8:14:04 PM
to emscripte...@googlegroups.com
Ok, this would set up the array, what would the code look like to use it? When LLVM IR has a load from a pointer, would we need to check which "memory space" the pointer is in, then issue a read from the right one? So instead of

x = HEAP[p];

we would have

x = getHeap(p)[p];

where getHeap returns either HEAP or a typed array set up by Runtime.addTypedArray?

- Alon

Daniel Baulig

Nov 20, 2013, 7:47:10 PM
to emscripte...@googlegroups.com
Yes, something like that. You would also need to offset p so that it starts at an absolute 0, though. The returned heap will obviously have its first byte at index 0, not at p.
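
A rough sketch of what such a lookup with offsetting could look like is below. Everything in it is hypothetical: HEAP is a plain Uint8Array standing in for the Emscripten heap, addTypedArray only imitates the proposed Runtime.addTypedArray, and placing virtual addresses past the end of the real heap is just one possible layout, not something the thread settled on.

```javascript
// Stand-in for the main Emscripten heap (assumption, not the real Module heap).
const HEAP = new Uint8Array(1024);

// Hypothetical table of mapped external arrays. Virtual addresses start past
// the end of the real heap so they cannot collide with ordinary pointers.
const mappings = [];
let nextVirtual = HEAP.length;

// Hypothetical counterpart of the proposed Runtime.addTypedArray.
function addTypedArray(array) {
  const base = nextVirtual;
  mappings.push({ base: base, end: base + array.length, array: array });
  nextVirtual += array.length;
  return base;
}

// Resolve a pointer to the array backing it plus the index inside that array.
// For mapped arrays the index is p minus the mapping's base address, which is
// the offsetting step: the external array starts at index 0, not at p.
function resolve(p) {
  if (p < HEAP.length) return { heap: HEAP, index: p };
  for (const m of mappings) {
    if (p >= m.base && p < m.end) return { heap: m.array, index: p - m.base };
  }
  throw new RangeError("unmapped address " + p);
}

// Every load and store has to go through resolve(), which is where the
// performance cost of the scheme comes from.
function load(p) { const r = resolve(p); return r.heap[r.index]; }
function store(p, v) { const r = resolve(p); r.heap[r.index] = v; }

const external = new Uint8Array(4);
const ptr = addTypedArray(external);
store(ptr + 2, 255); // lands in the external array at index 2
store(16, 7);        // ordinary heap access, unchanged address
```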


Alon Zakai

Nov 21, 2013, 1:08:49 PM
to emscripte...@googlegroups.com
Ok, I see. Yes, this could help out some use cases, but I suspect the performance hit would be severe (maybe 10x in memory-intensive code).

- Alon

Daniel Baulig

Nov 21, 2013, 8:08:16 PM
to emscripte...@googlegroups.com
Yeah, maybe it's not worth it. Just something that came to my mind when I was writing code that copies an entire file that I was just handed by FileReader into the Emscripten heap.

Alon Zakai

Nov 21, 2013, 8:35:32 PM
to emscripte...@googlegroups.com
I do agree it's important to try to solve this though. But I am worried about changing every single memory read and write. Some other options might be to use LLVM memory spaces (not sure if that's the right term, but something like it), or something explicit in C++, like an array-like object where we overload [] to access memory from somewhere outside the normal heap. So something like

  ExternalArray ex(EM_ASM(  some js to get info from somewhere, return a typed array   ));
  printf("hello from outside the normal world %d\n", ex[5]); // loads data from that external typed array

This would be very easy to do, the question is whether it would solve enough of your use case? (For example, you couldn't call memcpy or other library functions on that data - only actually writing ex[index] in your code will produce the right result.)

- Alon

Chad Austin

Nov 22, 2013, 12:02:22 PM
to emscripte...@googlegroups.com
LLVM memory spaces sound interesting.

We ran into this problem too. In particular, we need to copy a typed array response from XMLHttpRequest into the Emscripten heap. For now we simply do a byte-by-byte copy, though a general, fast typed array memcpy would help a lot. ( https://bugzilla.mozilla.org/show_bug.cgi?id=862249 )

(In particular, embind can pass ArrayBuffers, Uint8Arrays, and Int8Arrays to std::string, which is roughly equivalent to std::vector<char>)

Alon Zakai

Nov 22, 2013, 12:34:11 PM
to emscripte...@googlegroups.com
Typed arrays have a .set() method, which works like memcpy, even between separate heaps. Did you check whether that is fast?
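
As a small illustration of the difference, here is the byte-by-byte copy next to the equivalent .set() call. HEAPU8 is again a plain Uint8Array standing in for the Emscripten heap, response stands in for something like new Uint8Array(xhr.response), and the destination addresses are made up for the example.

```javascript
// Stand-in for the Emscripten heap (assumption: Module.HEAPU8 in a real build).
const HEAPU8 = new Uint8Array(64);

// Stand-in for data living in a separate buffer, e.g. an XHR response.
const response = new Uint8Array([10, 20, 30, 40, 50]);

// Made-up destination addresses; in real code these would come from malloc.
const loopDest = 16;
const setDest = 32;

// Byte-by-byte copy.
for (let i = 0; i < response.length; i++) {
  HEAPU8[loopDest + i] = response[i];
}

// The same copy as a single call: set(source, offsetInTarget).
HEAPU8.set(response, setDest);

// Copying only part of the source works by taking a subarray view first,
// which does not allocate a new buffer.
HEAPU8.set(response.subarray(1, 3), 48);
```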

- Alon

Chad Austin

Nov 22, 2013, 12:39:14 PM
to emscripte...@googlegroups.com
Oops, don't know how I didn't know about that.  :)  Is Emscripten memcpy (or memmove) implemented in terms of TypedArray.set?

Sounds like we could make embind a bit faster in this use case...

Chad Austin

Nov 22, 2013, 12:46:13 PM
to emscripte...@googlegroups.com

Alon Zakai

Nov 22, 2013, 1:23:57 PM
to emscripte...@googlegroups.com
I think our memcpy is optimized for small ranges, but if this is important we could add .set() in there for large copies.
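
A hedged sketch of that idea, not Emscripten's actual memcpy: a simple loop for small ranges and .set() on a subarray for large ones. The cutoff of 128 bytes is an arbitrary assumption that would have to be measured, and HEAPU8 is a stand-in Uint8Array rather than the real heap.

```javascript
// Stand-in for the Emscripten heap (assumption, not the real Module heap).
const HEAPU8 = new Uint8Array(4096);

// Assumed cutoff between "small" and "large" copies; would need benchmarking.
const LARGE_COPY_THRESHOLD = 128;

// memcpy within the heap: dest and src are byte addresses, num a byte count.
function memcpy(dest, src, num) {
  if (num >= LARGE_COPY_THRESHOLD) {
    // Large copy: let the engine move the whole block at once.
    HEAPU8.set(HEAPU8.subarray(src, src + num), dest);
  } else {
    // Small copy: a plain loop avoids creating a temporary view.
    for (let i = 0; i < num; i++) {
      HEAPU8[dest + i] = HEAPU8[src + i];
    }
  }
  return dest;
}

// Fill the first 512 bytes with a recognizable pattern, then copy it twice.
for (let i = 0; i < 512; i++) HEAPU8[i] = i & 0xff;
memcpy(1024, 0, 16);  // takes the loop path
memcpy(2048, 0, 512); // takes the .set() path
```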

- Alon

tk...@mozilla.com

Jan 22, 2015, 12:07:04 AM
to emscripte...@googlegroups.com
Hi,

I have recently been implementing some image processing work on the web with asm.js/Emscripten, and while dealing with video data I found that the topic originally raised in this thread is very important.

Currently, there are two ways to access video data frame by frame:
(1) Draw an HTMLVideoElement into a CanvasRenderingContext2D and then get ImageData from it.
(2) With the upcoming web API, create an ImageBitmap directly from the HTMLVideoElement.

However, in either case the resulting ImageData/ImageBitmap is not in the asm.js run-time heap.
So a copy is needed for each frame, which might cause a serious performance problem.

I have measured the performance of the TypedArray.set() method.
It performs acceptably on desktop, taking about 1~2 ms to copy one 1920x1080 RGBA frame.
However, .set() currently performs very badly on the Firefox OS Flame reference phone, taking about ~30 ms under the same conditions.
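
A measurement like this could be reproduced with a sketch along the following lines. It is not the original benchmark: the heap size, destination address and number of runs are assumptions, the frame is a synthetic buffer rather than real ImageData, and the timings it prints will differ by device and engine.

```javascript
// Use performance.now() where available, otherwise fall back to Date.now().
const now = (typeof performance !== "undefined" && performance.now)
  ? () => performance.now()
  : () => Date.now();

const WIDTH = 1920;
const HEIGHT = 1080;
const BYTES_PER_PIXEL = 4; // RGBA
const FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL; // 8,294,400 bytes

// Stand-in for the asm.js heap (assumed size: 16 MiB).
const HEAPU8 = new Uint8Array(16 * 1024 * 1024);
const framePtr = 1024; // made-up destination address

// Synthetic frame; in a browser this would be the data array of an ImageData.
const frame = new Uint8Array(FRAME_BYTES);
frame.fill(0x7f);

// Copy the frame into the heap several times and average the duration.
const RUNS = 20;
const start = now();
for (let i = 0; i < RUNS; i++) {
  HEAPU8.set(frame, framePtr);
}
const msPerCopy = (now() - start) / RUNS;
console.log("average copy time: " + msPerCopy.toFixed(3) + " ms");
```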

Since this topic was posted more than a year ago, I am wondering whether there is any further information or a possible solution?

Jukka Jylänki

Jan 22, 2015, 5:51:38 AM
to emscripte...@googlegroups.com
Would it be viable to use WebGL textures with texImage2D to source the video data to a WebGL texture, and then use glReadPixels to read the data directly from GPU back to the CPU?
