web worker API: huge performance difference between Firefox and Chrome

Floh

Jan 12, 2013, 8:59:37 AM
to emscripte...@googlegroups.com
This is probably not emscripten related, but maybe somebody stumbled across this before or knows what's going on.

I just implemented asynchronous asset decompression with the new web worker API. Inside the emscripten_async_wget_data() callback I'm calling emscripten_call_worker() with the received data. Each data block is somewhere between a few dozen kByte and at most a few hundred kByte, and currently there's no throttling on my side of how many worker requests can be "in flight".

In the worker, the data is unzipped using zlib, and emscripten_worker_respond() is called with the uncompressed data buffer.

In Firefox, this simply zips through and gives a nice performance boost; it basically works as intended. In Chrome it is actually worse than without web workers: the frame rate becomes sluggish, and the more requests are in flight the worse it seems to get, up to completely locking the browser tab.

I'll implement some throttling and see whether this improves things, but this is even noticeable with only a few (10 or so) requests in flight.
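For reference, the kind of throttling meant here could look roughly like the sketch below. It is only an illustration: `createThrottle` and the `send(payload, done)` callback are hypothetical names, with `send` standing in for whatever actually hands the data to the worker.

```javascript
// Hypothetical helper: caps how many requests are outstanding at once.
// `send(payload, done)` is an assumed stand-in for the real worker call;
// it must invoke `done(result)` once the worker has answered.
function createThrottle(maxInFlight, send) {
  const pending = [];   // requests waiting for a free slot
  let inFlight = 0;     // requests sent but not yet answered

  function pump() {
    while (inFlight < maxInFlight && pending.length > 0) {
      const { payload, onDone } = pending.shift();
      inFlight++;
      send(payload, (result) => {
        inFlight--;
        onDone(result);
        pump();         // a slot is free again, start the next queued request
      });
    }
  }

  return {
    submit(payload, onDone) {
      pending.push({ payload, onDone });
      pump();
    },
    stats() {
      return { inFlight, queued: pending.length };
    },
  };
}
```

Requests beyond the limit simply wait in the queue until an earlier one completes, so the number of messages in transit never exceeds `maxInFlight`.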

Anybody noticed a similar behaviour, or knows what's up with this?

Floh

Jan 12, 2013, 11:14:21 AM
to emscripte...@googlegroups.com
PS: The Chrome JS profiler shows _emscripten_call_worker() as the most expensive call (at the top of the table, with around 20% of all execution time), which in my case is called from xhr.onload (as I wrote, I'm starting the worker request from inside the async_wget_data callback). Could it be that the makeHEAPView() function is unusually expensive on Chrome? Or is the reason that the worker postMessage is invoked from the XHR callback?

Most requests send around 5..50kByte of (compressed) data to the worker, and the (uncompressed) result data is around 20 to 200 kByte per request.

Throttling the number of worker requests in flight didn't help much, frame rate is still extremely sluggish when the worker is called.

There is exactly one worker, and I gave this worker 8 MB of memory.

I'll try to upload my stuff tomorrow to a public location.

Cheers,
-Floh.

Floh

Jan 12, 2013, 12:07:20 PM
to emscripte...@googlegroups.com
PPS: I have updated the public N3 demos with the webworker code, so you can see for yourself. For instance, when starting this demo in Firefox, notice how you can still adjust the camera position while data is loading (...while those placeholder cubes are visible):


In Chrome, trying to adjust the camera while those placeholder cubes are visible is pretty much impossible...

You can switch to other characters / appearances with Cursor Up and Cursor Right, and toggle between animations with Cursor Left.

The demos with the "_debug.html" postfix are non-optimized / non-minified, so it's (somewhat) possible to debug/profile them. The optimized demos (without the _debug) are up to date as well.

Cheers,
-Floh.

On Saturday, January 12, 2013 at 14:59:37 UTC+1, Floh wrote:

Alon Zakai

Jan 12, 2013, 1:24:27 PM
to emscripte...@googlegroups.com
I can reproduce the problem, it's very sluggish on Chrome. Definitely unexpected.

_emscripten_call_worker() itself does almost nothing: makeHEAPView() turns into HEAPU8.subarray at compile time, which only creates a new view, not a copy. So if that function takes a lot of time in Chrome, it must be that postMessage itself is slow.

Chrome implements web workers as separate processes, so it does have the overhead of IPC, which firefox does not. Still, this is slower than I would hope, and I don't know why.

It's likely a bug or limitation in chrome that you're hitting here, I recommend filing a bug on the chromium project with a stable url of unminified code that they can test on. You can file it at http://code.google.com/p/chromium/issues/list ("New issue"). Please post the issue number here if you file so we can track it too.

- azakai

Floh

Jan 12, 2013, 6:59:40 PM
to emscripte...@googlegroups.com
I've uploaded the code to a separate, stable URL (http://n3demos.appspot.com/dsocharviewer_debug.html), and filed an issue on the chromium bug tracker:


Any ideas on how to work around the problem are welcome. Doesn't BananaBread also do asynchronous stuff in web workers with relatively big data?

-Floh.

Alon Zakai

Jan 12, 2013, 7:19:33 PM
to emscripte...@googlegroups.com
BananaBread does, and I've noticed that startup is slower in Chrome, but it wasn't a big difference, so I assumed that Chrome's JS engine was a little slower on that codebase (sometimes it's a little faster on a codebase, sometimes a little slower). But it makes sense that this could be the same issue, and maybe BananaBread just has less data than you do, so it's less noticeable. How big are the chunks you send over?

Btw, one interesting thing to measure here is to add

  var t = Date.now();

right before the postMessage, and

  console.log('postMessage took ' + (Date.now() - t) + ' ms on size ' + size);

after it, to see how much time is spent in that call. It should be almost instantaneous, even if it does copy the range being sent.
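Put together, the two lines could be wrapped in a small helper like the sketch below. `timedPostMessage` is a made-up name, and `target` is assumed to be anything with a postMessage() method, such as a Worker. Note that Date.now() only has millisecond resolution, so a fast call will simply report 0 ms.

```javascript
// Hypothetical helper: times a single postMessage call.
// `target` is assumed to expose postMessage(), like a browser Worker does.
function timedPostMessage(target, message, size, log = console.log) {
  const t = Date.now();
  target.postMessage(message);          // the call being measured
  const elapsed = Date.now() - t;
  log('postMessage took ' + elapsed + ' ms on size ' + size);
  return elapsed;
}
```

This only measures the time the sending thread spends inside the call, not how long the worker takes to receive or process the message.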

Hmm, thinking about that now, one possible bug could be that Chrome copies the entire underlying ArrayBuffer and not just the view. This might be a way that emscripten programs differ from typical WebGL demos: we send small views on a huge buffer. How big did you set TOTAL_MEMORY?

If that theory is correct (likely not, but worth trying), then increasing TOTAL_MEMORY should make the problem worse, and the debugging output above should make that visible. And this might work around the problem entirely: replace

  'data': data ? HEAPU8.subarray((data),(data + size)) : 0

with

  'data': data ? new Uint8Array(HEAPU8.subarray((data),(data + size))) : 0

(manually copy into a new small typed array).
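To make the difference concrete, here is a small standalone sketch. The `HEAPU8` array is just a stand-in for the emscripten heap, and the `data` offset and `size` are made-up values. A subarray still points at the whole underlying buffer, while the copy owns a buffer that is only as large as the payload.

```javascript
// Stand-in for the emscripten heap; 32 MB is the heap size mentioned in this thread.
const TOTAL_MEMORY = 32 * 1024 * 1024;
const HEAPU8 = new Uint8Array(TOTAL_MEMORY);

const data = 1024;        // hypothetical offset of the payload inside the heap
const size = 200 * 1024;  // hypothetical payload size

// A view: no bytes are copied, it shares the heap's ArrayBuffer.
const view = HEAPU8.subarray(data, data + size);

// A copy: a fresh typed array with its own, small ArrayBuffer.
const copy = new Uint8Array(view);

console.log(view.byteLength, view.buffer.byteLength); // 204800 33554432
console.log(copy.byteLength, copy.buffer.byteLength); // 204800 204800
```

If a browser serializes the whole underlying buffer rather than just the viewed range, the first form would cost time proportional to TOTAL_MEMORY, while the second only costs time proportional to the payload.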

If that isn't it (just a guess), then hopefully the chrome devs can clarify what's going on.

- azakai

Floh

Jan 12, 2013, 7:29:45 PM
to emscripte...@googlegroups.com
TOTAL_MEMORY for the main app is set to 32 MByte, and for the worker to 8 MB. The chunks which are sent over and back are somewhere between 5kByte and 200kByte. Thanks for your suggestions, I'll try them tomorrow and let you know how it works out.

-Floh.

Floh

Jan 13, 2013, 6:21:24 AM
to emscripte...@googlegroups.com
It is exactly as you suspected!

Here's the time postMessage takes with TOTAL_MEMORY=32MB:

Chrome: 150ms .. 200ms
Firefox: 0ms

And here with TOTAL_MEMORY=256MB:

Chrome: 1000ms..1800ms (everything basically becomes unusable)
Firefox: 0ms

With the new Uint8Array copy, Chrome is also at 0ms (for the 256 MB heap). It still felt a bit sluggish, but I guess that's because I only "patched" the send direction (main -> worker), and not yet the other way around.

I'll update the issue on the Chromium bug tracker accordingly.

Should emscripten use the copy workaround until this is fixed? Also, are there other places where the current behaviour of Chrome could be a problem?

Thanks!
-Floh.

On Sunday, January 13, 2013 at 01:19:33 UTC+1, azakai wrote:

Floh

Jan 13, 2013, 7:25:26 AM
to emscripte...@googlegroups.com
PS: I just went through the other places where .subarray is used (mostly in the GL wrapper), and they don't seem to be affected.

-Floh.

Floh

Jan 13, 2013, 9:43:34 AM
to emscripte...@googlegroups.com
I have updated the N3 demos with the Chrome postMessage workaround. Especially when the asset data is already in cache, the speedup due to the multithreaded decompression is very noticeable (I'm now using 4 web workers, and up to 64 requests can be in flight):
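As a rough illustration of spreading requests over several workers, a round-robin pool could be sketched as below. This is not the demo's actual code: `createWorkerPool` is a hypothetical name, the `{ id, data }` message shape is an assumption, and the worker side is assumed to echo the `id` back with its result.

```javascript
// Hypothetical pool: `workers` is an array of objects with postMessage() and an
// assignable onmessage handler, which is the shape of a browser Worker.
// Assumption: each worker replies with a message of the form { id, data }.
function createWorkerPool(workers) {
  const callbacks = new Map();  // request id -> completion callback
  let nextId = 0;
  let nextWorker = 0;

  for (const w of workers) {
    w.onmessage = (event) => {
      const { id, data } = event.data;
      const cb = callbacks.get(id);
      callbacks.delete(id);
      if (cb) cb(data);
    };
  }

  return {
    request(data, cb) {
      const id = nextId++;
      callbacks.set(id, cb);
      const w = workers[nextWorker];
      nextWorker = (nextWorker + 1) % workers.length;  // round-robin dispatch
      w.postMessage({ id, data });
    },
    inFlight() {
      return callbacks.size;
    },
  };
}
```

The `inFlight()` count can be combined with a limit on outstanding requests, so that no more than a fixed number of messages are in transit across all workers.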


And I wrote a little blog post about multithreading in emscripten:


Cheers,
-Floh.

Alon Zakai

Jan 13, 2013, 1:27:51 PM
to emscripte...@googlegroups.com
Heh, that was just a crazy guess ;) I'm surprised it was right.

This might be related to their having workers in another process. They might move the entire buffer to shared memory or something like that, but even that doesn't explain why it is *so* slow. Copying 32MB doesn't take that long.

I wouldn't expect subarray to be a problem anywhere else, in WebGL they likely convert to an OpenGL data format in the same process. Good to confirm that though. Another possible place this might be an issue is in networking, we send messages with subarrays there, but like WebGL I suspect it won't be a problem.

I'll add the workaround to emscripten, I am a little sad though since it adds a small amount of overhead to all other browsers for no reason. Hopefully the chrome devs will fix this soon.

- azakai

Alon Zakai

Jan 13, 2013, 1:31:05 PM
to emscripte...@googlegroups.com
On Sun, Jan 13, 2013 at 10:27 AM, Alon Zakai <alonm...@gmail.com> wrote:

> This might be related to their having workers in another process. They might move the entire buffer to shared memory or something like that, but even that doesn't explain why it is *so* slow. Copying 32MB doesn't take that long.

Hmm, one theory might be that when they see it sent to a worker, they don't just copy to shared memory, but also modify the original to use that memory, which requires invalidating all the JIT code that uses that typed array (all hardcoded memory locations are now wrong), which for an emscripten program is everything. Then each time a message is sent, it would recompile the entire app. This would produce a slowdown similar to what we are seeing in the bad case.

- azakai

Floh

Jan 13, 2013, 4:44:48 PM
to emscripte...@googlegroups.com
Hmm, but why would this then scale with the size of the HEAP array? The time required for postMessage seems to grow roughly linearly with the HEAP array size. Hopefully the Chrome devs can explain what's going on and find a nice, clean fix.

-Floh.

Alon Zakai

Jan 13, 2013, 5:00:17 PM
to emscripte...@googlegroups.com
Yeah, you're right, this last theory isn't consistent with that finding.

- azakai