Web workers w/ message passing vs. pthreads w/ shared state


Soeren Balko

unread,
Apr 2, 2014, 8:37:29 AM4/2/14
to emscripte...@googlegroups.com
...alright, I know this has been discussed before: JavaScript doesn't have threads with shared state and, hence, pthreads are not supported. Web workers do offer concurrency, but rely on asynchronous message passing and can, hence, not be used to simulate pthreads. Or so I thought.

But here are a couple of observations: message passing to/from workers using postMessage is actually synchronous (to some extent) from the _sender's_ perspective. Both Firefox and Chrome behave as follows: when a postMessage call is performed, the receiving thread (i.e., some worker or the UI thread) will _immediately_ receive the message. That is, the sender's stack does not have to unwind before the message is delivered to the other thread.

How could this help us use Web workers to simulate pthreads w/ shared state?

We could have each thread hold its own copy of HEAP and use postMessage/onmessage to transfer the operations changing the state. In effect, multiple threads could temporarily see different versions of some shared state. Once an operation from another thread is received, it is applied to the local HEAP copy. Is this necessarily incorrect? I don't think so: unless some synchronization (see below) happens, two threads cannot know what the other is doing and must not assume anything about the effects of the other thread. So even if the other thread has already updated a piece of shared state and this effect is not yet known locally, this is not wrong. As long as the other thread _eventually_ sees the effects of the first thread, this should be correct.
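The idea above can be sketched in a few lines of plain JavaScript, leaving out the actual worker plumbing: each "thread" keeps a private heap copy, writes are recorded as operations, and at a sync point one thread's operations are replayed against the other's copy. All names here are illustrative, not Emscripten APIs.

```javascript
// Sketch: two simulated threads, each with a private copy of the heap.
// Writes are recorded as operations; at a sync point, one thread's
// operations are replayed against the other's heap copy.

function makeThread(heapSize) {
  return { heap: new Uint8Array(heapSize), log: [] };
}

// A store that both updates the local heap and records the operation.
function store(thread, addr, value) {
  thread.heap[addr] = value;
  thread.log.push({ addr, value });
}

// Simulated sync point: replay the sender's logged writes on the
// receiver's heap copy. In a real implementation this payload would
// travel through postMessage.
function sync(sender, receiver) {
  for (const op of sender.log) {
    receiver.heap[op.addr] = op.value;
  }
  sender.log = [];
}

const t1 = makeThread(16);
const t2 = makeThread(16);

store(t1, 3, 42);          // t1 updates "shared" state
console.log(t2.heap[3]);   // 0 — t2 has not yet seen the effect
sync(t1, t2);              // explicit sync point
console.log(t2.heap[3]);   // 42 — t2 now sees t1's write
```

Between the store and the sync call the two heap copies disagree, which is exactly the "temporarily different versions" situation described above.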

How about synchronization? pthreads offer multiple mechanisms to do so. Here is an example of using pthread_cond_wait and pthread_cond_signal:

pthread_cond_signal would translate into a postMessage call in which the signalling thread notifies the waiting thread(s). The message would be sent immediately, so pthread_cond_signal could be embedded in a synchronous call stack. On the receiving side, things are a bit more complicated: for a thread to receive a message in an onmessage handler, it needs to unwind its call stack. That is, one could not simply busy-wait within pthread_cond_wait. Here, multiple solutions could be considered:
  1. The simplest solution is to not support thread synchronization through locks and mutexes at all. The shared state of a thread would simply be copied back to the HEAP of the "spawning" thread once the other thread ends. I am not sure how many real-life scenarios could actually benefit from that approach.
  2. Web workers cannot share in-memory state, but other "HTML5" APIs come to the rescue, notably the synchronous Filesystem API (Chrome only) and synchronous IndexedDB (not yet supported in Web workers by Firefox). Using synchronous HTTP requests to a server, which buffers operations performed by one thread and lets another thread pick them up when it runs into a blocking call, could work as well. The latter is actually used in other asynchronous concurrency-control mechanisms (like Operational Transformation).
  3. An approach like https://github.com/coolwanglu/hypnotic/wiki, where the JS interpreter is written in JS, allowing for synchronous sleep() calls. However, this might affect performance quite significantly.
  4. A custom asynchronous emscripten_pthread_cond_wait_async, which unwinds the stack to allow the onmessage handler to be invoked. Legacy code using POSIX threads would have to be partially refactored to benefit from this. However, it would only affect some functions, like pthread_join, pthread_mutex_lock, etc. Refactoring legacy code to replace its blocking calls with asynchronous variants would in many cases presumably not be much work.
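Option 4 could look roughly like the following sketch. The continuation-passing shape is the point; the function names are hypothetical, and message delivery (which would really go through postMessage/onmessage between workers) is simulated by a direct callback.

```javascript
// Sketch of an asynchronous condition variable (option 4): instead of
// blocking inside pthread_cond_wait, the waiter registers a
// continuation that runs when another thread signals.

function makeCond() {
  return { waiters: [] };
}

// Asynchronous stand-in for pthread_cond_wait: the code that would
// follow the wait is passed in as a callback.
function condWaitAsync(cond, continuation) {
  cond.waiters.push(continuation);
}

// pthread_cond_signal equivalent: wake one waiter. In a worker-based
// implementation this would be a postMessage to the waiting worker,
// whose onmessage handler then invokes the continuation.
function condSignal(cond) {
  const waiter = cond.waiters.shift();
  if (waiter) waiter();
}

const cond = makeCond();
let result = null;

condWaitAsync(cond, () => { result = 'woken'; });
console.log(result);  // null — nothing signalled yet
condSignal(cond);
console.log(result);  // 'woken'
```

This is exactly the refactoring burden mentioned above: code after a blocking call has to move into the continuation.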
In light of the fact that multi-threading could improve the performance of asm.js-based JavaScript quite dramatically, and that many multi-threaded applications use very simple synchronization mechanisms with little shared state, I can see how emscripten-generated code could hugely benefit from such an enhancement.

Thoughts?
Soeren

Alon Zakai

unread,
Apr 7, 2014, 7:33:41 PM4/7/14
to emscripte...@googlegroups.com
On Wed, Apr 2, 2014 at 5:37 AM, Soeren Balko <soe...@zfaas.com> wrote:

This is correct, and in fact is how many CPUs work. x86 tends to be coherent all the time, but ARM and others require explicit sync points between threads' views of memory. So your postMessages would represent such sync points.

But performance sounds poor - we would need to send the entire heap in each such message, unless we log each write. And I think we need to log each write anyhow - otherwise, how do we merge changes between two threads? We need to know which are the new changes and which the old state, so we can keep the new changes.
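One way to attack the merge problem, under the write-logging assumption, is to tag each logged write with a monotonic sequence number so that, per address, the newest write wins regardless of which thread it came from. A toy sketch (not actual Emscripten machinery; a single shared counter stands in for a global ordering of sync points):

```javascript
// Sketch: merging write logs from two threads so that, per address,
// the newest write wins. A shared monotonic counter stands in for a
// global ordering of writes; all names are illustrative.

let clock = 0;

function loggedWrite(log, addr, value) {
  log.push({ addr, value, seq: clock++ });
}

// Merge two logs into a heap: sort all writes by sequence number and
// apply them in order, so later writes overwrite earlier ones.
function mergeLogs(heap, logA, logB) {
  const all = logA.concat(logB).sort((a, b) => a.seq - b.seq);
  for (const op of all) heap[op.addr] = op.value;
}

const heap = new Uint8Array(8);
const logA = [];
const logB = [];

loggedWrite(logA, 0, 10);  // seq 0
loggedWrite(logB, 0, 20);  // seq 1 — newer write to the same address
loggedWrite(logA, 1, 30);  // seq 2

mergeLogs(heap, logA, logB);
console.log(heap[0]);  // 20 — the newer write wins
console.log(heap[1]);  // 30
```

Note the hidden cost: establishing such a global ordering across real workers would itself require communication, which is part of why the performance concern stands.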

Despite my skepticism, I think this is interesting and worth exploring. If you make a prototype, it would be a good way to see exactly what the perf impact is here - very curious about that.

- Alon

Sören Balko

unread,
Apr 7, 2014, 7:43:57 PM4/7/14
to emscripte...@googlegroups.com
I will definitely be exploring that. If we had to transmit (a copy of) the entire HEAP, this might, in fact, not be worthwhile. However, I wonder if we couldn't statically determine the shared memory accesses and only buffer the effects of operations manipulating the shared heap, which we would send over to the other side at the sync points. The reasoning behind this is that the compiler should be able to relate any HEAP access to an initial memory allocation. It would have to do some bookkeeping to "remember" the thread in which that memory allocation happened. Only if a thread accesses a piece of state relating to a memory allocation done by another thread would we have to record the effect of that memory access.

Admittedly, this is a complex feature to implement in the compiler (or maybe not - I haven't really explored that). So, alternatively, we could have some sort of programmatic annotation, where we explicitly mark shared memory accesses. This would do away with pthread (or whatever native threading library) compatibility, but would still give us something similar. As we would have to make the blocking operations non-blocking (i.e., asynchronous) anyway, this may well be acceptable. And if that works without major performance implications, one could later let the compiler do that job statically at compile time.
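The ownership bookkeeping described above can be pictured as follows: tag each allocation with the thread that made it, and record a write for cross-thread replay only when the writer is not the owner. Whether done by annotation or by the compiler, the runtime effect would be roughly this (entirely hypothetical names):

```javascript
// Sketch: per-allocation ownership bookkeeping. Each allocation
// records the thread that made it; a write is logged for cross-thread
// replay only when the writing thread is not the owner.

const owners = new Map();  // base address -> owning thread id
let nextAddr = 0;

function alloc(tid, size) {
  const base = nextAddr;
  nextAddr += size;
  owners.set(base, tid);
  return base;
}

// Write through the bookkeeping layer: only foreign writes are logged.
function trackedStore(heap, log, tid, base, offset, value) {
  heap[base + offset] = value;
  if (owners.get(base) !== tid) {
    log.push({ addr: base + offset, value });
  }
}

const heap = new Uint8Array(16);
const log = [];

const a = alloc(1, 4);                  // block owned by thread 1
trackedStore(heap, log, 1, a, 0, 5);    // owner writes: not logged
trackedStore(heap, log, 2, a, 1, 9);    // foreign write: logged
console.log(log.length);  // 1 — only the cross-thread write
```

The hoped-for win is that most writes hit thread-local allocations and bypass the log entirely, so only genuinely shared state pays the messaging cost.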

I am not sure when I will get to doing that; I will initially be extending the SIMD support, which is still patchy and does not make the full expressiveness of the JavaScript SIMD object available to emscripten-ed code.

Soeren


Dr. Soeren Balko
Founder & Director
zfaas Pty. Ltd.
+61-(0)4-81393191
soe...@zfaas.com



fabien cellier

unread,
Apr 9, 2014, 5:14:20 PM4/9/14
to emscripte...@googlegroups.com
Hello Soeren, Alon,

I have already explored some solutions for WebCL in pure JS. Let me share my experiments with you (although I may have made some mistakes, which we can discuss ;-)

Our solution for shared memory is to have a single worker own the writes, with "subworkers" acting as the threads, and to synchronize with simple postMessage calls. This is only an issue if your application writes and reads a lot, or if you need low latency for the synchronization (an n-body simulation, for example).

But the real challenge is elsewhere: in order to do the synchronization, the JavaScript code has to stop its execution. Thanks to the pause, the worker can return to the event loop (which will handle the update of the memory). So, if your synchronization point is in the middle of a loop, you have to rewrite the code to be able to stop and resume the execution of your loop, recursive call, etc.

My solutions so far:

  1. yield: no problem, except there is no yield in asm.js or IE yet. Moreover, yield does not block all the execution, only the call of the generator.
  2. Transform your code into a state machine (the same way regenerator emulates yield in JS: http://facebook.github.io/regenerator/), and relaunch the code from the event loop - big impact/overhead. In our implementation, to do that we emulate the frame pointer, heap, etc. Furthermore, the execution is resumed by a message from the event loop (after synchronization), which adds latency (imagine that inside a loop!).
  3. Restrict when the memory can be synchronized (we can't do that with "WebCL.js").
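Fabien's second solution - turning a loop into a state machine so it can be suspended at a sync point and resumed from the event loop - can be illustrated with a hand-written transform (a toy, not the real regenerator output):

```javascript
// Sketch: a loop rewritten as a resumable state machine. The function
// keeps its loop variables in a caller-supplied "frame" and returns
// when it hits a sync point; the caller re-enters it later with the
// same frame, as the event loop would after synchronization.

function sumLoop(frame) {
  // frame: { i, total } — the saved loop variables (emulated frame).
  while (frame.i < 10) {
    frame.total += frame.i;
    frame.i += 1;
    if (frame.i === 5) {
      return { done: false, frame };  // suspend at a sync point
    }
  }
  return { done: true, value: frame.total };
}

let step = sumLoop({ i: 0, total: 0 });
console.log(step.done);       // false — suspended mid-loop at i === 5
step = sumLoop(step.frame);   // "resumed from the event loop"
console.log(step.value);      // 45 — 0 + 1 + ... + 9
```

The overhead Fabien measured comes from doing this for every loop and every frame variable, plus the round trip through the event loop on each resume.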

With the second solution (plus synchronization with the worker), the n-body simulation is more than 20 times slower than single-threaded JavaScript (4000x slower than GPGPU with WebCL)!
Note that the n-body code is a bad use case (almost all memory is shared), and we are trying to optimize the code to see if it is possible to limit the overhead.

Hope this helps - I can share more if you want (code or examples).

regards,
Fabien

Sören Balko

unread,
Apr 9, 2014, 7:34:38 PM4/9/14
to emscripte...@googlegroups.com
Thanks for sharing this!

I think the programming model would have to be different. Notably, (1) the blocking (synchronisation, locking) operations would have to be replaced with asynchronous callbacks, and (2) shared-state updates would have to be either manually annotated or statically analysed by the compiler. So, yes, one would have to refactor legacy code that uses normal parallel programming approaches. Inevitably, this would rule out some code from benefitting. In particular, multithreaded code that does a lot of shared-state accesses would probably not benefit.

I still think that the aforementioned approach is worth exploring in greater depth.

Fabien Cellier

unread,
Apr 11, 2014, 10:43:53 AM4/11/14
to emscripte...@googlegroups.com
>I still think that the aforementioned approach is worthwhile exploring in greater depth.

I am sorry if my email made you think that I don't agree that you should work on it. The purpose was more to ask how I can help, and whether you need some of my work/code/tests for that ;-)

Note that my code is not optimized yet.