...alright, I know this has been discussed before: JavaScript doesn't have threads with shared state and, hence, pthreads are not supported. Web workers do offer concurrency, but they rely on asynchronous message passing and hence cannot be used to simulate pthreads. Or so I thought.
But here are a few observations: message passing to/from workers using postMessage is actually synchronous (to some extent) from a _sender_ perspective. Both Firefox and Chrome behave this way: when a postMessage call is performed, the receiving thread (i.e., some worker or the UI thread) will _immediately_ receive the message. That is, the sender's stack does not have to unwind before the message is delivered to the other thread.
How could this help to use Web workers to simulate pthreads w/ shared state?
We could have each thread hold its own copy of HEAP and use postMessage/onmessage to transfer each operation that changes the state. In effect, multiple threads could temporarily see different versions of some shared state. Once an operation from another thread is received, it is applied to the local HEAP copy. Is this necessarily incorrect? I don't think so: unless there is some synchronization (see below) happening, two threads cannot know what the other is doing and must not assume anything about the effects of the other thread. So even if the other thread has already updated a piece of shared state and this effect is not yet known locally, this is not wrong. As long as the other thread _eventually_ sees the effects of the first thread, this should be correct.
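To make the idea concrete, here is a minimal sketch of two "threads" that each keep a private heap copy and replicate stores as messages. Everything in it is an assumption for illustration: createReplica, the { op, index, value } message shape and the in-memory inboxes (which stand in for postMessage/onmessage so the snippet runs outside a browser) are hypothetical names, not emscripten APIs.

```javascript
// Hypothetical sketch: each "thread" owns a private copy of HEAP and
// replicates its writes to the other side as messages.
function createReplica(size, send) {
  const heap = new Int32Array(size); // private copy of HEAP
  return {
    heap,
    // Local store: apply immediately, then ship the operation.
    store(index, value) {
      heap[index] = value;
      send({ op: 'store', index, value });
    },
    // In a real worker this would be called from the onmessage handler.
    apply(msg) {
      if (msg.op === 'store') heap[msg.index] = msg.value;
    },
  };
}

// Plain arrays stand in for the message queues between two workers.
const inboxA = [];
const inboxB = [];
const a = createReplica(4, (msg) => inboxB.push(msg));
const b = createReplica(4, (msg) => inboxA.push(msg));

// Deliver all pending messages to a replica (what happens once its stack unwinds).
function drain(inbox, replica) {
  while (inbox.length) replica.apply(inbox.shift());
}

a.store(0, 42);              // a sees the new value right away
const staleRead = b.heap[0]; // b still sees its old value for now
drain(inboxB, b);            // b handles its messages and catches up
```

Note that the sketch only covers a single writer per location. If both sides store to the same index without synchronization, the copies could end up disagreeing for good, so a real implementation would need some ordering rule for that case.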
How about synchronization? pthreads offer multiple mechanisms to do so. Here is an example of using pthread_cond_wait and pthread_cond_signal:
pthread_cond_signal would translate into a postMessage call where the signalling thread notifies the waiting thread(s). The message would be sent immediately such that pthread_cond_signal could be embedded into a synchronous call stack. On the receiving side, things are a bit more complicated. For a thread to receive a message in an onmessage handler, it needs to unwind its call stack. That is, one could not simply do a busy wait within pthread_cond_wait. This is where multiple solutions could be considered:
- The simplest solution is to not support thread synchronization through locks and mutexes. The shared state of a thread would simply be copied back to the HEAP of the "spawning" thread once the other thread ends. Not sure how many real-life scenarios could actually benefit from that approach.
- Web workers cannot share in-memory state, but other "HTML5" APIs come to the rescue, notably the synchronous Filesystem API (Chrome only) and synchronous IndexedDB (not yet supported in Web workers by Firefox). Using synchronous HTTP requests to a server, which buffers operations performed by a thread and lets another thread pick these up when it runs into a blocking call, could work as well. The latter is actually used in other asynchronous concurrency control mechanisms (like Operational Transformation).
- An approach like https://github.com/coolwanglu/hypnotic/wiki, where the JS interpreter is written in JS, allowing for synchronous sleep() calls. However, this might affect performance quite significantly.
- A custom asynchronous emscripten_pthread_cond_wait_async, which unwinds the stack to allow the onmessage handler to be invoked. Legacy code using POSIX threads would have to be partially refactored to benefit from this. However, it would only affect the call sites of blocking functions such as pthread_join, pthread_mutex_lock, etc. Refactoring legacy code to replace its blocking calls with asynchronous variants would in many cases presumably not be much work.
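As a rough illustration of the last option, the following sketch shows what the JavaScript side of an asynchronous wait might look like. It is only a sketch under assumptions: createCond, waitAsync and onSignal are hypothetical names standing in for the proposed emscripten_pthread_cond_wait_async, the mutex that pthread_cond_wait normally takes is left out, and the signal is delivered by a direct call rather than a real onmessage handler so the snippet runs on its own.

```javascript
// Hypothetical sketch of a condition variable whose wait takes a continuation
// instead of blocking the caller.
function createCond() {
  const waiters = []; // continuations parked by waitAsync
  return {
    // Stand-in for the proposed emscripten_pthread_cond_wait_async:
    // remember what to run once signalled, then return to the caller.
    waitAsync(continuation) {
      waiters.push(continuation);
    },
    // Would be called from onmessage when a "signal" message arrives.
    onSignal() {
      const next = waiters.shift(); // wake at most one waiter, like pthread_cond_signal
      if (next) next();
    },
  };
}

const log = [];
const cond = createCond();

function consumer() {
  log.push('waiting');
  // The code that used to follow pthread_cond_wait moves into the callback.
  cond.waitAsync(() => log.push('resumed'));
  // Returning here unwinds the stack so the event loop can deliver messages.
}

consumer();
// In a worker this call would sit inside the onmessage handler,
// triggered by the signalling thread's postMessage.
cond.onSignal();
```

The refactoring cost mentioned above shows up in consumer(): whatever followed the blocking call has to be moved into the continuation.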
In light of the fact that multi-threading would improve the performance of asm.js-based JavaScript quite dramatically and many multi-threaded applications use very simple synchronization mechanisms with little shared state, I can see how emscripten-generated code could hugely benefit from such an enhancement.
Thoughts?
Soeren