Concurrency standard module


Kris Zyp

Sep 21, 2009, 12:12:32 PM
to comm...@googlegroups.com
I was wondering if we could consider a concurrency module to help with
some basic concurrency constructs, in particular for working in
shared-memory threaded environments. While writing middleware for Jack, I
realized it can be almost impossible to write certain functionality in a
thread-safe manner without some concurrency support, like
locks/mutexes, which is lacking from raw JavaScript. I realize that
there are certainly alternate concurrency models that do not require
locks, but I didn't think that CommonJS was attempting to force a
particular concurrency model (and I am not sure that it should either).
With the possibility that modules could be run in a threaded
environment, it seems important to have something to safeguard against
race conditions. For example, Jack runs on Jetty and Simple using
multiple threads with shared memory. Such a topic often sparks
responses suggesting that something like Erlang's messaging would be
nice, but to consider that seriously I think we would need a real
proposal, and even then I am afraid it would be quite onerous to require
that something like Jack and Persevere be rewritten to follow a
message-passing model, especially while preserving performance and
scalability. I think for where we are right now, we have to consider the
concurrency model an unknown, and provide a way for modules to
safeguard themselves from race conditions.

Anyway, here is a real simple, rough start to get going, based on some
of Java's stuff. I don't really care if it changes, I just want
something to be able to write cross engine middleware that is
thread-safe (or have someone explain how I can do it with what we have
right now).

require("concurrency").Lock -> constructor for a reentrant mutual
exclusion lock.
require("concurrency").SharedLock -> constructor for a reentrant shared
lock.
require("concurrency").ThreadLocal -> constructor for a thread-local
variable.

var lock = new (require("concurrency").Lock)();
lock.lock() -> acquires the lock
lock.unlock() -> releases the lock
lock.tryLock() -> attempts to acquire the lock without blocking
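To make the proposed surface concrete, here is a minimal single-threaded sketch of it in plain JS. This is illustrative only: the holdCount bookkeeping models reentrancy, but a real implementation would need engine support to actually block other threads.

```javascript
// Illustrative single-threaded model of the proposed Lock API.
// In a real MT engine, lock() would block; here it only tracks reentrancy.
function Lock() {
    this.holdCount = 0;
}
Lock.prototype.lock = function () {
    this.holdCount++; // reentrant: the same "thread" may lock repeatedly
};
Lock.prototype.tryLock = function () {
    this.lock();
    return true; // single-threaded model: acquisition always succeeds
};
Lock.prototype.unlock = function () {
    if (this.holdCount === 0) {
        throw new Error("unlock() without a matching lock()");
    }
    this.holdCount--;
};

var lock = new Lock();
lock.lock();
lock.lock();   // reentrant acquire
lock.unlock();
lock.unlock(); // holdCount back to 0
```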

var threadLocal = new (require("concurrency").ThreadLocal)();
threadLocal.get() -> get the value for the current thread
threadLocal.get(value) -> set the value for the current thread

And we could consider something like Rhino's "sync" function.
Thanks,
Kris

Wes Garland

Sep 21, 2009, 1:20:27 PM
to comm...@googlegroups.com
Hi, Kris;

Writing programs using multiple threads of JavaScript, what I call "MT JS", and JS in a multi-threaded environment (thread-safe JS), are two quite different things.  I know you know this; I'm just setting the stage for further discussion.

What you're talking about is "MT JS", where two separate threads of JavaScript share access to the same JavaScript objects.

This is possible in SpiderMonkey (caveat reader: bugs in 1.8+), and I guess it must also be possible in Rhino?  I'm not sure it is in JSCore or V8; I would love to hear from the peanut gallery.

Do you have a syntax for creating Threads from JS, or are you creating new threads of JS from Java and then allowing access to the objects "from the back"?  If you're planning to create Threads from JS, I think you should consider the API that GPSEE uses: http://www.page.ca/~wes/opensource/gpsee/docs/modules/symbols/Thread.Thread.html

I don't yet have a locking API -- I haven't needed one for my simple cases; jsapi serializes property accesses, which is enough.  But your mutex API looks basic and good: lock, unlock, tryLock. Perfecto.

I think the litmus test for "is the low-level lock API good enough" is -- can we write code like this on top of it?

var myLock = new (require("concurrency").Lock)();
withLock(myLock, function() { doWork(); });

I believe the answer is yes:

function withLock(lock, fn)
{
   lock.lock();
   try
   {
     fn();
   }
   finally
   {
     lock.unlock();  // runs whether fn() returns normally or throws
   }
}


require("concurrency").Lock -> constructor for a reentrant mutual
exclusion lock.
require("concurrency").SharedLock -> constructor for a reentrant shared
lock.

Why do we need shared locks, and how do you see those working?  These are just member read locks?  If so, how are updates handled?  Also, I believe SharedLocks could be implemented in pure JS from exclusive locks, meaning it may not be necessary to spec these now.
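A sketch of that reduction, building a shared (read/write) lock out of exclusive locks only. The Lock stub and the lockShared/lockExclusive names are illustrative, not from any proposal; the classic trick is a mutex guarding a reader count, with the first reader taking the write lock on behalf of all readers.

```javascript
// Sketch: a shared lock built from exclusive locks only.
// The Lock stub is a single-threaded stand-in for the proposed primitive.
function Lock() { this.held = 0; }
Lock.prototype.lock = function () { this.held++; };
Lock.prototype.unlock = function () { this.held--; };

function SharedLock() {
    this.mutex = new Lock();     // guards the reader count
    this.writeLock = new Lock(); // held by a writer, or by the first reader
    this.readers = 0;
}
SharedLock.prototype.lockShared = function () {
    this.mutex.lock();
    if (++this.readers === 1) {
        this.writeLock.lock();   // first reader blocks out writers
    }
    this.mutex.unlock();
};
SharedLock.prototype.unlockShared = function () {
    this.mutex.lock();
    if (--this.readers === 0) {
        this.writeLock.unlock(); // last reader lets writers in
    }
    this.mutex.unlock();
};
SharedLock.prototype.lockExclusive = function () { this.writeLock.lock(); };
SharedLock.prototype.unlockExclusive = function () { this.writeLock.unlock(); };
```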
 
require("concurrency").ThreadLocal -> constructor for a thread-local
variable.

Instantiating real thread-local variables requires modification of the JavaScript interpreter.   I do not see that happening.  I think what you're going after should be called Thread Local Storage, which is totally doable.

var threadLocal  = new (require("concurrency").ThreadLocal)();
threadLocal.get() -> get the value for the current thread
threadLocal.get(value) -> set the value for the current thread

I take it you meant "set" in that last line.
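Thread Local Storage in that spirit is just a map keyed by a thread id. A sketch, where currentThreadId() is hypothetical (a real host would supply it; here it returns a constant, modeling a single thread):

```javascript
// Sketch: Thread Local Storage as a plain map keyed by a thread id.
// currentThreadId() is hypothetical -- a threaded host would provide it.
function currentThreadId() { return 0; }

function ThreadLocal() {
    this.values = {}; // thread id -> value
}
ThreadLocal.prototype.get = function () {
    return this.values[currentThreadId()];
};
ThreadLocal.prototype.set = function (value) {
    this.values[currentThreadId()] = value;
};
```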

I was just about to propose a CAS addition to this spec, but I just realized that it's not possible with generic JS types. The syntax I considered was

v = require("concurrency").compareAndSwap(v, old, new);

The problem is that there is no lockless mechanism for assigning to v, as JS is pass-by-value. Unless..

if (!require("concurrency").compareAndSwap(scope, "v", old, new))
  print("failure");

I still believe my CAS idea is a bad one: just throwing it out there to keep others from going down the same path.
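The scope-and-name form is at least expressible in plain JS, which also shows why it falls short: nothing in the language makes the compare and the assignment an indivisible step. A sketch:

```javascript
// Sketch of compareAndSwap over a scope object and property name.
// NOTE: not actually atomic -- plain JS gives no way to make the
// compare and the assignment a single indivisible operation.
function compareAndSwap(scope, name, oldValue, newValue) {
    if (scope[name] === oldValue) {
        scope[name] = newValue;
        return true;
    }
    return false;
}

var scope = { v: 1 };
compareAndSwap(scope, "v", 1, 2); // succeeds: scope.v becomes 2
compareAndSwap(scope, "v", 1, 3); // fails: scope.v is no longer 1
```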

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Kris Zyp

Sep 21, 2009, 1:34:01 PM
to comm...@googlegroups.com

Wes Garland wrote:
> Hi, Kris;
>
> Writing programs using multiple threads of JavaScript, what I call "MT
> JS" and JS in a multi-threaded environment (thread-safe JS) are two
> quite different things. I know you know this, I'm just setting the
> stage for further discussions.
>
> What you're talking about is "MT JS", where two separate threads of
> JavaScript share access to the same JavaScript objects.
>
> This is possible in SpiderMonkey (caveat reader: bugs in 1.8+), I
> guess it must also be possible in Rhino? I'm not sure it is in JSCore
> or V8, would love to hear from the peanut gallery.
>
> Do you have a syntax for creating Threads from JS, or are you creating
> new threads of JS from Java and then allowing access to the objects
> "from the back"? If you're planning to create Threads from JS, I
> think you should consider the API that GPSEE uses:
> http://www.page.ca/~wes/opensource/gpsee/docs/modules/symbols/Thread.Thread.html

The immediate situation where I was thinking about this was in Jack and
Persevere, where the threads are created by the servlet engine. The JS
isn't actually creating the threads, but must deal with the fact that
it may be executed in multiple threads. Of course, being able to actually
create the threads in JS would be good as well. Your API looks good to
me. I know Rhino's shell also has a "spawn" function (it simply executes
the callback parameter in a separate thread).


>
> I don't yet have a locking API -- I haven't needed one for my simple
> cases; jsapi serializes property accesses, which is enough. But your
> mutex API looks basic and good: lock, unlock, tryLock. Perfecto.
>
> I think the litmus test for "is the low-level lock API good enough"
> is -- can we write code like this on top of it?
>
> var myLock = new (require("concurrency").Lock)();
> withLock(myLock, function() { doWork(); });
>
> I believe the answer is yes:
>
> function withLock(lock, fn)
> {
>    lock.lock();
>    try
>    {
>      fn();
>    }
>    finally
>    {
>      lock.unlock();
>    }
> }
>
>

Yup.


>
> require("concurrency").Lock -> constructor for a reentrant mutual
> exclusion lock.
> require("concurrency").SharedLock -> constructor for a reentrant
> shared
> lock.
>
>
> Why do we need shared locks, and how do you see those working? These
> are just member read locks? If so, how are updates handled? Also, I
> believe SharedLocks could be implemented in pure JS from exclusive
> locks, meaning it may not be necessary to spec these now.

There are times when shared locks (which can be acquired in shared/read
or exclusive/write mode) are helpful, but you are right, they can be
implemented with simple locks. Just thought I would throw it out there.

>
>
> require("concurrency").ThreadLocal -> constructor for a thread-local
> variable.
>
>
> Instantiating real thread-local variables requires modification of the
> JavaScript interpreter. I do not see that happening. I think what
> you're going after should be called Thread Local Storage, which is
> totally doable.
>
> var threadLocal = new (require("concurrency").ThreadLocal)();
> threadLocal.get() -> get the value for the current thread
> threadLocal.get(value) -> set the value for the current thread
>
>
> I take it you meant "set" in that last line.
>

yeah.
Kris

Neville Burnell

Sep 21, 2009, 5:39:30 PM
to CommonJS
I'd like to also support implicit unlocking, something like:

require("concurrency").Lock(function() {
    // executes within lock/unlock boundaries
});

Neville Burnell

Sep 21, 2009, 5:56:15 PM
to CommonJS
ugh, that would be more like:

var lock = new (require("concurrency").Lock)();

lock.do(function(){
}); -> acquires the lock, calls func, unlocks
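A minimal sketch of that shape, layered on plain lock/unlock (the Lock stub here is a single-threaded stand-in, not part of any proposal):

```javascript
// Sketch of the implicit-unlock form: acquire, call, always release.
function Lock() { this.held = 0; }
Lock.prototype.lock = function () { this.held++; };
Lock.prototype.unlock = function () { this.held--; };
Lock.prototype.do = function (fn) {
    this.lock();
    try {
        return fn();   // pass fn's result through
    } finally {
        this.unlock(); // unlock even if fn throws
    }
};
```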

Neville Burnell

Sep 21, 2009, 6:03:52 PM
to CommonJS
which is what Wes described also .... I need coffee 8-)

ihab...@gmail.com

Sep 21, 2009, 6:07:56 PM
to comm...@googlegroups.com
Hi folks,

On Mon, Sep 21, 2009 at 9:12 AM, Kris Zyp <kri...@gmail.com> wrote:
> I was wondering if we could consider a concurrency module to help with
> some basic concurrency constructs, in particular for working in a shared
> memory threaded environment. ... I didn't think that CommonJS was
> attempting to force a particular concurrency model (and I am not sure that
> it should either).

To the extent that individual modules need to defend the consistency of
their state, they must assume *some* concurrency model. If a given
CommonJS module can be used in both threaded and non-threaded systems,
it must have a "worst case" to fall back onto. This worst case might
on the face of it be to build an implementation, and expose an API,
that includes explicit locking and thread-aware usage patterns. Hence
leaving the question open has forced everyone into coding for threads.
Now, how will the threaded primitives be supported on a non-threaded
system?

Does this mean we end up with 3 "flavors" of modules: (a)
non-threaded; (b) threaded; and (c) purely functional?

Perhaps I should ask this from another angle: JS programming has thus
far proceeded via event loop concurrency, and it seems like a
direction that is far less error-prone than _ad hoc_ threads and
locking. What is wrong with maintaining that?

Ihab

--
Ihab A.B. Awad, Palo Alto, CA

ihab...@gmail.com

Sep 21, 2009, 6:21:50 PM
to comm...@googlegroups.com
By way of a concrete proposal here --

What I *do* think CommonJS should standardize is an API for building
and living within "workers", each of which is single threaded, and
which pass data to one another asynchronously. This should be similar
to Gears and HTML5 workers, and should be informed by those efforts.

Ihab

Wes Garland

Sep 21, 2009, 11:07:30 PM
to comm...@googlegroups.com
Ihab:

I, too, think that JS workers are a good idea. There is another use-case to consider, however.

If an existing MT application wants to run multiple threads of JavaScript: how do we handle this?

Individual property access is not difficult; SpiderMonkey (and I assume Rhino) offer serialized property access.

The tricky part is maintaining atomic updates across multiple properties of the same object. This really is tricky, and we'd need to "stamp" modules MT-safe if we wanted to support this in CommonJS. This means that we need a theory of thread-safety for modules which purport to be MT safe, and it should be consistent everywhere.

I think the following rules may be sufficient for this:
 - Underlying JavaScript engine provides serialized property access
 - require() is guarded by a mutual exclusion lock across the entire JS runtime (all threads)
 - modules are responsible for their own theory of thread safety for non-exported functions and variables, and can assume the presence of a level-0 concurrency module
 - exported functions which accept non-Atom arguments must not modify those arguments
 - exported functions which return non-Atoms must return new non-Atoms
 - Boolean, Number and String are considered Atoms.
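One way to read the last three rules in code. This is a hypothetical module body (all names illustrative; `mod` stands in for a module's exports object); the copying is the point: arguments are never mutated and internals are never handed out.

```javascript
// Hypothetical module obeying the "don't modify arguments" and
// "return new non-Atoms" rules.
var registry = []; // module-private state, guarded by the module's discipline

var mod = {}; // stand-in for the module's exports object
mod.add = function (entry) {
    // keep a copy of the incoming non-Atom, never a shared reference
    registry.push({ name: String(entry.name), value: entry.value });
};
mod.list = function () {
    // hand back a fresh array of fresh objects, never `registry` itself
    return registry.map(function (e) {
        return { name: e.name, value: e.value };
    });
};
```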

Note also that I believe very strongly that keeping the theory of thread safety absolutely minimal is paramount.

Wes Garland

Sep 21, 2009, 11:10:44 PM
to comm...@googlegroups.com
> If an existing MT application wants to run multiple threads of JavaScript: how do we handle this?

Forgot to clarify this -- recall that CommonJS isn't necessarily about writing programs in JS; it's also about augmenting existing applications with JavaScript.  One such example is the Apache 2 web server running the threaded MPM. What if we wanted to write a JavaScript database API which pools connections?

Kris Zyp

Sep 22, 2009, 1:16:39 AM
to comm...@googlegroups.com
I think this is indeed a reasonable alternative (and I appreciate the
concrete proposal, I most wanted to avoid indefinitely deferring a
specification because some other model might be better without any real
solid way to move forward). A few questions/thoughts (not necessarily in
any coherent argument):

First, this is a real uphill battle. CommonJS has barely started and we
have already been herded into shared-memory threading with Jack and
others, due to the underlying concurrency models of the JVM and C, and
the path of least resistance of building on Jetty, Simple, etc. Keeping
shared-nothing concurrency (with event loops) wouldn't be a bad path,
but it'll be non-trivial to stay on it.

Second, it seems like CommonJS has had a fairly low-level focus so far.
Naturally the low-level aspect of concurrency is shared-memory threading
(that which is exposed by the hardware and what the engines must deal
with in concurrent situations). Even if we want everyone to be doing
shared-nothing architecture, should we still define concurrency
constructs like thread constructors, thread-locals, and locks, in order
to be able to compose shared-nothing event loops in JavaScript using
the lower-level constructs? I suppose if we expose these things, we
can't guarantee that everyone will always build on the higher-level
event loops.

Also, I wanted to actually think through what event looping would look
like in one of our primary target applications, that is, the web server.
Of course a web server needs to be able to process requests concurrently.
If we are using event loops with nothing shared between threads, we need
separate environments for each thread (and a unique JSGI app created for
each one). This introduces some funky non-determinism. With just a few
requests coming into something like Jetty, you could very well have
every request fulfilled by the same JSGI app, since every request might
finish before the next came in. A programmer may (perhaps inadvertently)
be storing some state in a per-app variable (in a closure inside the
middleware, outside the app), and expect it to be available to all
requests. It is not until requests come in at a fast enough pace to
force multiple threads to handle requests that another JSGI app becomes
utilized, and that variable is no longer shared across all requests.

Also, if the inter-worker communication is asynchronous (which I presume
is preferable, to avoid deadlocks), it means that any kind of information
sharing between workers will force us into the added complexity of
callbacks or promises (preferably promises). Something as simple as
server-side session management would seem to entangle us in fairly
complex asynchronous code.
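To make that concrete, even a trivial "read a session value" becomes callback-shaped once the session store lives behind an async boundary. A sketch (getSession() and the then-style promise are assumed, not specified anywhere; the promise resolves immediately here to stand in for an async message to a session-owning worker):

```javascript
// Minimal then-able stand-in for a promise that is already resolved.
function immediatePromise(value) {
    return { then: function (cb) { return immediatePromise(cb(value)); } };
}

// Hypothetical async session lookup, as if messaging a session worker.
function getSession(id) {
    return immediatePromise({ user: "alice", visits: 3 });
}

// Synchronous style would just be: session.visits + 1.
// Async style nests every access in a callback:
var result;
getSession("abc").then(function (session) {
    result = session.visits + 1;
});
```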

That being said, this is appealing to me if others think we can really
make it go. The fact that there is an existing API for this in HTML5
could perhaps allow us to sidestep a lot of bike shedding. If we think
it is desirable, maybe we could make an event-loop based fork of Jack to
try it out. I might do it sometime, unless someone else jumps on it.
Thanks,
Kris

Ryan Dahl

Sep 22, 2009, 6:07:46 AM
to comm...@googlegroups.com
I agree with Ihab - it would be nice if CommonJS defined APIs for a
single process/worker and did not enter the realm of threads. JavaScript
does not have threads - there is no proposal to have threads.
While some/most implementations can handle threads, it seems rather
unnecessary to leave behind ES5 and HTML5 just for lack of
imagination. Concurrency should be done either with an event loop
(which requires much more devotion to asynchronous APIs) or with
workers.

Node.js achieves high concurrency on a single event loop. Here is a
"hello world" web server benchmark I ran yesterday for a talk:
http://s3.amazonaws.com/four.livejournal/20090921/bench.png
There is no need to define extensions to JavaScript (threads) to write
servers - indeed, evidence suggests that doing so damns the server to
suck.

Hannes Wallnoefer

Sep 22, 2009, 6:53:37 AM
to comm...@googlegroups.com
2009/9/22 Ryan Dahl <coldre...@gmail.com>:
>
> I agree with Ihab - it would be nice if CommonJS defined APIs for a
> single process/worker and did not enter the realm of threads. JavaScript
> does not have threads - there is no proposal to have threads.
> While some/most implementations can handle threads, it seems rather
> unnecessary to leave behind ES5 and HTML5 just for lack of
> imagination. Concurrency should be done either with an event loop
> (which requires much more devotion to asynchronous APIs) or with
> workers.

To me the threads model feels natural, but I know that's just because
I've banged my head on it for long enough. So I'm all for considering
fresh (or, as is the case for the event loop, refurbished) concepts.

> Node.js achieves high concurrency on a single event loop. Here is a
> "hello world" web server benchmark I ran yesterday for a talk:
> http://s3.amazonaws.com/four.livejournal/20090921/bench.png

Out of curiosity, is that Helma 1 or NG? Is there a blog post or
presentation for that talk?

Hannes

Neville Burnell

Sep 22, 2009, 7:03:46 AM
to CommonJS
> Out of curiosity, is that Helma 1 or NG? Is there a blog post or
> presentation for that talk?

I'm interested in the detail also.

Ryan Dahl

Sep 22, 2009, 7:08:02 AM
to comm...@googlegroups.com
On Tue, Sep 22, 2009 at 12:53 PM, Hannes Wallnoefer <han...@gmail.com> wrote:
>
>> Node.js achieves high concurrency on a single event loop. Here is a
>> "hello world" web server benchmark I ran yesterday for a talk:
>> http://s3.amazonaws.com/four.livejournal/20090921/bench.png
>
> Out of curiosity, is that Helma 1 or NG? Is there a blog post or
> presentation for that talk?

That's helma-1.6.3 using this http://helma.pastebin.com/f1d786c9 to
serve the request. (v8cgi 0.6, narwhal d147c160, node 0.1.11.) I know
it isn't fair to present benchmarks this way, without scripts,
methods, or data. I'm just being lazy. I did a more detailed benchmark
a few weeks ago (without helma)
http://four.livejournal.com/1019177.html and the setup is similar.

The talk was just a short lightning presentation:
http://s3.amazonaws.com/four.livejournal/20090922/webmontag.pdf

Kris Zyp

Sep 22, 2009, 9:06:13 AM
to comm...@googlegroups.com
Do you plan on making Node JSGI-compliant? It does not look like
there is any conflict between the JSGI API and your fundamental
processing model. Or would one need to run something like Jack on top of
Node to utilize cross-server applications? I was curious what you meant
by "Narwhal": Narwhal's http module is an HTTP client, not a server
(maybe that's why it scored low!). Or did you mean Jack? And if so, which
engine was it running on? Having such low numbers with Simple would seem
suspect, since Simple has such high performance characteristics with
fairly little overhead added by Jack (or maybe there is something
expensive in there that I missed).
Kris

Ryan Dahl

Sep 22, 2009, 9:40:19 AM
to comm...@googlegroups.com
On Tue, Sep 22, 2009 at 3:06 PM, Kris Zyp <kri...@gmail.com> wrote:
>
> Do you plan on making Node be JSGI compliant?

JSGI compliant - probably not. At least partially CommonJS compliant -
probably. At the moment I'm letting node evolve on its own.

> It does not look like
> there is any conflict between the JSGI API and your fundamental
> processing model. Or would one need to run something like Jack on top of
> Node to utilize cross-server applications?

I don't like that JSGI forces one to determine the response code and
response headers in a single function. One cannot, for example, access
a database before returning the status code. Maybe I want to 404 if a
record does not exist. Well, one can do this if your database call
blocks execution inside the Jack handler, but blocking implies
allocating a new execution stack with, if not threads, then
coroutines. Allocating a 2MB execution stack for each request is
rather an unacceptable sacrifice to be had at the API level, even
before an implementation is around to further slow down the server.

So, perhaps Node will have proper coroutines at some point (the
current implementation is rather bad). And perhaps someone will write
a JSGI interface on top of Node's more general HTTP interface, and
then people can access databases and wait() for the promise to
complete, and return 404 from the same function. This is not ruled
out.

> I was curious what you meant
> by "Narwhal". Narwhal's http module is an HTTP client, not a server
> (maybe that's why it scored low!). Or did you mean Jack? And if so, which
> engine was it running on? Having such low numbers with Simple would seem
> suspect, since Simple has such high performance characteristics with
> fairly little overhead added by Jack (or maybe there is something
> expensive in there that I missed).

The exact code being used was the same as described in
http://four.livejournal.com/1018026.html (which is a different link
than the one I gave above).

Kris Zyp

Sep 22, 2009, 10:06:33 AM
to comm...@googlegroups.com

Ryan Dahl wrote:
> On Tue, Sep 22, 2009 at 3:06 PM, Kris Zyp <kri...@gmail.com> wrote:
>
>> Do you plan on making Node be JSGI compliant?
>>
>
> JSGI compliant - probably not. At least partially CommonJS compliant -
> probably. At the moment I'm letting node evolve on its own.
>

Oh, you aren't the one responsible for node?


>
>> It does not look like
>> there is any conflict between the JSGI API and your fundamental
>> processing model. Or would one need to run something like Jack on top of
>> Node to utilize cross-server applications?
>>
>
> I don't like that JSGI forces one to determine the response code and
> response headers in a single function. One cannot for example, access
> a database before returning the status code. Maybe I want to 404 if a
> record does not exist. Well, one can do this if your database call
> blocks execution inside the jack handler, but blocking implies
> allocating a new execution stack with, if not threads, then
> coroutines. Allocating a 2mb execution stack for each request is
> rather an unacceptable sacrifice to be had at the API level, even
> before an implementation is around to further slow down the server.
>

This is perfectly doable with the JSGI model plus the promises we have
discussed, as long as the JSGI server does not grab the status from the
response object until it needs to. Here is an example JSGI app with
async database interaction, completely in line with your event loop
model, no threading or coroutines needed (although I wouldn't mind some
coroutine sugar a la JS 1.7 generators, it's not necessary):

app = function databaseApp(env) {
    var resultPromise = doAsyncDatabase("select * from table where id = 3");
    var response = {
        status: 200, // the JSGI server should not write the status until the first body write
        headers: {},
        body: {forEach: function (write) {
            return resultPromise.then(function (resultSet) {
                if (resultSet.length == 0) {
                    response.status = 404; // change the status before the forEach promise is fulfilled
                } else {
                    write(JSON.stringify(resultSet[0]));
                }
            });
        }}
    };
    return response;
};

> So, perhaps Node will have proper coroutines at some point (the
> current implementation is rather bad). And perhaps someone will write
> a JSGI interface on top of Node's more general HTTP interface, and
> then people can access databases and wait() for the promise to
> complete, and return 404 from the same function. This is not ruled
> out.
>

Yeah, I guess we could do that, although hopefully it is apparent that
JSGI + promises API would not actually hinder Node (and it would be cool
to be able to run JSGI apps directly on it).
Kris

Wes Garland

Sep 22, 2009, 10:09:08 AM
to comm...@googlegroups.com
> Allocating a 2mb execution stack for each request is
> rather an unacceptable sacrifice to be had at the API level, even
> before an implementation is around to further slow down the server.

It also seems about 1.8MB too heavy, but I suppose that's neither here nor there. Are Rhino contexts REALLY that big, though?

Ryan Dahl

Sep 22, 2009, 10:52:54 AM
to comm...@googlegroups.com
On Tue, Sep 22, 2009 at 4:06 PM, Kris Zyp <kri...@gmail.com> wrote:
>
> This is perfectly doable with the JSGI model plus promises that we have
> discussed, as long as the JSGI server does not grab the status from the
> response object until it needs to.

Okay - I stand corrected. That's an important feature!

> app = function databaseApp(env) {
>     var resultPromise = doAsyncDatabase("select * from table where id = 3");
>     var response = {
>         status: 200, // the JSGI server should not write the status until the first body write
>         headers: {},
>         body: {forEach: function (write) {
>             return resultPromise.then(function (resultSet) {
>                 if (resultSet.length == 0) {
>                     response.status = 404; // change the status before the forEach promise is fulfilled
>                 } else {
>                     write(JSON.stringify(resultSet[0]));
>                 }
>             });
>         }}
>     };
>     return response;
> };

Does seem rather convoluted, doesn't it? I would still rather see
something like:

server.addListener("request", function (request, response) {
    var resultPromise = doAsyncDatabase("select * from table where id = 3");
    resultPromise.addCallback(function (resultSet) {
        if (resultSet.length == 0) {
            response.sendHeader(404, {});
        } else {
            response.sendHeader(200, {});
            response.sendBody(JSON.stringify(resultSet));
        }
        response.finish();
    });
});

Kris Zyp

Sep 22, 2009, 10:58:38 AM
to comm...@googlegroups.com

I agree that the JSGI/promise approach is more awkward than your
example, but I think the philosophy is that JSGI allows the simple
case (a basic sync function) to be very simple, and you can deal with
extra complexity as needed. In that sense, I think JSGI strikes a great
balance, making the common use cases very easy to code while still
keeping it possible to do more advanced async processing.
Kris

Kris Zyp

Sep 22, 2009, 11:16:11 AM
to comm...@googlegroups.com

Ryan Dahl wrote:
>
> Does seem rather convoluted, doesn't it? I would still rather see
> something like:
>
> server.addListener("request", function (request, response) {
>     var resultPromise = doAsyncDatabase("select * from table where id = 3");
>     resultPromise.addCallback(function (resultSet) {
>         if (resultSet.length == 0) {
>             response.sendHeader(404, {});
>         } else {
>             response.sendHeader(200, {});
>             response.sendBody(JSON.stringify(resultSet));
>         }
>         response.finish();
>     });
> });
>

Another thing we could consider is allowing promises to be returned
directly from the JSGI app, rather than just from the forEach call as I
demonstrated. Returning a promise from a JSGI app would look like:

app = function databaseApp(env) {
    return doAsyncDatabase("select * from table where id = 3")
        .then(function (resultSet) {
            if (resultSet.length == 0) {
                return {status: 404, headers: {}, body: ""};
            } else {
                return {
                    status: 200,
                    headers: {},
                    body: JSON.stringify(resultSet[0])
                };
            }
        });
};

That reads a lot nicer, but it would put an extra burden on servers and
middleware to have to look for promises in two different places (you
still need promises from forEach in order to facilitate streamed responses).
Kris

Kevin Dangoor

Sep 22, 2009, 11:19:57 AM
to comm...@googlegroups.com
On Tue, Sep 22, 2009 at 10:52 AM, Ryan Dahl <coldre...@gmail.com> wrote:
Does seem rather convoluted, doesn't it? I would still rather see
something like:

 server.addListener("request", function (request, response) {
     var resultPromise = doAsyncDatabase("select * from table where id = 3");
     resultPromise.addCallback(function (resultSet) {
          if (resultSet.length == 0) {
              response.sendHeader(404, {});
          } else {
              response.sendHeader(200, {});
              response.sendBody(JSON.stringify(resultSet));
          }
          response.finish();
     });
 });


It's worth noting that most app developers will not be coding to "raw JSGI". You could make an adapter to JSGI+Promises that looks like that, right?
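Such an adapter seems feasible. A rough sketch: wrap a Node-style (request, response) handler and yield a JSGI-shaped app that returns a promise for the response. Every name here (deferred(), sendHeader, sendBody, finish) is illustrative, not from any spec.

```javascript
// Minimal one-shot deferred: resolve() fulfills, then() observes.
// (Not chainable; just enough to model the adapter.)
function deferred() {
    var cbs = [], value, done = false;
    return {
        resolve: function (v) {
            done = true; value = v;
            cbs.forEach(function (cb) { cb(v); });
        },
        then: function (cb) {
            done ? cb(value) : cbs.push(cb);
        }
    };
}

// Adapt a listener-style handler into a JSGI-shaped app.
function adaptListener(handler) {
    return function jsgiApp(env) {
        var d = deferred();
        var out = { status: 200, headers: {}, body: [] };
        handler(env, {
            sendHeader: function (status, headers) { out.status = status; out.headers = headers; },
            sendBody: function (chunk) { out.body.push(chunk); },
            finish: function () { d.resolve(out); } // response complete
        });
        return d; // the JSGI server would wait on this promise
    };
}
```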

Kevin

--
Kevin Dangoor

work: http://labs.mozilla.com/
email: k...@blazingthings.com
blog: http://www.BlueSkyOnMars.com

Kevin Dangoor

Sep 22, 2009, 11:21:24 AM
to comm...@googlegroups.com
On Tue, Sep 22, 2009 at 11:16 AM, Kris Zyp <kri...@gmail.com> wrote:
That reads a lot nicer, but it would put extra burden on servers and
middleware to have to look for promises at two different places (you
still need promises from forEach in order to facilitate streamed responses).


Yeah, that is a heavy burden. The Python folks apparently had a lot of discussion about async WSGI and they were talking about "async-aware" middleware. A solution that doesn't require every piece of the stack to have to inspect the return value is a big win.

Kevin

Tom Robinson

unread,
Sep 22, 2009, 2:50:08 PM9/22/09
to comm...@googlegroups.com
I suspect doing this would break a lot of middleware that looks at the
response. But then again, the whole idea of JSGI middleware won't work
at all with a purely async interface like Node's, AFAIK.

The only thing I'd worry about is tricky unexpected behaviors, but I
suppose that can mostly be solved through documentation (listing which
middleware is partially or fully async compliant).

Mike Wilson

Sep 23, 2009, 7:19:43 AM
to comm...@googlegroups.com
Kris Zyp wrote:
> Something as simple as a
> server-side session management would seem to entangle us into fairly
> complex asynchronous code.

Yes, this is what I was also thinking about in a discussion a
couple of weeks ago when we touched on threading/runtimes.

Theoretically, I'm all for the shared-nothing approach and
would like to see that materialize in CommonJS. Though I
haven't had any personal revelation on how this would be
used in a web server that wants simultaneous requests to
share data "from RAM", i.e. application or session data, in
an unobtrusive way...
Other script languages often use the database for this data
sharing, but it sure would be nice to have a solution that
scales to "live" objects as well.

Best regards
Mike Wilson

ihab...@gmail.com

Sep 23, 2009, 11:04:07 AM
to comm...@googlegroups.com
On Wed, Sep 23, 2009 at 4:19 AM, Mike Wilson <mik...@hotmail.com> wrote:
> ... I

> haven't had any personal revelation on how this would be
> used in a web server that wants simultaneous requests to
> share data "from RAM", ie application or session data, in
> an unobtrusive way...
> Other script languages often use the database for this data
> sharing, but it sure would be nice to have a solution that
> scales to "live" objects as well.

The "live" objects could live in a shared worker, and each worker
servicing an independent HTTP request could send that worker
asynchronous messages. That shared worker now behaves exactly like the
"database" you describe.

Mike Wilson

unread,
Sep 23, 2009, 12:23:33 PM9/23/09
to comm...@googlegroups.com
Yes, I realize it can be done that way. The key was
"unobtrusive", and the async patterns required by a
worker solution can get a bit verbose. It all depends
on how well you can factor out the session access I
guess.

Anyway, if there is general agreement that the typical
stateful webapp can be nicely designed with this
pattern, then let's go for it!

Best regards
Mike

ihab.awad wrote:

Donny Viszneki

unread,
Sep 23, 2009, 12:40:52 PM9/23/09
to comm...@googlegroups.com
On Mon, Sep 21, 2009 at 12:12 PM, Kris Zyp <kri...@gmail.com> wrote:
> require("concurrency").ThreadLocal -> constructor for a thread-local
> variable.

Assuming that a newly created thread has some kind of entry point at
which it starts executing, and has its own lexical scope... what
is the purpose of thread-local memory? Thread-local memory is a
technology for systems programming languages.

http://en.wikipedia.org/wiki/Thread-local_storage

--
http://codebad.com/

ihab...@gmail.com

unread,
Sep 23, 2009, 2:39:41 PM9/23/09
to comm...@googlegroups.com
Hi Wes,

On Mon, Sep 21, 2009 at 8:07 PM, Wes Garland <w...@page.ca> wrote:
> The tricky part is maintaining atomic updates across multiple properties of
> the same object.

Not just the *same* object -- *across* multiple objects as well. In
other words, for some subgraph S of objects, we need to be concerned
about whether a temporarily inconsistent state of this subgraph is (a)
observable; and (b) usable by other objects as a way to subvert that
subgraph's future attempts at regaining consistency.

The "worker" model offers a pretty good compromise. Each worker
processes one event at a time. While a worker is processing an event,
its internal state may be inconsistent. However, it does not serve any
other events, so this inconsistency is neither (a) observable nor (b)
subvertible. The programmer in charge of the worker should arrange to
reestablish consistency at the end of each event (even if by a Boolean
flag that says "inconsistent" and causes service requests to queue up
or be rejected).
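
[Editor's note: the queue-until-consistent idea above can be sketched in a few lines of plain JavaScript. This is an illustration, not code from the thread; the names (SessionWorker, beginUpdate, endUpdate) are invented, and a real worker would receive events via postMessage rather than direct calls.]

```javascript
// Single-threaded sketch: while a multi-step update is in flight (state is
// "inconsistent"), incoming events queue instead of observing partial state.
function SessionWorker() {
    this.hits = 0;
    this.inconsistent = false; // true while a multi-step update is in flight
    this.pending = [];         // events deferred until consistency returns
}
SessionWorker.prototype.handle = function (event) {
    if (this.inconsistent) {
        this.pending.push(event); // defer rather than see a half-updated state
        return;
    }
    this.hits++;
};
// The worker's own code brackets any inconsistent region with these calls.
SessionWorker.prototype.beginUpdate = function () {
    this.inconsistent = true;
};
SessionWorker.prototype.endUpdate = function () {
    this.inconsistent = false;
    while (this.pending.length) {
        this.handle(this.pending.shift()); // drain deferred events in order
    }
};

var w = new SessionWorker();
w.handle({});  // processed immediately
w.beginUpdate();
w.handle({});  // deferred: the worker is mid-update
w.endUpdate(); // consistency restored; the deferred event now runs
```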

> I think the following rules may be sufficient for this:
>  - Underlying JavaScript engine provides serialized property access
>  - require() is guarded by a mutual exclusion lock across the entire JS
> runtime (all threads)
>  - modules are responsible for their own theory of thread safety for
> non-exported functions and variables, and can assume the presence of a
> level-0 concurrency module
>  - exported functions which accept non-Atom arguments must not modify those
> arguments
>  - exported functions which return non-Atoms must return new non-Atoms
>  - Boolean, Number and String are considered Atoms.

I don't know if these rules would work, but they do seem to suggest
that each *module* lives in its own threading realm of sorts, exposing
a simplified interface to other modules. The worker model keeps the
two concerns largely independent: multiple modules may live within one
worker and communicate in a simple, synchronous manner, or they may
spawn other workers and communicate asynchronously with these workers.

Donny Viszneki

unread,
Sep 23, 2009, 3:16:52 PM9/23/09
to comm...@googlegroups.com
On Wed, Sep 23, 2009 at 2:39 PM, <ihab...@gmail.com> wrote:
> On Mon, Sep 21, 2009 at 8:07 PM, Wes Garland <w...@page.ca> wrote:
>> The tricky part is maintaining atomic updates across multiple properties of
>> the same object.
>
> Not just the *same* object -- *across* multiple objects as well. In
> other words, for some subgraph S of objects, we need to be concerned
> about whether a temporarily inconsistent state of this subgraph is (a)
> observable; and (b) usable by other objects as a way to subvert that
> subgraph's future attempts at regaining consistency.

Do you think your concern cannot be reasonably achieved if we have
Wes's concerns granted? For clarity:

Ihab seems concerned with collisions of JavaScript logic.

Wes seems concerned with collisions of interpreter logic.

If you can be sure that accessing a particular variable concurrently
is safe, can't you then use that as a primitive for locking other
subsections of the object graph? I don't think the goal is to make it
impossible for a programmer to write poor code.
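
[Editor's note: a minimal sketch of this point, assuming the engine supplies one atomic test-and-set primitive, which plain JS does not guarantee. The Mutex name and shape are invented for illustration.]

```javascript
// Given one slot whose test-and-set is assumed atomic, a mutex guarding a
// larger subsection of the object graph can be built on top of it.
function Mutex() {
    this.held = false;
}
// The assumed-atomic primitive: returns true iff it flipped false -> true.
// In a real threaded engine these two steps must be one indivisible operation.
Mutex.prototype.tryAcquire = function () {
    if (this.held) return false;
    this.held = true;
    return true;
};
Mutex.prototype.release = function () {
    this.held = false;
};

var m = new Mutex();
var first = m.tryAcquire();  // true: lock acquired
var second = m.tryAcquire(); // false: already held
m.release();
var third = m.tryAcquire();  // true again after release
```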

> The "worker" model offers a pretty good compromise. Each worker
> processes one event at a time. While a worker is processing an event,
> its internal state may be inconsistent. However, it does not serve any
> other events, so this inconsistency is neither (a) observable nor (b)
> subvertible. The programmer in charge of the worker should arrange to
> reestablish consistency at the end of each event (even if by a Boolean
> flag that says "inconsistent" and causes service requests to queue up
> or be rejected).

Sure, but can't Workers be *easily* implemented on top of the other
concurrency features others are asking for?

The Worker API is just a multi-threading API that chops up the object
graph for each Worker, such that the only things shared are basically
synchronized message queues.

p.s. if anyone wants to tell me what the mothership that launches the
first Workers is called in Worker vernacular, you win cookies

--
http://codebad.com/

Kris Zyp

unread,
Sep 23, 2009, 3:23:10 PM9/23/09
to comm...@googlegroups.com

Mike Wilson wrote:
> Yes, I realize it can be done that way. The key was
> "unobtrusive", and the async patterns required by a
> worker solution can get a bit verbose. It all depends
> on how well you can factor out the session access I
> guess.
>
> Anyway, if there is general agreement that the typical
> stateful webapp can be nicely designed with this
> pattern, then let's go for it!
>
You can't wrap an asynchronous operation in something to make it look
like a simple sync operation. Anytime you accessed a session (if it
were a shared server-side data structure) you would be forced to put the
continuation in a callback (via promises or otherwise, with the
exception of generator-style async code). In that sense I am not sure it
would qualify as "unobtrusive".

I am actually in favor of the event loop, but I am not sure we really
want, or even can, provide a total shared-nothing architecture. The most
popular platform (at least ATM) is Rhino with access to the Java
runtime. If a platform attempts to create multiple isolated
vats/mini-processes, each with their own event loop, they can still
ultimately access shared data in the JVM from different threads. The
only way to prevent that would be to block access to Java packages
(access to Java libraries is one of the main reasons people use Rhino,
to fill in the gaps that CommonJS won't be able to cover within the next
5 years) or to run multiple Java processes (which would be horrible).

Because we can't realistically enforce a true shared-nothing
architecture in a JS environment, I don't think we should try to
eliminate every possible form of access to shared data between threads.
I believe we would be in a better position to provide a good event-loop
based environment where developers don't really need to share references
to mutable objects across threads (they can pass async messages between
vats), but can if they really want to. The default behavior is that
modules should never need to deal with thread safety (unless they are
specifically designed for that purpose). Guidelines should insist that
mutable data that is referenced between threads should not escape
modules. In particular, JSGI needs to allow middleware and apps to
communicate with another worker process through the HTML5 worker API,
such that they can communicate across concurrency boundaries without
accessing shared mutable data. Also, promises that enqueue tasks must
only be fulfilled on the initiating process.

Consequently, I believe we should still provide a *low-level*
concurrency API that defines some basic concurrency constructs. We can
then build an event loop system on top of this which could essentially
be implemented in JS using the concurrency module's functions (which
makes it easier to span engines). Other modules may also use this API
(with great caution) if they need to implement something which fits
better with the threaded shared-mutable-data model. This allows module
writers to choose the best tool for the job.

Thoughts?
Thanks,
Kris

Tom Robinson

unread,
Sep 23, 2009, 3:51:08 PM9/23/09
to comm...@googlegroups.com

I've been trying to figure out how one would efficiently implement
(for example) a JSGI server that uses multiple web workers. Workers
only allow you to send messages back and forth. Would you need to
buffer and serialize the request and response, or is there some way
you could hand off the io streams to a worker?


Donny Viszneki

unread,
Sep 23, 2009, 4:08:23 PM9/23/09
to comm...@googlegroups.com

Doesn't that depend on the Workers implementation?

--
http://codebad.com/

Kris Zyp

unread,
Sep 23, 2009, 4:18:00 PM9/23/09
to comm...@googlegroups.com

I've been thinking through this as well. This is one of the reasons why
I suggested that we don't want a completely shared-nothing architecture.
Basically, I think we want Jack to create a pool of workers, and then as
requests come in, they should be delegated to whatever worker is idle
and available (or have the request go into a pool waiting for idle
workers if none is available). Each worker has its own event loop, and
so when a request is delegated to the worker, it goes into its event
loop to be processed. For the sake of efficiency, it's important that
one can actually pass a reference to the request/response/input
stream/output stream to the workers, not just a string.

I started toying with how this would work, and this is what I was thinking:
reactor.js - This would be built up to be the event loop handler (I
think that was Kris Kowal's intention with this module). It would have a
thread-safe enqueue function that could be called from any thread, and
would add a task to the event queue (in Rhino, we would probably use a
LinkedBlockingQueue). reactor would also expose an enterEventLoop()
function that would start the processing of the queue (should only be
called by the thread for the event loop).

worker.js - This would expose a Worker constructor that would start a
new thread and a new global environment for the worker, and put a
postMessage function there that would delegate to the caller's reactor
enqueue. It would start up a new reactor in the worker thread, execute
the script provided by the argument to Worker, and then trigger
enterEventLoop().
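
[Editor's note: a single-threaded sketch of the reactor described above, added for illustration. A real implementation would use a thread-safe blocking queue (e.g. Rhino's LinkedBlockingQueue) and block when the queue empties rather than returning.]

```javascript
// enqueue() appends a task; enterEventLoop() drains the queue in FIFO order.
var queue = [];
function enqueue(task) {
    queue.push(task);
}
function enterEventLoop() {
    while (queue.length) {
        queue.shift()(); // tasks may themselves enqueue further tasks
    }
}

var log = [];
enqueue(function () {
    log.push("a");
    enqueue(function () { log.push("c"); }); // lands behind "b" in the queue
});
enqueue(function () { log.push("b"); });
enterEventLoop(); // log is now ["a", "b", "c"]
```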

The way I was thinking, there might also be a "postEvent" function that
could post any event to the worker with any data (not just a string).
The main JSGI worker (the mothership) would then be able to call
postEvent methods on each worker to send it the request object. The
worker would define an onrequest event handler to receive the events
(this could be implemented by a script that was passed to the worker
from the Jack main thread), which would be given all the necessary
references to carry out the response to the request.

Does this seem like a reasonable approach?

Kris

Ryan Dahl

unread,
Sep 23, 2009, 5:49:28 PM9/23/09
to comm...@googlegroups.com
On Wed, Sep 23, 2009 at 9:51 PM, Tom Robinson <tlrob...@gmail.com> wrote:
>
> I've been trying to figure out how one would efficiently implement
> (for example) a JSGI server that uses multiple web workers. Workers
> only allow you to send messages back and forth. Would you need to
> buffer and serialize the request and response, or is there some way
> you could hand off the io streams to a worker?

Sending a "port" through a message channel is part of html5's spec
http://www.whatwg.org/specs/web-apps/current-work/multipage/comms.html#posting-messages
it's the analogue of sending an fd with sendmsg(). This would be a way
of distributing requests across multiple workers.

Kris Zyp

unread,
Sep 23, 2009, 11:02:46 PM9/23/09
to comm...@googlegroups.com
I thought an example might be helpful in looking at the implications of
different concurrency approaches. Here is a page counter (with atomic
counting) in two different styles. Something like a server session
object would probably be fairly similar:

Using shared memory with locks:

var hits = 0;
var lock = new (require("concurrency")).Lock();
function incrementAndGetHits(){
    lock.lock();
    try {
        // read the incremented value while still holding the lock;
        // reading after unlock() would race with other threads
        return ++hits;
    } finally {
        lock.unlock();
    }
}
exports.app = function(env){
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: "This page has been hit " + incrementAndGetHits() + " times"
    };
}


And here is the same thing using shared nothing with HTML5 workers:

jsgi file:
var hitTrackerWorker = new (require("worker").SharedWorker)("hit-tracker");
var Future = require("promise").Future; // get the constructor
var waitingFutures = []; // all the requests that are waiting to be finished

// note that we can't just do this in a closure inside the app because there
// may be multiple unfulfilled promises at once, but there is only one
// onmessage handler
hitTrackerWorker.port.onmessage = function(event){
    // probably easiest to use JSON for structured messages
    var message = JSON.parse(event.data);
    switch(message.action){
        case "incrementResults":
            waitingFutures.shift().fulfill(message.hits);
        // ... all other types of response would need to register here
    }
};

// this app will be created in each worker
exports.app = function(env){
    // send a message to the shared worker to increment the hits and
    // send back the result
    hitTrackerWorker.port.postMessage(JSON.stringify({
        action: "incrementAndGetHits"
    }));

    var future = new Future();
    // add it to our queue of futures that need to be fulfilled
    waitingFutures.push(future);
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: {
            forEach: function(write){
                // delay the response with a promise
                return future.promise.then(function(hits){
                    write("This page has been hit " + hits + " times");
                });
            }
        }
    };
}

hit-tracker.js:
var hits = 0;
onmessage = function(event){
    var message = JSON.parse(event.data);
    switch(message.action){
        case "incrementAndGetHits":
            hits++; // this is thread-safe since it is in a single worker
            // send back the results
            event.ports[0].postMessage(JSON.stringify({
                action: "incrementResults",
                hits: hits
            }));
        // ... any other actions handled by this worker
    }
}


Is there an easier way to do this with workers? This exercise makes me a
little skeptical that a shared-nothing architecture is really the best
tool for every need. I like having event loops available, and would
certainly use them whenever possible, but I am dubious that we should
(or could) try to eliminate shared mutable data across threads. What if
we simply associated an event loop with every thread (allowing them to
share everything), had JSGI servers distribute requests across event
loops, and had enqueuing operations (deferred futures, setTimeouts, etc.)
always enqueue onto the current thread's event loop, so as to make it
easier for lexically private data to always remain on the same thread
and avoid race conditions?

Kris

ihab...@gmail.com

unread,
Sep 24, 2009, 12:33:49 AM9/24/09
to comm...@googlegroups.com
Hi Kris,

Thanks for writing this up ... this is a really good example.

My hunch is that the HTML5 workers solution is made verbose by the
fact that the HTML5 worker API is very low-level (albeit *perhaps*
adequate to implement the sugar we need).

I've fired off a thread to e-l...@eros-os.org, and it's already shown
up in the archives, so see:

http://www.eros-os.org/mailman/listinfo/e-lang

The gist of it is to ask: what would this look like in E? E was built,
with a lot of design freedom from legacy, around this specific
architecture, so it should be easy there, right? Let's see what they
come up with, then we can figure out what missing parts we would need
to be able to do similar stuff, and whether these missing parts are
fundamental to JS or just helper libs that someone needs to up and
write.

Again, thanks for the problem and for taking the time to write out the
solution. To my knowledge, this is the first time we've asked
ourselves this question in any detail for pure JS. Kevin Reid, a
longtime E developer, built an implementation of CapTP (the RPC
protocol that E uses), but that was on top of Caja. I am not yet well
enough versed in his work to determine how much of it could (with
sacrifice of its security properties) be adapted to pure JS; Caja
gives JS programmers some virtualization abilities that pure JS
doesn't have.

Kris Zyp

unread,
Sep 24, 2009, 12:59:23 AM9/24/09
to comm...@googlegroups.com, e-l...@mail.eros-os.org, ihab...@gmail.com

ihab...@gmail.com wrote:
> Hi Kris,
>
> Thanks for writing this up ... this is a really good example.
>
> My hunch is that the HTML5 workers solution is made verbose by the
> fact that the HTML5 worker API is very low-level (albeit *perhaps*
> adequate to implement the sugar we need).
>

I am curious what it would look like in E, but you are right, I think we
can do much better than direct interaction with a low-level API. I
believe it would be pretty straightforward to write a JSON-RPC library
on top of workers that would handle posting a message, waiting for the
response, returning a promise, and fulfilling the promise. Then we could
actually write (with a shared-nothing event loop):

var hitTrackerWorker = new (require("worker").SharedWorker)("hit-tracker", "tracker");
var hitTrackerRPC = require("rpc-worker").wrapWorker(hitTrackerWorker);

exports.app = function(env){
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: {
            forEach: function(write){
                // hitTrackerRPC.call will return a promise
                return hitTrackerRPC.call("incrementAndGet", []).then(function(hits){
                    write("This page has been hit " + hits + " times");
                });
            }
        }
    };
}


hit-tracker.js:

var exporter = require("rpc-exporter").exporter;

var hits = 0;
exporter({
    incrementAndGet: function(){
        return ++hits;
    }
});

And that doesn't look too bad at all.

Still, I wonder if we should rather be stating the question as where on the gradient of sharing do we want to be (since blocking any shared-mutables may not be realistic, at least with Rhino):

shared-difficult - Every worker has a separate global, postMessage only carries strings, block everything possible, but concede that code may access shared data in the JVM for Rhino users.

shared-easy - Create threads that share everything. But provide event loops to allow users to have a nice share-nothing style mechanism for avoiding sharing if desired.

shared-something-in-between - Perhaps separate globals, but allow posting non-string objects across workers?


Kris

Wes Garland

unread,
Sep 24, 2009, 8:33:54 AM9/24/09
to comm...@googlegroups.com
Kris;

This message is a little off-topic for this list, as I'm going to talk about feasibility of implementation, rather than the merits or the API.  But I'm hoping it will provide food for thought for CommonJS.onTopic. :)

On Thu, Sep 24, 2009 at 12:59 AM, Kris Zyp <kri...@gmail.com> wrote:

Still, I wonder if we should rather be stating the question as where on the gradient of sharing do we want to be (since blocking any shared-mutables may not be realistic, at least with Rhino):

By "blocking shared-mutables", do you mean blocking on read of an object when another thread has some kind of lock on it?  If so, I think this IS implementable -- at least in SpiderMonkey -- but not for plain objects; only objects of a specific (custom) class which is implemented in C. This custom class would need to use a class-wide getter and setter, and use the private storage slot. The private slot would contain a mutex (e.g. a pointer to a pthread_mutex_t). The getters and setters would need to hold the mutex before accessing the object, and JS programmers wanting to do multi-property updates would need access to lock()/unlock() methods which work on that private mutex. I'll bet you can do something equivalent in Rhino.

shared-difficult - Every worker has a separate global, postMessage only carries strings, block everything possible, but concede that code may access shared data in the JVM for Rhino users.

Totally implementable; I don't know why you need to block anything: strings are immutable upon emergence from the constructor, which is where they are first exposed to thread races.
 
shared-easy - Create threads that share everything. But provide event loops to allow users to have a nice share-nothing style mechanism for avoiding sharing if desired.

This is (relatively) easy to implement. I have implemented this in SpiderMonkey, although I have not written any fancy event loops.

Incidentally -- the state of the SpiderMonkey Threading Union -- bugs were introduced post 1.7 which made threads that shared more than Atoms buggy. That is mostly fixed now, although Arrays still can't be shared between threads; also, E4X and generator/iterators have thread safety issues.

Something worthy of notice with my Thread implementation -- the thread handle can be used for communication between the thread creator and the thread itself, e.g.

function myfunc()
{
  print("hello");
  this.myStatus = "world";
}

var t = new Thread(myfunc);
t.start();
t.join();
print(t.myStatus);

(program output is hello<newline>world)
 
shared-something-in-between - Perhaps separate globals, but allow posting non-string objects across workers?

This is something like what my Thread class implements as well -- only the primordial thread has the "real" global object: the other threads run with their scopes set to a child of the real global. Exactly how modules run.

Completely separate globals might not be possible for non-Atom data interchange. The reason I make this claim is that prototypes for those objects would be on the "real" global. I've also tried to implement this in SpiderMonkey without success. :)

As one "solution", have we considered the co-routine-ish solution employed by the browser?  Basically, when an event happens (e.g. setTimeout()), the JS engine executes an asynchronous callback  (I think of them as non-maskable interrupts) in other JS code, and jumps back when that other piece of code has run.

While THAT solution has obvious drawbacks -- it's not multithreaded! -- it may be worth considering because it allows some async stuff, has no thread-safety issues, and can be implemented on all the JS engines, not just SpiderMonkey and Rhino.  If you were careful, you might be able to get performance similar to WIN16-style multitasking.

Wes

Kris Zyp

unread,
Sep 24, 2009, 8:43:08 AM9/24/09
to comm...@googlegroups.com

Wes Garland wrote:
> Kris;
>
> This message is a little off-topic for this list, as I'm going to talk
> about feasibility of implementation, rather than the merits or the
> API. But I'm hoping it will provide food for thought for
> CommonJS.onTopic. :)
>
> On Thu, Sep 24, 2009 at 12:59 AM, Kris Zyp <kri...@gmail.com
> <mailto:kri...@gmail.com>> wrote:
>
>
> Still, I wonder if we should rather be stating the question as
> where on the gradient of sharing do we want to be (since blocking
> any shared-mutables may not be realistic, at least with Rhino):
>
>
> By "blocking shared-mutables", do mean blocking on read of an object,
> when another thread has some kind of lock on it?

No, I meant preventing shared access to mutables between threads. Sorry,
poor choice of words since "blocking" has a different meaning in the
concurrency context.

Yeah, having objects that originate from multiple globals is indeed a
pain (we experience that in the browser with frames).

>
> As one "solution", have we considered the co-routine-ish solution
> employed by the browser? Basically, when an event happens (e.g.
> setTimeout()), the JS engine executes an asynchronous callback (I
> think of them as non-maskable interrupts) in other JS code, and jumps
> back when that other piece of code has run.

That's not coroutines; setTimeout uses the event loop that we are
talking about (it places the provided callback on the event queue). The
only form of coroutines in JS is generators.

>
> While THAT solution has obvious drawbacks -- it's not multithreaded!
> -- it may be worth considering because it allows some async stuff, has
> no thread-safety issues, and can be implemented on all the JS engines,
> not just SpiderMonkey and Rhino. If you were careful, you might be
> able to get performance similar to WIN16-style multitasking.

Right, the event loop allows for asynchronicity within a thread/process.
But there could be multiple threads/processes, each with its own event
loop (that's what happens with workers).
Kris

Wes Garland

unread,
Sep 24, 2009, 8:57:04 AM9/24/09
to comm...@googlegroups.com
Ihab;

On Wed, Sep 23, 2009 at 2:39 PM, <ihab...@gmail.com> wrote:
Not just the *same* object -- *across* multiple objects as well. In
other words, for some subgraph S of objects, we need to be concerned
about whether a temporarily inconsistent state of this subgraph is (a)
observable; and (b) usable by other objects as a way to subvert that
subgraph's future attempts at regaining consistency.

Good point, although I consider all objects to be children of global, so it really is a question of perspective. :)

Locking access to multiple objects for atomic update would have to be the responsibility of the using programmer.  This is not a drawback; face it, Threads Are Hard.  There is no way the environment can serialize inter-object or inter-property access (I see these as nearly the same thing) for the programmer: he has to design and implement his own theory of thread safety for his application / module / etc.
 
The "worker" model offers a pretty good compromise. Each worker
processes one event at a time. While a worker is processing an event,
its internal state may be inconsistent. However, it does not serve any
other events, so this inconsistency is neither (a) observable nor (b)
subvertible. The programmer in charge of the worker should arrange to
reestablish consistency at the end of each event (even if by a Boolean
flag that says "inconsistent" and causes service requests to queue up
or be rejected).

That's interesting -- I've never used the worker model as such (recall: I've spent most of my life programming in C) -- but this is pretty much the same as the C threading pattern where the "inconsistent"  Bool is really just a mutex; if you hold the mutex, you are allowed to put the data structure in an inconsistent state.
 
> I think the following rules may be sufficient for this:
>  - Underlying JavaScript engine provides serialized property access
>  - require() is guarded by a mutual exclusion lock across the entire JS
> runtime (all threads)
>  - modules are responsible for their own theory of thread safety for
> non-exported functions and variables, and can assume the presence of a
> level-0 concurrency module
>  - exported functions which accept non-Atom arguments must not modify those
> arguments
>  - exported functions which return non-Atoms must return new non-Atoms
>  - Boolean, Number and String are considered Atoms.

I don't know if these rules would work, but they do seem to suggest
that each *module* lives in its own threading realm of sorts, exposing
a simplified interface to other modules.

It implies that each module has, internally, its own theory of thread safety. The module owner knows what does, and does not, represent internally inconsistent state, and he can manage it.  A clever module owner might even write a module which requires no synchronization primitives. :)
 
The worker model keeps the
two concerns largely independent: multiple modules may live within one
worker and communicate in a simple, synchronous manner, or they may
spawn other workers and communicate asynchronously with these workers.

This is not really different from what I've proposed, I think.

What happens in the worker model when two threads want to load the same module? I see three possibilities without my rules:

1. You break the original promise of module singletons, or
2. You lock access to module exports while they are in use by another thread (how? JS programmer remembers?)
3. You hope like hell the module is thread safe

I've proposed trying to arrange it so that in #3 you don't need to hope.  Everything on the module's exports is immutable, and everything being sent to the module is unchanging.
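
[Editor's note: one way to enforce immutable exports is ES5's Object.freeze, which was brand new when this thread was written. A sketch under that assumption; the module shape and names here are purely illustrative.]

```javascript
// Freeze a module's exported surface so callers cannot mutate it.
var api = {}; // stand-in for a module's exports object
api.VERSION = "1.0";
api.add = function (a, b) { return a + b; };
Object.freeze(api);

// Mutation attempts are silently ignored outside strict mode
// (and throw a TypeError in ES5 strict mode).
try { api.VERSION = "2.0"; } catch (e) { /* strict mode would land here */ }
// api.VERSION is still "1.0": the exported surface cannot be changed
```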

Note that I just realized that my rules have a hole in them: 
>  - exported functions which accept non-Atom arguments must not modify those
> arguments

This is not strong enough in the general-threads case.

Thanks to your explanation, I have now considered the case of worker threads calling into singleton modules. The rules I've proposed above "fall away" given that no objects are shared across threads, except this one:


>  - modules are responsible for their own theory of thread safety for
> non-exported functions and variables, and can assume the presence of a
> level-0 concurrency module

...which leads to an interesting question, in a shared-nothing environment, is that "nothing" also module scope?  If so, you can't have stateful modules, or you can't have module singletons.

Wes
 

Kris Zyp

unread,
Sep 24, 2009, 9:07:07 AM9/24/09
to comm...@googlegroups.com

Wes Garland wrote:
>
>
> What happens in the worker model when two threads want to load the
> same module? I see three possibilities without my rules:
>
> 1. You break the original promise of module singletons, or

With a shared-nothing/hard/little approach, modules are loaded once for
each different worker. This doesn't break any promises that we have
made; it has always been known that separate processes would each load
their own modules. The promise we made was that it would be
non-observable: code would never have a reference to two different
module exports.

>
> ...which leads to an interesting question, in a shared-nothing
> environment, is that "nothing" also module scope?

Yes.


> If so, you can't have stateful modules, or you can't have module
> singletons.

Modules can't have their own state that is shared across all workers.
Rather shared state must be emulated through messaging to a central
shared worker.
Kris

Wes Garland

unread,
Sep 24, 2009, 9:52:05 AM9/24/09
to comm...@googlegroups.com
Kris:

Interesting points, all of which make workers much more implementable.

They definitely put more requirements on the implementation of require, however, at least when require isn't written in JavaScript.

It means that require() needs to track exports and module scope on a thread-by-thread basis, presumably storing pointers to them in thread local storage or similar.

"Similar" because TLS won't work if the implementation pools M:N threads:contexts.  I guess I actually mean "worker-by-worker" basis; 1:1 worker:context seems right.

So, if I understand correctly, we can revise the require() semantics to formally say "per-context singletons" and most of the MT issues go away with workers, even from the native engine POV? That's attractive right there.

Donny Viszneki

unread,
Sep 24, 2009, 12:25:00 PM9/24/09
to comm...@googlegroups.com
On Wed, Sep 23, 2009 at 11:02 PM, Kris Zyp <kri...@gmail.com> wrote:
> Using shared memory with locks:
>
> var hits = 0;
> var lock = new (require("concurrency")).Lock();
> function incrementAndGetHits(){
>     lock.lock();
>     var result = ++hits; // read the new value while still holding the lock
>     lock.unlock();
>     return result;
> }
> exports.app = function(env){
>     return {
>         status: 200,
>         headers: {"content-type": "text/plain"},
>         body: "This page has been hit " + incrementAndGetHits() + " times"
>     };
> }
>
> And here is the same thing using shared nothing with HTML5 workers:
> [snip]

> Is there an easier way to do this with workers?

On Thu, Sep 24, 2009 at 12:59 AM, Kris Zyp <kri...@gmail.com> wrote:
> I
> believe it would be pretty straightforward to write a JSON-RPC library
> on top of workers that would handle posting a message, waiting for
> response, returning a promise, and fulfilling the promise.

> [snip]


> And that doesn't look too bad at all.

As said earlier in this thread, Workers are isolated threads joined by
synchronized messaging queues. RPC, to my mind, looks about the same as
using those queues for synchronizing; in fact you could do RPC over
those queues. On top of this, you can build mutex and shared locks,
and then your mutex-locked hit-counter example applies to HTML5
Workers.
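As a hedged illustration of RPC over message queues, the following sketch layers promise-returning calls over any pair of objects with the postMessage/onmessage shape. All names here (`makeRpcClient`, `serveRpc`, `makePortPair`) are invented for illustration, not a proposed API:

```javascript
// Hedged sketch: promise-returning RPC over any postMessage/onmessage
// pair, as discussed above. Invented names, not a proposed API.

function makeRpcClient(port) {
  let nextId = 0;
  const pending = new Map(); // call id -> resolve function
  port.onmessage = (event) => {
    const { id, result } = JSON.parse(event.data);
    pending.get(id)(result); // fulfil the promise for this call
    pending.delete(id);
  };
  return function call(method, params) {
    return new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      port.postMessage(JSON.stringify({ id, method, params }));
    });
  };
}

function serveRpc(port, methods) {
  port.onmessage = (event) => {
    const { id, method, params } = JSON.parse(event.data);
    const result = methods[method].apply(null, params);
    port.postMessage(JSON.stringify({ id, result }));
  };
}

// A loopback pair of ports, so the sketch can run without real workers:
function makePortPair() {
  const a = {}, b = {};
  a.postMessage = (data) => b.onmessage({ data });
  b.postMessage = (data) => a.onmessage({ data });
  return [a, b];
}
```

Note that the caller gets a promise back, not a value: this is exactly the asynchronous shape Kris contrasts with synchronous locking later in the thread.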

--
http://codebad.com/

Donny Viszneki

unread,
Sep 24, 2009, 12:26:09 PM9/24/09
to comm...@googlegroups.com
On Thu, Sep 24, 2009 at 12:25 PM, Donny Viszneki
<donny.v...@gmail.com> wrote:
> RPC to my mind looks about the same as
> using those queues for synchronizing. In fact you could do RPC over
> those queues. On top of this, you can build mutex and shared locks,
> and then your mutex-locked hit-counter example applies to HTML5
> Workers.

Or put another way, both RPC and the messaging queues in HTML5 Workers
provide the primitive synchronization needed for the typical
synchronized programming scenario.

--
http://codebad.com/

Mike Wilson

unread,
Sep 25, 2009, 5:55:16 AM9/25/09
to comm...@googlegroups.com
I can just say that you are touching on very interesting
subjects here, that I hope will provide a lot of revelations
on how to best do things in the web stack. Keep up the good
work!

Best regards
Mike

Kris Zyp

unread,
Sep 25, 2009, 8:39:07 AM9/25/09
to comm...@googlegroups.com

Not completely: asynchronous messaging can't be encapsulated into a sync
function (one that doesn't require a callback or promise) that can
provide locking. It is impossible to use workers to build something that
looks like a synchronously accessible object, or a synchronous mutex. It
might be more accurate to say that mutexes are one of the primitives
that can be used to build workers (for synchronizing the message queue).
Kris

Kris Zyp

unread,
Sep 25, 2009, 9:59:22 AM9/25/09
to comm...@googlegroups.com
I wanted to throw out a proposal based on our discussions to see if we
might have something we can come to more consensus about:

Event-loop concurrency with shared-only-explicitly
* CommonJS will utilize HTML5 style workers with event-loops to isolate
data from other concurrent processes. Each worker will have its own
global environment and its own set of modules (singleton per worker).
Modules will therefore not need thread-safe coding.
* CommonJS standard modules will not guarantee any mutex or other
thread-safe techniques to access or modify data, unless they explicitly
advertise such capability for their particular function. Users should
not expect any other modules to be thread-safe, unless explicitly
advertised. Modules used in workers that access data accessible only
from a single worker/thread will execute deterministically.
* CommonJS will provide a module for creating and accessing shared data
that spans workers. Anyone that uses this shared data must do so with
caution, realizing that passing this to other modules may result in
non-deterministic behavior.

This is based on:
* Event-loops are familiar from browser-based JavaScript; they have been
demonstrated to be a powerful form of concurrency that makes it
relatively simple to avoid race conditions and deadlocks, and preserving
this paradigm is useful.
* CommonJS is intended to have a fairly large scope of potential users.
Different jobs require different tools, so forcing a single concurrency
model on everyone may not be best in our situation. In fact, there is
already existing JavaScript written on multiple threads with shared
state, and there will continue to be, regardless of what we say or do
here. Defining APIs (and practices) for modules is in the scope of
CommonJS. Eliminating every piece of JS code in the world that uses
shared-state multithreading is not.
* Modules need to have some expectations of concurrency to be properly
written.

The following modules and APIs will be defined:

event-queue - This module will be an event-queue which provides a
thread-safe "enqueue" function (generally called internally from
different threads) and a "next" or "take" function that returns the next
event in the queue, or blocks until one is available. The event-queue
may also have an "isIdle" function and an "onready" event that can be
used to construct worker pools that tasks are delegated to (especially
for JSGI servers). Entering the event loop is as simple as creating a
loop that gets the next event and executes it.
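As a rough sketch of the shape described above (all names are from the proposal's own description, but the implementation is purely illustrative; pure JS cannot truly block, so `take` returns a promise when the queue is empty where a native implementation would block the calling thread):

```javascript
// Rough, hedged sketch of the proposed event-queue shape.

function EventQueue() {
  const events = [];  // queued events (functions, in this sketch)
  const waiters = []; // resolvers of take() calls awaiting an event
  return {
    enqueue(event) {
      const waiter = waiters.shift();
      if (waiter) waiter(event); // hand directly to a waiting taker
      else events.push(event);
    },
    take() {
      if (events.length) return Promise.resolve(events.shift());
      return new Promise((resolve) => waiters.push(resolve));
    },
    isIdle() {
      return events.length === 0;
    },
  };
}
```

Entering the event loop is then literally a loop: `for (;;) (await queue.take())();`.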

worker - This module will provide constructors for Worker and
SharedWorker that follow the HTML5 specification, providing a
postMessage function and an onmessage event handler. Each worker can
execute concurrently and will have its own global environment and set of
modules. Each worker, once spawned, will execute the provided script and
then enter the event loop, which is just an infinite loop of getting the
next event from the queue and executing it. When a worker is
constructed, the initiating code may (optionally) also explicitly pass a
shared object (to the constructor or to a "provideShared()" function).
This object will be accessible to the worker through the "getShared()"
function on the worker module (the worker's instance of the module). The
worker module will also provide a postData(object) function which will
be sugar for postMessage(JSON.stringify(object)), with automatic
JSON.parse(message) to fill the event.data property in the onmessage
listeners (noting that this would make it very easy for implementations
to provide much faster processing than actually doing a JSON
serialization/deserialization).
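The postData sugar might be approximated like this (`wrapWorker` and `echoWorker` are invented helpers for illustration, not part of the proposal):

```javascript
// Illustrative approximation of the postData sugar: stringify on send,
// parse into event.data on receive.

function wrapWorker(worker) {
  return {
    postData(object) {
      worker.postMessage(JSON.stringify(object)); // serialize on send
    },
    set ondata(listener) {
      worker.onmessage = (event) =>
        listener({ data: JSON.parse(event.data) }); // auto-parse on receive
    },
  };
}

// A loopback stand-in for a real worker, just for demonstration:
const echoWorker = {
  postMessage(message) { this.onmessage({ data: message }); },
};
```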

concurrency - This module will provide mutexes/locking in order to be
able to safely interact with a shared object or any other mutable data
that is accessible to multiple threads through means outside of those
defined in CommonJS (for example, if an environment provided a thread
constructor).
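In pure JavaScript, the closest analogue to the blocking lock described above is a cooperative, awaitable mutex; the following is a hedged sketch, purely illustrative (a native concurrency module could block the thread in lock() instead):

```javascript
// Hedged sketch: a cooperative, awaitable mutex. lock() resolves with
// an unlock function once all earlier holders have released.

function Lock() {
  let tail = Promise.resolve(); // resolves when the lock is free
  return {
    lock() {
      let release;
      const next = new Promise((resolve) => { release = resolve; });
      const acquired = tail.then(() => release); // wait for predecessors
      tail = next; // the next caller waits until release() is called
      return acquired;
    },
  };
}

// The hit counter from earlier in the thread, restated with this lock:
const lock = Lock();
let hits = 0;
async function incrementAndGetHits() {
  const unlock = await lock.lock();
  try {
    return ++hits; // read-modify-write while holding the lock
  } finally {
    unlock();
  }
}
```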

JSGI will then explicitly require that all concurrent request handling
be performed by workers. JSGI will provide a shared object (initially
empty) to those workers. Other than the shared object, worker
environments should follow the standard isolation (different globals and
module instances).

Also, it might be worth adding a module that would create a new global
and a new loader for modules (creating new singletons in the new
global). With the concurrency module (if it had mutexes as well as
wait/notify-style synchronization) one could probably build the worker
and event-queue modules entirely in JavaScript, and native bindings
would only be required for the concurrency and global-creator modules.

An RPC module would also be useful, but I wanted to start low-level.

Anyway, I hope this proposal strikes a good balance of defaulting and
encouraging users towards event-loop concurrency, while still allowing
for shared-state semantics when necessary.

Kris

Wes Garland

unread,
Sep 25, 2009, 10:40:56 AM9/25/09
to comm...@googlegroups.com
Kris:

Thanks for taking the time to draw what I see as an excellent road map. 

The only suggestion in your message which I am not in full agreement with is the discussion of JSGI mechanics -- and I am not in disagreement there: I have not done the requisite thinking to form an opinion.


> one could probably build the worker and event-queue
> modules entirely in JavaScript, and native bindings would only be
> required for concurrency and the global creator modules.

That is in fact my plan.

Adding a concurrency module for locks would give me the tools to implement either of these easily in pure JS (plus some other classes I have already), and I could back them with either threaded or setTimeout-style events.

Note that it is not strictly necessary to have native bindings for mutexes: IIRC both Dekker's and Lamport's Bakery algorithms have been implemented successfully in JS, with code "out there" somewhere.

Donny Viszneki

unread,
Sep 25, 2009, 7:02:17 PM9/25/09
to comm...@googlegroups.com
On Fri, Sep 25, 2009 at 8:39 AM, Kris Zyp <kri...@gmail.com> wrote:
> It is impossible to use workers to build something that looks
> like a synchronously accessible object, or a synchronous mutex. It might
> be more accurate to say that mutexes are one of the primitives that can
> be used to build workers (for synchronizing the message queue).

I see things differently.

Suppose we have two synchronized message queues, and two threads of
execution. Each thread holds the write-end of one queue, and the
read-end of the other queue. Let's define these things now with a
little pseudo-code:

CODE
var MessagesFromA = new MessageQueue;
var MessagesFromB = new MessageQueue;
var EventSource = whatever;

function ThreadA() {
    /* Thread A will process events from EventSource */
    for (var event in EventSource) {
        /* Inform Thread B of a hit */
        MessagesFromA.write("hit");
        /* Wait for the number of hits */
        var hits = MessagesFromB.read();
        /* presumably you'd then do something with "hits" */
        event.respond("You have hit me " + hits + " times.");
    }
}

function ThreadB() {
    var counter = 0;
    /* Thread B will count */
    for (var message in MessagesFromA) {
        /* Record hits */
        if (message == "hit")
            counter++;
        /* Report hit count */
        MessagesFromB.write(counter);
    }
}
/CODE
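The pseudo-code above can be approximated in runnable JavaScript, with a promise-based `AsyncQueue` standing in for the synchronized blocking MessageQueue (an approximation only: `read()` here returns a promise rather than blocking the thread, and the "stop" sentinel is added so the sketch terminates):

```javascript
// Runnable approximation of the two-queue pattern above.

function AsyncQueue() {
  const items = [], waiters = [];
  return {
    write(item) {
      const waiter = waiters.shift();
      if (waiter) waiter(item); else items.push(item);
    },
    read() {
      if (items.length) return Promise.resolve(items.shift());
      return new Promise((resolve) => waiters.push(resolve));
    },
  };
}

const MessagesFromA = AsyncQueue();
const MessagesFromB = AsyncQueue();

async function threadA(events, respond) {
  for (const event of events) {
    MessagesFromA.write("hit");              // inform B of a hit
    const hits = await MessagesFromB.read(); // wait for the count
    respond("You have hit me " + hits + " times.");
  }
  MessagesFromA.write("stop");               // let B terminate
}

async function threadB() {
  let counter = 0;
  for (;;) {
    const message = await MessagesFromA.read();
    if (message === "stop") return;
    if (message === "hit") counter++;        // record the hit
    MessagesFromB.write(counter);            // report the count
  }
}
```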

Does this clarify things? Is there some limitation of this paradigm I
don't appreciate?

--
http://codebad.com/

Neville Burnell

unread,
Sep 25, 2009, 7:10:30 PM9/25/09
to CommonJS
For those following this discussion, Lamport's Bakery algorithm in
JavaScript:

http://www.aitk.info/Bakery/Bakery.html

Kris Zyp

unread,
Sep 26, 2009, 8:28:57 AM9/26/09
to comm...@googlegroups.com

I understand now; sometimes JavaScript is a lot better for communication
than English :). What you are doing is synchronously getting messages
(your read function blocks), and you can compose and encapsulate with
that. However, HTML5 workers (and E and the other inspirations, I
believe) use asynchronous messaging and do not provide direct access to
the event queue (or a way to programmatically block for the next event).

I think this is further evidence that we should be exposing the
low-level concurrency constructs (mutexes, threads, blocking thread-safe
queues, or other stuff to build them), and then building workers on top
of these capabilities, rather than just a monolithic worker API. In your
example, it would certainly be more efficient to use a mutex rather than
emulating one in such a way that forces a new thread for every mutex
(expensive). My intent with the concurrency plan I proposed was to
provide these low-level constructs as well as workers, so developers
could use what suited their situation.

Kris


Mark Miller

unread,
Sep 26, 2009, 10:34:48 AM9/26/09
to comm...@googlegroups.com
On Sat, Sep 26, 2009 at 5:28 AM, Kris Zyp <kri...@gmail.com> wrote:
> I understand now; sometimes JavaScript is a lot better for communication
> than English :). What you are doing is synchronously getting messages
> (your read function blocks), and you can compose and encapsulate with
> that. However, HTML5 workers (and E and the other inspirations, I
> believe) use asynchronous messaging and do not provide direct access to
> the event queue (or a way to programmatically block for the next event).
>
> I think this is further evidence that we should be exposing the
> low-level concurrency constructs (mutexes, threads, blocking thread-safe
> queues, or other stuff to build them), and then building workers on top
> of these capabilities, rather than just a monolithic worker API. In your
> example, it would certainly be more efficient to use a mutex rather than
> emulating one in such a way that forces a new thread for every mutex
> (expensive). My intent with the concurrency plan I proposed was to
> provide these low-level constructs as well as workers, so developers
> could use what suited their situation.


This argument is only evidence if one accepts that emulating mutexes
across vats is desirable. There is a growing literature explaining
just how awful this concurrency paradigm is.
(<http://www.erights.org/talks/promises/paper/tgc05.pdf> is my own
contribution, adapted into Part 3 of
<http://erights.org/talks/thesis/>, which is more complete but also
longer.)

Not only has JavaScript not yet made the threading mistake (which Java
will probably never recover from), it is already heading in the right
direction. Asynchronously communicating event loops is a great
concurrency model, is already the model that JavaScript is investing
in, and is the only one possible for JavaScript in the browser.

--
Text by me above is hereby placed in the public domain

Cheers,
--MarkM


sleepnova

unread,
Sep 26, 2009, 11:56:08 AM9/26/09
to CommonJS
Just to show some prior art.

Concurrent JavaScript - http://osteele.com/sources/javascript/concurrent
Erlang style concurrency in Java -
http://codemonkeyism.com/want-erlang-concurrency-but-are-stuck-with-java-4-alternatives
Concurrent Programming in Clojure - http://clojure.org/concurrent_programming
Erlang-style concurrency with JavaScript 1.7 - http://www.beatniksoftware.com/blog/?p=80

Sleepnova

ihab...@gmail.com

unread,
Sep 26, 2009, 12:56:30 PM9/26/09
to comm...@googlegroups.com, kpr...@mac.com
So ...

On Wed, Sep 23, 2009 at 9:33 PM, <ihab...@gmail.com> wrote:
> I've fired off a thread to e-l...@eros-os.org ...

For those of us not on e-lang, here is Kevin Reid's solution:

http://www.eros-os.org/pipermail/e-lang/2009-September/013274.html

Interestingly, the main cool trick he used is the "pass by
construction" ability, which simplifies the passing of live objects
(including far references *back* to the originating vat -- note how the
object can refer back to 'incrementAndGetHits', as a far reference,
*after* it has migrated).

None of this is fundamentally unachievable in CommonJS (given 'eval'
and a data channel), but it requires a rich inter-vat RPC library. As
I mentioned, Kevin Reid has built a capability RPC library for Caja,
here:

http://code.google.com/p/caja-captp/

and, as I mentioned, it's not clear to me yet how much of this can be
built on top of pure JS without Caja.

Tom Robinson

unread,
Sep 26, 2009, 1:00:40 PM9/26/09
to comm...@googlegroups.com

On Sep 26, 2009, at 5:28 AM, Kris Zyp wrote:

> I understand now; sometimes JavaScript is a lot better for communication
> than English :). What you are doing is synchronously getting messages
> (your read function blocks), and you can compose and encapsulate with
> that. However, HTML5 workers (and E and the other inspirations, I
> believe) use asynchronous messaging and do not provide direct access to
> the event queue (or a way to programmatically block for the next event).
>
> I think this is further evidence that we should be exposing the
> low-level concurrency constructs (mutexes, threads, blocking thread-safe
> queues, or other stuff to build them), and then building workers on top
> of these capabilities, rather than just a monolithic worker API. In your
> example, it would certainly be more efficient to use a mutex rather than
> emulating one in such a way that forces a new thread for every mutex
> (expensive). My intent with the concurrency plan I proposed was to
> provide these low-level constructs as well as workers, so developers
> could use what suited their situation.
>
> Kris

FWIW, V8 does not support any sort of multithreading (apparently not
even with completely independent contexts within the same process):

http://groups.google.com/group/v8-users/browse_thread/thread/5b30d0a7907f22aa
http://groups.google.com/group/v8-users/browse_thread/thread/451dd45e48f83c1c

In fact, if you fire up Chrome and run a web worker demo you'll see
each worker show up as a separate process.


-tom

sleepnova

unread,
Sep 26, 2009, 1:26:12 PM9/26/09
to CommonJS
It seems function chaining is a good way to explicitly specify the
dependency between tasks.
In some sense, it's all about dependencies.
An extreme design of explicit dependency management leads to an
event-driven architecture.

On Sep 22, 11:16 pm, Kris Zyp <kris...@gmail.com> wrote:
> Ryan Dahl wrote:
>
> > Does seem rather convoluted, doesn't it? I would still rather see
> > something like:
>
> >   server.addListener("request", function (request, response) {
> >       var resultPromise = doAsyncDatabase("select * from table where id = 3");
> >       resultPromise.addCallback(function (resultSet) {
> >            if (resultSet.length == 0) {
> >                response.sendHeader(404, {});
> >            } else {
> >                response.sendHeader(200, {});
> >                response.sendBody(JSON.stringify(resultSet));
> >            }
> >            response.finish();
> >       });
> >   });
>
> Another thing we could consider is allowing promises to be returned
> directly from the JSGI app, rather than just from the forEach call as I
> demonstrated. Returning a promise from a JSGI app would look like:
>
> app = function databaseApp(env){
>    return doAsyncDatabase("select * from table where id = 3").
>         then(function(resultSet){
>             if(resultSet.length == 0){
>                return {status: 404, headers:{}, body:""};
>             }
>             else{
>                return {
>                 status: 200,
>                 headers:{},
>                 body:JSON.stringify(resultSet[0])
>                };
>             }
>          });
>
> };
>
> That reads a lot nicer, but it would put extra burden on servers and
> middleware to have to look for promises at two different places (you
> still need promises from forEach in order to facilitate streamed responses).
> Kris

Kris Zyp

unread,
Sep 26, 2009, 11:55:44 PM9/26/09
to comm...@googlegroups.com

It's worth a lot! The fact that running V8 in multiple threads will put
the VM in a state of "likely to crash fairly soon" means that it is
simply impossible to write shared-state multithreaded code across all
platforms. It would seem that the only type of concurrency that CommonJS
can standardize support for initiating is shared-nothing (and of course
we would use event loops). If JSGI is to be something that can truly be
implemented on all engines, it would seem it must be specified as having
shared-nothing concurrency (sans intentional work-arounds by users, like
going directly to the JVM, as one should still be able to use JVM
threads to implement this).

If threading is platform specific, then concurrency constructs should be
platform specific (they certainly already exist in Rhino), and not a
part of CommonJS. Providing shared data through workers (as I had
originally suggested) should also not be available. However, providing
real access to event queues (which are used to build workers) through an
event-queue module could still be available. In particular, it may be
helpful to allow for construction of multiple event-queues (within a
single thread/process/vat) and to permit associating event-queues with
receiving messages for specific worker ports. One could then
programmatically block waiting for an event (as one typically is
doing in a normal event-loop while idle), in order to achieve
synchronous execution, as Donny pointed out. I believe this is more
deadlock prone, but that might be a small price to pay for being able to
provide sync messaging when necessary (worker-style async messaging
would still be the default, encouraged mechanism).

Kris

ihab...@gmail.com

unread,
Sep 27, 2009, 12:11:39 AM9/27/09