Anyway, here is a real simple, rough start to get going, based on some
of Java's stuff. I don't really care if it changes, I just want
something to be able to write cross engine middleware that is
thread-safe (or have someone explain how I can do it with what we have
right now).
require("concurrency").Lock -> constructor for a reentrant mutual
exclusion lock.
require("concurrency").SharedLock -> constructor for a reentrant shared
lock.
require("concurrency").ThreadLocal -> constructor for a thread-local
variable.
var lock = new (require("concurrency").Lock)();
lock.lock() -> acquires the lock
lock.unlock() -> releases the lock
lock.tryLock() -> tries to acquire the lock without blocking
var threadLocal = new (require("concurrency").ThreadLocal)();
threadLocal.get() -> get the value for the current thread
threadLocal.set(value) -> set the value for the current thread
And we could consider something like Rhino's "sync" function.
Thanks,
Kris
Wes Garland wrote:
> Hi, Kris;
>
> Writing programs using multiple threads of JavaScript, what I call "MT
> JS" and JS in a multi-threaded environment (thread-safe JS) are two
> quite different things. I know you know this, I'm just setting the
> stage for further discussions.
>
> What you're talking about is "MT JS", where two separate threads of
> JavaScript share access to the same JavaScript objects.
>
> This is possible in SpiderMonkey (caveat lector: bugs in 1.8+), I
> guess it must also be possible in Rhino? I'm not sure it is in JSCore
> or V8, would love to hear from the peanut gallery.
>
> Do you have a syntax for creating Threads from JS, or are you creating
> new threads of JS from Java and then allowing access to the objects
> "from the back"? If you're planning to create Threads from JS, I
> think you should consider the API that GPSEE uses:
> http://www.page.ca/~wes/opensource/gpsee/docs/modules/symbols/Thread.Thread.html
> <http://www.page.ca/%7Ewes/opensource/gpsee/docs/modules/symbols/Thread.Thread.html>
The immediate situation where I was thinking about this was in Jack and
Persevere, where the threads are created by the servlet engine. The JS
isn't actually creating the threads, but must deal with the fact that
it may be executed in multiple threads. Of course, being able to actually
create the threads in JS would be good as well. Your API looks good to
me. I know Rhino's shell also has a "spawn" function (it simply executes
the callback parameter in a separate thread).
>
> I don't yet have a locking API -- haven't needed one for my simple
> cases; jsapi serializes property accesses, which is enough. But your
> mutex API looks basic, and good. lock, unlock, tryLock. Perfecto.
>
> I think the litmus test for "is the low-level lock API good enough"
> is -- can we write code like this out of it?
>
> var myLock = new (require("concurrency").Lock)();
> withLock(myLock, function() { doWork(); });
>
> I believe the answer is yes:
>
> function withLock(lock, fn)
> {
>   lock.lock();
>   try
>   {
>     fn();
>   }
>   finally
>   {
>     lock.unlock();
>   }
> }
>
Yup.
>
> require("concurrency").Lock -> constructor for a reentrant mutual
> exclusion lock.
> require("concurrency").SharedLock -> constructor for a reentrant
> shared
> lock.
>
>
> Why do we need shared locks, and how do you see those working? These
> are just member read locks? If so, how are updates handled? Also, I
> believe SharedLocks could be implemented in pure JS from exclusive
> locks, meaning it may not be necessary to spec these now.
There are times when shared locks (which can be acquired in shared/read
or exclusive/write mode) are helpful, but you are right, they can be
implemented with simple locks. Just thought I would throw it out there.
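For what it's worth, here is a rough sketch of how a SharedLock could be built as a pure-JS state machine. All the names here (SharedLock, tryLockShared, and so on) are made up for illustration, not a proposed spec; in a real threaded engine the failed acquisitions would block on an exclusive Lock and a condition rather than returning false:

```javascript
// Sketch only: a reader/writer lock as a state machine over counters.
// Not a real spec -- a threaded implementation would guard this state
// with an exclusive Lock and block instead of returning false.
function SharedLock() {
  this.readers = 0;    // number of shared holders
  this.writer = false; // whether the exclusive lock is held
}
SharedLock.prototype.tryLockShared = function () {
  if (this.writer) return false; // a writer holds the lock
  this.readers++;
  return true;
};
SharedLock.prototype.unlockShared = function () {
  this.readers--;
};
SharedLock.prototype.tryLockExclusive = function () {
  if (this.writer || this.readers > 0) return false;
  this.writer = true;
  return true;
};
SharedLock.prototype.unlockExclusive = function () {
  this.writer = false;
};
```

The point is only that the shared/exclusive semantics are expressible in terms of the simpler primitives, so SharedLock need not be in the first cut of the spec.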
>
>
> require("concurrency").ThreadLocal -> constructor for a thread-local
> variable.
>
>
> Instantiating real thread-local variables requires modification of the
> JavaScript interpreter. I do not see that happening. I think what
> you're going after should be called Thread Local Storage, which is
> totally doable.
>
> var threadLocal = new (require("concurrency").ThreadLocal)();
> threadLocal.get() -> get the value for the current thread
> threadLocal.get(value) -> set the value for the current thread
>
>
> I take it you meant "set" in that last line.
>
yeah.
Kris
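As Wes notes, thread-local *storage* (as opposed to real thread-local variables) is straightforward. A minimal sketch, assuming some `currentThreadId()` primitive exists (on Rhino it might wrap `java.lang.Thread.currentThread()`; the stub below just makes the sketch runnable in a single-threaded shell):

```javascript
// Sketch only: thread-local storage keyed by a hypothetical thread id.
// currentThreadId() is an assumption, not an existing API.
function currentThreadId() {
  return "main"; // stub for a single-threaded shell
}

function ThreadLocal() {
  this.values = {}; // one slot per thread id
}
ThreadLocal.prototype.get = function () {
  return this.values[currentThreadId()];
};
ThreadLocal.prototype.set = function (value) {
  this.values[currentThreadId()] = value;
};
```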
On Mon, Sep 21, 2009 at 9:12 AM, Kris Zyp <kri...@gmail.com> wrote:
> I was wondering if we could consider a concurrency module to help with
> some basic concurrency constructs, in particular for working in
> shared-memory threaded environments. ... I didn't think that CommonJS was
> attempting to force a particular concurrency model (and I am not sure that
> it should either).
To the extent that individual modules need to defend the consistency
of their state, they must assume *some* concurrency model. If a given
CommonJS module can be used in both threaded and non-threaded systems,
it must have a "worst case" to fall back onto. This worst case might
on the face of it be to build an implementation, and expose an API,
that includes explicit locking and thread-aware usage patterns. Hence
leaving the question open has forced everyone into coding for threads.
Now, how will the threaded primitives be supported on a non-threaded
system?
Does this mean we end up with 3 "flavors" of modules: (a)
non-threaded; (b) threaded; and (c) purely functional?
Perhaps I should ask this from another angle: JS programming has thus
far proceeded via event loop concurrency, and it seems like a
direction that is far less error-prone than _ad hoc_ threads and
locking. What is wrong with maintaining that?
Ihab
--
Ihab A.B. Awad, Palo Alto, CA
What I *do* think CommonJS should standardize is an API for building
and living within "workers", each of which is single threaded, and
which pass data to one another asynchronously. This should be similar
to Gears and HTML5 workers, and should be informed by those efforts.
Ihab
Node.js achieves high concurrency on a single event loop. Here is a
"hello world" web server benchmark I ran yesterday for a talk:
http://s3.amazonaws.com/four.livejournal/20090921/bench.png
There is no need to define extensions to JavaScript (threads) to write
servers - indeed, evidence suggests that doing so damns the server to
suck.
That's helma-1.6.3 using this http://helma.pastebin.com/f1d786c9 to
serve the request. (v8cgi 0.6, narwhal d147c160, node 0.1.11.) I know
this isn't fair to present benchmarks this way, without scripts,
methods, or data. I'm just being lazy. I did a more detailed benchmark
a few weeks ago (without helma)
http://four.livejournal.com/1019177.html and the setup is similar.
The talk was just a short lightning presentation:
http://s3.amazonaws.com/four.livejournal/20090922/webmontag.pdf
JSGI compliant - probably not. At least partially CommonJS compliant -
probably. At the moment I'm letting node evolve on its own.
> It does not look like
> there is any conflict between the JSGI API and your fundamental
> processing model. Or would one need to run something like Jack on top of
> Node to utilize cross-server applications?
I don't like that JSGI forces one to determine the response code and
response headers in a single function. One cannot for example, access
a database before returning the status code. Maybe I want to 404 if a
record does not exist. Well, one can do this if your database call
blocks execution inside the jack handler, but blocking implies
allocating a new execution stack with, if not threads, then
coroutines. Allocating a 2mb execution stack for each request is
rather an unacceptable sacrifice to be had at the API level, even
before an implementation is around to further slow down the server.
So, perhaps Node will have proper coroutines at some point (the
current implementation is rather bad). And perhaps someone will write
a JSGI interface on top of Node's more general HTTP interface, and
then people can access databases and wait() for the promise to
complete, and return 404 from the same function. This is not ruled
out.
> I was curious what you meant
> by "Narwhal". Narwhal's http module is an HTTP client, not a server
> (maybe that's why it scored low!). Or did you mean Jack? And if so, which
> engine was it running on? Having such low numbers with Simple would seem
> suspect, since Simple has such high performance characteristics with
> fairly little overhead added by Jack (or maybe there is something
> expensive in there that I missed).
The exact code being used was the same as described in
http://four.livejournal.com/1018026.html (which is a different link
than the one I gave above).
Ryan Dahl wrote:
> On Tue, Sep 22, 2009 at 3:06 PM, Kris Zyp <kri...@gmail.com> wrote:
>
>> Do you plan on making Node be JSGI compliant?
>>
>
> JSGI compliant - probably not. At least partially CommonJS compliant -
> probably. At the moment I'm letting node evolve on its own.
>
Oh, you aren't the one responsible for node?
>
>> It does not look like
>> there is any conflict between the JSGI API and your fundamental
>> processing model. Or would one need to run something like Jack on top of
>> Node to utilize cross-server applications?
>>
>
> I don't like that JSGI forces one to determine the response code and
> response headers in a single function. One cannot for example, access
> a database before returning the status code. Maybe I want to 404 if a
> record does not exist. Well, one can do this if your database call
> blocks execution inside the jack handler, but blocking implies
> allocating a new execution stack with, if not threads, then
> coroutines. Allocating a 2mb execution stack for each request is
> rather an unacceptable sacrifice to be had at the API level, even
> before an implementation is around to further slow down the server.
>
This is perfectly doable with the JSGI model plus promises that we have
discussed, as long as the JSGI server does not grab the status from the
response object until it needs to. Here is an example JSGI app with
async database interaction, completely in line with your event loop
model, no threading or coroutines needed (although I wouldn't mind some
coroutine sugar à la JS 1.7 generators, but it's not necessary):
app = function databaseApp(env){
    var resultPromise = doAsyncDatabase("select * from table where id = 3");
    var response = {
        status: 200, // the JSGI server should not write the status until the first body write
        headers: {},
        body: {
            forEach: function(write){
                return resultPromise.then(function(resultSet){
                    if(resultSet.length == 0){
                        response.status = 404; // change the status before the forEach promise is fulfilled
                    }
                    else{
                        write(JSON.stringify(resultSet[0]));
                    }
                });
            }
        }
    };
    return response;
};
> So, perhaps Node will have proper coroutines at some point (the
> current implementation is rather bad). And perhaps someone will write
> a JSGI interface on top of Node's more general HTTP interface, and
> then people can access databases and wait() for the promise to
> complete, and return 404 from the same function. This is not ruled
> out.
>
Yeah, I guess we could do that, although hopefully it is apparent that
JSGI + promises API would not actually hinder Node (and it would be cool
to be able to run JSGI apps directly on it).
Kris
Okay - I stand corrected. That's an important feature!
> app = function databaseApp(env){
>     var resultPromise = doAsyncDatabase("select * from table where id = 3");
>     var response = {
>         status: 200, // the JSGI server should not write the status until the first body write
>         headers: {},
>         body: {
>             forEach: function(write){
>                 return resultPromise.then(function(resultSet){
>                     if(resultSet.length == 0){
>                         response.status = 404; // change the status before the forEach promise is fulfilled
>                     }
>                     else{
>                         write(JSON.stringify(resultSet[0]));
>                     }
>                 });
>             }
>         }
>     };
>     return response;
> };
Does seem rather convoluted, doesn't it? I would still rather see
something like:
server.addListener("request", function (request, response) {
    var resultPromise = doAsyncDatabase("select * from table where id = 3");
    resultPromise.addCallback(function (resultSet) {
        if (resultSet.length == 0) {
            response.sendHeader(404, {});
        } else {
            response.sendHeader(200, {});
            response.sendBody(JSON.stringify(resultSet));
        }
        response.finish();
    });
});
I agree that the JSGI/promise approach is more awkward than your
example, but I think the philosophy is that JSGI allows the simple
case (of a basic sync function) to be very simple, and you can deal with
extra complexity as needed. In that sense, I think JSGI strikes a great
balance, making the common use cases very easy to code while still
keeping it possible to do more advanced async processing.
Kris
Ryan Dahl wrote:
>
> Does seem rather convoluted, doesn't it? I would still rather see
> something like:
>
> server.addListener("request", function (request, response) {
>     var resultPromise = doAsyncDatabase("select * from table where id = 3");
>     resultPromise.addCallback(function (resultSet) {
>         if (resultSet.length == 0) {
>             response.sendHeader(404, {});
>         } else {
>             response.sendHeader(200, {});
>             response.sendBody(JSON.stringify(resultSet));
>         }
>         response.finish();
>     });
> });
>
Another thing we could consider is allowing promises to be returned
directly from the JSGI app, rather than just from the forEach call as I
demonstrated. Returning a promise from a JSGI app would look like:
app = function databaseApp(env){
    return doAsyncDatabase("select * from table where id = 3").
        then(function(resultSet){
            if(resultSet.length == 0){
                return {status: 404, headers: {}, body: ""};
            }
            else{
                return {
                    status: 200,
                    headers: {},
                    body: JSON.stringify(resultSet[0])
                };
            }
        });
};
That reads a lot nicer, but it would put extra burden on servers and
middleware to have to look for promises at two different places (you
still need promises from forEach in order to facilitate streamed responses).
Kris
Yes, this is what I also was thinking about in a discussion a
couple of weeks ago when we touched on threading/runtimes.
Theoretically, I'm all for the shared-nothing approach and
would like to see that materialized in CommonJS. Though, I
haven't had any personal revelation on how this would be
used in a web server that wants simultaneous requests to
share data "from RAM", i.e. application or session data, in
an unobtrusive way...
Other script languages often use the database for this data
sharing, but it sure would be nice to have a solution that
scales to "live" objects as well.
Best regards
Mike Wilson
The "live" objects could live in a shared worker, and each worker
servicing an independent HTTP request could send that worker
asynchronous messages. That shared worker now behaves exactly like the
"database" you describe.
Assuming that a newly created thread has some kind of entry point at
which it starts executing, and has its own lexical scope... what
is the purpose of thread-local memory? Thread-local memory is a
technology for systems programming languages.
On Mon, Sep 21, 2009 at 8:07 PM, Wes Garland <w...@page.ca> wrote:
> The tricky part is maintaining atomic updates across multiple properties of
> the same object.
Not just the *same* object -- *across* multiple objects as well. In
other words, for some subgraph S of objects, we need to be concerned
about whether a temporarily inconsistent state of this subgraph is (a)
observable; and (b) usable by other objects as a way to subvert that
subgraph's future attempts at regaining consistency.
The "worker" model offers a pretty good compromise. Each worker
processes one event at a time. While a worker is processing an event,
its internal state may be inconsistent. However, it does not serve any
other events, so this inconsistency is neither (a) observable nor (b)
subvertible. The programmer in charge of the worker should arrange to
reestablish consistency at the end of each event (even if by a Boolean
flag that says "inconsistent" and causes service requests to queue up
or be rejected).
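The "inconsistent flag" idea above can be sketched roughly as follows. The names are purely illustrative; in a single-threaded shell the deferral branch only fires on reentrant delivery, but it shows the shape:

```javascript
// Sketch of a worker that defers incoming events while its internal
// state is mid-update, draining them once consistency is reestablished.
function GuardedWorker(handle) {
  this.inconsistent = false; // the "inconsistent" Boolean flag
  this.pending = [];         // events queued while inconsistent
  this.handle = handle;      // the worker's actual event handler
}
GuardedWorker.prototype.onEvent = function (event) {
  if (this.inconsistent) {
    this.pending.push(event); // queue the request rather than serve it
    return;
  }
  this.inconsistent = true;   // state may now be observed mid-update
  this.handle(event);
  this.inconsistent = false;  // consistency reestablished
  while (this.pending.length) {
    this.onEvent(this.pending.shift());
  }
};
```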
> I think the following rules may be sufficient for this:
> - Underlying JavaScript engine provides serialized property access
> - require() is guarded by a mutual exclusion lock across the entire JS
> runtime (all threads)
> - modules are responsible for their own theory of thread safety for
> non-exported functions and variables, and can assume the presence of a
> level-0 concurrency module
> - exported functions which accept non-Atom arguments must not modify those
> arguments
> - exported functions which return non-Atoms must return new non-Atoms
> - Boolean, Number and String are considered Atoms.
I don't know if these rules would work, but they do seem to suggest
that each *module* lives in its own threading realm of sorts, exposing
a simplified interface to other modules. The worker model keeps the
two concerns largely independent: multiple modules may live within one
worker and communicate in a simple, synchronous manner, or they may
spawn other workers and communicate asynchronously with these workers.
Do you think your goal cannot reasonably be achieved if we grant
Wes's concerns? For clarity:
Ihab seems concerned with collisions of Javascript logic.
Wes seems concerned with collisions of interpreter logic.
If you can be sure that accessing a particular variable concurrently
is safe, can't you then use that as a primitive for locking other
subsections of the object graph? I don't think the goal is to make it
impossible for a programmer to write poor code.
> The "worker" model offers a pretty good compromise. Each worker
> processes one event at a time. While a worker is processing an event,
> its internal state may be inconsistent. However, it does not serve any
> other events, so this inconsistency is neither (a) observable nor (b)
> subvertible. The programmer in charge of the worker should arrange to
> reestablish consistency at the end of each event (even if by a Boolean
> flag that says "inconsistent" and causes service requests to queue up
> or be rejected).
Sure, but can't Workers be *easily* implemented on top of the other
concurrency features others are asking for?
The Worker API is just a multi-threading API that chops up the object
graph for each Worker, such that the only things shared are basically
synchronized message queues.
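One way to picture that claim: a Worker is essentially a private object graph plus a message queue of strings, drained one message at a time. The sketch below simulates the worker's thread by draining synchronously; a real implementation would run drain() on the worker's own thread with the queue guarded by a lock:

```javascript
// Sketch only: the essence of a Worker as a synchronized message queue.
// The "thread" is simulated here; names are illustrative, not HTML5 API.
function Worker(onmessage) {
  this.queue = [];          // would be a locked/blocking queue in MT JS
  this.onmessage = onmessage; // runs inside the worker's private graph
}
Worker.prototype.postMessage = function (data) {
  this.queue.push(String(data)); // share strings only, never references
};
Worker.prototype.drain = function () {
  // process one message at a time, to completion
  while (this.queue.length) this.onmessage(this.queue.shift());
};
```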
p.s. if anyone wants to tell me what the mothership that launches the
first Workers is called in Worker vernacular, you win cookies
I am actually in favor of the event-loop, but I am not sure if we really
want or even can provide a total shared-nothing architecture. The most
popular platform (at least ATM) is Rhino with access to the Java
runtime. If a platform attempts to create multiple isolated
vats/mini-processes, each with their own event loop, they still
ultimately can access shared data in the JVM from different threads. The
only way to prevent that would be to block access to Java packages
(and access to Java libraries is one of the main reasons people use
Rhino, to fill in the gaps that CommonJS won't be able to provide
within the next 5 years) or run multiple Java processes (which would be
horrible).
Because we can't realistically enforce a true shared-nothing
architecture in a JS environment, I don't think we should try to
eliminate every possible form of access to shared data between
threads. I believe we would be in a better position to provide a good
event-loop based environment where developers don't really need to share
references to mutable objects across threads (they can pass async
messages between vats), but can if they really want to. The default
behavior is that modules should never need to deal with thread safety
(unless they are specifically designed for that purpose). Guidelines
should insist that mutable data that is referenced between threads
should not escape modules. In particular, JSGI needs to allow middleware
and apps to communicate with another worker process through the HTML5
worker API, such that they can communicate between concurrent processes
without accessing shared mutable data. Also, promises that enqueue tasks
must only be fulfilled on the initiating process.
Consequently, I believe we should still provide a *low-level*
concurrency API that defines some basic concurrency constructs. We can
then build an event loop system on top of this which could essentially
be implemented in JS using the concurrency module's functions (which
makes it easier to span engines). Other modules may also use this API
(with great caution) if they need to implement something which fits
better with the threaded shared-mutable-data model. This allows module
writers to choose the best tool for the job.
Thoughts?
Thanks,
Kris
I've been trying to figure out how one would efficiently implement
(for example) a JSGI server that uses multiple web workers. Workers
only allow you to send messages back and forth. Would you need to
buffer and serialize the request and response, or is there some way
you could hand off the I/O streams to a worker?
I've been thinking through this as well. This is one of the reasons why
I suggested that we don't want a completely shared-nothing architecture.
Basically, I think we want Jack to create a pool of workers, and then as
requests come in, they should be delegated to whatever worker is idle
and available (or have the request go into a pool waiting for an idle
worker if none is available). Each worker has its own event loop, and
so when a request is delegated to the worker, it goes into its event
loop to be processed. For the sake of efficiency, it's important that one
can actually pass a reference to the request/response/input stream/output
stream to the workers, not just a string.
I started toying with how this would work, and this is what I was thinking:
reactor.js - This would be built up to be the event loop handler (I
think that was Kris Kowal's intention with this module). It would have a
thread-safe enqueue function that could be called from any thread, and
would add a task to the event queue (in Rhino, we would probably use a
LinkedBlockingQueue). reactor would also expose an enterEventLoop()
function that would start the processing of the queue (should only be
called by the thread for the event loop).
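A toy version of that reactor might look like the following. In Rhino the queue would be a java.util.concurrent.LinkedBlockingQueue and enterEventLoop() would block waiting for new tasks; here a plain array stands in and the loop simply drains until the queue is empty:

```javascript
// Toy sketch of the reactor module described above.
var reactor = {
  queue: [], // would be a thread-safe LinkedBlockingQueue on Rhino
  // callable from any thread in the real version
  enqueue: function (task) {
    this.queue.push(task);
  },
  // run queued tasks one at a time, on the event loop's own thread
  enterEventLoop: function () {
    while (this.queue.length) {
      var task = this.queue.shift();
      task(); // each task runs to completion before the next starts
    }
  }
};
```

Tasks can themselves enqueue further tasks, which is what makes deferred work (timeouts, fulfilled promises) fit the same loop.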
worker.js - This would expose a Worker constructor that would start a
new thread and a new global environment for the worker, and put a
postMessage function there that would delegate to the caller's reactor
enqueue. It would start up a new reactor in the worker thread, execute
the script provided by the argument to Worker, and then trigger
enterEventLoop().
The way I was thinking was that there might also be a "postEvent" that
could post any event to the worker with any data (not just a string).
The main JSGI worker (the mothership) would then be able to call
postEvent methods on each worker to send it the request object. The
worker would define an onrequest event handler to receive the events
(this could be implemented by a script that was passed to the worker
from the Jack main thread), which would be given all the necessary
references to carry out the response to the request.
Does this seem like a reasonable approach?
Kris
Sending a "port" through a message channel is part of html5's spec
http://www.whatwg.org/specs/web-apps/current-work/multipage/comms.html#posting-messages
it's the analogue of sending an fd with sendmsg(). This would be a way
of distributing requests across multiple workers.
Using shared memory with locks:
var hits = 0;
var lock = new (require("concurrency")).Lock();
function incrementAndGetHits(){
    lock.lock();
    try{
        return ++hits; // read the value while still holding the lock
    }
    finally{
        lock.unlock();
    }
}
exports.app = function(env){
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: "This page has been hit " + incrementAndGetHits() + " times"
    };
};
And here is the same thing using shared nothing with HTML5 workers:
jsgi file:
var hitTrackerWorker = new (require("worker").SharedWorker)("hit-tracker");
var Future = require("promise").Future; // get the constructor
var waitingFutures = []; // all the requests that are waiting to be finished
// note that we can't just do this in a closure inside the app because there
// may be multiple unfulfilled promises at once, but there is only one
// onmessage handler
hitTrackerWorker.port.onmessage = function(event){
    // probably easiest to use JSON for structured messages
    var message = JSON.parse(event.data);
    switch(message.action){
        case "incrementResults":
            waitingFutures.shift().fulfill(message.hits);
        // ... all other types of response would need to register here
    }
};
// this app will be created in each worker
exports.app = function(env){
    // send a message to the shared worker to increment the hits
    // and send back the result
    hitTrackerWorker.port.postMessage(JSON.stringify({
        action: "incrementAndGetHits"
    }));
    var future = new Future();
    // add it to our queue of futures that need to be fulfilled
    waitingFutures.push(future);
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: {
            forEach: function(write){
                // delay the response with a promise
                return future.promise.then(function(hits){
                    write("This page has been hit " + hits + " times");
                });
            }
        }
    };
};
hit-tracker.js:
var hits = 0;
onmessage = function(event){
    var message = JSON.parse(event.data);
    switch(message.action){
        case "incrementAndGetHits":
            hits++; // this is thread-safe since it is in a single worker
            // send back the results
            event.ports[0].postMessage(JSON.stringify({
                action: "incrementResults",
                hits: hits
            }));
        // ... any other actions handled by this worker
    }
};
Is there an easier way to do this with workers? This exercise makes me a
little skeptical that shared nothing architecture is really the best
tool for every need. I like having event loops available, and would
certainly use them whenever possible, but I am dubious that we should
(or could) try to eliminate shared mutable data across threads. What if
we simply associated an event loop with every thread (allowing them to
share everything), had JSGI servers distribute requests across event
loops and had enqueuing operations (deferred futures, setTimeouts, etc.)
always enqueue on to the current thread's event loop so as to make it
easier for lexically private data to always remain on the same thread
and avoid race conditions?
Kris
Thanks for writing this up ... this is a really good example.
My hunch is that the HTML5 workers solution is made verbose by the
fact that the HTML5 worker API is very low-level (albeit *perhaps*
adequate to implement the sugar we need).
I've fired off a thread to e-l...@eros-os.org, and it's already shown
up in the archives, so see:
http://www.eros-os.org/mailman/listinfo/e-lang
The gist of it is to ask, what would this look like in E? E was built,
with a lot of design freedom from legacy, around this specific
architecture, so it should be easy there, right? Let's see what they
come up with, then we can figure out what missing parts we would need
to be able to do similar stuff, and whether these missing parts are
fundamental to JS or just helper libs that someone needs to up and
write.
Again, thanks for the problem and for taking the time to write out the
solution. To my knowledge, this is the first time we've asked
ourselves this question in any detail for pure JS. Kevin Reid, a
longtime E developer, built an implementation of CapTP (the RPC
protocol that E uses), but that was on top of Caja. I am not yet well
enough versed in his work to determine how much of it could (with
sacrifice of its security properties) be adapted to pure JS; Caja
gives JS programmers some virtualization abilities that pure JS
doesn't have.
ihab...@gmail.com wrote:
> Hi Kris,
>
> Thanks for writing this up ... this is a really good example.
>
> My hunch is that the HTML5 workers solution is made verbose by the
> fact that the HTML5 worker API is very low-level (albeit *perhaps*
> adequate to implement the sugar we need).
>
I am curious what it would look like in E, but you are right, I think we
can do much better than direct interaction with a low level API. I
believe it would be pretty straightforward to write a JSON-RPC library
on top of workers that would handle posting a message, waiting for the
response, returning a promise, and fulfilling the promise. Then we could
actually write (with shared nothing event loop):
var hitTrackerWorker = new (require("worker").SharedWorker)("hit-tracker", "tracker");
var hitTrackerRPC = require("rpc-worker").wrapWorker(hitTrackerWorker);
exports.app = function(env){
    return {
        status: 200,
        headers: {"content-type": "text/plain"},
        body: {
            forEach: function(write){
                // hitTrackerRPC.call will return a promise
                return hitTrackerRPC.call("incrementAndGet", []).
                    then(function(hits){
                        write("This page has been hit " + hits + " times");
                    });
            }
        }
    };
};
hit-tracker.js:
var exporter = require("rpc-exporter").exporter;
var hits = 0;
exporter({
    incrementAndGet: function(){
        return ++hits;
    }
});
And that doesn't look too bad at all.
Still, I wonder if we should rather be stating the question as where on
the gradient of sharing we want to be (since blocking all shared
mutables may not be realistic, at least with Rhino):
- shared-difficult - Every worker has a separate global, postMessage only
  carries strings, block everything possible, but concede that code may
  access shared data in the JVM for Rhino users.
- shared-easy - Create threads that share everything, but provide event
  loops to give users a nice shared-nothing style mechanism for avoiding
  sharing if desired.
- shared-something-in-between - Perhaps separate globals, but allow
  posting non-string objects across workers?
Kris
Wes Garland wrote:
> Kris;
>
> This message is a little off-topic for this list, as I'm going to talk
> about feasibility of implementation, rather than the merits or the
> API. But I'm hoping it will provide food for thought for
> CommonJS.onTopic. :)
>
> On Thu, Sep 24, 2009 at 12:59 AM, Kris Zyp <kri...@gmail.com
> <mailto:kri...@gmail.com>> wrote:
>
>
> Still, I wonder if we should rather be stating the question as
> where on the gradient of sharing do we want to be (since blocking
> any shared-mutables may not be realistic, at least with Rhino):
>
>
> By "blocking shared-mutables", do mean blocking on read of an object,
> when another thread has some kind of lock on it?
No, I meant preventing shared access to mutables between threads. Sorry,
poor choice of words since "blocking" has a different meaning in the
concurrency context.
Yeah, having objects that originate from multiple globals is indeed a
pain (we experience that in the browser with frames).
>
> As one "solution", have we considered the co-routine-ish solution
> employed by the browser? Basically, when an event happens (e.g.
> setTimeout()), the JS engine executes an asynchronous callback (I
> think of them as non-maskable interrupts) in other JS code, and jumps
> back when that other piece of code has run.
That's not coroutines; setTimeout uses the event loop that we are
talking about (it places the provided callback on the event queue). The
only form of coroutines in JS is generators.
>
> While THAT solution has obvious drawbacks -- it's not multithreaded!
> -- it may be worth considering because it allows some async stuff, has
> no thread-safety issues, and can be implemented on all the JS engines,
> not just SpiderMonkey and Rhino. If you were careful, you might be
> able to get performance similar to WIN16-style multitasking.
Right, the event loop allows for asynchrony within a thread/process.
But there could be multiple threads/processes, each with its own event
loop (that's what happens with workers).
Kris
Not just the *same* object -- *across* multiple objects as well. In
other words, for some subgraph S of objects, we need to be concerned
about whether a temporarily inconsistent state of this subgraph is (a)
observable; and (b) usable by other objects as a way to subvert that
subgraph's future attempts at regaining consistency.
The "worker" model offers a pretty good compromise. Each worker
processes one event at a time. While a worker is processing an event,
its internal state may be inconsistent. However, it does not serve any
other events, so this inconsistency is neither (a) observable nor (b)
subvertible. The programmer in charge of the worker should arrange to
reestablish consistency at the end of each event (even if by a Boolean
flag that says "inconsistent" and causes service requests to queue up
or be rejected).
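The "inconsistent flag" discipline described above can be sketched in a few
lines. This is purely illustrative (all names are hypothetical, not part of
any proposed API): while the worker is mid-update, further requests are
queued rather than being allowed to observe or subvert the inconsistent
state.

```javascript
// Sketch of the "inconsistent flag" pattern: requests arriving while the
// worker's state is mid-update are queued, not served, so the temporary
// inconsistency is neither observable nor subvertible.
function makeHitCounter() {
  var inconsistent = false;
  var queued = [];
  var hits = 0;

  function process(event) {
    inconsistent = true;   // state may now be mid-update
    hits = hits + 1;       // (imagine a multi-step update here)
    inconsistent = false;  // consistency re-established per event
  }

  return {
    onmessage: function (event) {
      if (inconsistent) { queued.push(event); return; } // defer the request
      process(event);
      while (queued.length > 0) process(queued.shift()); // drain deferred work
    },
    hits: function () { return hits; }
  };
}
```

In single-threaded event-loop JS the flag can never actually be seen set by
other code, which is exactly the guarantee the worker model provides for free.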
> I think the following rules may be sufficient for this:
> - Underlying JavaScript engine provides serialized property access
> - require() is guarded by a mutual exclusion lock across the entire JS
> runtime (all threads)
> - modules are responsible for their own theory of thread safety for
> non-exported functions and variables, and can assume the presence of a
> level-0 concurrency module
> - exported functions which accept non-Atom arguments must not modify those
> arguments
> - exported functions which return non-Atoms must return new non-Atoms
> - Boolean, Number and String are considered Atoms.
I don't know if these rules would work, but they do seem to suggest
that each *module* lives in its own threading realm of sorts, exposing
a simplified interface to other modules.
The worker model keeps the
two concerns largely independent: multiple modules may live within one
worker and communicate in a simple, synchronous manner, or they may
spawn other workers and communicate asynchronously with these workers.
Wes Garland wrote:
>
>
> What happens in the worker model when two threads want to load the
> same module? I see three possibilities without my rules:
>
> 1. You break the original promise of module singletons, or
With a shared-nothing/hard/little approach, modules are loaded once for
each different worker. This doesn't break any promises that we have
made; it has always been known that separate processes would each load
their own modules. The promise we made was that it would be
non-observable; code would never have a reference to two different
module exports.
>
> ...which leads to an interesting question, in a shared-nothing
> environment, is that "nothing" also module scope?
Yes.
> If so, you can't have stateful modules, or you can't have module
> singletons.
Modules can't have their own state that is shared across all workers.
Rather shared state must be emulated through messaging to a central
shared worker.
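A rough sketch of that emulation, with the worker wiring simulated in-process
(a real version would use the HTML5-style postMessage/onmessage API discussed
in this thread; the callback-reply shape here is an assumption for
illustration):

```javascript
// Sketch: shared module state emulated by messaging a central worker.
// The "shared" counter lives only inside the central worker; clients
// never touch it directly, they send messages and receive replies.
function makeSharedCounterWorker() {
  var count = 0; // state owned exclusively by this worker
  return {
    onmessage: function (event, reply) {
      if (event.data === "increment") count++;
      reply({ data: count }); // report the current count to the sender
    }
  };
}

var central = makeSharedCounterWorker();

// A client "worker" increments via a message instead of shared memory.
function increment(done) {
  central.onmessage({ data: "increment" }, done); // would be postMessage()
}
```

Usage would look like `increment(function (e) { /* e.data is the new count */ });`
with the reply arriving asynchronously in a real worker setup.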
Kris
On Thu, Sep 24, 2009 at 12:59 AM, Kris Zyp <kri...@gmail.com> wrote:
> I
> believe it would be pretty straightforward to write a JSON-RPC library
> on top of workers that would handle posting a message, waiting for
> response, returning a promise, and fulfilling the promise.
> [snip]
> And that doesn't look too bad at all.
As said earlier in this thread, Workers are isolated threads joined by
synchronized messaging queues. RPC to my mind looks about the same as
using those queues for synchronizing. In fact you could do RPC over
those queues. On top of this, you can build mutex and shared locks,
and then your mutex-locked hit-counter example applies to HTML5
Workers.
Or put another way, both RPC and the messaging queues in HTML5 Workers
provide the primitive synchronization needed for the typical
synchronized programming scenario.
Not completely: asynchronous messaging can't be encapsulated into a sync
function (one that doesn't require a callback or promise) that can
provide locking. It is impossible to use workers to build something that
looks like a synchronously accessible object, or a synchronous mutex. It
might be more accurate to say that mutexes are one of the primitives
that can be used to build workers (for synchronizing the message queue).
Kris
Event-loop concurrency with shared-only-explicitly
* CommonJS will utilize HTML5 style workers with event-loops to isolate
data from other concurrent processes. Each worker will have its own
global environment and its own set of modules (singleton per worker).
Modules will therefore not need thread-safe coding.
* CommonJS standard modules will not guarantee any mutex or other
thread-safe techniques to access or modify data, unless they explicitly
advertise such capability for their particular function. Users should
not expect any other modules to be thread-safe, unless explicitly
advertised. Modules used in workers that access data accessible only
from a single worker/thread will execute deterministically.
* CommonJS will provide a module for creating and accessing shared data
that spans workers. Anyone that uses this shared data must do so with
caution, realizing that passing this to other modules may result in
non-deterministic behavior.
This is based on:
* Event-loops are familiar from browser-based JavaScript; they have been
demonstrated to be a powerful form of concurrency that makes it
relatively simple to avoid race conditions and deadlocks, and preserving
this paradigm is useful.
* CommonJS is intended to have a fairly large scope of potential users.
Different jobs require different tools, so forcing a single concurrency
model on everyone may not be best in our situation. In fact, there is
already existing JavaScript written on multiple threads with shared
state, and there will continue to be, regardless of what we say or do
here. Defining APIs (and practices) for modules is in the scope of
CommonJS. Eliminating every piece of JS code in the world that uses
shared-state multithreading is not.
* Modules need to have some expectations of concurrency to be properly
written.
The following modules and APIs will be defined:
event-queue - This module will be an event-queue which provides a
thread-safe "enqueue" function (generally called internally from
different threads) and a "next" or "take" function that returns the next
event in the queue, or blocks until one is available. The event-queue
may also have an "isIdle" function and an "onready" event that can be
used to construct worker pools that delegate tasks (especially for JSGI
servers). Entering the event loop is as simple as creating a loop that
gets the next event and executes it.
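That loop could look roughly like the following. The event-queue module here
is emulated with a plain array (in the real proposal, "enqueue" would be
thread-safe and "next" would block until an event arrives; the names follow
the proposal above but nothing here is a standardized API):

```javascript
// Emulation of the proposed event-queue module's surface, for illustration.
function makeEventQueue() {
  var events = [];
  return {
    enqueue: function (fn) { events.push(fn); },  // thread-safe in a real impl
    next: function () { return events.shift(); }, // would block when empty
    isIdle: function () { return events.length === 0; }
  };
}

// Entering the event loop: get the next event and execute it.
// (The real loop would run forever; this one stops when the queue drains,
// since the emulated next() returns undefined instead of blocking.)
function runEventLoop(queue) {
  var event;
  while ((event = queue.next()) !== undefined) {
    event(); // each event is just a function to call
  }
}
```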
worker - This module will provide constructors for Worker and
SharedWorker that follow the HTML5 specification, providing a
postMessage function and an onmessage event handler. Each worker can
execute concurrently and will have its own global environment and set of
modules. Each worker, once spawned, will execute the provided script and
then enter the event loop, which is just an infinite loop of getting the
next event from the queue and executing it. When a worker is
constructed, the initiating code may optionally also explicitly pass (to
the constructor or to a "provideShared()" function) a shared object.
This object will be accessible to the worker through the "getShared()"
function on the worker module (the worker's instance of the module). The
worker module will also provide a postData(object) function which will
be sugar for postMessage(JSON.stringify(object)), with an automatic
JSON.parse(message) to fill the event.data property in onmessage
listeners (noting that this would make it very easy for implementations
to provide much faster processing than actually doing a JSON
serialization/deserialization).
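The postData() sugar is small enough to sketch directly. Here
"postMessageImpl" and "deliver" are hypothetical stand-ins for the
engine-provided send and receive hooks; only the serialize-on-send /
parse-on-receive shape is what the proposal describes:

```javascript
// Sketch of the proposed postData() sugar over string-only postMessage.
// An implementation could skip the actual JSON round-trip for speed, as
// long as it behaves as if it happened.
function makeDataPort(postMessageImpl) {
  var port = {
    postData: function (object) {
      postMessageImpl(JSON.stringify(object)); // serialize before sending
    },
    onmessage: null,
    // deliver() stands in for the engine handing us an incoming string
    deliver: function (message) {
      if (port.onmessage) port.onmessage({ data: JSON.parse(message) });
    }
  };
  return port;
}
```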
concurrency - This module will provide mutexes/locking in order to be
able to safely interact with a shared object or any other mutable data
that is accessible to multiple threads through means outside of those
defined in CommonJS (for example, if an environment provided a thread
constructor).
JSGI will then explicitly require that all concurrent request handling
be performed by workers. JSGI will provide a shared object (initially
empty) to those workers. Other than the shared object, worker
environments should follow the standard isolation (different globals and
module instances).
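Putting the last two pieces together, access to the JSGI shared object would
be guarded with the concurrency module's lock. The Lock API below follows the
lock()/unlock() shape floated earlier in this thread, but is stubbed as a
no-op here since no real threads exist in this sketch:

```javascript
// Stub Lock with the lock()/unlock() surface discussed in this thread.
// A real implementation's lock() would block until the mutex is acquired.
function Lock() {
  this.lock = function () {};
  this.unlock = function () {};
}

var sharedLock = new Lock();
var shared = {}; // the initially-empty shared object JSGI would provide

// A worker's request handler bumping a shared hit counter safely.
function incrementHits() {
  sharedLock.lock();
  try {
    // Safe: no other thread can be between lock() and unlock().
    shared.hits = (shared.hits || 0) + 1;
  } finally {
    sharedLock.unlock(); // always release, even if the update throws
  }
  return shared.hits;
}
```

The try/finally is the important part of the idiom: without it, an exception
mid-update would leave the lock held and deadlock later callers.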
Also, it might be worth adding a module that would create a new global
and new loader for modules (creating new singletons in the global).
With the concurrency module (if it had mutexes as well as
wait/notify-style synchronization), one could probably build the worker
and event-queue modules entirely in JavaScript, and native bindings
would only be required for the concurrency and global-creator modules.
An RPC module would also be useful, but I wanted to start low-level.
Anyway, I hope this proposal strikes a good balance of defaulting and
encouraging users towards event-loop concurrency, while still allowing
for shared-state semantics when necessary.
Kris
I see things differently.
Suppose we have two synchronized message queues, and two threads of
execution. Each thread holds the write-end of one queue, and the
read-end of the other queue. Let's define these things now with a
little pseudo-code:
CODE
var MessagesFromA = new MessageQueue;
var MessagesFromB = new MessageQueue;
var EventSource = whatever;

function ThreadA() {
    /* Thread A will process events from EventSource */
    for (var event in EventSource) {
        /* Inform Thread B of a hit */
        MessagesFromA.write("hit");
        /* Wait for the number of hits */
        var hits = MessagesFromB.read();
        /* presumably you'd then do something with "hits" */
        event.respond("You have hit me " + hits + " times.");
    }
}

function ThreadB() {
    var counter = 0;
    /* Thread B will count */
    for (var message in MessagesFromA) {
        /* Record hits */
        if (message == "hit")
            counter++;
        /* Report hit count */
        MessagesFromB.write(counter);
    }
}
/CODE
Does this clarify things? Is there some limitation of this paradigm I
don't appreciate?
I understand now; sometimes JavaScript is a lot better for communication
than English :). What you are doing is getting messages synchronously
(your read function blocks), and you can compose and encapsulate with
that. However, HTML5 workers (and E, and any other inspirations, I
believe) use asynchronous messaging and do not provide direct access to
the event queue (or programmatic blocking for the next event).
I think this is further evidence that we should be exposing the
low-level concurrency constructs (mutexes, threads, blocking thread-safe
queues, or other stuff to build them), and then building workers on top
of these capabilities, rather than just a monolithic worker API. In your
example, it would certainly be more efficient to use a mutex rather than
emulating one in a way that forces a new thread for every mutex
(expensive). My intent with the concurrency plan I proposed was to
provide these low-level constructs as well as workers, so developers
could use what suited their situation.
Kris
This argument is only evidence if one accepts that emulating mutexes
across vats is desirable. There is a growing literature explaining
just how awful this concurrency paradigm is.
(<http://www.erights.org/talks/promises/paper/tgc05.pdf> is my own
contribution, adapted into Part 3 of
<http://erights.org/talks/thesis/>, which is more complete but also
longer.)
Not only has JavaScript not yet made the threading mistake (which Java
will probably never recover from), it is already heading in the right
direction. Asynchronously communicating event loops is a great
concurrency model, is already the model that JavaScript is investing
in, and is the only one possible for JavaScript in the browser.
--
Text by me above is hereby placed in the public domain
Cheers,
--MarkM
On Wed, Sep 23, 2009 at 9:33 PM, <ihab...@gmail.com> wrote:
> I've fired off a thread to e-l...@eros-os.org ...
For those of us not on e-lang, here is Kevin Reid's solution:
http://www.eros-os.org/pipermail/e-lang/2009-September/013274.html
Interestingly, the main cool trick he used is the "pass by
construction" ability which simplifies the passing of live objects
(including far references *back* to the originating vat - note how the
object can refer back to 'incrementAndGetHits', as a far reference,
*after* it has migrated).
None of this is fundamentally unachievable in CommonJS (given 'eval'
and a data channel), but it requires a rich inter-vat RPC library. As
I mentioned, Kevin Reid has built a capability RPC library for Caja,
here:
http://code.google.com/p/caja-captp/
and, as I mentioned, it's not clear to me yet how much of this can be
built on top of pure JS without Caja.
> I understand now, sometimes JavaScript a lot better for communication
> than English :). What you are doing is synchronous getting messages
> (your read function blocks), and you can compose and encapsulate with
> that. However, HTML5 workers (and E and any other inspirations, I
> believe) use asynchronous messaging and do not provide direct access
> to
> the event queue (and programmatically blocking for the next event).
>
> I think this is further evidence that we should be exposing that the
> low-level concurrency constructs (mutexes, threads, blocking thread-
> safe
> queues, or other stuff to build them), and then building workers on
> top
> of these capabilities, rather than just a monolithic worker API. In
> your
> example, it would certainly be more efficient to use a mutex rather
> than
> emulating one in such a way that forces a new thread for every mutex
> (expensive). My intent with the concurrency plan I proposed was to
> provide these low-levels constructs as well as workers, so developers
> could use what suited their situation.
>
> Kris
FWIW, V8 does not support any sort of multithreading (apparently even
with completely independent contexts within the same process)
http://groups.google.com/group/v8-users/browse_thread/thread/5b30d0a7907f22aa
http://groups.google.com/group/v8-users/browse_thread/thread/451dd45e48f83c1c
In fact, if you fire up Chrome and run a web worker demo you'll see
each worker show up as a separate process.
-tom
It's worth a lot! The fact that running V8 in multiple threads will put
the VM in a state of "likely to crash fairly soon" means that it is
simply impossible to write shared-state multithreaded code across all
platforms. It would seem that the only type of concurrency that CommonJS
can standardize support for initiating is shared-nothing (and of course
we would use event loops). If JSGI is to be something that can truly be
implemented on all engines, it would seem it must be specified as having
shared-nothing concurrency (sans intentional work-arounds by users, like
going directly to the JVM, as one should still be able to use JVM
threads to implement this).
If threading is platform specific, then concurrency constructs should be
platform specific (they certainly already exist in Rhino), and not a
part of CommonJS. Providing shared data through workers (as I had
originally suggested) should also not be available. However, providing
real access to event queues (which are used to build workers) through an
event-queue module could still be available. In particular, it may be
helpful to allow for construction of multiple event-queues (within a
single thread/process/vat) and permit associating event-queues with
receiving messages for specific worker ports. One could then
programmatically block waiting for an event (as one typically is doing
in a normal event-loop while idle), in order to achieve synchronous
execution, like Donny pointed out. I believe this is more deadlock
prone, but that might be a small price to pay for being able to provide
sync messaging when necessary (worker-style async messaging would still
be the default, encouraged mechanism).
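The per-port-queue idea can be sketched as follows. The blocking behavior is
only emulated (a real next() would suspend the whole vat until a message
arrives, which is where the deadlock risk comes from); all names are
hypothetical:

```javascript
// Sketch: a worker port with its own event queue, enabling a synchronous
// request/reply. In-process emulation only; next() cannot truly block here.
function makePort() {
  var queue = [];
  return {
    deliver: function (message) { queue.push(message); }, // engine enqueues
    next: function () { return queue.shift(); }           // would block when empty
  };
}

// Synchronous call: send a request, then block on this port for the reply.
// If the reply never arrives, a real next() blocks forever: the deadlock
// hazard mentioned above.
function callSync(port, sendRequest) {
  sendRequest();      // e.g. otherWorker.postMessage("getHits")
  return port.next(); // wait for the single reply on this dedicated port
}
```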
Kris
Am I correct that what you are proposing here is having each "vat"
export multiple message endpoints, each one independently addressable
by the outside world?
> One could then
> programmatically block on waiting for an event (as one typically is
> doing in a normal event-loop while idle), in order to achieve
> synchronous execution, like Donny pointed out.
In other words, within a "vat", computation may be blocked on the
event queues of one or more of these message endpoints, while
computation proceeds in response to events received at other message
endpoints?
If so, then this can be achieved simply building an inter-"vat" RPC
layer where the target of a message is the tuple (vatid, oid). "oid"
is the address of the target object.
This may not be true. Taking care to respect that "multithreaded"
doesn't need to mean "using kernel threads," there may be reasonable
ways to get V8 to do something called "multithreading."
Also, who knows what will be in V8's future. Do not rule out the
possibility that what CommonJS decides may well influence the
direction of V8, if the group can actually put together some good
ideas.
> If threading is platform specific, than concurrency constructs should be
> platform specific (they certainly already exist in Rhino), and not a
> part of CommonJS.
It certainly raises some important questions. Perhaps we should be
evaluating whether or not it would be prudent to define different
types of compliance. (A "level zero compliance" or some such thing has
already been proposed. "Levels" of compliance are not *so* different
from independent optional specifications.) It may turn out, though,
that it's best to only focus on a single model of concurrency. But I
would *not* say this is the case merely because V8 isn't yet up to the
task.
> originally suggested) should also not be available. However, providing
> real access to event queues (which are used to build workers) through an
> event-queue module could still be available. In particular, it may be
> helpful to allow for construction of multiple event-queues (within a
> single thread/process/vat) and permit associating event-queues with
> receiving messages for specific worker ports. One could then
> programmatically block on waiting for an event (as one typically is
> doing in a normal event-loop while idle), in order to achieve
> synchronous execution, like Donny pointed out.
I disagree. We don't really need formal access to event queues (as
opposed to the popular Javascript idiom of providing event handler
functions), and if you are really anal about needing them, you can
implement it using generators. (I was writing a long email in which one
of the things I address is the confusion that explicitly waiting on an
event in a blocking function, on a formal reference to an event queue,
is semantically identical to what the Javascript world does now with
event handlers.)
Put another way, imagine that at the top of every Javascript program
is a bit of Javascript code that you just can't see, that goes like
this:
function thisFunctionHostsPrograms() {
    var yourProgram = new Program("your-javascript-source-code");
    var EventQueue = whatever;
    for (var event in EventQueue)
        if (yourProgram.handlesEvent(event.type))
            yourProgram.handle(event); // maybe this is a
setInterval() interval event, or an onClick event being fired
}
The reason I recommend against providing formal access to an event
queue object is because there are really very few use cases (I
believe, maybe I need to spend another day thinking) and you can
always use generators.
Donny Viszneki wrote:
> On Sat, Sep 26, 2009 at 11:55 PM, Kris Zyp <kri...@gmail.com> wrote:
>
>> Its worth a lot! The fact that running V8 in multiple threads will put
>> the VM in a state of "likely to crash fairly soon", means that is simply
>> impossible to write shared state multithreaded code across all
>> platforms.
>>
>
> This may not be true. Taking care to respect that "multithreaded"
> doesn't need to mean "using kernel threads," there may be reasonable
> ways to get V8 to do something called "multithreading."
>
> Also, who knows what will be in V8's future. Do not rule out the
> possibility that what CommonJS decides may well influence the
> direction of V8, if the group can actually put together some good
> ideas.
>
From what I understood from the mailing list threads Tom referenced, V8
supports global lock-and-switch (like context switching in preemptive
threading), but that by no means provides the real ability of threads to
run on multiple cores simultaneously. I believe they stated that there
are a thousand undocumented places that rely on the single-thread (or
global lock) assumption. It sounds like adding multiple-threading
support would be a massive project. I know that in Rhino, the necessary
mechanisms for dealing with multiple threads are pervasive throughout
the codebase. I think this was a ground-up architectural decision in
Rhino, and it would be a non-trivial addition to V8.
>
>> If threading is platform specific, than concurrency constructs should be
>> platform specific (they certainly already exist in Rhino), and not a
>> part of CommonJS.
>>
>
> It certainly raises some important questions. Perhaps we should be
> evaluating whether or not it would be prudent to define different
> types of compliance. (A "level zero compliance" or some such thing has
> already been proposed. "Levels" of compliance are not *so* different
> from independent optional specifications.) It may turn out, though,
> that it's best to only focus on a single model of concurrency. But I
> would *not* say this is the case merely because V8 isn't yet up to the
> task.
>
Perhaps it would be worthwhile for us to have some common agreement on
what a concurrency API looks like for those machines that support it,
while noting that it isn't a part of formal CommonJS (both due to the
lack of platform support, and the worker-style concurrency assumption of
other modules).
>
>> originally suggested) should also not be available. However, providing
>> real access to event queues (which are used to build workers) through an
>> event-queue module could still be available. In particular, it may be
>> helpful to allow for construction of multiple event-queues (within a
>> single thread/process/vat) and permit associating event-queues with
>> receiving messages for specific worker ports. One could then
>> programmatically block on waiting for an event (as one typically is
>> doing in a normal event-loop while idle), in order to achieve
>> synchronous execution, like Donny pointed out.
>>
>
> I disagree. We don't really need formal access to event queues (as
> opposed to the popular Javascript idiom of providing event handler
> functions) and if you are really anal about needing them, you can
> implement it using generators. (I was writing a long email in which
> one of the things I address is the confusion that explicitly waiting
> on an event in a blocking function on a formal reference to an event
> queue is semantically identical to what the Javascript world does now
> with event handlers.
>
Generators are only in Mozilla engines, they are not in V8 or JSCore.
Generators hardly solve all our problems either. Their design makes them
very awkward to use, in particular, for returning values (vs yielding).
> Put another way, imagine that at the top of every Javascript program
> is a bit of Javascript code that you just can't see, that goes like
> this:
>
> function thisFunctionHostsPrograms() {
> var yourProgram = new Program("your-javascript-source-code");
> var EventQueue = whatever.
> for(var event in EventQueue)
> if (yourProgram.handlesEvent(event.type))
> yourProgram.handle(event); // maybe this is a
> setInterval() interval event, or an onClick event being fired
> }
>
>
Yup, that's your basic event-loop.
> The reason I recommend against providing formal access to an event
> queue object is because there are really very few use cases (I
> believe, maybe I need to spend another day thinking) and you can
> always use generators.
>
In the example you had provided earlier, I thought you were attempting
to demonstrate how to achieve synchronous processing by connecting
directly to the message/event queues. Isn't that the use case?
Kris
No, my point was that the whole vat would block while waiting for a
particular message. It might be nice to provide a synchronous function
for certain actions that are known to be fast, rather than forcing async
on callers every time inter-vat communication is needed.
Kris
I like my idea better.
> Generators are only in Mozilla engines, they are not in V8 or JSCore.
> Generators hardly solve all our problems either. Their design makes them
> very awkward to use, in particular, for returning values (vs yielding).
As I have posted on the list previously, the generator pattern is
little more than switch()ed closures with non-local jumps (exception
handling) which every Javascript platform that matters has.
http://groups.google.com/group/commonjs/msg/7a60c2251168a5b6
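Wes's "switch()ed closures" claim can be illustrated with a hand-written
state machine. This sketch mimics what the Mozilla generator
`function gen() { yield 1; yield 2; }` would do; StopIteration is a local
stand-in for Mozilla's built-in, since other engines don't define it:

```javascript
// Local stand-in for Mozilla's StopIteration; thrown as the non-local
// jump that ends iteration.
function StopIteration() {}

// Generator emulated as a switch()ed closure: the captured "state"
// variable records where execution would resume after each yield.
function makeGen() {
  var state = 0;
  return {
    next: function () {
      switch (state) {
        case 0: state = 1; return 1;        // corresponds to "yield 1"
        case 1: state = 2; return 2;        // corresponds to "yield 2"
        default: throw new StopIteration(); // generator exhausted
      }
    }
  };
}
```

This works on any engine, though (as the thread notes) hand-maintaining the
state machine gets awkward fast for anything beyond trivial generators.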
>> I disagree. We don't really need formal access to event queues (as
>> opposed to the popular Javascript idiom of providing event handler
>> functions) and if you are really anal about needing them, you can
>> implement it using generators. (I was writing a long email in which
>> one of the things I address is the confusion that explicitly waiting
>> on an event in a blocking function on a formal reference to an event
>> queue is semantically identical to what the Javascript world does now
>> with event handlers.
>>
>> Put another way, imagine that at the top of every Javascript program
>> is a bit of Javascript code that you just can't see, that goes like
>> this:
>>
>> function thisFunctionHostsPrograms() {
>> var yourProgram = new Program("your-javascript-source-code");
>> var EventQueue = whatever.
>> for(var event in EventQueue)
>> if (yourProgram.handlesEvent(event.type))
>> yourProgram.handle(event); // maybe this is a
>> setInterval() interval event, or an onClick event being fired
>> }
>
> Yup, thats your basic event-loop.
>
>> The reason I recommend against providing formal access to an event
>> queue object is because there are really very few use cases (I
>> believe, maybe I need to spend another day thinking) and you can
>> always use generators.
>>
> In the example you had provided earlier, I thought you were attempting
> to demonstrate how to achieve synchronous processing by connecting
> directly to the message/event queues. Isn't that the use case?
No. It is true that I employed a formal reference to an event queue,
but it was only for demonstration purposes. Also as I said in my
previous post, the way in which I used that event queue reference is
semantically identical to the currently popular event dispatch
paradigm used throughout the Javascript world, where the event
dispatcher is hidden from the Javascript programmer, who merely
provides the handlers/callbacks for those events. Please note again
that you can also build the formal reference style upon the hidden
dispatch style, or so I claim you can. My GMail account still has my
draft for posting a follow-up that really addresses that better.
setTimeout events don't preempt, they simply add an event into the event
queue like other events, which will be executed in turn in the
event-loop. There are certain points in the browser environment where
events will be pulled and executed from the queue before another has
finished (sync XHR in FF3, and alert in some situations), but these are
still cooperative switches, not preemptive.
It really sounds like V8's execution is simply not stable with
concurrent execution of JavaScript within a process.
>
> Pre-emptive threading might also be graft-on-able to V8 with
> techniques like GNU Pth, although I personally would just bite the
> bullet and go to spidermonkey if I was in that person's shoes.
>
> By-the-way -- multi-machine message passing for worker threads is also
> pretty interesting to me. It also strikes me that CommonJS might
> specify the inter-machine protocol, although I am fearful that the
> specification approach that CommonJS has taken thus far is very
> unlikely to produce a good wire-level protocol.
I hope CommonJS doesn't attempt to specify an inter-machine protocol;
there are plenty of good protocols out there, we don't want to reinvent
that wheel, and we certainly wouldn't want to be limited to only
communicating with other CommonJS machines (the machines could be
running any language).
Kris
Hm. Good point, but I honestly don't know....
So with async-everything, the risk is that casual programmers will be
put off by the extra work and will reason incorrectly about the state
of their "vat" under multiple interleaved events.
With sync-sometimes, these same programmers will probably program
themselves into deadlocks.
We are *so* far away from worrying about this at the moment, so I
agree with you. However, this is a bit of a hobby horse of mine, so
here goes: It would be really great to have a true, easy to use
distributed object protocol for JS (sort of like what Kevin Reid
showed us with E, where he simply schlepped an object from one "vat"
to another). That protocol would of necessity be JS specific. And that
would be ok.
No. As mentioned prior, the Javascript world handles events with an
implicit loop at the very bottom of the stack that, notionally, waits
for events in some sort of event queue, and dispatches them to your
handlers. Realize, too, that even the "main" entry point (although in
the browser there may be many such "main" events, and many SCRIPT
elements) is responding to such an event.
So far in the JavaScript world, the role of "preempting" happens, if
at all, in non-JavaScript land. Semantically, it is indistinguishable
whether the world above JavaScript is actually preempting and calling
back, or whether it is polling for updates. It is likely that every
browser does a mix of both, then pushes the results into a queue, and
runs handlers for anything in the queue whenever the queue is not empty.
Mark S. Miller wrote:
> [+e-lang]
>
> On Sun, Sep 27, 2009 at 8:30 AM, Kris Zyp <kri...@gmail.com
> <mailto:kri...@gmail.com>> wrote:
>
> I hope CommonJS doesn't attempt to specify an inter-machines protocol,
> there are plenty of good protocols out there, we don't want to
> reinvent
> that wheel, and we certainly wouldn't want to be limited to only
> communicating with other CommonJS machines (the machines could be
> running any language).
>
>
> Agreed. Both Tyler's web_send protocol
> <http://waterken.sourceforge.net/web_send/> and E's CapTP protocol
> <http://code.google.com/p/caja-captp/> are good language neutral
> distributed capability protocols with implementations in several
> languages supporting communicating event-loops concurrency. Which one
> is better is a complex question involving many tradeoffs. Their APIs
> in JavaScript are quite similar. This community should look at both
> and understand the considerations each is optimizing for.
>
And FWIW, Persevere uses and will continue to primarily use HTTP (with a
preferred resource representation of application/javascript) as the
primary inter-machine protocol (perhaps others like the ones you
mentioned could be supported as well). JSON referencing is used for
links/URI referencing and JSON-RPC when necessary for tighter coupled
invocations. HTTP seems to be doing pretty decently in terms of adoption,
scalability, and language neutrality :). It has worked really nicely
with JS in Persevere as well. Clearly, there are a number of options out
there that people can and will use, so as we have said, CommonJS
shouldn't dictate how we communicate with others.
Kris
* Error management. An important part of the event loop is that each
event execution in the loop is effectively wrapped in a try/catch (you
don't want an error preventing the next event from executing). In the
browser the catch handler does various things (log to the console in FF,
alert or status bar update in IE, etc.). Trying to decide on a universal
catch mechanism in CommonJS doesn't seem desirable. I would think it
would be far better to allow users to explicitly decide how errors will
be managed. In fact, the event loop function is a great place for error
management, since it is the top-level default error handler for all uncaught
errors. There may also be other cleanup or maintenance operations that a
program may wish to execute between events.
* This is a basic construct that can be used to build the higher level
worker constructor/module (the worker module would still require native
code for spawning a process or thread, and creating the JS global scope).
* There needs to be a way for events to be enqueued on the event queue.
In the browser, all events are enqueued through native code (albeit
these operations can be triggered programmatically through setTimeout,
fireEvent, etc). In CommonJS, it seems it would be desirable for events
to be enqueuable through JavaScript rather than having to go to native
code for all enqueuing. Having an enqueue function could also
potentially allow for enqueuing events with different priority
levels. (In fact, I believe the basic stubs for an event queue already
exist in Narwhal in the "reactor" module (not sure I understand the
name), albeit there is no real implementation behind it yet.)
* Additional event queue functions can be provided for constructing
workers that monitor other queues for things like availability. Having
an "isEmpty" function would be important for doing worker pools with
efficient event delegation (which would be critical for scalable JSGI
servers).
* Some programs don't even need or want an event loop. A program that
simply copies a file and exits would never need an event loop; no need
to force one on it.
* Perhaps most importantly, synchronous processing can be achieved by
programmatically processing the event-queue until an appropriate event
is received. One of the great pains of asynchronous operations is that
an action can't be encapsulated for use by synchronous callers. For
example, if another module relies on an "incrementAndGetHits" function,
expecting a numerical return value, we can't change the implementation
to use asynchronous messaging without forcing the caller to handle a
promise or callback. If the caller is another module that can't be
changed, we may simply be out of luck (and I have seen this happen often
in the browser code). Generators can provide sugar, but they don't solve
this problem either. This need to change the interface when the
implementation changes is a basic violation of encapsulation. If code
could explicitly execute the events in the event queue, one could
effectively wait for a response from another worker, and create an
encapsulated synchronous function that preserves the interface.
Perhaps an API could look like this.
var queue = require("event-queue"); // get the event queue for this
vat/process.
queue.nextEvent(); // returns (or executes?) the next event in the
queue, blocking until one is available if it is empty.
queue.enqueue(event, priority); // add the event (a function) to the queue
queue.isEmpty(); // returns whether or not the queue is empty
Kris
>
>
> Perhaps an API could look like this.
> var queue = require("event-queue"); // get the event queue for this
> vat/process.
> queue.nextEvent(); // returns (or executes?) the next event in the
> queue, blocking until one is available if it is empty.
> queue.enqueue(event, priority); // add the event (a function) to the
> queue
> queue.isEmpty(); // returns whether or not the queue is empty
>
I have no problems with this API on first glance, but I do think it is
getting ahead of ourselves a little bit: we haven't yet got an agreed-upon
proposal for file I/O. Can we put this as a future effort and
revisit it (hopefully) soon once we have more of the lower level
things ratified?
-ash
Browser JavaScript is inherently event-loop based but does not need
such constructs; server-side code does not need them either. Something like
Node's API should be adopted:
http://tinyclouds.org/node/api.html#_events
This allows implementors to employ optimal code for juggling I/O like
libev (http://libev.schmorp.de/bench.html) instead of forcing
implementations to do this at the javascript level.
On Sun, Sep 27, 2009 at 11:14 PM, Kris Zyp <kri...@gmail.com> wrote:
>
> I believe there are a number of reasons for providing an API for
> accessing the event queue. (Also, I realized that the motivations for
> accessing the event queue really don't require multiple event queues per
> vat/process, just one). Anyway, some reasons:
>
> * Error management. An important part of the event loop is that each
> event execution in the loop is effectively wrapped in a try/catch (you
> don't want an error preventing the next event from executing). In the
> browser the catch handler does various things (log to the console in FF,
> alert or status bar update in IE, etc.). Trying to decide on a universal
> catch mechanism in CommonJS doesn't seem desirable. I would think it
> would be far better to allow users to explicitly decide how errors will
> be managed. In fact, the event loop function is a great place for error
> management, since it is the top-level default error handler for all uncaught
> errors. There may also be other cleanup or maintenance operations that a
> program may wish to execute between events.
Uncaught exceptions can be handled by some sort of a global exception event:
process.addListener("exception", ...)
> * This is a basic construct that can be used to build the higher level
> worker constructor/module (the worker module would still require native
> code for spawning a process or thread, and creating the JS global scope).
Although one might want to implement an event loop in javascript,
CommonJS should not specify this as a standard. The standard should
start above the worker-level.
> * Some programs don't even need or want an event loop. A program that
> simply copies a file and exits would never need an event loop; no need
> to force one on it.
If someone copies a file and exits, then they simply drop out of the
event loop after one rotation - this is not a special case. The event
loop should exit naturally when there are no more pending events.
Rather the opposite. Defining other I/O before thinking about
concurrency is getting ahead of ourselves.
Ryan Dahl wrote:
> Optimizing the event loop at a low level is important to performance.
> Specifying such an interface as you suggests would force us to
> implement most of the event loop in javascript.
>
> Browser JavaScript is inherently event-loop based but does not need
> such constructs; server-side code does not need them either. Something like
> Node's API should be adopted:
> http://tinyclouds.org/node/api.html#_events
> This allows implementors to employ optimal code for juggling I/O like
> libev (http://libev.schmorp.de/bench.html) instead of forcing
> implementations to do this at the javascript level.
>
I am curious how explicitly asking for the next event in loop breaks
optimizations, it seems like all the optimizations can occur in
nextEvent() (getting the next event and executing it) which could all be
native code. However, your API provides a promise.wait() function that
effectively satisfies the goal of synchronously processing async actions
that I wanted to see (complete with disclosure of the effects of queue
stacking). I am not sure how executing waiting events with
promise.wait() is more doable than an explicit call to nextEvent(), but
that would work for me.
Also would your API provide a means for creating efficient worker pools
so that a web server could effectively delegate incoming requests to
workers that are ready to process them? I had suggested that an isEmpty
function would be useful for delegating events to workers.
I still feel like explicit control of the event loop would be a more
logical way of composing all that functionality that you provide, but
you have addressed most of my requirements, and I wouldn't be opposed to
using this as a basis for our concurrency model, this is good stuff. I
think there are some spelling differences that would need to be
reconciled where we have already made API proposals (like promises), but
the fact that you have a working implementation is compelling, and
certainly gives your opinion a lot of weight.
Kris
Yes. It's rather implicit in most of javascript's APIs already. Just
take something like setTimeout(). With an event loop setTimeout is a
very natural construction; with threads it's a nightmare. In the new
HTML5 specifications it is very explicit:
http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#event-loops
And one could counter that argument by asking how much of this
JavaScript is really out there. We don't know of course, but I bet it's
a tiny fraction of all the JavaScript out there.
Urban
Wes Garland wrote:
> > > Are you proposing here that all CommonJS programs become event driven?
>
> > Yes. It's rather implicit in most of javascript's APIs already.
>
> I think I could make a reasonable argument that that argument does not
> hold true for the majority of JavaScript code written for non-browser
> environments.
It is probably useful to think how event-loops fit into the evolution of
a CommonJS platform. Event loops are proposed for situations when
asynchronous actions are triggered. Maybe this wasn't clear, but if your
platform does not yet have any support for asynchronous actions, you
don't need an event loop. It is pointless if nothing is going to happen
asynchronously (nothing would ever go in the event queue). As I pointed
out on IRC, if you are doing HTTP request processing, you are probably
doing a loop of listening for a request, processing it, and repeating.
This is implicitly an event loop, where the only type of event is an
incoming request.
The major premise of event-loop concurrency is that if/when you do
add asynchronous capabilities, you maintain the single-thread-per-vat
assumption. JavaScript code traditionally hasn't been written to
handle concurrent access to its data, and CommonJS will continue to
allow developers to preserve this assumption. Thus, if you create a
setTimeout function for your platform (or async XHR, which may be a more
common use case), the callback must not be executed in a separate thread
concurrently with other code in the same scope; otherwise CommonJS modules
can no longer be expected to behave deterministically/correctly.
The discussion about if and how to explicitly process events to block
until a desired event is finished also applies only if you actually have
an event loop, of course. I also still feel it would be helpful if starting
the event loop were explicit, so you don't use one unless you need to.
Kris
Quite apart from the browser vs. non-browser, I think the issue is how
"large" your "vat" is.
"Large" vat -> many services running in the same event queue. If one
object chooses to invoke a sync API and block, all the *other*
services in this "vat" will be stuck waiting while this one little
object gets its reply. These services will rightfully ask that object,
"can't you just async so we can get work done while you wait?".
=> A pure form of this is a set of APIs that do not allow *any*
objects to do any sync operations whatsoever. This is the case of the
browser assuming developers stay away from sync XHR due to its
shortcomings. This is also (afaik) the E model.
"Small" vat -> essentially one service running in one event queue.
Objects may want the option to do both sync *and* async operations. If
an object invokes a sync API, it is probably because the author of
that object has knowledge about the service and realizes it will not
be able to serve up anything meaningful anyway until the sync API
completes.
=> A pure form of this is a set of APIs that *always* block; if you
want async behavior, you create multiple "vats" and have them talk to
one another. This is Erlang.