Node.x concurrency model


Tim Fox

Jul 19, 2011, 1:59:10 PM
to nodex-dev
We don't want developers to have to worry too much about concurrency.
No worrying about synchronisation, race conditions, CAS or deadlocks.
We want devs to be able to write their code as single-threaded, like
it's done in node.js or eventmachine, but we want to be able to
leverage multiple cores on the machine without having to spin up
multiple processes.

We also want to be able to share state between threads in a safe way,
so users don't always have to resort to dumping all state in a
database / Redis / whatever.

How can we do this?

Frameworks like node.js, eventmachine etc use what is known as the
"reactor" pattern. This basically means there is a single thread which
delivers all events to the application code. Events can be data
received from sockets, callbacks from file IO, timers firing,
whatever. The important point is they all get executed serially using
the same thread, never concurrently.

An advantage of this is the developer does not have to reason much
about concurrency. All code can be written as single threaded since
it's guaranteed that only the reactor thread will ever execute it.

A disadvantage is that frameworks like these only scale by spinning up
multiple processes - potentially _many_ processes if you have many
cores on your server - and it's hard to share state between the
processes. This results in state normally being dumped in a database,
or, if state is maintained internally, in the system never scaling
beyond one core on your server.

1) With node.x I propose the "multi-reactor" pattern. This is
basically the same as the reactor pattern, but instead of having one
event loop, we have multiple event loops. In reality we will probably
choose the number of event loops to match the number of cores.
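
To make the idea concrete, here is a minimal sketch of the
multi-reactor setup: one single-threaded event loop per core. This is
just an illustration of the shape of the thing, not the actual node.x
internals.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: one single-threaded event loop per core, each one a
// classic reactor.
class MultiReactor {
    final ExecutorService[] eventLoops;

    MultiReactor() {
        int cores = Runtime.getRuntime().availableProcessors();
        eventLoops = new ExecutorService[cores];
        for (int i = 0; i < cores; i++) {
            eventLoops[i] = Executors.newSingleThreadExecutor();
        }
    }
}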

2) Each executing event is associated with a "context". When an event
registers a callback, timer or anything else which can result in
another event, that new event is associated with the same context as
the caller. Each context is associated with a single event loop. When
an event fires it will always be executed on the event loop associated
with its context.

This basically gives us the reactor pattern _per context_. For all
code sharing the same context we therefore don't have to worry about
concurrency concerns, and we can write the code as single-threaded.

A context defines an island of single-threadedness. You can think of
the program as being composed of a set of contexts; within each
context you don't have to worry about concurrency, which makes
programming easy. But different contexts can execute concurrently.
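
A hedged sketch of what a context might look like under these rules
(the class name and API are assumptions for illustration, reused in
the later sketches): it simply remembers its event loop and runs every
event there.

import java.util.concurrent.Executor;

// Illustrative only: a context pins all of its events to one event
// loop, so handlers within a context never run concurrently.
class Context {
    private final Executor eventLoop; // one of the reactor threads

    Context(Executor eventLoop) {
        this.eventLoop = eventLoop;
    }

    // Every event for this context runs on the same thread, in order.
    void execute(Runnable event) {
        eventLoop.execute(event);
    }
}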

3) By default, each time a new connection is created, a new context is
created. This means the code for each connection will always get
executed by the same event loop. However, code for different
connections may get executed by different event loops. This is how we
scale.

(Advanced users can also create contexts themselves, but I don't
envision this being something commonly used)

What is a connection? There's nothing really special about a
connection, but it's a natural place to associate new contexts.
Connections can be TCP connections, HTTP connections etc. The
documentation for connection-creating methods will state whether the
connection they create runs in a new context or not.
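
For illustration, one plausible assignment policy (an assumption on my
part; the real policy may differ) is round-robin over the event loops,
reusing the MultiReactor and Context sketches from above:

import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: each new connection gets a fresh context on the next
// event loop, round-robin, so connections spread across the cores.
class ContextAssigner {
    private final MultiReactor reactor = new MultiReactor();
    private final AtomicInteger next = new AtomicInteger();

    Context contextForNewConnection() {
        int i = next.getAndIncrement() % reactor.eventLoops.length;
        return new Context(reactor.eventLoops[i]);
    }
}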

4) For some applications it's desirable to share state between
contexts.

A good example would be implementing a shared cache which needs to be
accessible by multiple contexts (connections). (You can't do this
scalably in node.js.) To do this sanely, scalably, and with a
guarantee of no deadlocks, we will provide a set of classes to enable
it.

We will provide shared Map, List and Set implementations. These
structures can contain only immutable data such as primitive types and
strings. The Map is backed by Cliff Click's super scalable concurrent
hash map.
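
Usage might look something like the sketch below (the wrapper class is
hypothetical; NonBlockingHashMap is the Cliff Click map from the
high-scale-lib project). Because only immutable values go in, handing
references between contexts is safe.

import java.util.concurrent.ConcurrentMap;
import org.cliffc.high_scale_lib.NonBlockingHashMap;

// Sketch of a shared cache visible from any context. Only immutable
// data (primitives, Strings) is stored, so callers need no locking.
class SharedCache {
    private final ConcurrentMap<String, String> map =
            new NonBlockingHashMap<String, String>();

    void put(String key, String value) { map.put(key, value); }

    String get(String key) { return map.get(key); }
}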

5) For other applications it's desirable to send messages between
contexts.

An example would be a pub-sub server, where we have multiple client
connections, and as data is published we need to pass that data to
other contexts (connections) so it can be written out to the
subscribers.

In this case we don't want to be calling methods (write) on other
connections directly from the context of another connection as this
violates the principle that all code for each context is always
executed on the same event loop.

What we want to do is to be able to send a message to a particular
connection saying "please receive this data on your event loop and
then write it out to your subscriber".

Objects, e.g. connections, will be registered with the system using an
id. If a connection has a method foo(), another connection, given the
id, can call the method foo as follows:

system.getActor(id).foo()

The interesting point here is that foo() is not executed directly;
instead, it's executed asynchronously as an event on the event loop of
the callee, i.e. in the context of the callee. (Any return values are
discarded.)

In other words we can register objects as (kind of) actors, and send
messages between them. Actually it's more like Akka typed actors than
conventional actors. The bottom line is they're trivially easy to
call.
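
One way such dispatch could be implemented (a sketch, not the node.x
source) is with a dynamic proxy that turns each method call into an
event on the callee's event loop; system.getActor(id) would then
return such a proxy wrapped around the registered object.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.Executor;

// Illustrative only: each method call becomes an event on the
// callee's loop. Return values are discarded, so this only suits
// void-style messages.
final class ActorProxies {
    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> iface, final T target, final Executor calleeLoop) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new InvocationHandler() {
                    public Object invoke(Object proxy, final Method m, final Object[] args) {
                        calleeLoop.execute(new Runnable() {
                            public void run() {
                                try {
                                    m.invoke(target, args); // runs in the callee's context
                                } catch (Exception e) {
                                    e.printStackTrace();
                                }
                            }
                        });
                        return null; // any return value is discarded
                    }
                });
    }
}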

6) We will need to interface with some legacy APIs, e.g. JDBC, which
are inherently blocking. This poses a problem for us, since we cannot
allow our limited number of event loop threads to block at any time,
or the system won't be fully utilised. To get around this we can
maintain a separate, fixed-size pool of threads, fed by a queue, which
executes the legacy blocking calls. When the result arrives it is
provided to the user as an event just like any other event, delivered
on the event loop corresponding to the context which invoked the
blocking operation.
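
A sketch of that arrangement (the names and pool size are assumptions,
not node.x API), reusing the Context sketch from earlier:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: run a blocking legacy call, e.g. a JDBC query, on a
// fixed worker pool, then deliver the result back on the calling
// context's event loop.
final class Blocking {
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(20);

    interface Handler<T> { void handle(T value); }

    static <T> void execute(final Context caller, final Callable<T> blockingOp,
                            final Handler<T> onResult) {
        WORKERS.execute(new Runnable() {
            public void run() {
                try {
                    final T result = blockingOp.call(); // blocks a worker, not an event loop
                    caller.execute(new Runnable() {     // hop back to the caller's loop
                        public void run() { onResult.handle(result); }
                    });
                } catch (Exception e) {
                    e.printStackTrace(); // real code would deliver the failure as an event too
                }
            }
        });
    }
}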

------

The end result of the above is that users should be able to write
complex, highly scalable, highly concurrent network server
applications without having to worry about concurrency issues. They
just need to remember:

1) Don't share state directly between connections, use the provided
data structures
2) If you want to call other connections, do it via the message
sending API.

If they follow the above, they can code everything as a
single-threaded program.

Comments?

Tim Fox

Aug 14, 2011, 7:47:04 AM
to nodex-dev
I've simplified the "actor" part of this by just letting any code
within a context register a callback. The act of registration returns
a context_id which can be used from any other context to send a
message.

The message is always received by the callback in the context of the
code which set the original callback.
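
In code, the simplified scheme might look something like this sketch
(all names are assumptions, and it reuses the Context sketch from the
earlier message):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: registering a callback returns a context_id; messages
// sent to that id are delivered as events on the registering
// context's event loop.
class MessageSystem {
    interface Handler { void handle(Object message); }

    private static class Registration {
        final Context context; // the Context sketch from the earlier message
        final Handler handler;
        Registration(Context c, Handler h) { context = c; handler = h; }
    }

    private final Map<Long, Registration> handlers =
            new ConcurrentHashMap<Long, Registration>();
    private final AtomicLong ids = new AtomicLong();

    // Called from within a context; returns the context_id.
    long registerHandler(Context current, Handler handler) {
        long id = ids.incrementAndGet();
        handlers.put(id, new Registration(current, handler));
        return id;
    }

    // Callable from any other context; delivery hops to the callee's loop.
    void sendMessage(long contextId, final Object message) {
        final Registration r = handlers.get(contextId);
        if (r != null) {
            r.context.execute(new Runnable() {
                public void run() { r.handler.handle(message); }
            });
        }
    }
}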

james northrup

Aug 14, 2011, 10:36:36 AM
to node...@googlegroups.com
in 1xio i had great luck storing the context inside of NIO selector data. there's no thread sync code; however, to handle POST and GWT RequestFactory i use

// the context is stored as the NIO selection key's attachment
final Object data = key.attachment();
Runnable task = new Runnable() {
    public void run() {
        // use context "data" in the threadpool
    }
};

Future<?> x = threadpool.submit(task);

as the closure to work with any threadlocal artifacts which will be imposed by j2ee or NIO.

at the end the runnable sets the selector ready flags to proceed with native IO events.

i found that for a given benchmark, the per-thread throughput exceeds that of jetty, but a single thread cannot keep up with multiple allocators and synchronizers in a multicore solution.

for my needs, this was far above gigabit ethernet throughput, and a single-core reactor caps itself at 8% cpu usage on a 12-core setup, leaving 92% of the cores for non-IO runnables.

--
Jim Northrup  * (415) 935-4611 *

Tim Fox

Aug 14, 2011, 11:35:17 AM
to nodex-dev
Hi Jim,

I wonder if you could explain your thread model in more detail? I'm
not really parsing it well from your description.

james northrup

Aug 14, 2011, 12:44:45 PM
to node...@googlegroups.com
for static content i didn't see fit to create any threads. one i7 core can saturate any link with NIO async IO using [a shared pool of] ByteBuffers.

for static content alone, it spends 99.999% of its time (per the jvm profiler) in sun native IO waits, which ironically show up as false CPU usage in jvm profiling.

it's as close to C as i could make it.

once the baseline above is established, 1xio runs gigabytes per second at a 12 MB jvm heap cost (due to using a modest pool of direct ByteBuffers in the NIO stack) and 100% cpu usage on one core. this usage is native select() and sendfile() libc functions.

the threading model for POST and GWT RequestFactory closures is simply creating an artificial ThreadLocal mapping from SelectionKey -> Future such that ORM objects dependent on threadlocal state have their own lookup to access their own threadlocals, and to map request headers and cookies for the life of the closure.

by closure, i mean java.lang.Runnable or Callable.
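
roughly, the shape of it (a simplified sketch, not the actual 1xio code):

import java.nio.channels.SelectionKey;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// simplified sketch of the mapping described above; not the real 1xio
// source. each POST/RequestFactory closure goes to a pool, and the
// SelectionKey -> Future mapping lets per-request state be looked up
// where ThreadLocals would otherwise be expected.
class ClosureDispatch {
    private final ExecutorService threadpool = Executors.newCachedThreadPool();
    private final ConcurrentMap<SelectionKey, Future<?>> inFlight =
            new ConcurrentHashMap<SelectionKey, Future<?>>();

    void dispatch(SelectionKey key, Callable<Void> closure) {
        inFlight.put(key, threadpool.submit(closure));
    }
}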

Tim Fox

Aug 14, 2011, 12:49:20 PM
to node...@googlegroups.com
On 14/08/2011 17:44, james northrup wrote:
> the threading model for POST and GWT RequestFactory closures is simply
> creating an artificial ThreadLocal mapping from SelectionKey -> Future
> such that ORM objects dependent on threadlocal state have their own
> lookup to access their own threadlocals, and to map request headers
> and cookies for the life of the closure.
I'm still not grokking this, can you give a real code example? I want to see how this compares to node.x or netty.

james northrup

Aug 14, 2011, 12:53:21 PM
to node...@googlegroups.com
http://code.google.com/p/1xio/downloads/detail?name=1xio-0.0.3.zip&can=2&q=

this is most succinct. it's 4 classes. this has no POST or closures.

Tim Fox

Aug 14, 2011, 1:02:29 PM
to node...@googlegroups.com
Sure, I know where your code is ;)

I was just interested in whether you were providing a certain set of concurrency guarantees to your users. One of the big draws of node.x is that we guarantee all event handlers for any context are always executed in that same context, which means on the same event loop thread. This means application developers using node.x don't have to worry about concurrency.

james northrup

Aug 14, 2011, 1:06:56 PM
to node...@googlegroups.com
i dont grok how users that don't have to worry about concurrency cannot use a threadpool in one process.

Tim Fox

Aug 14, 2011, 1:52:27 PM
to node...@googlegroups.com
On 14/08/2011 18:06, james northrup wrote:
> i dont grok how users that don't have to worry about concurrency cannot use a threadpool in one process.
Can you rephrase that?