Tim Fox
Jul 19, 2011, 1:59:10 PM
to nodex-dev
We don't want developers to have to worry too much about concurrency.
No worrying about synchronisation, race conditions, CAS or deadlocks.
We want devs to be able to write their code as single threaded, like
it's done in node.js or eventmachine, but we want to be able to
leverage multiple cores on the machine, without having to spin up
multiple processes.
We also want to be able to share state between threads in a safe way,
so users don't always have to resort to dumping all state in a
database / redis / whatever.
How can we do this?
Frameworks like node.js, eventmachine etc use what is known as the
"reactor" pattern. This basically means there is a single thread which
delivers all events to the application code. Events can be data
received from sockets, callbacks from file IO, timers firing,
whatever. The important point is they all get executed serially using
the same thread, never concurrently.
An advantage of this is the developer does not have to reason much
about concurrency. All code can be written as single threaded since
it's guaranteed that only the reactor thread will ever execute it.
A disadvantage is that frameworks like these only scale by spinning up
multiple processes - potentially _many_ processes if you have many
cores on your server - and it's hard to share state between the
processes. This usually results in state being dumped in a database,
or, if state is maintained internally, in the system never scaling
beyond one core on your server.
1) With node.x I propose the "multi-reactor" pattern. This is
basically the same as the reactor pattern, but instead of having one
event loop, we have multiple event loops. In reality we will probably
choose the number of event loops to match the number of cores.
2) Executing events are associated with a "context". When an event
registers a callback, timer or anything else which can result in
another event, then that new event is also associated with the same
context as the caller. Each context is associated with a single event
loop. When an event fires it will always be executed using the event
loop associated with its context.
This basically gives us the reactor pattern _per context_. For all
code sharing the same context we therefore don't have to worry about
concurrency concerns and we can write the code as single threaded.
A context defines an island of single-threadedness. You can think of
the program as being composed of a set of contexts; within each
context you don't have to worry about concurrency, which makes
programming easy. But different contexts can execute concurrently.
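The multi-reactor and context ideas can be sketched as follows. Names like MultiReactor and Context are my illustrative assumptions, not the actual node.x API: each event loop is modelled as a single-threaded executor, and each context is pinned to exactly one of them.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the "multi-reactor" pattern: several event loops, each a
// single-threaded executor, and contexts pinned to one loop each.
public class MultiReactor {
    private final ExecutorService[] loops;
    private final AtomicInteger next = new AtomicInteger();

    public MultiReactor(int numLoops) {
        loops = new ExecutorService[numLoops];
        for (int i = 0; i < numLoops; i++) {
            loops[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Each new context (e.g. per new connection) is assigned an event
    // loop, here simply round-robin across the available loops.
    public Context newContext() {
        return new Context(loops[next.getAndIncrement() % loops.length]);
    }

    public void shutdown() {
        for (ExecutorService l : loops) l.shutdown();
        for (ExecutorService l : loops) {
            try { l.awaitTermination(5, TimeUnit.SECONDS); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    // A context: every event associated with it runs on its single
    // loop, so code in one context never races with itself.
    public static class Context {
        private final ExecutorService loop;
        Context(ExecutorService loop) { this.loop = loop; }
        public void execute(Runnable event) { loop.execute(event); }
    }

    public static void main(String[] args) {
        MultiReactor mr = new MultiReactor(2);
        Context c = mr.newContext();
        Set<String> threads = Collections.synchronizedSet(new HashSet<>());
        for (int i = 0; i < 100; i++) {
            c.execute(() -> threads.add(Thread.currentThread().getName()));
        }
        mr.shutdown();
        System.out.println(threads.size()); // 1: one context, one loop thread
    }
}
```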
3) By default each time a new connection is created, a new context is
created. This means the code for each connection will always get
executed by the same event loop. However, code for different
connections may get executed by different event loops. This is how we
scale.
(Advanced users can also create contexts themselves, but I don't
envision this being something commonly used)
What is a connection? There's nothing really special about a
connection, but it's a natural place to associate new contexts.
Connections can be TCP connections, HTTP connections etc. The
documentation for connection-creating methods will state whether the
connection they create runs in a new context or not.
4) For some applications it's desirable to share state between
contexts.
A good example would be implementing a shared cache which needs to be
accessible by multiple contexts (connections). (You can't do this
scalably in node.js.) To do this sanely, scalably and with a guarantee
of no deadlocks we will provide a set of classes to enable this.
We will provide shared Map, List and Set implementations. These
structures can contain only immutable data such as primitive types and
strings. The Map is backed by Cliff Click's super scalable concurrent
hash map.
5) For other applications it's desirable to send messages between
contexts.
An example would be a pub-sub server, where we have multiple client
connections, and as data is published we need to pass that data to
other contexts (connections) so it can be written out to the
subscribers.
In this case we don't want to be calling methods (write) on other
connections directly from the context of another connection as this
violates the principle that all code for each context is always
executed on the same event loop.
What we want to do is to be able to send a message to a particular
connection saying "please receive this data on your event loop and
then write it out to your subscriber".
Objects, e.g. connections, will be registered with the system using an
id. If a connection has a method foo(), another connection, given the
id, can call the method foo as follows:
system.getActor(id).foo()
The interesting point here is that foo() is not executed directly,
instead it's executed asynchronously as an event on the event loop of
the callee. I.e. in the context of the callee. (Any return values are
discarded)
In other words we can register objects as (kind of) actors, and send
messages between them. Actually it's more like Akka typed actors than
conventional actors. The bottom line is they're trivially easy to
call.
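A sketch of the registry and asynchronous dispatch. The proposal wraps this in a typed proxy (system.getActor(id).foo()); this sketch uses a lambda-based send instead to stay short, and names like ActorSystem and register are assumptions, not the actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch: objects register with an id plus the event loop (context)
// they belong to; a message sent to an id runs asynchronously on the
// callee's loop, never on the caller's thread.
public class ActorSystem {
    private static class Registration<T> {
        final T target;
        final ExecutorService loop;
        Registration(T target, ExecutorService loop) {
            this.target = target;
            this.loop = loop;
        }
    }

    private final Map<String, Registration<?>> registry = new ConcurrentHashMap<>();

    public <T> void register(String id, T target, ExecutorService loop) {
        registry.put(id, new Registration<>(target, loop));
    }

    // Deliver the message on the callee's event loop; any return
    // value is discarded, as in the proposal.
    @SuppressWarnings("unchecked")
    public <T> void send(String id, Consumer<T> message) {
        Registration<T> r = (Registration<T>) registry.get(id);
        r.loop.execute(() -> message.accept(r.target));
    }

    public static void main(String[] args) throws InterruptedException {
        ActorSystem system = new ActorSystem();
        ExecutorService calleeLoop = Executors.newSingleThreadExecutor();
        StringBuilder out = new StringBuilder(); // only touched on calleeLoop
        system.register("conn-1", out, calleeLoop);
        // From another context: ask conn-1 to write on *its* loop.
        system.<StringBuilder>send("conn-1", sb -> sb.append("published data"));
        calleeLoop.shutdown();
        calleeLoop.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(out); // published data
    }
}
```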
6) We will need to interface with some legacy APIs, e.g. JDBC, which
are inherently blocking. This poses a problem for us, since we cannot
allow our limited number of event loop threads to block at any time,
or the system won't be fully utilised. To get around this we can
maintain a separate fixed-size pool of threads, fed by a queue. The
legacy blocking call is executed on a thread from the pool, and when
the result arrives it is provided to the user as an event just like
any other event. The event will be delivered on the event loop
corresponding to the context which invoked the blocking operation.
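A sketch of the worker-pool bridge. Names like BlockingBridge and executeBlocking are assumptions for illustration, not the actual API: the blocking call runs on a worker thread, and the result is posted back to the calling context's event loop.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch of point 6: run a blocking legacy call (e.g. a JDBC query)
// on a fixed worker pool, then deliver the result as an event on the
// event loop of the context that asked for it.
public class BlockingBridge {
    private final ExecutorService workerPool = Executors.newFixedThreadPool(4);

    public <T> void executeBlocking(Supplier<T> blockingCall,
                                    ExecutorService contextLoop,
                                    Consumer<T> resultHandler) {
        workerPool.execute(() -> {
            T result = blockingCall.get();   // may block a worker thread
            // Back on the caller's event loop, like any other event:
            contextLoop.execute(() -> resultHandler.accept(result));
        });
    }

    public void shutdown() {
        workerPool.shutdown();
        try { workerPool.awaitTermination(5, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingBridge bridge = new BlockingBridge();
        ExecutorService contextLoop = Executors.newSingleThreadExecutor();
        bridge.executeBlocking(
            () -> "row from JDBC",                       // stand-in blocking query
            contextLoop,
            row -> System.out.println("got: " + row));   // runs on the context loop
        bridge.shutdown();
        contextLoop.shutdown();
        contextLoop.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```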
------
The end result of the above is that users should be able to write
complex, highly scalable, highly concurrent network server
applications without having to worry about concurrency issues. They
just need to remember:
1) Don't share state directly between connections, use the provided
data structures
2) If you want to call other connections, do it via the message
sending API.
If they follow the above, they can code everything as if it were a
single-threaded program.
Comments?