ANN: threads - enabling safe use of synchronous coding


rtweed

Aug 31, 2011, 8:28:52 AM
to nodejs
There was much recent discussion in the thread relating to the Globals
database and the whys and wherefores of its synchronous APIs.

I therefore decided to put together the module I call threads (https://
github.com/robtweed/threads), which provides a simple solution by
making use of the new Child Node Processes in Node.js v0.5.x.

See the ReadMe in the repository for full details, but in summary:

threads is a simple module for enabling and managing a scalable but
high-performance multi-threaded environment in Node.js. The primary
incentive for writing it was to create a hybrid environment where the
many benefits of synchronous coding could be achieved without the risk
of blocking the main server thread.

It consists of four components:

- a master Node server process that you can ensure is 100%
asynchronous and non-blocking
- a pool of child Node processes that only ever handle a single action/
request at a time
- a queue of pending actions/requests
- a queue processor that attempts to allocate requests/actions on the
queue to available child Node processes

The child processes persist throughout the lifetime of the master Node
server process, so there is no setup/teardown overhead when handling
requests/actions. Once a child Node process has completed the
processing of an action/request, it is returned immediately to the
available Child Process pool. When a request/action is allocated to a
Child Node process, it is removed from the available process pool.

Because a Child Node Process is only handling a single request/action
at a time, it can include as much synchronous logic as you like
without the risk of blocking anyone else.
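
A minimal sketch of the pool, queue, and queue processor described above. Every name here (createPool, submit, stats, fakeWorker) is hypothetical and is not the API of the threads module. The workers are in-memory stand-ins so the allocation logic can be followed without forking real processes; in real use each worker would presumably wrap a forked child Node process.

```javascript
// Hypothetical sketch: not the threads module's actual API.
function createPool(workers) {
  const available = workers.slice(); // idle workers
  const queue = [];                  // pending actions/requests

  // Queue processor: hand queued requests to idle workers.
  function processQueue() {
    while (queue.length > 0 && available.length > 0) {
      const worker = available.shift(); // removed from the available pool
      const job = queue.shift();
      worker.handle(job.request, (result) => {
        available.push(worker);         // returned to the pool on completion
        job.callback(result);
        processQueue();                 // a worker is free, so retry the queue
      });
    }
  }

  return {
    submit(request, callback) {
      queue.push({ request, callback });
      processQueue();
    },
    stats() {
      return { queued: queue.length, idle: available.length };
    },
  };
}

// Stand-in for a child process: it holds the completion callback
// until finish() is called, simulating a request still in progress.
function fakeWorker(name) {
  let done = null;
  return {
    handle(request, callback) {
      done = () => callback(`${name} handled ${request}`);
    },
    finish() {
      const d = done;
      done = null;
      d();
    },
  };
}

const w1 = fakeWorker('w1');
const w2 = fakeWorker('w2');
const pool = createPool([w1, w2]);
const results = [];

['a', 'b', 'c'].forEach((r) => pool.submit(r, (res) => results.push(res)));
console.log(pool.stats()); // { queued: 1, idle: 0 }  ('c' waits for a free worker)

w1.finish();               // w1 completes 'a' and immediately picks up 'c'
console.log(pool.stats()); // { queued: 0, idle: 0 }
```

Each worker handles exactly one request at a time, so the handler inside a worker could be fully synchronous without holding up the master.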

Clearly, the larger the pool of Node.js threads, the less likely it is
that the request/action queue will build up. On the other hand, you
need to be aware that each child Node process uses about 10 MB of memory
according to the Node.js documentation. Additionally, the quicker
your child process can handle a request, the sooner it will become
available again to the pool to handle a queued request/action.

You can configure a number of parameters, in particular the size of
the Child Node Process pool (default = 5).
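
As a rough illustration of the trade-off between pool size and memory, the arithmetic can be sketched as below. The function name is hypothetical, and the 10 MB figure is only the approximate per-process number quoted above, not a measurement.

```javascript
// Approximate per-process figure quoted in the post; an assumption, not a measurement.
const MB_PER_CHILD = 10;

// Hypothetical helper: estimated memory used by the child pool alone.
function estimatePoolMemoryMb(poolSize, mbPerChild = MB_PER_CHILD) {
  return poolSize * mbPerChild;
}

console.log(estimatePoolMemoryMb(5));  // default pool of 5 -> 50
console.log(estimatePoolMemoryMb(20)); // -> 200
```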

This is version 1 of threads, so it may have a few rough edges. I'll
be interested in any feedback.

As soon as InterSystems make a 0.5.x version of the cache.node file
available for Globals, it should be possible to use their synch APIs
without any of the concerns that folks previously raised.

Rob

rtweed

Aug 31, 2011, 8:31:10 AM
to nodejs
The link to the threads repository appears to have been split in my
original post. Here it is again, hopefully as a full working URL:

https://github.com/robtweed/threads

Rob

Ben Noordhuis

Aug 31, 2011, 9:22:00 AM
to nod...@googlegroups.com
On Wed, Aug 31, 2011 at 14:28, rtweed <rob....@gmail.com> wrote:
> threads is a simple module for enabling and managing a scalable but
> high-performance multi-threaded environment in Node.js.  The primary
> incentive for writing it was to create a hybrid environment where the
> many benefits of synchronous coding could be achieved without the risk
> of blocking the main server thread.
>
> It consists of four components:
>
> - a master Node server process that you can ensure is 100%
> asynchronous and non-blocking
> - a pool of child Node processes that only ever handle a single action/
> request at a time
> - a queue of pending actions/requests
> - a queue processor that attempts to allocate requests/actions on the
> queue to available child Node processes

threads is something of a misnomer, isn't it? It's the old prefork model.

Diogo Resende

Aug 31, 2011, 10:49:40 AM
to nod...@googlegroups.com

I think the node edge version has a .fork() with messaging that seems
pretty good.

rtweed

Aug 31, 2011, 11:00:59 AM
to nodejs
>
> I think the node edge version has a .fork() with messaging that seems
> pretty good.

That's what the threads module uses

Rob

Mark Hahn

Aug 31, 2011, 2:13:50 PM
to nod...@googlegroups.com
I would agree that calling it threads is going to cause confusion. It
has nothing to do with threads. Maybe it could be called Procs?

BTW: On the project before this one I used a home-brew architecture
that was almost identical. My original motivation was to protect
executing processes from crashing each other but it performed well
also. Having processes sitting there waiting to be used pretty much
nullified the Node argument that firing off a process was too
expensive.

My only performance issue was the cost of serializing the IO. I
didn't have the new fork.

> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>

Isaac Schlueter

Aug 31, 2011, 2:43:41 PM
to nod...@googlegroups.com
Yeah, -1 on the name. That's very confusing, on a point which is
already confusing enough.

The child_process.fork doesn't create a new thread. It creates a
full-fledged OS child process, complete with its own pid and memory
and everything. You can send it signals, renice it, etc. It
can't share memory directly with its parent. It's not a thread.

Not to say this isn't a useful lib. Seems like a nice little sugar
layer over child_process.fork, which is probably very useful. But
it's absolutely not in any way "multithreaded".
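
The process-versus-thread point can be checked with a small sketch. It uses child_process.spawnSync, which exists in current Node releases but did not exist in v0.5.x, purely to keep the example synchronous and self-contained. The variable names are illustrative only.

```javascript
const { spawnSync } = require('child_process');

const counter = 42; // lives only in the parent's memory

// Start a second Node process that reports its own pid and whether
// a variable named `counter` exists in its global scope.
const childScript =
  'console.log(JSON.stringify({' +
  ' pid: process.pid,' +
  ' seesCounter: typeof counter !== "undefined"' +
  ' }))';

const result = spawnSync(process.execPath, ['-e', childScript], {
  encoding: 'utf8',
});

const childInfo = JSON.parse(result.stdout);

console.log('parent pid:', process.pid, 'counter:', counter);
console.log('child pid:', childInfo.pid);
console.log('child sees parent variable:', childInfo.seesCounter);
```

The child reports a different pid and cannot see the parent's variable, which is the behaviour of a separate OS process rather than a thread sharing memory with its parent.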

Diogo Resende

Aug 31, 2011, 3:09:23 PM
to nod...@googlegroups.com
On Wed, 31 Aug 2011 11:43:41 -0700, Isaac Schlueter wrote:
> Yeah, -1 on the name. That's very confusing, on a point which is
> already confusing enough.
>
> The child_process.fork doesn't create a new thread. It creates a
> full-fledged OS child process, complete with its own pid and memory
> and everything. You can send it signals, renice it, etc. It

Speaking of .fork, are there any plans to add .signal(), .renice() to
the forked processes using child_process? It would simplify a lot :)

---
Diogo R.

Marcel Laverdet

Aug 31, 2011, 3:52:10 PM
to nod...@googlegroups.com
Also joining the "why is it called 'threads'" crew.

I had gotten excited considering you could totally implement a worker paradigm in v8+Node with Isolates.. it's just no one has done it yet.



Marak Squires

Aug 31, 2011, 3:59:15 PM
to nod...@googlegroups.com
To play devil's advocate, the functionality you describe in this library is a byproduct of the hook.io project's core functionality.

I just haven't spent a lot of time documenting it / writing example code / making the specific API saner, because there hasn't been any demand for it yet.

I think you'll find this type of I/O strategy (master/slave worker pool) is only one of many you'll eventually need in an I/O heavy application.

- Marak


Diogo Resende

Aug 31, 2011, 3:59:20 PM
to nod...@googlegroups.com
On Wed, 31 Aug 2011 14:52:10 -0500, Marcel Laverdet wrote:
> Also joining the "why is it called 'threads'" crew.
>
> I had gotten excited considering you could totally implement a worker
> paradigm in v8+Node with Isolates.. it's just no one has done it yet.
>
Yeah.. I also tried that once, but I'm not a v8 expert and I got into
a stupid locking error that I couldn't figure out. I guess I'll have
to wait for somebody to do the first step :P

---
Diogo R.

Matt

Aug 31, 2011, 4:04:29 PM
to nod...@googlegroups.com
On Wed, Aug 31, 2011 at 2:43 PM, Isaac Schlueter <i...@izs.me> wrote:
> Yeah, -1 on the name. That's very confusing, on a point which is
> already confusing enough.

LOL, isn't this rather ironic, given it's built on child_process.fork(), which isn't fork(2) at all, and is really quite confusing?

> The child_process.fork doesn't create a new thread.  It creates a
> full-fledged OS child process, complete with its own pid and memory
> and everything.  You can send it signals, renice it, etc.  It
> can't share memory directly with its parent.  It's not a thread.

Threads have many definitions, not necessarily just those of shared PID and memory space. In fact the Wikipedia page specifically mentions a thread being a "fork". http://en.wikipedia.org/wiki/Thread_(computer_programming)

> Not to say this isn't a useful lib.  Seems like a nice little sugar
> layer over child_process.fork, which is probably very useful.  But
> it's absolutely not in any way "multithreaded".

Computer scientists would probably disagree with you.

Matt.

Isaac Schlueter

Aug 31, 2011, 4:44:58 PM
to nod...@googlegroups.com
On Wed, Aug 31, 2011 at 13:04, Matt <hel...@gmail.com> wrote:
> On Wed, Aug 31, 2011 at 2:43 PM, Isaac Schlueter <i...@izs.me> wrote:
>>
>> Yeah, -1 on the name. That's very confusing, on a point which is
>> already confusing enough.
>
> LOL, isn't this rather ironic, given it's built on child_process.fork(),
> which isn't fork(2) at all, and is really quite confusing?

Indeed. I recommended not calling it fork for that very reason.
Whatever, it's done :)

child_process.fork is a wrapper around child_process.spawn, with an
additional channel open to send and receive messages in a somewhat
webworker-like manner.
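
A minimal sketch of that message channel, written against the child_process.fork API as it exists in current Node releases (details may differ from v0.5.x). The temp-file worker script and the message shapes are assumptions made only to keep the example self-contained.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a tiny worker script to a temp file so the example is self-contained.
const workerPath = path.join(os.tmpdir(), `fork-demo-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, `
  // Runs in the child: reply to each message over the channel to the parent.
  process.on('message', (msg) => {
    process.send({ echoed: msg, workerPid: process.pid });
  });
`);

// fork() starts a new Node process and opens a message channel to it.
const worker = fork(workerPath);

worker.on('message', (reply) => {
  console.log('parent', process.pid, 'received', reply);
  worker.disconnect(); // close the channel so the child can exit
});

// Remove the temp script once the child has gone.
worker.on('exit', () => fs.unlinkSync(workerPath));

worker.send({ action: 'ping' });
```

Messages are serialized and copied between the two processes; nothing is shared in memory, which matches the webworker-like description above.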

child_process.spawn uses vfork(2), which is quite similar to fork(2),
but uses virtual memory much more efficiently. Also, it is not safe
or advisable to use vfork(2) for any purpose other than an immediate
call to some form of exec(2).

Using fork+exec is not quite the same as merely forking and then
running some more code. Yes, it does create a thread, but in that
thread, it splits off a whole other child process, and that's all it
does.

So, yes, it *does* use fork, it just doesn't *only* use fork. The
thread itself is not exposed to JavaScript, and you never have two
bits of JavaScript code running in parallel in the same process. This
is the guarantee that Node makes which makes reentry not an issue we
have to even think about.

If you're writing C++ addons and using eio_custom or something, then
you do have to be aware of these issues.

>> it's absolutely not in any way "multithreaded".
>
> Computer scientists would probably disagree with you.

Well, I mean, of course it's using threads. You can't write software
on a CPU without "using threads", but it's not as if
child_process.fork results in a new thread in the same process that
can communicate with its parent by use of a shared memory space.

When *I* start complaining that you're being too hyperliteral and
robotic, you might have a problem. ;P

Mark Hahn

Aug 31, 2011, 5:08:42 PM
to nod...@googlegroups.com
Many more people than computer scientists could benefit from this and
it would be a shame if they didn't look into it because of its name.

Joshua Holbrook

Aug 31, 2011, 5:19:00 PM
to nod...@googlegroups.com
Is it too late to rename it? Maybe "ropes." ;)

--Josh

rtweed

Aug 31, 2011, 5:33:34 PM
to nodejs
I really don't care what you want me to call it. I spent a few minutes
looking for a name that was vaguely related to its architecture and
came up with threads and found the name wasn't registered in npm so
went with it. I mainly wanted to get it out as soon as possible
because I have specific plans for its use.

How about you guys vote for a name and I'll rename it accordingly :-)

Rob

Joshua Holbrook

Aug 31, 2011, 5:41:55 PM
to nod...@googlegroups.com
On Wed, Aug 31, 2011 at 2:33 PM, rtweed <rob....@gmail.com> wrote:
>
> How about you guys vote for a name and I'll rename it accordingly :-)
>
> Rob

This sounds fun! Maybe we (you?) can collect a short list of possible
names, and then have an online poll to decide which wins. Maybe
running it from threads' issues page
(https://github.com/robtweed/threads/issues) would be the way to go.

--Josh

Mark Hahn

Aug 31, 2011, 6:37:47 PM
to nod...@googlegroups.com
It doesn't matter what it is called as long as it isn't confusing.
Marketing studies for product names have shown that names unrelated to
function perform better in the market, hence Apple, Google, etc.

I like the "ropes" moniker.


Marcel Laverdet

Aug 31, 2011, 7:16:49 PM
to nod...@googlegroups.com

Diogo Resende

Aug 31, 2011, 7:19:25 PM
to nod...@googlegroups.com
On Wed, 31 Aug 2011 15:37:47 -0700, Mark Hahn wrote:
> It doesn't matter what it is called as long as it isn't confusing.
> Marketing studies for product names have shown that names unrelated to
> function perform better in the market, hence Apple, Google, etc.
>
> I like the "ropes" moniker.

How about spoon? It's not a fork..

---
Diogo R.

Zachary Carter

Aug 31, 2011, 7:47:56 PM
to nod...@googlegroups.com

Or maybe spork, as a play on spawn/fork.


--
Zach Carter

Marcel Laverdet

Aug 31, 2011, 9:59:35 PM
to nod...@googlegroups.com
I vote that his project be named "bike shed". You could put it in npm as bshed.

Isaac Schlueter

Sep 1, 2011, 12:41:24 AM
to nod...@googlegroups.com
I think abbreviating "bikeshed" down to "bshed" would be confusing.
It should be called "a-shed-for-bikes", just to be more clear.

Rasmus Wikman

Sep 1, 2011, 1:22:14 AM
to nod...@googlegroups.com

Spock?

Live long and prosper,

Rasmus

rtweed

Sep 1, 2011, 1:33:34 AM
to nodejs
ForknThreads, it's only a name, guys

Actually I'm tempted to call it that now...

Rob


rtweed

Sep 1, 2011, 1:43:33 AM
to nodejs
or how about ForknSync

Rob

Jorge

Sep 1, 2011, 3:48:21 AM
to nod...@googlegroups.com
On 31/08/2011, at 23:33, rtweed wrote:
>
> How about you guys vote for a name and I'll rename it accordingly :-)

It's gonna need a logo, too :-P
--
Jorge.

rtweed

Sep 1, 2011, 3:55:25 AM
to nodejs
Marak

I actually did look at hook.io in some detail before writing my
module, but decided, rightly or wrongly, it was too generic for my
purposes and to be honest it wasn't clear whether or not it was doing
stuff the way I wanted it to work (i.e. using a pool of pre-started
child Node processes to provide very low latency). Hook.io seemed
more aimed at distributing processing across systems, but that may
have just been my misunderstanding based on the available
documentation.

In any case, the architecture I used in threads is based on one that
I'd already developed for another of my modules (ewdGateway), so it
only took a day and a half to put threads together (much of which was
documentation), so no big waste of my time even if I did find I was
reinventing a wheel.

I've actually got a very specific purpose in mind for the threads
module: allowing the many benefits of the synchronous APIs of the
Globals database (http://globalsdb.org) to be achieved without
breaking the "thou shalt not block the thread" rule of Node. In
combination with Globals, I believe we now have the best of all
worlds: a very high-performance, very flexible database, accessible
in-process with Node, and using synchronous logic for manipulating the
database.

However, as it turns out, the threads module would seem to me to meet
a set of generic needs for the wider Node community, including:

- allowing proper (i.e. not simulated) sync coding to be done generally
- isolating the main server thread from damage caused by a problem in
handling a specific action
- distributing load across multiple processes and therefore across
multiple cores

... so I wanted to make it available more widely to the Node
community, at least as a starting point to achieving these goals.
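
The isolation point in the list above can be illustrated with a short sketch. It again uses child_process.spawnSync from current Node releases (not available in v0.5.x) so that it runs synchronously, and the deliberately failing child is a stand-in for a bug in one action handler, not anything taken from the threads module.

```javascript
const { spawnSync } = require('child_process');

// The child throws an uncaught error, standing in for a handler bug.
const crashed = spawnSync(
  process.execPath,
  ['-e', 'throw new Error("handler bug")'],
  { encoding: 'utf8' }
);

// The child died with a non-zero exit status...
console.log('child exit status:', crashed.status);
console.log('child stderr mentions the error:', crashed.stderr.includes('handler bug'));

// ...but the parent carries on and could start a replacement child.
console.log('parent still running, pid', process.pid);
```

A pool manager could watch for such exits and replace the failed child, though whether and how the threads module does so is not stated here.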

I'm more than happy for others to take what I've done and improve it -
I'm personally much more interested in the high-performance database/
application side of things, so threads was just a means to an end for
me, though it would be nice to be acknowledged to have provided this
basic contribution to the wider cause :-)

Rob

rtweed

Sep 1, 2011, 4:04:08 AM
to nodejs
>
> It's gonna need a logo, too :-P
> --
> Jorge.

Anyone want to photoshop together a picture of a fork and a sink for
me, please? ;-)

Rob

Jorge

Sep 1, 2011, 4:17:48 AM
to nod...@googlegroups.com
On 31/08/2011, at 21:52, Marcel Laverdet wrote:

> Also joining the "why is it called 'threads'" crew.
>
> I had gotten excited considering you could totally implement a worker paradigm in v8+Node with Isolates.. it's just no one has done it yet.

Launching a new plain vanilla v8 JS context in a new thread in the node process would be quite easy (thanks to isolates), the not-so-easy part would be adding node's functionality to it. For that node itself would have to have its own 'isolates' branch that took care of its own global shared state.

--
Jorge.
