Google's Go language


Matthew Richmond

May 6, 2011, 5:27:12 AM
to nod...@googlegroups.com
The Register has a nice high-level discussion about the Go language, with a direct comparison to Node (incl. quotes by Ry) on page 4.

Has anyone ever tried Go? Any specific thoughts? What are its pros and cons compared to Node (aside from being compiled (pro and con) and multithreaded (pro))?

Cheers,

Matt

Jorge

May 6, 2011, 6:50:34 AM
to nod...@googlegroups.com
On 06/05/2011, at 11:27, Matthew Richmond wrote:

> The Register has a nice high level discussion about the Go language, with a direct comparison to Node (incl. quotes by Ry) on page 4.
>
> Has anyone ever tried Go? Any specific thoughts? What are its pros and cons compared to Node (aside from being compiled (pro and con) and multithreaded (pro))?

Yeah, and this:

<quote>
"Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
</quote>

Is what node.js needs to address, the sooner the better, imho.

Like so:

var newSharedNothingJSThread = newJSThread(
  aFunction, // The code to run in the background thread. (NOTE_1)
  data       // An object *not* to be copied/duplicated/serialized nor shared: just pass it by reference and forget it here in this context.
);

newSharedNothingJSThread.on('message', messageHandlerFunction);

function messageHandlerFunction (data) {
  /* Here we get data back: the *same* data object that we passed to newJSThread() above. */
  /* It may have been mutated by the code in the other thread, and that's exactly the whole point. */
}

console.log(data);
// This above should throw, or log null or undefined,
// because the object that data was pointing to is no longer reachable in this JS context:
// it was forgotten here when it was passed to newJSThread. That's what 'pass and forget' means.


NOTE_1: The receiving thread receives aFunction as text ( srcText = aFunction.toString() ), and does an eval( '(' + srcText + ')' )(data); to bootstrap. The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts. The only means it has to communicate with another (its parent) JS context is to pass a message (containing a data object, that will be passed by reference in a 'pass and forget' fashion too).
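The bootstrap in NOTE_1 can be sketched in plain JavaScript. newJSThread itself is hypothetical; only the function-to-text round trip is shown, and aFunction/data are illustrative names:

```javascript
// Sketch of NOTE_1's bootstrap: the receiving thread gets the function as
// source text and revives it with eval, then calls it with the passed data.
function aFunction(data) {
  data.doubled = data.value * 2; // mutate the object we were handed
  return data;
}

var srcText = aFunction.toString();       // what the spawning side would send
var revived = eval('(' + srcText + ')');  // what the receiving thread would do
var result = revived({ value: 21 });      // ...followed by the (data) call
console.log(result.doubled); // → 42
```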
--
Jorge.

Gary Katsevman

May 6, 2011, 9:56:01 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 06:50, Jorge <jo...@jorgechamorro.com> wrote:
> <quote>
> "Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
> </quote>
>
> Is what node.js needs to address, the sooner the better, imho.

But node has support for webworkers via several modules which allow
you to run in separate threads. Also, a lot of people run two
instances of node and load balance between them. There are modules for
that.
Writing code to handle shared memory multithreading is annoying.
------------------------
Gary Katsevman
Computer Science Undergraduate
Northeastern University
gkatsev.com

Ryan Gahl

May 6, 2011, 10:02:30 AM
to nod...@googlegroups.com
This is a non-issue really. The solution is a little something called processes. They work great, just as Ryan says in his Google talk. And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).





--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.


Ryan Gahl

May 6, 2011, 10:04:30 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 9:02 AM, Ryan Gahl <ryan...@gmail.com> wrote:
This is a non-issue really. The solution is a little something called processes. They work great, just as Ryan says in his Google talk. And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).

er, that is, hubris to do so in the context of "this is why Go/XYZ is better than Node..." - bottom line is it's really not that difficult in node.

Jorge

May 6, 2011, 10:05:04 AM
to nod...@googlegroups.com
On 06/05/2011, at 15:56, Gary Katsevman wrote:
> On Fri, May 6, 2011 at 06:50, Jorge <jo...@jorgechamorro.com> wrote:
>> <quote>
>> "Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
>> </quote>
>>
>> Is what node.js needs to address, the sooner the better, imho.
>
> But node has support for webworkers via several modules which allow
> you to run in separate threads. Also, a lot of people run two
> instances of node and load balance between them. There are modules for
> that.

Inefficient and slow, WebWorkers are a shame.

> Writing code to handle shared memory multithreading is annoying.

Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?
--
Jorge.

Gary Katsevman

May 6, 2011, 10:17:17 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 10:05, Jorge <jo...@jorgechamorro.com> wrote:
>> Writing code to handle shared memory multithreading is annoying.
>
> Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?

Hm.. Yes, I did miss that. But I think spawn.js that Ryan mentioned
above does do exactly that.

Dean Landolt

May 6, 2011, 10:19:45 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 10:05 AM, Jorge <jo...@jorgechamorro.com> wrote:
On 06/05/2011, at 15:56, Gary Katsevman wrote:
> On Fri, May 6, 2011 at 06:50, Jorge <jo...@jorgechamorro.com> wrote:
>> <quote>
>> "Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
>> </quote>
>>
>> Is what node.js needs to address, the sooner the better, imho.
>
> But node has support for webworkers via several modules which allow
> you to run in separate threads. Also, a lot of people run two
> instances of node and load balance between them. There are modules for
> that.

Inefficient and slow, WebWorkers are a shame.

That's a bit harsh -- WebWorkers as implemented in processes are inefficient (still plenty good for a lot of use cases). But WebWorkers as a pattern could be made much more efficient :)

> Writing code to handle shared memory multithreading is annoying.

Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?

Well, if it doesn't share anything it may as well be a process, right? It has to share something, even if that something is immediately forgotten by the host. Calling it threading only serves to confuse folks though -- I guess what you're talking about is more like forking.

 
--
Jorge.

Dean Landolt

May 6, 2011, 10:23:43 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 10:17 AM, Gary Katsevman <m...@gkatsev.com> wrote:
On Fri, May 6, 2011 at 10:05, Jorge <jo...@jorgechamorro.com> wrote:
>> Writing code to handle shared memory multithreading is annoying.
>
> Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?

Hm.. Yes, I did miss that. But I think spawn.js that Ryan mentioned
above does do exactly that.

No, spawn uses processes, and thus IPC, so it still has to pay the serialization penalty Jorge is complaining about.

Jorge

May 6, 2011, 10:25:53 AM
to nod...@googlegroups.com
On 06/05/2011, at 16:02, Ryan Gahl wrote:

This is a non-issue really. The solution is a little something called processes.

I don't care much whether a background task takes place in a separate process or in a separate thread: each has its own pros/cons and its use cases, both are good, and both can and should be available as options.

They work great, just as Ryan says in his Google talk.

What, exactly, do they "work great" for ?

And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).

No, right now it's not. Not 'pure hubris'. Not at all.

Because the current mechanisms for IPC are too slow (serialize to text, usually as JSON, and re-create the object) and too expensive (the text is passed by copy, and you've got to have 4 copies of the object just to pass it once: the original, the serialized-as-text copy that you want to pass, the serialized-as-text copy that the receiving end receives, and the object re-instantiated as a real object at the receiving end; how's that 'hubris' ?) to pass anything but smallish data at any reasonable price/speed and efficiency.
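The four copies counted above can be traced in a few lines (the intermediate buffer stands in for the actual pipe between processes):

```javascript
// Copy 1 is the original object; the JSON round trip creates three more.
const original = { payload: 'x'.repeat(1024 * 1024) };     // ~1MB of data
const sentText = JSON.stringify(original);                 // copy 2: serialized text, sender side
const recvText = Buffer.from(sentText).toString();         // copy 3: the text as received
const rebuilt  = JSON.parse(recvText);                     // copy 4: a brand-new object

console.log(rebuilt !== original);                 // → true (different objects)
console.log(rebuilt.payload === original.payload); // → true (equal contents)
```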
-- 
Jorge.

Ryan Gahl

May 6, 2011, 10:34:34 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 9:25 AM, Jorge <jo...@jorgechamorro.com> wrote:
No, right now it's not. Not 'pure hubris'. Not at all.

Because the current mechanisms for IPC are too slow (serialize to text, usually as JSON, and re-create the object) and too expensive (the text is passed by copy, and you've got to have 4 copies of the object just to pass it once: the original, the serialized-as-text copy that you want to pass, the serialized-as-text copy that the receiving end receives, and the object re-instantiated as a real object at the receiving end; how's that 'hubris' ?) to pass anything but smallish data at any reasonable price/speed and efficiency.


Yeah, I did regret the use of the word hubris. But don't you think some of the issues around serialization stem from the choice of serialization format and/or the communication mechanisms? In other words, a msgpack or BSON format, or a 0mq-like solution, seems to me like it should perform pretty well. The cost of spawning processes can be mitigated by a simple pooling pattern, too.

I agree we could use some more sugar here, but I think the tools are there to achieve your goals without _too_ much effort... or am I all wet on this?
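The "simple pooling pattern" could be as small as a round-robin dispatcher over pre-spawned workers. A minimal sketch, where makePool is a hypothetical name and plain strings stand in for real child-process handles:

```javascript
// Pay the process-spawn cost once up front, then hand work out round-robin.
function makePool(workers) {
  let next = 0;
  return {
    acquire() { // pick the next worker, wrapping around
      const w = workers[next];
      next = (next + 1) % workers.length;
      return w;
    }
  };
}

const pool = makePool(['w0', 'w1', 'w2']); // stand-ins for spawned processes
const picks = [pool.acquire(), pool.acquire(), pool.acquire(), pool.acquire()];
console.log(picks.join(' ')); // → w0 w1 w2 w0
```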

Jorge

May 6, 2011, 10:47:18 AM
to nod...@googlegroups.com
On 06/05/2011, at 16:04, Ryan Gahl wrote:
On Fri, May 6, 2011 at 9:02 AM, Ryan Gahl <ryan...@gmail.com> wrote:
This is a non-issue really. The solution is a little something called processes. They work great, just as Ryan says in his Google talk. And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).

er, that is, hubris to do so in the context of "this is why Go/XYZ is better than Node..." - bottom line is it's really not that difficult in node.

Er, Ryan, do you realize who these 'go' guys are ?

I for one take it as a given that, coming from *them*, 'go' must have *a lot* of goodness and know-how in it. No doubt.
-- 
Jorge.

Ryan Gahl

May 6, 2011, 11:13:48 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 9:47 AM, Jorge <jo...@jorgechamorro.com> wrote:
Er, Ryan, do you realize who these 'go' guys are ?

I for one take it as a given that, coming from *them*, 'go' must have *a lot* of goodness and know-how in it. No doubt.


Heh, yeah, I do realize who these 'go' guys are. I'm not sure I understand how that changes things in the context of the discussion around how easy (or not) it is to create a high performance multi process solution using node today with fairly minimal work and maybe some integration with things like msgpack and 0mq?

I wonder if Peter Griess still hangs out here and might speak to how Yahoo Mail uses node, and how these issues were resolved? AFAIK they are a shining example of a real world high volume use case, and I do know they (at least at one point) were making heavy use of web-workers.

Do you disagree that choosing a better serialization format and implementing a simple pooling pattern might mitigate (at least) some of the issues you are bringing up?

Thomas Shinnick

May 6, 2011, 11:18:26 AM
to nod...@googlegroups.com
Some know who these guys are.  Some also know that after more than 2.5 years of development they hadn't yet decided what to do about "exceptions".  Go smelled to me after discovering that, even though they supposedly fixed that shortly after. 

Hype seems attached to Go more than reality. Perhaps after another year, more real-world use will have happened? I'll go read the article. I'm really tired of hype and fadism.

Dean Landolt

May 6, 2011, 11:29:21 AM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 11:13 AM, Ryan Gahl <ryan...@gmail.com> wrote:
On Fri, May 6, 2011 at 9:47 AM, Jorge <jo...@jorgechamorro.com> wrote:
Er, Ryan, do you realize who these 'go' guys are ?

I for one take it as a given that coming from *them*, 'go' must have *a*lot* of goodness and know-how into it. No doubt.


Heh, yeah, I do realize who these 'go' guys are. I'm not sure I understand how that changes things in the context of the discussion around how easy (or not) it is to create a high performance multi process solution using node today with fairly minimal work and maybe some integration with things like msgpack and 0mq?

Sure, you can squeeze quite a lot out of IPC but there are some fundamental limitations (as Jorge has pointed out) and some use cases that really demand some kind of memory hand-off. Still, whether the "go" guys have it figured out is questionable -- I'm betting on those "rust" guys.
 

I wonder if Peter Griess still hangs out here and might speak to how Yahoo Mail uses node, and how these issues were resolved? AFAIK they are a shining example of a real world high volume use case, and I do know they (at least at one point) were making heavy use of web-workers.

Do you disagree that choosing a better serialization format and implementing a simple pooling pattern might mitigate (at least) some of the issues you are bringing up?

He's not denying that you can get suitable performance with IPC, just that there are classes of problems that need shared memory. There was a thread about this a few months back -- tl;dr I argue WebWorkers could be done in threads allowing them to be passed deeply frozen objects in a sandbox. The objection is the freeze part, with the alternative being a special node host object (a la Buffer) that forgets the object once it's passed to a thread. Interesting idea, and also sidesteps the shared memory problem, but seems a little magical for my tastes.

Jorge

May 6, 2011, 11:39:06 AM
to nod...@googlegroups.com
On 06/05/2011, at 16:19, Dean Landolt wrote:
> On Fri, May 6, 2011 at 10:05 AM, Jorge <jo...@jorgechamorro.com> wrote:
> On 06/05/2011, at 15:56, Gary Katsevman wrote:
>> >
>> > But node has support for webworkers via several modules which allow
>> > you to run in separate threads. Also, a lot of people run two
>> > instances of node and load balance between them. There are modules for
>> > that.
>>
>> Inefficient and slow, WebWorkers are a shame.
>
> That's a bit harsh

Yes, perhaps :-)

> -- WebWorkers as implemented in processes are inefficient (still plenty good for a lot of use cases).

True. WebWorkers are good as long as workers don't communicate too much nor too often.

> But WebWorkers as a pattern could be made much more efficient :)

Exactly!

>> > Writing code to handle shared memory multithreading is annoying.
>>
>> Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?
>
> Well, if it doesn't share anything it may as well be a process, right?

Right. And sometimes a separate process is better than a thread, and it's a must if you want to run the background task in another machine.

> It has to share something, even if that something is immediately forgotten by the host. Calling it threading only serves to confuse folks though -- I guess what you're talking about is more like forking.

Because JavaScripters hear "thread" and they automatically yell "threads suck, threads are evil", but that's not so, not necessarily; it's just a cliché.

What sucks is shared mutable data, and it sucks because it requires synchronization, and synchronization issues often become quite difficult to reason about.

But a thread that shares nothing, is no more evil than a separate process, can be spawned much much faster than a separate process, and 'IPC' between threads is cheaper, faster, easier, and much more efficient than between processes.
--
Jorge.

Jorge

May 6, 2011, 11:55:34 AM
to nod...@googlegroups.com
Mitigate, slightly, perhaps, yes.

But Ryan, no matter how fast you can serialize, passing a pointer (64 bits) to an object, is always going to be many orders of magnitude faster than serializing an object to text + transferring the text + receiving the text into a buffer + parsing it + finally re-creating a copy of the original object.

It's not only faster, it also requires ~nil CPU, ~nil memory, and ~nil other resources (memory bus bandwidth, CPU share, etc).
-- 
Jorge.

Ryan Gahl

May 6, 2011, 12:00:48 PM
to nod...@googlegroups.com
On Fri, May 6, 2011 at 10:55 AM, Jorge <jo...@jorgechamorro.com> wrote:
Mitigate, slightly, perhaps, yes.

But Ryan, no matter how fast you can serialize, passing a pointer (64 bits) to an object, is always going to be many orders of magnitude faster than serializing an object to text + transferring the text + receiving the text into a buffer + parsing it + finally re-creating a copy of the original object.

It's not only faster, it also requires ~nil cpu and almost ~nil memory and almost ~nil resources (memory bus bandwidth, cpu share, etc).


Fair enough. I guess I've just not yet reached the tipping point where this has become a major issue for me. I'm looking forward to seeing what solutions the community derives from these discussions though.

Bradley Meck

May 6, 2011, 12:04:11 PM
to nod...@googlegroups.com
For clustered computing this is not true, which I think matters more for most major node deployments.

The intricacies of passing pointers in JS are pretty nasty, particularly since we have C++ addons and do not want side effects visible from both sides. Basically the only way to implement Go's style of shared memory (well, similar to it, not an exact impl) is to wait for an object to be GCable, along with all of its internal references, and even if we can detect that well, there must be a fallback to prevent deadlock, which is non-trivial (but doable in an actor model).

Just as pipelined multi-core CPUs may not give performance gains, and you should determine the volatility of your objects if you truly want performance, I think more thought should go into the theory of how this is done rather than "x is faster than y in case z"; even if z is a general case, it can have vast implications for what cases are not covered.

Dean Landolt

May 6, 2011, 12:39:28 PM
to nod...@googlegroups.com
A thread that shares nothing is not a thread in the common nomenclature. It's another construct entirely, just built on top of threads, like green threads or light-weight processes. The ol' "threads suck" is more than cliché when threads ~ shared memory :)

Liam

May 6, 2011, 1:32:11 PM
to nodejs
On May 6, 8:55 am, Jorge <jo...@jorgechamorro.com> wrote:
> On 06/05/2011, at 17:13, Ryan Gahl wrote:
> > Do you disagree that choosing a better serialization format and implementing a simple pooling pattern might mitigate (at least) some of the issues you are bringing up?
>
> Mitigate, slightly, perhaps, yes.

FWIW, V8 JSON is now faster than msgpack: https://github.com/aikar/wormhole/issues/3

Also, if one were to go with a multi-process design (vs threads),
objects could be transferred via mmap'd buffers, no? Presumably that
requires a deep-copy pass, with "new String(obj.str)" etc, which is a
kind of serialization, but perhaps fast enough for most cases?

Arnout Kazemier

May 6, 2011, 1:45:26 PM
to nod...@googlegroups.com
*prepare for brain fart*

V8 is designed to be single-threaded; that is one of the major issues here. The current JavaScript engines were never designed to be used on the server, so working around these
limitations will probably require a lot of engineering effort from both the V8 team and the Node core team.

The main purpose of going multi-threaded is to process a lot of CPU-intensive information while minimizing the overhead on the current thread. So I wonder:
can't we just leverage the built-in WebWorker technology that is probably already built into V8 to do this?

Bradley Meck

May 6, 2011, 1:49:46 PM
to nod...@googlegroups.com
Unfortunately, allocating objects into mmap'ed memory is still going to allow shared-memory errors unless all of it is frozen (even the C++ parts). And freezing objects slows them down, not to mention that freezing them calls into question the speed gains claimed for immutability in general.

Jorge

May 6, 2011, 2:16:45 PM
to nod...@googlegroups.com
On 06/05/2011, at 18:39, Dean Landolt wrote:
>
>
> A thread that shares nothing is not a thread in the common nomenclature. It's another construct entirely (...)

How so ? Threads share the process' memory space, if that's what you mean, but it's up to the programmer to decide what data they share.
--
Jorge.

Tim Smart

May 6, 2011, 2:27:22 PM
to nod...@googlegroups.com
If efficiency is something that really matters to you, then you would
construct the IPC pattern yourself, depending on the communication
that needs to happen. No one said you have to serialize with JSON.

Web Workers are a convenience library on top of child processes and a
pipe, and you usually sacrifice performance for convenience.

Tim.

Liam

May 6, 2011, 2:35:36 PM
to nodejs
Not suggesting *allocating into* mmap buffers. When handing data to a
process, mmap.put() does a deep-copy to shmem, then mmap.get()
rebuilds the object on the other side on the heap. And there's a
signal-driven queue for each process in mmap which emits events.

No JS code would have access to mmap contents.

Liam

May 6, 2011, 2:48:51 PM
to nodejs
At the risk of taking some heat, this quote from the article stood
out...

What's more, Gerrand argues, Go doesn't force developers to embrace
the asynchronous ways of event-driven programming. "With goroutines
and channels, you can say 'I'm going to send a message to another
goroutine or wait for a message from another goroutine', but you don't
have to actually leave the function you're in to do that," Gerrand
says. "That lets you write asynchronous code in a synchronous style.
As people, we're much better suited to writing about things in a
synchronous style."

On May 6, 2:27 am, Matthew Richmond <matthew.j.richm...@gmail.com> wrote:
> The Register has a nice high-level discussion <http://www.theregister.co.uk/2011/05/05/google_go/> about the Go language, with a direct comparison to Node (incl. quotes by Ry) on page 4 <http://www.theregister.co.uk/2011/05/05/google_go/page4.html>.
>
> Has anyone ever tried Go? Any specific thoughts? What are its pros and cons compared to Node (aside from being compiled (pro and con) and multithreaded (pro))?
>
> Cheers,
>
> Matt

Bradley Meck

May 6, 2011, 3:06:44 PM
to nod...@googlegroups.com
I was referring more to closures / internal properties / c++ objects / gc lifetimes.

Jorge

May 6, 2011, 3:23:04 PM
to nod...@googlegroups.com
On 06/05/2011, at 17:18, Thomas Shinnick wrote:

> Some know who these guys are. Some also know that after more than 2.5 years of development they hadn't yet decided what to do about "exceptions". Go smelled to me after discovering that, even though they supposedly fixed that shortly after.
>
> Hype seems attached to Go more than reality. Perhaps after a year more, more real world has happened? I'll go read the article. I'm really tired of hype and fadism.

Hype ?

Perhaps, but when one is 60-something, one's got quite a bunch of decades of experience...
--
Jorge.

Jorge

May 6, 2011, 3:27:58 PM
to nod...@googlegroups.com
On 06/05/2011, at 18:04, Bradley Meck wrote:

> for clustered computing this is not true. which i think for most major distributions of node is more important.

It's almost right already for clusters, but it's quite wrong for a single machine with multiple cores.

> the intricacies of passing pointers in js is pretty nasty, particularly since we have c++ addons and do not want to have side effects visible from both sides. basically the only way to implement go's style of shared memory (well similar to, not an exact impl) is to wait for an Object to be GCable and all of its references internally to be GCable, and even if we can detect that well, it must have a fallback to prevent deadlock which is non-trivial (but doable in an actor model).
>
> just like pipelined multi-core cpus may not give performance gains and you should determine the volatility of your objects if you truly want performance, I think more thought should go into the theory of how this is done rather than "x is faster than y in case z" even if z is a general case it can have vast implications of what cases are not covered.

There's a reason why almost every programming language has pass-by-reference. You insist that pass-by-copy is good enough (even a requisite) for such and such use cases, but that's not the point. Copying is just plainly wrong for many other use cases that *require* pass-by-reference.

So pass and forget is -imo- the only way to go, if we want maximum speed and shared-nothingness. And yes, these objects are special, so we'd need to touch the v8 source for this.
--
Jorge.

Jorge

May 6, 2011, 3:33:24 PM
to nod...@googlegroups.com
On 06/05/2011, at 20:48, Liam wrote:

> At the risk of taking some heat, this quote from the article stood
> out...
>
> What's more, Gerrand argues, Go doesn't force developers to embrace
> the asynchronous ways of event-driven programming. "With goroutines
> and channels, you can say 'I'm going to send a message to another
> goroutine or wait for a message from another goroutine', but you don't
> have to actually leave the function you're in to do that," Gerrand
> says. "That lets you write asynchronous code in a synchronous style.
> As people, we're much better suited to writing about things in a
> synchronous style."


LOL. I was waiting for somebody to comment on that.

Where's Marcel ?
Where's Kyle ?

:-)
--
Jorge.

Nick

May 6, 2011, 3:49:52 PM
to nodejs
In the context of the common use case for Node--a lightweight web
application stack--it doesn't make sense to adopt a multithreaded
model. If the goal is to build an application that can handle high
volume, it makes far more sense to load-balance and segregate than it
does to build a single application that manages that complexity
itself. If your application is a single-threaded process, you can spin
up additional instances on virtual machines as your load increases and
have them communicate with one another as a cluster. An application
instance can access another instance without knowing if that distant
process is on the same machine or on another continent. If each node
is autonomous, individual nodes might go down, while the application
remains running.

This is slower than multithreading, but has a few advantages: it can
easily scale, since your architecture is built to communicate
abstractly with other instances anyway; it's easier to reason about;
and it's easier to prototype against, since they have a well-known
process-agnostic mechanism for interfacing.

If you're running a threaded application on a single machine, you
can't just flip on another copy on a new VM and expect them to act in
concert. You'd have to build the clustering software, as described
above, *and* deal with the additional complexity of concurrent code.
If that's really what you want, you should be looking at solutions
that are known to work in this problem domain. Enterprise Java comes
to mind.

Bradley Meck

May 6, 2011, 4:35:14 PM
to nod...@googlegroups.com
I'm not arguing against passing objects. I'm trying to bring up that it is not terribly simple given our programming environment. I have long threads in IRC and on the mailing list about basically this; and I believe that until someone figures out the logic so we *can* safely pass by reference in our environment, no assumptions dependent on that future should be made. Freezing and pass-by-death are the only ways I can see working, so I wanted to bring that up (neither of which would require copying).

As for the cluster computing, if locality is consistent then yes, it is acceptable. However, if you have large enough documents or enough volatility, it is more painful than serialization for performance.

Jorge

May 7, 2011, 5:08:02 AM
to nod...@googlegroups.com

Yes, higher level abstractions are all right, but higher level abstractions are supported by lower level code.

Leave the lower level un-optimized, make it run orders of magnitude slower than it should, and you'll be doing it Wrong™ anyways.

I mean non-optimized as in, to merely pass an object containing 1MB of data:

A- serialize the 1MB object into +1MB of text + transfer +1MB + copy +1MB + parse +1MB + re-instantiate the 1MB object

versus

B- grab a reference to it (~ copy 8 bytes).

One way (B) is several million times faster than the other (A).

Perhaps if you want to pass just 3 bytes, it would not matter much.
Perhaps if you want to pass it just once or not too often, it might be all right too.

But don't pretend that's going to be always the case.
--
Jorge.

Bruno Jouhier

May 7, 2011, 8:18:10 AM
to nodejs
How good is Go? There seems to be a discrepancy between what they
advertise in their intro material (http://golang.org/doc/effective_go.html#concurrency):

"if the communication is the synchronizer, there's still no need for
other synchronization"

and what you find when you look under the hood (http://golang.org/doc/go_mem.html#tmp_18):

"When multiple goroutines access a shared variable v, they must use
synchronization events to establish happens-before conditions that
ensure reads observe the desired writes" (followed by the doc of a
lock API).

This is actually no surprise: if a goroutine can access mutable objects from a parent scope, and if multiple goroutines run concurrently, how could this be safe without locks?

I've not tried Go, but from what I see it looks like Go is just another threaded system, with channels in addition to locks. And I don't like the way they describe their concurrency model, because it makes it sound like it avoids the well-known pitfalls of threading when it really doesn't.

Node.js is based on simple sound principles: no threads, only async
APIs (except for startup and require), share nothing. It would be
very bad to let threads (real ones) creep in.

The real issue then is to have an efficient way to pass messages
between processes. There are obviously two ways: copying the data and
sharing immutable data. Why couldn't we have both? Copying is good
when the data is small but being able to share immutable data is
important when the data is big.

Bruno

Jorge

unread,
May 7, 2011, 8:42:51 AM5/7/11
to nod...@googlegroups.com

On 07/05/2011, at 14:18, Bruno Jouhier wrote:

> How good is Go? There seems to be a discrepancy between what they
> advertise in their intro material (http://golang.org/doc/effective_go.html#concurrency):
>
> "if the communication is the synchronizer, there's still no need for
> other synchronization"
>
> and what you find when you look under the hood (http://golang.org/doc/go_mem.html#tmp_18):
>
> "When multiple goroutines access a shared variable v, they must use
> synchronization events to establish happens-before conditions that
> ensure reads observe the desired writes" (followed by the doc of a
> lock API).
>
> This is actually no surprise: if a goroutine can access mutable
> objects from a parent scope, and if multiple goroutines run
> concurrently, how could this be safe without locks?
>
> I've not tried Go but from what I see it just looks like Go is just
> another threaded system with channels in addition to locks. And I
> don't like the way they describe their concurrency model because it
> makes it sound like it avoids the well-known pitfalls of threading,
> when it really doesn't.
>
> Node.js is based on simple sound principles: no threads, only async
> APIs (except for startup and require), share nothing. It would be
> very bad to let threads (real ones) creep in.

What would be bad is to let shared mutable data creep in, *not* threads.

> The real issue then is to have an efficient way to pass messages
> between processes. There are obviously two ways: copying the data and
> sharing immutable data.

There's a third way: pass-and-forget mutable data.

> Why couldn't we have both? Copying is good
> when the data is small but being able to share immutable data is
> important when the data is big.

There's no need to copy, and there's no need for immutability.
--
Jorge.

Bruno Jouhier

unread,
May 7, 2011, 9:29:18 AM5/7/11
to nodejs
> What would be bad is to let shared mutable data creep in, *not* threads.

Yes.
>
> > The real issue then is to have an efficient way to pass messages
> > between processes. There are obviously two ways: copying the data and
> > sharing immutable data.
>
> There's a third way: pass-and-forget mutable data.

Yes, but that may be a bit tricky to implement (the GC side). And maybe
costly if you want to do it in a safe way (you have to guarantee that
all references to all objects of the graph are gone). This probably
means allocating these objects in a special way (via a special pool)
and having some kind of weak reference on them.

>
> > Why couldn't we have both? Copying is good
> > when the data is small but being able to share immutable data is
> > important when the data is big.
>
> There's no need to copy, and there's no need for immutability.

With pass-and-forget, yes.

> --
> Jorge.

Rinie Kervel

unread,
May 7, 2011, 1:28:29 PM5/7/11
to nodejs
But when you share, you also need nasty synchronization primitives that block.

Rinie

Jorge

unread,
May 7, 2011, 1:38:13 PM5/7/11
to nod...@googlegroups.com
On 07/05/2011, at 19:28, Rinie Kervel wrote:
> On May 6, 8:16 pm, Jorge <jo...@jorgechamorro.com> wrote:
>> On 06/05/2011, at 18:39, Dean Landolt wrote:
>>
>>> A thread that shares nothing is not a thread in the common nomenclature. It's another construct entirely (...)
>>
>> How so ? Threads share the process' memory space, if that's what you mean, but it's up to the programmer to decide what data they share.
>
> But when you share, you also need nasty synchronization primitives that block.

Exactly, and then it may begin to suck. That's why we want shared-nothing threads.
--
Jorge.

Liam

unread,
May 7, 2011, 2:06:39 PM5/7/11
to nodejs
So how does Erlang deal with all this? Does it have shared data? Super-efficient message passing?

Since modifying V8 seems unpopular, what's the next fastest method?
Processes and an mmap buffer?

Rinie Kervel

unread,
May 7, 2011, 2:48:52 PM5/7/11
to nodejs
But why then pass data from one to the other, can't they get their own
data?

> --
> Jorge.

Stuart

unread,
May 7, 2011, 7:44:50 PM5/7/11
to nodejs


On May 7, 2:06 pm, Liam <networkimp...@gmail.com> wrote:

> So how does Erlang deal with all this? Does it have shared data? Super-efficient message passing?

It has been a long while since I dabbled with Erlang, so I am not 100%
on this, but as far as I recall:

1) Erlang has super-efficient message passing (because message passing
is intrinsic to Erlang, and is "baked in" for efficiency).

2) Erlang has no shared data.

3) This is crucial: Erlang has *no* mutable data, so even were data
shared, it wouldn't make a difference because you can't modify it.
Yes, I know that the idea of ALL data being non-mutable is a wildly
bizarre idea, but Erlang proves that you can program with such a
constraint.

4) Joe Armstrong's design objectives were that Erlang be massively
distributed, concurrent, and utterly fault-tolerant. It is difficult
to "back-engineer" these objectives into a language/system not
designed for such at the outset.

5) Erlang has tail recursion. Of all the valuable Erlang idioms and
concepts that could be brought into V8, the one I'd most like to see
is tail recursion.

I hope I am on the mark. If there are any Erlang experts reading this,
please confirm or correct.

Dean Landolt

unread,
May 8, 2011, 10:37:25 AM5/8/11
to nod...@googlegroups.com
On Sat, May 7, 2011 at 7:44 PM, Stuart <stu...@yellowhelium.com> wrote:

> On May 7, 2:06 pm, Liam <networkimp...@gmail.com> wrote:
>
>> So how does Erlang deal with all this? Does it have shared data? Super-efficient message passing?
>
> It has been a long while since I dabbled with Erlang, so I am not 100% on this, but as far as I recall:
>
> 1) Erlang has super-efficient message passing (because message passing is intrinsic to Erlang, and is "baked in" for efficiency).
>
> 2) Erlang has no shared data.
>
> 3) This is crucial: Erlang has *no* mutable data, so even were data shared, it wouldn't make a difference because you can't modify it. Yes, I know that the idea of ALL data being non-mutable is a wildly bizarre idea, but Erlang proves that you can program with such a constraint.

It's not bizarre. It's incredibly common and a staple of purely functional languages.

> 4) Joe Armstrong's design objectives were that Erlang be massively distributed, concurrent, and utterly fault-tolerant. It is difficult to "back-engineer" these objectives into a language/system not designed for such at the outset.

That's silly. These goals are achieved through properties of the language, all of which can be added to JavaScript (many of which already are, or are planned to be). The only real issue is legacy and back-compat concerns, but node is already one take on JavaScript: the Good Distributed Parts anyway -- I bet there will be many more takes as the language evolves.

> 5) Erlang has tail recursion. Of all the valuable Erlang idioms and concepts that could be brought into V8, the one I'd most like to see is tail recursion.

Exactly (I wrote the above comment before I read this :)

Just so you know, tail recursion is already in for es-next.

> I hope I am on the mark. If there are any Erlang experts reading this, please confirm or correct.

I'm no Erlang expert, but I've studied a few functional languages. As you pointed out, immutability is key -- it lets the runtime take all kinds of useful shortcuts. People hate on Object.freeze, but immutability is already in JavaScript. It forces you to rethink your algorithms, but people seem to get by in Erlang and Haskell and such -- we could learn a little from them and do wildly efficient WebWorkers without adding any new primitives to the language.

Still, as has been pointed out by many in this thread, as a network library it's a bit outside node's core competency anyway -- processes will usually be Good Enough.
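Object.freeze, mentioned above, already gives (shallow) immutability in JavaScript; in strict mode a mutation attempt throws instead of failing silently:

```javascript
'use strict';

// A frozen object cannot be mutated; in strict mode the attempt
// throws a TypeError rather than silently doing nothing.
const msg = Object.freeze({ user: 'bob', bytes: 1024 });

let threw = false;
try {
  msg.bytes = 0; // TypeError: cannot assign to a frozen object
} catch (e) {
  threw = true;
}
console.log(threw, msg.bytes); // true 1024
```

Note that freeze is shallow: nested objects would need to be frozen recursively before a message graph could be treated as truly immutable.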

Nick

unread,
May 8, 2011, 12:46:49 PM5/8/11
to nodejs
On May 7, 5:08 am, Jorge <jo...@jorgechamorro.com> wrote:
> On 06/05/2011, at 21:49, Nick wrote:
>
> I mean non-optimized as in, to merely pass an object containing 1MB of data:
>
> ...
>
> Perhaps if you want to pass just 3 bytes, it would not matter much.
> Perhaps if you want to pass it just once or not too often, it might be all right too.
>
> But don't pretend that's going to be always the case.

Always? No. But often. As I said, the common use case for Node is a lightweight web application stack. In that context, I don't see how being able to pass around megabyte-sized pages with a memory pointer really buys you much for the potential headaches involved. Also remember that performance is all relative to what you're actually doing: if you're passing around giant documents between processes, but you're only doing it irregularly (a few dozen times per second?), your performance will be acceptable in the majority of cases.

Diogo Resende

unread,
May 9, 2011, 9:16:11 AM5/9/11
to nod...@googlegroups.com
On Sun, May 8, 2011 at 5:46 PM, Nick <nhu...@gmail.com> wrote:
> As I said, the common use case for Node is a
> lightweight web application stack.
I don't think lightweight has anything to do with this. The stack can
still be lightweight and still allow threading.

Imagine a lightweight web app that does audio transcoding. Wouldn't it
be good to use 4/8 cores? If your alternative is to add a proxy and
distribute across 4/8 processes, I don't think that's lightweight.

Bradley Meck

unread,
May 9, 2011, 9:39:58 AM5/9/11
to nod...@googlegroups.com
There is a large difference in pipelined performance scaling between threading and non-shared parallelism across cores. Either way, too much theory-mongering here; we need more product.

David Mitchell

unread,
May 12, 2011, 7:08:55 AM5/12/11
to nod...@googlegroups.com
You're pretty much spot on.

The only thing I'd add to point 4 is that it's OTP, not core Erlang, that adds the massive failover / redundancy.  In general, you don't need to bother with stuff like validating input; if your process gets bad data, you let it fail, then use OTP to recover, log exceptions, etc.  No more defensive code ... mmm.

On top of that, you haven't lived till you've performed hot updates on your code without skipping a transaction.  Relatively easy to do in Erlang/OTP, but it blows your mind the first time you see it in action.  That's how Erlang gets massive uptime.

I'd happily code only in Erlang, except that the enterprise world runs J2EE and .NET, and there are easier (not necessarily better) ways to build the plumbing for these than Erlang.

Regards

Dave M.

Nick

unread,
May 12, 2011, 5:53:07 PM5/12/11
to nodejs
I think you're focusing on the wrong part of the sentence: "web
application stack" is the key part. Lightweight is there to
differentiate it from something like Spring.

The use case you mention raises a few questions: How many web apps out
there do audio transcoding? Would you really want to do audio
transcoding in javascript? How often can you farm out the transcoding
process to something like lame? Could you get a serviceable result
with a single thread? Could you get a serviceable result with web
workers? I don't know the answers to any of those questions.

The core of my criticism is this: Adding multithreading to javascript
is mashing a low-level feature into a high-level language that it
wasn't designed for, and there just aren't that many use cases in the
common problem domain where multithreading provides the only path to
acceptable performance. I can't think of many cases where you'll be
passing around three megabyte pages between processes, and those few
that I can think of would be better served by being written in a
language suited to the problem at hand.



On May 9, 9:16 am, Diogo Resende <diog...@gmail.com> wrote:

Jorge

unread,
May 12, 2011, 7:14:45 PM5/12/11
to nod...@googlegroups.com
On 12/05/2011, at 23:53, Nick wrote:

> (...)


>
> The core of my criticism is this: Adding multithreading to javascript
> is mashing a low-level feature into a high-level language that it
> wasn't designed for, and there just aren't that many use cases in the
> common problem domain where multithreading provides the only path to
> acceptable performance. I can't think of many cases where you'll be
> passing around three megabyte pages between processes, and those few
> that I can think of would be better served by being written in a
> language suited to the problem at hand.

Like when you need to process 3000 1kB templates for 3000 concurrent users ?
--
Jorge.

Ryan Gahl

unread,
May 12, 2011, 9:33:31 PM5/12/11
to nod...@googlegroups.com

On Thu, May 12, 2011 at 6:14 PM, Jorge <jo...@jorgechamorro.com> wrote:
> Like when you need to process 3000 1kB templates for 3000 concurrent users ?


You really shouldn't be processing templates after the app has started up. I know you were just making a point, but I really hope you don't do that :)

Ryan Gahl

unread,
May 12, 2011, 9:38:02 PM5/12/11
to nod...@googlegroups.com
That came out wrong... sorry... was just meaning to say that your example would most likely be something that is highly un-optimized. 

Jorge

unread,
May 13, 2011, 3:41:38 AM5/13/11
to nod...@googlegroups.com

Each template with each user's data ?

How could I do that before the app starts, and before the clients connect to the server, and before they request the page ?
--
Jorge.

Ryan Gahl

unread,
May 13, 2011, 9:57:51 AM5/13/11
to nod...@googlegroups.com

On Fri, May 13, 2011 at 2:41 AM, Jorge <jo...@jorgechamorro.com> wrote:
> Each template with each user's data ?
>
> How could I do that before the app starts, and before the clients connect to the server, and before they request the page ?


This is slightly OT, but for that case you should delegate templating to the client and only send the data from the server. Server side templating should only be used for application composition. That is, if you are going for optimal. If your argument is "threading would make suboptimal systems faster"... well sure, but optimizing the system is probably a better first step.

All I'm saying is for your example there are bigger issues than "not being able to use threads".

Jorge

unread,
May 13, 2011, 4:51:54 PM5/13/11
to nod...@googlegroups.com

The bigger issue IMO is that in node you can't/shouldn't block the main thread, and while you can do I/O in (a) background thread(s), you can't fill a template in (a) background thread(s).

And the solution can't be not to fill templates, when the problem is filling templates.
--
Jorge.

Ryan Gahl

unread,
May 13, 2011, 5:13:27 PM5/13/11
to nod...@googlegroups.com
For the sake of the discussion, assuming the problem is filling templates and not reworking the system to avoid filling templates on the server (which IMHO is absolutely the right answer), this problem still does not require threads at all. This is exactly the kind of work that is served perfectly by processes (web workers, et al.). You just don't pass the entire template and user data to the process via IPC, and you want to implement a pooling pattern to avoid process-spawn penalties at runtime. It sounds like you're saying that you're being forced to pass 3000 1K templates around via IPC and/or being forced to do this processing in a blocking manner because threads don't exist, neither of which is true. Either that, or I'm still missing something.




--
Jorge.

Jorge

unread,
May 13, 2011, 6:26:22 PM5/13/11
to nod...@googlegroups.com
On 13/05/2011, at 23:13, Ryan Gahl wrote:

> For the sake of the discussion, assuming the problem is filling templates and not reworking the system to avoid filling templates on the server (which IMHO is absolutely the right answer)

It's good for web apps, and good for private data, but when you want the contents indexed by Google, you had better serve the pages fully constructed.

> this problem still does not require threads at all. This is exactly the kind of work that is served perfectly by processes (web-workers, et.al.).
>
> You just don't pass the entire template and user data to the process via IPC,

No ?

> and you want to implement a pooling pattern to avoid process spawn penalties at runtime. It sounds like you're saying that you're being forced to pass 3000 1K templates around via IPC and/or being forced to do this processing in a blocking manner because threads don't exist, neither of which is true. Either that or I'm still missing something.

So what does the child process do with the result, after filling the template ?
--
Jorge.

ChrisAustinLane

unread,
May 13, 2011, 9:43:15 PM5/13/11
to nod...@googlegroups.com
Seems you could read and parse the template into lambdas that you then call with the appropriate data at run time.

Thanks,
Chris Austin-Lane
Sent from a cell phone
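The "parse the template into lambdas" idea can be sketched as a compile-once, call-per-request function. The `{{name}}` placeholder syntax below is purely illustrative, not any particular engine:

```javascript
// Compile a template string once, at startup; filling it at request
// time is then a plain function call, cheap enough for the hot path.
function compile(template) {
  const parts = template.split(/{{(\w+)}}/); // odd indices are keys
  return (data) => parts
    .map((p, i) => (i % 2 ? data[p] : p))
    .join('');
}

const render = compile('Hello {{name}}, you have {{n}} messages.');
console.log(render({ name: 'Ada', n: 3 }));
// -> Hello Ada, you have 3 messages.
```

All the parsing cost is paid before the server starts listening; per request, only string concatenation remains.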

Ryan Gahl

unread,
May 13, 2011, 10:09:31 PM5/13/11
to nod...@googlegroups.com
On Fri, May 13, 2011 at 5:26 PM, Jorge <jo...@jorgechamorro.com> wrote:
> On 13/05/2011, at 23:13, Ryan Gahl wrote:
>
>> For the sake of the discussion, assuming the problem is filling templates and not reworking the system to avoid filling templates on the server (which IMHO is absolutely the right answer)
>
> It's good for web apps, and good for private data, but when you want the contents indexed by Google, you had better serve the pages fully constructed.

Agreed. This can be accomplished via server side templating that is done outside the request pipeline (i.e. app composition). We do this step when our app is starting up, so the end result is that everything has been pre-processed either into static files, stored in memory, or compiled functions before we start listening for requests. Where dynamic sections (what you may call partials) are concerned, we avoid any in-proc server side template processing. The pages are SEO-kosher.
 

> this problem still does not require threads at all. This is exactly the kind of work that is served perfectly by processes (web-workers, et.al.).
>
> You just don't pass the entire template and user data to the process via IPC,

> No ?

No, you pass the stream FD(s) to the process, and the minimal amount of data required to instruct it what you want it to do (just like a function call). These processes can be like your internal service APIs for things that require processing.
 

> and you want to implement a pooling pattern to avoid process spawn penalties at runtime. It sounds like you're saying that you're being forced to pass 3000 1K templates around via IPC and/or being forced to do this processing in a blocking manner because threads don't exist, neither of which is true. Either that or I'm still missing something.

> So what does the child process do with the result, after filling the template ?

If the child process has the FD, the child process writes directly to the stream and can even close it. Or you can write to it and send an "i'm done" message back to the parent process. Point being, there is no need to be passing that huge chunk of data across process boundaries.

Jorge

unread,
May 14, 2011, 4:51:39 AM5/14/11
to nod...@googlegroups.com
On 14/05/2011, at 04:09, Ryan Gahl wrote:

> If the child process has the FD, the child process writes directly to the stream and can even close it. Or you can write to it and send an "i'm done" message back to the parent process. Point being, there is no need to be passing that huge chunk of data across process boundaries.
 

So, as you can't spawn a thread (which is fast and does not need any expensive IPC), you want to launch a (child) process per client.

And as you still can't block the main thread in the child process, you still can't/shouldn't handle more than one concurrent fd/connection per child process.

And if you've got any other client/session data to keep, you either pass it too via IPC to the delegate process or you lose it: the context of a client is more than its fd/network socket.

You insist that all that is good, but it isn't, not for every use case. Sometimes what you'd need is not all that jazz, but a simple way to run a simple function(data) in the same (main/parent) process in the background, without paying the cost of spawning a new process from scratch and recreating the context that you already had in the parent.

Node solves wonderfully the problem of blocking IO tasks (it runs them in a pool of background threads thanks to libeio), but it also needs a way to solve the problem of blocking CPU-bound tasks, by running them in background threads.

Imagine for a second that you were right, and that your proposal were the right thing to do. Why, then, does node use threads for IO instead of child processes ?

Because child processes are not the silver bullet.
-- 
Jorge.

Ryan Gahl

unread,
May 14, 2011, 10:29:46 AM5/14/11
to nod...@googlegroups.com
On Sat, May 14, 2011 at 3:51 AM, Jorge <jo...@jorgechamorro.com> wrote:
> So, as you can't spawn a thread (which is fast and does not need any expensive IPC), you want to launch a (child) process per client.
>
> (...)
>
> Because child processes are not the silver bullet.


No, not a child process per connection... a pooled collection of processes via a pooling construct that can be tuned. The important bit is to make sure the web server process is handing off work and getting back to its job of handling/routing/responding to requests. We have a generic 'resource pool' construct that we also use to pool server side DOMs (via jsdom)... it's a handy pattern for expensive things.

All the details aside (they aren't important for this conversation), all I'm saying is that I believe for this example (server-side templating) you can solve your problems by restructuring the lifecycle of how and when you do things. If someone told me they have a 1K template that must be processed on every request, my first reflex is to say that I'm willing to bet the bulk of that 1K is not unique to the request; and once the truly unique bits of that template were compiled into lambdas, you would be able to avoid per-request template processing. Even if some processing were deemed absolutely required, that processing can be offloaded from the web server process (in node today, without adding threads).

But you're right, processes are not a silver bullet and I am losing sight of your main point, which was that for a certain class of problems threads would be better. I won't begrudge you that; I'm just trying to offer some alternative ways to solve the subset of those cases where one's first reaction might be "I really need threads for this" when it could be thought about and solved differently (IMHO). Which is also to say I'm in the camp that believes trying to add threading to a javascript interpreter offers more problems than benefits (again, for the majority of use cases).

All that said, I'm not meaning to be argumentative. I really just want to help.

Practically speaking (because we both know threads aren't coming to node any time soon), have you taken a good hard look at zeromq yet? In case not, the node binding is here: https://github.com/JustinTulloss/zeromq.node. No... it's not the answer to "threads are missing", but it's damn fast and offers a really nice on-machine scaling alternative to get to as close to CPU bound as you want to get, with the added benefit of being able to scale out using the same API.

Ryan Gahl

unread,
May 14, 2011, 10:56:41 AM5/14/11
to nod...@googlegroups.com
On Sat, May 14, 2011 at 9:29 AM, Ryan Gahl <ryan...@gmail.com> wrote:
> Practically speaking (because we both know threads aren't coming to node any time soon), have you taken a good hard look at zeromq yet? In case not, the node binding is here: https://github.com/JustinTulloss/zeromq.node. No... it's not the answer to "threads are missing", but it's damn fast and offers a really nice on-machine scaling alternative to get as close to CPU-bound as you want, with the added benefit of being able to scale out using the same API.


Come to think of it, I would think if one were so inclined, one could create an even tighter binding to http://api.zeromq.org/2-1:zmq-inproc via cpp land and expose a js-land API to get pretty damn close to something that resembles what you're asking for. Hmm... feasible sounding? (again not a node-native solution, but potentially interesting, no?)

Bradley Meck

unread,
May 16, 2011, 9:34:09 AM5/16/11
to nod...@googlegroups.com
Indeed, this is how most of the optimized templating languages do it, and we stream rather than buffer (buffering is slower, though sometimes useful). Just get a templating engine with a writer function passed in, and voilà.