> The Register has a nice high level discussion about the Go language, with a direct comparison to Node (incl. quotes by Ry) on page 4.
>
> Has anyone ever tried Go? Any specific thoughts? What are its pros and cons compared to Node (aside from being compiled (pro and con) and multithreaded (pro))?
Yeah, and this:
<quote>
"Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
</quote>
Is what node.js needs to address, the sooner the better, imho.
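For what it's worth, the problem that quote describes is easy to demonstrate with plain node, no modules needed: one CPU-hungry request stalls every other client, because all JS runs on a single thread. A minimal sketch (the 300 ms busy-loop stands in for real CPU-bound work):

```javascript
// One CPU-bound "request" starves a cheap one: the cheap request can't be
// serviced until the busy-loop lets go of the only JS thread.
var start = Date.now();

// A cheap request: should be serviced ~10 ms from now...
setTimeout(function () {
  // ...but it actually fires only after the CPU-bound work below is done.
  console.log('cheap request serviced after', Date.now() - start, 'ms');
}, 10);

// A CPU-intensive request: ~300 ms of pure computation on the main thread.
var spin = 0;
while (Date.now() - start < 300) spin++;
```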
Like so:
var newSharedNothingJSThread = newJSThread(
  aFunction, // The code to run in the background thread. (NOTE_1)
  data       // An object *not* to be copied/duplicated/serialized nor shared: just pass it by reference and forget it here in this context.
);
newSharedNothingJSThread.on('message', messageHandlerFunction);

function messageHandlerFunction (data) {
  /* Here we get data back: the *same* data object that we passed to newJSThread() above. */
  /* It may have been mutated by the code in the other thread, and that's exactly the whole point. */
}

console.log(data);
// This should throw, or log null or undefined,
// because the object that data was pointing to is no longer reachable in this JS context:
// it was forgotten here when it was passed to newJSThread. That's what 'pass and forget' means.
NOTE_1: The receiving thread receives aFunction as text ( srcText = aFunction.toString() ), and does an eval( '(' + srcText + ')' )(data); to bootstrap. The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts. The only means it has to communicate with another (its parent) JS context is to pass a message (containing a data object, that will be passed by reference in a 'pass and forget' fashion too).
--
Jorge.
But node has support for webworkers via several modules which allow
you to run in separate threads. Also, a lot of people run two
instances of node and load balance between them. There are modules for
that.
Writing code to handle shared memory multithreading is annoying.
------------------------
Gary Katsevman
Computer Science Undergraduate
Northeastern University
gkatsev.com
--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.
This is a non-issue really. The solution is a little something called processes. They work great, just as Ryan says in his Google talk. And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).
Inefficient and slow, WebWorkers are a shame.
> Writing code to handle shared memory multithreading is annoying.
Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?
--
Jorge.
Hm.. Yes, I did miss that. But I think spawn.js that Ryan mentioned
above does do exactly that.
> This is a non-issue really. The solution is a little something called processes.
> They work great, just as Ryan says in his Google talk.
> And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).
No, right now it's not. Not 'pure hubris'. Not at all. The current mechanisms for IPC are too slow (serialize to text, usually as JSON, then re-create the object at the other end) and too expensive: the text is passed by copy, so you've got to have 4 copies of the object just to pass it once: the original, the serialized-as-text copy that you want to pass, the serialized-as-text copy that the receiving end receives, and the object re-instantiated at the receiving end. How's that 'hubris'? It's too costly to pass anything but smallish data at any reasonable price, speed, or efficiency.
On Fri, May 6, 2011 at 9:02 AM, Ryan Gahl <ryan...@gmail.com> wrote:
> This is a non-issue really. The solution is a little something called processes. They work great, just as Ryan says in his Google talk. And with web-workers and other abstractions like https://github.com/livelycode/spawn.js prevalent, to make a blanket statement that connotes that it's somehow "hard" to mitigate CPU "boundedness" in your app is pure hubris (IMHO).

Er, that is, hubris to do so in the context of "this is why Go/XYZ is better than Node..." - bottom line is it's really not that difficult in node.
Er, Ryan, do you realize who these 'go' guys are? I for one take it as a given that coming from *them*, 'go' must have *a*lot* of goodness and know-how in it. No doubt.
On Fri, May 6, 2011 at 9:47 AM, Jorge <jo...@jorgechamorro.com> wrote:
> Er, Ryan, do you realize who these 'go' guys are? I for one take it as a given that coming from *them*, 'go' must have *a*lot* of goodness and know-how in it. No doubt.

Heh, yeah, I do realize who these 'go' guys are. I'm not sure I understand how that changes things in the context of the discussion around how easy (or not) it is to create a high performance multi process solution using node today with fairly minimal work and maybe some integration with things like msgpack and 0mq?
I wonder if Peter Griess still hangs out here and might speak to how Yahoo Mail uses node, and how these issues were resolved? AFAIK they are a shining example of a real world high volume use case, and I do know they (at least at one point) were making heavy use of web-workers.

Do you disagree that choosing a better serialization format and implementing a simple pooling pattern might mitigate (at least) some of the issues you are bringing up?
Yes, perhaps :-)
> -- WebWorkers as implemented in processes are inefficient (still plenty good for a lot of use cases).
True. WebWorkers are good as long as workers don't communicate too much nor too often.
> But WebWorkers as a pattern could be made much more efficient :)
Exactly!
>> > Writing code to handle shared memory multithreading is annoying.
>>
>> Perhaps you missed the phrase "The newly spawned thread newSharedNothingJSThread does *not* share anything with any other JS contexts" ?
>
> Well, if it doesn't share anything it may as well be a process, right?
Right. And sometimes a separate process is better than a thread, and it's a must if you want to run the background task in another machine.
> It has to share something, even if that something is immediately forgotten by the host. Calling it threading only serves to confuse folks though -- I guess what you're talking about is more like forking.
Because JavaScripters hear "thread" and they automatically yell "threads suck, threads are evil", but that's not so, not necessarily; it's just a cliché.
What sucks is shared mutable data, and it sucks because it requires synchronization, and synchronization issues often become quite difficult to reason about.
But a thread that shares nothing, is no more evil than a separate process, can be spawned much much faster than a separate process, and 'IPC' between threads is cheaper, faster, easier, and much more efficient than between processes.
--
Jorge.
Mitigate, slightly, perhaps, yes.

But Ryan, no matter how fast you can serialize, passing a pointer (64 bits) to an object is always going to be many orders of magnitude faster than serializing an object to text + transferring the text + receiving the text into a buffer + parsing it + finally re-creating a copy of the original object. It's not only faster, it also requires ~nil CPU, ~nil memory, and ~nil other resources (memory bus bandwidth, CPU share, etc).
How so ? Threads share the process' memory space, if that's what you mean, but it's up to the programmer to decide what data they share.
--
Jorge.
Web Workers are a convenience library on top of child processes and a
pipe, and you usually sacrifice performance for convenience.
Tim.
> Some know who these guys are. Some also know that after more than 2.5 years of development they hadn't yet decided what to do about "exceptions". Go smelled to me after discovering that, even though they supposedly fixed that shortly after.
>
> Hype seems attached to Go more than reality. Perhaps after a year more, more real world has happened? I'll go read the article. I'm really tired of hype and fadism.
Hype ?
Perhaps, but when one is 60-something, one's got quite a bunch of decades of experience...
--
Jorge.
> for clustered computing this is not true. which i think for most major distributions of node is more important.
It's almost right already for clusters, but it's quite wrong for a single machine with multiple cores.
> the intricacies of passing pointers in js is pretty nasty, particularly since we have c++ addons and do not want to have side effects visible from both sides. basically the only way to implement go's style of shared memory (well similar to, not an exact impl) is to wait for an Object to be GCable and all of its references internally to be GCable, and even if we can detect that well, it must have a fallback to prevent deadlock which is non-trivial (but doable in an actor model).
>
> just like pipelined multi-core cpus may not give performance gains and you should determine the volatility of your objects if you truly want performance, I think more thought should go into the theory of how this is done rather than "x is faster than y in case z" even if z is a general case it can have vast implications of what cases are not covered.
There's a reason why almost every programming language has pass-by-reference. You insist that pass-by-copy is good enough (even a requisite) for such and such use cases, but that's not the point. Copying is just plainly wrong for many other use cases that *require* pass-by-reference.
So pass and forget is -imo- the only way to go, if we want maximum speed and shared-nothingness. And yes, these objects are special, so we'd need to touch the v8 source for this.
--
Jorge.
> At the risk of taking some heat, this quote from the article stood
> out...
>
> What's more, Gerrand argues, Go doesn't force developers to embrace
> the asynchronous ways of event-driven programming. "With goroutines
> and channels, you can say 'I'm going to send a message to another
> goroutine or wait for a message from another goroutine', but you don't
> have to actually leave the function you're in to do that," Gerrand
> says. "That lets you write asynchronous code in a synchronous style.
> As people, we're much better suited to writing about things in a
> synchronous style."
LOL. I was waiting for somebody to comment on that.
Where's Marcel ?
Where's Kyle ?
:-)
--
Jorge.
Yes, higher level abstractions are all right, but higher level abstractions are supported by lower level code.
Fail to optimize the lower level, let it run orders of magnitude slower than it should, and you'll be doing it Wrong™ anyway.
I mean non-optimized as in, to merely pass an object containing 1MB of data:
A- serialize the 1MB object into ~1MB of text + transfer that ~1MB + copy that ~1MB + parse it + re-instantiate the 1MB object
versus
B- grab a reference to it (~ copy 8 bytes).
One way (B) is several millions times faster than the other (A).
Perhaps if you want to pass just 3 bytes, it would not matter much.
Perhaps if you want to pass it just once or not too often, it might be all right too.
But don't pretend that's going to be always the case.
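The A-vs-B gap is easy to measure in a few lines. A sketch (the absolute numbers vary per machine; the gap is the point):

```javascript
// Build an object holding roughly 1 MB of string data, then time
// path A (a full JSON round-trip: the serialize/copy/parse dance of
// today's IPC) against path B (a plain reference assignment).
var big = { chunks: [] };
for (var i = 0; i < 1024; i++) {
  big.chunks.push(new Array(1025).join('x')); // 1024 chars ≈ 1 KB each
}

var t0 = Date.now();
var viaCopy = JSON.parse(JSON.stringify(big)); // A: serialize + copy + parse
var costA = Date.now() - t0;

var t1 = Date.now();
var viaRef = big; // B: grab a reference (~8 bytes)
var costB = Date.now() - t1;

console.log('A (round-trip):', costA, 'ms  B (reference):', costB, 'ms');
```

Path B also doesn't allocate a second megabyte, which is the other half of the cost that the timings alone don't show.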
--
Jorge.
> How good is Go? There seems to be a discrepancy between what they
> advertise in their intro material (http://golang.org/doc/
> effective_go.html#concurrency):
>
> "if the communication is the synchronizer, there's still no need for
> other synchronization"
>
> and what you find when you look under the hood (http://golang.org/doc/
> go_mem.html#tmp_18):
>
> "When multiple goroutines access a shared variable v, they must use
> synchronization events to establish happens-before conditions that
> ensure reads observe the desired writes" (followed by the doc of a
> lock API).
>
> This is actually no surprise: if a goroutine can access mutable
> objects from a parent scope, and if multiple gorountines run
> concurrently, how could this be safe without locks?
>
> I've not tried Go but from what I see it just looks like Go is just
> another threaded system with channels in addition to locks. And I
> don't like the way they describe their concurrency model because it
> makes it sound like it avoids the well-known pitfalls of threading,
> when it really doesn't.
>
> Node.js is based on simple sound principles: no threads, only async
> APIs (except for startup and require), share nothing. It would be
> very bad to let threads (real ones) creep in.
What would be bad is to let shared mutable data creep in, *not* threads.
> The real issue then is to have an efficient way to pass messages
> between processes. There are obviously two ways: copying the data and
> sharing immutable data.
There's a third way: pass-and-forget mutable data.
> Why couldn't we have both? Copying is good
> when the data is small but being able to share immutable data is
> important when the data is big.
There's no need to copy, and there's no need for immutability.
--
Jorge.
Exactly, and then it may begin to suck. That's why we want shared-nothing threads.
--
Jorge.
On May 7, 2:06 pm, Liam <networkimp...@gmail.com> wrote:
> So how does erlang deal with all this, does it have shared data, super
> efficient msg passing?

It has been a long while since I dabbled with Erlang, so I am not 100% on this, but as far as I recall:
1) Erlang has super efficient message passing (because message passing
is intrinsic to Erlang, and is "baked in" for efficiency)
2) Erlang has no shared data.
3) This is crucial: Erlang has *no* mutable data, so even were data
shared, it wouldn't make a difference because you can't modify it.
Yes, I know that the idea of ALL data being non-mutable is a wildly
bizarre idea, but Erlang proves that you can program with such a
constraint.
4) Design objectives of Joe's (Armstrong) were that Erlang be
massively distributed, concurrent, and utterly fault tolerant. It is
difficult to "back engineer" these objectives into a language/system
not designed for such at the outset.
5) Erlang has tail recursion. Of all the valuable Erlang idioms and concepts
that could be brought into V8, the one I'd most like to see is tail
recursion.
I hope I am on the mark. If there are any Erlang experts reading this,
please confirm or correct.
> 3) This is crucial: Erlang has *no* mutable data, so even were data
> shared, it wouldn't make a difference because you can't modify it.
> Yes, I know that the idea of ALL data being non-mutable is a wildly
> bizarre idea, but Erlang proves that you can program with such a
> constraint.

It's not bizarre. It's incredibly common and a staple of purely functional languages.
Imagine a lightweight web app that does audio transcoding. Wouldn't it be
good to use 4/8 cores? If your alternative is to add a proxy and
distribute across 4/8 procs, I don't think that's lightweight..
> (...)
>
> The core of my criticism is this: Adding multithreading to javascript
> is mashing a low-level feature into a high-level language that it
> wasn't designed for, and there just aren't that many use cases in the
> common problem domain where multithreading provides the only path to
> acceptable performance. I can't think of many cases where you'll be
> passing around three megabyte pages between processes, and those few
> that I can think of would be better served by being written in a
> language suited to the problem at hand.
Like when you need to process 3000 1kB templates for 3000 concurrent users ?
--
Jorge.
> Like when you need to process 3000 1kB templates for 3000 concurrent users ?
Each template with each user's data ?
How could I do that before the app starts, and before the clients connect to the server, and before they request the page ?
--
Jorge.
> Each template with each user's data ?
> How could I do that before the app starts, and before the clients connect to the server, and before they request the page ?
The bigger issue IMO is that in node you can't/shouldn't block the main thread, and while you can do I/O in (a) background thread(s), you can't fill a template in (a) background thread(s).
And the solution can't be not to fill templates, when the problem is filling templates.
--
Jorge.
> For the sake of the discussion, assuming the problem is filling templates and not reworking the system to avoid filling templates on the server (which IMHO is absolutely the right answer)
It's good for web-apps, and good for private data, but when you want the contents indexed by google, you'd better serve fully constructed pages.
> this problem still does not require threads at all. This is exactly the kind of work that is served perfectly by processes (web-workers, et.al.).
>
> You just don't pass the entire template and user data to the process via IPC,
No ?
> and you want to implement a pooling pattern to avoid process spawn penalties at runtime. It sounds like you're saying that you're being forced to pass 3000 1K templates around via IPC and/or being forced to do this processing in a blocking manner because threads don't exist, neither of which is true. Either that or I'm still missing something.
So what does the child process do with the result, after filling the template ?
--
Jorge.
Thanks,
Chris Austin-Lane
Sent from a cell phone
On Fri, May 13, 2011 at 5:26 PM, Jorge <jo...@jorgechamorro.com> wrote:
>> For the sake of the discussion, assuming the problem is filling templates and not reworking the system to avoid filling templates on the server (which IMHO is absolutely the right answer)
>
> It's good for web-apps, and good for private data, but when you want to have the contents indexed by google, you better serve the pages totally well constructed.

Agreed. This can be accomplished via server side templating that is done outside the request pipeline (i.e. app composition). We do this step when our app is starting up, so the end result is that everything has been pre-processed either into static files, stored in memory, or compiled functions before we start listening for requests. Where dynamic sections (what you may call partials) are concerned, we avoid any in-proc server side template processing. The pages are SEO-kosher.

>> this problem still does not require threads at all. This is exactly the kind of work that is served perfectly by processes (web-workers, et.al.).
>>
>> You just don't pass the entire template and user data to the process via IPC,
>
> No ?

No, you pass the stream FD(s) to the process, and the minimal amount of data required to instruct it what you want it to do (just like a function call). These processes can be like your internal service APIs for things that require processing.

>> and you want to implement a pooling pattern to avoid process spawn penalties at runtime. It sounds like you're saying that you're being forced to pass 3000 1K templates around via IPC and/or being forced to do this processing in a blocking manner because threads don't exist, neither of which is true. Either that or I'm still missing something.
>
> So what does the child process do with the result, after filling the template ?

If the child process has the FD, the child process writes directly to the stream and can even close it. Or you can write to it and send an "i'm done" message back to the parent process. Point being, there is no need to be passing that huge chunk of data across process boundaries.
So, as you can't spawn a thread (which is fast and does not need any expensive IPC), you want to launch a (child) process per client. And as you still can't block the main thread in the child process, you still can't/shouldn't handle more than one concurrent fd/connection per child process. And if you've got any other client/session data to keep, you either pass it too via IPC to the delegate process or you lose it: the context of a client is more than its fd/network socket.

You insist that all that is good, but it isn't. Not for every use case. Sometimes what you'd need is not all that jazz, but a simple way to run a simple function(data) in the background of the same (main/parent) process, without paying the cost of spawning a new process from scratch and recreating the context that you already had in the parent.

Node solves wonderfully the problem of blocking IO tasks (it runs them in a pool of background threads thanks to libeio), but it needs to find a way to solve the problem of blocking cpu-bound tasks too, by running them in background threads.

Imagine for a second that you were right, and that your proposal were the right thing to do. Why, then, does node use threads for IO instead of child processes? Because child processes are not the silver bullet.
Practically speaking (because we both know threads aren't coming to node any time soon), have you taken a good hard look at zeromq yet? In case not, the node binding is here: https://github.com/JustinTulloss/zeromq.node. No... it's not the answer to "threads are missing", but it's damn fast and offers a really nice on-machine scaling alternative that gets you as close to CPU-bound as you want to get, with the added benefit of being able to scale out using the same API.