--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.
My belief is that this pain arose because the guys who developed this
middleware tried to make EVERYTHING asynchronous. They rewrote most
aspects of Java to make them asynchronous. Needless to say, testing the
application was a huge challenge.
The takeaway for me was this: a language can't be completely
asynchronous. You can't use asynchronous communication everywhere.
There needs to be a healthy balance between synchronous communication
and asynchronous communication. The designer of the application should
be responsible for identifying the parts of his software that should
use one or the other. This is why I'm in favor of keeping asynchronous
communication explicit, and instead educating developers and helping
them make informed decisions so they can use the right concept for the
right situation. In my view, hiding asynchronous communication
underneath synchronous communication doesn't help in making the right
decision.
Max.
On Sun, Jan 9, 2011 at 6:02 PM, Bruno Jouhier <bjou...@gmail.com> wrote:
Why try just to _ease_ the pain? The pain will disappear completely if
you stop using the tool which causes it.
You are trying to fight something which is not a problem but rather a
_design_decision_.
If you have a function foo which is in CPS and a function bar which is
not, then you cannot substitute foo for bar or bar for foo. They are
not interchangeable. You have to accept it.
One can pile wrappers on top of wrappers and source-to-source
transformations on top of source-to-source transformations, but that
would not fix the "problem". It will only hide it. It will make
debugging code more difficult: no nice stack traces and no debugger
support (in the case of transformations), or a stack polluted with
"async-framework" internals (in the case of wrappers). It will make
reasoning about code difficult because the transformations you are
applying are far from trivial. A "dummy" will have a hard time figuring
out what is going on behind the scenes and where he/she has to add _ to
make the magic work.
--
Vyacheslav Egorov
Here's another async directory walker function: https://gist.github.com/742540. Compared to https://github.com/Sage/streamlinejs/blob/master/examples/diskUsage2.js it is, in my opinion, much easier to understand, because the program flow is obvious and immediately apparent, instead of being hidden behind an additional syntax layer: that of an async helper/abstraction framework (noise?).
In my head, asynchronous program flow becomes quite a straightforward thing once you realize that most of the time -for maximum clarity- you can use a plain, named callback function declaration (instead of an inlined anonymous lambda) as if it were the *label* of the code block to which the program flow is going to jmp or goto next.
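A minimal sketch of that style (the step names are made up, and setTimeout stands in for any async operation): each named function reads like the label of the block the flow jumps to next.

```javascript
var steps = [];                // just to record the order of execution

function start() {
  setTimeout(stepOne, 0);      // "goto stepOne" when the async op completes
}

function stepOne() {
  steps.push('one');
  setTimeout(stepTwo, 0);      // "goto stepTwo"
}

function stepTwo() {
  steps.push('two');           // flow reads top to bottom: start -> stepOne -> stepTwo
}

start();
```

Because the callbacks are named declarations rather than inline lambdas, the whole flow is visible at a glance, with no extra framework involved.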
--
Jorge.
1. Agree: there is currently NO WAY to abstract this shit out. And
since there is no way to have a solid abstraction, it's better for
folks to understand what's going on, rather than working around
horrible bugs that pop up with halfway abstractions.
2. Disagree: Nobody (including heavy programmers) likes to use
continuation-passing style everywhere. And that's why everybody tries
to abstract it out.
Because of (1) and (2) combined, we'll still be talking about this 10
years from now, if nothing changes deep down in the language.
CPS makes total sense for a variety of problems, UI development being
the most obvious. My favorite example — CodeMirror [1] — parses your
code in chunks of X lines, but is able to interrupt when you start
typing. It saves one closure for each line of text, so that if it
parsed 1000 lines and you start typing on line 700, instead of
re-parsing everything it restarts from that line. It's blazing fast.
You can't do this in a single thread without saving continuations.
But parsers are a pain to write.
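A rough sketch of the technique (names are illustrative, not CodeMirror's actual API, and the chunking/interruption logic is omitted): parse line by line, saving a closure per line so that an edit on line N restarts parsing at N instead of 0.

```javascript
// Each continuations[i] is a closure that restores the parser state as it
// was just before line i and re-runs parsing from there.
function makeParser(lines) {
  var continuations = [];
  var state = { depth: 0 };    // whatever state the parser carries line-to-line

  function parseFrom(i) {
    for (; i < lines.length; i++) {
      var snapshot = { depth: state.depth };     // copy state before line i
      continuations[i] = (function (n, s) {      // closure = saved continuation
        return function () { state = { depth: s.depth }; parseFrom(n); };
      })(i, snapshot);
      // "parse" the line: net brace depth, as a stand-in for real work
      state.depth += (lines[i].split('{').length - 1)
                   - (lines[i].split('}').length - 1);
    }
    return state.depth;
  }

  parseFrom(0);
  return {
    resumeAt: function (i) { continuations[i](); return state.depth; }
  };
}
```

For example, `makeParser(['a {', 'b {', '} c', '} d']).resumeAt(2)` re-parses only lines 2 and 3, not the whole input. You can't get this cheap resumability in a single thread without saving continuations somewhere.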
I know it's not exactly up to the Node community, but given that
JavaScript is so utterly important here, there are 2 things you should
lobby for: macros and threads. My [2] cents...
Cheers,
-Mihai
[1] http://codemirror.net/ and then http://www.ymacs.org/
[2] http://mihai.bazon.net/blog/whats-missing-from-node
--
Mihai Bazon,
http://mihai.bazon.net/blog
On Mon, Jan 10, 2011 at 10:57 PM, Tim Caswell <t...@creationix.com> wrote:
> I don't think node needs macros, the language is plenty flexible as it is
> and they bring plenty of headaches of their own.
I'll submit that this is a sensitive subject — it depends on who
you're talking to. If you talk to a PHP programmer, macros suck
("what r they and y should I care?"). Talk to a C programmer, macros
suck (because he was bitten). A Java programmer will say that macros
suck because he was taught that he will be bitten.
Talking to a C++ programmer, things start to change. C++ templates
are macros in disguise. A good C++ programmer appreciates templates,
so even if he doesn't know it yet, he will appreciate macros. ;-)
Talk to a Lisp or even a Perl programmer — "how can you live without 'em?".
> As far as threads, that goes directly against the purpose of node and
> JavaScript. Threads are an alternative way to do concurrency and
> completely separate from the event loop. Threads allow blocking IO at the
> expense of the OS having to manage concurrency for you. It's proven
> very inefficient for programs that are mostly IO bound and have high
> concurrency (like a webserver)
Not against the purpose of JavaScript, right? Nobody ever said that
JavaScript is not supposed to have threads; it just happens that it
doesn't.
The machine is definitely able to do more than one thing at a time,
and I can't see why you, the programmer, should account for all the
lost CPU ticks. Place more work on the hardware — that's why C won,
that's why Perl won, that's why PHP won. ... and that's why Lisp
lost. But things start to change. ;-) JavaScript is the new kid on
the block. If it is to survive, it must have threads at some point.
Real threads, not the pathetic "worker" substitute...
Cheers,
-Mihai
"Node's goal is to provide an easy way to build scalable network programs... This is in contrast to today's more common concurrency model where OS threads are employed. Thread-based networking is relatively inefficient and very difficult to use."
Node only works with V8. If another VM becomes better at some point,
bad luck (or good, depending how you take it). "Porting" Node to
another JavaScript engine would practically mean rewriting it.
> and the lack of blocking I/O in the language (hence less to
> unlearn).
Lack of *any* I/O, we could say.
That's part of what makes JavaScript such a great choice — you have
total freedom to "define" things that weren't defined by any
specification, because the specification doesn't care about how you do
I/O — it only defines the language.
> Browser JavaScript is single threaded and event based. Node is
> single threaded and event based.
That's just because browser JavaScript is single threaded and event
based — and this makes total sense for a UI-oriented language, but
it's insufficient for a server-side language. The set of requirements
for the two are so different that really, I doubt that JavaScript can
survive as is on the server market.
> JavaScript is a relatively easy to code
> language and gaining momentum. I think it was a great choice.
With this I totally agree. A beautiful language. The #1 reason why
people look into Node at all. JavaScript is cool. Too bad it misses
this and that. :-(
Cheers,
Q and promised-io are great implementations of promises. But, you
still have to understand the underlying mechanics to some degree to
use them properly. So they don't so much solve this problem as just
shove it somewhere else.
The perceived problems could be alleviated to some degree by creating
a language built around asynchrony. It *would* be neat to have a
language completely built around promises such that you could just
take two promises-that-resolve-to-a-number and add them together with
+ and get another promise-that-resolves-to-a-number. But that
language isn't JavaScript, and that is a major undertaking if done
properly.
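For contrast, here is roughly what "adding two promises" costs today, sketched with a minimal hand-rolled then-able (this is not Q's or promised-io's actual API): the combination plumbing is explicit, where the hypothetical language would let you write just `p1 + p2`.

```javascript
// A trivial already-resolved promise: calls its callback immediately.
function resolved(value) {
  return { then: function (cb) { cb(value); } };
}

// Explicit combination: wait for both values, then add them. This is the
// plumbing that a promise-native language would hide behind the + operator.
function addPromises(p1, p2) {
  return {
    then: function (cb) {
      p1.then(function (a) {
        p2.then(function (b) { cb(a + b); });
      });
    }
  };
}
```

Usage: `addPromises(resolved(2), resolved(3)).then(function (sum) { ... })` delivers 5, but only after you've written the nesting by hand.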
--i
I don't think asynchronicity, or asynchronous program flows, is a difficult problem, not in JavaScript, and I don't think it requires any "helper" false-friend framework.
Of course, everything one does not understand becomes a "difficult problem" until one groks it: prototypes, closures, regexps, event-driven programming, async flows, etc. Some things require some training, but none of these is "difficult", really.
It's not difficult (in JavaScript, thanks to closures) to have a number of simultaneously alive program contexts (that one could think of as individual, switchable, in-process processes) that can be exited at any time and easily re-entered in the future as many times as needed, simply via the callbacks (which I prefer to name and use as code block labels), as if the code flow had never exited the context before.
People just need to learn to use functions as if they were ~ ordinary code blocks { }, and to learn how and where to set up the context (disappointingly simple 99% of the time: in an outer closure) they'll want to re-enter in the future (i.e. asynchronously).
These two are the important (yet simple) concepts to grasp: not any funky frameworks, nor other people's abstractions, just the proper use of the set of built-in facilities that JavaScript provides, which btw as a set are ~ unique to JS, which happens to be -imo- the reason why they're often so strange to programmers coming from other languages.
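A minimal sketch of those two concepts (names are illustrative): the outer closure is where the context is set up, and the named inner function is the re-entry point that can be handed out as a callback and re-entered later.

```javascript
// The outer closure holds the context that survives between re-entries.
function makeCounter(label) {
  var count = 0;               // context: lives on across every re-entry

  function onEvent() {         // named callback: the re-entry point
    count++;
    return label + ':' + count;
  }
  return onEvent;
}

var a = makeCounter('a');
var b = makeCounter('b');      // a second, fully independent context
```

Calling `a()` twice and then `b()` yields 'a:1', 'a:2', 'b:1': each closure is a separate, re-enterable context, with no framework in sight.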
--
Jorge.
> In my head, asynchronous program flow becomes quite a straightforward
> thing, once you realize that most of the time -for maximum clarity-
> you can use a plain, named callback function declaration (instead of
> an inlined anonymous lambda) as if it were the *label* of the code
> block to which the program flow is going to jmp or goto next.
Agree 100%. I'm surprised this is not seen more often. IMO named
functions help newbs a lot. Surprised more getting-started-type
tutorials do not do it that way.
cheers
--
shaun
Server-side languages absolutely require threading or some parallel
mechanism. You can't deploy to an 8-core server and only be utilizing
12.5% of the CPU.
The only way to do this now is to launch 8 servers, which, although it
may work, is a very clunky workaround, and you lose a good deal of
simplicity (you can't, for instance, now implement a session object as
a straight JS object).
Yep. Workers are half-baked threads. We'd need to be able to pass real objects back and forth, not only serialized-as-a-text-message objects, which is clumsy, limiting and very expensive (expensive twice: in both directions).
If real objects were passed by copy, it would be less but still quite expensive.
If passed by reference, then we'd have real threads with real shared data and real threads synchronization issues... but I'd love to have thread synchronization issues to deal with: it would mean that I have threads :-)
What would you need the macros for ?
Aren't macros just a source preprocessor ?
Can't you preprocess a source .js file in JS, and achieve the same without real built-in macros ?
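A toy version of such a preprocessing pass, in plain JS (the SWAP macro and the regex are made up for illustration; real macro systems operate on syntax trees, not text):

```javascript
// Expand a made-up SWAP(a, b) "macro" into plain JS before the code runs.
// Purely textual substitution, much like the C preprocessor would do.
function expandMacros(src) {
  return src.replace(/SWAP\((\w+),\s*(\w+)\)/g,
    'var _t = $1; $1 = $2; $2 = _t');
}

var out = expandMacros('var x = 1, y = 2; SWAP(x, y)');
// out === 'var x = 1, y = 2; var _t = x; x = y; y = _t'
```

This answers "can't you preprocess in JS?" with a yes for trivial cases; the argument for built-in macros is that textual substitution breaks down as soon as the macro has to respect scoping or syntax.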
--
Jorge.
If you want your program to use all 8 cores, you have lots of options.
1. Use child processes. It's not like that's a new or radical idea.
Use require("child_process") to do this in node.
2. Use threads. Write a binding. Not very hard.
3. Use a different platform/language. It's not like we don't have options here.
Node is what it is. It's not ideal for every problem, but it is ideal
for many problems. It's not the simplest or the most powerful thing,
but it's pretty darn powerful for how simple it is, and that's a good
niche for a platform to occupy.
We are not going to add shared-state concurrency to node, ever.
--i
> There are only 2 hard problems in computer science: shared state
> concurrency, naming things, and off-by-one errors.
The #1 is semicolons.
> If you want your program to use all 8 cores, you have lots of options.
>
> 1. Use child processes. It's not like that's a new or radical idea.
> Use require("child_process") to do this in node.
> 2. Use threads. Write a binding. Not very hard.
> 3. Use a different platform/language. It's not like we don't have options here.
>
> Node is what it is. It's not ideal for every problem, but it is ideal
> for many problems. It's not the simplest or the most powerful thing,
> but it's pretty darn powerful for how simple it is, and that's a good
> niche for a platform to occupy.
No matter what, with or without threads, fools always find a way to shoot themselves in the foot.
By-reference, in shared memory, is the fastest way there is to pass an object, and it comes for free with threads, but not for processes (possible, but more complicated).
The problem with threads is not the shared data per se, but the synchronization of *modifications* to shared data.
If it were possible to pass an object by reference to a worker *and* at the same time de-reference it at the sending end, synchronization issues would be impossible.
Just to avoid the need to pass-by-copy. Let's call it "exclusive-ored-memory-data-sharing" :-).
Right now the only way to pass objects (to/from workers) is :
- construct the object
- serialize to text
- pass it as a text message
- de-serialize to a newly created object
- use it
- re-serialize to text
- pass it as a text message
- de-serialize to a newly created object
- again and again
It's expensive, it's inefficient, and it only "works" (and not very well) for lightweight data.
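Spelled out with JSON (which is essentially what structured message passing boils down to), the round trip above looks like this; note that the receiver always gets a brand-new object, never the original.

```javascript
var original = { user: 'ann', hits: [1, 2, 3] };

var wire = JSON.stringify(original);        // serialize to text
var received = JSON.parse(wire);            // de-serialize: a *new* object

received.hits.push(4);                      // mutate on the "worker" side

var wireBack = JSON.stringify(received);    // re-serialize to text
var roundTripped = JSON.parse(wireBack);    // de-serialize again, again new
```

Every hop pays a full stringify plus a full parse, and the sender's `original` never sees the worker's changes, which is why this only holds up for lightweight data.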
> We are not going to add shared-state concurrency to node, ever.
As Mihai said, "it's not exactly up to the Node community".
--
Jorge.
That is what I meant by saying that foo and bar are not interchangeable.
When you hide CPS under a transformation it might get harder to reason
about the code and to find the places which have to be changed if you
switch a sync function for an async function, or an async function for
a sync function.
Of course I have checked examples of transformations before writing my
first message :-) Frankly speaking I like the idea. But IMHO it might
lead to some confusing bugs (especially in a loosely typed language
like JS). It is not a panacea. Especially not for dummies.
--
Vyacheslav Egorov
Overall, agreed. The reason to use threads instead of separate processes is to share data rather than to send it (via IPC).
Sharing data is fast and cheap, sending data is expensive and requires a copy.
The workers in the browsers use threads (not processes) and a sort of IPC (message passing). Workers are no doubt much better than what we had before (nothing).
But it's still a half-baked "threads" model that does not suit well the cases where you want to pass an object (structured data), or big chunks of data (say, an image), because messages are text, have no structure, and are immutable.
I understand the goal was to achieve shared-nothing threads. But *perhaps* shared-nothingness could be achieved as well in another way, say:
var a= {a:1, b:2}; // an object
var b= new Worker( ... ); // or require("child_process").spawn( ... )
// this passes to the worker a new reference to the object that `a` points to (pass-by-reference)
// a reference to the real object, not a serialization of the real object.
// and, it nulls a in this context, somehow.
b.postMessage(a);
a
-> undefined, null, or {}, I don't care.
What's the point ?
- From the point of view of this context, the object ceases to exist as soon as it's passed to b.
- As it's been passed, it's not being shared (100% shared-nothingness: no problems with threads concurrency).
- As a has not been copied : it's been pretty fast.
- The worker has received a real object, not a serialization to de-serialize.
- a can be used and modified in the receiving worker, and returned back again if you wish.
A way to achieve this would be awesome.
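postMessage can't do this today, but the hand-off semantics can be approximated in userland (`send` and the variable names are made up for illustration). The big caveat is that it can only null the one binding it is given, not any other aliases, which is exactly why real engine support would be needed.

```javascript
// "Send" an object by handing over the real reference and nulling the
// sender's binding, so the sending side can no longer reach it.
function send(holder, key, receive) {
  var obj = holder[key];
  holder[key] = null;      // the sending side loses its reference...
  receive(obj);            // ...and the receiver gets the real object, uncopied
}

var ctx = { a: { a: 1, b: 2 } };
var got;
send(ctx, 'a', function (obj) { got = obj; });
// ctx.a is now null; got is the original object, never serialized or copied
```

After the call there is exactly one live path to the object, so nothing is shared and nothing was copied: the "exclusive-or" sharing described above, minus the enforcement an engine could provide.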
--
Jorge.
No, by hand, I would write something like this:
function zoo (cb_) {
var errState = null
, expect = 2
, sum = 0
foo(cb)
bar(cb)
function cb (er, num) {
if (errState) return
if (er) return cb_(errState = er)
sum += num
if (-- expect === 0) return cb_(null, sum)
}
}
Or, if this was in npm, or some program that uses slide:
asyncMap([foo, bar], function (fn, cb) { fn(cb) }, function (er, res) {
  // Math.sum does not exist; sum the results array instead:
  res = res.reduce(function (a, b) { return a + b }, 0)
  // now do whatever you were gonna do with it...
})
I don't have a function that calls a list of functions in parallel,
but that'd be a neat thing to add. Maybe it could look something like
this:
function asyncCall (list, cb) {
return asyncMap(list, function (f, cb) { return f(cb) }, cb)
}
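For completeness, the zoo pattern above can be exercised with stub async versions of foo and bar (the stubs are illustrative stand-ins for whatever async functions you actually have): the shared cb counts down `expect` and fires cb_ exactly once, with the sum.

```javascript
// Stub async functions: each "completes" on the next tick with a number.
function foo (cb) { setTimeout(function () { cb(null, 2) }, 0) }
function bar (cb) { setTimeout(function () { cb(null, 3) }, 0) }

function zoo (cb_) {
  var errState = null
    , expect = 2
    , sum = 0
  foo(cb)
  bar(cb)
  function cb (er, num) {
    if (errState) return            // a previous call already errored out
    if (er) return cb_(errState = er)
    sum += num
    if (-- expect === 0) return cb_(null, sum)  // both done: report the sum
  }
}
```

Usage: `zoo(function (er, sum) { /* sum === 5 */ })`. The whole "framework" is one counter and one error latch, written inline.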
--i
> I like this one way posting style you are proposing, however, I think a more lazy style approach would be easier on both programming and logic.
>
> This could instead of clearing up the references (would require weak ref access in v8) to a, why not just give it to b when it is about to be garbage collected (or if it is available for it).
I think objects are reference counted by the JS engine. We'd need to have at least one reference ( objectRef ) in order to pass it to b.postMessage( objectRef ) so a refCount of at least 1. Anything > 1 would indicate that other references exist somewhere else ( in addition to objectRef ).
Unless the only way to pass it were .postMessage( { } /* an object literal*/ ); (would that have a refCount of zero ?)
A way to go could be to disallow (to throw?) any attempts to pass an object whose refCount (or the refCount of any of its children) is > 1.
Another possibility could be to mark (with an internal flag) as 'unavailable' (or as "return null") objects (and its children) that have been passed, to disallow access based on it. Then it won't matter what their refCount is :
o= {};
a= { a:27, b:o };
b= new Worker( ... );
o //allowed
-> {}
a.a //allowed
-> 27
b.postMessage(a); // the object that a points to (and its children) is now marked (internally) as unavailable
a
-> probably throw or return null or (I don't know) but this object should be no longer available in this context.
o
-> idem (it's a child of a)
> Unfortunately you wouldn't be able to assure, time-wise, that it is going to be sent, though, if it is copied about in closures etc., but I think that it would be an acceptable alternative. Even then, we just need to wait for v8 to allow passing of objects.
I imagine most objects are a composition of pointers to other objects, scattered around the heap. How difficult is it to pass an object from a running v8 instance to another in a separate thread ? I have no idea. But I'm sure it would be easier than to pass it to another v8 instance running in another *process*.
> Meaning:
> -if a can be gced and all its "children" can be collected it is valid to be passed.
> -if a is shared somewhere (and unavailable for gc) it will never be posted
> --should you wait until available to pass it or just have a failure flag?
Better to throw, I guess.
> Pros:
> -well defined sharable situation without leaving references
>
> Cons:
> -checking all children's collectability is costly
> -how to deal with sharing C++ based objects?
hehe, no idea :-)
--
Jorge.
I like that we're exploring different solutions that aren't in the runtime.
I don't think it's about being leet and not letting beginners in. I think it's quite the opposite. If beginners have to choose between the 15 different wrappers and transformations and "tools" that make it so you don't have to understand callbacks (at least for the first couple of hours you spend learning their library), and then, when something goes wrong (as it ALWAYS does with programming), they have to understand both what your library does AND how the callbacks and closures under the hood work... well, you get the point. That's not easier.
The best thing for beginners is to learn callbacks and closures properly, then write their own async utility or choose a third-party one that's already written by someone else. Less abstraction is almost always better. Especially for beginners.