This looks like a good approach to me.
Some (potentially radical) suggestions/ideas and further questions for discussion follow.
Suggestions
node.send should not be used in this case as its use will stop the runtime from being able to correlate message received with message sent. We probably won't enforce this - tbd.
As a transition, perhaps node.send can be deprecated, and remain for backwards compatibility for a release or two. For new nodes, part of the .html documentation would inform users whether or not the node supports this API. Obviously nodes that don’t support it can’t be used by the ‘Success’ node.
If the traditional node.send() is used by a node, then the NR runtime API can know the node input event handler will not call done() and not wait for it.
What if done is never called?
This is really tricky.
At the moment, I don’t believe NR retains a FIFO type flow of msgs between nodes, particularly when asynchronous functions involving I/O are called by ‘input’ handlers in nodes. Race conditions could occur, especially if some msgs wait on I/O while others don’t.
So detecting a done() call on a subsequent msg, before the former message as a failure probably won’t work.
What if, as well as an arbitrary timeout period, we have an arbitrary number of subsequent done() calls? The runtime could count the number of done() calls subsequent to outstanding msgs awaiting their done() call, and on exceeding a threshold, generate an error? This might be more reliable than a timeout alone, as time delay could be a function of latency in the system (i.e. resulting from load), while subsequent done() calls would suggest otherwise.
It could also potentially signal a buggy node implementation that could then be disabled by NR runtime and alert the user (i.e. no done calls).
Thoughts?
Alternate event handler callback API
Just an alternate suggestion. See what you think.
How about:
this.on('input', function(msg, action) {
// do something with 'msg'
if (!err) {
// send can be called as many time as needed (including not at all)
action.send(msg);
action.send(msg);
action.send(msg);
// Once complete, done is called
action.done();
} else {
// If an error occurs, call done providing the error.
action.done(err);
}
});
With this type of interface, it would be easier to evolve and expand in the future (say with a .rollback() call - see further down), by simply adding more properties to action. This leaves the remainder of the function declaration available to other potentially unrelated parameters in the future.
Disadvantage is you have to know those property names (or your IDE would have to extract it from jsdoc/introspection) to call them. A bit trickier for a new node developer.
Thoughts?
Questions
Which nodes should report success?
Should all nodes report success/failure on processing messages, or should only outbound (output) nodes? If the former, will this scale? How much additional cpu load will this add to NR? Assuming almost everything happens on the event thread, could NR become cpu bound, especially on SoC type hardware like a Raspberry Pi?
Should the runtime track the node of origin for msgs?
Would it be useful for msgs to contain some form of identifier of the node that created it, remembering it could be an input or an input/output node. Even just its registered node name might be useful for downstream nodes in the flow.
Should the runtime track new msgs derived from others?
So a node splits a message up, should the new messages have a new _msgid, but a relationship with the _msgid whence it was created? Would this allow subsequent nodes to report done() on the original _msgid, plus the new _msgid's?
Should the API allow a rollback()?
Would atomicity be useful?
What if a node generates multiple messages, and some related messages (but not all) are sent before an error condition occurs? If its before the done() method, then a rollback() could be performed which would throw away those messages. This would mean that the messages would be held in a queue until done() is called and they would be forwarded on together.
Would this be useful?
Would a taxonomy of errors be useful?
When done() is called in an error state, would it be useful to make provision in the API for both user-contributed node-specific error messages (like an error specific to node-red-contrib-*), but also a standardised NR error code? This may involve some thinking about the types of errors that are common and that a user would wish to act upon.
Perhaps better explained with an example:
Off the top of my head, some NR error codes that the user of a contributed node could pass as an NR error condition:
ERROR_RETRANS - retry this message later (perhaps with time)
ERROR_DISCONNECTED - retry when CONNECTION restored
ERROR_RUNTIME - perhaps a syntax error in a function node
ERROR_CONFIG - error because of misconfiguration of node
So on and so forth. Perhaps rather than the source of the error, instead a hint as to how to respond:
NOTIFY_USER - Misconfiguration or runtime error, so alert the user to the problem
RETRANS_TIMER - Retransmit after certain period of time
RETRANS_CONDITION - Retransmit after connection state change
This type of model might be useful to nodes that listen to ‘success’ so they get a standardised hint of what to do given the type of error.
Thoughts?
D.