Node-RED message reliability and queueing


damo....@gmail.com

Jun 4, 2017, 1:46:47 AM
to Node-RED

Hi Everyone,

I've been using Node-RED for some time and have now contributed two nodes. My sincere thanks to everyone who has contributed to the software.

I'm posting this message to start a discussion on potential reliability features for message delivery, and to introduce an experimental node that addresses this, at least in part.

The experiment
My use case is a mobile Node-RED installation with transient, poor network connectivity. It sends data via MQTT to a broker, but when the connection to the broker is lost, any messages sent through Node-RED in the meantime are lost.

I have written node-red-contrib-msg-queue and published it on npm.  At present, it does not have the 'node-red' keyword. 

This node can be placed ahead of an MQTT outbound node, for instance (see the illustration on npmjs.com), where it monitors the MQTT node's status. When the status shows 'connected', queue forwards messages to the MQTT node as they are received. If the status is not 'connected', queue writes the messages it receives to an on-disk queue (using SQLite). When the MQTT outbound node returns to a 'connected' state, queue forwards the messages stored on disk, in the same order they were received.
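The store-and-forward behaviour described above can be sketched roughly as follows. This is not the node-red-contrib-msg-queue source; it assumes an in-memory array as a stand-in for the SQLite queue, a plain status string in place of the MQTT node's status object, and a hypothetical `forward` callback in place of sending to the MQTT out node.

```javascript
// Minimal store-and-forward sketch (not the actual queue node's code).
// Assumptions: `backlog` stands in for the SQLite on-disk queue, and
// `forward` stands in for passing a message on to the MQTT out node.
class StoreAndForwardQueue {
  constructor(forward) {
    this.forward = forward;   // called with each message to deliver
    this.connected = false;   // mirrors the monitored node's status
    this.backlog = [];        // FIFO of messages received while offline
  }

  // Called for every incoming message.
  receive(msg) {
    // Only pass straight through if nothing older is still waiting,
    // so new messages never overtake stored ones.
    if (this.connected && this.backlog.length === 0) {
      this.forward(msg);
    } else {
      this.backlog.push(msg);
    }
  }

  // Called whenever the monitored node's status changes.
  setStatus(status) {
    this.connected = status === "connected";
    // Drain stored messages in arrival order while still connected.
    while (this.connected && this.backlog.length > 0) {
      this.forward(this.backlog.shift());
    }
  }
}
```

Checking `backlog.length` in `receive()` is what keeps delivery in arrival order across a disconnect.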

Problems
This node works well on its own when, for instance, the MQTT broker is shut down. In that case the broker closes the TCP connection, the MQTT outbound node learns almost immediately that the connection has been closed, and it updates its status quickly.

However, when network connectivity is lost, the MQTT outbound node is not immediately aware of it, and may send a handful of messages to the broker before a TCP timeout indicates that the connection is down. For MQTT, the outbound client does keep an in-memory list of messages with a QoS of 1 or 2 until their delivery is confirmed. If Node-RED is shut down before the connection is re-established, however, those in-memory messages are lost.

Other Challenges/Problems
Stateless outbound nodes (e.g. UDP-based)

The other situation is when there is no connection state, so it is not possible to know ahead of time whether a message can be sent or received. Some nodes, such as influxdb out, deliver messages to an InfluxDB server over UDP. Part of the application protocol for this software includes the ability to determine when data wasn't received: influxdb out will throw an error when a message is not received. That error could be caught by queue and the message retried later.
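A rough sketch of "catch the error and retry later" is below. It assumes a hypothetical `deliver` function that throws when a write is not received, and it uses exponential backoff (base delay doubling per attempt, capped) as one possible retry policy; neither is taken from influxdb out or the queue node.

```javascript
// Sketch of catching a delivery error and scheduling a retry.
// Assumptions: `deliver` is hypothetical and throws on failure; the
// backoff policy (doubling delay, capped at maxDelayMs) is illustrative.
class RetryQueue {
  constructor(deliver, { baseDelayMs = 1000, maxDelayMs = 60000 } = {}) {
    this.deliver = deliver;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.pending = []; // entries: { msg, attempts, dueAt }
  }

  // Try to deliver immediately; keep the message if delivery fails.
  submit(msg, now = Date.now()) {
    this.attempt({ msg, attempts: 0, dueAt: now }, now);
  }

  attempt(entry, now) {
    try {
      this.deliver(entry.msg);
    } catch (err) {
      entry.attempts += 1;
      const delay = Math.min(
        this.baseDelayMs * 2 ** (entry.attempts - 1),
        this.maxDelayMs
      );
      entry.dueAt = now + delay;
      this.pending.push(entry);
    }
  }

  // Call periodically (e.g. from setInterval) to retry entries that are due.
  tick(now = Date.now()) {
    const due = this.pending.filter((e) => e.dueAt <= now);
    this.pending = this.pending.filter((e) => e.dueAt > now);
    for (const entry of due) this.attempt(entry, now);
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real node would simply rely on the default `Date.now()`.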

Node-RED API Proposal
Given the gaps in the above implementation, how much interest is there in the Node-RED community for a message delivery validation API?

I have seen related ideas discussed in the past on the forums, although not to the extent of adding to the node-red API. 
Here, here and here

The API could be implemented similarly to the status and throw/catch APIs, where the uuid generated for each message is emitted on an 'acknowledgement' or 'nacknowledgement' (negative acknowledgement) channel that can be listened to, like the status node.

Any node that is part of a particular pathway through Node-RED (such as queue) could maintain a register of the messages that pass through it, along with their uuids, and 'check them off' when an 'acknowledgement' message is generated. Messages without an acknowledgement could be retransmitted after a timeout period, or on a 'nacknowledgement' when something is known to have gone wrong.

With such an API, messages that a node like queue forwards before the outbound node is aware that the TCP connection is lost would be tracked, and it would be known that they weren't delivered. They wouldn't be lost.
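The register described above could look something like this sketch. Node-RED has no acknowledgement channel today, so the assumption is that ack/nack events carrying a message's `_msgid` arrive as plain method calls, and `retransmit` is a hypothetical callback supplied by the tracking node.

```javascript
// Sketch of the proposed 'check-off' register (hypothetical API).
// Assumptions: ack/nack events carry the message's _msgid and arrive as
// method calls; `retransmit` re-sends a message down the flow.
class AckRegister {
  constructor(retransmit, timeoutMs = 30000) {
    this.retransmit = retransmit;
    this.timeoutMs = timeoutMs;
    this.outstanding = new Map(); // _msgid -> { msg, sentAt }
  }

  // Record a message as it passes through the tracking node.
  track(msg, now = Date.now()) {
    this.outstanding.set(msg._msgid, { msg, sentAt: now });
  }

  // 'acknowledgement': delivery confirmed, so check the message off.
  ack(msgid) {
    this.outstanding.delete(msgid);
  }

  // 'nacknowledgement': known failure, so retransmit straight away.
  nack(msgid, now = Date.now()) {
    const entry = this.outstanding.get(msgid);
    if (entry) this.resend(entry, now);
  }

  // Call periodically: retransmit anything unacknowledged for too long.
  checkTimeouts(now = Date.now()) {
    for (const entry of this.outstanding.values()) {
      if (now - entry.sentAt >= this.timeoutMs) this.resend(entry, now);
    }
  }

  resend(entry, now) {
    entry.sentAt = now; // restart the timeout for this attempt
    this.retransmit(entry.msg);
  }
}
```

A message stays in `outstanding` until it is acknowledged, so a message the outbound node accepted just before the link dropped is retransmitted rather than silently lost.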

If you are interested in this idea or discussion, or have an alternative or better suggestion, please try out the queue node and share your thoughts. Queue is on npm but does not yet have the 'node-red' keyword, so you won't be able to install it via the UI.

Damien.

urs.epp...@switch.ch

Jun 5, 2017, 2:07:50 AM
to Node-RED
Hello Damien

You are facing an interesting problem.

My approach would be to create a topic where I post a sequence number, just after each post to the broker.
If your mobile node-red instance receives the sequence number back from the subscribed topic, all is fine.
That way you know up to which number the messages have been successfully received.
This does not handle all cases, but might be sufficient.
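The bookkeeping side of the sequence-number idea might be sketched as follows. It assumes each published message gets an increasing sequence number, that the same number is published to a separate topic the instance also subscribes to, and that an echoed number confirms everything up to it; the MQTT I/O itself is left out.

```javascript
// Sketch of the sequence-number echo scheme (bookkeeping only).
// Assumption: receiving sequence number n back means messages 1..n
// reached the broker, as suggested above.
class SequenceTracker {
  constructor() {
    this.nextSeq = 1;
    this.confirmedUpTo = 0;       // highest sequence number echoed back
    this.unconfirmed = new Map(); // seq -> message
  }

  // Tag an outgoing message; returns the number to publish on the seq topic.
  publish(msg) {
    const seq = this.nextSeq++;
    this.unconfirmed.set(seq, msg);
    return seq;
  }

  // Called when a sequence number arrives on the subscribed topic.
  onEcho(seq) {
    if (seq <= this.confirmedUpTo) return; // duplicate or stale echo
    this.confirmedUpTo = seq;
    for (const s of this.unconfirmed.keys()) {
      if (s <= seq) this.unconfirmed.delete(s);
    }
  }

  // Messages to resend after reconnecting; a Map keeps insertion order,
  // which here is also sequence order.
  pending() {
    return [...this.unconfirmed.values()];
  }
}
```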

Slightly more complex is how TCP handles this. This is described in many articles, maybe you can use the basic ideas from there.

Kind regards,

Urs.

Dave C-J

Jun 5, 2017, 2:29:00 AM
to node...@googlegroups.com
If you are only looking at MQTT, then I'd also look at the MQTT bridge (e.g. http://www.steves-internet-guide.com/mosquitto-bridge-configuration/). By using a local broker connected between you and the remote broker, you can use its QoS and persistence as the queue you are talking about.

damo....@gmail.com

Jun 5, 2017, 5:06:34 AM
to Node-RED
Thanks Urs.

Reply inline below.


On Monday, June 5, 2017 at 4:07:50 PM UTC+10, urs.epp...@switch.ch wrote:
> Hello Damien
>
> You are facing an interesting problem.
>
> My approach would be to create a topic where I post a sequence number, just after each post to the broker.
> If your mobile node-red instance receives the sequence number back from the subscribed topic, all is fine.
> That way you know up to which number the messages have been successfully received.
> This does not handle all cases, but might be sufficient.

This is actually quite a clever idea for the MQTT case.  Thanks.
 

> Slightly more complex is how TCP handles this. This is described in many articles, maybe you can use the basic ideas from there.


I guess I am wondering whether there is a general want or need for the Node-RED API to provide delivery confirmations, no matter which connection-oriented, or even connectionless, protocols are in use.

Consider an outbound http request node, for instance, that POSTs data to a web-based API. If the API is RESTful, it will return appropriate status codes that indicate whether the request was successful.
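For the http request case, turning a status code into a success/failure report could be as simple as the sketch below. Treating 408, 429 and 5xx as worth retrying, and other 4xx as permanent failures, is a common convention assumed here, not something Node-RED defines; the function name is hypothetical.

```javascript
// Sketch: map an HTTP response status to a delivery outcome.
// Assumption: 408/429/5xx are retryable, other 4xx are permanent.
function classifyHttpStatus(statusCode) {
  if (statusCode >= 200 && statusCode < 300) return "delivered";
  if (statusCode === 408 || statusCode === 429 || statusCode >= 500) {
    return "retry"; // server busy or temporarily failing
  }
  if (statusCode >= 400) return "failed"; // e.g. 400: resending won't help
  return "unknown"; // 1xx/3xx are not a final answer about delivery
}
```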

If Node-RED had a standardised API for outbound nodes to report the success or failure of messages, whatever protocol is used, then flows could handle failure events more reliably and minimise data loss. Simply discarding undelivered messages, often silently, seems to me like quite a gap in Node-RED.

But that is just me. More broadly, the question is whether there is interest from the Node-RED community and the project leaders in this functionality.

Damo.


damo....@gmail.com

Jun 5, 2017, 5:08:46 AM
to Node-RED


On Monday, June 5, 2017 at 4:29:00 PM UTC+10, Dave C-J wrote:
> If you are only looking at MQTT, then I'd also look at the MQTT bridge (e.g. http://www.steves-internet-guide.com/mosquitto-bridge-configuration/). By using a local broker connected between you and the remote broker, you can use its QoS and persistence as the queue you are talking about.

Thanks Dave.  The local broker I have been using is mosquitto, and last I checked, it doesn't provide persistent secondary storage (i.e. HDD) of undelivered messages.  I believe it is on the roadmap of features. 

D.

Nick O'Leary

Jun 5, 2017, 10:30:32 AM
to Node-RED Mailing List
Hi Damien,

we have already got a design proposal that I think would address this.


There are a couple outstanding questions that need a bit more thought, but I think the main idea is good.

Let me know what you think.

Nick


--
http://nodered.org
 
Join us on Slack to continue the conversation: http://nodered.org/slack
---
You received this message because you are subscribed to the Google Groups "Node-RED" group.
To unsubscribe from this group and stop receiving emails from it, send an email to node-red+unsubscribe@googlegroups.com.
To post to this group, send email to node...@googlegroups.com.
Visit this group at https://groups.google.com/group/node-red.
To view this discussion on the web, visit https://groups.google.com/d/msgid/node-red/85b3e88a-3bae-4d81-b9ee-1b2944bac992%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Guilherme Francescon Cittolin

Jun 5, 2017, 10:30:35 AM
to Node-RED
Regarding the MQTT problem: I've run into similar issues. If we look at the underlying library used, it already supports a pluggable storage mechanism for QoS 1/2 messages whose acks have not yet been received. It even ships with an in-memory storage, and there's a very good disk-based one available as a separate install. I've even implemented this locally, but haven't published it anywhere yet, as I wanted to raise the discussion here before proposing anything. If the Node-RED core team agrees, I could contribute the code for it.
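As a rough illustration of what a disk-backed store involves, here is a sketch that keeps unacknowledged packets in a JSON file. It assumes the underlying library is MQTT.js and mirrors the callback-style method shape of its built-in in-memory Store (put, get, del, createStream, close); that shape should be checked against the MQTT.js documentation, `JsonFileStore` is a hypothetical name, and this is not the disk-based store mentioned above.

```javascript
// Sketch of a JSON-file-backed store for QoS 1/2 packets awaiting acks.
// Assumption: method names and callback style mirror MQTT.js's in-memory
// Store. Callbacks are invoked synchronously to keep the sketch short.
const fs = require("fs");
const { Readable } = require("stream");

// JSON.stringify turns a Buffer into { type: "Buffer", data: [...] };
// this reviver turns it back into a Buffer when the file is reloaded.
function reviveBuffers(key, value) {
  return value && value.type === "Buffer" && Array.isArray(value.data)
    ? Buffer.from(value.data)
    : value;
}

class JsonFileStore {
  constructor(filePath) {
    this.filePath = filePath;
    this.packets = fs.existsSync(filePath)
      ? JSON.parse(fs.readFileSync(filePath, "utf8"), reviveBuffers)
      : {};
  }

  // Rewrites the whole file on every change: simple, but slow for big backlogs.
  save() {
    fs.writeFileSync(this.filePath, JSON.stringify(this.packets));
  }

  put(packet, cb) {
    this.packets[packet.messageId] = packet;
    this.save();
    if (cb) cb(null);
    return this;
  }

  get(packet, cb) {
    const stored = this.packets[packet.messageId];
    if (stored) cb(null, stored);
    else cb(new Error("missing packet"));
    return this;
  }

  del(packet, cb) {
    const stored = this.packets[packet.messageId];
    if (stored) {
      delete this.packets[packet.messageId];
      this.save();
      cb(null, stored);
    } else {
      cb(new Error("missing packet"));
    }
    return this;
  }

  // Readable (object mode) stream of stored packets, for resending on reconnect.
  createStream() {
    return Readable.from(Object.values(this.packets));
  }

  close(cb) {
    if (cb) cb();
  }
}
```

If the shape matches, instances would be handed to the client through MQTT.js's `outgoingStore` and `incomingStore` client options, so packets survive a Node-RED restart.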

Regarding ack messages in general: I work with a legacy 15-year-old message-based system (written in Pascal!) for industry that has a very similar concept to Node-RED, but exchanges XML messages between modules instead. They implemented some special "ack" messages for particular modules, and it works somewhat well, but it has a problem: it's not flexible. Even in closed-source software, where in theory you control the functionality and interface of each module, things are far too dynamic, and it's very easy to break things: modules do not reply under some specific conditions and then break everything, or maybe you need a prefetch of messages... it gets very complicated to manage.
In their system, most of the modules that need some kind of reliability store the messages on disk themselves and expect some kind of confirmation message, depending on the requirements of the process, how many messages to store, etc.
But I do agree that, where possible, all the information nodes have regarding delivery, connection status and so on should be exposed, as in MQTT, so that any application that wants to use it can, for example when implementing more reliable message delivery through MQTT.

TL;DR
I would think twice before implementing an ack mechanism in Node-RED; maybe there are other ways to achieve higher levels of reliability, or maybe Node-RED isn't even the tool for that. For implementing a queue in the MQTT node: here's my upvote! :)

Kind regards,

Guilherme

Damien Clark

Jun 6, 2017, 5:07:32 AM
to node...@googlegroups.com

> On 6 Jun 2017, at 12:30 am, Guilherme Francescon Cittolin <gfcit...@gmail.com> wrote:
>
> Regarding the MQTT problem: I've run into similar issues. If we look at the underlying library used, it already supports a pluggable storage mechanism for QoS 1/2 messages whose acks have not yet been received. It even ships with an in-memory storage, and there's a very good disk-based one available as a separate install. I've even implemented this locally, but haven't published it anywhere yet, as I wanted to raise the discussion here before proposing anything. If the Node-RED core team agrees, I could contribute the code for it.

I personally don’t see any harm in an alternate implementation of an MQTT node, or even in merging the changes into Node-RED.

But it solves a specific instance of a broader issue in my mind.

>
> Regarding ack messages in general: I work with a legacy 15-year-old message-based system (written in Pascal!) for industry that has a very similar concept to

Wow, Pascal. One of the first programming languages I learned, in high school, longer ago than I care to say.

> Node-RED, but exchanges XML messages between modules instead. They implemented some special "ack" messages for particular modules, and it works somewhat well, but it has a problem: it's not flexible. Even in closed-source software, where in theory you control the functionality and interface of each module, things are far too dynamic, and it's very easy to break things: modules do not reply under some specific conditions and then break everything, or maybe you need a prefetch of messages... it gets very complicated to manage.

I wonder if a taxonomy of error conditions might assist in dealing with this sort of complexity.


Thanks Guilherme.

D.

Damien Clark

Jun 6, 2017, 6:45:12 AM
to node...@googlegroups.com
Hey Nick,

On 6 Jun 2017, at 12:30 am, Nick O'Leary <nick....@gmail.com> wrote:

> Hi Damien,
>
> we have already got a design proposal that I think would address this.
>
>
> There are a couple outstanding questions that need a bit more thought, but I think the main idea is good.
>
> Let me know what you think.
>
> Nick

This looks like a good approach to me.  

Some (potentially radical) suggestions/ideas and further questions for discussion follow.

Suggestions

node.send should not be used in this case as its use will stop the runtime from being able to correlate message received with message sent. We probably won't enforce this - tbd.
As a transition, perhaps node.send can be deprecated, and remain for backwards compatibility for a release or two.  For new nodes, part of the .html documentation would inform users whether or not the node supports this API.  Obviously nodes that don’t support it can’t be used by the ‘Success’ node.  

If a node uses the traditional node.send(), then the NR runtime can know that the node's input event handler will not call done(), and not wait for it.

What if done is never called?
This is really tricky.

At the moment, I don’t believe NR guarantees FIFO ordering of msgs between nodes, particularly when ‘input’ handlers call asynchronous functions involving I/O. Race conditions could occur, especially if some msgs wait on I/O while others don’t.

So treating a done() call for a later msg, arriving before the earlier msg’s done() call, as a failure of the earlier msg probably won’t work.

What if, as well as an arbitrary timeout period, we have an arbitrary number of subsequent done() calls? The runtime could count the done() calls made while a msg is still awaiting its own done() call and, on exceeding a threshold, generate an error. This might be more reliable than a timeout alone, since a delay could simply reflect latency in the system (e.g. resulting from load), whereas subsequent done() calls show that other msgs are still being processed.

It could also signal a buggy node implementation (i.e. one that never calls done()), which the NR runtime could then disable while alerting the user.
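The counting idea could be sketched as a small watchdog, intended to sit alongside (not replace) a timeout. The threshold value and the `onStuck` error callback are hypothetical; the sketch counts every done() call that happens while a msg is still outstanding.

```javascript
// Sketch of the "overtaken by N later done() calls" check.
// Assumptions: `threshold` is an arbitrary tuning value and `onStuck`
// is a hypothetical callback that raises the error / alerts the user.
class DoneWatchdog {
  constructor(threshold, onStuck) {
    this.threshold = threshold;
    this.onStuck = onStuck;
    this.awaiting = new Map(); // _msgid -> done() calls seen since it started
  }

  // A msg has been handed to a node's input handler.
  started(msgid) {
    this.awaiting.set(msgid, 0);
  }

  // done() was called for msgid.
  done(msgid) {
    if (!this.awaiting.delete(msgid)) return; // unknown or already reported
    // Every msg still waiting has now been overtaken once more.
    for (const [id, count] of this.awaiting) {
      if (count + 1 > this.threshold) {
        this.awaiting.delete(id);
        this.onStuck(id);
      } else {
        this.awaiting.set(id, count + 1);
      }
    }
  }
}
```

Under load every msg slows down, so the counters grow slowly; a msg whose handler forgot to call done() is overtaken repeatedly and gets flagged.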

Thoughts?

Alternate event handler callback API
Just an alternate suggestion.  See what you think.

How about:

this.on('input', function(msg, action) {
    let err = null;
    try {
        // do something with 'msg'
    } catch (e) {
        err = e;
    }
    if (!err) {
        // send can be called as many times as needed (including not at all)
        action.send(msg);
        action.send(msg);
        action.send(msg);
        // Once complete, done is called
        action.done();
    } else {
        // If an error occurs, call done providing the error.
        action.done(err);
    }
});
With this type of interface, it would be easier to evolve and expand the API in the future (say, with a .rollback() call; see further down), simply by adding more properties to action. This leaves the rest of the function signature available for other, potentially unrelated, parameters in the future.

The disadvantage is that you have to know those property names (or your IDE would have to extract them from JSDoc/introspection) to call them. A bit trickier for a new node developer.
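To see how the runtime side of the proposed (msg, action) interface might behave, here is a tiny harness. `invokeHandler` is a hypothetical name, the `action` object only records what the handler did, and only synchronous handlers are covered; a real runtime would route the sent messages and emit success/failure events.

```javascript
// Sketch of a runtime-side harness for the proposed (msg, action) handler.
// Assumptions: names follow the proposal, not an existing Node-RED API;
// the handler is synchronous, so the result is complete on return.
function invokeHandler(handler, msg) {
  const result = { sent: [], completed: false, error: null };
  const action = {
    send(out) {
      if (result.completed) throw new Error("send() called after done()");
      result.sent.push(out);
    },
    done(err) {
      if (result.completed) return; // ignore repeated done() calls
      result.completed = true;
      result.error = err || null;
    },
  };
  handler(msg, action);
  return result;
}
```

Because send() and done() live on one object, adding something like rollback() later only means adding a property to `action`.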

Thoughts?

Questions

Which nodes should report success?
Should all nodes report success/failure when processing messages, or only outbound (output) nodes? If the former, will this scale? How much additional CPU load will this add to NR? Since almost everything happens on the event loop thread, could NR become CPU-bound, especially on SoC-type hardware like a Raspberry Pi?

Should the runtime track the node of origin for msgs?
Would it be useful for msgs to contain some form of identifier of the node that created them, remembering that it could be an input or an input/output node? Even just its registered node name might be useful for downstream nodes in the flow.

Should the runtime track new msgs derived from others?
If a node splits a message up, should the new messages have a new _msgid, but a relationship with the _msgid from which they were created? Would this allow subsequent nodes to report done() on the original _msgid as well as the new _msgids?
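One way derived messages could be tracked is sketched below. The `_parentid` property, the id generator and the `Lineage` class are all hypothetical; the rule assumed is that a parent counts as done once every message derived from it has reported done(), and that all children are derived before any of them completes (as with a split that emits synchronously).

```javascript
// Sketch of tracking messages derived from others (e.g. after a split).
// Assumptions: `_parentid` is a hypothetical property, `newId` stands in
// for the runtime's id generator, and children are all derived before
// any of them reports done().
let counter = 0;
const newId = () => `msg-${++counter}`;

class Lineage {
  constructor(onParentDone) {
    this.onParentDone = onParentDone;
    this.children = new Map(); // parent _msgid -> Set of outstanding child ids
  }

  // Create a new message derived from `parent`.
  derive(parent, props) {
    const child = { ...props, _msgid: newId(), _parentid: parent._msgid };
    if (!this.children.has(parent._msgid)) {
      this.children.set(parent._msgid, new Set());
    }
    this.children.get(parent._msgid).add(child._msgid);
    return child;
  }

  // A derived message reported done(); finish the parent with the last one.
  done(msg) {
    const siblings = this.children.get(msg._parentid);
    if (!siblings) return;
    siblings.delete(msg._msgid);
    if (siblings.size === 0) {
      this.children.delete(msg._parentid);
      this.onParentDone(msg._parentid);
    }
  }
}
```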

Should the API allow a rollback()?
Would atomicity be useful?

What if a node generates multiple messages, and some related messages (but not all) are sent before an error condition occurs? If this happens before done() is called, a rollback() could be performed to throw away those messages. This would mean that messages are held in a queue until done() is called, and then forwarded together.

Would this be useful?
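The rollback() idea amounts to buffering sends until the node commits. Below is a sketch under assumptions: `TransactionalAction` and the `forward` callback are hypothetical, the method names follow the proposal rather than any existing API, and treating done(err) as an implicit rollback is a design choice made here.

```javascript
// Sketch of buffered sends with rollback(): nothing leaves the node
// until done() succeeds. Assumption: done(err) acts as an implicit rollback.
class TransactionalAction {
  constructor(forward) {
    this.forward = forward; // hypothetical delivery callback
    this.buffered = [];
    this.closed = false;
  }

  send(msg) {
    if (this.closed) throw new Error("transaction already finished");
    this.buffered.push(msg);
  }

  // Discard everything sent so far; later done() calls are ignored.
  rollback() {
    this.buffered = [];
    this.closed = true;
  }

  done(err) {
    if (this.closed) return;
    this.closed = true;
    if (!err) {
      for (const msg of this.buffered) this.forward(msg); // release together
    }
    this.buffered = [];
  }
}
```

The cost is that downstream nodes see nothing until done(), which adds latency for nodes that emit many messages per input.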

Would a taxonomy of errors be useful?
When done() is called in an error state, would it be useful for the API to make provision both for user-contributed, node-specific error messages (like an error specific to a node-red-contrib-* node) and for a standardised NR error code? This may involve some thinking about which types of errors are common and which a user would wish to act upon.

Perhaps better explained with an example:

Off the top of my head, some NR error codes that the user of a contributed node could pass as an NR error condition:
ERROR_RETRANS - retry this message later (perhaps with time)
ERROR_DISCONNECTED - retry when CONNECTION restored
ERROR_RUNTIME - perhaps a syntax error in a function node
ERROR_CONFIG - error because of misconfiguration of node

So on and so forth. Perhaps, rather than the source of the error, the code could instead give a hint as to how to respond:

NOTIFY_USER - Misconfiguration or runtime error, so alert the user to the problem
RETRANS_TIMER - Retransmit after certain period of time
RETRANS_CONDITION - Retransmit after connection state change

This type of model might be useful to nodes that listen to ‘success’, giving them a standardised hint of what to do for a given type of error.
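A sketch of how such hints might travel with an error and be acted on by a listening node follows. The hint names come from the list above; attaching them as an `err.hint` property, the `retryAfterMs` field, the 5000 ms default and the `planResponse` dispatcher are all assumptions for illustration.

```javascript
// Sketch of standardised response hints carried by done(err).
// Assumptions: `hint`/`retryAfterMs` properties, the default delay and
// the dispatcher are illustrative, not part of any Node-RED API.
const Hint = Object.freeze({
  NOTIFY_USER: "NOTIFY_USER",
  RETRANS_TIMER: "RETRANS_TIMER",
  RETRANS_CONDITION: "RETRANS_CONDITION",
});

// A node would build its error like this before calling done(err).
function makeError(hint, message, extra = {}) {
  const err = new Error(message);
  err.hint = hint;
  Object.assign(err, extra);
  return err;
}

// What a node listening for failures might decide for each hint.
function planResponse(err) {
  switch (err.hint) {
    case Hint.RETRANS_TIMER:
      return {
        action: "retry-after",
        delayMs: typeof err.retryAfterMs === "number" ? err.retryAfterMs : 5000,
      };
    case Hint.RETRANS_CONDITION:
      return { action: "retry-on-reconnect" };
    case Hint.NOTIFY_USER:
    default:
      // Unhinted errors are surfaced to the user rather than retried.
      return { action: "notify-user", message: err.message };
  }
}
```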

Thoughts?

D.

damo....@gmail.com

Aug 22, 2017, 1:11:26 AM
to Node-RED
Hi Nick,

Now that 0.17 has settled, I'm just wondering whether you have had time to think about the new message-passing API, re the ideas/questions below.

Is any of the following helpful/useful in formulating the new API?  Happy to discuss further if you have time and you see value.

Cheers,
Damien.


On Tuesday, June 6, 2017 at 8:45:12 PM UTC+10, Damien Clark wrote:
Hey Nick,