buffering msg while server is down

Nick

Dec 29, 2021, 9:38:42 AM
to nats
Dear All,

I'm trying to simulate message buffering while the STAN server is unreachable.
My setup consists of a NATS Streaming server and a client, each running in a separate Docker container.

I perform the following actions:
  1. The client subscribes to channel X.
  2. I stop the container running the NATS server (this makes the server unreachable for the client).
  3. The client publishes several messages on STAN channel X.
  4. I start the container running the NATS server.
  5. The client reconnects successfully to the STAN server.

At this point, I would expect the messages buffered in step 3 to be published as soon as the STAN server is back online.
However, the messages are not published. The server downtime I'm simulating is ~2 min. I'm using the C client, and the connection parameters are:

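        natsOptions_SetSendAsap(1);
        natsOptions_SetMaxReconnect(-1);            /* no limit on reconnect attempts */
        natsOptions_SetReconnectWait(1000);         /* milliseconds */
        natsOptions_SetReconnectBufSize(8388608);   /* bytes (8 MiB) */
        stanConnOptions_SetPings(5, 1000);          /* interval in seconds, max outstanding PINGs */
        stanConnOptions_SetPubAckWait(1000000);     /* milliseconds (~16.7 min) */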

What am I missing? 
Thanks in advance

Nick

Colin Sullivan

Dec 29, 2021, 10:51:41 AM
to nats
Nick, thanks for using NATS!

Expected behavior would be as follows:

1) The client detects a disconnect; lower-level publish calls are buffered and resent when the connection is re-established. This happens at a low level in the client and isn't 100% guaranteed, though.
2) The higher-level STAN publish calls would fail with a pub ack timeout, indicating that your application should resend the message.

In this case, the pubAckWait option is very high. I'd suggest lowering it significantly, to something like 100-500 ms, to ensure you encounter the error in your application, and then resend your message. This is the best approach.
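
As a rough illustration of the "see the error, then resend" approach, here is a minimal sketch of a bounded retry loop. It deliberately does not use the NATS C client: `pub_status`, `publish_fn`, `publish_with_retry`, and the `flaky_publish` stand-in are all hypothetical names. In a real application the hook would wrap the client's publish call and report a pub ack timeout as a failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; a real client library defines its own. */
typedef enum { PUB_OK = 0, PUB_ACK_TIMEOUT = 1 } pub_status;

/* Hypothetical hook: publish one payload and wait for its acknowledgment. */
typedef pub_status (*publish_fn)(void *ctx, const char *payload);

/* Retry until the publish is acknowledged or max_attempts is reached.
 * Returns the number of attempts used on success, or 0 if we gave up. */
int publish_with_retry(publish_fn publish, void *ctx,
                       const char *payload, int max_attempts)
{
    assert(publish != NULL && max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (publish(ctx, payload) == PUB_OK)
            return attempt;
        /* A real application would usually sleep or back off here. */
    }
    return 0;
}

/* Stand-in for a server that is down for the first few attempts. */
typedef struct { int failures_left; int calls; } flaky_server;

pub_status flaky_publish(void *ctx, const char *payload)
{
    flaky_server *s = ctx;
    (void)payload;
    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        return PUB_ACK_TIMEOUT;
    }
    return PUB_OK;
}
```

With a short ack wait, each failed attempt returns quickly, so a loop like this notices the outage within a few hundred milliseconds instead of blocking for minutes. What to do when it returns 0 (drop, persist locally, alert) is an application decision.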

Depending on how you stop the server, Docker may not gracefully close the socket. When this happens, it takes the NATS client library some time to detect it (based on pings), which would affect the low-level buffering.

Note that if this is greenfield development, I'd suggest checking out JetStream, our successor to NATS Streaming, which is being deprecated.

Best regards,
Colin

Nick

Dec 29, 2021, 11:54:09 AM
to nats
Dear Colin,

Thank you for your reply. 
I was totally relying on the client to be able to buffer messages and publish them once the connection is up. 

On Wednesday, December 29, 2021 at 4:51:41 PM UTC+1 co...@nats.io wrote:
> 1) the client detects a disconnect, lower level publish calls are buffered and resent when the connection is established. This occurs at a low level in the client and isn't 100% guaranteed though.

Could you elaborate a little bit more on this? I mean, what are the architectural reasons for not being able to republish messages automatically in 100% of the cases?

> Note that if this is greenfield development, I'd suggest checking out JetStream, our successor to NATS Streaming which is being deprecated.

Thank you, I'm aware that STAN will be deprecated.
For now, I'm stuck with it, as I need to maintain a project that uses it. In the near future, I hope to find time to move my project to JetStream.

Best Regards

Nick

Colin Sullivan

Dec 29, 2021, 3:01:44 PM
to nats
Ahh, you'll need to add a bit of logic in your application to resend if the NATS streaming publish fails.  More detail inline...

On Wednesday, December 29, 2021 at 9:54:09 AM UTC-7 Nick wrote:
> Could you elaborate a little bit more on this? I mean, what are the architectural reasons for not being able to republish messages automatically in 100% of the cases?

COLIN:  Sure.  It boils down to a few things.  Core NATS (underneath NATS streaming) provides a guarantee of "at most once" delivery.  Early on in the clients we made the architectural decision to buffer messages and flush periodically on a connection for performance reasons.  Most clients behave this way.  If the connection is terminated in the process of writing a message to the socket, you may have a partial message written, or possibly have written the entire message to the socket buffer, but the client doesn't know if the message made it to the server.  Even if it made it to the server, there's no guarantee it'll make it to the final destination(s).

When the client flushes the buffer after reconnect, the NATS server will discard partial messages (protocol error), resulting in a dropped message; if the message never made it to the server, it disappeared during the socket disconnect event. Complete messages in the client's reconnect buffer will be processed by the server, assuming all is going well at that point.

Since core NATS guarantees at-most-once delivery, we won't re-send messages from a publisher. JetStream and NATS Streaming, however, do provide at-least-once semantics (and exactly-once for JetStream) using acknowledgments atop core NATS. A successful publish acknowledgment guarantees that the message has been stored in NATS, and we leave it up to the application to attempt to resend any messages that didn't receive a successful publish acknowledgment. Different use cases will employ various retry strategies, determine when to give up, and dictate what to do when a message cannot be published into the NATS system.
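
To make the acknowledgment bookkeeping concrete, here is a small sketch of how an application might track which messages still lack a publish acknowledgment. The `pending_set` type and its functions are hypothetical and are not part of any NATS client; the message IDs are assumed to be assigned by the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Fixed-size set of message IDs that were published but not yet acked. */
typedef struct {
    unsigned long ids[MAX_PENDING];
    bool          in_use[MAX_PENDING];
} pending_set;

void pending_init(pending_set *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++)
        p->in_use[i] = false;
}

/* Record a message as awaiting its ack. Returns false if the set is full. */
bool pending_add(pending_set *p, unsigned long id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!p->in_use[i]) {
            p->ids[i] = id;
            p->in_use[i] = true;
            return true;
        }
    }
    return false;
}

/* Call when a successful ack arrives. Returns false for an unknown ID. */
bool pending_ack(pending_set *p, unsigned long id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (p->in_use[i] && p->ids[i] == id) {
            p->in_use[i] = false;
            return true;
        }
    }
    return false;
}

/* Copy the IDs that still need a resend into out; returns how many. */
size_t pending_unacked(const pending_set *p, unsigned long *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_PENDING && n < cap; i++) {
        if (p->in_use[i])
            out[n++] = p->ids[i];
    }
    return n;
}
```

Anything still in the set after the ack wait expires is a candidate for resending. Because a message may have been stored even though its ack was lost, resending can produce duplicates, which is what at-least-once delivery implies.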

The simplest method is to keep retrying publish calls on a publish acknowledgment failure until reconnected.  You can use the disconnected / reconnected callbacks to pause publishing or check connection state in most clients.
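
One way to picture the callback approach is the sketch below, which holds messages in the application while a "connected" flag is false and sends them on reconnect. All names here (`gated_publisher`, `gp_publish`, and so on) are hypothetical, `gp_send` is a stand-in for the real publish call, and the code is single-threaded for brevity. Client callbacks typically run on library threads, so real code would need locking around this state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HELD_CAP 16

typedef struct {
    bool        connected;
    const char *held[HELD_CAP];   /* messages waiting for a reconnect */
    size_t      held_count;
    size_t      sent_count;       /* stand-in for actual network sends */
} gated_publisher;

void gp_init(gated_publisher *gp)
{
    assert(gp != NULL);
    gp->connected  = true;
    gp->held_count = 0;
    gp->sent_count = 0;
}

/* Stand-in for the real publish call; it only counts sends. */
static void gp_send(gated_publisher *gp, const char *msg)
{
    (void)msg;
    gp->sent_count++;
}

/* Would be invoked from the client's disconnected callback. */
void gp_on_disconnected(gated_publisher *gp)
{
    gp->connected = false;
}

/* Would be invoked from the client's reconnected callback:
 * sends held messages in the order they were queued. */
void gp_on_reconnected(gated_publisher *gp)
{
    gp->connected = true;
    for (size_t i = 0; i < gp->held_count; i++)
        gp_send(gp, gp->held[i]);
    gp->held_count = 0;
}

/* Send now if connected, otherwise hold the message.
 * Returns false when the holding area is full and the caller must decide. */
bool gp_publish(gated_publisher *gp, const char *msg)
{
    if (gp->connected) {
        gp_send(gp, msg);
        return true;
    }
    if (gp->held_count == HELD_CAP)
        return false;
    gp->held[gp->held_count++] = msg;
    return true;
}
```

Even with a gate like this, each send after reconnect should still be confirmed by its publish acknowledgment, since the connection state alone does not prove the message was stored.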

Nick

Dec 30, 2021, 5:18:10 AM
to nats
Thank you Colin, now everything looks a little bit clearer to me.

Best Regards
Nick