STOMP Client Failing every morning

Woody Willis

Sep 3, 2023, 4:05:26 PM
to A gathering place for the Open Rail Data community
Hello everyone!

I've been using the LDBWS previously and have been wanting to move to the Push Port, but I'm having a little trouble. Every morning, at around 2-ish, the client disconnects and refuses to reconnect. I assume that the Push Port restarts every morning, or could it just be my code?

Many thanks and I look forward to replies,
Woody

Peter Hicks (Poggs)

Sep 3, 2023, 4:08:02 PM
to A gathering place for the Open Rail Data community
Hello

> On 3 Sep 2023, at 21:04, Woody Willis <woodyo...@gmail.com> wrote:
>
> I've been using the LDBWS previously and have been wanting to move to the Push Port, but I'm having a little trouble. Every morning, at around 2-ish, the client disconnects and refuses to reconnect. I assume that the Push Port restarts every morning, or could it just be my code?

You shouldn’t get disconnected as a result of Darwin services restarting, but you may see zero messages for several minutes. You might have a timeout occurring on your client which is disconnecting you.

What error message(s) do you get? It’d help to know what language you’re using too.


Peter

Evelyn Snow

Sep 4, 2023, 8:04:02 AM
to openrail...@googlegroups.com
Hi Woody,

I've had a similar issue before in the quiet hours of the feed. I found that
increasing the tolerance for heartbeat receive timeout to about 10000ms was
enough to reliably resolve it. I haven't experimented much: 4000ms was too low,
and 10 seconds may be excessive.

I haven't looked at the underlying cause on the ActiveMQ end, but it's either
not aggressive enough in sending heartbeats, or it's negotiating heartbeating
intervals it can't keep to. In either case, increasing your client's tolerance
should help.

My answer here is based on a number of assumptions about the capabilities of
the library you're using. If it doesn't support heartbeating, this is not going
to help you, and it says nothing about why your code isn't attempting
(or succeeding in?) reconnecting.

Evelyn

Woody Willis

Sep 5, 2023, 3:07:58 AM
to A gathering place for the Open Rail Data community
Hi again,

Sorry for the late reply, I've tried a few things out. My heartbeat interval timeout is set at 15000 and it stops working due to a heartbeat timeout. I'm using a modified version of the Python STOMP client.

Many thanks,
Woody

Peter Hicks (Poggs)

Sep 5, 2023, 3:49:32 AM
to A gathering place for the Open Rail Data community

> On 5 Sep 2023, at 08:07, Woody Willis <woodyo...@gmail.com> wrote:
>
> Sorry for the late reply, I've tried a few things out. My heartbeat interval timeout is set at 15000 and it stops working due to a heartbeat timeout. I'm using a modified version of the Python STOMP client.

The next thing to do is put a callback every time a heartbeat is sent and a response received - that will show if it’s definitely the heartbeat failing or not.
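
A minimal sketch of such a callback, assuming the library follows stomp.py's listener convention of calling `on_heartbeat()` when a heartbeat arrives and `on_heartbeat_timeout()` when it gives up waiting. The class name `HeartbeatLog` is hypothetical, and the sketch only covers the receive side. In real use it would probably subclass `stomp.ConnectionListener` and be registered with `conn.set_listener(...)`.

```python
import time


class HeartbeatLog:
    """Records gaps between received heartbeats and when a timeout is reported."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the class can be exercised without waiting
        self._last = None        # clock reading of the previous heartbeat
        self.gaps = []           # seconds between consecutive heartbeats
        self.timeouts = []       # clock readings at which a timeout was reported

    def on_heartbeat(self):
        now = self._clock()
        if self._last is not None:
            self.gaps.append(now - self._last)
        self._last = now

    def on_heartbeat_timeout(self):
        self.timeouts.append(self._clock())
```

If the recorded gaps grow well beyond the negotiated interval just before a timeout, that points at heartbeats arriving late or not at all, rather than at the reconnect logic.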

You should also look at your networking setup - if you’re running behind a NAT device or firewall which might time out connections after a period of no activity (e.g. Virgin Media hub/router, not picking on them specifically though), that will possibly cause grief for you during periods of low/no traffic.

Finally, if you’re getting disconnected due to successive failed heartbeats, how quickly do you try to reconnect?  I’d recommend either an exponential back-off (e.g. wait 1s, then 2s, then 4s, 8s, 16s, 32s…) or simply waiting for 30 seconds each time maybe as a test.
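
As a rough illustration of the exponential back-off described above, here is a small generator of wait times. The function name `backoff_delays` and the 60-second cap are assumptions for the sketch, not something from any particular library.

```python
import itertools


def backoff_delays(initial=1.0, factor=2.0, maximum=60.0):
    """Yield wait times in seconds: initial, initial*factor, ... capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= factor


first_eight = list(itertools.islice(backoff_delays(), 8))
# first_eight == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

A reconnect loop would sleep for each yielded delay between attempts and stop as soon as a connection succeeds.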

If you can post a minimal example on GitHub, and it’s Python, I bet somebody here can run it and check whether it’s a code issue or a networking issue.


Peter

Jack Brewer

Sep 5, 2023, 7:13:26 AM
to openrail...@googlegroups.com
Here is some feedback from my experience with the Python STOMP library.

Stomp.py defaults to using STOMP protocol 1.1, so instead of using stomp.Connection use stomp.Connection12 to force the STOMP 1.2 protocol. 

If I remember correctly, the default is a reconnection attempt every second (±0.6 sec) for 30 seconds. This sometimes doesn’t give the server long enough to restart before your script gives up trying to reconnect. Try setting something like the following for better reconnection reliability:
reconnect_sleep_initial=1, reconnect_sleep_increase=2, reconnect_sleep_jitter=0.6, reconnect_sleep_max=60.0, reconnect_attempts_max=60
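
A sketch of how these settings might be passed to stomp.py, assuming `stomp.Connection12` accepts them as keyword arguments alongside `host_and_ports` and `heartbeats`. The helper name `make_connection` is hypothetical, the host and port are left to the caller, and the `(15000, 15000)` heartbeat default is only a guess at how the 15000 figure mentioned earlier in the thread is configured.

```python
# Reconnect settings exactly as suggested above.
RECONNECT_SETTINGS = {
    "reconnect_sleep_initial": 1,
    "reconnect_sleep_increase": 2,
    "reconnect_sleep_jitter": 0.6,
    "reconnect_sleep_max": 60.0,
    "reconnect_attempts_max": 60,
}


def make_connection(host, port, heartbeats=(15000, 15000)):
    """Build a STOMP 1.2 connection using the reconnect settings above."""
    # Third-party stomp.py, imported lazily so the settings can be inspected without it.
    import stomp

    return stomp.Connection12(
        host_and_ports=[(host, port)],
        heartbeats=heartbeats,  # assumed values, in milliseconds
        **RECONNECT_SETTINGS,
    )
```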

I have also noticed that there is a tendency for a heartbeat not to be received when it should be. This often isn’t an issue as another message is sent before a heartbeat would be expected anyway. However, during quieter periods this becomes a problem and your script will disconnect due to a heartbeat timeout. I couldn’t find exactly where the issue lies, but if you set heart_beat_receive_scale=2.5, you give the script a chance to miss receiving a heartbeat, giving greater tolerance. It does mean you run the risk of increasing the length of time your script isn’t aware the server is down, but I’ve found it an acceptable trade-off.
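
To put numbers on that trade-off, here is a small worked example. It assumes the receive tolerance is simply the negotiated heartbeat interval multiplied by `heart_beat_receive_scale`, and it uses the 15000 ms interval mentioned earlier in the thread. The function name is hypothetical, and 1.5 is used as a baseline on the assumption that it matches the library default.

```python
def receive_tolerance_seconds(interval_ms, scale):
    """Longest silence tolerated before a heartbeat timeout, under the assumption above."""
    return (interval_ms / 1000.0) * scale


baseline_tolerance = receive_tolerance_seconds(15000, 1.5)  # 22.5 seconds
relaxed_tolerance = receive_tolerance_seconds(15000, 2.5)   # 37.5 seconds
```

With a scale of 2.5, one missed 15-second heartbeat still leaves room for the next one to arrive before the timeout fires.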

When subscribing be sure to set ack = "client-individual" and on receipt of a message acknowledge with ack(id=headers["ack"]).

Hope this helps,

Jack

Woody Willis

Sep 5, 2023, 1:24:50 PM
to A gathering place for the Open Rail Data community
Hey everyone,

I've just added the settings that Jack suggested and I'll let you know how it goes. When subscribing I'm using ack = 'auto'. Should this be 'client-individual'?

Many thanks for all your help,
Woody

Woody Willis

Sep 6, 2023, 2:21:37 AM
to A gathering place for the Open Rail Data community
Hello again,

The issue is still there. Here is the message: WARNING heartbeat timeout: diff_receive=41.60944127000403, time=827267.509801804, lastrec=827225.900360534

Hopefully someone can decode this.

Many thanks,
Woody
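
One possible reading of the warning above, as a sketch: `diff_receive` looks like `time` minus `lastrec`, i.e. roughly 41.6 seconds passed without anything being received. The 37.5-second threshold below is an assumption (a 15000 ms interval multiplied by a scale of 2.5, both figures taken from earlier in the thread), and the function name is hypothetical.

```python
import re

LOG_LINE = (
    "WARNING heartbeat timeout: diff_receive=41.60944127000403, "
    "time=827267.509801804, lastrec=827225.900360534"
)


def parse_heartbeat_warning(line):
    """Extract the key=value floats from a heartbeat timeout warning."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([0-9.]+)", line)}


fields = parse_heartbeat_warning(LOG_LINE)
silence = fields["time"] - fields["lastrec"]  # seconds since anything was last received
exceeds_threshold = silence > 37.5            # 37.5 s = 15000 ms * 2.5, an assumed threshold
```

If that reading is right, the feed was silent for longer than even the relaxed tolerance, so the timeout itself is expected and the remaining question is why the client does not reconnect afterwards.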
