StartExecutionOperation invocation failed to complete due to operation-heartbeat-timeout.


Lukáš Herman

Aug 6, 2019, 8:54:50 AM
to hazelcast-jet
Hello Jet Team,
we have noticed that in certain failure scenarios, when a node becomes unresponsive for several minutes, all jobs are terminated with OperationTimeoutException: StartExecutionOperation invocation failed to complete due to operation-heartbeat-timeout right after the cluster re-forms from the remaining members.
Is this expected behavior? We have already seen and tested other topology-change scenarios in which the jobs were restarted automatically.

(Jet 0.7.2)

regards

Lukas Herman

Can Gencer

Aug 6, 2019, 9:08:24 AM
to Lukáš Herman, hazelcast-jet
Hi Lukáš,

Please see this PR: 


Previously this timeout wasn't treated as a restartable exception; after this PR it should be treated as restartable, so the job is restarted instead of failing. This was merged into 3.1.
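To illustrate the difference, here is a minimal, self-contained Java sketch of a restart decision before and after such a change. It is not Jet's actual code: the exception classes are local stand-ins named after Hazelcast's (no Hazelcast dependency is used), and RestartPolicySketch, isRestartableBefore, isRestartableAfter, decide and Outcome are hypothetical names. It also assumes the timeout may arrive wrapped in another exception, so the cause chain is walked.

```java
public class RestartPolicySketch {

    // Hypothetical stand-ins for the Hazelcast/Jet exception classes.
    static class TopologyChangedException extends RuntimeException {
        TopologyChangedException(String msg) { super(msg); }
    }

    static class OperationTimeoutException extends RuntimeException {
        OperationTimeoutException(String msg) { super(msg); }
    }

    enum Outcome { RESTART, FAIL }

    // Before the fix (assumed): only topology-change exceptions trigger a restart.
    static boolean isRestartableBefore(Throwable t) {
        return t instanceof TopologyChangedException;
    }

    // After the fix (assumed): an operation timeout is restartable as well.
    static boolean isRestartableAfter(Throwable t) {
        return t instanceof TopologyChangedException
                || t instanceof OperationTimeoutException;
    }

    // Walks the cause chain (assumption: the timeout may be wrapped)
    // and restarts if any cause is classified as restartable.
    static Outcome decide(Throwable failure, boolean afterFix) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            boolean restartable = afterFix ? isRestartableAfter(t) : isRestartableBefore(t);
            if (restartable) {
                return Outcome.RESTART;
            }
        }
        return Outcome.FAIL;
    }

    public static void main(String[] args) {
        Throwable failure = new RuntimeException(new OperationTimeoutException(
                "StartExecutionOperation invocation failed to complete due to operation-heartbeat-timeout"));
        System.out.println("before fix: " + decide(failure, false)); // FAIL
        System.out.println("after fix:  " + decide(failure, true));  // RESTART
    }
}
```

The only difference between the two policies is whether the timeout counts as restartable; everything else about the job stays the same.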
