Flink Job Fails to Recover When Using Asynchronous IO Mode in Nussknacker

Joice Jacob

Nov 19, 2025, 8:26:16 AM

to Nussknacker

Hi,
I have the following restart strategy configured in application-customizations.conf:
restartStrategy {
  default {
    strategy: fixed-delay
    attempts: 10
    delay: 10s
  }
}
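To make the restart budget in this configuration concrete, here is a minimal sketch of how a fixed-delay policy behaves: the job is restarted after each failure until the configured number of attempts is used up, and the next failure after that is terminal. The `FixedDelayRestart` class and its `simulate` method are hypothetical names for illustration only; this is a toy model, not Flink's or Nussknacker's implementation.

```python
from dataclasses import dataclass


@dataclass
class FixedDelayRestart:
    """Toy model of a fixed-delay restart policy (illustrative, not Flink code)."""
    attempts: int       # maximum number of restarts before the job fails for good
    delay_seconds: int  # pause before each restart

    def simulate(self, failures: int) -> tuple[str, int]:
        """Return the final job status and the total seconds spent in restart delays."""
        restarts = min(failures, self.attempts)
        status = "RUNNING" if failures <= self.attempts else "FAILED"
        return status, restarts * self.delay_seconds


# Values taken from the configuration above: 10 attempts, 10s delay.
policy = FixedDelayRestart(attempts=10, delay_seconds=10)
print(policy.simulate(failures=3))   # job survives three consecutive failures
print(policy.simulate(failures=11))  # eleventh failure exhausts the budget
```

Under this model, a job that keeps failing on every restart is given up on after roughly 100 seconds of accumulated delay, so a recovery error that repeats on each attempt would consume the whole budget quickly.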

I am submitting a scenario to a Flink cluster with 2 TaskManagers and a scenario parallelism of 2.
Under normal conditions, the scenario runs correctly across both TaskManagers.

However, the problem occurs when one TaskManager goes down.
In this situation, the scenario fails after a few restart attempts, even though one TaskManager is still running and free slots are available. Please find the attached error log.
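One condition worth ruling out when reading the error log is slot capacity on the surviving TaskManager. The sketch below assumes Flink's default slot sharing, where a job needs as many slots as its highest operator parallelism, and assumes no other job occupies slots. The function names and the `slots_per_tm` parameter are hypothetical; the slot count per TaskManager is not stated in this post.

```python
def slots_after_loss(task_managers: int, slots_per_tm: int, lost: int = 1) -> int:
    """Slots still offered by the cluster after `lost` TaskManagers go down."""
    return max(task_managers - lost, 0) * slots_per_tm


def can_reschedule(parallelism: int, task_managers: int, slots_per_tm: int, lost: int = 1) -> bool:
    """True if the remaining slots can host a job of the given parallelism.

    Assumes default slot sharing and no competing jobs (both are assumptions).
    """
    return slots_after_loss(task_managers, slots_per_tm, lost) >= parallelism


# Setup from this post: 2 TaskManagers, scenario parallelism 2.
print(can_reschedule(parallelism=2, task_managers=2, slots_per_tm=1))  # one slot left
print(can_reschedule(parallelism=2, task_managers=2, slots_per_tm=2))  # two slots left
```

If the check passes for the real cluster settings, as the Synchronous-mode behavior suggests it does, slot capacity is not the cause and the failure has to be read from the attached log.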

In the scenario properties, the IO mode is set to Asynchronous.
If I switch the IO mode to Synchronous, the issue does not occur: the Flink job automatically restarts and continues running on the remaining TaskManager.

Because Synchronous mode has significantly lower performance, I would like to continue using Asynchronous IO mode.

Question

Is there any configuration or recommended approach in Nussknacker or Flink that can ensure stable job recovery when using Asynchronous IO mode?
Specifically, how can we prevent recovery failures or state-restore errors when one TaskManager stops?

Any guidance or best practices would be appreciated.

error.txt

Joice Jacob

Nov 19, 2025, 11:21:01 PM

to Nussknacker

Hi,

I would like your help to resolve this issue.

Thanks & Regards,
Joice Jacob

Joice Jacob

Nov 21, 2025, 3:08:24 AM

to Nussknacker, Arkadiusz Burdach

Hi,
I would really appreciate any guidance on this issue. Even high-level suggestions on recommended Flink or Nussknacker configurations, best practices, or tuning options for improving stability with Asynchronous IO mode would be extremely helpful.

Thanks & Regards,
Joice Jacob
