Flink Job Fails to Recover When Using Asynchronous IO Mode in Nussknacker

Joice Jacob

Nov 19, 2025, 8:26:16 AM
to Nussknacker
Hi,
  I have the following restart strategy configured in application-customizations.conf:
restartStrategy {
  default {
    strategy: fixed-delay
    attempts: 10
    delay: 10s
  }
}
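
As a rough illustration of what this configuration allows, here is a minimal sketch. It assumes that the fixed-delay strategy permits at most `attempts` restarts, each preceded by a wait of `delay`, after which the job is failed permanently. The function name is hypothetical and is not part of the Flink or Nussknacker API.

```python
def restart_budget_seconds(attempts: int, delay_s: int) -> int:
    """Total time spent waiting between restart attempts before the job
    is failed permanently (assumed fixed-delay semantics).

    This counts only the configured delays, not the time each restart
    attempt itself takes before it fails.
    """
    return attempts * delay_s


# Values taken from the restartStrategy block above: attempts = 10, delay = 10s.
budget = restart_budget_seconds(attempts=10, delay_s=10)
print(f"Delay budget before the job fails for good: {budget} s")
```

Under that assumption, if every restart attempt fails quickly, the whole restart budget is used up in under two minutes.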

I am submitting a scenario to a Flink cluster with 2 TaskManagers and a scenario parallelism of 2.
Under normal conditions, the scenario runs correctly across both TaskManagers.

However, the problem occurs when one TaskManager goes down.
In this situation, the scenario fails after a few restart attempts, even though one TaskManager is still running and free slots are available. Please find the attached error log.

In the scenario properties, the IO mode is set to Asynchronous.
If I switch the IO mode to Synchronous, the issue does not occur — the Flink job automatically restarts and continues running on the remaining TaskManager.

Because Synchronous mode has significantly lower performance, I would like to continue using Asynchronous IO mode.

Question

Is there any configuration or recommended approach in Nussknacker or Flink that can ensure stable job recovery when using Asynchronous IO mode?
Specifically, how can we prevent recovery failures or state-restore errors when one TaskManager stops?

Any guidance or best practices would be appreciated.  
error.txt

Joice Jacob

Nov 19, 2025, 11:21:01 PM
to Nussknacker
Hi,

  I would like your help to resolve this issue.  

Thanks & Regards, 
Joice Jacob 




Joice Jacob

3:08 AM
to Nussknacker, Arkadiusz Burdach
Hi,
  I would really appreciate any guidance on this issue. Even high-level suggestions on recommended Flink or Nussknacker configurations, best practices, or tuning options for improving stability with Asynchronous IO mode would be extremely helpful.  

Thanks & Regards, 
Joice Jacob 

