Hello Cask,
I have a CDAP Spark program that is included in a workflow in an app. When the Spark job hit the error logs below, it was killed. However, according to our platform team's investigation (I also found some related discussions on Google), this is a connection error to ZooKeeper, and the logs are just noise; they should not cause the Spark job to be killed. So my question is: does the workflow kill the Spark job it encapsulates when the job emits such error logs? If yes, how can I ignore this case and keep the Spark job running, since there is no other impact? Could you please give any advice?
Error logs for your reference:
AssociationError [akka.tcp://spark...@10.204.153.5:44791] -> [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943]:
Error [Association failed with [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943]]
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: c822cgn.int.westgroup.com/10.204.153.17:35943]
Thanks,
Fanchao