CDAP Spark Got Killed


fancha...@gmail.com

Mar 23, 2017, 9:56:29 PM
to CDAP User
Hello Cask,

I have a CDAP Spark program that runs as part of a workflow in an app. When the Spark job hit the error below, it was killed. However, according to our platform team's investigation (and some discussions I found via Google), this is a connection error to ZooKeeper and only noisy logging, so it should not cause the Spark job to be killed. My question is: does the workflow kill the Spark job it wraps when such errors are logged? If so, how can we ignore this case and keep the Spark job running, since there is no other impact? Any advice would be appreciated.

Error logs for your reference:

AssociationError [akka.tcp://spark...@10.204.153.5:44791] -> [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943]: Error [Association failed with [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943]] [ akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkE...@c822cgn.int.westgroup.com:35943] Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: c822cgn.int.westgroup.com/10.204.153.17:35943 ]


Thanks,
Fanchao

fancha...@gmail.com

Mar 23, 2017, 10:00:37 PM
to CDAP User, fancha...@gmail.com

rus...@cask.co

Mar 31, 2017, 8:47:26 PM
to CDAP User, fancha...@gmail.com
Hi Fanchao,

Sorry for the delay in responding. Were you able to figure out what the issue was? From the logs, it looks like the driver and the executors are having trouble connecting to each other. Have you checked your firewall rules to ensure the machines can reach each other?

Thanks,
Russ
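
One rough way to test the connectivity Russ asks about is a plain TCP connection attempt from the driver host to an executor host. The sketch below is only an illustration and is not part of CDAP or Spark; the helper name `can_connect` is hypothetical, and the host and port in the comment are taken from the log above. Executor ports are normally ephemeral, so the port from an old log is likely closed by now; substitute a host and port that a running executor is currently listening on.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; a refused connection, a timeout,
    or a DNS failure all surface as OSError and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (values from the log above; replace with a live host/port):
# can_connect("c822cgn.int.westgroup.com", 35943)
```

A `False` result only says that this one connection attempt failed from the machine where the check ran. It does not distinguish between a firewall drop, a process that has already exited, and a name-resolution problem, so it is a starting point for the investigation rather than a diagnosis.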
