We are using k8s to manage a cluster consisting of our app, some databases, and Spark (one master, one driver, several executors). The problem is that a callback from Spark is advertising the pod name in the callback address, and the connection fails because that name does not resolve. We have tried deployMode "client" and "cluster" but get the same error.
The output below came from deployMode = "client", and the port is the driver port, which should be on the launching pod. For some reason Spark is advertising the pod name instead of a resolvable address. Doesn't the driver run in the launching app's process? The launching app runs in the pod harness-64d97d6d6-6n7nh, but it has the k8s DNS address harness-api. I can see the correct address for the launching pod with "kubectl get services".
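One thing we are considering, sketched below rather than our actual launch command: overriding the address the driver advertises to executors, on the assumption that spark.driver.host controls what appears in the driver-url. The harness-api service name and port 46337 come from the details above; spark.driver.bindAddress is our guess at how to keep the driver listening locally while advertising the service name.

```shell
# Sketch only: point executors at the service DNS name instead of the
# pod name. "harness-api" is the service visible in "kubectl get services";
# 46337 is the driver port from the failing executor command below.
spark-submit \
  --deploy-mode client \
  --conf spark.driver.host=harness-api \
  --conf spark.driver.port=46337 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --class <our-main-class> <our-app-jar>
```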
Spark Executor Command: "/usr/lib/jvm/java-1.8-openjdk/bin/java" "-cp" "/spark/conf/:/spark/jars/*:/etc/hadoop/" "-Xmx1024M" "-Dspark.driver.port=46337" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@harness-64d97d6d6-6n7nh:46337" "--executor-id" "138" "--hostname" "10.31.31.174" "--cores" "8" "--app-id" "app-20190213210105-0000" "--worker-url" "spark://Wor...@10.31.31.174:37609"
========================================
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to harness-64d97d6d6-6n7nh:46337
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: harness-64d97d6d6-6n7nh
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
    at java.security.AccessController.doPrivileged(Native Method)
    at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
    at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
    at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
    at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
    at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
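To make the resolution mismatch concrete, this is the kind of check we would run from inside one of the Spark worker pods (a sketch; the worker pod name is hypothetical, the two hostnames are the ones from the trace and from "kubectl get services"):

```shell
# From inside a worker/executor pod: does the name in the driver-url
# resolve, and does the service name resolve?
kubectl exec -it <spark-worker-pod> -- nslookup harness-64d97d6d6-6n7nh  # bare pod name, expected to fail
kubectl exec -it <spark-worker-pod> -- nslookup harness-api              # service name, expected to resolve
```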