Hi,
I deployed JupyterHub and Enterprise Gateway on a Kubernetes cluster.
I also have a remote YARN cluster (Cloudera); it is not part of my Kubernetes cluster.
I'm trying to launch the "Spark - Python (YARN Cluster Mode)" kernel
(after configuring EG_YARN_ENDPOINT).
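For reference, the relevant part of my EG deployment looks roughly like this (a sketch; I'm assuming the ResourceManager REST endpoint is on the same master node as my HDFS namenode, on the standard port 8088 - adjust for your cluster):

  spec:
    containers:
      - name: enterprise-gateway
        env:
          - name: EG_YARN_ENDPOINT
            # assumption: RM host taken from my HDFS namenode; yours may differ
            value: "http://master1.cluster2.local:8088/ws/v1/cluster"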
The problem is that Enterprise Gateway listens on an internal IP with a random port and waits for the remote kernel to connect back to it.
From the log (note: 10.244.1.195 is the IP of the Enterprise Gateway pod, inside the Kubernetes cluster):
"++ exec /usr/hdp/current/spark2-client/bin/
spark-submit --master yarn --deploy-mode cluster --name 6c20ed44-0394-4548-9f88-bddb6e5752e8 --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/jovyan/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=/.local/lib/python3.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/conda/bin:/opt/conda/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id 6c20ed44-0394-4548-9f88-bddb6e5752e8
--RemoteProcessProxy.response-address 10.244.1.195:38997 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy"
Then it copies launch_ipykernel.py to the YARN cluster:
"INFO yarn.Client: Uploading resource file:/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://master1.cluster2.local:8020/user/hdfs/.sparkStaging/application_1590301476347_0115/launch_ipykernel.py"
On the YARN cluster, launch_ipykernel.py runs and tries to communicate with the address it received (10.244.1.195:38997), but it fails - the pod IP is internal to Kubernetes and not reachable from the YARN nodes.
I thought of adding a Kubernetes Service (LoadBalancer) with an external IP and setting EG_RESPONSE_IP to it, but because EG uses a random port for the response address, I can't define a fixed targetPort for the Service.
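For illustration, the Service I had in mind looks roughly like this (a sketch only; the name, selector, and port values are placeholders, and the port is exactly the problem, since EG picks a new random response port for every kernel launch):

  apiVersion: v1
  kind: Service
  metadata:
    name: eg-response            # placeholder name
  spec:
    type: LoadBalancer
    selector:
      gateway: enterprise-gateway  # placeholder - must match the EG pod's labels
    ports:
      - name: kernel-response
        port: 38997              # placeholder - EG happened to pick 38997 this time,
        targetPort: 38997        # but the port changes on every kernel launch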
Can anyone advise what I can do?
thanks