JupyterHub + Kubernetes + Remote Spark HELP


Mariano Simone

Feb 2, 2019, 9:33:01 AM
to Project Jupyter
Hello guys,

I have a JupyterHub deployed in a Kubernetes cluster.

I can spawn notebooks just fine. The problem arises when I try to start a Spark session and connect to a remote Spark cluster.

I can see the application on the remote Spark server, but the cluster can't connect back to the driver.

How can I fix this? I tried --net=host with Docker, but it doesn't work.

Any way to get this working?

Kevin Bates

Feb 3, 2019, 2:19:59 PM
to Project Jupyter
Hi Mariano,

Per Luciano's response to the "Docker jupyter kernel" thread, you may want to check out Enterprise Gateway. When Notebook is configured to point to EG via the NB2KG server extension, kernel management is proxied to EG. In Kubernetes, EG launches kernels in their own pods across the cluster. We provide kernel images configured with Spark 2.4, where EG uses spark-submit in cluster mode to launch the kernel. The kernel pod is then the Spark driver, and Spark executes within the k8s cluster using Kubernetes as the resource manager.
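
If it helps, the NB2KG wiring on the Notebook side is only a few lines of config. This is a sketch; EG's address is normally supplied via the KG_URL environment variable, and the class names below are the ones NB2KG documents for its managers:

    # jupyter_notebook_config.py (sketch; assumes KG_URL points at EG)
    c.NotebookApp.session_manager_class = 'nb2kg.managers.SessionManager'
    c.NotebookApp.kernel_manager_class = 'nb2kg.managers.RemoteKernelManager'
    c.NotebookApp.kernel_spec_manager_class = 'nb2kg.managers.RemoteKernelSpecManager'

With that in place, the kernels offered in the notebook UI come from EG's kernelspecs rather than the local ones.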

Since it sounds like your Spark cluster is external to Kubernetes, and assuming you can't use Spark on K8s, there are a couple more options you could take via EG. You could do something similar to the YARN-based kernelspecs we provide; in that case, the kernel is launched in cluster mode, so it runs as the Spark driver in the remote cluster. Alternatively, you could use our Spark-based kernel images but launch them as regular kernels (as opposed to via spark-submit). You can then either create the Spark context from within a notebook cell, as you're probably doing and as sketched below, or convey the necessary information to the pod's launch so that the image's startup script creates the Spark context. In that case, the kernel is the Spark driver running in client mode.

In any case, we'd be happy to work with you. I agree that you're likely running into a container networking issue.
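
For the client-mode route, creating the context from a cell looks roughly like this (a sketch; the master URL is a placeholder for your remote cluster, and the caveat is exactly the one you've hit: the executors must be able to reach the driver's advertised address):

    from pyspark.sql import SparkSession

    # Sketch of client-mode creation from a notebook cell.
    # "spark://spark-master:7077" is a placeholder, not a value from this thread.
    # The driver runs inside this pod, so executors must be able to route back to it.
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")
        .appName("notebook-client-mode")
        .getOrCreate()
    )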

Best regards,
Kevin.

Mariano Simone

Feb 3, 2019, 4:56:09 PM
to jup...@googlegroups.com
Hello Kevin, 

Thanks for taking a moment to reply.

Yes, you are correct, my Spark cluster is remote. I'm deploying in client mode.

The only problem I can't manage to fix is that the applications spawned on the remote Spark cluster can't connect back to the driver inside the notebook.

I tried adding net=host in hub.extraConfig, but I don't think that works.
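
For reference, what I put in hub.extraConfig was roughly this (a sketch from memory; extra_pod_config is KubeSpawner's passthrough for raw pod-spec fields):

    c.KubeSpawner.extra_pod_config = {
        "hostNetwork": True,                     # share the node's network namespace
        "dnsPolicy": "ClusterFirstWithHostNet",  # keeps cluster DNS working with hostNetwork
    }

It didn't seem to change anything for the executors.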


Ray Hilton

Feb 3, 2019, 5:03:28 PM
to jup...@googlegroups.com
In my experience, Spark on k8s inherits the pod's hostname (of the form podname.namespace), which is not resolvable by default and definitely won't be resolvable outside the cluster. I set spark.driver.host to the IP of the pod and, assuming you can route to the cluster's pod network, that should work.

Although my case was Spark on k8s, it was still Spark running in client mode, so I imagine your situation would be much the same.
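
Concretely, something like this worked for me (a sketch; POD_IP is assumed to be injected into the notebook container via the Downward API's status.podIP, and the master URL and ports are placeholders):

    import os
    from pyspark.sql import SparkSession

    # POD_IP assumed to come from the Downward API (fieldRef: status.podIP).
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")
        .config("spark.driver.host", os.environ["POD_IP"])  # routable pod IP, not the hostname
        .config("spark.driver.port", "29413")               # pinned so the ports can be
        .config("spark.blockManager.port", "29414")         # opened in firewall/NetworkPolicy rules
        .getOrCreate()
    )

Pinning the driver and block-manager ports isn't strictly required, but it makes any firewall or NetworkPolicy rules tractable.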

Ray

 
