I would like to run the Spark driver from my Jupyter notebook. I currently have JupyterHub running in Kubernetes successfully, Kubespawner creates the notebooks, and it is awesome.
The problem is that when I run the Spark driver from my notebook against a YARN cluster (client mode, as opposed to spark-submit in cluster mode, where the driver would run on the cluster), the YARN executors cannot reach back to my notebook because the driver's ports aren't exposed as a Kubernetes Service.
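For reference, this is roughly how the driver gets started from the notebook. The callback ports are pinned to fixed values so that a Service could, in principle, target them; the hostname and port numbers below are placeholders:

    from pyspark.sql import SparkSession

    # Sketch: pin the driver's callback ports (normally random) to fixed
    # values so a k8s Service could expose them. Host/ports are placeholders.
    spark = (
        SparkSession.builder
        .master("yarn")
        .appName("notebook-driver")
        # Address the YARN executors use to reach back to the driver
        .config("spark.driver.host", "ADDRESS-REACHABLE-FROM-YARN-NODES")
        .config("spark.driver.bindAddress", "0.0.0.0")
        .config("spark.driver.port", "29413")
        .config("spark.driver.blockManager.port", "29414")
        .getOrCreate()
    )

With the ports fixed, the only missing piece is something that actually exposes them from the notebook pod.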
Looking at the Kubespawner code, I see a function that looks like it makes a Service for each pod, but I'm not sure how it would be invoked:
def make_service(
    name,
    port,
    servername,
    owner_references,
    labels=None,
    annotations=None,
):
    """
    Make a k8s service specification for using dns to communicate with the notebook.

    Parameters
    ----------
    name:
        Name of the service. Must be unique within the namespace the object is
        going to be created in.
    port:
        Port the notebook server listens on.
    servername:
        Name of the server (for named servers).
    owner_references:
        Owner references, so the service is garbage-collected with its pod.
    labels:
        Labels to add to the service.
    annotations:
        Annotations to add to the service.
    """
And then there's a make_ingress call as well.
What am I missing? I can't see anything in the Kubespawner documentation about creating service ports (ideally via a LoadBalancer IP).
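The closest I've come up with myself is a modify_pod_hook in jupyterhub_config.py that creates a Service alongside each notebook pod. A sketch of the idea follows; the port numbers, the LoadBalancer type, and the idea of reusing the pod's labels as the selector are all guesses on my part:

    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    config.load_incluster_config()  # hub runs inside the cluster

    def expose_spark_ports(spawner, pod):
        """Create a per-user Service for the Spark driver ports (sketch)."""
        svc = client.V1Service(
            metadata=client.V1ObjectMeta(name=f"spark-{spawner.pod_name}"),
            spec=client.V1ServiceSpec(
                type="LoadBalancer",           # reachable from YARN nodes outside k8s
                selector=pod.metadata.labels,  # match exactly this notebook pod
                ports=[
                    client.V1ServicePort(name="driver", port=29413, target_port=29413),
                    client.V1ServicePort(name="blockmanager", port=29414, target_port=29414),
                ],
            ),
        )
        try:
            client.CoreV1Api().create_namespaced_service(spawner.namespace, svc)
        except ApiException as e:
            if e.status != 409:  # 409 = already exists (e.g. pod restarted)
                raise
        return pod

    c.KubeSpawner.modify_pod_hook = expose_spark_ports

That still leaves cleanup when the pod goes away (no owner reference is possible at this point, since the pod has no UID yet), which is partly why I'm hoping Kubespawner already has a supported way to do this.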
Side question: should I be using Enterprise Gateway? I'm slightly confused about what the mainstream way to deploy Jupyter notebooks in a multi-user environment is. I hadn't heard of EG until I started researching this issue, and now I'm wondering whether I'm deploying the right thing at all.
Thanks in advance,
Greg