This seems like it should be pretty basic, but I'm having a hard time installing a module for use in the PySpark notebook. I must be missing something...
If I choose a python3 notebook, the module (matplotlib) is already there and works as expected. I notice that root's default python is /opt/conda/bin/python (Python 3.6.3), while the jovyan user's python is /usr/bin/python (Python 2.7.12).
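To confirm which interpreter each kernel is actually running, I've been checking with something like this in a notebook cell (my guess is the python3 notebook shows the conda path and the PySpark kernel shows /usr/bin/python):

```python
import sys

# Print the interpreter path and version for whichever kernel runs this cell.
print(sys.executable)
print(sys.version)
```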
If I try to import matplotlib in the PySpark kernel, I get this:
Traceback (most recent call last):
ImportError: No module named matplotlib.pyplot
I've tried several things, including installing pip for /usr/bin/python and then installing matplotlib with it, but the import still fails.
I have a Docker container that inherits from all-spark-notebook, so I can modify the container OS if needed.
Looking at the kernel spec in /usr/local/share/jupyter/kernels/pysparkkernel/kernel.json, I see:
{"argv":["python","-m","sparkmagic.kernels.pysparkkernel.pysparkkernel", "-f", "{connection_file}"],
"display_name":"PySpark"
}
I assume that, since python is not fully qualified, it picks up /usr/bin/python from the PATH.
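One idea I'm considering (untested, just a sketch): fully qualify the interpreter in kernel.json so the kernel uses the conda Python 3 that already has matplotlib, something like:

```json
{"argv":["/opt/conda/bin/python","-m","sparkmagic.kernels.pysparkkernel.pysparkkernel", "-f", "{connection_file}"],
"display_name":"PySpark"
}
```

Would that be a sane fix, or is there a supported way to tell the PySpark kernel which Python to use?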
Any ideas?
Thanks,
Tim