In the first cell I ran into problems; the message was:
ImportError: No module named 'sparkdl'. I installed the module as a cluster library and then got:
ImportError: No module named 'keras'. I installed this as well, and then the same thing happened with tensorflow. At this point I got:
ConnectException error: This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.
I tried different installation orders for the modules. In particular, since keras is built on top of tensorflow, I installed tensorflow before keras. Eventually I arrived at a list of all the required modules: sparkdl, tensorflow, tensorflowonspark, tensorframes, kafka, jieba, keras.
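To keep track of this, I now run a small sanity-check cell first (a sketch I put together myself; the import names are my best guess at matching the package names):

```python
# Check that every library the tutorial needs is importable on the cluster,
# and print the version that actually got installed.
import importlib

required = ["sparkdl", "tensorflow", "tensorflowonspark",
            "tensorframes", "kafka", "jieba", "keras"]

for name in required:
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "(no __version__)"))
    except ImportError as error:
        print(name, "FAILED:", error)
```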
I wish a complete list like this were documented somewhere. Even with everything installed, I was still getting an error message:
AttributeError: module 'tensorflow' has no attribute 'Session'. As far as I know, `Session` is a core `tensorflow` API. Googling did not turn up a solution for PySpark.
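My current guess is that the cluster library pulled in TensorFlow 2.x, where `tf.Session` was removed. The only workaround I found is the v1 compatibility API, and it only helps in my own cells, presumably not inside sparkdl itself (a sketch, assuming TF 2.x is really what got installed):

```python
import tensorflow as tf

print(tf.__version__)  # if this prints 2.x, tf.Session no longer exists

# Workaround for my own code only: fall back to the TF 1.x compatibility API.
tf.compat.v1.disable_eager_execution()
with tf.compat.v1.Session() as sess:
    greeting = tf.constant("hello from a v1-style session")
    print(sess.run(greeting))
```

But if sparkdl calls `tf.Session` internally, this shim won't help, which is why I think I need matching library versions rather than a workaround in my own code.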
I found the Spark Deep Learning repository on GitHub and read the current recommendations, in case the answer is there: https://github.com/databricks/spark-deep-learning/blob/master/README.md
The README gives this advice: "To work with the latest code, Spark 2.3.0 is required and Python 3.6 & Scala 2.11 are recommended". So I need to create a cluster with these versions, but there is no such option when I create one (see the attached picture): I can only choose Spark 2.4.* or 2.2.*.
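The only other thing I can think of trying is pinning older library versions instead of taking the latest from PyPI, e.g. with notebook-scoped installs (assuming `dbutils.library.installPyPI` is available on my runtime; the version numbers below are guesses, I don't know which ones sparkdl actually supports):

```python
# Pin specific versions instead of installing "latest" cluster libraries.
# The versions here are guesses, not something I have confirmed works.
dbutils.library.installPyPI("tensorflow", version="1.15.0")
dbutils.library.installPyPI("keras", version="2.2.4")
dbutils.library.restartPython()
```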
Can somebody please help me?
Best,
Mya