Hi Jovyan,
I saw that there are a lot of questions about PySpark/Spark and how to set up Jupyter to work with Spark.
So I started a quest to see how long and hard it would be to install apache-spark/pyspark from scratch
and get it working. (Disclaimer: I had actually never installed or used Spark or PySpark before.)
Here are my findings; it's small enough that I will quote the whole thing here:
### Spark
- Install apache-spark (`$ brew install apache-spark`)
- fire up a notebook (`jupyter notebook`)

Then enter the following:
```python
import findspark
import os
findspark.init()  # you need this before importing pyspark
import pyspark
sc = pyspark.SparkContext()
lines = sc.textFile(os.path.expanduser('~/dev/ipython/setup.py'))
lines_nonempty = lines.filter( lambda x: len(x) > 0 )
lines_nonempty.count()
```
Execute it, and you immediately get the result:
```
221
```
Yayyyyy! It works! (Installing Java took 20 of the 30 minutes it took to set this up :-P)
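One note on `findspark`: it's a small pip package (`pip install findspark`) that locates your Spark installation and puts pyspark on `sys.path`. If `findspark.init()` can't find Spark on its own, you can pass the install location explicitly; the Homebrew path below is just a guess, so check `brew info apache-spark` for the actual location on your machine.

```python
import findspark

# If auto-detection fails, point findspark at SPARK_HOME explicitly.
# The path below is an assumed Homebrew location, not a guaranteed one;
# adjust it to wherever brew actually put Spark.
findspark.init(spark_home='/usr/local/opt/apache-spark/libexec')

import pyspark
sc = pyspark.SparkContext()
```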
### Comments
You do not need a custom profile, nor do you need to use IPython or the notebook to do this.
Nor do you need a specific kernel. This is just using Spark like any other library, which makes it
extremely convenient to start prototyping something in Python, think "Oh, I need Spark", and just use it.
No complex setup, no kernelspec manipulation, no convoluted choices to make if you just want to try Spark.
Of course you might need some tweaks to actually have things scale (see the sketch below), but at least you get it working, and you can prototype.
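For example, when you do want to go beyond local mode, the same notebook code only needs a `SparkConf` pointing at a real cluster; the master URL and memory setting below are placeholders I made up, not something my setup used.

```python
import findspark
findspark.init()

import pyspark

# Placeholder values: swap in your actual cluster's master URL and
# whatever resources your job really needs.
conf = (pyspark.SparkConf()
        .setAppName('prototype')
        .setMaster('spark://your-cluster-host:7077')
        .set('spark.executor.memory', '4g'))

sc = pyspark.SparkContext(conf=conf)
```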
Hope that helps, while Auberon is working on making things even easier to install.
--
M