SparkConf() gives error only in Jupyter but not in console

David Arenburg

May 11, 2017, 8:24:01 AM
to Project Jupyter
Hello all,

I've been trying to figure this out for a week with no success.

I'm simply trying to initialize a SparkContext in Jupyter, but I get the following error when running SparkConf():


```
from pyspark import SparkConf
SparkConf()
```

```
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-12-0c80a6a098f7> in <module>()
      1 #import statements
      2 from pyspark import SparkConf
----> 3 SparkConf()
      4

/root/david/spark/python/pyspark/conf.pyc in __init__(self, loadDefaults, _jvm, _jconf)
    102         else:
    103             from pyspark.context import SparkContext
--> 104             SparkContext._ensure_initialized()
    105             _jvm = _jvm or SparkContext._jvm
    106             self._jconf = _jvm.SparkConf(loadDefaults)

/root/david/spark/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    241         with SparkContext._lock:
    242             if not SparkContext._gateway:
--> 243                 SparkContext._gateway = gateway or launch_gateway()
    244                 SparkContext._jvm = SparkContext._gateway.jvm
    245

/root/david/spark/python/pyspark/java_gateway.pyc in launch_gateway()
     74             def preexec_func():
     75                 signal.signal(signal.SIGINT, signal.SIG_IGN)
---> 76             proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
     77         else:
     78             # preexec_fn not supported on Windows

/mnt/anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    709                                 p2cread, p2cwrite,
    710                                 c2pread, c2pwrite,
--> 711                                 errread, errwrite)
    712         except Exception:
    713             # Preserve original exception in case os.close raises.

/mnt/anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1341                         raise
   1342                 child_exception = pickle.loads(data)
-> 1343                 raise child_exception
   1344
   1345

OSError: [Errno 2] No such file or directory
```
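(That bottom frame is plain `subprocess.Popen` failing to find whatever executable it was handed; any nonexistent command reproduces the same error. A minimal sketch, with a made-up command name:)

```
from subprocess import Popen

try:
    # Any command that does not exist on disk raises the same
    # OSError as in the traceback above.
    Popen(["no-such-binary"])
except OSError as e:
    print(e)  # [Errno 2] No such file or directory
```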


This is the startJupyter.sh script I use to launch Jupyter:

```
#!/bin/bash
if ps -ef |grep $USER| grep python > /dev/null
then
        echo "Jupyter is Running - Restarting"
        echo "Killing jupyter-notebook process"

        running_id=$(ps -ef |grep $USER| grep python)
        stringarray=($running_id)
        echo ${stringarray[1]}
        kill -9 ${stringarray[1]}

        export SPARK_HOME='/usr/lib/spark/'
        export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH:$SPARK_HOME/python/lib/py4j-0.9-src.zip

        #jupyter nbextension enable --py widgetsnbextension
        /mnt/anaconda/bin/jupyter notebook &

else
        echo "Jupyter is Not Running"
        echo "Starting Jupyter-NoteBook"
        export SPARK_HOME='/usr/lib/spark/'
        export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH:$SPARK_HOME/python/lib/py4j-0.9-src.zip

        #jupyter nbextension enable --py widgetsnbextension
        /mnt/anaconda/bin/jupyter notebook &
fi
```
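For completeness, whether those exports actually reach the notebook kernel can be checked from inside Jupyter (variable names as exported in the script above):

```
import os

# These should echo the exports from startJupyter.sh if the
# environment was inherited by the notebook server.
print(os.environ.get('SPARK_HOME'))
print(os.environ.get('PYTHONPATH'))
```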


When I run the same code in Python from the console (not in Jupyter), it works fine:

```
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
>>> from pyspark import SparkConf
>>> SparkConf()
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
<pyspark.conf.SparkConf object at 0x7f482f78b6d0>
```


I've validated both the Python version and the module path in the console and in Jupyter, and they match:

```
>>> import sys
>>> sys.version
'2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]'

>>> import inspect
>>> import pyspark
>>> inspect.getfile(pyspark)
'/root/david/spark/python/pyspark/__init__.pyc'
```
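(Comparing `sys.executable` the same way would also pin down the interpreter binary itself, in case the two sessions resolve `python` differently:)

```
import sys

# Run in both sessions; the two paths should match.
print(sys.executable)
```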

I can't think of anything else I could be doing wrong. Please help.
Thanks, David

My specs:

NAME="Amazon Linux AMI"
VERSION="2017.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2017.03"
PRETTY_NAME="Amazon Linux AMI 2017.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2017.03:ga"
Amazon Linux AMI release 2017.03

Thomas Kluyver

May 11, 2017, 8:30:51 AM
to Project Jupyter
Can you check your PATH environment variable in both the console and in Jupyter? From Python, you can do that like this:

```
import os
os.environ['PATH']
```


David Arenburg

May 11, 2017, 8:40:50 AM
to Project Jupyter
Hi takowl,

They are indeed a bit different. Jupyter has:

'/mnt/anaconda/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/usr/local/bin:/root/bin'

While from the console I get:

'/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/usr/local/bin:/root/bin'

Though I've tried setting it in Jupyter:

```
import os
os.environ['PATH'] = '/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/usr/local/bin:/root/bin'
from pyspark import SparkConf
SparkConf()
```

but I'm still getting the same error.

Thanks

Thomas Kluyver

May 11, 2017, 8:48:24 AM
to Project Jupyter
So the traceback indicates that it's trying to launch a command in a subprocess, and failing to find that command. Can you work out what command it's trying to launch, and where the file for that command is on your filesystem?
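For instance, as far as I remember pyspark's launch_gateway() builds its command from SPARK_HOME (roughly $SPARK_HOME/bin/spark-submit), so a quick check from the notebook might be:

```
import os

# If SPARK_HOME is unset, or points at a directory without
# bin/spark-submit, Popen fails with errno 2 as in your traceback.
spark_home = os.environ.get('SPARK_HOME')
print(spark_home)
if spark_home:
    print(os.path.exists(os.path.join(spark_home, 'bin', 'spark-submit')))
```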

Thomas


David Arenburg

May 11, 2017, 8:53:01 AM
to Project Jupyter
It's just running `SparkConf()`, which launches whatever is in its source code. I don't have any additional information short of digging into the `SparkConf()` source code, but I don't think debugging `SparkConf()` makes sense, not to mention that the same code works in the console using the same Python and the same library path. There must be some configuration issue that I'm missing.

Thomas Kluyver

May 11, 2017, 9:11:14 AM
to Project Jupyter
On 11 May 2017 at 13:53, David Arenburg <david.a...@gmail.com> wrote:
I don't have any additional information short of digging into the `SparkConf()` source code, but I don't think debugging `SparkConf()` makes sense, not to mention that the same code works in the console using the same Python and the same library path. There must be some configuration issue that I'm missing.

Understood, but I'm suggesting that the best way to figure out the configuration issue is to dig into the SparkConf() source code and see what it's trying to launch.
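Building on the inspect trick you used earlier, something like this would locate the module that actually builds the Popen command (pyspark.java_gateway, per your traceback):

```
import inspect
import pyspark.java_gateway

# The file printed here contains the launch_gateway() shown in the
# traceback, so you can read what command it constructs.
print(inspect.getfile(pyspark.java_gateway))
```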