DEAP with SCOOP: error while loading shared libraries: libpython2.7.so.1.0


Bradley Barnhart

Feb 16, 2016, 11:36:51 AM
to scoop-users
Hi All!

I'm having an issue using scoop 0.7.1.1 and deap 1.1.0 for python 2.7.8.
The code works on my laptop but does not when I submit via qsub to a remote cluster [Red Hat 4.4.7-3].
For some reason, one or more nodes are not finding the libpython2.7.so.1.0 file even though I set LD_LIBRARY_PATH in the PBS script. Do you know of some other way (without admin privs) to make sure all nodes have access to this .so?

Here's the error
----------------------------
[2016-02-12 18:03:03,560] launcher  INFO    SCOOP 0.7 1.1 on linux2 using Python 2.7.8 (default, Oct 23 2014, 11:43:30) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)], API: 1013
[2016-02-12 18:03:03,560] launcher  INFO    Detected PBS environment.
[2016-02-12 18:03:03,560] launcher  INFO    Deploying 2 worker(s) over 2 host(s).
[2016-02-12 18:03:03,560] launcher  INFO    Worker distribution:
[2016-02-12 18:03:03,560] launcher  INFO       c1u3-ib0:    0 + origin
[2016-02-12 18:03:03,560] launcher  INFO       c1u3-ib0:    0 + origin
ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launcher.py", line 480, in main
    rootTaskExitCode = thisScoopApp.run()
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launcher.py", line 261, in run
    backend=self.backend,
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launch/brokerLaunch.py", line 158, in __init__
    "SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:

Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
/usr/local/apps/Python/2.7.8/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory


[2016-02-12 18:03:03,701] launcher  INFO    Finished cleaning spawned subprocesses.
INFO:launcherLogger:Finished cleaning spawned subprocesses.
--------------------------------------
------------------------------
Here's my PBS script.
--------------------------
#!/bin/csh
#PBS -l procs=2
#PBS -j oe

module load python/2.7.8
cd /work/CALHYDRO/calExpHydro
setenv PYTHONPATH /work/CALHYDRO/python-modules/scoop-0.7.1.1
setenv LD_LIBRARY_PATH /usr/local/apps/Python/2.7.8/lib ##THIS IS WHERE libpython2.7.so.1.0 IS!!

python -m scoop --pythonpath PYTHONPATH -n 2 --hostfile $PBS_NODEFILE ga.py
-----------


Thanks in advance!
Brad

Yannick Hold-Geoffroy

Feb 16, 2016, 11:55:28 AM
to scoop...@googlegroups.com
Hello,

The issue seems to be 2 things: 1) the worker split seems off (twice the origin?) because a node is there twice in the nodefile, and 2) it tries to launch the broker on a remote node, but it shouldn't.

Have you tried leaving out the --hostfile option in your SCOOP invocation (the last line of the submit file)? SCOOP should recognize PBS and work with it directly. I believe this flag causes the problem.
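
To check the first point, a minimal sketch like the following can count how often each host appears in a nodefile. The function name count_hosts and the sample content are assumptions for illustration; the sample simply mirrors the worker distribution in the log above, where c1u3-ib0 shows up twice.

```python
from collections import Counter


def count_hosts(nodefile_text):
    """Count how many times each host name appears in a PBS-style nodefile."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


# Hypothetical nodefile content: one host listed twice, as in the log above.
sample_nodefile = "c1u3-ib0\nc1u3-ib0\n"
host_counts = count_hosts(sample_nodefile)
for host, slots in sorted(host_counts.items()):
    print("{0}: {1} entry(ies)".format(host, slots))
```

On the cluster, the text would come from reading the file named by $PBS_NODEFILE instead of the sample string.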

By the way, shouldn't PYTHONPATH in the last line have a dollar sign in front of it?

Have a nice day,
Yannick

--
You received this message because you are subscribed to the Google Groups "scoop-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scoop-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brad Barnhart

Feb 16, 2016, 12:18:26 PM
to scoop...@googlegroups.com
Thanks very much for your reply!

I believe the first issue was solved by removing the --hostfile, as you suggested. The second remains. Do you suspect it is not an issue with the .so file not being available?

PBS Script

-------------------
#!/bin/csh
#PBS -l procs=2
#PBS -j oe

module load python/2.7.8

cd /work/CALHYDRO/calExpHydro
setenv PYTHONPATH /work/CALHYDRO/python-modules/scoop-0.7.1.1
setenv LD_LIBRARY_PATH /usr/local/apps/Python/2.7.8/lib:LD_LIBRARY_PATH

#python -m scoop --pythonpath PYTHONPATH -n 2 --hostfile $PBS_NODEFILE ga.py
python -m scoop --pythonpath $PYTHONPATH -n 2 ga.py
------------------------------------

Error Output
-----------------------
[2016-02-16 12:11:33,827] launcher  INFO    SCOOP 0.7 1.1 on linux2 using Python 2.7.8 (default, Oct 23 2014, 11:43:30) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)], API: 1013
[2016-02-16 12:11:33,827] launcher  INFO    Detected PBS environment.
[2016-02-16 12:11:33,828] launcher  INFO    Deploying 2 worker(s) over 1 host(s).
[2016-02-16 12:11:33,828] launcher  INFO    Worker distribution:
[2016-02-16 12:11:33,828] launcher  INFO       c1u1-ib0:    1 + origin

ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launcher.py", line 480, in main
    rootTaskExitCode = thisScoopApp.run()
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launcher.py", line 261, in run
    backend=self.backend,
  File "/work/CALHYDRO/python-modules/scoop-0.7.1.1/scoop/launch/brokerLaunch.py", line 158, in __init__
    "SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:

Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
/usr/local/apps/Python/2.7.8/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory


[2016-02-16 12:11:33,964] launcher  INFO    Finished cleaning spawned subprocesses.

INFO:launcherLogger:Finished cleaning spawned subprocesses.
----------------------

Thanks,
Brad


Ben Elliston

Feb 16, 2016, 1:02:55 PM
to scoop...@googlegroups.com
Hi Brad

> SSH process stderr:
> /usr/local/apps/Python/2.7.8/bin/python: error while loading shared
> libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory

My suggestion is to assume that this error message is correct. Check and
double check that this shared library is in the right location. You
have set $LD_LIBRARY_PATH to /usr/local/apps/Python/2.7.8/lib. Try this:

file /usr/local/apps/Python/2.7.8/lib/libpython2.7.so.1.0

Also, you need a $ between : and LD_LIBRARY_PATH in your setenv command.
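
Both checks can be combined in a small sketch that walks a colon-separated search path, reports where the library is found, and flags entries that are not directories at all (which is what a literal "LD_LIBRARY_PATH" entry without the $ would normally be). The function name find_library is an assumption for illustration, and empty path entries are simply skipped here.

```python
import os


def find_library(lib_name, search_path):
    """Return (hits, missing_dirs) for a colon-separated library search path."""
    hits, missing_dirs = [], []
    for directory in search_path.split(":"):
        if not directory:
            continue  # empty entries are ignored in this sketch
        if not os.path.isdir(directory):
            # For example a literal "LD_LIBRARY_PATH" left over from a missing $.
            missing_dirs.append(directory)
        elif os.path.exists(os.path.join(directory, lib_name)):
            hits.append(os.path.join(directory, lib_name))
    return hits, missing_dirs


if __name__ == "__main__":
    hits, missing_dirs = find_library(
        "libpython2.7.so.1.0", os.environ.get("LD_LIBRARY_PATH", ""))
    print("found: {0}".format(hits))
    print("entries that are not directories: {0}".format(missing_dirs))
```

Run inside the job, this only describes the environment of the node it runs on; it says nothing about what the other nodes see.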

Cheers, Ben

--
Ben Elliston
Centre for Energy and Environmental Markets
University of New South Wales

Brad Barnhart

Feb 16, 2016, 2:14:55 PM
to scoop...@googlegroups.com
Thank you very much, Ben.

I checked, and the file does exist in /usr/local/apps/Python/2.7.8/lib/.

However, I've played around a bit with using ldd from the command line.
'ldd /usr/local/apps/Python/2.7.8/bin/python' gives something to the effect of
"linux-vdso.so.1 => (0x00007fffbf17300)
libdl.so.2 => /lib64/libpthread.so.0 (0x00000038d7800000)
....
libpython2.7.so.1.0 => not found"
---

But if I then say
'module load python/2.7.8'
'ldd /usr/local/apps/Python/2.7.8/bin/python'

Then it properly points to the /lib .so file and all others.

This makes me think that my simple 'module load python/2.7.8' line in the PBS script is not being called by all nodes.
Does anyone know of a way to specify that all nodes run 'module load python/2.7.8' or any module for that matter?
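
The ldd check described above can be automated roughly as follows: a minimal sketch that scans ldd output for "not found" entries. The function name missing_libraries is an assumption for illustration, and the sample text is an abbreviated stand-in in the shape of the output quoted above, not a real capture.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            # The library name is the part to the left of the arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


# Illustrative sample only; real output lists many more libraries.
sample_ldd_output = """\
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00000038d7800000)
    libpython2.7.so.1.0 => not found
"""
print(missing_libraries(sample_ldd_output))
```

Feeding it the ldd output collected on each allocated node, with and without the module loaded, would show which nodes lack the environment setup.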

Thanks all for your time,
Brad





Ben Elliston

Feb 16, 2016, 2:21:04 PM
to scoop...@googlegroups.com
On 17/02/16 06:14, Brad Barnhart wrote:

> But if I then say
> 'module load python/2.7.8'
> 'ldd /usr/local/apps/Python/2.7.8/bin/python'
>
> Then it properly points to the /lib .so file and all others.

I think you're on the right track here. Glad we could at least partly
troubleshoot the problem!

Cheers, Ben

Yannick Hold-Geoffroy

Feb 16, 2016, 7:41:05 PM
to scoop-users, b.ell...@unsw.edu.au
Hello,

Thanks Ben for your input on the matter, it's appreciated.

You are right, Bradley, the scheduler executes the submission script only on the first node that was allocated to your job.

One thing you can do is put "module load python/2.7.8" in your shell startup file (.cshrc, from what I see). While this can have undesired side effects, it should solve the problem quickly.
Another option is to use SCOOP's --prolog flag, which lets you run a setup shell script on every worker before the task is executed.

Hope it can help,
Yannick

Brad Barnhart

Feb 17, 2016, 10:58:41 AM
to scoop...@googlegroups.com, b.ell...@unsw.edu.au
Thanks to everyone for your helpful comments.
Putting "module load python/2.7.8" in the .cshrc file in my home directory solved this error. I will look into using the --prolog flag of SCOOP, as it seems very helpful.
Thanks again for your time,
Brad
