Could not successfully launch remote worker

Filip Jorissen

Nov 30, 2017, 11:00:25 AM
to scoop-users

I'm running DEAP with SCOOP on a computer cluster. I start the job with a PBS script, and I also tried launching a Python/SCOOP script directly on one of the nodes that is part of a job. In both cases I get the following error message:

[2017-11-30 15:24:33,270] launcher  INFO    SCOOP 0.7 1.1 on linux2 using Python 2.7.10 (default, Dec 14 2016, 15:57:29) [GCC 4.9.2], API: 1013
[2017-11-30 15:24:33,271] launcher  INFO    Detected PBS environment.
[2017-11-30 15:24:33,271] launcher  INFO    Deploying 6 worker(s) over 2 host(s).
[2017-11-30 15:24:33,271] launcher  INFO    Worker distribution:
[2017-11-30 15:24:33,271] launcher  INFO       r2i2n16:    3 + origin
[2017-11-30 15:24:33,271] launcher  INFO       r5i0n8:    2
[2017-11-30 15:24:33,478] workerLaunch ( WARNING Could not successfully launch the remote worker on r5i0n8.
Requested remote group process id, received:

Group id decoding error:
invalid literal for int() with base 10: ''
SSH process stderr

This is for a job that consists of 2 nodes. The second node cannot be reached, apparently. The first/master node does start running workers. How can I debug this?
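One reading of the log, offered as a sketch and not as a description of SCOOP's internals: the "received:" line is blank, and the decoding error is exactly what Python raises when `int()` is given an empty string. That would suggest the command sent over SSH to r5i0n8 printed nothing on stdout. The helper names below (`parse_group_id`, `describe_failure`) are hypothetical and exist only to reproduce the message.

```python
def parse_group_id(raw_output):
    """Hypothetical helper: turn the text printed by a remote command into an int."""
    return int(raw_output.strip())


def describe_failure(raw_output):
    """Return the decoding error message, or None if the text parses cleanly."""
    try:
        parse_group_id(raw_output)
    except ValueError as exc:
        return str(exc)
    return None


# Empty output reproduces the message seen in the log.
print(describe_failure(""))
# A numeric reply parses without error.
print(describe_failure("12345\n"))
```

If this reading is right, the thing to look for is why the remote command produces no output at all, for example an SSH login that fails or a Python interpreter that does not start on that node.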

Thank you!

Filip Jorissen

Dec 1, 2017, 5:24:21 AM
to scoop-users
I installed the latest version of scoop rather than the version from pip. This gives different error messages that seem to be related to the Python .so not being found, which of course causes a failure of the algorithm on the remote host. So this seems to be something that needs to be sorted out on our end.
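For anyone hitting the same symptom, a quick check independent of SCOOP is to ask the remote node, over a non-interactive SSH session, whether its Python starts and can import scoop. The following is a minimal sketch under several assumptions: it is run with Python 3.5 or newer on the master node, passwordless SSH is set up, and the remote interpreter is called `python`. The function names `build_probe_command` and `probe` are made up for this example.

```python
import shlex
import subprocess

# Code executed by the remote interpreter; prints its path if scoop imports.
REMOTE_CODE = "import scoop, sys; print(sys.executable)"


def build_probe_command(host, python="python"):
    """Build the ssh argument list for the probe (hypothetical helper)."""
    # The remote shell re-parses the command, so the -c payload is quoted.
    remote = "{0} -c {1}".format(python, shlex.quote(REMOTE_CODE))
    return [
        "ssh",
        "-o", "BatchMode=yes",       # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",   # give up if the node does not answer
        host,
        remote,
    ]


def probe(host, python="python"):
    """Run the probe and return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        build_probe_command(host, python),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Calling `probe("r5i0n8")` from the master node should return a zero exit code and the interpreter path if everything is in place. A non-zero exit code with a shared-library message in stderr would point at the environment on the remote node (modules or library paths not loaded in non-interactive shells) and not at SCOOP itself.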