SSH process stderr

Devarajulu Deva

Apr 11, 2016, 7:36:49 AM
to scoop-users
Hi Good evening Yannick,
I am attempting to use SCOOP with an SGE-based cluster, but am encountering some problems. I am trying to launch jobs on a remote server. Yesterday I was able to launch jobs from "system1". Today I followed the same steps as yesterday, but I could not launch jobs from "system2".

$ /root/da/sw/play/python/src/python/2.7.6/bin/python2.7 -m scoop --host 192.168.10.108 -n 6 -vv /root/da/rel/scripts/scoop_fireJob.py calibre_drc.sh
[2016-04-11 16:44:33,039] launcher  INFO    SCOOP 0.7 1.1 on linux2 using Python 2.7.6 (default, Apr  6 2016, 12:39:21) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)], API: 1013
[2016-04-11 16:44:33,039] launcher  INFO    Deploying 6 worker(s) over 1 host(s).
[2016-04-11 16:44:33,039] launcher  DEBUG   Using hostname/ip: "192.168.10.108" as external broker reference.
[2016-04-11 16:44:33,039] launcher  DEBUG   The python executable to execute the program with is: /root/da/sw/play/python/src/python/2.7.6/bin/python2.7.
[2016-04-11 16:44:33,040] launcher  INFO    Worker distribution:
[2016-04-11 16:44:33,040] launcher  INFO       192.168.10.108:    5 + origin
[2016-04-11 16:44:33,040] brokerLaunch DEBUG   Launching remote broker: ssh -x -n -oStrictHostKeyChecking=no 192.168.10.108 /root/da/sw/play/python/src/python/2.7.6/bin/python2.7 -m scoop.broker.__main__ --echoGroup --echoPorts --backend ZMQ
ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/scoop/launcher.py", line 479, in main
    rootTaskExitCode = thisScoopApp.run()
  File "build/bdist.linux-x86_64/egg/scoop/launcher.py", line 260, in run
    backend=self.backend,
  File "build/bdist.linux-x86_64/egg/scoop/launch/brokerLaunch.py", line 157, in __init__
    "SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:
..............................................

Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
Killed by signal 15.


[2016-04-11 16:44:33,487] launcher  INFO    Finished cleaning spawned subprocesses.
INFO:launcherLogger:Finished cleaning spawned subprocesses

Please help me.

Yannick Hold-Geoffroy

Apr 18, 2016, 10:46:05 PM
to scoop-users
Hello,

There seems to be something in your .bashrc or similar that writes output to the shell before giving the prompt (".............................................." in this case, and probably something after). When launching a broker on a remote system, SCOOP needs to read the port of the remotely launched broker, so the broker writes it to its stdout as soon as it launches, and the launcher reads it. If anything is written to stdout before the broker writes its port, the launcher tries to interpret that output as the port and crashes, as you saw.
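To illustrate the failure mode, here is a minimal sketch, not SCOOP's actual code. It assumes the broker prints its ports as one comma-separated line (the "port1,port2" format and the function name `parse_broker_ports` are assumptions for illustration). Any line printed before it by a shell startup file gets parsed instead and the unpacking fails, much like the "need more than 1 value to unpack" message in the traceback.

```python
def parse_broker_ports(stdout_text):
    """Hypothetical parser: treats the FIRST line of stdout as 'port1,port2'."""
    first_line = stdout_text.splitlines()[0]
    # Unpacking raises ValueError if the line does not hold exactly two fields.
    task_port, info_port = first_line.split(",")
    return int(task_port), int(info_port)


# What the launcher hopes to read: the ports, and nothing before them.
clean = "5555,5556\n"
# What it reads when a startup file echoes something first.
noisy = "..............................................\n5555,5556\n"

print(parse_broker_ports(clean))

try:
    parse_broker_ports(noisy)
except ValueError as exc:
    # The dotted line has no comma, so there is only one value to unpack.
    print("Port number decoding error:", exc)
```

The port numbers 5555 and 5556 are made up; only the ordering problem matters here.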

There are two ways around your problem: either 1) remove everything in your .bashrc (or similar) that writes to stdout, or 2) what I recommend: let the broker run on the node from which you perform the launch. You may dispatch 0 workers to this node so it gets no job done, but this will simplify the launch of the broker (the communication system), solving your issue.
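For option 1, one way to check whether a host's startup files are quiet is to run a command that prints nothing over SSH and see whether anything still arrives on stdout. The sketch below is only an illustration: `startup_noise` and `ssh_probe_command` are hypothetical helper names, and the SSH options are simply copied from the launcher's debug line above. The demonstration at the bottom uses local Python commands in place of SSH so that it runs anywhere.

```python
import subprocess
import sys


def startup_noise(command):
    """Run `command` and return whatever it wrote to stdout, as text."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stdout.decode("utf-8", errors="replace")


def ssh_probe_command(host):
    """Build an SSH command that runs `true`, which itself prints nothing."""
    return ["ssh", "-x", "-n", "-oStrictHostKeyChecking=no", host, "true"]


# Local stand-ins for a quiet and a chatty remote shell (no SSH needed).
quiet = [sys.executable, "-c", "pass"]
chatty = [sys.executable, "-c", "print('.' * 10)"]

for name, command in (("quiet", quiet), ("chatty", chatty)):
    noise = startup_noise(command)
    print(name, "-> clean" if not noise else "-> unexpected output: %r" % noise)
```

Against a real host, `startup_noise(ssh_probe_command("192.168.10.108"))` should return an empty string once the echo statements are gone; anything else is output that would reach the launcher before the broker's port.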

Hope it helps,
Yannick Hold

Devarajulu Deva

Apr 22, 2016, 9:18:01 AM
to scoop-users
Hi Good Evening Yannick,
As per your suggestion, I modified my .cshrc and now it is working fine. Earlier, my .cshrc had some echo statements in it, which were probably the problem.
Thanks for your valuable response.