Multi-nodes Parallel_stereo with SLURM, SSH connection issue

Dave Gingras

Sep 27, 2022, 9:50:15 PM
to Ames Stereo Pipeline Support

Hi all,

I am trying to run parallel_stereo on a cluster using SLURM, and I get SSH-related errors (please see below) as soon as I use more than one node, whatever I try. I carefully followed the SLURM example provided in the ASP documentation, which mentions that the nodes must be able to communicate with each other over ssh without a password. Unfortunately, I do not think our cluster allows that: I ran a simple job that tried to ssh from one node to another, and the connection was blocked. Is there any workaround for that in ASP? Can parallel_stereo use a communication mechanism other than ssh?

Many thanks!

Best regards,

- David

parallel: Warning: ssh to ib16be-114 only allows for 0 simultaneous logins.
parallel: Warning: You may raise this by changing
parallel: Warning: /etc/ssh/sshd_config:MaxStartups and MaxSessions on ib16be-114.
parallel: Warning: Using only -1 connections to avoid race conditions.
parallel: Warning: ssh to ib16be-116 only allows for 0 simultaneous logins.
parallel: Warning: You may raise this by changing
parallel: Warning: /etc/ssh/sshd_config:MaxStartups and MaxSessions on ib16be-116.
parallel: Warning: Using only -1 connections to avoid race conditions.
parallel: Error: Cannot run any jobs.

Oleg Alexandrov

Sep 27, 2022, 11:11:47 PM
to Dave Gingras, Ames Stereo Pipeline Support
ASP is hard-wired to use ssh because it employs the GNU parallel tool for the parallelization logic. We don't do MPI or sockets, etc. 

You should talk to your system administrator. It doesn't make sense for a cluster not to allow ssh, I think. (BTW, ASP also expects shared storage, such as over NFS.)
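
For anyone debugging a similar setup, a quick check of passwordless ssh between nodes might look like the sketch below. It assumes bash. The names check_ssh and SSH_CMD are hypothetical, introduced here only for illustration; they are not part of ASP or GNU Parallel. BatchMode=yes makes ssh fail rather than prompt for a password, so a failure here suggests GNU Parallel would also be unable to log in.

```shell
# Sketch: report whether a host accepts a non-interactive ssh login.
# check_ssh and SSH_CMD are hypothetical names, not part of ASP.
check_ssh() {
    local host="$1"
    local port="${2:-22}"   # default ssh port unless one is given
    # SSH_CMD lets the ssh client be swapped out; it defaults to plain ssh.
    if ${SSH_CMD:-ssh} -p "$port" -o BatchMode=yes -o ConnectTimeout=5 \
            "$host" true 2>/dev/null; then
        echo "$host: ok"
    else
        echo "$host: FAILED"
    fi
}
```

Running check_ssh ib16be-114 from a job on another node would print "ib16be-114: ok" if the login succeeds without a password, and "ib16be-114: FAILED" otherwise. A port can be given as a second argument.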

Dave Gingras

Sep 28, 2022, 8:55:41 AM
to Ames Stereo Pipeline Support
Hi Oleg,
Thanks for your rapid reply. Yes, we have shared storage. I will check with our system administrator.
Thanks again, and have a good day,
- David

Dave Gingras

Sep 29, 2022, 3:04:22 PM
to Ames Stereo Pipeline Support

Hi Oleg,

We found that the problem was not that the cluster blocks SSH, as we initially thought. Rather, the nodes respond on non-default SSH ports that vary (e.g., 2200, 2201). I spent the whole day trying to figure out how to work around this non-default port issue. Do you have any suggestions?

I tried to pass a node list that looks like:

hostname1:port1
hostname2:port2

or

-p port1 hostname1
-p port2 hostname2

Please note that knowing the port number is not a problem; I get it with: cat /app/config/ssh-port.txt. The difficulty is passing the ports to parallel_stereo so that GNU Parallel can connect to the nodes. In the GNU Parallel documentation, I have not seen any hook for setting the SSH port.

Many thanks,

Best,

- David

Oleg Alexandrov

Sep 29, 2022, 4:06:25 PM
to Dave Gingras, Ames Stereo Pipeline Support
This is a tough one. I am not sure how GNU parallel handles ports. You may want to search around. Also, its code is open, written in Perl, and we ship it, likely in the libexec or bin directory.

The latest ASP parallel_stereo program can also pass ssh options to GNU parallel. I am not sure that can help. The code for our tool is also shipped with ASP; it is in Python.

If you find a solution, you are welcome to share it. Maybe later we can add an ssh port option.

I am traveling and will not be able to dig deeper into this.

Dave Gingras

Oct 2, 2022, 9:14:44 PM
to Ames Stereo Pipeline Support
Hi Oleg,
Thanks for your reply and for the hints. If I find a solution, I will let you know.
Best,
- David

Dave Gingras

Oct 3, 2022, 1:56:43 PM
to Ames Stereo Pipeline Support
Hi Oleg,
I am happy to report that we have a solution for using a non-default ssh port. The nodes list file passed to parallel_stereo should look like:

ssh -p port1 hostname1
ssh -p port2 hostname2
...

Instead of the typical:

hostname1
hostname2
...

To build the nodes list, I've used the two lines below:

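# Create the nodes list file in the current directory
nodesList=$(mktemp -p "$(pwd)")
# Each srun task appends its own "ssh -p <port> <hostname>" line
srun bash -c 'echo ssh -p $(cat /app/config/ssh-port.txt) $(hostname)' >> "$nodesList"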

It worked for me. Hopefully it will help others.

Best,

- David
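
A variation on the two lines above, offered only as a sketch: the formatting step can be separated from the collection step, so that nodes on the default port get a plain hostname line. It assumes bash, and make_nodes_list is a hypothetical helper name, not something shipped with ASP. It reads "hostname port" pairs on standard input and writes one nodes-list line per pair.

```shell
# Sketch: turn "hostname [port]" lines on stdin into nodes-list entries.
# make_nodes_list is a hypothetical helper, not part of ASP.
make_nodes_list() {
    local host port
    while read -r host port; do
        # Skip blank lines
        [ -z "$host" ] && continue
        if [ -n "$port" ]; then
            # Custom ssh command for a node listening on a non-default port
            echo "ssh -p $port $host"
        else
            # No port given: fall back to the typical plain hostname entry
            echo "$host"
        fi
    done
}
```

With the port file location from this thread, it could be fed from srun as follows: srun bash -c 'echo $(hostname) $(cat /app/config/ssh-port.txt)' | make_nodes_list > "$nodesList"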

Oleg Alexandrov

Oct 3, 2022, 3:18:32 PM
to Dave Gingras, Ames Stereo Pipeline Support
This is a neat solution, thanks! I verified that it works to specify a custom ssh command for each node in parallel_stereo. I added this to the doc at https://stereopipeline.readthedocs.io/en/latest/examples.html (one will need a hard browser reload to see the latest version).

I see that there are a lot more tweaks one can make to the nodes list, based on the --sshlogin option of GNU parallel (https://www.gnu.org/software/parallel/man.html).
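
To tie this together, the nodes list is handed to parallel_stereo through its --nodes-list option. The wrapper below is only a sketch, assuming bash: run_stereo_multinode is a hypothetical helper name, and the image and output arguments in the usage line are placeholders. It refuses to start if the nodes list is missing or empty, which would otherwise surface later as an ssh error.

```shell
# Sketch: run parallel_stereo on multiple nodes using a nodes list file.
# run_stereo_multinode is a hypothetical wrapper, not part of ASP.
run_stereo_multinode() {
    local nodes_list="$1"
    shift
    # -s is true only if the file exists and is not empty
    if [ ! -s "$nodes_list" ]; then
        echo "error: nodes list '$nodes_list' is missing or empty" >&2
        return 1
    fi
    # Remaining arguments are passed to parallel_stereo unchanged
    parallel_stereo --nodes-list "$nodes_list" "$@"
}
```

A call might look like: run_stereo_multinode "$nodesList" left.tif right.tif run/out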


