OS Login issues

Austin P

Nov 6, 2019, 5:21:58 PM
to google-cloud-slurm-discuss
Hi,

I just set up a Slurm cluster on GCP but am having trouble ssh-ing to any of the compute nodes. (I'd like to be able to do this to monitor performance while jobs are running.)
I do not have external IPs on the compute nodes, and I do have OS Login enabled. My account has the roles/iam.serviceAccountUser and roles/compute.osAdminLogin roles, as specified here.

When I try to ssh to a compute node, I get the following error:
[austi..._g_harvard_edu@slurm4-login1 ~]$ gcloud compute ssh slurm4-compute3
External IP address was not found; defaulting to using IAP tunneling.
ERROR: (gcloud.compute.start-iap-tunnel) Python version 2.7.5 does not support SSL/TLS SNI needed for certificate verification on WebSocket connection.
ssh_exchange_identification: Connection closed by remote host
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

It seems like this is because CentOS is stuck with Python 2.7.5. I can install a newer version of Python on the login node and tell gcloud to use it, and then I get the following result:

[austi..._g_harvard_edu@slurm4-login1 ~]$ ./miniconda3/bin/python --version
Python 3.7.4
[austi..._g_harvard_edu@slurm4-login1 ~]$ CLOUDSDK_PYTHON=./miniconda3/bin/python gcloud compute ssh slurm4-compute3
External IP address was not found; defaulting to using IAP tunneling.
ERROR: (gcloud.compute.start-iap-tunnel) Error while connecting [4033: 'not authorized'].
ssh_exchange_identification: Connection closed by remote host
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

I originally thought I might need to update the Python version used by gcloud on both the login and compute nodes, but now I'm not so sure.
Any ideas?

Thanks,
Austin

Joseph Schoonover

Nov 6, 2019, 5:31:42 PM
to Austin P, google-cloud-slurm-discuss
Austin,
If you absolutely need a shell on a compute node, I'd recommend starting a job step that opens one rather than using ssh, e.g.

srun -n1 --pty /bin/bash

If you're looking to monitor how your application utilizes compute nodes, you can use Stackdriver to monitor compute node usage (e.g. CPU utilization).
If you'd like to profile your application, Arm MAP is a nice commercial tool.
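
As a minimal sketch of the suggestion above: the first helper opens an interactive shell in a new one-task allocation, and the second starts a shell as an extra job step inside an allocation that is already running, which is closer to "ssh in and watch the job". The function names and the job ID argument are placeholders, not part of the original advice. Whether the second form starts immediately depends on your Slurm version and on free resources in the allocation (newer releases may need --overlap).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Slurm.

# Open an interactive shell on a compute node in a new one-task allocation.
node_shell() {
  srun -n1 --pty /bin/bash
}

# Start a shell as an extra job step inside an existing allocation.
# $1 is the ID of a running job (see: squeue -u "$USER").
job_shell() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_shell JOB_ID" >&2
    return 1
  fi
  srun --jobid="$1" -n1 --pty /bin/bash
}
```

Once inside the shell, tools such as top show the load on that node. Exiting the shell ends the job step.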


Dr. Joseph Schoonover

Chief Executive Officer

HPC Specialist

j...@fluidnumerics.com

Keith Binder

Nov 7, 2019, 10:16:31 AM
to Austin P, google-cloud-slurm-discuss
Hi Austin,


A couple of questions for you:  

Are you the project owner?

If so, can you try to SSH via the web console, and/or run gcloud compute ssh from Cloud Shell or your local workstation (not from the login node)?

Thanks
Keith


Austin P

Nov 7, 2019, 12:04:10 PM
to google-cloud-slurm-discuss
Thanks for the advice.

I want to run each of my batch jobs with --exclusive, so that Slurm doesn't try to schedule multiple jobs on the same host (which was happening even when I specified --nodes=1 and/or -c=2).
But I guess I don't strictly NEED to ssh to each compute node. I'll look into Stackdriver to monitor CPU usage.

I am not the project owner, but I am an Editor (with all associated permissions in that bundle), and can ask the Owner for further permissions should I require them.
SSH via console is disabled, and from local workstation won't work since there is no public IP for each compute node.

Maxime Hugues

Nov 7, 2019, 12:25:00 PM
to Austin P, google-cloud-slurm-discuss
Hi Austin,

Do you mean you can't access https://ssh.cloud.google.com/cloudshell
and run gcloud compute ssh slurm4-compute3 from there?

What happens when you run gcloud compute ssh slurm4-compute3 --internal-ip on the login node?

Does adding --exclusive to the batch script not allocate an entire compute node to your job on your Slurm cluster?
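
For reference, a batch script that asks for a whole node might look like the sketch below. The job name and the commented-out application line are placeholders, and the CPU count is written in the long form --cpus-per-task=2 (the short form in the Slurm documentation is "-c 2"). The report line only prints variables that Slurm exports into the job environment, so it also runs harmlessly outside a job.

```shell
#!/bin/bash
#SBATCH --job-name=exclusive-demo   # placeholder name
#SBATCH --nodes=1                   # one node for this job
#SBATCH --exclusive                 # do not share the node with other jobs
#SBATCH --cpus-per-task=2           # long form of "-c 2"

# Print where the job landed, using variables Slurm sets inside a job.
report_allocation() {
  echo "job=${SLURM_JOB_ID:-none} node=$(uname -n) cpus_on_node=${SLURM_CPUS_ON_NODE:-unknown}"
}
report_allocation

# srun ./my_app   # placeholder for the real application
```

If two such jobs submitted with sbatch still report the same node name while both are running, that would point at the partition or scheduler configuration rather than at the script.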



Maxime Hugues, Ph.D.

HPC Cloud Consultant

737-236-1109 Mobile

500 W 2nd St, Suite 2400, Austin, TX, 78701



Keith Binder

Nov 7, 2019, 1:11:14 PM
to Austin P, google-cloud-slurm-discuss
You can use the IAP proxy (IAP TCP forwarding) to SSH into nodes that only have private IPs.

As you are not the project owner, this permission is not granted automatically.

You can follow this procedure to grant the role: IAP-Secured Tunnel User (roles/iap.tunnelResourceAccessor)
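
A sketch of granting that role from the command line, for whoever has permission to change the project's IAM policy (an Editor likely cannot run this themselves). The role ID roles/iap.tunnelResourceAccessor is the identifier behind "IAP-Secured Tunnel User"; the function name, project ID, and member below are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper; only the gcloud command inside it is real.
# $1: project ID, $2: IAM member, e.g. user:someone@example.com
grant_iap_tunnel_user() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: grant_iap_tunnel_user PROJECT_ID MEMBER" >&2
    return 1
  fi
  # Add a project-level binding for the IAP-Secured Tunnel User role.
  gcloud projects add-iam-policy-binding "$1" \
    --member="$2" \
    --role="roles/iap.tunnelResourceAccessor"
}
```

For example: grant_iap_tunnel_user my-project user:someone@example.com. IAP TCP forwarding also needs a firewall rule that allows ingress on TCP port 22 from 35.235.240.0/20, so that is worth checking if the 4033 'not authorized' error persists after the role is granted.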



Keith Binder

kbi...@google.com

Customer Engineer

Mobile: 201-887-6974




