Slurm GCP Marketplace: "Slurm is currently being configured in the background"


Will

Apr 23, 2021, 9:24:21 PM
to google-cloud-slurm-discuss
Hi all,

I tried to deploy the new Slurm GCP from the Marketplace. It deployed without errors and created the login and controller machines, but when I SSH into the login node it always shows "Slurm is currently being configured in the background", even 6 hours after deployment.
I tried another deployment, but it is still stuck on this message.

I created a simple cluster with 3 partitions. The config I set during the Marketplace deployment is included below.
Should I wait longer for the Slurm cluster to be configured, or is there an issue somewhere in my configuration?

TIA!

Here is the config I used:
compute_node_scopes
compute_node_service_account default
controller_disk_size_gb 200.0
controller_disk_type pd-standard
controller_labels []
controller_machine_type n2-standard-48
controller_secondary_disk false
controller_secondary_disk_size_gb 100.0
controller_secondary_disk_type pd-standard
controller_service_account default
external_compute_ips false
external_controller_ip true
external_login_ips true
login_disk_size_gb 100.0
login_disk_type pd-standard
login_labels []
login_machine_type n2-standard-48
login_network_storage []
login_node_count 1.0
login_node_service_account default
network_storage []
partitions
- compute_disk_size_gb: 30.0 
 compute_disk_type: pd-standard 
 compute_labels: [] 
 enable_placement: false 
 exclusive: false 
 gpu_count: 0.0 
 image_hyperthreads: false 
 machine_type: n1-standard-1 
 max_node_count: 10000.0 
 name: p1cpu 
 network_storage: [] 
 preemptible_bursting: true 
 region: us-central1 
 static_node_count: 0.0 
 vpc_subnet: default 
 zone: us-central1-a 
- compute_disk_size_gb: 50.0 
 compute_disk_type: pd-standard 
 compute_labels: [] 
 enable_placement: false 
 exclusive: false 
 gpu_count: 0.0 
 image_hyperthreads: false 
 machine_type: n2-standard-8 
 max_node_count: 10000.0 
 name: p8cpu 
 network_storage: [] 
 preemptible_bursting: true 
 region: us-central1 
 static_node_count: 0.0 
 vpc_subnet: default 
 zone: us-central1-a 
- compute_disk_size_gb: 150.0 
 compute_disk_type: pd-standard 
 compute_labels: [] 
 enable_placement: false 
 exclusive: false 
 gpu_count: 0.0 
 image_hyperthreads: false 
 machine_type: n2-standard-16 
 max_node_count: 10000.0 
 name: p16cpu 
 network_storage: [] 
 preemptible_bursting: true 
 region: us-central1 
 static_node_count: 0.0 
 vpc_subnet: default 
 zone: us-central1-a
private_google_access true
suspend_time 300.0
zone us-central1-a

Nick Ihli

Apr 26, 2021, 2:03:32 PM
to Will, google-cloud-slurm-discuss
Will,

It should only take a minute or two at most, so something is definitely going on. Can you provide your /root/image-scripts/setup.log?

We can take a look at it and see what might not be getting configured or set up.

Thanks,
Nick





Nick Ihli
Director, Cloud and Sales Engineering
ni...@schedmd.com



Nick Ihli

Apr 27, 2021, 4:14:32 PM
to google-cloud-slurm-discuss
Will,

This would be the better log actually.

sudo journalctl -o cat -u google-startup-scripts

Please provide that and we can look into the issue.

Thanks,
Nick
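
For readers following along: the startup log named above can be long, so it may help to filter it for lines that look like failures before reading it in full. The following is a minimal sketch, not part of the Slurm-GCP tooling. The function name find_problem_lines and the keyword list are assumptions made for illustration; adjust the keywords to whatever your log actually contains.

```python
import re
import sys

# Hypothetical keyword list (an assumption, not taken from Slurm-GCP);
# extend or narrow it to match what your startup log actually prints.
PROBLEM_PATTERN = re.compile(
    r"error|fail|timed out|unreachable|denied", re.IGNORECASE
)


def find_problem_lines(log_text):
    """Return (line_number, line) pairs whose text matches PROBLEM_PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if PROBLEM_PATTERN.search(line)
    ]


# Only read stdin when log text is piped in, so importing or running the
# file interactively does not block waiting for input.
if __name__ == "__main__" and not sys.stdin.isatty():
    for number, line in find_problem_lines(sys.stdin.read()):
        print(f"{number}: {line}")
```

If the script is saved as scan_log.py (a hypothetical file name), it could be used as: sudo journalctl -o cat -u google-startup-scripts | python3 scan_log.py. A keyword filter like this only narrows down where to look; the surrounding log lines are usually still needed to understand the cause.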

Will

May 25, 2021, 10:43:04 AM
to google-cloud-slurm-discuss
Hi Nick,

Thanks for your answer!
I found the issue, which was on my side: I had some network configuration issues that seemed to affect the Slurm deployment.
After fixing my network and firewall settings, it works very well!
Thanks!
Will