I tried to deploy the new Slurm on GCP offering from the marketplace. It deployed without errors and created the login and controller machines, but when I SSH into the login node it always shows "Slurm is currently being configured in the background", even 6 hours after deployment.
I tried a second deployment, but it is still stuck at this message.
I created a simple cluster with 3 partitions. The config I set during the marketplace deployment is included below.
Should I wait longer for the Slurm cluster to be configured, or is there an issue somewhere in my configuration?
Here is the config I used:
compute_node_scopes
compute_node_service_account default
controller_disk_size_gb 200.0
controller_disk_type pd-standard
controller_labels []
controller_machine_type n2-standard-48
controller_secondary_disk false
controller_secondary_disk_size_gb 100.0
controller_secondary_disk_type pd-standard
controller_service_account default
external_compute_ips false
external_controller_ip true
external_login_ips true
login_disk_size_gb 100.0
login_disk_type pd-standard
login_labels []
login_machine_type n2-standard-48
login_network_storage []
login_node_count 1.0
login_node_service_account default
network_storage []
partitions
- compute_disk_size_gb: 30.0
  compute_disk_type: pd-standard
  compute_labels: []
  enable_placement: false
  exclusive: false
  gpu_count: 0.0
  image_hyperthreads: false
  machine_type: n1-standard-1
  max_node_count: 10000.0
  name: p1cpu
  network_storage: []
  preemptible_bursting: true
  region: us-central1
  static_node_count: 0.0
  vpc_subnet: default
  zone: us-central1-a
- compute_disk_size_gb: 50.0
  compute_disk_type: pd-standard
  compute_labels: []
  enable_placement: false
  exclusive: false
  gpu_count: 0.0
  image_hyperthreads: false
  machine_type: n2-standard-8
  max_node_count: 10000.0
  name: p8cpu
  network_storage: []
  preemptible_bursting: true
  region: us-central1
  static_node_count: 0.0
  vpc_subnet: default
  zone: us-central1-a
- compute_disk_size_gb: 150.0
  compute_disk_type: pd-standard
  compute_labels: []
  enable_placement: false
  exclusive: false
  gpu_count: 0.0
  image_hyperthreads: false
  machine_type: n2-standard-16
  max_node_count: 10000.0
  name: p16cpu
  network_storage: []
  preemptible_bursting: true
  region: us-central1
  static_node_count: 0.0
  vpc_subnet: default
  zone: us-central1-a
private_google_access true
suspend_time 300.0
zone us-central1-a
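
To make the size of this request explicit, here is a minimal sketch that adds up the upper bounds implied by the three partitions above. The partition values are copied from the config. The helper names (`vcpus`, `summarize`) are hypothetical, and reading the vCPU count from the machine type suffix is an assumption that holds for the standard machine types listed here. I do not know whether the cluster size is related to the stuck message; this only spells out the numbers.

```python
# Partition values copied from the config above.
partitions = [
    {"name": "p1cpu", "machine_type": "n1-standard-1", "max_node_count": 10000},
    {"name": "p8cpu", "machine_type": "n2-standard-8", "max_node_count": 10000},
    {"name": "p16cpu", "machine_type": "n2-standard-16", "max_node_count": 10000},
]


def vcpus(machine_type):
    """Read the vCPU count from the suffix, e.g. 'n2-standard-8' -> 8.

    Assumption: only valid for machine types whose name ends in the
    vCPU count, as the standard types used in this config do.
    """
    return int(machine_type.rsplit("-", 1)[1])


def summarize(parts):
    """Return (name, max_nodes, max_vcpus) for each partition."""
    return [
        (p["name"], p["max_node_count"],
         vcpus(p["machine_type"]) * p["max_node_count"])
        for p in parts
    ]


summary = summarize(partitions)
total_nodes = sum(nodes for _, nodes, _ in summary)
total_vcpus = sum(cpus for _, _, cpus in summary)

for name, nodes, cpus in summary:
    print(f"{name}: up to {nodes} nodes, {cpus} vCPUs")
print(f"total: up to {total_nodes} nodes, {total_vcpus} vCPUs")
```

These are only the maximums the partitions allow when bursting; with static_node_count set to 0.0, no compute nodes are requested up front.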