batching jobs from cloud functions


kuber

Jul 29, 2020, 11:40:37 AM
to google-cloud-slurm-discuss
hi,

i followed the tutorial on setting up my first slurm cluster on google cloud platform without problems.
once logged in to the slurm login virtual machine instance, batching jobs with sbatch was not a problem either.

my question is: how can i batch jobs from inside a node.js cloud function (that i control via REST)?

if i spawn a command like:

gcloud compute ssh --zone <my-zone> <my-cluster> --tunnel-through-iap --project <my-project> --command="sbatch <my-script>"

i receive an error because "gcloud" is not present in the cloud function environment.

it's ok for me to ssh into the login machine and run the command, but i cannot manage to get a proper external ip for that instance.

can you please suggest the right way to batch jobs from inside cloud functions?

thank you!

Ward Harold

Jul 29, 2020, 11:48:07 AM
to kuber, google-cloud-slurm-discuss
A simpler, but similar, approach might be to use Cloud Run rather than Cloud Functions. Using Cloud Run you can build a container that has the Cloud SDK installed so that you can use 'gcloud compute ssh ...' 
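
A minimal sketch of that idea, assuming a Node.js service running in a Cloud Run container whose image has the Cloud SDK installed. The helper names `buildGcloudArgs` and `submitViaGcloud` and their parameter names are hypothetical; the flags are the ones from the command in the original question. Passing the arguments as an array to `execFile` avoids shell quoting problems with the script path.

```javascript
const { execFile } = require("child_process");

// Build the argument list for `gcloud compute ssh`, mirroring the command
// from the original question. Parameter names are illustrative.
function buildGcloudArgs({ zone, instance, project, script }) {
  return [
    "compute", "ssh", instance,
    "--zone", zone,
    "--project", project,
    "--tunnel-through-iap",
    `--command=sbatch ${script}`,
  ];
}

// Run sbatch on the login node through gcloud. Assumes `gcloud` is on PATH,
// which holds only if the container image installs the Cloud SDK.
function submitViaGcloud(params) {
  return new Promise((resolve, reject) => {
    execFile("gcloud", buildGcloudArgs(params), (err, stdout, stderr) => {
      if (err) return reject(new Error(stderr || err.message));
      resolve(stdout.trim());
    });
  });
}
```

An HTTP handler in the container could call `submitViaGcloud` with values taken from the incoming request and return the sbatch output to the caller.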

--
... WkH
Ward Harold | Solutions Architect | w...@google.com | 512-751-9198

Marco Di Benedetto

Jul 29, 2020, 1:56:10 PM
to Ward Harold, google-cloud-slurm-discuss
Thank you, Ward.

Your solution is quite simple and fast :)

I was wondering: is there a more direct solution that doesn't require putting another VM in place? Any insight on how to communicate with the cluster via a standard HTTP request (possibly from Cloud Functions)?

Thanks again,
M.

Ward Harold

Jul 29, 2020, 2:36:24 PM
to Marco Di Benedetto, google-cloud-slurm-discuss
With the latest version of Slurm there is a daemon that provides an HTTP API. I don't have hands-on experience with it (yet) but I believe you could make HTTP requests to it from your Cloud Function.

Also, note that Cloud Run doesn't actually create another VM. The container gets activated on the underlying infrastructure, either Google's managed infrastructure or your own GKE cluster. 

Keith Binder

Jul 29, 2020, 2:42:31 PM
to Ward Harold, Marco Di Benedetto, google-cloud-slurm-discuss

As Ward mentions, there is slurmrestd (it requires version 20 of Slurm to be installed), which can be used to make REST calls.


This presentation from the annual Slurm Users Group provides some insight.
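
To make the slurmrestd suggestion more concrete, here is a rough sketch of how a Node.js Cloud Function might assemble a job submission request. Everything specific is an assumption to check against the slurmrestd documentation for your Slurm version: the `/slurm/v0.0.35/job/submit` path, the `{ job, script }` body shape and its fields, and the hypothetical `baseUrl`, `user`, `token`, and `jobName` parameters. It also assumes the function can reach the cluster's network, which the sketch does not address.

```javascript
// Assemble the pieces of an HTTP job submission for slurmrestd.
// The path, body shape, and field names are assumptions, not a verified API.
function buildSubmitRequest({ baseUrl, user, token, script, jobName }) {
  return {
    url: `${baseUrl}/slurm/v0.0.35/job/submit`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
      },
      body: JSON.stringify({
        // Illustrative job description; real submissions may need more fields.
        job: { name: jobName, environment: { PATH: "/bin:/usr/bin" } },
        script,
      }),
    },
  };
}

// Send the request. Assumes a runtime with a global fetch (Node.js 18+).
async function submitJob(params) {
  const { url, options } = buildSubmitRequest(params);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`slurmrestd returned HTTP ${res.status}`);
  return res.json();
}
```

Keeping the request construction in a pure function makes it easy to inspect or unit test the payload before pointing the function at a real cluster.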


Marco Di Benedetto

Jul 31, 2020, 12:27:30 PM
to Keith Binder, Ward Harold, google-cloud-slurm-discuss
thank you all for your answers,

i am using the gcp slurm version, so enabling slurmrestd seems like the right choice.

i modified the slurm_version parameter of slurm-cluster.yaml to the latest one (20.02), but i still have some problems.
basically, the process of deploying the cluster seems to hang somewhere: the slurm cluster shows "still installing" for an hour or so.

what am i doing wrong?

then i checked the installation script, specifically scripts/setup.py.
i changed line 521 from:
util.run("../configure --prefix={} --sysconfdir={}/etc"
to
util.run("../configure --prefix={} --sysconfdir={}/etc --enable-slurmrestd"

but the process hung, with everything still stuck on "installing slurm" (or something like that).
clearly i am doing something wrong.

can you please instruct me on the right process?
or maybe the latest slurm for gcp is not yet ready for slurmrestd?

thank you so much,
m.





--
Marco Di Benedetto, Ph.D.

Co-Founder at Transform and Lighting S.r.l.

Researcher at Italian National Research Council (CNR)

Brian Christiansen

Aug 3, 2020, 11:26:25 AM
to Marco Di Benedetto, Keith Binder, Ward Harold, google-cloud-slurm-discuss
You can try this branch that we are testing:

e.g.
export $(/apps/slurm/current/bin/scontrol token lifespan=999)
curl -s -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_JWT http://localhost/slurm/v0.0.35/diag
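
The same two steps can be sketched in Node.js, which may be closer to what a Cloud Function would do. `scontrol token` prints a `SLURM_JWT=<token>` line (which is why the `export $(...)` form works), so the hypothetical `parseSlurmToken` helper just pulls the value out, and `fetchDiag` repeats the curl call. The function names and the `baseUrl` parameter are illustrative, and a global `fetch` (Node.js 18+) is assumed.

```javascript
// Extract the token from `scontrol token` output of the form "SLURM_JWT=<token>".
function parseSlurmToken(output) {
  const match = /^SLURM_JWT=(\S+)$/m.exec(output);
  if (!match) throw new Error("no SLURM_JWT found in scontrol output");
  return match[1];
}

// Build the two authentication headers used by the curl example.
function slurmHeaders(user, token) {
  return { "X-SLURM-USER-NAME": user, "X-SLURM-USER-TOKEN": token };
}

// Query the diag endpoint, e.g. with baseUrl "http://localhost".
async function fetchDiag(baseUrl, user, token) {
  const res = await fetch(`${baseUrl}/slurm/v0.0.35/diag`, {
    headers: slurmHeaders(user, token),
  });
  if (!res.ok) throw new Error(`slurmrestd returned HTTP ${res.status}`);
  return res.json();
}
```

How the token reaches the function (for example, generated on the login node and stored as a secret) is left open here; tokens expire after the requested lifespan, so a long-running setup would need some way to refresh them.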

Marco Di Benedetto

Aug 5, 2020, 5:58:18 AM
to Brian Christiansen, Keith Binder, Ward Harold, google-cloud-slurm-discuss
thank you very much brian, it seems i had not looked at the git branches.
i'll try using it, after i figure out how to connect from a cloud function to the compute node.

thanks,
m.
