[slurm-users] Can SLURM queue different jobs to start concurrently?


Dan Healy via slurm-users

Jul 8, 2024, 4:20:00 PM
to slurm...@schedmd.com
Hi there,

I've received a question from an end user, to which I presume the answer is "No", but I would like to ask the community first.

Scenario: The user wants to create a series of jobs that all need to start at the same time. Example: there are 10 different executable applications which have varying CPU and RAM constraints, all of which need to communicate via TCP/IP. Of course the user could design some type of idle/statusing mechanism to wait until all jobs are randomly started, then begin execution, but this feels like a waste of resources. The complete execution of these 10 applications would be considered a single simulation. The goal would be to distribute these 10 applications across the cluster and not necessarily require them all to execute on a single node.

Is there a good architecture for this using SLURM? If so, please kindly point me in the right direction.

--
Thanks,

Daniel Healy

Lloyd Brown via slurm-users

Jul 8, 2024, 4:34:36 PM
to slurm...@lists.schedmd.com

I'm confused.  Why can't they just use a multi-node job, and have the job script farm out the individual tasks to the various workers through some mechanism (srun, mpirun, ssh, etc.)?  AFAIK, there's nothing preventing a job from using resources on multiple hosts.  The job just needs to have some way of pushing the work out to those hosts.

Lloyd

-- 
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu

Davide DelVento via slurm-users

Jul 8, 2024, 4:36:01 PM
to Dan Healy, slurm...@schedmd.com
I think the best way to do it would be to schedule the 10 things as a single Slurm job and then use one of the various MPMD approaches (the nitty-gritty details depend on whether each executable is serial, OpenMP, MPI or hybrid).
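
One MPMD route is srun's --multi-prog option, which reads a configuration file that maps task ranks to executables. The following is a minimal sketch in Python that generates such a file; the executable names and task counts are hypothetical placeholders, not anything from the original question. Note that all tasks in one job step share the same per-task resource settings, so this fits best when the applications have similar CPU and memory needs.

```python
def multi_prog_config(apps):
    """Build the text of an srun --multi-prog configuration file.

    apps: list of (executable, ntasks) pairs. Names are hypothetical.
    Returns (config_text, total_tasks).
    """
    lines = []
    rank = 0
    for exe, ntasks in apps:
        last = rank + ntasks - 1
        # A single rank is written as "N", a contiguous range as "N-M".
        ranks = str(rank) if ntasks == 1 else f"{rank}-{last}"
        lines.append(f"{ranks} {exe}")
        rank = last + 1
    return "\n".join(lines) + "\n", rank


if __name__ == "__main__":
    # Hypothetical simulation made of three different programs.
    text, total = multi_prog_config(
        [("./controller", 1), ("./solver", 3), ("./logger", 1)]
    )
    with open("sim.conf", "w") as fh:
        fh.write(text)
    # Inside the job script, the step would then be launched with:
    print(f"srun --ntasks={total} --multi-prog sim.conf")
```

Because every rank belongs to the same job step, all of the programs start together once the job is allocated.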

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Mike Robbert via slurm-users

Jul 8, 2024, 4:49:38 PM
to Dan Healy, slurm...@schedmd.com

Dan,

The varying CPU and RAM requirements sound like they could be met with the Heterogeneous Jobs feature (https://slurm.schedmd.com/heterogeneous_jobs.html) of Slurm. Take a look at that document and see if it meets your needs.
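
As a rough illustration of what a heterogeneous job might look like, here is a minimal, untested sketch in Python that generates a batch script with one component per application. The executable names and resource figures are hypothetical, and the exact options your site needs may differ; the linked document is the authority. Each component gets its own resource request, the components are separated by "#SBATCH hetjob" lines, and each application is launched in its own component with srun --het-group.

```python
def hetjob_script(components):
    """Build the text of a heterogeneous-job batch script.

    components: list of dicts with hypothetical keys
      'exe'  - executable to run in that component
      'cpus' - CPUs per task for that component
      'mem'  - memory per node for that component, e.g. "8G"
    """
    lines = ["#!/bin/bash"]
    for i, comp in enumerate(components):
        if i > 0:
            # Separator between heterogeneous job components.
            lines.append("#SBATCH hetjob")
        lines.append(
            f"#SBATCH --ntasks=1 --cpus-per-task={comp['cpus']} --mem={comp['mem']}"
        )
    lines.append("")
    for i, comp in enumerate(components):
        # One job step per component, started in the background.
        lines.append(f"srun --het-group={i} {comp['exe']} &")
    # Keep the batch script alive until every step has finished.
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-application simulation.
    print(hetjob_script([
        {"exe": "./big_solver", "cpus": 16, "mem": "64G"},
        {"exe": "./small_monitor", "cpus": 1, "mem": "2G"},
    ]))
```

The printed script could be saved and submitted with sbatch. Slurm allocates all components of a heterogeneous job at the same time, which matches the requirement that the applications start together.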

 

Mike Robbert

Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research Computing

Information and Technology Solutions (ITS)

303-273-3786 | mrob...@mines.edu


