[slurm-dev] gpu resource sharing (or not sharing) with gang scheduling


Satrajit Ghosh

Oct 23, 2014, 10:04:16 PM
to slurm-dev
hi all,

Is there a way to suspend GPU jobs with gang scheduling? Or, if not, is there a way to ensure that GPU jobs don't enter the suspend state?

Here is a simplified description of the problem, using a hypothetical cluster of one node with two GPUs and gang scheduling enabled.

timeline
- 4 gpu jobs submitted with --gres=gpu:1
- 2 gpu jobs start running
- 30s later gang scheduling kicks in
- suspends the two running jobs
- the next two jobs are then started
- these two new jobs are terminated because the CUDA devices are not available
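
(For concreteness, the four submissions above might look something like the
following; the job script name is just a placeholder.)

    # submit four single-GPU jobs to the hypothetical two-GPU node
    for i in 1 2 3 4; do
        sbatch --gres=gpu:1 run_cuda_job.sh
    done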

cheers,

satra

je...@schedmd.com

Oct 24, 2014, 1:25:21 PM
to slurm-dev

Quoting Satrajit Ghosh <sa...@mit.edu>:

> hi all,
>
> Is there a way to suspend GPU jobs with gang scheduling?

No, GRES are not released on job suspend.

> Or, if not, is
> there a way to ensure that GPU jobs don't enter the suspend state?

Run them from a non-preemptable partition or QOS per your configuration. See:
http://slurm.schedmd.com/preempt.html
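
For example, with PreemptType=preempt/partition_prio, something along these
lines might work (a sketch only; the partition and node names are
placeholders, and the right settings depend on your configuration -- see the
preemption guide above):

    # slurm.conf (excerpt)
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG
    # gang-scheduled partition for CPU work
    PartitionName=batch Nodes=node01 Default=YES Shared=FORCE:1
    # GPU partition: preemption and suspension disabled for its jobs
    PartitionName=gpu   Nodes=node01 PreemptMode=OFF

GPU jobs would then be submitted with something like
"sbatch -p gpu --gres=gpu:1 ...".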

> Here is a simplified description of the problem, using a hypothetical cluster
> of one node with two GPUs and gang scheduling enabled.
>
> timeline
> - 4 gpu jobs submitted with --gres=gpu:1
> - 2 gpu jobs start running
> - 30s later gang scheduling kicks in
> - suspends the two running jobs
> - the next two jobs are then started
> - these two new jobs are terminated because the CUDA devices are not available
>
> cheers,
>
> satra


--
Morris "Moe" Jette
CTO, SchedMD LLC

Satrajit Ghosh

Oct 24, 2014, 4:28:18 PM
to slurm-dev
thanks!

cheers,

satra
