[slurm-users] Checking memory requirements in job_submit.lua


Prentice Bisbal

Jun 13, 2018, 2:00:06 PM
to slurm...@lists.schedmd.com
In my environment, we have several partitions that are 'general access',
with each partition providing different hardware resources (IB, large
mem, etc). Then there are other partitions that are for specific
departments/projects. Most of this configuration is historical, and I
can't just rearrange the partition layout, etc., which would allow Slurm
to apply its own logic to redirect jobs to the appropriate nodes.

For the general access partitions, I've decided to apply some of this logic
in my job_submit.lua script. This logic would look at some of the job
specifications and change the QOS/Partition for the job as appropriate.
One thing I'm trying to do is have large memory jobs be assigned to my
large memory partition, which is named mque for historical reasons.

To do this, I have added the following logic to my job_submit.lua script:

if job_desc.pn_min_mem > 65536 then
    slurm.user_msg("NOTICE: Partition switched to mque due to memory requirements.")
    job_desc.partition = 'mque'
    job_desc.qos = 'mque'
    return slurm.SUCCESS
end

This works when --mem is specified, but doesn't seem to work when
--mem-per-cpu is specified. What is the best way to check this when
--mem-per-cpu is specified instead? Logically, one would have to calculate

mem per node = ntasks_per_node * ( ntasks_per_core / min_mem_per_cpu )

Is this correct? If so, are there any flaws in the logic/variable names
above? Also, is this quantity automatically calculated in Slurm by a
variable that is accessible by job_submit.lua at this point, or do I
need to calculate this myself?
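To make the question concrete, here is the kind of helper I have in mind,
as an untested sketch. Note that I'm multiplying by an assumed
cpus_per_task field rather than dividing by ntasks_per_core as above;
the exact field names and the arithmetic are precisely what I'm unsure
about:

```lua
-- Untested sketch: estimate requested memory per node (in MB) when
-- --mem-per-cpu is given. The field names (ntasks_per_node,
-- cpus_per_task, min_mem_per_cpu) are my guesses at what job_desc
-- exposes at this point; I haven't verified them.
local function est_mem_per_node(job_desc)
    local tasks_per_node = job_desc.ntasks_per_node or 1
    local cpus_per_task  = job_desc.cpus_per_task or 1
    local mem_per_cpu    = job_desc.min_mem_per_cpu or 0
    return tasks_per_node * cpus_per_task * mem_per_cpu
end
```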


--
Prentice


Prentice Bisbal

Jun 14, 2018, 1:39:20 PM
to slurm...@lists.schedmd.com
I've given up on calculating mem per node when --mem-per-cpu is
specified. I was hoping to do this to protect my users from themselves,
but the more I think about this, the more this looks like a fool's errand.

Prentice


Hendryk Bockelmann

Jun 15, 2018, 2:08:33 AM
to slurm...@lists.schedmd.com
Hi,

based on information given in job_submit_lua.c we decided not to use
pn_min_memory any more. The comment in src says:

/*
* FIXME: Remove this in the future, lua can't handle 64bit
* numbers!!!. Use min_mem_per_node|cpu instead.
*/

Instead, in job_submit.lua we check for something like

if (job_desc.min_mem_per_node ~= nil) and
   (job_desc.min_mem_per_node == 0) then
    slurm.log_user("minimum real mem per node specified as %u",
                   job_desc.min_mem_per_node)
end

For mem-per-cpu, things are more confusing. Somehow min_mem_per_cpu =
2^63 = 0x8000000000000000 if sbatch/salloc does not set --mem-per-cpu,
instead of being nil as expected!
But one can still check for

if (job_desc.min_mem_per_cpu == 0) then
    slurm.log_user("minimum real mem per CPU specified as %u",
                   job_desc.min_mem_per_cpu)
end
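Putting the two checks together, one could guard against the 2^63
sentinel like this. Sketch only, untested; note that the NO_VAL64
constant is defined locally here, Slurm does not export it to the Lua
script:

```lua
-- Sketch: treat the 2^63 sentinel as "not set". NO_VAL64 is written as
-- 2^63 (an ordinary Lua number, i.e. a double), since per the src
-- comment above Lua can't represent the 64-bit value exactly anyway.
local NO_VAL64 = 2^63

local function mem_per_cpu_is_set(min_mem_per_cpu)
    return min_mem_per_cpu ~= nil
       and min_mem_per_cpu > 0
       and min_mem_per_cpu < NO_VAL64
end
```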

Maybe this helps a bit.

CU,
Hendryk

Bjørn-Helge Mevik

Jun 18, 2018, 7:29:15 AM
to slurm...@schedmd.com
Prentice Bisbal <pbi...@pppl.gov> writes:

> if job_desc.pn_min_mem > 65536 then
>     slurm.user_msg("NOTICE: Partition switched to mque due to memory requirements.")
>     job_desc.partition = 'mque'
>     job_desc.qos = 'mque'
>     return slurm.SUCCESS
> end

Somewhat off-topic, but: So, does slurm.user_msg() now actually print a
message to users when one returns with slurm.SUCCESS? (It didn't use to
work.) Which version of slurm are you running?

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

Prentice Bisbal

Jan 14, 2019, 3:13:49 PM
to slurm...@lists.schedmd.com
Bjorn,

Sorry for the delayed reply. I didn't see this earlier (6 months ago!).
I'm just seeing it now as I clean up my inbox.

1. Yes, slurm.user_msg does actually print out a message to the user in
this case.

2. I was running 17.11.4 or 17.11.5 at the time. I've since upgraded to
18.08.

--

Prentice

Bjørn-Helge Mevik

Jan 15, 2019, 3:11:27 AM
to Slurm User Community List
Prentice Bisbal <pbi...@pppl.gov> writes:

> Sorry for the delayed reply. I didn't see this earlier (6 months
> ago!). I'm just seeing it now as I clean up my inbox.

That happens for me as well from time to time. :)

> 1. Yes, slurm.user_msg does actually print out a message to the user
> in this case.
>
> 2. I was running 17.11.4 or 17.11.5 at the time. I've since upgraded
> to 18.08.

Thanks for the info! We just upgraded to 18.08 yesterday, so this is
good news.

--
Cheers,