[slurm-users] Job runtime


Mahmood Naderan

Mar 14, 2018, 8:05:22 AM
to Slurm User Community List
Hi,
I see that Slurm reports a 35-minute duration for a completed job (g09):

[mahmood@rocks7 ~]$ sacct -j 30 --format=start,end,elapsed,time
Start               End                 Elapsed    Timelimit
------------------- ------------------- ---------- ----------
2018-03-14T06:07:17 2018-03-14T06:42:30 00:35:13   01:00:00
2018-03-14T06:07:17 2018-03-14T06:42:30 00:35:13


However, the program itself, which logs the run, says

Job cpu time: 0 days 0 hours 48 minutes 5.9 seconds.

The job script contains

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

Does anyone have an idea what causes that time difference?


Regards,
Mahmood

Shenglong Wang

Mar 14, 2018, 8:45:59 AM
to Slurm User Community List
Gaussian reports CPU time; sacct reports wall time here. Was Gaussian set up to run with 2 CPU cores?
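Shenglong's point can be checked with the numbers from the first message. A minimal sketch, assuming Gaussian's "Job cpu time" is CPU time summed over all of its threads or processes (the thread does not confirm this): dividing it by sacct's Elapsed gives the average number of cores the run kept busy.

```python
# Figures copied from the first message in this thread (job 30).
def to_seconds(h=0, m=0, s=0.0):
    """Convert hours/minutes/seconds to a float number of seconds."""
    return h * 3600 + m * 60 + s

wall = to_seconds(m=35, s=13)     # sacct Elapsed: 00:35:13
cpu = to_seconds(m=48, s=5.9)     # Gaussian: "Job cpu time: ... 48 minutes 5.9 seconds"
allocated_cpus = 2                # from #SBATCH --cpus-per-task=2

# Average number of cores kept busy over the wall-clock duration.
busy_cores = cpu / wall
print(f"average busy cores: {busy_cores:.2f} of {allocated_cpus}")  # prints 1.37 of 2
```

A value between 1 and the 2 allocated CPUs is consistent with Gaussian using both cores, though not fully, which is how summed CPU time can exceed wall time.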

Best,
Shenglong

Mahmood Naderan

Mar 14, 2018, 10:06:10 AM
to Slurm User Community List
I ran it again with the time command in front of g09.

The console output is

Wed Mar 14 09:15:58 EDT 2018
real 32m14.136s
user 53m56.946s
sys 2m17.855s
Wed Mar 14 09:48:12 EDT 2018


So the wall-clock time is roughly 32 minutes.
g09 says

Job cpu time: 0 days 0 hours 47 minutes 56.0 seconds.

If g09 reports user time, then it differs from the user time reported by the
time command by about 6 minutes. On the other hand, Slurm says

[mahmood@rocks7 ~]$ sacct -j 32 --format=elapsed,ncpus,cputime,UserCPU
Elapsed    NCPUS      CPUTime    UserCPU
---------- ---------- ---------- ----------
00:32:14   2          01:04:28   53:56.955
00:32:14   2          01:04:28   53:56.955


Slurm also uses the time output, but CPUTime is not clear to me.
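The arithmetic behind this comparison can be made explicit. A minimal sketch, assuming the `XmY.YYYs` format that bash's `time` keyword printed above; `parse_time_field` is a hypothetical helper, not part of any tool in this thread:

```python
import re

def parse_time_field(text):
    """Parse a bash `time` field such as '32m14.136s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Values copied from the `time` output and the g09 log in this message.
real = parse_time_field("32m14.136s")
user = parse_time_field("53m56.946s")
sys_ = parse_time_field("2m17.855s")
g09_cpu = 47 * 60 + 56.0   # "Job cpu time: ... 47 minutes 56.0 seconds"

print(f"user+sys        = {(user + sys_) / 60:.1f} min")
print(f"user - g09      = {(user - g09_cpu) / 60:.1f} min")
print(f"(user+sys)/real = {(user + sys_) / real:.2f} busy cores")
```

This gives user+sys of about 56.2 minutes, a gap of about 6.0 minutes between the `time` user figure and g09's reported CPU time, and roughly 1.74 busy cores on average. Slurm's UserCPU (53:56.955) agrees with the `time` user value to within a few milliseconds.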

Regards,
Mahmood

Chris Samuel

Mar 17, 2018, 8:38:03 AM
to slurm...@lists.schedmd.com
On Thursday, 15 March 2018 1:05:16 AM AEDT Mahmood Naderan wrote:

> Slurm also uses the time output, but CPUTime is not clear to me.

CPU time will generally be less than wall time because it doesn't include time
spent waiting for I/O, etc.
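Chris's point applies to a single thread of execution: while a process is blocked, the wall clock keeps running but no CPU time is charged to it. A minimal Python sketch, using `time.sleep()` as a stand-in for waiting on I/O (a simplification; both leave the process off the CPU):

```python
import time

def wall_and_cpu(seconds_blocked=0.2):
    """Return (wall, cpu) seconds for a task that mostly waits.

    time.sleep() stands in for an I/O wait: the process is blocked,
    so wall-clock time advances but almost no CPU time is charged.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # user + system CPU time of this process
    time.sleep(seconds_blocked)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

wall, cpu = wall_and_cpu()
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

With several busy threads or processes, as in the g09 runs in this thread, summed CPU time can instead exceed wall time.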

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Mahmood Naderan

Mar 17, 2018, 2:48:06 PM
to Slurm User Community List
Excuse me, but I think that is wrong. "Elapsed" is the wall-clock time.
"UserCPU" should be "NCPUS*Elapsed", so the meaning of "CPUTime" is not
clear to me. If it is "UserCPU + I/O" etc., then it should equal "user+sys"
as reported by the time command. However, in my example "CPUTime" is
01:04:28 while "user+sys" is about 00:56:00.
Regards,
Mahmood

Chris Samuel

Mar 18, 2018, 12:13:59 AM
to slurm...@lists.schedmd.com
On Sunday, 18 March 2018 5:46:50 AM AEDT Mahmood Naderan wrote:

> Excuse me but I think that is wrong.

I think we're talking at cross-purposes; I thought you were puzzled about why
CPU time was less than the total time in a general context (not purely within
Slurm).

> "Elapsed" is the wall clock time.

Correct.

> "UserCPU" should be "NCPUS*Elapsed".

No, that's CPUTime (as mentioned in the sacct manual page).
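This can be checked against job 32: multiplying NCPUS by Elapsed reproduces the CPUTime column exactly, which shows CPUTime is an allocation figure (CPUs reserved × time held) rather than measured usage. A minimal sketch; the helpers are hypothetical and handle only whole-second `[DD-]HH:MM:SS` values, not fractional fields such as UserCPU's `53:56.955`:

```python
def hms_to_seconds(text):
    """Parse a whole-second sacct duration '[DD-]HH:MM:SS' into seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def seconds_to_hms(total):
    """Format whole seconds as HH:MM:SS (hours may exceed 24)."""
    hours, rem = divmod(int(total), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

elapsed = hms_to_seconds("00:32:14")    # job 32, Elapsed
ncpus = 2                               # job 32, NCPUS
print(seconds_to_hms(ncpus * elapsed))  # prints 01:04:28, the CPUTime column
```

The roughly 8 minutes between this figure and user+sys (about 56:15) is allocated CPU time that the kernel did not charge to the program's user or system time.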

UserCPU is a measure of the time that the CPU spent running the program in
user space as reported by the kernel. Note the caveat about it on the sacct
manual page:

https://slurm.schedmd.com/sacct.html
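To see what "user space time as reported by the kernel" means in practice, a process can ask the kernel for the user and system CPU time of its finished children, the same split that `time` prints. A minimal, Unix-only sketch using Python's `resource.getrusage`; it only illustrates the kernel counters and is not how Slurm's accounting plugins collect UserCPU:

```python
import resource
import subprocess
import sys

def child_cpu_times(cmd):
    """Run cmd and return (user, sys) CPU seconds the kernel charged to it.

    RUSAGE_CHILDREN covers terminated, waited-for children, so take the
    difference before and after running the command.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime

# A small CPU-bound Python child stands in for a real program such as g09.
user, system = child_cpu_times(
    [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
)
print(f"user={user:.3f}s sys={system:.3f}s")
```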

Also note that how you tell Slurm to gather statistics affects its accuracy.
I've pretty much always used the cgroup method, as it seems to give me better
accuracy (especially for memory usage).

All the best,
Chris

Mahmood Naderan

Mar 18, 2018, 1:31:43 AM
to Slurm User Community List
Thanks for the explanation, Chris. I will read up on cgroups.

Regards,
Mahmood

Chris Samuel

Mar 18, 2018, 3:39:53 AM
to slurm...@lists.schedmd.com
On Sunday, 18 March 2018 4:30:34 PM AEDT Mahmood Naderan wrote:

> Thanks for the explanation, Chris. I will read up on cgroups.

My pleasure! The Slurm docs on it are here:

https://slurm.schedmd.com/cgroups.html

I've been using cgroups for all three abilities (process tracking, task
management and accounting) since 2013 and have found that it works really well.
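The three abilities correspond to three slurm.conf parameters: `ProctrackType=proctrack/cgroup`, `TaskPlugin=task/cgroup` and `JobAcctGatherType=jobacct_gather/cgroup` (the cgroup plugins are further configured in `cgroup.conf`). Below is a minimal sketch of a hypothetical helper, not part of Slurm, that reports which of them a given slurm.conf text enables. It ignores `Include` directives and assumes each of these parameters sits on its own `key=value` line:

```python
# Parameter -> plugin value that selects the cgroup implementation.
CGROUP_SETTINGS = {
    "ProctrackType": "proctrack/cgroup",             # process tracking
    "TaskPlugin": "task/cgroup",                     # task management (may be in a list)
    "JobAcctGatherType": "jobacct_gather/cgroup",    # accounting
}

def cgroup_usage(conf_text):
    """Return {parameter: bool} telling whether each ability uses cgroup."""
    values = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        values[key.lower()] = value                  # keys compared case-insensitively
    result = {}
    for key, wanted in CGROUP_SETTINGS.items():
        plugins = [p.strip() for p in values.get(key.lower(), "").split(",")]
        result[key] = wanted in plugins
    return result

example = """
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
JobAcctGatherType=jobacct_gather/linux
"""
print(cgroup_usage(example))
# prints {'ProctrackType': True, 'TaskPlugin': True, 'JobAcctGatherType': False}
```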

Mahmood Naderan

Apr 13, 2018, 12:28:33 PM
to Slurm User Community List
Hi Chris,
I am confused by the CPU time values in sacct. For a multi-node MPI job,
I see these values:

[mahmood@rocks7 ~]$ sacct --format=jobid,user,cputime,elapsed,totalcpu,ncpus
JobID        User      CPUTime    Elapsed    TotalCPU   NCPUS
------------ --------- ---------- ---------- ---------- ----------
24           mahmood   23:01:25   00:11:25   22:57:37   121
24.batch               06:05:20   00:11:25   06:04:16   32
24.0                   00:34:15   00:11:25   16:53:21   3


The questions are:
1- Why is CPUTime greater than TotalCPU?
2- Considering CPUTime, why is the sum of 24.batch and 24.0 not equal to 24?
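Both questions can be approached arithmetically from the table. A minimal sketch, assuming CPUTime is NCPUS × Elapsed as the sacct manual page describes: it checks that relation on every line and sums the steps' TotalCPU. The helpers are hypothetical and handle only whole-second `HH:MM:SS` values.

```python
def hms_to_seconds(text):
    """Parse a whole-second 'HH:MM:SS' duration into seconds."""
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return (hours * 60 + minutes) * 60 + seconds

def seconds_to_hms(total):
    """Format whole seconds as HH:MM:SS."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# (JobID, CPUTime, Elapsed, TotalCPU, NCPUS) copied from the sacct output above.
rows = [
    ("24",       "23:01:25", "00:11:25", "22:57:37", 121),
    ("24.batch", "06:05:20", "00:11:25", "06:04:16", 32),
    ("24.0",     "00:34:15", "00:11:25", "16:53:21", 3),
]

for job_id, cputime, elapsed, totalcpu, ncpus in rows:
    # CPUTime is an allocation figure: NCPUS x Elapsed for that line.
    assert seconds_to_hms(ncpus * hms_to_seconds(elapsed)) == cputime, job_id

# TotalCPU is measured usage; compare the sum over steps with the job line.
step_total = sum(hms_to_seconds(r[3]) for r in rows[1:])
print(seconds_to_hms(step_total), rows[0][3])
```

All three CPUTime values match NCPUS × Elapsed, so the job line's 23:01:25 reflects all 121 allocated CPUs and is not the sum of its steps' CPUTime; the steps' TotalCPU values, by contrast, add up exactly to the job's 22:57:37. On the job line, CPUTime exceeding TotalCPU by about 4 minutes means the allocated CPUs were not busy the whole time. Step 24.0 is the odd one: its TotalCPU (16:53:21) is far more than 3 CPUs could deliver in 11:25, so its NCPUS value does not seem to reflect the CPUs its processes actually used; the thread does not explain why.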

Thanks for your help.


Regards,
Mahmood