I'm also interested in this issue since I've come across the same error today. We built Slurm-18.08.1 with the contribs packages on Ubuntu Bionic and seff is also complaining with
Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
Hello Mike et al,
This is a known bug in Slurm v18.08*. We installed the initial release a short while ago and came across this issue very quickly. We actually use this script at the end of the job epilog to report job efficiency to users, so it is a real shame that it is now broken! The good news is that I am assured by SchedMD that the bug has been fixed in v18.08.3. Having said that, we will probably live with this issue rather than disrupt users with another upgrade so soon. We have found a few other minor bugs in the new version of Slurm, but I am glad to say that none of them are "life threatening".
If you're keen to have a workaround in the meantime then please feel free to use our replacement script, "seff_new" -- a copy is attached to this email. It's not the most elegant of scripts, but it does work.
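For readers without the attachment, here is a minimal Python sketch of how similar figures could be computed directly from sacct output. It is not the seff_new script and makes no claim about how that script works. It assumes sacct is on the PATH and that the standard accounting fields JobID, TotalCPU, Elapsed, AllocCPUS and MaxRSS are populated; the function names are made up for illustration.

```python
import subprocess


def slurm_time_to_seconds(text):
    """Convert a Slurm time string ([D-]HH:MM:SS or MM:SS.mmm) to seconds."""
    days, _, clock = text.rpartition("-")
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds + 86400 * int(days or 0)


def rss_to_bytes(text):
    """Convert a MaxRSS value such as '2048K' to bytes; empty means no data."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    if not text:
        return 0.0
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


def summarize(sacct_output):
    """Summarize pipe-delimited sacct lines for one job."""
    cpu = wall = 0.0
    cores = 0
    max_rss = 0.0
    for line in sacct_output.strip().splitlines():
        job_id, total_cpu, elapsed, alloc_cpus, rss = line.split("|")
        if "." not in job_id:
            # The line without a step suffix describes the whole allocation.
            cpu = slurm_time_to_seconds(total_cpu)
            wall = slurm_time_to_seconds(elapsed)
            cores = int(alloc_cpus)
        # Memory is reported per step, so keep the largest value seen.
        max_rss = max(max_rss, rss_to_bytes(rss))
    core_walltime = wall * cores
    efficiency = 100 * cpu / core_walltime if core_walltime else 0.0
    return {"cpu_efficiency": efficiency, "max_rss_bytes": max_rss}


def query_job(job_id):
    """Run sacct for one job and summarize it (needs a working Slurm install)."""
    output = subprocess.run(
        ["sacct", "-j", str(job_id), "--noheader", "--parsable2",
         "--format=JobID,TotalCPU,Elapsed,AllocCPUS,MaxRSS"],
        check=True, capture_output=True, text=True,
    ).stdout
    return summarize(output)
```

Calling query_job with a job ID returns the CPU efficiency as a percentage and the peak MaxRSS in bytes. Treating a missing MaxRSS as zero is a simplification chosen for this sketch.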
Best regards,
David
Thanks for this. We'll try the workaround script. It is not mission-critical, but our users have gotten accustomed to seeing these metrics at the end of each run and it's nice to have. We are currently doing this in a test VM environment, so by the time we actually upgrade the cluster perhaps the fix will be available.
Mike Cammilleri
Hi, thanks for all your answers, and sorry for the delay in replying. Yesterday I installed Slurm-18.08.3 on the controller machine to check whether the seff command works correctly with this latest release. The behavior has improved, but I still receive an error message:
# /usr/local/slurm-18.08.3/bin/seff 1694112
Use of uninitialized value $lmem in numeric lt (<) at /usr/local/slurm-18.08.3/bin/seff line 130, <DATA> line 624.
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 3.91 GB (3.91 GB/node)
[root@hydra ~]#
Because of this problem, every job is reported with 0.00 MB of memory utilized.
With slurm-17.11.1 it works fine:
# /usr/local/slurm-17.11.0/bin/seff 1694112
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 2.44 GB
Memory Efficiency: 62.57% of 3.91 GB
[root@hydra bin]#
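
As a cross-check, the percentages in the outputs above can be reproduced from the other reported fields. The short Python sketch below assumes that core-walltime is the number of cores multiplied by the wall-clock time, which matches the 00:02:20 shown for 2 cores and 00:01:10; it is not taken from the seff source.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the seff output for job 1694112.
cpu_utilized = hms_to_seconds("01:39:33")
wall_clock = hms_to_seconds("00:01:10")
cores = 2

# Assumption: core-walltime = cores * wall-clock time.
core_walltime = cores * wall_clock
cpu_efficiency = 100 * cpu_utilized / core_walltime

# Memory figures as printed by the 17.11 seff (already rounded to 2 decimals).
memory_efficiency = 100 * 2.44 / 3.91

print(f"core-walltime: {core_walltime} s")
print(f"CPU efficiency: {cpu_efficiency:.2f}%")
print(f"Memory efficiency: {memory_efficiency:.2f}%")
```

The CPU figure matches the reported 4266.43%. The memory figure comes out at about 62.4% instead of 62.57%, presumably because seff works from unrounded byte counts rather than the rounded GB values printed here.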
Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)
Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelang...@upf.edu
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de