[slurm-dev] Can't get formatted sinfo to work...

Belgin, Mehmet

Jun 16, 2017, 2:32:35 PM
to slurm-dev
I’m troubleshooting an issue that causes NHC to fail to offline a bad node. The node offline script uses formatted “sinfo” to identify the node status, which returns blank for some reason. Interestingly, sinfo works without custom formatting.

Could this be due to a bug in the current version (17.02.4)? Would someone mind trying the following commands in an older Slurm version to compare the output? 

[root@devel-vcomp1 nhc]# sinfo --version
slurm 17.02.4

[root@devel-vcomp1 nhc]# sinfo -o '%t %E' -hn `hostname`

(NOTHING!)

[root@devel-vcomp1 nhc]# sinfo -hn `hostname`
test         up   infinite      0    n/a
vtest*       up   infinite      0    n/a

(OK)

Thanks!

-Mehmet


=========================================
Mehmet Belgin, Ph.D.
Scientific Computing Consultant 
Partnership for an Advanced Computing Environment (PACE)
Georgia Institute of Technology
258 4th Street NW, Rich Building, #326 
Atlanta, GA  30332-0700
Office: (404) 385-0665



Loris Bennett

Jun 19, 2017, 2:28:46 AM
to slurm-dev

Hi Mehmet,

"Belgin, Mehmet" <mehmet...@oit.gatech.edu> writes:

> I’m troubleshooting an issue that causes NHC to fail to offline a bad
> node. The node offline script uses formatted “sinfo” to identify the
> node status, which returns blank for some reason. Interestingly, sinfo
> works without custom formatting.
>
> Could this be due to a bug in the current version (17.02.4)? Would
> someone mind trying the following commands in an older Slurm version
> to compare the output?
>
> [root@devel-vcomp1 nhc]# sinfo --version
> slurm 17.02.4
>
> [root@devel-vcomp1 nhc]# sinfo -o '%t %E' -hn `hostname`
>
> (NOTHING!)
>
> [root@devel-vcomp1 nhc]# sinfo -hn `hostname`
> test up infinite 0 n/a
> vtest* up infinite 0 n/a
>
> (OK)
>
> Thanks!
>
> -Mehmet
>

Seems to work as expected with our version:

[root@node003 ~]# sinfo --version
slurm 16.05.10-2
[root@node003 ~]# sinfo -o '%t %E' -hn `hostname`
mix none
[root@node003 ~]# sinfo -hn `hostname`
test up 3:00:00 0 n/a
main* up 14-00:00:0 1 mix node003
gpu up 14-00:00:0 0 n/a

HTH,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Belgin, Mehmet

Jun 19, 2017, 11:09:28 AM
to slurm-dev
Thank you Loris, it was my bad. I should have used the short hostname, which seems to be working for me as well:

$ sinfo -o '%t %E' -hn `hostname -s`
drain Testing
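
A minimal sketch of the same fix in script form, assuming the node names known to Slurm are short hostnames. The helper names short_name and node_state are hypothetical and are not part of NHC or Slurm; only the sinfo options already used in this thread appear in it.

```shell
#!/bin/sh
# Sketch only: query Slurm for a node's state and reason by its short name.
# Assumption: Slurm knows the node by its short hostname, not its FQDN.

# Strip everything from the first dot onward, e.g. "node.example.org" -> "node".
# For the local host this should give the same result as `hostname -s`.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: print "<state> <reason>" for the given host name.
node_state() {
    sinfo -h -o '%t %E' -n "$(short_name "$1")"
}

# Example call (needs a working Slurm installation):
#   node_state "$(hostname)"
```

If the node offline script passes the output of plain `hostname` to sinfo, routing it through a helper like short_name avoids the empty result described above.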

g...@cines.fr

Jun 19, 2017, 12:16:52 PM
to slurm-dev
Hello,

I'm using a job_submit plugin (written in C) to manage users' job submissions on our systems.

I would like to print an informational message on the terminal each time a job is submitted, but I can't find a way to do it.

Error messages work fine through the err_msg parameter of the job_submit function:

extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid, char **err_msg)




Best regards,
Gerard Gil


Centre Informatique National de l'Enseignement Superieur
950, rue de Saint Priest
34097 Montpellier CEDEX 5
FRANCE


Marcin Stolarek

Jun 19, 2017, 4:52:17 PM
to slurm-dev
I don't think it's possible to print a message when a job is accepted into the queue.

TO_Webmaster

Jun 19, 2017, 5:54:53 PM
to slurm-dev

Yes, unfortunately, this does not work out of the box. We now use the
AdminComment field to save the information and a small wrapper script
around sbatch that prints the admin comment field if submission
succeeded.
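
A rough sketch of what such a wrapper might look like, not the actual script described above. It assumes the job_submit plugin has already stored the message in the job's AdminComment field, and that the installed squeue accepts admin_comment as a field for -O/--Format (check the squeue man page for your version). The function names parse_job_id and submit_with_notice are hypothetical.

```shell
#!/bin/sh
# Sketch only: submit a job, then print the admin comment attached to it.

# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the job ID.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Hypothetical wrapper: pass all arguments through to sbatch.
submit_with_notice() {
    # Stop and return sbatch's exit status if the submission is rejected.
    out=$(sbatch --parsable "$@") || return $?
    job_id=$(parse_job_id "$out")
    echo "Submitted batch job $job_id"
    # Assumption: squeue supports the admin_comment field in -O/--Format.
    squeue -h -j "$job_id" -O admin_comment
}

# Example call (needs a working Slurm installation):
#   submit_with_notice myjob.sh
```

Rejected submissions still report their reason through err_msg as before; the wrapper only adds output for the success case.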

Ole Holm Nielsen

Jun 20, 2017, 3:50:55 AM
to slurm-dev, Belgin, Mehmet

Hi Mehmet,

Perhaps you need to configure NHC to use the short hostname, see the
example in
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-health-check

/Ole

On 06/19/2017 05:09 PM, Belgin, Mehmet wrote:
> Thank you Loris, it was my bad. I should have used the short hostname,
> which seems to be working for me as well:
>
> $ sinfo -o '%t %E' -hn `hostname -s`
> drain Testing
>

Manuel Rodríguez Pascual

Jun 20, 2017, 6:18:45 AM
to slurm-dev

It's maybe a bit of a hack, but I guess it could be done with a Spank
plugin. Just put whatever you want to print on the spank_init method,
as it is called every time a job is submitted. As a drawback, it would
also be printed with srun, and maybe when starting slurmctld (not a
big deal anyway).

Belgin, Mehmet

Jun 20, 2017, 10:20:15 AM
to slurm-dev, Belgin, Mehmet
Hi Ole, this certainly helps and I’ve found several other gems in this wiki that are extremely useful!

Thanks for sharing.