[slurm-users] How to get command from a finished job


Gestió Servidors

Apr 30, 2020, 3:40:09 AM
to slurm...@lists.schedmd.com

Hello,

 

I would like to know if there is any way to get the same information I can get from a running or pending job in the queue with “scontrol show jobid=XXXX” when the job has finished. When it has finished, “scontrol show jobid=XXXX” doesn’t work and “sacct -j jobid” doesn’t show all the information I need. For example, with “scontrol show jobid” I can know what command has been submitted, its workdir, the stderr file and the stdout one. This information, I think, cannot be obtained when the job is finished and I run “sacct”.

 

Thanks.

Paul Edmon

Apr 30, 2020, 8:52:41 AM
to slurm...@lists.schedmd.com

No, that data is purged from the scheduler after completion.  So records of the job exist in your job completion log or in the sacct database.  The script that it ran is not saved, though I believe there are several bug requests in to SchedMD to add that feature.  People have come up with various home grown solutions to save that data.

You could always increase the length of time the scheduler keeps that data after completion by increasing MinJobAge:

MinJobAge
The minimum age of a completed job before its record is purged from Slurm's active database. Set the values of MaxJobCount and MinJobAge to ensure the slurmctld daemon does not exhaust its memory or other resources. The default value is 300 seconds. A value of zero prevents any job record purging. Jobs are not purged during a backfill cycle, so it can take longer than MinJobAge seconds to purge a job if using the backfill scheduling plugin. In order to eliminate some possible race conditions, the recommended minimum non-zero value for MinJobAge is 2.
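
Before changing MinJobAge, it can help to check the value the running controller is actually using. The following is a minimal sketch, assuming "scontrol show config" prints a line of the form "MinJobAge = 300 sec"; the function name show_min_job_age is made up for illustration.

```shell
# Print the MinJobAge value reported by the running slurmctld.
# Assumes "scontrol show config" emits lines like "MinJobAge = 300 sec".
show_min_job_age() {
    scontrol show config | awk -F'= *' '/^MinJobAge/ {print $2}'
}

# Only query the controller when scontrol is installed on this host.
if command -v scontrol >/dev/null 2>&1; then
    show_min_job_age
fi
```

A larger value keeps finished jobs visible to "scontrol show job" for longer, at the cost of more memory held by slurmctld.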

-Paul Edmon-

Luca Capello

Apr 30, 2020, 9:07:12 AM
to slurm...@lists.schedmd.com
Hi there,

On 4/30/20 2:52 PM, Paul Edmon wrote:
> No, that data is purged from the scheduler after completion. So records of the job exist in your job completion log or in the sacct database. The script that it ran is not saved, though I believe there are several bug requests in to SchedMD to add that feature. People have come up with various home grown solutions to save that data.

For example, SArchive was presented at this year's FOSDEM:

<https://fosdem.org/2020/schedule/event/job_script_archival/>

Thx, bye,
Luca

--
Dr. Luca Capello
Ingénieur HPC
Division du Système et des Technologies de l'Information et de la Communication
Université de Genève | 24 rue Général-Dufour
Tél +41 22 379 72 42 | Bureau 151
https://hpc-community.unige.ch
mailto:luca.c...@unige.ch


Bjørn-Helge Mevik

Apr 30, 2020, 9:35:12 AM
to slurm...@schedmd.com
Gestió Servidors <sysadm...@uab.cat> writes:

> For example, with "scontrol show jobid" I can know what command has
> been submitted, its workdir, the stderr file and the stdout one. This
> information, I think, cannot be obtained when the job is finished and
> I run "sacct".

The workdir is available with sacct, IIRC. For other types of
information, I believe you can add code to your job_submit.lua that stores
it in the job's AdminComment field, which sacct can display.
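
As a rough sketch of the retrieval side only: if a site does populate AdminComment at submit time, the field can be printed next to the workdir for a finished job. The function name show_finished_job is hypothetical, and what ends up in AdminComment depends entirely on the site's own job_submit.lua.

```shell
# Print job ID, working directory and AdminComment for one finished job.
# $1: job ID.
# -n drops the header, -P gives "|"-separated output, -X limits output
# to the job allocation itself (no individual steps).
show_finished_job() {
    sacct -j "$1" -n -P -X -o JobID,WorkDir,AdminComment
}

# Usage (job ID is a placeholder): show_finished_job 1234
```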

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

Ole Holm Nielsen

Apr 30, 2020, 12:09:32 PM
to slurm...@lists.schedmd.com
On 30-04-2020 15:34, Bjørn-Helge Mevik wrote:
> Gestió Servidors <sysadm...@uab.cat> writes:
>
>> For example, with "scontrol show jobid" I can know what command has
>> been submitted, its workdir, the stderr file and the stdout one. This
>> information, I think, cannot be obtained when the job is finished and
>> I run "sacct".
>
> The workdir is available with sacct, IIRC. For other types of
> information, I believe you can add code to your job_submit.lua that stores
> it in the job's AdminComment field, which sacct can display.

Yes, the command to print only the workdir is:

sacct -j $jobid -nP -o WorkDir

I have added this to my "showjob" command:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs

The other fields Command, StdErr, StdIn, StdOut are apparently not in
the Slurm database, see "man sacct".
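
Which fields a given installation offers can also be checked without reading the man page, since "sacct --helpformat" lists every field name that -o accepts. A small sketch, with the helper name sacct_has_field made up for illustration:

```shell
# Succeed if the installed sacct lists the given field name.
# $1: field name, matched case-insensitively as a whole word.
sacct_has_field() {
    sacct --helpformat | tr -s '[:space:]' '\n' | grep -qix "$1"
}

# Only probe when sacct is installed on this host.
if command -v sacct >/dev/null 2>&1; then
    for field in WorkDir AdminComment Command StdOut StdErr StdIn; do
        if sacct_has_field "$field"; then
            echo "$field: available"
        else
            echo "$field: not available"
        fi
    done
fi
```

The result depends on the installed Slurm version, so a field reported as missing here may exist on another cluster.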

/Ole

Luis Huang

Apr 30, 2020, 12:30:11 PM
to slurm...@lists.schedmd.com
We use the elasticsearch plugin. This information is kept in there.