Get info about past jobs on Slurm?


Mark Perri

Jun 24, 2016, 2:49:50 PM
to saga-users
Is it possible to use SAGA-Python to get information about past jobs I've run on Slurm? (Jobs that have long since finished and that I no longer have a job object for.)

I'd like to run a new Python script, call jobservice.get_job(3052290), and access information about the job's run time, whether it finished or was canceled, etc., just as if I had run sacct -j 3052290.

Right now I only know how to do that for jobs that are currently running, via jobservice.list(), or for jobs I've created where I still have a handle to the job container.
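
For reference, this is roughly what that looks like on my end today (a sketch; the Comet URL is just the endpoint I happen to use):

        import saga

        # connect to Comet's SLURM scheduler over ssh
        js = saga.job.Service("slurm+ssh://comet.sdsc.edu")

        # this only covers jobs that the scheduler still knows about
        for job_id in js.list():
            job = js.get_job(job_id)
            print job_id, job.get_state()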

Thanks,
Mark


Andre Merzky

Jun 25, 2016, 1:25:03 PM
to saga-...@googlegroups.com
Hi Mark,

the slurm job adaptor has some code in place to query job state via
sacct, see [1]. Apparently that does not address your use case --
would you mind clarifying it? Is it that you can't create a job
instance with the given job ID, for some reason? If you don't mind,
please share a small code snippet which reproduces the problem...
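
For context, the pattern behind [1] is roughly the following (a
simplified sketch, not the adaptor's actual code; the adaptor runs
sacct on the remote machine, and the state mapping is abbreviated):

        import subprocess
        import saga

        # abbreviated mapping from SLURM accounting states to SAGA job states
        _STATE_MAP = {'COMPLETED': saga.job.DONE,
                      'CANCELLED': saga.job.CANCELED,
                      'FAILED'   : saga.job.FAILED,
                      'RUNNING'  : saga.job.RUNNING,
                      'PENDING'  : saga.job.PENDING}

        def state_from_sacct(slurm_job_id):
            # sacct still knows about jobs that have left the queue
            out = subprocess.check_output(['sacct', '-j', str(slurm_job_id),
                                           '-o', 'State', '-n', '-P'])
            # first token only: sacct may report e.g. 'CANCELLED by <uid>'
            state = out.splitlines()[0].strip().split()[0]
            return _STATE_MAP.get(state, saga.job.UNKNOWN)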

thanks, Andre.


[1] https://github.com/radical-cybertools/saga-python/blob/devel/src/saga/adaptors/slurm/slurm_job.py#L1029



--
99 little bugs in the code.
99 little bugs in the code.
Take one down, patch it around.

127 little bugs in the code...

Mark Perri

Jun 26, 2016, 11:25:56 PM
to saga-users
Hi Andre,

Thanks for getting back to me; I didn't realize that I could pass any job ID to service.get_job.  I can call job.get_state(), which correctly returns Done.  Is there any way to get the run time and the number of cores that were used for that job?  If I try:

        import saga

        # jsComet is the job service for Comet's SLURM scheduler
        jsComet = saga.job.Service("slurm+ssh://comet.sdsc.edu")
        job = jsComet.get_job("[slurm+ssh://comet.sdsc.edu]-[3183494]")
        print job.get_state()
        jd = job.get_description()
        print jd.processes_per_host
        print jd.total_cpu_count

I get:

Done
None
None

Andre Merzky

Jun 27, 2016, 3:10:42 AM
to saga-...@googlegroups.com
Hi Mark,

thanks for the clarification, that certainly makes sense: if sacct
has that information, we should indeed be able to provide it. I've
opened a feature request ticket for this item at [1]. Alas, I can't
give you a time estimate for when we'll get around to fixing this...
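
Until that lands, one workaround is to ask sacct for those fields
yourself, for example over ssh to the cluster's login node (a sketch;
'Elapsed' and 'NCPUS' are standard sacct format fields, and the host
and job ID below are just your example values):

        import subprocess

        def sacct_info(host, slurm_job_id):
            # run sacct remotely and pull out state, run time and core count
            out = subprocess.check_output(
                ['ssh', host, 'sacct', '-j', str(slurm_job_id),
                 '-o', 'State,Elapsed,NCPUS', '-n', '-P'])
            state, elapsed, ncpus = out.splitlines()[0].strip().split('|')
            return {'state': state, 'elapsed': elapsed, 'ncpus': int(ncpus)}

        print sacct_info('comet.sdsc.edu', 3183494)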

Thanks, Andre.


[1] https://github.com/radical-cybertools/saga-python/issues/562

Mark Santcroos

Jun 27, 2016, 3:19:20 AM
to saga-...@googlegroups.com
Hi Mark,

In addition to Andre's message: a pull request would of course be most welcome too!
Note that the main SLURM system I have access to doesn't provide sacct history, though.

Mark