[slurm-users] slurmdbd not showing job accounting

Dave Botsch

Oct 12, 2018, 4:09:08 PM
to slurm...@lists.schedmd.com
Hi.

I am setting up a new slurm cluster instance. And I just went through
what I thought were the right steps to get job accounting going with
slurmdbd.

So I know that slurmdbd itself works, as I can use the sacctmgr commands
to add users and accounts, and the users cannot run jobs unless I first
add them with sacctmgr.

What's interesting is that sreport is not showing any of the job
information pieces at all.

Eg:

So, I use srun to run a quick test job, and then:

$ sreport cluster UserUtilizationByAccount End=11/01/18

comes back empty.

sacct shows information, which is evidently still going to
/var/log/slurm_jobacct.log ...

even though in slurm.conf I have:

AccountingStorageType=accounting_storage/slurmdbd
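
One thing worth ruling out at this point: slurmctld only picks up slurm.conf changes after a restart or reconfigure, so the running daemon may still be using an older accounting setting than the file on disk. A minimal sketch, assuming scontrol is on the PATH; the helper name show_config_value is made up for illustration.

```shell
#!/bin/sh
# Extract one setting from `scontrol show config` output read on stdin.
# Lines look like "AccountingStorageType   = accounting_storage/slurmdbd".
# show_config_value is a hypothetical helper name, not part of Slurm.
show_config_value() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3 }'
}

# Only query the controller if scontrol is actually installed on this host.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show config | show_config_value AccountingStorageType
fi
```

If the value printed differs from what slurm.conf says, the controller has probably not reread its configuration yet.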


Now what's even more interesting is that as I was writing this email,
the output of sreport has updated to show more info, as if some
scheduled job someplace updated something. I don't see anything in cron,
though. As far as I can tell, the only thing I did was to use sacctmgr
to delete and re-add a user.

Any thoughts on what is causing sreport's output to update/not update?

Thanks!


--
********************************
David William Botsch
Programmer/Analyst
@CNFComputing
bot...@cnf.cornell.edu
********************************

Steven Dick

Oct 14, 2018, 12:32:00 AM
to slurm...@lists.schedmd.com
I've found that when creating a new cluster, slurmdbd does not
function correctly right away. It may be necessary to restart
slurmdbd at several points during the slurm installation process to
get everything working correctly.

Also, slurmctld will buffer the accounting data until slurmdbd starts
functioning correctly, so it is possible if you restart slurmdbd you
will find that all your missing accounting data shows up at once.

One of the critical steps is sacctmgr create cluster $clustername
where $clustername matches the ClusterName parameter in slurm.conf

You probably need to restart slurmdbd after doing that to make sure it
takes full effect.
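
The name-matching step above can be sketched as follows. This is an illustration only: the config path is an assumption, cluster_name_from_conf is a hypothetical helper, and the script prints the registration command rather than running it.

```shell
#!/bin/sh
# Read ClusterName from a slurm.conf-style file.
# cluster_name_from_conf is a hypothetical helper name, not part of Slurm.
cluster_name_from_conf() {
    awk -F= '
        /^[ \t]*#/ { next }                      # skip comment lines
        { key = $1; gsub(/[ \t]/, "", key) }
        tolower(key) == "clustername" {
            val = $2
            sub(/#.*/, "", val)                  # drop any trailing comment
            gsub(/[ \t]/, "", val)
            print val
            exit
        }
    ' "$1"
}

# Assumed location; adjust for your install.
conf=${SLURM_CONF:-/etc/slurm/slurm.conf}

if [ -r "$conf" ] && command -v sacctmgr >/dev/null 2>&1; then
    name=$(cluster_name_from_conf "$conf")
    echo "ClusterName in $conf: $name"
    echo "Clusters already known to slurmdbd:"
    sacctmgr -n -P show cluster format=cluster
    # Printed, not executed, so nothing is changed by running this sketch.
    echo "If $name is missing, register it with: sacctmgr add cluster $name"
fi
```

The sacctmgr man page describes add and create as interchangeable, so this matches the create form used above.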

Ole Holm Nielsen

Oct 14, 2018, 4:10:46 AM
to slurm...@lists.schedmd.com
On 14-10-2018 06:30, Steven Dick wrote:
> I've found that when creating a new cluster, slurmdbd does not
> function correctly right away. It may be necessary to restart
> slurmdbd at several points during the slurm installation process to
> get everything working correctly.
>
> Also, slurmctld will buffer the accounting data until slurmdbd starts
> functioning correctly, so it is possible if you restart slurmdbd you
> will find that all your missing accounting data shows up at once.
>
> One of the critical steps is sacctmgr create cluster $clustername
> where $clustername matches the ClusterName parameter in slurm.conf

Correct, and this is documented in the Slurm accounting setup page:
https://slurm.schedmd.com/accounting.html#database-configuration

/Ole

Steven Dick

Oct 14, 2018, 6:55:51 AM
to Ole.H....@fysik.dtu.dk, slurm...@lists.schedmd.com
It is documented that you need to create the cluster in the database.

It is not documented that the accounting system won't work until you
restart slurmdbd multiple times before it starts collecting accounting
records.

Also, none of the necessary restarts are needed on an upgrade -- only
when slurm is initialized for a new cluster.

Antony Cleave

Oct 14, 2018, 7:08:54 AM
to Slurm User Community List
I have noticed on several clusters that sreport can be up to one hour out of date, i.e. it updates on the hour, every hour.

sacct does not behave this way and is always up to date.

I cannot see this stated in the docs or see any config settings to control this but it happens on the last 17.02 cluster I checked.

Antony 

Ole Holm Nielsen

Oct 14, 2018, 7:50:06 AM
to slurm...@lists.schedmd.com
On 14-10-2018 12:54, Steven Dick wrote:
> It is documented that you need to create the cluster in the database.
>
> It is not documented that the accounting system won't work until you
> restart slurmdbd multiple times before it starts collecting accounting
> records.
>
> Also, none of the necessary restarts are needed on an upgrade -- only
> when slurm is initialized for a new cluster.

Interesting! It would be good to file a bug report with SchedMD on this
problem. I guess one would need to carefully document the creation of a
new cluster and prove in what way accounting records are absent, and
what difference it makes when slurmdbd is restarted repeatedly. Are you
up for this task?

/Ole

Dave Botsch

Oct 14, 2018, 4:50:18 PM
to Slurm User Community List
This seems to reflect what I am seeing. Someone earlier mentioned
multiple restarts of slurmdbd... those restarts never made data appear
unless right around on the hour.

It's as if instead of data getting sent right through slurmdbd that
something in slurmdbd is just doing an hourly check of the text based
sacct records (which I don't understand why those are even there if not
configured in slurm.conf).

Thanks.

Douglas Jacobsen

Oct 14, 2018, 5:19:37 PM
to Slurm User Community List
sreport shows data that is summarized hourly. Restarting slurmdbd can delay this process. If some jobs are missing end records, it can massively slow the process, because the rollup may need to pick a much earlier start time to summarize from.

sacctmgr show runawayjobs can help identify whether you are in this situation.
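
A rough sketch of how one might check this while waiting for sreport to catch up. It assumes the rollup fires on the hour, as observed earlier in this thread, and that the local UTC offset is a whole number of hours; seconds_until_next_hour is a hypothetical helper.

```shell
#!/bin/sh
# Seconds from a given epoch time until the next top of the hour.
# Assumption: the hourly summary runs on the hour (as reported in this
# thread) and the local UTC offset is a whole number of hours.
seconds_until_next_hour() {
    echo $(( 3600 - $1 % 3600 ))
}

now=$(date +%s)
echo "Next hourly boundary in $(seconds_until_next_hour "$now") seconds"

# Jobs with no end record can hold up the summary; this lists any that exist.
# sacctmgr may ask whether to fix them; answer N to only look.
if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr show runawayjobs
fi
```

If sreport is still empty after the next hour boundary and no runaway jobs are listed, the delay is probably coming from somewhere else.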

Nathan Harper

Oct 14, 2018, 5:23:39 PM
to Slurm User Community List
Check firewall rules or network comms in both directions. We had an issue with asymmetric routing between our slurmdbd and slurmctld and so connections could only be initiated one way. However, restarting slurmdbd would restart the connection and resync the latest state (or something like that, it was a few years ago)

Dave Botsch

Oct 14, 2018, 5:40:29 PM
to Slurm User Community List
Not following. Both are running on the same host.

Thanks.

Dave Botsch

Oct 14, 2018, 5:49:52 PM
to Slurm User Community List
So the only mention of this I can find is a "rollup" mention in
https://slurm.schedmd.com/slurmdbd.conf.html, now that I specifically
googled for "slurmdbd rollup".

So if this hourly summary is the right behaviour, I'd request it be
better documented -- nothing at all is mentioned in
https://slurm.schedmd.com/accounting.html

Thanks.

Chris Samuel

Oct 17, 2018, 5:51:06 AM
to slurm...@lists.schedmd.com
On Sunday, 14 October 2018 3:30:39 PM AEDT Steven Dick wrote:

> I've found that when creating a new cluster, slurmdbd does not
> function correctly right away. It may be necessary to restart
> slurmdbd at several points during the slurm installation process to
> get everything working correctly.

That's... odd. I've never seen that.

Worth trying by hand on a clean install running slurmdbd like this:

slurmdbd -Dvvv

to see if there's anything obvious showing up in the debug logs to indicate
some problems.

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC



