[slurm-users] SlurmDB Archive settings?


Timony, Mick

Jul 13, 2022, 4:56:05 PM
to slurm...@lists.schedmd.com
Hi Slurm Users,

Currently we don't archive our SlurmDB, and we have 6 years' worth of data in it. We are looking to start archiving our database as it is starting to get rather large, and we have decided to keep 2 years' worth of data. I'm wondering what approaches or scripts other groups use.

The docs refer to the ArchiveScript setting at:
https://slurm.schedmd.com/slurmdbd.conf.html#OPT_ArchiveScript

I've seen suggestions to import into another database, but that would require keeping the schema up to date, which seems like a possible maintenance issue, or a nightmare if one forgets to update the schema after upgrading slurmdbd. We also have most of the information in an Elasticsearch instance, which will likely suit our needs for long-term historical information.


What do you use to archive this information? CSV files, SQL dumps or something else?


Regards
-- 
Mick Timony
Senior DevOps Engineer
Harvard Medical School
--

Ole Holm Nielsen

Jul 14, 2022, 2:39:05 AM
to slurm...@lists.schedmd.com
On 7/13/22 22:55, Timony, Mick wrote:
> Currently we don't archive our SlurmDB and have 6 years' worth of data in
> our SlurmDB. We are looking to start archiving our database as it starting
> to get rather large, and we have decided to keep 2 years' worth of data.
> I'm wondering what approaches or scripts other groups use.

Which database server and version do you run, MySQL or MariaDB? What's
your Slurm version?

Did you already make appropriate database purges to reduce the size? I
have some notes in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters

/Ole

Paul Edmon

Jul 14, 2022, 9:11:54 AM
to slurm...@lists.schedmd.com

We just use the Archive function built into Slurm. That has worked fine for us for the past 6 years. We keep 6 months of data in the active archive.


If you have 6 years' worth of data and you want to prune down to 2 years, I recommend going month by month rather than doing it in one go.  When we initially started archiving several years back, our first pass (which at that time covered 2 years of data) took forever and actually caused issues with the archive process.  We worked with SchedMD to improve the archive script built into Slurm, but also decided to archive only one month at a time, which allowed it to finish in a reasonable amount of time.
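That month-by-month pass can be scripted as a dry run first. A sketch (the sacctmgr archive dump option names are taken from the sacctmgr man page, so verify them against your version; drop the leading echo to run for real):

```shell
#!/bin/sh
# Dry-run sketch of a month-by-month first archive pass: step the purge
# window down one month per iteration (here 72 -> 24 months) so each
# pass only has to archive roughly one month of records.
# Option names assumed from the sacctmgr man page.
for months in $(seq 72 -1 24); do
    echo sacctmgr -i archive dump Directory=/slurm/archive \
        PurgeEventAfter="${months}month" \
        PurgeJobAfter="${months}month" \
        PurgeStepAfter="${months}month" \
        PurgeSuspendAfter="${months}month"
done
```

Each echoed command is one pass; run them with a pause in between so slurmdbd can keep up.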


The archived data can be pulled into a different slurm database, which is what we do for importing historic data into our XDMod instance.


-Paul Edmon-

Timony, Mick

Jul 14, 2022, 12:50:15 PM
to slurm...@lists.schedmd.com
​Hi Ole,
> Which database server and version do you run, MySQL or MariaDB? What's
> your Slurm version?

MariaDB 5.5.68 and a patched version of Slurm 21.08.7.

> Did you already make appropriate database purges to reduce the size? I
> have some notes in my Wiki page

No, we have not made any changes yet, as I am concerned that it will cause performance issues.

Thanks
--Mick

Timony, Mick

Jul 14, 2022, 12:55:51 PM
to slurm...@lists.schedmd.com
Hi Paul,

> If you have 6 years' worth of data and you want to prune down to 2 years, I recommend going month by month rather than doing it in one go. When we initially started archiving several years back, our first pass (which at that time covered 2 years of data) took forever and actually caused issues with the archive process. We worked with SchedMD to improve the archive script built into Slurm, but also decided to archive only one month at a time, which allowed it to finish in a reasonable amount of time.

Thanks, that is good advice. We had issues with accounting in the past and had to run slurmdbd rollups, which can take up to 2 weeks, so it's good to get feedback like yours. Do you know exactly what the Slurm archive script does, how it archives data, and what formats it supports?

The docs are a little vague:

https://slurm.schedmd.com/slurmdbd.conf.html#OPT_ArchiveScript

"This script is used to transfer accounting records out of the database into an archive. It is used in place of the internal process used to archive objects. The script is executed with no arguments, and the following environment variables are set."
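For what it's worth, a skeleton of such an ArchiveScript might look like the sketch below. This is only a sketch: the SLURM_ARCHIVE_* variable names should be checked against the environment-variable list in the slurmdbd.conf man page for your version, and a real script would dump the relevant rows (e.g. via mysqldump) rather than just log what was requested.

```shell
#!/bin/sh
# Sketch of a slurmdbd ArchiveScript.  slurmdbd runs the script with no
# arguments and passes its requests via environment variables (names
# assumed from the slurmdbd.conf man page; verify against your version).
# A real script would dump the matching rows instead of just logging.
if [ "${SLURM_ARCHIVE_JOBS:-0}" = "1" ]; then
    echo "would archive job records up to epoch ${SLURM_ARCHIVE_LAST_JOB:-?}"
fi
if [ "${SLURM_ARCHIVE_STEPS:-0}" = "1" ]; then
    echo "would archive step records up to epoch ${SLURM_ARCHIVE_LAST_STEP:-?}"
fi
```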


> The archived data can be pulled into a different slurm database, which is what we do for importing historic data into our XDMod instance.

How do you keep track of and implement schema changes to this database?

Thanks
--Mick


Paul Edmon

Jul 14, 2022, 1:03:10 PM
to slurm...@lists.schedmd.com

I've never looked at the internals of how the native Slurm archive script works.  What I can tell you is that we have never had a problem reimporting data dumped from older versions into a current-version database.  So the import using sacctmgr must convert the older formats to the newer ones and handle the schema changes.
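Concretely, the reload side uses sacctmgr's archive load subcommand (the file path below is a hypothetical example; real archive files are written into ArchiveDir by the archive process, and -i just skips the confirmation prompt):

```shell
#!/bin/sh
# Dry run: echo the reload command instead of executing it against a
# live slurmdbd.  The archive file path is a hypothetical example.
ARCHIVE_FILE=/slurm/archive/example_job_archive
echo sacctmgr -i archive load file="${ARCHIVE_FILE}"
```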


I will note that if you are storing job_scripts and envs, those can eat up a ton of space in 21.08.  It looks like they've solved that problem in 22.05, but the archive steps on 21.08 took forever due to those scripts and envs.


-Paul Edmon-

Timony, Mick

Jul 14, 2022, 2:35:29 PM
to slurm...@lists.schedmd.com


> What I can tell you is that we have never had a problem reimporting data dumped from older versions into a current-version database. So the import using sacctmgr must convert the older formats to the newer ones and handle the schema changes.

That's the bit of info I was missing; I didn't realise that it outputs the data in a format that sacctmgr can read.

> I will note that if you are storing job_scripts and envs, those can eat up a ton of space in 21.08. It looks like they've solved that problem in 22.05, but the archive steps on 21.08 took forever due to those scripts and envs.

Yes, we are storing job_scripts with:

AccountingStoreFlags=job_script

I think when we made that decision, we concluded that also saving the job_env would take up too much room, as our DB is pretty big at the moment, at approx. 300 GB, with the o2_step_table and the o2_job_table taking up the most space for obvious reasons:

+----------------------------+-----------+
| Table                      | Size (GB) |
+----------------------------+-----------+
| o2_step_table              |    183.83 |
| o2_job_table               |    128.18 |
+----------------------------+-----------+


That's good advice Paul, much appreciated. 


> took forever and actually caused issues with the archive process

I think that should be highlighted for other users!

For those interested, to find the table sizes I ran:

SELECT table_name AS "Table",
       ROUND(((data_length + index_length) / 1024 / 1024 / 1024), 2) AS "Size (GB)"
FROM information_schema.TABLES
WHERE table_schema = "slurmdbd"
ORDER BY (data_length + index_length) DESC;

Replace slurmdbd with the name of your database.

Cheers
--Mick


Paul Edmon

Jul 14, 2022, 3:01:24 PM
to slurm...@lists.schedmd.com

Yeah, a word of warning about going from 21.08 to 22.05: make sure you have enough storage on the database host and budget enough time for the upgrade.  We just converted our 198 GB (compressed; 534 GB raw) database this week.  The initial attempt failed after running for 8 hours because we ran out of disk space (part of the reason we had to compress is that the server we use for our Slurm master only has 800 GB of SSD on it).  That meant we had to reimport our DB, which took 8 hours; then we had to drop the job scripts and job envs, which took another 5 hours; and the upgrade itself then took 2 hours.


Moral of the story: make sure you have enough space and budget sufficient time.  You may want to consider nulling out the job scripts and envs before the upgrade, as 22.05 completely redoes the way those are stored in the database so that it is more efficient, but getting from here to there is the trick.


For details see the bug report we filed: https://bugs.schedmd.com/show_bug.cgi?id=14514


-Paul Edmon-

Ole Holm Nielsen

Jul 15, 2022, 2:05:21 AM
to slurm...@lists.schedmd.com
On 7/14/22 18:49, Timony, Mick wrote:
> Which database server and version do you run, MySQL or MariaDB?  What's
> your Slurm version?
>
> ​mariadb 5.5.68 and a patched version of slurm 21.08.7

We run the same MariaDB on CentOS 7.9, and Slurm 21.08.8-2.

> Did you already make appropriate database purges to reduce the size?  I
> have some notes in my Wiki page
> https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters
> <https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters>
>
> ​No, we have not made any changes yet as am concerned that it will cause
> performance issues.

IMHO, database purging is an important way to *improve* the performance!
Reducing the database size is beneficial, and it also makes the daily
database dump faster. As I wrote in the above Wiki page, you should
introduce purging very gently and decrease the purging intervals in small
steps over a number of weeks:

> A monthly purge operation can be a huge amount of work for a database depending on its size, and you certainly want to cut down the amount of work required during the purges. If you did not use purges before, it is probably a good idea to try out a series of daily purges starting with:
>
> PurgeEventAfter=2000days
> PurgeJobAfter=2000days
> PurgeResvAfter=2000days
> PurgeStepAfter=2000days
> PurgeSuspendAfter=2000days
>
> If this works well over a few days, decrease the 2000days purge interval little by little and try again (1800, 1500, etc.) until, after many iterations, you come down to the desired final purge intervals.
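A small sketch to print such a stepping schedule (the interval sequence below is illustrative, not prescriptive):

```shell
#!/bin/sh
# Print a gradual purge schedule: each step is a set of slurmdbd.conf
# lines to apply, followed by a restart of slurmdbd and a few days of
# normal operation before moving on to the next, smaller interval.
for days in 2000 1800 1500 1200 1000 900 800 730; do
    echo "# apply these, restart slurmdbd, then wait a few days:"
    for key in PurgeEventAfter PurgeJobAfter PurgeResvAfter \
               PurgeStepAfter PurgeSuspendAfter; do
        echo "${key}=${days}days"
    done
done
```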

Please note that it's acceptable for the slurmdbd to be slow, or even down
for many hours. The slurmctld will cache all data (up to a certain limit)
while the slurmdbd is not responding.

/Ole


Ole Holm Nielsen

Jul 15, 2022, 2:08:59 AM
to slurm...@lists.schedmd.com
Hi Paul,

On 7/14/22 15:10, Paul Edmon wrote:
> We just use the Archive function built into slurm.  That has worked fine
> for us for the past 6 years.  We keep 6 months of data in the active archive.

Could you kindly share your Archive* settings in slurmdbd.conf? I've
never tried to use this, but it sounds like a good idea.

Thanks,
Ole

Timony, Mick

Jul 15, 2022, 2:06:45 PM
to Slurm User Community List
That's great advice. Thank you Ole.

--Mick

From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Ole Holm Nielsen <Ole.H....@fysik.dtu.dk>
Sent: Friday, July 15, 2022 2:04 AM
To: slurm...@lists.schedmd.com <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] SlurmDB Archive settings?
 

Paul Edmon

Jul 18, 2022, 9:36:50 AM
to slurm...@lists.schedmd.com
Sure.  Here are our settings:

ArchiveJobs=yes
ArchiveDir="/slurm/archive"
ArchiveSteps=yes
ArchiveResvs=yes
ArchiveEvents=yes
ArchiveSuspend=yes
ArchiveTXN=yes
ArchiveUsage=yes
PurgeEventAfter=6month
PurgeJobAfter=6month
PurgeResvAfter=6month
PurgeStepAfter=6month
PurgeSuspendAfter=6month
PurgeTXNAfter=6month
PurgeUsageAfter=6month

-Paul Edmon-