We just use the Archive function built into Slurm. That has worked fine for us for the past 6 years. We keep 6 months of data in the active database.
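For reference, that behavior is driven by the purge and archive settings in slurmdbd.conf. A minimal sketch matching the 6-month window above might look like the following; the ArchiveDir path is an assumption, not our actual configuration:

    # slurmdbd.conf: archive records to flat files before purging them
    ArchiveDir=/var/spool/slurm/archive   # path is a placeholder
    ArchiveJobs=yes
    ArchiveSteps=yes
    # keep 6 months of records in the live database
    PurgeJobAfter=6months
    PurgeStepAfter=6months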
If you have 6 years' worth of data and you want to prune down to 2 years, I recommend going month by month rather than doing it in one go. When we first started archiving several years back, our initial pass (the database held 2 years of data at that point) took forever and actually caused issues with the archive process. We worked with SchedMD to improve the archive script built into Slurm, but we also decided to archive only one month at a time, which let it finish in a reasonable amount of time.
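As a hedged sketch of what that looks like in practice (not our actual script; the directory is a placeholder), you can step the purge window down one month per pass using sacctmgr's archive support:

    #!/bin/bash
    # Walk the job/step retention window down from 71 to 24 months, one
    # month per pass, so each run only archives and purges ~1 month of data.
    for months in $(seq 71 -1 24); do
        sacctmgr -i archive dump Directory=/var/spool/slurm/archive \
            Jobs Steps PurgeJobAfter=${months}months PurgeStepAfter=${months}months
    done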
The archived data can be pulled into a different Slurm database, which is what we do for importing historic data into our XDMoD instance.
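Loading an archive file into the other database is a single sacctmgr call pointed at that database's slurmdbd; the paths below are placeholders:

    # Use the slurm.conf that points at the historical slurmdbd,
    # then load the dump file produced by the archive step.
    SLURM_CONF=/etc/slurm-history/slurm.conf \
        sacctmgr -i archive load File=/var/spool/slurm/archive/mycluster_job_archive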
-Paul Edmon-
Which database server and version do you run, MySQL or MariaDB? What's your Slurm version? Did you already make appropriate database purges to reduce the size? I have some notes on my Wiki page.
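To see where the space is going, something like this against the accounting database works; slurm_acct_db is the default StorageLoc name and may differ at your site:

    # List the 10 largest tables in the accounting database
    mysql -e "SELECT table_name,
                     ROUND((data_length + index_length)/1024/1024/1024, 1) AS size_gb
              FROM information_schema.tables
              WHERE table_schema = 'slurm_acct_db'
              ORDER BY size_gb DESC LIMIT 10;"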
I've never looked at the internals of how the native Slurm archive script works. What I can tell you is that we have never had a problem reimporting data dumped from older versions back into a current-version database, so the import via sacctmgr must convert the older formats to the newer ones and handle the schema changes.
I will note that if you are storing job_scripts and envs, those can eat up a ton of space in 21.08. It looks like they've solved that problem in 22.05, but the archive steps on 21.08 took forever due to those scripts and envs.
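For context, storing those is opt-in via slurm.conf, so sites that don't need scripts and environments in the accounting database can simply leave these flags unset:

    # slurm.conf (21.08 and later): opt-in storage of batch scripts
    # and job environments in the accounting database
    AccountingStoreFlags=job_script,job_env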
-Paul Edmon-
Yeah, a word of warning about going from 21.08 to 22.05: make sure you have enough storage on the database host you are doing the work on, and budget enough time for the upgrade. We just converted our 198 GB (compressed; 534 GB raw) database this week. The initial attempt failed after running for 8 hours because we ran out of disk space (part of the reason we have to compress is that the server we use for our Slurm master has only 800 GB of SSD). That meant we had to reimport our DB, which took 8 hours; then drop the job scripts and job envs, which took another 5 hours; and then attempt the upgrade again, which took 2 hours.
Moral of the story: make sure you have enough space and budget sufficient time. You may want to consider nulling out the job scripts and envs before the upgrade, since 22.05 completely redoes the way those are stored in the database to make it more efficient, but getting from here to there is the trick.
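As a rough sketch of the nulling-out approach (the table and column names here are assumptions based on a 21.08-era schema, where scripts and environments live in each cluster's job table; verify them against your own schema and take a backup first):

    # Hedged sketch only: mycluster_job_table, batch_script and env_vars
    # are assumed names. Check with SHOW COLUMNS and back up before running.
    mysql slurm_acct_db -e "UPDATE mycluster_job_table SET batch_script = NULL, env_vars = NULL;"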
For details see the bug report we filed: https://bugs.schedmd.com/show_bug.cgi?id=14514
-Paul Edmon-