Hello, fellow users:
I have been using Slurm for the past three years, but recently I ran
into a question I cannot resolve on my own.
I am using the metrics collected by Slurm (version 23.02.7, with the
jobacct_gather/linux plugin) to analyze the performance of an application.
I have read the documentation on these metrics
(https://slurm.schedmd.com/sacct.html), but I still find the Ave* metrics
confusing, specifically AveRSS and AveDiskWrite.
AveDiskWrite is defined as "Average number of bytes written by all tasks
in job." So if a given workload yields an AveDiskWrite of x, doubling the
workload should yield 2x, and that is indeed what I observe. But if I
then double the resources while keeping that workload, I observe x
again, not 2x.
My suspicion, therefore, is that the metric is the sum of bytes written
over time, divided by the number of nodes.
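To make this suspicion concrete, here is a small sketch of the arithmetic it implies. It only models my hypothesis, not Slurm's actual code: the function name hypothesized_ave_disk_write and the byte counts are made up for illustration, and it assumes the total bytes written grow in proportion to the workload.

```python
# Hypothetical model of my suspicion, NOT Slurm's actual implementation:
# AveDiskWrite = (total bytes written by the job) / (number of nodes).

def hypothesized_ave_disk_write(total_bytes_written: int, nodes: int) -> float:
    """Return the job's total bytes written divided by its node count (assumed model)."""
    return total_bytes_written / nodes

# Illustrative numbers only: the baseline workload writes 100 GiB on 2 nodes.
GIB = 1024 ** 3
baseline = hypothesized_ave_disk_write(100 * GIB, nodes=2)
double_work = hypothesized_ave_disk_write(200 * GIB, nodes=2)
double_work_and_nodes = hypothesized_ave_disk_write(200 * GIB, nodes=4)

print(double_work / baseline)            # 2.0 -> the "2x" observation
print(double_work_and_nodes / baseline)  # 1.0 -> the "x again" observation
```

Under this model the value tracks the workload per node rather than the total workload.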
With AveRSS, however, defined as "Average resident set size of all tasks
in job," I observe what I had expected from AveDiskWrite: the metric
scales with the workload regardless of the resources available.
So I am not sure what "Ave" refers to here.
I would be thankful if someone could clarify this behavior, and even more
grateful if someone could point me to where in the code these metrics are
aggregated and processed before being stored in the database.
Many thanks,
Manu.
--
Manuel G. Marciani - マルシアニ·マヌエル
First Stage Researcher at Computational Earth Sciences (CES) - Earth
Sciences Department
Barcelona Supercomputing Center - Centro Nacional de Supercomputación
Ph.D Student at Departament d'Arquitectura de Computadors (DAC) -
Facultat d'Informàtica de Barcelona (FIB)
Universitat Politècnica de Catalunya (UPC)
BSC building
Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
Desk BSC-PL0-6-20/8
mail to: manuel....@bsc.es
mail to: manuel.gimen...@upc.edu
mail to: manuel....@a.riken.jp
mail to: manuel.gimenez.d...@hu-berlin.de
--
slurm-users mailing list -- slurm...@lists.schedmd.com