I think the situation is likely to be a little different. Consider a Fortran program that statically or dynamically declares large arrays. Those declarations define the virtual memory size: in effect a statement of the maximum amount of memory the program might use if it filled every array. That much real memory plus swap must be available for the program to run, since it might use all of it. Speaking loosely, Linux has a lazy memory allocation policy, so physical memory may not actually be allocated until it is touched. If the program happens to read a smaller dataset and the arrays are never filled, the resident set size (RSS) can be significantly smaller than the virtual memory size. Memory that has been swapped out doesn't count towards RSS either, so it might be smaller still.

Effectively, RSS is a process's actual footprint in RAM. It changes over the life of the process/job, and Slurm tracks the maximum (MaxRSS). I'd expect MaxRSS to be the maximum of the sum of the RSS of the known processes, sampled periodically through the job, but I'm guessing. That should apply reasonably to parallel jobs provided the sum spans nodes (it wouldn't be the first batch system to only effectively account for the first allocated node).

The Linux memory tracking/accounting system has gotchas, since shared memory (say, for library code) has to be accounted for somewhere, but in HPC we can reasonably assume that memory use is dominated by each process's unique computational working set data, so MaxRSS is a good estimate of how much RAM is needed to run a given job.
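If you want to see the lazy allocation for yourself, here is a minimal sketch of the kind of Fortran program I mean. The array size, the 25% fill and the names are just illustrative choices of mine, and it reads VmSize/VmRSS from /proc/self/status, so it is Linux-specific and needs a Fortran 2008 compiler (e.g. gfortran):

  program lazy_alloc_demo
    implicit none
    ! ~4 GB of real(8): this counts towards the virtual size as soon as it is allocated
    integer(8), parameter :: n = 500000000_8
    real(8), allocatable :: a(:)

    allocate(a(n))            ! virtual size jumps by ~4 GB; almost no RAM is touched yet
    call show_mem('after allocate')
    a(1:n/4) = 1.0d0          ! touch a quarter of the pages; RSS grows by roughly 1 GB
    call show_mem('after 25% fill')
    print *, sum(a(1:10))     ! stop the compiler optimising the array away

  contains

    ! Print this process's VmSize (virtual size) and VmRSS (resident set size).
    subroutine show_mem(label)
      character(*), intent(in) :: label
      character(len=256) :: line
      integer :: u, ios
      open(newunit=u, file='/proc/self/status', action='read', iostat=ios)
      if (ios /= 0) return
      do
        read(u, '(a)', iostat=ios) line
        if (ios /= 0) exit
        if (line(1:6) == 'VmSize' .or. line(1:6) == 'VmRSS:') &
          print '(a,2x,a)', trim(label), trim(line)
      end do
      close(u)
    end subroutine show_mem

  end program lazy_alloc_demo

On a typical Linux box VmSize is about the same at both points, while VmRSS only grows noticeably after the assignment: that gap is exactly the difference between virtual memory size and resident set size described above, and MaxRSS is Slurm's record of the latter.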
Gareth