It depends on a number of factors.
How do your workloads behave? Do they call fork() a lot? I’ve had cases in the past where users submitted scripts which initially used quite a lot of memory and then used fork() or system() to execute subprocesses. This of course means that temporarily (between the fork() and the exec() system calls) the job uses twice as much virtual memory, although this does not turn into real memory use because the pages are copy-on-write. Something similar happens if the code performs mmap() on large files: the virtual size grows well beyond what is actually resident.
Whether this affects how much swap space you need depends on your sysctl settings for vm.overcommit_memory and vm.overcommit_ratio.
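To see what a node is currently set to, something like the following sketch could be used. It assumes a Linux node where these sysctls are exposed as files under /proc/sys/vm; the function name and the `base` parameter are my own inventions for illustration, not part of any existing tool.

```python
from pathlib import Path


def read_overcommit_settings(base="/proc/sys/vm"):
    """Return (overcommit_memory, overcommit_ratio) as ints, or None if unreadable.

    `base` is a parameter of this sketch so it can be pointed at a test
    directory; on a Linux node the default is where the sysctls live.
    """
    try:
        mode = int(Path(base, "overcommit_memory").read_text().strip())
        ratio = int(Path(base, "overcommit_ratio").read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no permission, or unexpected file contents.
        return None
    return mode, ratio
```

Calling `read_overcommit_settings()` with no arguments on a compute node should give back the pair of values the rest of this message is talking about.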
If you set vm.overcommit_memory to 2, then the OOM killer should essentially never hit you (because malloc() will fail rather than allocate virtual memory that isn’t available), but cases like the above will tend to fail memory allocations unnecessarily, especially if you don’t have any swap allocated, since in this mode the commit limit is your swap plus vm.overcommit_ratio percent of RAM.
If you set vm.overcommit_memory to 0 or 1, then you need less swap allocated (possibly even zero), but you run the risk of running out of memory and the OOM killer blowing things up left, right and centre.
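To put rough numbers on the fork() case under vm.overcommit_memory=2, here is a simplified sketch. It uses the kernel's documented rule that the commit limit is swap plus overcommit_ratio percent of RAM, and deliberately ignores huge pages, vm.overcommit_kbytes and the admin/user reserves. The node and job sizes are made up for illustration, and the function names are my own.

```python
GIB = 1024 ** 3


def commit_limit(ram_bytes, swap_bytes, overcommit_ratio=50):
    """Approximate commit limit in strict mode (vm.overcommit_memory=2).

    Simplified: ignores huge pages, vm.overcommit_kbytes and reserves.
    The default ratio of 50 matches the kernel default.
    """
    return swap_bytes + ram_bytes * overcommit_ratio // 100


def fork_fits(committed_bytes, parent_private_bytes, limit_bytes):
    """True if duplicating the parent's private writable memory stays within the limit."""
    return committed_bytes + parent_private_bytes <= limit_bytes


# Hypothetical node: 64 GiB RAM, one job holding 20 GiB of private memory.
job = 20 * GIB
no_swap = commit_limit(64 * GIB, 0)            # 32 GiB
with_swap = commit_limit(64 * GIB, 32 * GIB)   # 64 GiB

print(fork_fits(job, job, no_swap))    # fork() would be refused
print(fork_fits(job, job, with_swap))  # fork() would be allowed
```

In other words, in strict mode the swap is there mostly to raise the commit limit, not because anyone expects those pages to be written out.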
If you provide swap, it only causes a performance impact if the node actually runs out of physical memory and actively starts swapping.
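One way to tell "swap is configured" apart from "the node is actively swapping" is to sample the pswpin and pswpout counters in /proc/vmstat twice and compare them. The sketch below assumes the usual "name value" layout of that file; the function names are hypothetical, and it takes the file contents as text so it can be tried out without a live node.

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout page counters from /proc/vmstat-style text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in counters:
            counters[parts[0]] = int(parts[1])
    return counters


def is_actively_swapping(before_text, after_text):
    """True if any pages were swapped in or out between the two samples."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return any(after[name] > before[name] for name in before)
```

On a node you would read /proc/vmstat, wait a few seconds, read it again, and pass both strings in. Counters that stay flat mean the swap is sitting idle and costing you nothing.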
So the bottom line, I think, is that it depends on what you want the failure mode to be.
I now call on someone who understands cgroups properly to explain how this changes when cgroups are in play, because I’m not sure I understand that!
Tim
--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca
Joseph,
You will likely get many perspectives on this. I disable swap
completely on our compute nodes. I can be draconian that way. For
the workflow we support, this works and is a good thing.
Other workflows may benefit from swap.
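For anyone wanting to confirm that swap really is off on a node, a small check along these lines might do. It is only a sketch: it assumes the usual /proc/swaps layout (one header line, then one line per active swap area), and the function name is made up for illustration.

```python
def active_swap_devices(swaps_text):
    """Return the device names listed in /proc/swaps-style text, skipping the header."""
    lines = [line for line in swaps_text.splitlines() if line.strip()]
    # The first non-empty line is the column header; the rest are swap areas.
    return [line.split()[0] for line in lines[1:]]
```

An empty list from the contents of /proc/swaps means no swap area is active on that node.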
Brian Andrus