MPS only works for the first GPU in a system. If you have a server with multiple GPUs, you can only share the first GPU between multiple jobs.
Sharding, on the other hand, works for all GPUs in a system. Note that sharding is soft: Slurm does not monitor actual GPU use, so jobs have to respect the resources they requested.
Sharding works great in our setup (3 servers with 8, 6 and 4 Nvidia GPUs, respectively, plus a few smaller single-GPU boxes). We mainly use 1 shard = 1 GB of GPU memory, but other setups may be used.
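For illustration, a minimal sketch of what a "1 shard = 1 GB" configuration might look like on an 8-GPU node. The node name gpunode1 and the 16 GB per-GPU memory size are assumptions made for this example, not details of the setup described above, and the remaining node parameters are left out:

```
# slurm.conf (excerpt; hypothetical node "gpunode1", other node parameters omitted)
GresTypes=gpu,shard
NodeName=gpunode1 Gres=gpu:8,shard:128

# gres.conf on gpunode1
Name=gpu File=/dev/nvidia[0-7]
# Assumed 16 GB per GPU: 8 GPUs x 16 shards = 128 shards, so 1 shard is roughly 1 GB.
# As far as I know, the shard count is spread evenly over the GPUs on the node.
Name=shard Count=128
```

A job that expects to need about 4 GB of GPU memory would then request four shards, e.g. "sbatch --gres=shard:4 job.sh" (job.sh being a placeholder script name). Since the limit is not enforced, the 4 GB is only a convention the job itself has to honor.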
Cheers,
Esben
From: EPF (Esben Peter Friis) <E...@novozymes.com>
Sent: Friday, February 3, 2023 17:03
To: EPF (Esben Peter Friis) <E...@novozymes.com>
Subject: Fw: [slurm-users] GPU: MPS vs Sharding