I've enabled Docker on the compute nodes, following this discussion:
https://groups.google.com/g/google-cloud-slurm-discuss/c/BYeF-bta7XY
This works for the most part, but occasionally the compute nodes fail to install Docker, and it's not clear what causes this. The only fix I've found is to stop and start the Slurm controller.
Is this known behavior for scripts added to /slurm/custom-scripts/custom-controller-install? If so, is there a workaround to make this more stable?
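
If the failures turn out to be transient (an assumption; the cause is not yet known), one possible workaround is to wrap the install step in a retry loop inside the custom script. Below is a minimal sketch of that idea. The `retry` helper and the `install_docker` name are hypothetical and not part of the Slurm-GCP scripts or the linked discussion.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to max_tries times,
# sleeping `delay` seconds between failed attempts.
# Usage: retry <max_tries> <delay_seconds> <command> [args...]
retry() {
  local max_tries="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example call (install_docker is a placeholder for the real install steps):
# retry 5 30 install_docker
```

The messages written to stderr would also show how often the install fails and on which attempt it succeeds, which might help narrow down the cause.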