Hi all,
I am learning how to run WESTPA (with Amber) on multiple GPUs across multiple nodes using the ZMQ work manager. I have read several threads in this Google Group about this topic, as well as some of the additional WESTPA tutorials. In many of the examples, people ssh into each allocated node and execute node.sh; then, in runseg.sh, they use WM_PROCESS_INDEX to assign a single GPU to that segment via CUDA_VISIBLE_DEVICES. However, I saw in some earlier threads that people have instead used srun to spawn the node.sh tasks on each node, and again used WM_PROCESS_INDEX to set CUDA_VISIBLE_DEVICES for each segment.
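For reference, here is roughly the kind of runseg.sh snippet I mean (the GPU count and the pmemd.cuda line are just from my own setup, not from the tutorials, so treat them as placeholders):

    # Pick a GPU for this segment from the work manager process index.
    # Assumes 4 GPUs per node and that node.sh started one worker per GPU.
    NUM_GPUS_PER_NODE=4
    export CUDA_VISIBLE_DEVICES=$(( WM_PROCESS_INDEX % NUM_GPUS_PER_NODE ))

    # Then propagate the segment on that device with Amber, e.g.
    pmemd.cuda -O -i md.in -p seg.prmtop -c parent.rst7 -r seg.rst7 -x seg.nc -o seg.log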
I was wondering whether one of these methods is significantly better than the other. Also, if I run srun with multiple tasks per node and one GPU per task, I don't seem to need to deal with CUDA_VISIBLE_DEVICES in my scripts at all, since srun appears to set it automatically. This seems less complicated than ssh'ing, so I wanted to know whether it has any drawbacks. I apologize if this is a dumb question or has already been asked; I don't know much about SLURM or running anything in parallel.
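Here is roughly what my batch script looks like for the srun approach (node and GPU counts are placeholders, and I repeat the task/GPU options on the srun line just to be safe, since I'm not sure our SLURM version propagates them):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4
    #SBATCH --gpus-per-task=1

    # Launch one node.sh per task; with one GPU per task, srun seems to set
    # CUDA_VISIBLE_DEVICES for each task on its own, so runseg.sh never touches it.
    srun --ntasks-per-node=4 --gpus-per-task=1 ./node.sh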
Thanks!
Schuyler