Hi Anand,
2. ZMQ works in both single-node and multi-node settings. I personally prefer using processes on a single node, even with multiple GPUs (see example), because you no longer have to go through TCP/IP connections (i.e. ssh), which can be slow or lossy in HPC settings. One less hoop to jump through, one less way for jobs to fail. See the links in 1. for more explanation on ZMQ.
3. Startup overhead for OpenMM tends to be short compared to the propagation time, unless you have a really short tau. You could implement a shared context (say, using NVIDIA MPS), and that has demonstrated speedups, but it's not really needed unless you're stretching system limits.
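
To make the tau argument concrete, here is a rough back-of-the-envelope sketch. The function name and its inputs are purely illustrative (nothing here comes from WESTPA or OpenMM), and you would plug in a startup time and throughput measured on your own system:

```python
def startup_fraction(startup_s, tau_ps, ns_per_day):
    """Estimate the fraction of one segment's wall-clock time spent on startup.

    startup_s  : measured startup time per segment, in seconds (your own timing)
    tau_ps     : resampling interval tau, in picoseconds
    ns_per_day : measured simulation throughput, in ns/day
    """
    if startup_s < 0 or tau_ps <= 0 or ns_per_day <= 0:
        raise ValueError("startup_s must be >= 0; tau_ps and ns_per_day must be > 0")
    # Wall-clock seconds needed to propagate 1 ps at the given throughput.
    seconds_per_ps = 86400.0 / (ns_per_day * 1000.0)
    propagation_s = tau_ps * seconds_per_ps
    return startup_s / (startup_s + propagation_s)
```

The fraction shrinks as tau grows, so startup only dominates when tau is very short relative to the time it takes to set up the simulation.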
One last thing I'd like to add: in the WESTPA context, running one segment per GPU (and running more segments per bin) is probably more beneficial and faster than trying to get a single segment to run across multiple GPUs, especially with NVIDIA SLI discontinued.
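
As a minimal sketch of the one-segment-per-GPU idea, the helper below pins each worker to a single GPU by setting CUDA_VISIBLE_DEVICES. The function names are hypothetical, and it assumes you can get a zero-based worker index from somewhere (the WESTPA example scripts I've seen read it from the WM_PROCESS_INDEX environment variable, but please check that against your version):

```python
import os


def gpu_for_worker(worker_index, n_gpus):
    """Map a zero-based worker index onto one of n_gpus devices, round-robin."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if worker_index < 0:
        raise ValueError("worker_index must be non-negative")
    return worker_index % n_gpus


def segment_env(worker_index, n_gpus, base_env=None):
    """Return a copy of the environment with exactly one GPU made visible.

    base_env defaults to the current process environment; the input is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA applications launched with this environment only see the chosen device.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_for_worker(worker_index, n_gpus))
    return env
```

You would then pass the returned dictionary as env= to subprocess.run when launching the segment, so each worker only ever sees its own device. If you have more workers than GPUs, the round-robin mapping simply shares devices between them.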
Best,
Jeremy L.
---
Jeremy M. G. Leung, PhD
Postdoctoral Associate, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]