Dear Developers,
I am aiming to run RPC simulations in a multiple-time-step setup with 32 beads using i-PI 3.0. The forcefield I use for the beads is very fast (<1 ms/step). Ideally, when calculating forces only on the beads and running 32 clients for the "cheap" forcefield in parallel, one would expect a timing of ~1 ms/step. However, the socket communication overhead (~1 ms/bead) adds up across the beads, so I get around ~30 ms/step (which, if I understand correctly, is the normal behavior in i-PI). How could I circumvent this overhead? Here are the things that I have already tried, without success:
- Reducing the latency in ffsocket to 1e-4
- Reducing the output printing of the simulation to minimize I/O
- Using a Python implementation of the "cheap" forcefield through ffdirect to avoid socket communication entirely (with both threaded="True" and threaded="False"). For this I only ran the i-PI server (i-pi config.xml) and did not launch any driver afterwards. Comparing timings for different numbers of beads, I concluded that there is no parallel evaluation in a multi-replica scenario: each replica is evaluated serially.
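For reference, here is a rough sketch of the forcefield sections I experimented with. The names, address, and the pes choice are placeholders rather than my actual setup, so please read this only as an illustration of the two variants:

```xml
<!-- Variant 1: socket-based forcefield with lowered polling latency -->
<ffsocket name="cheap-ff" mode="unix">
  <address>cheapff</address>
  <latency>1e-4</latency>
</ffsocket>

<!-- Variant 2: direct in-process evaluation, avoiding the socket;
     also tried with threaded="False" -->
<ffdirect name="cheap-ff-direct" threaded="True">
  <pes>harmonic</pes>
</ffdirect>
```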
Is there a way to use ffdirect in parallel over many beads, or any other option to evaluate the forces without accumulating communication overhead?
Thanks for the help.
Best regards,
Bence Mészáros