Hm. What version of i-PI are you using? And how large is the system you're running?
We sped up version 3.0 a lot, so I'd start with that. Then there are many little things
you can change to reduce the overhead. Reduce the <latency> setting to 1e-3 or 1e-4.
Outputting a checkpoint at every step is also very slow, so use a larger checkpoint stride.
If your system is reasonably stable (meaning you don't expect crashes) you can also
reduce how often data is flushed by using the "safe_stride" option in <simulation>.
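
To make that concrete, here is a rough sketch of where these settings would sit in the input file. It is only a fragment, not a complete input, and the numbers (other than the latency) are illustrative guesses that you should tune for your own run. I'm assuming a socket driver here, so <latency> goes inside <ffsocket>; the names "driver" and "chk" are placeholders.

```xml
<!-- Fragment only: the <system> block and the rest of the input are omitted. -->
<!-- safe_stride value is illustrative; larger means less frequent flushing. -->
<simulation safe_stride="100">
  <output prefix="simulation">
    <!-- Write a checkpoint every 1000 steps instead of every step. -->
    <checkpoint stride="1000" overwrite="true" filename="chk"/>
  </output>
  <ffsocket name="driver" mode="unix">
    <address>driver</address>
    <!-- Polling interval in seconds; try 1e-3 or 1e-4. -->
    <latency>1e-4</latency>
  </ffsocket>
</simulation>
```

Check the exact attribute names against the input reference for your version before relying on this.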
If you look in the ipi_tests/profiling folder you'll see some input files that have been
optimized to minimize overhead.
Let me know if any of this helps.
M