performance of PIMD with CP2K/i-PI


Qinghua Liao

Sep 18, 2025, 12:26:31 PM
to ipi-users
Hello,
I am running ab initio PIMD using CP2K/i-PI, but the performance is quite bad.
It would be great if someone could give me some suggestions to improve it.

The system is one CH4 on an ice surface with 32 water molecules, 101 atoms in total. The simulation runs on 112 cores of an HPC cluster, with 8 beads.

Each CP2K force evaluation takes around 6 seconds, but one full PIMD step takes almost 1 minute. I guess only one core is used for i-PI, but I would expect its part to be much faster than the CP2K evaluation on 112 cores.

Thanks very much!

All the best,
Qinghua

Attachments: cp2k.out, cp2k.sh, cp2k.inp, input.xml

Michele Ceriotti

Sep 18, 2025, 3:14:59 PM
to ipi-users
Hello! It looks like you're running a single CP2K instance and you have 8 beads, so the beads are evaluated sequentially. 6 s x 8 = 48 s, which is pretty close to 1 minute. I'd expect the overhead from i-PI to be in the few-milliseconds ballpark, so you're simply seeing CP2K running one bead at a time.
Since you're running with verbosity=high, you should also see this in the log: it tells you which CP2K instance each bead is dispatched to.
You can run multiple CP2K instances, and then i-PI will parallelize over them.
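For reference, a minimal sketch of how the pieces pair up, assuming an INET socket on port 31415 (the port and force-field name are placeholders; use whatever your input.xml and cp2k.inp already contain). Every CP2K instance runs with the same driver settings, connects to the one socket i-PI opened, and i-PI hands a different bead to each connected client:

  <ffsocket name='cp2k' mode='inet'>
    <address> HOSTNAME </address>  <!-- substituted with the actual hostname at run time -->
    <port> 31415 </port>
  </ffsocket>

and on the CP2K side:

  &GLOBAL
    RUN_TYPE DRIVER
  &END GLOBAL
  &MOTION
    &DRIVER
      HOST HOSTNAME   ! same host/port as the <ffsocket> block in the i-PI input
      PORT 31415
    &END DRIVER
  &END MOTION

You then launch one such CP2K job per bead (or fewer; i-PI will cycle through the beads with whatever clients are connected).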

Note incidentally that 8 beads are not enough to converge a path-integral calculation for water at room temperature. If you want, you can try PIGLET: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.109.100604. Good parameters should be these: https://gle4md.org/index.html?page=matrix&kind=piglet&centroid=kh_8-4&cw0=4000&ucw0=cm1&nbeads=8&temp=300&utemp=k&parset=20_8_t&outmode=ipi&aunits=ps&cunits=k
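For reference, that gle4md link (with the output set to i-PI) generates a thermostat block that you paste inside the <dynamics> section of the i-PI input more or less verbatim; schematically it has the form below, with shapes and matrices elided here, so copy the generated block rather than this sketch:

  <thermostat mode='nm_gle'>
    <A shape='...'>
      [ ... A matrices from gle4md ... ]
    </A>
    <C shape='...'>
      [ ... C matrices from gle4md ... ]
    </C>
  </thermostat>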

Hope this helps,
Michele

Qinghua Liao

Sep 18, 2025, 4:45:27 PM
to ipi-users
Hello Michele,

Thanks so much for your detailed explanation. I just tried it, and it works with 8 CP2K instances.

Thanks also for the thermostat suggestion. Initially, I thought that both gle and nm_gle (PIGLET) converge well with fewer beads.
Sure, I will switch to PIGLET. Thank you!


All the best,
Qinghua

Qinghua Liao

Sep 19, 2025, 4:14:06 AM
to ipi-users
Hello Michele,

As I tested last evening, PIMD with multiple CP2K instances works; however, I found another problem.
During the simulations some of the 8 instances die and stop updating, so I suppose the remaining instances have to handle two beads each, which slows down the simulation. How can I solve this technical issue? Thanks very much!

Here are the commands I used to run i-PI and the 8 CP2K instances:
#
HOST=$(hostname)
source /home/programs/i-PI_316/env.sh

if [ -e simulation.restart ]; then
   sed -i  "s/HOSTNAME/$HOST/g" simulation.restart
   i-pi simulation.restart  >> log.ipi 2>&1 &
else
   sed  "s/HOSTNAME/$HOST/g" pimd.xml > input.xml
   i-pi input.xml &> log.ipi &
fi
sleep 10

sed "s/HOST HOSTNAME/HOST $HOST/g" pimd.inp >  cp2k.inp
cp  -p cp2k.inp rep0/
cp  -p cp2k.inp rep1/
cp  -p cp2k.inp rep2/
cp  -p cp2k.inp rep3/
cp  -p cp2k.inp rep4/
cp  -p cp2k.inp rep5/
cp  -p cp2k.inp rep6/
cp  -p cp2k.inp rep7/

srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep0/cp2k.inp -o rep0/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep1/cp2k.inp -o rep1/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep2/cp2k.inp -o rep2/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep3/cp2k.inp -o rep3/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep4/cp2k.inp -o rep4/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep5/cp2k.inp -o rep5/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep6/cp2k.inp -o rep6/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep7/cp2k.inp -o rep7/cp2k.out  &
wait
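
Equivalently, the eight copy and srun lines can be written as a loop; a minimal sketch with the same layout (8 instances of 4 MPI ranks x 7 threads), assuming the rep0 ... rep7 directories already exist:

for i in $(seq 0 7); do
    cp -p cp2k.inp rep${i}/
    srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep${i}/cp2k.inp -o rep${i}/cp2k.out &
done
wait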


All the best,
Qinghua

Michele Ceriotti

Sep 19, 2025, 5:18:08 AM
to ipi-users
Sorry, this is not something I can see how to help with. They might be dying because some SCF does not converge, because you run out of memory (I don't know how the different jobs are allocated when you stack all those srun calls), or who knows what.
I fear you'll have to investigate this with the people who run your cluster to understand the underlying cause. We discussed a while ago having i-PI shut down when one of its clients dies, but I don't think we ever ended up implementing it.
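A generic first check, though (just a sketch, not specific to your cluster): look at the tail of each client's output and grep for SCF or memory trouble, e.g.

  tail -n 5 rep*/cp2k.out
  grep -i "not converged" rep*/cp2k.out
  grep -iE "memory|killed|oom" log.ipi rep*/cp2k.out

and check the scheduler's own job log, if it writes one, for out-of-memory kills.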
Best,
Michele

Qinghua Liao

Sep 19, 2025, 7:14:16 AM
to ipi-users
Thanks very much Michele!

I will ask the cluster administrators to look into it.


All the best,
Qinghua

Qinghua Liao

Sep 25, 2025, 10:43:01 AM
to ipi-users
Hello Michele,

When I read this line of yours, I wondered whether I had missed something:
"Since you're running with verbosity=high, you should also see this in the log: it tells you which CP2K instance each bead is dispatched to."

I found that my log file log.ipi stayed empty during the simulations (hundreds of steps), unless I made some mistake.
When I instead set the number of steps to 10, all the information appeared once the simulation finished.

I searched the manual for a keyword like flush that would force the output to be written, but I did not find one.
My command line is:
i-pi input.xml > log.ipi 2>&1 &

Do you know how I can keep the log file up to date during the simulation? Thanks a lot!
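
(This might simply be Python block-buffering stdout when it is redirected to a file rather than a terminal; a minimal sketch of a workaround, assuming that is indeed the cause:

  PYTHONUNBUFFERED=1 i-pi input.xml > log.ipi 2>&1 &
  # or, equivalently, run the i-pi script with an unbuffered interpreter:
  # python -u $(which i-pi) input.xml > log.ipi 2>&1 &

after which tail -f log.ipi should update while the simulation runs.)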



All the best,
Qinghua


Qinghua Liao

Sep 26, 2025, 3:23:31 AM
to ipi-users
Hello again Michele,

I think I have solved the multiple CP2K instances issue, with the help of our HPC support.

Since I use 8 beads and the default number of slots in i-PI is 4, I changed the slots to 8; so far the simulations are running well.
The technician at the HPC cluster even suggested using 32 slots.
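
For reference, the slots setting lives in the ffsocket block of the i-PI input; a minimal sketch, with the address and port as placeholders for whatever input.xml already uses:

  <ffsocket name='cp2k' mode='inet'>
    <address> HOSTNAME </address>
    <port> 31415 </port>
    <slots> 8 </slots>  <!-- one per CP2K client; the default is 4 -->
  </ffsocket>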

However, I have another question about the performance:

I ran the simulation for 25 steps. The average time per step reported in the i-PI log file is 19.5 +/- 0.4 s, while the average times reported by the 8 CP2K instances are about 10.0 s/step for 6 of them and about 14.5 s/step for the other 2.
So for each step there is a gap of 4-5 seconds that I cannot account for.

Any ideas where this time gap comes from? Thanks very much!

All the best,
Qinghua


Michele Ceriotti

Sep 26, 2025, 5:30:47 AM
to ipi-users
At each time step the timing is determined by the SLOWEST replica. So if you have a mean time of 15 s and a standard deviation of 5 s, the wall time per step is set by the slowest of the 8, and I'm not shocked it comes out around 20 s. Basically, look at the DISTRIBUTION of the CP2K timings and see whether it's plausible that the slowest out of 8 takes about 20 s.
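As a rough back-of-the-envelope check (assuming roughly Gaussian, independent per-replica timings, which is only an approximation): the expected slowest of 8 such timings is about the mean plus 1.4 standard deviations, so with 15 s and 5 s that is roughly 15 + 1.4 x 5 ≈ 22 s per step.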
Michele

Qinghua Liao

Sep 29, 2025, 3:28:33 AM
to ipi-users
Thanks so much for all the suggestions Michele!
The current performance is acceptable!

All the best,
Qinghua