performance of PIMD with CP2K/i-PI


Qinghua Liao

Sep 18, 2025, 12:26:31 PM
to ipi-users
Hello,
I am running ab initio PIMD using CP2K/i-PI, but the performance is quite bad.
It would be great if someone could give me some suggestions to improve it.

The system is one CH4 on an ice surface with 32 water molecules, 101 atoms in total. The simulation runs on 112 cores of an HPC cluster, with 8 beads.

Each CP2K force evaluation takes around 6 seconds, but one full PIMD step takes almost 1 minute. I guess only one core is used for i-PI, but I would expect its part to be much faster than the CP2K evaluation on 112 cores.

Thanks very much!

All the best,
Qinghua

Attachments: cp2k.out, cp2k.sh, cp2k.inp, input.xml

Michele Ceriotti

Sep 18, 2025, 3:14:59 PM
to ipi-users
Hello! It looks like you're running a single CP2K instance and you have 8 beads, so the beads are evaluated sequentially. 6 s x 8 = 48 s, which is pretty close to 1 minute. I'd expect the overhead from i-PI to be in the few-milliseconds ballpark, so you're simply seeing CP2K running one bead at a time.
Since you're running with verbosity=high, you should also see this in the log: it tells you which CP2K instance each bead is dispatched to.
You can run multiple CP2K instances, and then i-PI will parallelize over them.
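For reference, a minimal sketch of how the pieces pair up, assuming an INET socket on port 31415 (the port and force-field name are placeholders; use whatever your input.xml and cp2k.inp already contain). Every CP2K instance runs with the same driver settings, connects to the one socket i-PI opened, and i-PI hands a different bead to each connected client:

  <ffsocket name='cp2k' mode='inet'>
    <address> HOSTNAME </address>  <!-- substituted with the actual hostname at run time -->
    <port> 31415 </port>
  </ffsocket>

and on the CP2K side:

  &GLOBAL
    RUN_TYPE DRIVER
  &END GLOBAL
  &MOTION
    &DRIVER
      HOST HOSTNAME   ! same host/port as the <ffsocket> block in the i-PI input
      PORT 31415
    &END DRIVER
  &END MOTION

You then launch one such CP2K job per bead (or fewer; i-PI will cycle through the beads with whatever clients are connected).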

Note incidentally that 8 beads are not enough to converge a path-integral calculation for water at room temperature. If you want, you can try PIGLET: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.109.100604. Good parameters should be these: https://gle4md.org/index.html?page=matrix&kind=piglet&centroid=kh_8-4&cw0=4000&ucw0=cm1&nbeads=8&temp=300&utemp=k&parset=20_8_t&outmode=ipi&aunits=ps&cunits=k
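For reference, that gle4md link (with the output set to i-PI) generates a thermostat block that you paste inside the <dynamics> section of the i-PI input more or less verbatim; schematically it has the form below, with shapes and matrices elided here, so copy the generated block rather than this sketch:

  <thermostat mode='nm_gle'>
    <A shape='...'>
      [ ... A matrices from gle4md ... ]
    </A>
    <C shape='...'>
      [ ... C matrices from gle4md ... ]
    </C>
  </thermostat>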

Hope this helps,
Michele

Qinghua Liao

Sep 18, 2025, 4:45:27 PM
to ipi-users
Hello Michele,

Thanks so much for your detailed explanation. I just tried it, and it works with 8 CP2K instances.

Thanks also for the thermostat suggestion. Initially, I thought that both gle and nm_gle (PIGLET) converge well with fewer beads.
Sure, I will switch to PIGLET. Thank you!


All the best,
Qinghua

Qinghua Liao

Sep 19, 2025, 4:14:06 AM
to ipi-users
Hello Michele,

As I tested last evening, PIMD with multiple CP2K instances works; however, I found another problem.
During the simulations some of the 8 instances die and stop updating, so I suppose the remaining instances have to handle two beads each, which slows down the simulation. How can I solve this technical issue? Thanks very much!

Here are the commands I used to run i-PI and the 8 CP2K instances:
#
HOST=$(hostname)
source /home/programs/i-PI_316/env.sh

if [ -e simulation.restart ]; then
   sed -i  "s/HOSTNAME/$HOST/g" simulation.restart
   i-pi simulation.restart  >> log.ipi 2>&1 &
else
   sed  "s/HOSTNAME/$HOST/g" pimd.xml > input.xml
   i-pi input.xml &> log.ipi &
fi
sleep 10

sed "s/HOST HOSTNAME/HOST $HOST/g" pimd.inp >  cp2k.inp
cp  -p cp2k.inp rep0/
cp  -p cp2k.inp rep1/
cp  -p cp2k.inp rep2/
cp  -p cp2k.inp rep3/
cp  -p cp2k.inp rep4/
cp  -p cp2k.inp rep5/
cp  -p cp2k.inp rep6/
cp  -p cp2k.inp rep7/

srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep0/cp2k.inp -o rep0/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep1/cp2k.inp -o rep1/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep2/cp2k.inp -o rep2/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep3/cp2k.inp -o rep3/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep4/cp2k.inp -o rep4/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep5/cp2k.inp -o rep5/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep6/cp2k.inp -o rep6/cp2k.out  &
srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep7/cp2k.inp -o rep7/cp2k.out  &
wait
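
Equivalently, the eight copy and srun lines can be written as a loop; a minimal sketch with the same layout (8 instances of 4 MPI ranks x 7 threads), assuming the rep0 ... rep7 directories already exist:

for i in $(seq 0 7); do
    cp -p cp2k.inp rep${i}/
    srun -n 4 --cpus-per-task=7 cp2k.psmp -i rep${i}/cp2k.inp -o rep${i}/cp2k.out &
done
wait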


All the best,
Qinghua

Michele Ceriotti

Sep 19, 2025, 5:18:08 AM
to ipi-users
Sorry, this is not something I can see how to help with. They might be dying because some SCF does not converge, because you run out of memory (I don't know how the different jobs are allocated when you stack all those srun calls), or who knows what.
I fear you'll have to investigate this with the people who run your cluster to understand the underlying cause. We discussed a while ago having i-PI shut down when one of its clients dies, but I don't think we ever ended up implementing it.
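A generic first check, though (just a sketch, not specific to your cluster): look at the tail of each client's output and grep for SCF or memory trouble, e.g.

  tail -n 5 rep*/cp2k.out
  grep -i "not converged" rep*/cp2k.out
  grep -iE "memory|killed|oom" log.ipi rep*/cp2k.out

and check the scheduler's own job log, if it writes one, for out-of-memory kills.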
Best,
Michele

Qinghua Liao

Sep 19, 2025, 7:14:16 AM
to ipi-users
Thanks very much Michele!

I will ask the cluster administrators to look into it.


All the best,
Qinghua

Qinghua Liao

Sep 25, 2025, 10:43:01 AM
to ipi-users
Hello Michele,

When I read this line of yours, I wondered whether I had missed something:
"Since you're running with verbosity=high, you should also see this in the log: it tells you which CP2K instance each bead is dispatched to."

I found that my log file log.ipi stayed empty during the simulations (hundreds of steps), unless I made some mistake.
When I instead set the number of steps to 10, all the information appeared once the simulation finished.

I searched the manual for a keyword like flush that would force the output to be written, but I did not find one.
My command line is:
i-pi input.xml > log.ipi 2>&1 &

Do you know how I can keep the log file up to date during the simulation? Thanks a lot!
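
(This might simply be Python block-buffering stdout when it is redirected to a file rather than a terminal; a minimal sketch of a workaround, assuming that is indeed the cause:

  PYTHONUNBUFFERED=1 i-pi input.xml > log.ipi 2>&1 &
  # or, equivalently, run the i-pi script with an unbuffered interpreter:
  # python -u $(which i-pi) input.xml > log.ipi 2>&1 &

after which tail -f log.ipi should update while the simulation runs.)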



All the best,
Qinghua


Qinghua Liao

Sep 26, 2025, 3:23:31 AM
to ipi-users
Hello again Michele,

I think I have solved the multiple CP2K instances issue, with the help of our HPC support.

Since I use 8 beads and the default number of slots in i-PI is 4, I changed the slots to 8; so far the simulations are running well.
The technician at the HPC cluster even suggested using 32 slots.
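
For reference, the slots setting lives in the ffsocket block of the i-PI input; a minimal sketch, with the address and port as placeholders for whatever input.xml already uses:

  <ffsocket name='cp2k' mode='inet'>
    <address> HOSTNAME </address>
    <port> 31415 </port>
    <slots> 8 </slots>  <!-- one per CP2K client; the default is 4 -->
  </ffsocket>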

However, I have another question about the performance:

I ran the simulation for 25 steps. The average time per step reported in the i-PI log file is 19.5 +/- 0.4 s, while the average times reported by the 8 CP2K instances are about 10.0 s/step for 6 of them and about 14.5 s/step for the other 2.
So for each step there is a gap of 4-5 seconds that I cannot account for.

Any ideas where this time gap comes from? Thanks very much!

All the best,
Qinghua


Michele Ceriotti

Sep 26, 2025, 5:30:47 AM
to ipi-users
At each time step the timing is determined by the SLOWEST replica. So if you have a mean time of 15 s and a standard deviation of 5 s, the wall time per step is set by the slowest of the 8, and I'm not shocked it comes out around 20 s. Basically, look at the DISTRIBUTION of the CP2K timings and see whether it's plausible that the slowest out of 8 takes about 20 s.
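As a rough back-of-the-envelope check (assuming roughly Gaussian, independent per-replica timings, which is only an approximation): the expected slowest of 8 such timings is about the mean plus 1.4 standard deviations, so with 15 s and 5 s that is roughly 15 + 1.4 x 5 ≈ 22 s per step.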
Michele

Qinghua Liao

Sep 29, 2025, 3:28:33 AM
to ipi-users
Thanks so much for all the suggestions Michele!
The current performance is acceptable!

All the best,
Qinghua