CP2K Hangs for CellOpt+kpoints in certain systems.


Nicholas Winner

21 Oct 2022, 19:17:14
to cp2k
Hello all,

I'm doing some cell optimizations using kpoints with CP2K. I have never had a problem with the geo_opt module, but cell_opt is hanging very often for a number of systems. 

Usually, the calculation will proceed for 1-3 optimization steps, and then it will hang at the start of a new SCF loop. I have found that sometimes the behavior is fixed by using direct_p_mixing instead of broyden_mixing, but this is not a consistent fix.
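For anyone trying the same workaround, the mixing scheme is selected in the &MIXING subsection of &SCF. A minimal sketch, assuming everything else in the input is unchanged (the ALPHA value is an illustrative placeholder, not taken from the attached input):

```
&FORCE_EVAL
  &DFT
    &SCF
      &MIXING
        ! Swap BROYDEN_MIXING for DIRECT_P_MIXING to test the workaround
        METHOD DIRECT_P_MIXING
        ! Mixing fraction; placeholder value, tune for the system
        ALPHA 0.4
      &END MIXING
    &END SCF
  &END DFT
&END FORCE_EVAL
```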

I've seen the problem with both v9.1 and v2022.1, and I've tried builds on 3 different clusters with the same behavior.

Does anyone have experience with this and can offer advice? I've attached an example for Argon, which has this problem very consistently. The output goes until the calculation gets stuck.

Thanks,
Nick
data.tar.gz

Krack Matthias (PSI)

22 Oct 2022, 00:08:29
to cp...@googlegroups.com

Hello Nick

The failures are possibly caused by the keyword WAVEFUNCTIONS COMPLEX. I observe that the regression test QS/regtest-kp-2/cc2.inp often fails because of this issue, which is related to zgemv in OpenBLAS.
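One way to test this would be to request real wavefunctions in the &KPOINTS section. A minimal sketch, assuming a Monkhorst-Pack grid (the 4 4 4 grid is a placeholder, not the grid from the attached input, and real wavefunctions are only usable for k-point sets that allow them):

```
&FORCE_EVAL
  &DFT
    &KPOINTS
      ! Placeholder grid; use the grid from your own input
      SCHEME MONKHORST-PACK 4 4 4
      ! Request real instead of complex wavefunctions where possible
      WAVEFUNCTIONS REAL
    &END KPOINTS
  &END DFT
&END FORCE_EVAL
```

If the hang disappears with this setting, that would point towards the complex BLAS path rather than the cell optimizer itself.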

 

Best

Matthias


Nicholas Winner

22 Oct 2022, 12:30:43
to cp2k
Hi Matthias, 

That's very strange. I tried a calculation again on Argon, but this does not fix the problem. 

-N

Krack Matthias (PSI)

22 Oct 2022, 13:53:40
to cp...@googlegroups.com
On Argon?

Matthias


Nicholas Winner

22 Oct 2022, 14:14:23
to cp2k
Hey, I have identified the problem area, but not the exact cause. I was relaxing argon for this example.

I found that the issue is somewhere in the parallelization of kpoints. When I run without mpirun, it always runs okay. When I ran with 128 MPI processes (one per core), it failed. When I run with a Monkhorst-Pack grid, the calculation runs fine with 128 MPI processes, but the automatic grid has many more kpoints. So I reran with the explicit grid but with 4 MPI processes instead, and the calculation proceeds.

There seems to be some kind of "over"-parallelization problem happening here. Has this been reported in the community at all?
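A setting that may be worth testing alongside the MPI process count is PARALLEL_GROUP_SIZE in the &KPOINTS section, which as far as I understand controls how many processes work on a single kpoint. A minimal sketch, assuming a Monkhorst-Pack grid (the grid and the group size are placeholders, and I have not verified that this avoids the hang):

```
&FORCE_EVAL
  &DFT
    &KPOINTS
      ! Placeholder grid; use the grid from your own input
      SCHEME MONKHORST-PACK 4 4 4
      ! Processes per kpoint group:
      !   -1 = smallest possible group, 0 = all processes, n = exactly n
      PARALLEL_GROUP_SIZE 0
    &END KPOINTS
  &END DFT
&END FORCE_EVAL
```

Comparing runs with different group sizes at a fixed process count should show whether the hang depends on how the kpoints are distributed over processes.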