Dear Dirac experts,
I've been using the new EOM-CCSD module for a few weeks now as an alternative to FS-CCSD, and in general it works great. However, I've run into a recurring problem that only occurs when I use a Hamiltonian with spin-orbit coupling on a molecule with only one symmetry irrep. I've tried both 4-component Dirac-Coulomb and 2-component X2C, with the same general result: once the EOM diagonalization steps begin, memory usage grows and grows until the OS kills the job via the oom-killer.

The issue is not node specific, and these compute nodes have nearly 400 GB of RAM. In my last attempt I was running on just 4 cores, each allocated with --mw 1400 --aw 4000, so only about 160 GB in total. It got through 10 iterations, as noted below; the system log showed that at that point memory ran out and the oom-killer was invoked on dirac.x.
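For reference, the allocation flags translate to bytes roughly as follows. This is only a sketch; the 8-byte word size and the assumption that --aw is allocated per MPI process are my reading, not taken from the pam documentation:

```python
# Convert DIRAC pam memory flags (given in megawords) to gigabytes.
# Assumptions (mine): 1 word = 8 bytes on a 64-bit integer build,
# 1 MW = 10**6 words, and --aw applies per MPI process.
BYTES_PER_WORD = 8
WORDS_PER_MW = 10**6

def total_gb(aw_megawords, n_procs):
    """Total GB requested across all MPI ranks for a given --aw value."""
    return aw_megawords * WORDS_PER_MW * BYTES_PER_WORD * n_procs / 1e9

print(total_gb(4000, 4))  # --aw 4000 on 4 cores -> 128.0
```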
I'm running Dirac19, a fairly recent build from GitHub, using OpenMPI (64-bit integers).
The EOM part of the input was very simple:
.EOMCC
*EOMCC
.EE
1 6
regards, -Kirk
Output snippet:
Iteration 10
Number of OMP threads, procs in use: 1 1
Eigenvalue 1 : -0.786041E-02
Eigenvalue 2 : -0.268021E-02
Eigenvalue 3 : 0.118441E-01
Eigenvalue 4 : 0.212588E-01
Eigenvalue 5 : 0.266271E-01
Eigenvalue 6 : 0.311534E-01
DIRAC pam run in /home/kipeters/projects/Bowen/ThAu2/dirac/X2C
==== below this line is the stderr stream ====
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node compute-0-9 exited on signal 9 (Killed).
--------------------------------------------------------------------------
--
You received this message because you are subscribed to the Google Groups "dirac-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dirac-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dirac-users/BYAPR01MB3640E2AD1C99EE8E115F029FD6F60%40BYAPR01MB3640.prod.exchangelabs.com.
Dear Kirk,

Thank you for your question. What you are seeing is not, strictly speaking, abnormal, and will occur irrespective of the Hamiltonian you use. The current EOM implementation has a shortcoming in that it is more memory hungry during the Davidson procedure than it should/could be. Roughly speaking, at each iteration memory usage increases by about 4 * (number of unconverged roots) * dimension(T1+T2), per MPI process, as the number of trial and sigma vectors grows. Note that dimension(T1+T2) is not the full amplitude length, since these are stored in triangular form.
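To put that growth rate in perspective, here is a back-of-the-envelope sketch. The amplitude dimension used below is a made-up illustrative number, and the 16 bytes per amplitude assumes complex double precision (the spin-orbit case); neither figure comes from the code itself:

```python
# Approximate per-iteration memory growth of the EOM Davidson solver,
# following the estimate above: ~4 vectors per unconverged root, each
# of length dimension(T1+T2), accumulated per MPI process.
def extra_gb_per_iteration(n_roots, dim_t1t2, bytes_per_amp=16):
    # 16 bytes/amplitude assumes complex doubles (spin-orbit Hamiltonian).
    return 4 * n_roots * dim_t1t2 * bytes_per_amp / 1e9

# Hypothetical example: 6 unconverged roots, 1e8 packed T1+T2 amplitudes.
print(extra_gb_per_iteration(6, 10**8))  # -> 38.4 GB per iteration
```

On numbers of that order, the accumulated trial and sigma vectors alone can approach a few hundred GB within ten or so iterations, which would match a job being killed around iteration 10.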
Dear Andre,
Thank you for your reply; this is very helpful and gives me some good ideas on how to work around the issue.
best,
-Kirk