I want to convince our users to move their long single-node calculations off the fat nodes and take advantage of the massively parallel algorithms in NWChem for large coupled cluster calculations.
I tried to reproduce a single-point energy at the CCSDTQ/cc-pVDZ-DK level (with the dkn relativistic correction turned on) that was completed with Molpro/MRCC on a large-memory node in 2 days (30 CCSDTQ iterations), but I got stuck on the slow convergence of the CCSDTQ iterations in NWChem.
I have tried different combinations of lshift (from 0.0 to 0.5) and diis (from 3 to 10), but it still converges terribly slowly (the residuum actually seems to be oscillating), e.g.
--------------------------------------------------------
Iter         Residuum       Correlation     Cpu    Wall
--------------------------------------------------------
   1  0.4078080842436  -0.2077444459379   587.8   590.6
   2  0.2161810372879  -0.2236057646351   593.9   596.7
   3  0.1217385737596  -0.2361079044937   603.0   605.8
   4  0.1279490652504  -0.2421982778198   596.4   599.2
   5  0.1768463623714  -0.2469098419390   600.6   603.7
MICROCYCLE DIIS UPDATE: 5 5
   6  0.1155378062632  -0.2398791028023   602.6   605.6
   7  0.1117641498634  -0.2422275518849   602.1   605.1
   8  0.1452811537413  -0.2448604640113   604.1   607.1
   9  0.1894266285691  -0.2474261453493   603.4   606.5
  10  0.3043459947650  -0.2499897661747   601.3   604.5
MICROCYCLE DIIS UPDATE: 10 5
...
MICROCYCLE DIIS UPDATE: 95 5
  96  0.0617752294466  -0.2434571559652   599.5   603.3
  97  0.0670127038516  -0.2448148161798   601.5   605.4
  98  0.0916494443188  -0.2461383962905   599.1   602.9
  99  0.2519738846531  -0.2474458666793   599.1   603.0
 100  0.9762315407321  -0.2487533281784   599.0   603.6
MICROCYCLE DIIS UPDATE: 100 5
 101  0.0607249831510  -0.2433079170802   600.6   604.5
 102  0.0617749671057  -0.2446559917318   596.0   600.9
 103  0.0677555019491  -0.2459724207197   601.7   606.1
 104  0.0912653247904  -0.2472739286554   601.5   605.8
 105  0.1726277513197  -0.2485688365507   601.7   605.6
MICROCYCLE DIIS UPDATE: 105 5
 106  0.0615252063514  -0.2437998755818   601.0   605.0
 107  0.0672283316319  -0.2450936347286   600.5   604.4
 108  0.1008097492238  -0.2463848367663   598.8   602.8
...
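To make the pattern in the late iterations easier to see, here is a minimal Python sketch that parses an excerpt of the log above (iterations 96 to 108) and groups the rows into DIIS microcycles. It is not part of NWChem; the function names are made up for illustration, and it assumes a DIIS update after every 5 iterations, as the MICROCYCLE lines suggest.

```python
import re

# Excerpt copied from the log above (iterations 96-108).
LOG = """\
MICROCYCLE DIIS UPDATE: 95 5
96 0.0617752294466 -0.2434571559652 599.5 603.3
97 0.0670127038516 -0.2448148161798 601.5 605.4
98 0.0916494443188 -0.2461383962905 599.1 602.9
99 0.2519738846531 -0.2474458666793 599.1 603.0
100 0.9762315407321 -0.2487533281784 599.0 603.6
MICROCYCLE DIIS UPDATE: 100 5
101 0.0607249831510 -0.2433079170802 600.6 604.5
102 0.0617749671057 -0.2446559917318 596.0 600.9
103 0.0677555019491 -0.2459724207197 601.7 606.1
104 0.0912653247904 -0.2472739286554 601.5 605.8
105 0.1726277513197 -0.2485688365507 601.7 605.6
MICROCYCLE DIIS UPDATE: 105 5
106 0.0615252063514 -0.2437998755818 601.0 605.0
107 0.0672283316319 -0.2450936347286 600.5 604.4
108 0.1008097492238 -0.2463848367663 598.8 602.8
"""

# One iteration row: iter, residuum, correlation energy, cpu, wall.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+)\s+(-?\d+\.\d+)\s+\d+\.\d+\s+\d+\.\d+\s*$"
)


def parse_iterations(text):
    """Return (iteration, residuum, correlation) tuples; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def split_microcycles(rows, diis=5):
    """Group rows into microcycles, assuming a DIIS update every `diis` iterations."""
    cycles = {}
    for it, res, corr in rows:
        cycles.setdefault((it - 1) // diis, []).append((it, res, corr))
    return [cycles[k] for k in sorted(cycles)]


def residuum_grows(cycle):
    """True if the residuum increases at every step within the microcycle."""
    res = [r for _, r, _ in cycle]
    return all(b > a for a, b in zip(res, res[1:]))


rows = parse_iterations(LOG)
cycles = split_microcycles(rows)
for cycle in cycles:
    first, last = cycle[0], cycle[-1]
    print(
        f"iters {first[0]}-{last[0]}: residuum {first[1]:.4f} -> {last[1]:.4f}, "
        f"grows every step: {residuum_grows(cycle)}"
    )
```

On this excerpt the residuum increases at every step inside each microcycle and falls back to about 0.06 right after each DIIS update, so the first residuum of each microcycle barely changes from one update to the next.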
Unless there is a way to speed up the convergence of the CC iterations, there is no advantage to using NWChem on a large computer cluster over Molpro/MRCC on a fat node.
Besides lshift and diis, is there any other trick to speed up slowly converging or oscillating CC iterations? Please advise!
Thanks!
~Dominic