GA Memory Issues

Richard Messerly

Oct 11, 2021, 7:26:50 PM
to NWChem Forum
I am a new user of NWChem. I have gone through the tutorials and I have successfully run some very basic calculations.

I am now trying to run some high-level calculations using CCSD(T) for relatively small systems, i.e., H2, H2+, CO2, CO2+.

I have attached an input file for CO2+ (CO2+.nw) that failed with the following error:

available GA memory             6418886632  bytes
 ------------------------------------------------------------------------
createfile: failed ga_create size/nproc bytes          8722793512
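
For scale, the two byte counts in the error can be compared directly. This is only a minimal sketch; the helper name is hypothetical and not part of NWChem, and the numbers are copied from the message above.

```python
# Hypothetical helper, not part of NWChem: compare the two byte counts
# quoted in the error message above.
AVAILABLE_BYTES = 6418886632   # "available GA memory"
REQUESTED_BYTES = 8722793512   # "failed ga_create size/nproc bytes"


def ga_shortfall(available, requested):
    """Return how many bytes the request exceeds the available GA memory by."""
    return max(0, requested - available)


shortfall = ga_shortfall(AVAILABLE_BYTES, REQUESTED_BYTES)
ratio = REQUESTED_BYTES / AVAILABLE_BYTES

print(f"shortfall: {shortfall} bytes ({shortfall / 1e9:.2f} GB)")
print(f"request is {ratio:.2f}x the available GA memory")
```

The failed allocation asks for about 2.3 GB more than the global arrays had available, i.e. roughly 36% more.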

I have tried changing the values on the second line starting with "memory," but I have just been shooting blindly. For example, I tried dramatically increasing global to 140000 mb. The job did not crash due to memory but it was incredibly slow (did not finish during the 2 hours I had the node allocated).

Could you provide some advice on best practice for optimizing the memory allocation? Is there a recommended systematic approach to setting these memory parameters?

Alternatively, would you recommend I use something other than the GA method?

Thank you
CO2+.nw.rtf

Edoardo Aprà

Oct 14, 2021, 1:58:03 PM
to NWChem Forum
You might want to have a look at the documentation at https://nwchemgit.github.io/TCE.html#memory-considerations

For your specific input, setting tilesize to 8 should bring the global memory requirement under the 7 GB you set on the memory input line:

tce
  tilesize 8
  ccsdt
  freeze core
end

Richard Messerly

Oct 15, 2021, 6:18:53 PM
to NWChem Forum
Edoardo,

Thank you for getting back to me so quickly.

I tried to simply add the tilesize 8 into my input file, but the calculation still crashed due to memory issues. Do you have any other recommendations?

If you were trying to systematically adjust certain memory parameters, where would you start? Even after reading the documentation, it is unclear to me what the various types of memory are referring to (memory stack 1300 mb heap 200 mb global 7000 mb) and which ones I should change.

Thank you again

Richard

Edoardo Aprà

Oct 15, 2021, 6:19:52 PM
to NWChem Forum
Could you post the full output file and give details of how the calculation was run (how many nodes, processors per node, etc.)?

jeff.science

Oct 17, 2021, 1:08:21 PM
to NWChem Forum
You say CCSD(T) here but your input file says CCSDT. These are different methods. If you want the popular CCSD(T) method, you need to write "ccsd(t)", not "ccsdt". If you want full iterative triples, be aware that they are expensive in both memory and time.
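
A quick way to catch this mix-up before submitting a job is to check which keyword actually sits in the tce block. The following is only a sketch; the function is hypothetical, not an NWChem tool, and the sample text is the tce block posted earlier in this thread.

```python
# Hypothetical lint, not an NWChem tool: report which coupled-cluster
# keyword appears inside the "tce ... end" block of an input file.
METHODS = ("ccsd", "ccsd(t)", "ccsdt")

SAMPLE_INPUT = """\
tce
  tilesize 8
  ccsdt
  freeze core
end
"""


def tce_method(input_text):
    """Return the first recognized method keyword in the tce block, or None."""
    inside = False
    for raw in input_text.splitlines():
        word = raw.strip().lower()
        if word == "tce":
            inside = True
        elif inside and word == "end":
            inside = False
        elif inside and word in METHODS:
            return word
    return None


method = tce_method(SAMPLE_INPUT)
if method == "ccsdt":
    print("input requests full iterative triples (ccsdt), not ccsd(t)")
```

Only the three keywords listed in METHODS are recognized; any other TCE method is reported as None.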

The "global" memory setting is not binding in most cases.  It is allocated dynamically.  You should be able to set it to an arbitrarily large amount without causing any issues, unless NWChem actually tries to use it, in which case the system will swap and run incredibly slowly.

What matters more for TCE is having a stack that is at least 2000 mb, although more than 4000 mb is probably bad, because of INT_MAX issues I don't want to get into here.
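
As a sketch of how that rule of thumb could be checked against a memory line, here is a small Python helper. The function names are hypothetical, the 2000 and 4000 mb bounds are simply the values suggested above rather than limits enforced by NWChem, and the example line is the one quoted earlier in the thread.

```python
import re

# Hypothetical checker, not part of NWChem. The bounds are the rule of
# thumb from this thread, not limits enforced by the program.
STACK_MIN_MB = 2000
STACK_MAX_MB = 4000


def parse_memory_line(line):
    """Extract stack, heap and global sizes from a memory line.

    Only sizes given in mb are recognized in this sketch.
    """
    pairs = re.findall(r"(stack|heap|global)\s+(\d+)\s+mb", line.lower())
    return {name: int(size) for name, size in pairs}


def stack_advice(line):
    """Compare the stack size on a memory line with the suggested range."""
    stack = parse_memory_line(line).get("stack")
    if stack is None:
        return "no explicit stack size found"
    if stack < STACK_MIN_MB:
        return f"stack {stack} mb is below {STACK_MIN_MB} mb"
    if stack > STACK_MAX_MB:
        return f"stack {stack} mb is above {STACK_MAX_MB} mb"
    return f"stack {stack} mb is within the suggested range"


print(stack_advice("memory stack 1300 mb heap 200 mb global 7000 mb"))
```

For the memory line in the original input this reports that the 1300 mb stack is below the suggested 2000 mb.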

For smaller molecules, the two-electron integrals are still larger in GA storage than the amplitudes, but unfortunately, CCSDT with "2eorb" was broken the last time I checked, so there is potentially a benefit to using a version of NWChem from the past, prior to this problem.  It has been too long since I ran TCE regularly but in theory I can figure it out.

Jeff

jeff.science

Oct 17, 2021, 1:26:08 PM
to NWChem Forum
CCSDT with 2eorb was fixed and works in the latest version I have on my machine.  Sorry for that error.

Jeff

jeff.science

Oct 19, 2021, 5:01:57 AM
to NWChem Forum
I ran the calculation on my workstation.  It appears to require about 40 GB of memory, but I think 256 GB is a safe bet.

Details attached.  This is with my development build of NWChem but it has no modifications to TCE.

If you want me to run it again with tighter convergence, that's fine, because it only takes an hour or two on this machine.

Geometry optimization:

% grep @ CO2.n64.log
@ Step      Energy       Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -187.80659048  0.0D+00  0.05576  0.03392  0.00000  0.00000     22.9
@    1    -187.80950981 -2.9D-03  0.03028  0.01675  0.04574  0.07699     38.5
@    2    -187.81134489 -1.8D-03  0.01171  0.00645  0.02940  0.04683     55.4
@    3    -187.81166674 -3.2D-04  0.00153  0.00086  0.01370  0.02454     69.7
@    4    -187.81166756 -8.2D-07  0.00210  0.00117  0.00098  0.00173     85.4
@    5    -187.81167550 -7.9D-06  0.00003  0.00002  0.00227  0.00394    100.3
@    6    -187.81167551 -9.1D-09  0.00000  0.00000  0.00003  0.00005    112.9
@    6    -187.81167551 -9.1D-09  0.00000  0.00000  0.00003  0.00005    112.9

 CCSDT iterations
 --------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall
 --------------------------------------------------------
    1   1.0109675703796  -0.5595023305584   126.9   130.5
    2   0.1956114978149  -0.5555834215776   127.1   130.4
    3   0.1989046629723  -0.6008486150325   127.2   130.6
    4   0.0773216707096  -0.5962863425751   126.5   130.1
    5   0.0816824219832  -0.6079690176364   126.9   130.5
    6   0.0392307027981  -0.6065663480304   126.4   130.0
    7   0.0382017032422  -0.6110373701980   127.1   130.9
    8   0.0205546943449  -0.6106169255764   126.2   130.0
 MICROCYCLE DIIS UPDATE:                        8                        8
    9   0.0034014998602  -0.6142898387732   126.9   131.0
   10   0.0020810099544  -0.6141957778534   126.5   130.7
   11   0.0014072571802  -0.6142255702191   125.0   129.3
   12   0.0010204877439  -0.6142317715974   126.4   130.9
   13   0.0007396959558  -0.6142440689994   126.0   130.2
   14   0.0005494392752  -0.6142485465316   124.9   129.1
   15   0.0004084251508  -0.6142553832939   127.6   131.8
   16   0.0003066810730  -0.6142588946961   126.4   130.4
 MICROCYCLE DIIS UPDATE:                       16                        8
   17   0.0000280462630  -0.6142758148800   125.9   130.6
   18   0.0000143543615  -0.6142755260211   126.9   131.6
   19   0.0000105753695  -0.6142763582171   124.6   129.4
   20   0.0000074534403  -0.6142763175894   125.8   130.4
   21   0.0000057001145  -0.6142765333313   126.9   131.7
   22   0.0000042315300  -0.6142765288477   126.1   131.0
   23   0.0000032520881  -0.6142766088577   127.9   132.8
   24   0.0000024612888  -0.6142766101260   126.4   131.3
 MICROCYCLE DIIS UPDATE:                       24                        8
   25   0.0000001940430  -0.6142766934307   126.0   131.3
 --------------------------------------------------------
 Iterations converged
 CCSDT correlation energy / hartree =        -0.614276693430692
 CCSDT total energy / hartree       =      -187.866443933534299
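
For anyone who wants to pull timings out of a table like this, a minimal Python sketch follows. The parser is hypothetical, not an NWChem utility, and SAMPLE holds only a few lines copied from the table above; in practice the text would be read from the log file.

```python
# Hypothetical parser, not an NWChem utility. SAMPLE holds a few lines
# copied from the iteration table above.
SAMPLE = """\
    1   1.0109675703796  -0.5595023305584   126.9   130.5
    2   0.1956114978149  -0.5555834215776   127.1   130.4
 MICROCYCLE DIIS UPDATE:                        8                        8
    9   0.0034014998602  -0.6142898387732   126.9   131.0
"""


def parse_iterations(text):
    """Return (iteration, residuum, correlation, cpu, wall) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Iteration rows have exactly five fields and start with an integer;
        # DIIS messages, headers and rulers are skipped.
        if len(fields) == 5 and fields[0].isdigit():
            rows.append((int(fields[0]), *map(float, fields[1:])))
    return rows


rows = parse_iterations(SAMPLE)
total_wall = sum(row[4] for row in rows)
print(f"{len(rows)} iterations parsed, {total_wall:.1f} s wall time")
```

With all 25 iterations at about 130 s of wall time each, the CCSDT step in this run comes to a little under an hour.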

CO2.n64.log.5
CO2.n64.log

Richard Messerly

Oct 21, 2021, 2:11:23 PM
to NWChem Forum
Edoardo,

I have attached the output file for a failed calculation.

System details are provided below:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                36
On-line CPU(s) list:   0-35
Thread(s) per core:    1
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               2101.000
CPU max MHz:           2101.0000
CPU min MHz:           1200.0000
BogoMIPS:              4190.21
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-17
NUMA node1 CPU(s):     18-35

Thanks again

Richard
CO2+.out

Richard Messerly

Oct 21, 2021, 2:11:23 PM
to NWChem Forum
Jeff,

Good catch. Yes, I originally wanted to run CCSD(T) because it is cheaper, and often more accurate. However, from what I could tell, CCSD(T) does not work for a doublet system. Is that not true?

Here is the error I ran into when I originally tried running CCSD (without (T)) for a doublet system (see attached "input1.nw" file; note this is for Ar+ instead of CO2+):

ccsd: nopen is not zero     1

I then tried modifying this input file by using unrestricted CCSD(T) (see "input2.nw") but received the following error:

task_energy : unknown theory
task uccsd(t)

If it is possible to get CCSD(T) working, that would be ideal for me. Plus this might alleviate the aforementioned memory issues.

Thanks again

Richard
input1.nw
input2.nw

Jeff Hammond

Oct 21, 2021, 2:23:49 PM
to nwchem...@googlegroups.com
TCE CCSD(T) supports open shell references. ROHF allows 2eorb, which is the most important performance feature in TCE.

Please attach full input and output to help us debug.

There are theoretical questions about open shell CCSD(T). I’m mostly out of the theoretical chemistry at this point and don’t remember the details. This is something you’d want to ask John Stanton, Karol Kowalski or Anna Krylov about at an ACS meeting. 

CCSDT and other iterative variants are a lot better at open shell than perturbation methods. You can also try EOM-IP methods. I think TCE supports them but I’ve never used them. 

Jeff 

Edoardo Aprà

Oct 21, 2021, 3:26:59 PM
to NWChem Forum
One suggestion after browsing the output file: you are using a pretty old version of NWChem (6.5). I strongly encourage you to use the latest version, 7.0.2. It is possible you are stumbling into bugs present in version 6.5 that were later fixed.
My second suggestion is to use more than one processor in your calculations.
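
To illustrate why more processes can help with the original error, here is a rough Python sketch. It rests on two assumptions that are not confirmed in this thread: that the failed run used a single process, so the byte count in the error is the size of the whole array, and that the "available GA memory" figure is a per-process quantity. The function names are hypothetical.

```python
import math

# Hypothetical estimate, not an NWChem feature. Assumes the failed run
# used one process and that both numbers are per-process quantities.
ARRAY_BYTES = 8722793512            # "failed ga_create size/nproc bytes"
AVAILABLE_PER_PROCESS = 6418886632  # "available GA memory"


def min_processes(array_bytes, available_per_process):
    """Smallest process count whose per-process share fits in GA memory."""
    return math.ceil(array_bytes / available_per_process)


def share_per_process(array_bytes, nproc):
    """Bytes of the distributed array that each process would hold."""
    return array_bytes / nproc


nproc = min_processes(ARRAY_BYTES, AVAILABLE_PER_PROCESS)
print(f"at least {nproc} processes needed for this one array")
print(f"share with 4 processes: {share_per_process(ARRAY_BYTES, 4) / 1e9:.2f} GB")
```

This only looks at a single array and ignores everything else held in global memory. Because the memory directive is specified per process, the node also needs enough RAM for the per-process total multiplied by the number of processes.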

Here is an input file for doublet ROHF CCSD(T) that does work with version 7.0.2:

memory stack 100 mb heap 200 mb global 3000 mb
charge 1
geometry noautosym
   C       -1.83239        1.19149       -0.00000
   O       -0.90491        1.94819       -0.00000
   O       -2.75986        0.43478       -0.00000
end

basis spherical
  c library cc-pvtz
  o library cc-pvtz
end

scf
  direct
  doublet
  rohf
end
tce
  ccsd(t)
  freeze atomic
end

task tce

Richard Messerly

Oct 22, 2021, 11:59:52 AM
to NWChem Forum
Edoardo,

Yes, unfortunately, I am obligated to use NWChem 6.5 because that is the version that is compatible with the software VENUS. I understand that this version is no longer supported, but that is frankly my only option.

Thank you for providing me with this example. I will test it out on version 6.5 and see if it works as well.

Richard

Richard Messerly

Oct 22, 2021, 11:59:52 AM
to NWChem Forum
Jeff,

OK, I will see if I can get this working for TCE CCSD(T). In the future I will make sure to attach both input and output files.

Fortunately, CCSD(T) is not our end goal. We are just trying to use CCSD(T) to benchmark different DFT methods so that we can then choose the best DFT for running our bimolecular-reaction trajectory simulations. If CCSD(T) does not work for open-shell, we might need to find a different gold standard for comparison.

Richard