Memory Issues in TDDFT Calculations

Andrew Salij

Feb 11, 2021, 3:56:18 PM
to NWChem Forum
I've been running into issues that appear to be memory related when scaling up calculations, either in basis set or in molecule size. For example, the input below works with the 6-31G basis set or with a smaller molecule, but crashes as written here (Def2-TZVP on the full molecule).

Thank you in advance! 

Relevant memory statistics from the failed job are:
ARMCI_DEFAULT_SHMMAX=4096
State: COMPLETED (exit code 0) #i.e., job completed, no crash on cluster end
Nodes: 1
Cores per node: 24 #these have in principle 3.25+ GB memory each, 3200 mb allocated in input file and full memory requested in cluster
CPU Utilized: 6-04:11:30
CPU Efficiency: 99.97% of 6-04:14:00 core-walltime #perhaps more CPUs are needed?
Job Wall-clock time: 06:10:35
Memory Utilized: 15.34 GB
Memory Efficiency: 13.14% of 116.75 GB #this would make it seem that there is still sufficient memory...
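
As a cross-check, the figures in the job report above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: one MPI rank per core (24 ranks), the `memory 3200 mb` directive applying to each rank, and 1 GB taken as 1024 MB. The helper name `duration_to_seconds` is invented here for illustration.

```python
def duration_to_seconds(stamp: str) -> int:
    """Convert a '[D-]HH:MM:SS' duration string to seconds."""
    days, _, clock = stamp.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


ranks = 24                  # cores per node; one MPI rank per core is an assumption
memory_per_rank_mb = 3200   # from "memory 3200 mb" in the input file
node_memory_gb = 116.75     # from the job report
used_memory_gb = 15.34      # from the job report

# Total memory the ranks may claim, assuming the directive is per rank.
requested_gb = ranks * memory_per_rank_mb / 1024
memory_efficiency = 100 * used_memory_gb / node_memory_gb

# Core-walltime is the wall-clock time multiplied by the number of cores.
core_walltime = ranks * duration_to_seconds("06:10:35")
cpu_efficiency = 100 * duration_to_seconds("6-04:11:30") / core_walltime

print(f"requested by all ranks: {requested_gb:.2f} GB of {node_memory_gb} GB")
print(f"memory efficiency:      {memory_efficiency:.2f} %")
print(f"CPU efficiency:         {cpu_efficiency:.2f} %")
```

Under these assumptions the ranks together request about 75 GB, which fits within the 116.75 GB on the node, so the per-rank request by itself would not seem to explain the failure.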

Input as:
title <title>
memory 3200 mb
start meijer_tzp 

echo

charge 0 

geometry autosym units angstrom
# Cartesian Coordinates in the form xyz
N         -0.08822        0.40497       -0.14368
C          0.12771       -0.86129        0.21482
N          1.30650       -1.46277        0.31242
C          2.34353       -0.69557        0.02553
N          2.24981        0.60500       -0.21731
C          1.00736        1.14670       -0.26625
N         -2.04180        0.29512       -0.24322
C         -2.20457       -0.95672        0.34172
N         -1.00271       -1.56431        0.54734
C          0.93141        2.62350       -0.31348
C         -0.24446        3.38699       -0.28644
C         -0.20690        4.80036       -0.29061
C          1.00158        5.51215       -0.30951
C          2.17551        4.76421       -0.31663
C          2.13710        3.36271       -0.32335
C          0.97991        6.99295       -0.34677
C          2.06097        7.79792       -0.35524
C          2.09550        9.28552       -0.44999
C          3.32353        9.95982       -0.64250
C          3.35012       11.36113       -0.73983
C          2.18245       12.13336       -0.62449
C          0.95700       11.46108       -0.42402
C          0.92792       10.05885       -0.33586
O         -0.16262       12.25185       -0.31298
C         -1.42349       11.61798       -0.15193
O          4.45126        9.17540       -0.73597
C          5.71555        9.82892       -0.77237
C          2.22039       13.61832       -0.69253
C          3.30780       14.40901       -0.73583
C          3.23754       15.88045       -0.70944
C          2.38241       16.55848        0.16348
C          2.36943       17.96237        0.24140
C          3.24939       18.71981       -0.57005
C          4.12444       18.03314       -1.44346
C          4.10721       16.62879       -1.49828
O          1.53926       18.63061        1.10595
C          0.60059       17.88635        1.87126
O          3.34810       20.08660       -0.58322
O          5.06008       18.54776       -2.30890
C          5.26750       19.95018       -2.37177
C          2.47963       20.89554        0.19208
C         -3.11047        0.99750       -0.95175
C         -4.22727        0.09374       -1.49168
C         -5.11481        0.76489       -2.53664
C         -6.24306       -0.17072       -2.95689
O         -3.25702       -1.51085        0.59748
H         -1.09609        0.63633       -0.23869
H         -0.95482       -2.52036        0.87743
N          3.57928       -1.27055       -0.02010
H         -1.23158        2.94374       -0.24779
H         -1.15059        5.34685       -0.27529
H          3.15291        5.24139       -0.31944
H          3.10450        2.85561       -0.32385
H         -0.01470        7.42501       -0.39782
H          3.05242        7.35680       -0.31090
H          4.29183       11.86774       -0.91149
H         -0.01547        9.55731       -0.16514
H         -2.18820       12.39976       -0.11653
H         -1.46749       11.06731        0.79322
H         -1.65259       10.96797       -1.00277
H          6.49099        9.05719       -0.74700
H          5.85952       10.47042        0.10369
H          5.83834       10.39180       -1.70328
H          1.24118       14.09473       -0.68600
H          4.31583       14.00972       -0.76847
H          1.73994       15.96255        0.80347
H          4.79522       16.12722       -2.17671
H          0.00641       18.59787        2.45265
H          1.10780       17.22347        2.57972
H         -0.08697       17.33080        1.22505
H          6.06439       20.13228       -3.10012
H          5.61529       20.34650       -1.41261
H          4.37613       20.46868       -2.73828
H          2.71897       21.94031       -0.03147
H          2.65376       20.75297        1.26296
H          1.43261       20.74169       -0.08642
H         -2.62422        1.51609       -1.78555
H         -3.51820        1.75471       -0.27492
H         -4.86237       -0.22604       -0.65665
H         -3.79296       -0.81115       -1.93552
H         -4.51879        1.02827       -3.41753
H         -5.54119        1.69043       -2.13524
H         -5.84730       -1.10936       -3.35978
H         -6.86294        0.29526       -3.72851
H         -6.88738       -0.41399       -2.10507
H          3.66758       -2.17516        0.42216
H          4.36632       -0.63882        0.03687
end

ecce_print ecce.out

basis spherical
# For TZP basis, change "STO-3G" to "Def2-TZVP"
# If using the same basis set for all elements, you can use the wildcard "*" instead of the element name 
  * library "Def2-TZVP"
END

driver 
  default
end


dft
  mult 1    # spin multiplicity. 1=singlet, 3=triplet
  XC xcamb88 1.00 lyp 0.81 vwn_5 0.19 hfexch 1.00
  cam 0.33 cam_alpha 0.19 cam_beta 0.46
  direct
  mulliken  # Method to calculate charges
end
tddft
  nroots 12
  cis
  civecs
end
task tddft energy

Error shown as:
 Entering Davidson iterations
  Restricted singlet excited states

  Iter   NTrls   NConv    DeltaV     DeltaE      Time   
  ----  ------  ------  ---------  ---------  --------- 
0:CreateSharedRegion:kr_malloc failed KB=: 269982
(rank:0 hostname:qnode5232 pid:17990):ARMCI DASSERT fail. ../../ga-5-4/armci/src/memory/shmem.c:Create_Shared_Region():1209 cond:0
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0 
with errorcode 269982.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
mpirun -machinefile nwch.machines nwchem nwch.nw
----- End of NWChem execution -----

Edoardo Aprà

Feb 11, 2021, 4:37:21 PM
to NWChem Forum


Could you provide details about the NWChem version you are using?

Andrew Salij

Feb 11, 2021, 4:41:01 PM
to NWChem Forum
Of course. I was so focused on providing memory information that I forgot to include the NWChem version, which is 6.6.1.

Edoardo Aprà

Feb 11, 2021, 4:43:07 PM
to NWChem Forum
To avoid the reported memory error, I suggest installing version 7.0.2 built with ARMCI_NETWORK=MPI-PR.

Andrew Salij

Feb 11, 2021, 7:37:58 PM
to NWChem Forum
Hmm, I'll look into installing the updated version on the cluster I'm using. Thanks.

Andrew Salij

Mar 1, 2021, 12:15:40 PM
to NWChem Forum
Having installed 7.0.2, I now find that the calculation runs for a few hours before producing the following error:

 Grid_pts file          = ./meijer_tzp.gridpts.00
 Record size in doubles =  12289        No. of grid_pts per rec  =   3070
 Max. records in memory =     28        Max. recs in file   =  42218779

  add input line set grid:eaf_size_in_dbl               6304257
 grpwrite: insuff eafsize              6291456
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
  current input line :
     0:
 ------------------------------------------------------------------------

Edoardo Aprà

Mar 1, 2021, 12:19:13 PM
to NWChem Forum
The following line in the output file
add input line set grid:eaf_size_in_dbl               6304257
means that you need to insert the following line in the input file:
 set grid:eaf_size_in_dbl               6304257
Since the value of 6304257 might be a conservative estimate, you might want to increase it a bit (say, to 8304257).
In other words, the last two lines of your tddft input file should be:

set grid:eaf_size_in_dbl               8304257
task tddft energy
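
The fix described above can also be applied mechanically. The following Python sketch pulls the suggested value out of the output text, pads it by the same 2,000,000 doubles used in the example, and inserts the `set` line before the first `task` line. The function names `suggested_eaf_size` and `add_eaf_setting` are made up for illustration, and the size in MiB assumes 8-byte doubles.

```python
import re

BYTES_PER_DOUBLE = 8  # assumption: 64-bit double precision


def suggested_eaf_size(output_text: str):
    """Return N from an 'add input line set grid:eaf_size_in_dbl N' line, or None."""
    match = re.search(r"add input line set grid:eaf_size_in_dbl\s+(\d+)", output_text)
    return int(match.group(1)) if match else None


def add_eaf_setting(input_text: str, size_in_doubles: int) -> str:
    """Insert the 'set' directive just before the first 'task' line.

    If no 'task' line is present, the input is returned without the directive.
    """
    lines = input_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().lower().startswith("task"):
            lines.insert(index, f"set grid:eaf_size_in_dbl {size_in_doubles}")
            break
    return "\n".join(lines)


# Excerpt of the failing output quoted earlier in the thread.
output_excerpt = """
  add input line set grid:eaf_size_in_dbl               6304257
 grpwrite: insuff eafsize              6291456
"""

suggested = suggested_eaf_size(output_excerpt)
padded = suggested + 2_000_000  # same margin as the example value 8304257
size_mib = padded * BYTES_PER_DOUBLE / 2**20

patched = add_eaf_setting("tddft\n  nroots 12\nend\ntask tddft energy", padded)
print(f"{padded} doubles is about {size_mib:.1f} MiB")
print(patched)
```

As a side note, the 6291456 reported next to "insuff eafsize" is exactly 6 x 2^20 doubles (48 MiB at 8 bytes each), so the suggested value exceeds it only slightly, which is consistent with the advice to add some headroom.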

Andrew Salij

Mar 3, 2021, 9:55:32 PM
to NWChem Forum

This appears to have remedied the issue. Thank you. 