Speeding up force predictions


Michael Chen

Jan 25, 2019, 5:21:23 PM
to aenet
Dear all,

I have been using aenet for a while now, and everything has worked well for me. However, for some reason force predictions have been anomalously slow in my setup. For instance, the gprof output for predicting the forces of a 192-atom system on a single core was as follows:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 90.14     20.84    20.84      192     0.11     0.12  aenet_atomic_energy_and_forces
  2.90     21.51     0.67  8995437     0.00     0.00  __symmfunc_MOD_sf_f1_ijk
  2.79     22.16     0.65  1050711     0.00     0.00  __symmfunc_MOD_sf_g4_update
  1.12     22.42     0.26 27269479     0.00     0.00  __symmfunc_MOD_sf_cut
  0.80     22.60     0.19 26986311     0.00     0.00  __symmfunc_MOD_sf_f2_ij

I thought this was odd, given that aenet_atomic_energy_and_forces() is a top-level function that delegates the bulk of the computation to lower-level functions, yet it seems to hang there unnecessarily. After stepping through the code, I found that if I simply commented out the line marked below, which happens to be line 455 in aenet.f90 for aenet v2.0.3,

    sfval(:) = 0.0d0
    sfderiv_i(:,:) = 0.0d0
    sfderiv_j(:,:,:) = 0.0d0    ! <-- this line

    nsf = aenet_pot(type_i)%stp%nsf
    call stp_eval(type_i, coo_i, n_j, coo_j, type_j, &
                  aenet_pot(type_i)%stp, sfval=sfval, &
                  sfderiv_i=sfderiv_i, sfderiv_j=sfderiv_j, scaled=.true.)

I get a ~10-fold speed-up for predicting the same frame, and the program no longer bottlenecks in aenet_atomic_energy_and_forces() while zeroing sfderiv_j. From my understanding, if we follow the call stack down, the relevant slices of the array are zeroed out before any derivative evaluations are carried out. Therefore commenting out the above line does not affect the final results, which I have verified numerically for individual frames.
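The redundancy argument can be sketched with a NumPy toy model (this is not aenet's actual data layout; the shapes and the stand-in for stp_eval below are made up for illustration). The key point is that the consumer zeroes exactly the slices it writes, so an up-front zeroing of the whole preallocated buffer changes nothing except the runtime:

```python
import time

import numpy as np

# Hypothetical sizes loosely modeled on the report: 192 atoms, a large
# preallocated neighbor-derivative buffer of which only a small slice
# is actually used per atom. nsf, max_nb, n_nb are illustrative names.
n_atoms, nsf, max_nb, n_nb = 192, 60, 4000, 40

def eval_derivs(out, n_nb):
    # Stands in for stp_eval: zeroes exactly the slices it will write,
    # then accumulates into them.
    out[:, :, :n_nb] = 0.0
    out[:, :, :n_nb] += 1.0

def run(zero_whole_buffer):
    buf = np.empty((3, nsf, max_nb))
    acc = 0.0
    t0 = time.perf_counter()
    for _ in range(n_atoms):
        if zero_whole_buffer:
            buf[:, :, :] = 0.0       # the line commented out in aenet.f90
        eval_derivs(buf, n_nb)
        acc += buf[:, :, :n_nb].sum()
    return acc, time.perf_counter() - t0

res_full, t_full = run(True)
res_skip, t_skip = run(False)
assert res_full == res_skip          # results are unchanged
print(f"full zero: {t_full:.4f}s  slice-only: {t_skip:.4f}s")
```

Per atom the full zero touches 3 * nsf * max_nb elements while only 3 * nsf * n_nb are ever read, so the wasted work scales with the buffer capacity rather than the actual neighbor count, which matches the profile where the driver routine itself dominates.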

By the way, the executable I used for the predictions was compiled with Makefile.gfortran_openblas_mpi using gcc v6.3.0, OpenBLAS v0.2.18, and Open MPI v2.0.1, and I have been running the predictions on an Intel Xeon E5-2670 CPU.

This might be an issue only specific to my setup, but in case this is more general I hope this helps.

Best,
Michael

--
Michael S. Chen
Ph.D. Student
Department of Chemistry
Stanford University

Nongnuch Artrith

Jan 25, 2019, 6:10:46 PM
to Michael Chen, aenet
Hi Michael,

Thank you (and your group) for this contribution.  I think you are right and it is safe to remove this line from the source.  
It is great if this resulted in a speed up!!

We will make sure that it does not break anything and will then commit a bug fix.

Thanks again.

Cheers,
Nong