Dear all,
I have been using aenet for a bit now, and everything has worked pretty well for me. However, for some reason, force predictions in aenet have been anomalously slow for my specific setup. For instance, the gprof flat profile for predicting forces on a single core for a system of 192 atoms was as follows:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 90.14     20.84    20.84      192     0.11     0.12  aenet_atomic_energy_and_forces
  2.90     21.51     0.67  8995437     0.00     0.00  __symmfunc_MOD_sf_f1_ijk
  2.79     22.16     0.65  1050711     0.00     0.00  __symmfunc_MOD_sf_g4_update
  1.12     22.42     0.26 27269479     0.00     0.00  __symmfunc_MOD_sf_cut
  0.80     22.60     0.19 26986311     0.00     0.00  __symmfunc_MOD_sf_f2_ij
I thought this was weird, given that aenet_atomic_energy_and_forces() is a top-level function that delegates the bulk of the computation to lower-level functions, yet the program seems to spend most of its time in that function itself. After stepping through the code, I found that if I just comment out the bolded line below, which happens to be line 455 in aenet.f90 for aenet v2.0.3,
    nsf = aenet_pot(type_i)%stp%nsf
    call stp_eval(type_i, coo_i, n_j, coo_j, type_j, &
                  aenet_pot(type_i)%stp, sfval=sfval, &
                  sfderiv_i=sfderiv_i, sfderiv_j=sfderiv_j, scaled=.true.)
I get a ~10-fold speedup for predicting the same frame, and the program no longer bottlenecks in aenet_atomic_energy_and_forces() when it is zeroing sfderiv_j. From my understanding, if we follow the call stack down, the relevant slices of the array are zeroed out before any derivative evaluations are carried out. Therefore, commenting out the above line does not affect the final results, and I have checked this numerically for individual frames.
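To illustrate the reasoning, here is a minimal standalone sketch of the pattern. It is not aenet code: the program, the subroutine eval_derivs, and the array sizes are all made up for illustration. It only shows that when the callee clears the slices it is about to fill, an extra full-array zeroing in the caller does not change the slices that are actually used.

```fortran
! Hypothetical stand-in for the caller/callee pattern described above.
! None of these names or sizes are taken from aenet.
program redundant_zeroing
  implicit none
  integer, parameter :: nsf = 4, nmax = 1000, n_j = 3
  double precision :: d_with(3, nsf, nmax), d_without(3, nsf, nmax)

  ! Fill both work arrays with stale data, as if left over from a previous atom.
  d_with = 99.0d0
  d_without = 99.0d0

  ! Variant 1: the caller zeroes the whole array before the evaluation.
  d_with(:,:,:) = 0.0d0
  call eval_derivs(n_j, d_with)

  ! Variant 2: the caller skips its own zeroing and relies on the callee.
  call eval_derivs(n_j, d_without)

  ! Only the first n_j slices are meaningful for this atom.
  if (all(d_with(:,:,1:n_j) == d_without(:,:,1:n_j))) then
     print *, 'used slices identical'
  else
     print *, 'used slices differ'
  end if

contains

  subroutine eval_derivs(n, d)
    integer, intent(in) :: n
    double precision, intent(inout) :: d(:,:,:)
    integer :: j, k
    ! The callee clears exactly the slices it is about to accumulate into.
    d(:,:,1:n) = 0.0d0
    do j = 1, n
       do k = 1, size(d, 2)
          d(:,k,j) = d(:,k,j) + dble(j*k)
       end do
    end do
  end subroutine eval_derivs

end program redundant_zeroing
```

Note that in this sketch the two arrays still differ beyond the first n_j slices (stale values versus zeros), so skipping the caller's zeroing is only safe if nothing downstream reads past the slices that were filled.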
By the way, the executable I used for the predictions was compiled with Makefile.gfortran_openblas_mpi using gcc v6.3.0, openblas v0.2.18, and openmpi v2.0.1. Also, I have been running the predictions on an Intel Xeon E5-2670 CPU.
This might be an issue specific to my setup, but in case it is more general, I hope this helps.
Best,
Michael
--
Michael S. Chen
Ph.D. Student
Department of Chemistry
Stanford University