NWChem 7.0.2 built with oneAPI gives NaN only in the paw test

Hiromawa Watanabe

Mar 6, 2022, 9:15:13 PM
to NWChem Forum
Dear all,

I built NWChem 7.0.2 with the oneAPI 2021.2.0 classic Fortran compiler (ifort), icc, and Intel MPI.
The build settings are the same ones I used to build NWChem 6.8.1 with the same
development environment and compilers. With these settings NWChem 6.8.1 worked
fine.

When I run the QA tests with the NWChem 7.0.2 binary, the paw test produces
NaN. This did not happen with NWChem 6.8.1.

The NaN appears in the Car-Parrinello iteration of the first PAW Car-Parrinello
microcluster calculation, after the NWPW PSPW calculation has completed.

=================
== Energy Calculation ==


          ====== Grassmann conjugate gradient iteration ======
     >>>  ITERATION STARTED AT Thu Mar  3 19:02:13 2022 <<<
    iter.           Energy         DeltaE       DeltaRho
    ------------------------------------------------------
      10   -0.7580454310E+02   -0.22882E-03    0.50057E-05
      20   -0.7580488329E+02   -0.12540E-04    0.37306E-06
      30   -0.7580489723E+02   -0.61032E-06    0.70266E-08
      40   -0.7580489782E+02   -0.68539E-08    0.76379E-10
      50   -0.7580489787E+02   -0.23736E-08    0.24975E-10
      60   -0.7580489787E+02   -0.12541E-09    0.13324E-11
      70   -0.7580489787E+02   -0.98055E-11    0.13075E-12
  *** tolerance ok. iteration terminated
     >>>  ITERATION ENDED   AT Thu Mar  3 19:02:22 2022 <<<
 kinetic (planewave) :   0.9720493653E+01 ( 0.24301E+01/electron)
 coulomb (planewave) :  -0.6139016923E+01 ( -0.15348E+01/electron)
 exc-cor (planewave) :  -0.3695986151E+01 ( -0.92400E+00/electron)
 pseudo  (planewave) :   0.8437316317E+01 ( 0.21093E+01/electron)

 kinetic (loc. basis):   0.6587080488E+02 ( 0.16468E+02/electron)
 coulomb (loc. basis):  -0.2171315234E+03 ( -0.54283E+02/electron)
 exc-cor (loc. basis):  -0.4994334752E+01 ( -0.12486E+01/electron)
 pseudo (loc. basis) :  -0.8173523798E+01 ( -0.20434E+01/electron)

 coulomb (multipole) :   0.8030087234E+02 (    0.26767E+02/ion)

 orbital energies:
    -0.2560386E+00 (  -6.967eV)
    -0.3382326E+00 (  -9.204eV)
    -0.4571583E+00 ( -12.440eV)
    -0.9198444E+00 ( -25.030eV)

 Total PAW energy   :  -0.7580489787E+02

 output psi filename:./paw_test.movecs


== Timing ==

cputime in seconds
  prologue    :   0.213531E+01
  main loop   :   0.924933E+01
  epilogue    :   0.223972E-02
  total       :   0.113869E+02
  cputime/step:   0.255506E-01       (     362 evalulations, 69 linesearches)


Time spent doing                        total          step
  FFTs                       :   0.978234E+00  0.270231E-02
  dot products               :   0.187169E+01  0.517040E-02
  geodesic                   :   0.354140E+01  0.978288E-02
  exchange correlation       :   0.121327E+01  0.335157E-02
  local pseudopotentials     :   0.112692E-03  0.311304E-06
  non-local pseudopotentials :   0.383282E+01  0.105879E-01
  hartree potentials         :   0.607407E-02  0.167792E-04
  structure factors          :   0.454114E+00  0.125446E-02
  masking and packing        :   0.164808E+00  0.455270E-03
  atomic density generation  :   0.492109E-01  0.135942E-03
  atomic xc matrix elements  :   0.444004E+00  0.122653E-02

     >>>  JOB COMPLETED     AT Thu Mar  3 19:02:22 2022 <<<

 Task  times  cpu:       11.4s     wall:       11.4s


                                NWChem Input Module
                                -------------------


 >>>> PAW Parallel Module - Car-Parrinello <<<<
          ****************************************************
          *                                                  *
          *  PAW Car-Parrinello microcluster calculation     *
          *                                                  *
          *     [    extended Lagrangian molecular    ]      *
          *                                                  *
          *     [         dynamics simulation         ]      *
          *     [  Northwest Chemistry implementation ]      *
          *                                                  *
          *            version #1.00   10/01/03              *
          *                                                  *
          *    Authors: Marat Valiev and Eric J. Bylaska     *
          *                                                  *
          *    This code is based upon algorithms and code   *
          *    developed by the group of Prof. John H. Weare *
          *                                                  *
          ****************************************************
~~~~~~~~~~~~
~~~~~~~~~~~~
 supercell:
      lattice:    a1=<  20.000   0.000   0.000 >
                  a2=<   0.000  20.000   0.000 >
                  a3=<   0.000   0.000  20.000 >
      reciprocal: b1=<   0.314   0.000   0.000 >
                  b2=<   0.000   0.314   0.000 >
                  b3=<   0.000   0.000   0.314 >
      volume :     8000.0
      density cutoff= 12.633  fft= 32x 32x 32(     8536 waves 8536 per task)
      wavefnc cutoff= 12.633  fft= 32x 32x 32(     8536 waves 8536 per task)
      smooth compensation (ewald) summation: cut radius=    1.50 and  1

 technical parameters:
      core integration lmax= 0
      time step=      5.00     fictitious mass=    1100.0
      cooling/heatting rates:  0.10000E+01 (psi)   0.10000E+01 (ion)
      initial kinetic energy:  0.00000E+00 (psi)   0.00000E+00 (ion)
                                                   0.00000E+00 (c.o.m.)
      after scaling:           0.00000E+00 (psi)   0.00000E+00 (ion)
      increased energy:        0.00000E+00 (psi)   0.00000E+00 (ion)


 Constant Energy Simulation



          ============ Car-Parrinello iteration ==============
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
     >>>  ITERATION STARTED AT Thu Mar  3 19:02:22 2022 <<<
    iter.         KE+Energy             Energy        KE_psi KE_ion   Temperature
------------------------------------------------------------------------------------
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
 Warning: Lagrange Multiplier generation failed.
         +Try using a smaller time step
         +Gram-Schmidt being performed, spin: 1
      10                NaN                NaN NaN NaN NaN
==================

I rebuilt and reran it with the -check all and -fpe0 options.
Although -fpe0 should stop the run on a floating-point exception,
the paw test still produced NaN and the calculation ran to the end.
So this time I watched standard output and forcibly terminated the job
when NaN appeared; the resulting message was "Numerical result out of range".
So far I have found such NaN results only in paw.
Since the run did not stop with -fpe0, an integer overflow is a possibility.
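
As a hedged aside, one way to pin down where NaN first shows up in a captured output is to scan it line by line instead of watching standard output. The sketch below is not part of NWChem or its QA scripts; the function name `first_nan_line` is an assumption made for illustration, and the sample lines are copied from the log above.

```python
import re
from typing import Iterable, Optional, Tuple

# Match "NaN" only as a standalone token, as printed in the Car-Parrinello table.
NAN_TOKEN = re.compile(r"(?<![A-Za-z0-9_])NaN(?![A-Za-z0-9_])", re.IGNORECASE)


def first_nan_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, text) of the first line containing a NaN token.

    Hypothetical helper for illustration; returns None when no NaN is found.
    """
    for number, line in enumerate(lines, start=1):
        if NAN_TOKEN.search(line):
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__":
    sample = [
        "      70   -0.7580489787E+02   -0.98055E-11    0.13075E-12\n",
        " Warning: Lagrange Multiplier generation failed.\n",
        "      10                NaN                NaN NaN NaN NaN\n",
    ]
    print(first_nan_line(sample))
```

An open file object can be passed in place of the list, since iterating over a text file yields its lines. This only locates the first printed NaN; it says nothing about which operation produced it.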

I then modified the paw source of NWChem 7.0.2 to call dgemm and dcopy
directly, as NWChem 6.8.1 does, by replacing the lib64to32 wrappers
(ygemm, ycopy, etc.), but the paw result was still NaN.

Suspecting a BLAS problem, I built NWChem 7.0.2 with MKL BLAS
in a gfortran, gcc, and Intel MPI environment.
This binary did not produce NaN in paw and passed the test.

From this, the cause of the NaN in paw appears to be a difference
between gfortran and ifort.
Also, since NWChem still runs and paw still gives NaN after I change ycopy
and ygemm in the paw source to dcopy and dgemm, I find it hard to believe
that the problem lies in the paw source.

The message obtained with the debug options is "Numerical result out of range",
so I expect the problem can be resolved by correcting a type declaration
somewhere in the source.

Could you please suggest a way to solve this problem?

Best regards

Edoardo Aprà

Mar 6, 2022, 9:18:33 PM
to NWChem Forum
Thanks for reporting this installation problem. When you compile with ifort, are you using MKL for BLASOPT and LAPACK_LIB?

Edoardo Aprà

Mar 7, 2022, 6:32:17 PM
to NWChem Forum
Sorry, I missed that you had already reported using ifort together with MKL.

I tried to reproduce your PAW failure with the latest oneAPI ifort, but I was not able to.
However, if I use the new Intel ifx compiler, the same failure is reproduced.
What is the output of the following command?
ifort -V

Hiromawa Watanabe

Mar 7, 2022, 11:17:10 PM
to NWChem Forum
Hello, Mr. Edoardo Aprà.

As mentioned in my first post, these results were obtained with ifort.
The version is also given there, but just in case, here is the output of ifort -v:
$ ifort -v
ifort version 2021.2.0

You say that you used the latest version and could not reproduce the
paw problem, so I would appreciate it if you could tell me which version
of oneAPI ifort you used.

When I built NWChem 6.8.1 with ifort 2021.2.0, I did not have any
issues with paw, so I assumed it was not a compiler version issue. If the
version of oneAPI ifort you used differs from the one I used,
the problem may be specific to ifort 2021.2.0.

Since you say it was reproduced with ifx, I think the problem may
result from Intel having changed various specifications between
ifort and ifx, for example regarding consistency.

Best regards,

On Tuesday, March 8, 2022 at 8:32:17 UTC+9, Edoardo Aprà wrote:

Edoardo Aprà

Mar 7, 2022, 11:31:27 PM
to NWChem Forum
You might want to update your Intel software installation.
The version I am using is the following:
$ ifort -v
ifort version 2021.5.0
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
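
As a small illustration of comparing the two installations mentioned in this thread, the sketch below pulls the version triple out of the text printed by `ifort -v` or `ifort -V`. It assumes the banner contains the word "version" followed by three dot-separated numbers, as in the outputs quoted here; the function name `parse_ifort_version` is hypothetical and other compiler releases may format the banner differently.

```python
import re
from typing import Optional, Tuple

# Assumed banner shape: "... version 2021.2.0" or "... Version 2021.5.0 Build ..."
VERSION_PATTERN = re.compile(r"[Vv]ersion\s+(\d+)\.(\d+)\.(\d+)")


def parse_ifort_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) found in the banner, or None if absent."""
    match = VERSION_PATTERN.search(banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    reported = parse_ifort_version("ifort version 2021.2.0")
    tested = parse_ifort_version(
        "Intel(R) Fortran Intel(R) 64 Compiler Classic for applications "
        "running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000"
    )
    # Tuples compare element by element, so this orders the two releases.
    print(reported, tested, reported < tested)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.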

Edoardo Aprà

Mar 7, 2022, 11:51:46 PM
to NWChem Forum
By the way, did you compile with USE_OPENMP=y?
If you did not, could you recompile with USE_OPENMP=y?

Hiromawa Watanabe

Mar 9, 2022, 7:24:06 PM
to NWChem Forum
Hello, Mr. Edoardo Aprà.

I tried setting USE_OPENMP=y, but paw still gives NaN.

I also tried ifort 2021.5.0 from oneAPI 2022.1.0, but NaN still appears in paw.

My development environment is CentOS 8.3, so it may be an OS problem.

If several other QA tests also gave NaN, I would suspect the build settings. However, only paw gives NaN and the other QA results look reasonable, so I think the build itself is fine.

For paw calculations, the binary built with gfortran and MKL computes correctly, so we will use that binary in such cases.

Thank you very much for your advice.

On Tuesday, March 8, 2022 at 13:51:46 UTC+9, Edoardo Aprà wrote:

Daniel Mejia Rodriguez

Mar 10, 2022, 1:01:05 PM
to NWChem Forum
Could you try compiling with I_MPI_F90=ifort set?
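
For readers who want to check which Fortran compiler the Intel MPI wrapper actually invokes before rebuilding, a minimal sketch follows. It assumes that the build goes through a wrapper named `mpif90` and that the wrapper accepts `-show` to print its underlying command without compiling; the helper names are hypothetical. Whether this setting changes the NaN result is not established in this thread.

```python
import os
import shutil
import subprocess
from typing import Dict, Mapping, Optional


def env_with_mpi_fortran(base_env: Mapping[str, str], compiler: str = "ifort") -> Dict[str, str]:
    """Return a copy of base_env with I_MPI_F90 set; base_env is left unchanged."""
    env = dict(base_env)
    env["I_MPI_F90"] = compiler
    return env


def show_wrapper_command(env: Mapping[str, str]) -> Optional[str]:
    """Print-only query of the wrapper; returns None if mpif90 is not on PATH.

    Assumption: the wrapper is called mpif90 and understands "-show".
    """
    wrapper = shutil.which("mpif90", path=env.get("PATH"))
    if wrapper is None:
        return None
    result = subprocess.run(
        [wrapper, "-show"], env=dict(env), capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Shows the command line the wrapper would run with I_MPI_F90=ifort.
    print(show_wrapper_command(env_with_mpi_fortran(os.environ)))
```

If the printed command starts with a compiler other than ifort, the wrapper is not picking up the intended compiler for that shell session.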