How to calculate RMSE correctly?

wan...@vt.edu

Feb 14, 2019, 4:12:59 PM
to aenet
Dear Nong,

I have been working with the files in your "aenet-example-02-TiO2-Chebyshev.tar.bz2" archive and want to reproduce the RMSE of your training results. The value should be a few meV/atom, as in your training output:

        |------------TRAIN-----------|  |------------TEST------------|
 epoch             MAE          <RMSE>             MAE          <RMSE>
  5000    2.100740E-03    2.159856E-03    2.889744E-03    3.691766E-03 <

I computed the RMSE over the training and testing data sets combined. The result is around 0.5 eV/atom instead of about 2 meV/atom.

I checked your database and the prediction output file and found that structure 6909 (structure6909.xsf) has an extremely large error between the database value and the predicted value.



The database shows:

# total energy = -4807.88755324 eV

CRYSTAL
PRIMVEC
        5.13425009     0.00000000     0.00000000
       -0.00000635     1.71141670     0.00000000
       -2.56715772    -0.85581971     5.82439586
PRIMCOORD
6 1
Ti     -0.00021973     1.28360333     1.45609861    -0.10165657    -0.06452057   191.42475142
Ti      2.56730561    -0.42800640     4.36829639     0.10165657     0.06452057  -191.42475142
O       0.00005421     0.42783562     0.94874634     0.10170568     0.06422849  -184.91322199
O       2.56703153     0.42776127     4.87564871    -0.10170568    -0.06422849   184.91322199
O      -0.00002355    -0.42790750     3.86094412    -0.00083407     0.00177843     0.06886034
O       2.56710919     1.28350434     1.96345093     0.00083407    -0.00177843    -0.06886034



The prediction output shows:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Cartesian atomic coordinates (input) and corresponding atomic forces:

          x             y             z             Fx           Fy            Fz     
         (Ang)         (Ang)         (Ang)       (eV/Ang)      (eV/Ang)      (eV/Ang) 
 --------------------------------------------------------------------------------------
 Ti     -0.000220      1.283603      1.456099     -0.000336      0.096083     -0.470577
 Ti      2.567306     -0.428006      4.368296      0.000336     -0.096083      0.470576
 O       0.000054      0.427836      0.948746      0.000283     -0.007268      0.038711
 O       2.567032      0.427761      4.875649     -0.000283      0.007268     -0.038710
 O      -0.000024     -0.427908      3.860944      0.000643     -0.032689      2.321371
 O       2.567109      1.283504      1.963451     -0.000643      0.032690     -2.321371

 Cohesive energy            :           17.15281892 eV
 Total energy               :        -4969.12457650 eV
 Mean force (must be zero)  :     -0.000000     -0.000000     -0.000000
 Mean absolute force        :      0.000421      0.045347      0.943553
 Maximum force              :      0.000643     -0.032689      2.321371
 RMS force                  :      1.368948
 The maximum force is acting on atom 5.
 All forces are given in eV/Angstrom.

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The error in the total energy is 161.237 eV, or 26.873 eV/atom. If this value is included in the RMSE calculation, the result can never be smaller than 0.3 eV/atom.
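The arithmetic behind these numbers can be checked with a short Python sketch. The two total energies are copied from the outputs quoted above; the function names and the structure count passed to rmse_lower_bound are hypothetical and serve only as illustration. The bound follows from the fact that the sum of squared errors is at least the square of any single error.

```python
import math

# Total energies for structure 6909, copied from the post above (eV).
E_REFERENCE = -4807.88755324  # "# total energy" line in structure6909.xsf
E_PREDICTED = -4969.12457650  # "Total energy" line in the prediction output
N_ATOMS = 6                   # atom count from the PRIMCOORD block


def per_atom_error(e_ref, e_pred, n_atoms):
    """Absolute energy error per atom, in the units of the inputs."""
    return abs(e_pred - e_ref) / n_atoms


def rmse_lower_bound(outlier_error, n_structures):
    """Smallest RMSE possible when one of n_structures errors equals
    outlier_error: RMSE >= |e_k| / sqrt(N)."""
    return abs(outlier_error) / math.sqrt(n_structures)


err = per_atom_error(E_REFERENCE, E_PREDICTED, N_ATOMS)
print(f"error per atom: {err:.3f} eV/atom")

# n_structures is a hypothetical value; use the real size of the
# set over which the RMSE is evaluated.
print(f"RMSE lower bound: {rmse_lower_bound(err, 10000):.3f} eV/atom")
```

A single outlier of this size therefore dominates the RMSE unless the data set is very large or the structure is excluded.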

Do you know why this structure has such a large error? The structure isn't in the testing data set, so it should be in the training data set, right? (Actually, not only structure 6909 but also some other structures show the same behavior.)

Best Regards,
Shih-Han Wang

Nongnuch Artrith

Feb 14, 2019, 5:04:18 PM
to wan...@vt.edu, aenet
Dear Shih-Han,

The TiO2 data set contains several structures, generated by distorting the crystal structures, that have very high energies and are unphysical.  In the input file (train.in) for "train.x", an energy cutoff can be defined with the MAXENERGY keyword (for example, MAXENERGY 1.0), so that structures above a certain cohesive energy are not considered.  The example input files that we provide set this cutoff to 1.0 eV/atom.  I suspect that the structures for which you see large errors are unphysical high-energy structures, for which the DFT energy might not be well converged either.

The number of structures that are ignored is printed in the training output file (see Example below).  The line reads: "XXXX high-energy structures will be removed from the scaled training set."

So please check whether the structures with large errors were actually used for training.

You can write out the energies and errors of all training and test set structures with the SAVE_ENERGIES keyword in your "train.in" input file.  The generated files will only contain information about the structures that were not ignored.
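As a rough illustration of why the cutoff matters when reproducing the RMSE, the Python sketch below applies an energy cutoff before computing the error. It is not aenet code: the function names, the record layout, and the numbers are made up for illustration, and it assumes that the cutoff is compared against the reference cohesive energy per atom and that a structure exactly at the cutoff is kept.

```python
import math


def rmse(errors):
    """Root-mean-square of a non-empty list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def filter_by_max_energy(records, max_energy):
    """Keep (reference, predicted) pairs whose reference cohesive
    energy per atom does not exceed max_energy (assumed rule)."""
    return [(ref, pred) for ref, pred in records if ref <= max_energy]


# Made-up cohesive energies in eV/atom, NOT taken from the TiO2 set.
# The last entry mimics an unphysical high-energy structure.
records = [(-0.500, -0.498), (-0.200, -0.203), (0.300, 0.301), (5.0, 1.0)]

rmse_all = rmse([pred - ref for ref, pred in records])

kept = filter_by_max_energy(records, max_energy=1.0)
rmse_kept = rmse([pred - ref for ref, pred in kept])

print(f"RMSE over all structures : {rmse_all:.4f} eV/atom")
print(f"RMSE below the cutoff    : {rmse_kept:.4f} eV/atom")
print(f"structures removed       : {len(records) - len(kept)}")
```

With the high-energy entry included, the RMSE is dominated by that one structure; with it removed, the RMSE falls back to the meV/atom range.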

All the best,
Nong

Example:
----------------------------------------------------------------------
                      Training set normalization
----------------------------------------------------------------------
The training set will be normalized now.  Depending on its size this
process can take a while.  The normalized data set will be written to
another file. Load that file in future to avoid this step.

Name of the new training set file: TiO.train.scaled

The network output energy will be normalized to the interval [-1,1].
  Energy scaling factor: f = 1.156976
  Atomic energy shift  : s = 0.135678

1118 high-energy structures will be removed from the scaled training set.
=======================================
