How to calculate RMSE correctly?


wan...@vt.edu

Feb 14, 2019, 4:12:59 PM

to aenet

Dear Nong,

I read the files in your "aenet-example-02-TiO2-Chebyshev.tar.bz2" example and want to reproduce the RMSE of your training results. The value should be a few meV/atom, as in your training output:

        |------------TRAIN-----------| |------------TEST------------|
epoch             MAE          <RMSE>             MAE          <RMSE>
5000    2.100740E-03    2.159856E-03    2.889744E-03    3.691766E-03 <

I computed the RMSE over the training and test sets combined. The result is around 0.5 eV/atom instead of 2 meV/atom.
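For reference, a per-atom MAE and RMSE over a set of structures can be computed as below. This is a minimal sketch assuming that each structure contributes one error term, (E_ref - E_pred) / N_atoms in eV/atom; it is not taken from the aenet source, and train.x may average differently. The input numbers are toy values, not entries from the TiO2 data set.

```python
import math


def energy_errors_per_atom(ref_energies, pred_energies, n_atoms):
    """Return (MAE, RMSE) in eV/atom over a set of structures.

    Assumption: one error term per structure, (E_ref - E_pred) / N_atoms.
    """
    if not (len(ref_energies) == len(pred_energies) == len(n_atoms)):
        raise ValueError("input lists must have the same length")
    errors = [(ref - pred) / n
              for ref, pred, n in zip(ref_energies, pred_energies, n_atoms)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Toy example: two 6-atom structures with total energies in eV.
mae, rmse = energy_errors_per_atom([-10.0, -20.0], [-10.006, -19.988], [6, 6])
print(f"MAE = {mae:.4f} eV/atom, RMSE = {rmse:.4f} eV/atom")
```

Because the RMSE squares each term, a handful of structures with errors of tens of eV/atom can dominate it even when every other structure is fitted to within a few meV/atom.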

I checked your database and the prediction output and found that structure 6909 (structure6909.xsf) has an extremely large error between the reference and predicted energies.

The database shows:

# total energy = -4807.88755324 eV

CRYSTAL
PRIMVEC
        5.13425009     0.00000000     0.00000000
       -0.00000635     1.71141670     0.00000000
       -2.56715772    -0.85581971     5.82439586
PRIMCOORD
6 1
Ti     -0.00021973     1.28360333     1.45609861    -0.10165657    -0.06452057   191.42475142
Ti      2.56730561    -0.42800640     4.36829639     0.10165657     0.06452057 -191.42475142
O       0.00005421     0.42783562     0.94874634     0.10170568     0.06422849 -184.91322199
O       2.56703153     0.42776127     4.87564871    -0.10170568    -0.06422849   184.91322199
O      -0.00002355    -0.42790750     3.86094412    -0.00083407     0.00177843     0.06886034
O       2.56710919     1.28350434     1.96345093     0.00083407    -0.00177843    -0.06886034
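To check such a structure programmatically, the reference energy and forces can be read back from the XSF text. This is a minimal sketch assuming the layout shown above: a `# total energy = ... eV` comment and a PRIMCOORD block whose atom lines hold species, x, y, z, Fx, Fy, Fz. `read_xsf` is a hypothetical helper, not part of aenet. Force components of about 191 eV/Ang already hint that the structure is very far from equilibrium.

```python
import re

# Contents of structure6909.xsf as quoted above.
XSF_TEXT = """\
# total energy = -4807.88755324 eV

CRYSTAL
PRIMVEC
        5.13425009     0.00000000     0.00000000
       -0.00000635     1.71141670     0.00000000
       -2.56715772    -0.85581971     5.82439586
PRIMCOORD
6 1
Ti     -0.00021973     1.28360333     1.45609861    -0.10165657    -0.06452057   191.42475142
Ti      2.56730561    -0.42800640     4.36829639     0.10165657     0.06452057 -191.42475142
O       0.00005421     0.42783562     0.94874634     0.10170568     0.06422849 -184.91322199
O       2.56703153     0.42776127     4.87564871    -0.10170568    -0.06422849   184.91322199
O      -0.00002355    -0.42790750     3.86094412    -0.00083407     0.00177843     0.06886034
O       2.56710919     1.28350434     1.96345093     0.00083407    -0.00177843    -0.06886034
"""


def read_xsf(text):
    """Return (total energy in eV, species list, per-atom force tuples in eV/Ang)."""
    match = re.search(r"#\s*total energy\s*=\s*([-+0-9.eE]+)\s*eV", text)
    if match is None:
        raise ValueError("no '# total energy = ... eV' comment found")
    energy = float(match.group(1))
    lines = [line.strip() for line in text.splitlines()]
    start = lines.index("PRIMCOORD")
    n_atoms = int(lines[start + 1].split()[0])
    species, forces = [], []
    for line in lines[start + 2:start + 2 + n_atoms]:
        fields = line.split()
        species.append(fields[0])
        # Columns: species, x, y, z (Ang), then Fx, Fy, Fz (eV/Ang).
        forces.append(tuple(float(v) for v in fields[4:7]))
    return energy, species, forces


energy, species, forces = read_xsf(XSF_TEXT)
max_force = max(max(abs(c) for c in f) for f in forces)
print(f"{len(species)} atoms, {energy / len(species):.3f} eV/atom, "
      f"max |F_k| = {max_force:.1f} eV/Ang")
```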

The prediction shows:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Cartesian atomic coordinates (input) and corresponding atomic forces:

          x             y             z             Fx           Fy            Fz
         (Ang)         (Ang)         (Ang)       (eV/Ang)      (eV/Ang)      (eV/Ang)
--------------------------------------------------------------------------------------
Ti     -0.000220      1.283603      1.456099     -0.000336      0.096083     -0.470577
Ti      2.567306     -0.428006      4.368296      0.000336     -0.096083      0.470576
O       0.000054      0.427836      0.948746      0.000283     -0.007268      0.038711
O       2.567032      0.427761      4.875649     -0.000283      0.007268     -0.038710
O      -0.000024     -0.427908      3.860944      0.000643     -0.032689      2.321371
O       2.567109      1.283504      1.963451     -0.000643      0.032690     -2.321371

Cohesive energy            :           17.15281892 eV
Total energy               :        -4969.12457650 eV
Mean force (must be zero) :     -0.000000     -0.000000     -0.000000
Mean absolute force        :      0.000421      0.045347      0.943553
Maximum force              :      0.000643     -0.032689      2.321371
RMS force                  :      1.368948
The maximum force is acting on atom 5.
All forces are given in eV/Angstrom.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
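The force summary lines in this output appear to follow directly from the force table. The sketch below recomputes them under two assumptions inferred from the numbers, not from the prediction code's source: "Mean absolute force" is the per-component mean of |F|, and "RMS force" is the square root of the mean over atoms of |F_i|^2 (not over the 3N components). With the values above, this reproduces 0.000421, 0.045347, 0.943553 and 1.368948 eV/Ang.

```python
import math

# Predicted forces (eV/Ang) copied from the prediction output above.
FORCES = [
    (-0.000336, 0.096083, -0.470577),
    (0.000336, -0.096083, 0.470576),
    (0.000283, -0.007268, 0.038711),
    (-0.000283, 0.007268, -0.038710),
    (0.000643, -0.032689, 2.321371),
    (-0.000643, 0.032690, -2.321371),
]


def force_summary(forces):
    """Return (mean force, mean absolute force, RMS force) for (Fx, Fy, Fz) tuples."""
    n = len(forces)
    mean = tuple(sum(f[k] for f in forces) / n for k in range(3))
    mean_abs = tuple(sum(abs(f[k]) for f in forces) / n for k in range(3))
    # RMS over atoms of the force magnitude |F_i|.
    rms = math.sqrt(sum(fx * fx + fy * fy + fz * fz for fx, fy, fz in forces) / n)
    return mean, mean_abs, rms


mean, mean_abs, rms = force_summary(FORCES)
print("mean:", mean, "mean |F|:", mean_abs, "RMS:", round(rms, 6))
```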

The error is 161.237 eV, or 26.873 eV/atom. If this structure is included in the RMSE calculation, the result can never be smaller than 0.3 eV/atom.
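The arithmetic behind these numbers, and why a single outlier puts a floor under the RMSE, can be sketched as follows. One error term e contributes e^2/N to the mean squared error of N structures, so the RMSE is at least |e|/sqrt(N). The total number of structures N is not stated in this thread, so `rmse_lower_bound` is a hypothetical helper meant to be called with whatever count applies.

```python
import math

E_REF = -4807.88755324   # DFT total energy in structure6909.xsf (eV)
E_PRED = -4969.12457650  # predicted total energy (eV)
N_ATOMS = 6

error = E_REF - E_PRED
error_per_atom = error / N_ATOMS
print(f"error = {error:.3f} eV = {error_per_atom:.3f} eV/atom")
# prints: error = 161.237 eV = 26.873 eV/atom


def rmse_lower_bound(outlier_error, n_structures):
    """Smallest RMSE possible when one of n_structures errors equals outlier_error.

    The outlier alone contributes outlier_error**2 / n_structures to the mean
    squared error; every other term can only increase it.
    """
    return abs(outlier_error) / math.sqrt(n_structures)
```

Several such outliers raise the floor further, since their squared errors add.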

Do you know why this structure has such a large error? It is not in the test set, so it should be part of the training set, right? (Structure 6909 is not the only one; several other structures show the same behavior.)

Best Regards,

Shih-Han Wang

Nongnuch Artrith

Feb 14, 2019, 5:04:18 PM

to wan...@vt.edu, aenet

Dear Shih-Han,

The TiO2 data set contains several structures, generated by distorting the crystal structures, that have very high energies and are unphysical. In the input file (train.in) for "train.x", an energy cutoff can be defined with the keyword MAXENERGY (e.g., "MAXENERGY 1.0"), so that structures with a cohesive energy above that value are not considered. The example input files that we provide set this cutoff to 1.0 eV/atom. I suspect that the structures for which you see large errors are unphysical high-energy structures for which the DFT energy might not be well converged either.
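The effect of such a cutoff on structure 6909 can be illustrated with a small filter. This sketch assumes the criterion is the cohesive energy per atom, (E_total - sum of atomic reference energies) / N, compared against the cutoff in eV/atom; the exact test applied by train.x may differ. The sum of atomic reference energies for this Ti2O4 cell is not given in the thread; here it is inferred from the prediction output quoted in the first message (total energy minus cohesive energy), which is itself an assumption about how that output is defined.

```python
MAX_ENERGY = 1.0  # eV/atom, the cutoff used in the example train.in

# Assumption: sum of isolated-atom reference energies for Ti2O4 (eV),
# inferred as "Total energy" minus "Cohesive energy" from the prediction output.
ATOMIC_REF_SUM = -4969.12457650 - 17.15281892


def cohesive_energy_per_atom(total_energy, atomic_ref_sum, n_atoms):
    """Cohesive energy per atom in eV/atom under the assumption stated above."""
    return (total_energy - atomic_ref_sum) / n_atoms


def passes_cutoff(total_energy, atomic_ref_sum, n_atoms, max_energy=MAX_ENERGY):
    """True if the structure would be kept under a per-atom cohesive-energy cutoff."""
    return cohesive_energy_per_atom(total_energy, atomic_ref_sum, n_atoms) <= max_energy


e_coh = cohesive_energy_per_atom(-4807.88755324, ATOMIC_REF_SUM, 6)
kept = passes_cutoff(-4807.88755324, ATOMIC_REF_SUM, 6)
print(f"structure 6909: {e_coh:.2f} eV/atom, kept = {kept}")
```

Under these assumptions the DFT reference energy of structure 6909 lies almost 30 eV/atom above the isolated atoms, far beyond a 1.0 eV/atom cutoff.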

The number of ignored structures is printed in the training output file (see the example below), in a line that reads: "XXXX high-energy structures will be removed from the scaled training set."

So please check whether the structures with large errors were actually used for training.

You can write out the energies and errors of all training and test set structures with the SAVE_ENERGIES keyword in your "train.in" input file. The generated files will only contain information about the structures that were not ignored.

All the best,

Nong

Example:

----------------------------------------------------------------------
Training set normalization
----------------------------------------------------------------------

The training set will be normalized now. Depending on its size this
process can take a while. The normalized data set will be written to
another file. Load that file in future to avoid this step.

Name of the new training set file: TiO.train.scaled

The network output energy will be normalized to the interval [-1,1].

Energy scaling factor: f = 1.156976
Atomic energy shift : s = 0.135678

1118 high-energy structures will be removed from the scaled training set.

=======================================
