Hi,
This is something that I would like to discuss to decide if this is a problem for us, or at least what we can do to let potential users know about it.
If the same top/crd file is loaded by sander or Sire and a single-point energy is calculated, the results will differ. Specifically
For bond, angles, dihedrals, the agreement on all the tests I have done is on the 4th decimal or 3rd decimal (dihedrals) for large proteins. I consider this good enough.
The Coulombic and Lennard-Jones energies can differ a lot more. The larger the system, the more variability there is.
To give a specific example, the non-bonded energies for a protein-ligand complex Sushil is working on are listed below:
Sander
Coulombic: -2380.7300 kcal/mol
LJ : -400.4570 kcal/mol
Sire
Coulombic: -2345.3969 kcal/mol
LJ : -335.322 kcal/mol
I believe the error is due to different rounding of the CLJ parameters in the top file by sander and Sire.
In Sire the charge of an atom is set with this line of code in SireIO/amber.cpp near line 508
SireUnits::Dimension::Charge chg = ( charge[atomNumber - 1] / AMBERCHARGECONV )
And the sigma, epsilon parameters are set near line 534
double iAcoef = lj_a_coeff[ inbparams - 1 ];
double iBcoef = lj_b_coeff[ inbparams - 1 ];
double sigma, epsilon, rstar;
// is there a SMALL?
if (iAcoef < 0.000001)
{
sigma = 0.0;
epsilon = 0.0;
rstar = 0.0;
}
else
{
// and convert lj_a_coeff & lj_b_coeff into angstroms and kcal/mol-1
sigma = std::pow( iAcoef / iBcoef , 1/6. );
epsilon = pow_2( iBcoef ) / (4*iAcoef);
rstar = (sigma/2.)* std::pow(2.0, 1/6. ) ;
}
Note that in both cases we have to regenerate the charge, sigma and epsilon in AKMA units. As far as I know, sander doesn't do that.
If I set all the charges in the top file to a number (1.82223000E-01) that is going to divide well by AMBERCHARGECONV (18.2223) I now get the following Coulombic energies:
Sander
Coulombic: 2210.7274 kcal/mol
Sire
Coulombic: 2210.8039 kcal/mol
If I set all A and B lennard jones parameters to 1E04 and 1E02 respectively in the top file (so that it is easy to take roots) I get:
Sander
LJ : -862.1742 kcal/mol
Sire
LJ : -862.1742 kcal/mol
At this stage I am speculating that the Coulombic agreement isn't perfect because the term is longer-ranged and will be more sensitive to minute differences in distance calculations between the two codes. I am happy with a difference < 1/5 kT in energies for a 266 residues protein.
I am split between different opinions here:
1) The amber force field as described in the literature is correctly implemented in Sire. That energies differ is just a numerical error due floating point arithmetic. Equilibrium properties are not affected. We live with it.
2) The energies differ too much. It makes validation of our implementation difficult. The differences could affect simulation results. Correct implementation of the force field requires exact reproduction of sander energies. Maybe we have to change the way we setup non-bonded energies evaluation to minimize differences with sander. This may not be easy, and doesn't guarantee that we would also reproduce energies from other codes (e.g. OPLS from MCPRO).
Thoughts?