nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
nlm["RSquared"]
The RSquared by Mathematica is 0.963173
Meanwhile, Excel and manual hand calculation show that R^2 should be
equal to 0.7622.
Is Mathematica wrong? Thanks!
Cheers -- Sjoerd
>sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
>0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
>0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
>0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
>0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
>0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
>0.798426867`}};
>nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>nlm["RSquared"]
>The RSquared by Mathematica is 0.963173 Meanwhile, Excel and manual
>hand calculation show that R^2 should be equal to 0.7622.
>Is Mathematica wrong?
Whenever Mathematica and Excel disagree it is almost certain the
problem lies with Excel. Simply put, the current versions of
Excel should never be relied upon for any serious statistical
analysis. Do a Google search on Excel and you can find several
sites saying essentially the same thing as I just said here.
But this case seems to be the exception. There is a more subtle
issue in play.
The problem you are solving is not a non-linear problem. Linear
versus non-linear in model fitting refers to the way the unknown
parameters enter the model, not to the functions of x used in the
model. Your model is quadratic in x but linear in a, b and c.
Consider:
In[20]:= m = LinearModelFit[sbbBN, {1, x, x^2}, x];
In[21]:= m@"RSquared"
Out[21]= 0.762242
Which is the result returned by Excel. So, in this case it is
clear Excel is solving the linear regression problem and
computing RSquared for that problem correctly. In general, you
never want to use NonlinearModelFit for a linear problem that
can be handled by LinearModelFit.
Note, R is the *linear* correlation coefficient. To compute
something equivalent to R for a non-linear problem you have to
generalize the definition of R in some manner. I don't know how
this is being done in NonlinearModelFit. It is this detail that
is needed to determine whether the result returned for RSquared
by NonlinearModelFit is incorrect or not.
One final comment. Using powers of x as your set of basis
functions is OK for powers less than 2 and possibly OK for
powers up to 3. But this is definitely not a good idea for any
higher powers of x. The problem is that the powers of x do not form
an orthogonal basis set. Perhaps even more important, the matrices
used to solve the linear regression problem become increasingly
ill-conditioned as the powers of x increase. If you need to fit a
high-degree polynomial to your data, you should use Chebyshev
polynomials as the basis functions rather than powers of x.
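As a rough sketch of the Chebyshev suggestion above (not code from the original reply): the x values are first mapped onto [-1, 1], where the Chebyshev polynomials are defined, and the fit then uses ChebyshevT as basis functions. The names xs, scaled, cheb and the variable t are hypothetical and chosen only for this illustration. For a quadratic the gain in conditioning is negligible; the pattern only starts to matter at higher degree.

```mathematica
(* Data from the original post *)
sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`, 0.751434565`},
   {-1.508540016`, 0.571212292`}, {2.004747546`, 0.187621117`},
   {1.139972167`, 0.297735572`}, {-0.724053077`, 0.457858443`},
   {-0.830992757`, 0.313642502`}, {-3.830561204`, 0.81639874`},
   {-2.357296433`, 0.804397821`}, {0.986610836`, 0.221932888`},
   {-0.513640368`, 0.704999208`}, {-1.508540016`, 0.798426867`}};

(* Map the x values linearly onto [-1, 1] *)
xs = sbbBN[[All, 1]];
scaled = Transpose[{Rescale[xs, {Min[xs], Max[xs]}, {-1, 1}],
    sbbBN[[All, 2]]}];

(* Quadratic fit in the basis {1, T1, T2}; LinearModelFit adds the
   constant term by default *)
cheb = LinearModelFit[scaled, {ChebyshevT[1, t], ChebyshevT[2, t]}, t];
cheb["RSquared"]
```

Because {1, T1(t), T2(t)} spans the same space of quadratics as {1, x, x^2}, the RSquared here should agree with the 0.762242 obtained from the power basis; only the numerical conditioning of the problem differs.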
This is as designed. For nonlinear models, the corrected (i.e. with the
mean subtracted out) sum of squares is sometimes used. This is
consistent with comparing to a constant model, but most nonlinear models
do not include a constant in an additive way. For this reason,
NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
mean) sum of squares.
Because the model you are using is a linear model, you could instead use
LinearModelFit, which uses corrected sums of squares if a constant term
is present and assumes a constant term is present unless it is told
otherwise.
In[1]:= sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
0.798426867`}};
In[2]:= nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x];
In[3]:= nlm["RSquared"]
Out[3]= 0.963173
In[4]:= lm = LinearModelFit[sbbBN, {x, x^2}, x];
In[5]:= lm["RSquared"]
Out[5]= 0.762242
Darren Glosemeyer
Wolfram Research
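The corrected versus uncorrected distinction described above can be checked by hand. The following is a minimal sketch, assuming RSquared is computed as 1 - SS_res/SS_tot as stated later in this thread; the names y, ssRes, ssTotUncorrected, ssTotCorrected, r2Uncorrected and r2Corrected are hypothetical and used only for this illustration.

```mathematica
(* Data from the original post *)
sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`, 0.751434565`},
   {-1.508540016`, 0.571212292`}, {2.004747546`, 0.187621117`},
   {1.139972167`, 0.297735572`}, {-0.724053077`, 0.457858443`},
   {-0.830992757`, 0.313642502`}, {-3.830561204`, 0.81639874`},
   {-2.357296433`, 0.804397821`}, {0.986610836`, 0.221932888`},
   {-0.513640368`, 0.704999208`}, {-1.508540016`, 0.798426867`}};

nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x];

y = sbbBN[[All, 2]];
ssRes = Total[nlm["FitResiduals"]^2];   (* residual sum of squares *)

ssTotUncorrected = Total[y^2];          (* mean not subtracted *)
ssTotCorrected = Total[(y - Mean[y])^2]; (* mean subtracted *)

r2Uncorrected = 1 - ssRes/ssTotUncorrected
r2Corrected = 1 - ssRes/ssTotCorrected
```

With this data, r2Uncorrected should reproduce the 0.963173 from NonlinearModelFit and r2Corrected the 0.762242 from LinearModelFit and Excel. The residuals are the same in both cases; only the denominator changes.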
This information should be included in the "Goodness-of-Fit Measures"
section of the NonlinearModelFit documentation, which should also
point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
and that in nonlinear models this is generally different from the
ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
-- as the definition of RSquared.
> This is as designed. For nonlinear models, the corrected (i.e. with the
> mean subtracted out) sum of squares is sometimes used. This is
> consistent with comparing to a constant model, but most nonlinear models
> do not include a constant in an additive way. For this reason,
> NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
> mean) sum of squares.
Is this the standard practice in the mathematics world?
It seems to me that this takes away the common ground for comparison
between linear and nonlinear regression.
I always get unrealistically high R^2 (>0.9) from the NonlinearModelFit
function, even though the fit might be awfully off.
This makes me wonder whether the so-called uncorrected R^2 is right.
Any explanation? Thanks
PC
On Sep 28, 6:09 pm, Darren Glosemeyer <darr...@wolfram.com> wrote:
> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>
>
>
> > sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
> > 0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
> > 0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
> > 0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
> > 0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
> > 0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
> > 0.798426867`}};
>
> > nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
> > nlm["RSquared"]
>
> > The RSquared by Mathematica is 0.963173
> > Meanwhile, Excel and manual hand calculation show that R^2 should be
> > equal to 0.7622.
>
> > Is Mathematica wrong? Thanks!
>
> This is as designed. For nonlinear models, the corrected (i.e. with the
> mean subtracted out) sum of squares is sometimes used. This is
> consistent with comparing to a constant model, but most nonlinear models
> do not include a constant in an additive way. For this reason,
> NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
> mean) sum of squares.
>
> Because the model you are using is a linear model, you could instead use
> LinearModelFit, which uses corrected sums of squares if a constant term
> is present and assumes a constant term is present unless it is told
> otherwise.
>
> In[1]:= sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
> 0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
> 0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
> 0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
> 0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
> 0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
> 0.798426867`}};
>
> In[2]:= nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
Also, the n-1 in the formula for AdjustedRSquared should be n,
because the total sum of squares is uncorrected.
However, all that misses the main point I was trying to make, which
is that simply changing from corrected to uncorrected sums of squares
will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit
calculates RSquared. The reason is that the residuals are not
generally orthogonal to the fitted values, so the decomposition
SS_tot = SS_fit + SS_res that holds for linear models does not
generally hold for nonlinear models.
For instance, using the data and model from the "Goodness-of-Fit
Measures" section of the NonlinearModelFit documentation,
the fitted values are
{13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},
and the residuals are
{0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.
Their uncentered inner product is 50.2468; centering gives -159.435.
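The two inner products quoted above can be recomputed directly from the listed values. This is only a sketch of that arithmetic; the names fitted and residuals are hypothetical labels for the two lists given above.

```mathematica
(* Fitted values and residuals as listed above *)
fitted = {13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
   11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323};
residuals = {0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
   1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303};

(* Uncentered inner product: about 50.2468 *)
uncentered = fitted . residuals

(* Centered inner product: about -159.435 *)
centered = (fitted - Mean[fitted]) . (residuals - Mean[residuals])
```

For a linear least-squares fit with a constant term, both quantities would be zero up to rounding, which is what makes SS_tot = SS_fit + SS_res hold. Here neither is close to zero, so the decomposition fails for this nonlinear fit.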
The RegressionCommon documentation is for a now obsolete standard
package. The "RSquared" property for nonlinear models is described near
the bottom of
http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
The current statement is:
"The coefficient of determination "RSquared" is the ratio of the model
sum of squares to the total sum of squares."
I will modify this to mention that the total is the uncorrected total
for the next version.
Darren Glosemeyer
Wolfram Research
I have corrected the documentation (for the next version) for the
nonlinear "RSquared" property to state that it is 1 - SS_res/SS_tot.
Darren Glosemeyer
Wolfram Research
I have seen both definitions (based on corrected and based on
uncorrected) used, but in both cases the comparison and interpretation
break down because of the nonorthogonality Ray mentioned. Authors often
caution that R^2 is not particularly meaningful for nonlinear models. As
a general rule, I would advise against using R^2 for nonlinear models
because the interpretation is at the very least not as clear as it is in
linear models. Also, if the model is actually a linear model, I would
advise fitting it as a linear model to take advantage of the properties
and results available for linear models which may not be applicable to
nonlinear models in general.
Darren Glosemeyer
Wolfram Research