Mathematica calculates RSquared wrongly?

Lawrence Teo

Sep 27, 2010, 5:47:07 AM
sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
0.798426867`}};

nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
nlm["RSquared"]


The RSquared by Mathematica is 0.963173
Meanwhile, Excel and manual hand calculation show that R^2 should be
equal to 0.7622.

Is Mathematica wrong? Thanks!


Sjoerd C. de Vries

Sep 28, 2010, 6:06:20 AM
I get the same numbers as you, using three definitions of Rsquared.
The difference might be caused by the standard definition of R squared
being applicable for linear least squares fits, whereas here we don't
have a linear model. You may consider asking sup...@wolfram.com
whether this is a bug or not.

Cheers -- Sjoerd

Bill Rowe

Sep 28, 2010, 6:07:14 AM
On 9/27/10 at 5:47 AM, lawre...@yahoo.com (Lawrence Teo) wrote:

>sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
>0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
>0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
>0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
>0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
>0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
>0.798426867`}};

>nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>nlm["RSquared"]

>The RSquared by Mathematica is 0.963173 Meanwhile, Excel and manual
>hand calculation show that R^2 should be equal to 0.7622.

>Is Mathematica wrong?

Whenever Mathematica and Excel disagree it is almost certain the
problem lies with Excel. Simply put, the current versions of
Excel should never be relied upon for any serious statistical
analysis. Do a Google search on Excel and you can find several
sites saying essentially the same thing as I just said here.

But this case seems to be the exception. There is a more subtle
issue in play.

The problem you are solving is not a non-linear problem. Linear
versus non-linear in model fitting refers to the way the unknown
parameters are included in the model, not the functions of x used
in the model.

Consider:

In[20]:= m = LinearModelFit[sbbBN, {1, x, x^2}, x];

In[21]:= m@"RSquared"

Out[21]= 0.762242

Which is the result returned by Excel. So, in this case it is
clear Excel is solving the linear regression problem and
computing RSquared for that problem correctly. In general, you
never want to use NonlinearModelFit for a linear problem that
can be handled by LinearModelFit.
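
The distinction drawn above (linearity in the parameters, not in x) can be checked mechanically. The following is only a minimal sketch: linearInParametersQ is a hypothetical helper name, not a built-in, and it assumes a, b, c and x have no assigned values. It tests whether every second derivative of the model with respect to the parameters vanishes.

```mathematica
(* Hypothetical helper, not a built-in: a model is linear (strictly, affine)
   in its parameters when all second derivatives with respect to those
   parameters are zero. *)
linearInParametersQ[model_, params_List] :=
  And @@ (PossibleZeroQ /@ Flatten[D[model, {params, 2}]])

(* Quadratic in x, but linear in a, b, c *)
linearInParametersQ[a*x^2 + b*x + c, {a, b, c}]

(* b sits inside the exponential, so this one is genuinely nonlinear *)
linearInParametersQ[a*Exp[b*x], {a, b}]
```

The first call should give True and the second False, which is why the quadratic from the original post can be handed to LinearModelFit.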

Note, R is the *linear* correlation coefficient. To compute
something equivalent to R for a non-linear problem you have to
generalize the definition of R in some manner. I don't know how
this is being done in NonlinearModelFit. It is this detail that
is needed to determine whether the result returned for RSquared
by NonlinearModelFit is incorrect or not.

One final comment. Using powers of x as your set of basis
functions is OK for powers less than 2 and possibly OK for
powers up to 3. But this is definitely not a good idea for any
higher powers of x. The problem is that the powers of x do not form
an orthogonal basis set. Also, and perhaps even more important,
the matrices used to solve the linear regression problem become
increasingly ill-conditioned as the powers of x increase. If you
need to fit a high-degree polynomial to your data, you should
use Chebyshev polynomials as the basis functions rather than
powers of x.
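
The conditioning remark can be illustrated with a rough sketch. The grid, the degree, and the helper name cond are all assumptions made for illustration (none come from the thread), and the x values are assumed to have been rescaled to [-1, 1], where Chebyshev polynomials are normally used.

```mathematica
(* Assumption: x values already rescaled to the interval [-1, 1]. *)
xs = N[Range[-49, 49, 2]/49];   (* 50 equally spaced points, none equal to 0 *)
deg = 10;                       (* illustrative polynomial degree *)

(* Design matrices: one row per x value, one column per basis function. *)
monomial = Table[x^k, {x, xs}, {k, 0, deg}];
chebyshev = Table[ChebyshevT[k, x], {x, xs}, {k, 0, deg}];

(* Hypothetical helper: 2-norm condition number, i.e. the largest
   singular value divided by the smallest. *)
cond[m_] := With[{s = SingularValueList[m]}, First[s]/Last[s]]

{cond[monomial], cond[chebyshev]}
```

The first number should come out markedly larger than the second, and the gap widens as deg increases.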


Darren Glosemeyer

Sep 28, 2010, 6:09:02 AM
On 9/27/2010 4:47 AM, Lawrence Teo wrote:
> sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
> 0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
> 0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
> 0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
> 0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
> 0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
> 0.798426867`}};
>
> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
> nlm["RSquared"]
>
>
> The RSquared by Mathematica is 0.963173
> Meanwhile, Excel and manual hand calculation show that R^2 should be
> equal to 0.7622.
>
> Is Mathematica wrong? Thanks!
>
>

This is as designed. For nonlinear models, the corrected (i.e. with the
mean subtracted out) sum of squares is sometimes used. This is
consistent with comparing to a constant model, but most nonlinear models
do not include a constant in an additive way. For this reason,
NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
mean) sum of squares.

Because the model you are using is a linear model, you could instead use
LinearModelFit, which uses corrected sums of squares if a constant term
is present and assumes a constant term is present unless it is told
otherwise.


In[1]:= sbbBN = {{-0.582258428`, 0.49531889`}, {-2.475512593`,
0.751434565`}, {-1.508540016`, 0.571212292`}, {2.004747546`,
0.187621117`}, {1.139972167`, 0.297735572`}, {-0.724053077`,
0.457858443`}, {-0.830992757`, 0.313642502`}, {-3.830561204`,
0.81639874`}, {-2.357296433`, 0.804397821`}, {0.986610836`,
0.221932888`}, {-0.513640368`, 0.704999208`}, {-1.508540016`,
0.798426867`}};

In[2]:= nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x];

In[3]:= nlm["RSquared"]

Out[3]= 0.963173

In[4]:= lm = LinearModelFit[sbbBN, {x, x^2}, x];

In[5]:= lm["RSquared"]

Out[5]= 0.762242
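
Both numbers can be reproduced by hand from the same residuals, which makes the corrected versus uncorrected distinction concrete. This is a sketch rather than a description of the internal implementation; the variable names (ssRes, ssTotCorrected and so on) are illustrative, and the data is retyped here without the precision marks.

```mathematica
sbbBN = {{-0.582258428, 0.49531889}, {-2.475512593, 0.751434565},
   {-1.508540016, 0.571212292}, {2.004747546, 0.187621117},
   {1.139972167, 0.297735572}, {-0.724053077, 0.457858443},
   {-0.830992757, 0.313642502}, {-3.830561204, 0.81639874},
   {-2.357296433, 0.804397821}, {0.986610836, 0.221932888},
   {-0.513640368, 0.704999208}, {-1.508540016, 0.798426867}};

lm = LinearModelFit[sbbBN, {x, x^2}, x];
y = sbbBN[[All, 2]];

(* Residual sum of squares from the least-squares quadratic *)
ssRes = Total[lm["FitResiduals"]^2];

(* Total sum of squares, with and without the mean subtracted out *)
ssTotCorrected = Total[(y - Mean[y])^2];
ssTotUncorrected = Total[y^2];

r2Corrected = 1 - ssRes/ssTotCorrected      (* about 0.762242, as from LinearModelFit *)
r2Uncorrected = 1 - ssRes/ssTotUncorrected  (* about 0.963173, as from NonlinearModelFit *)
```

The uncorrected total is larger because it still contains the contribution of the mean of y, so the same residuals look small against it.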


Darren Glosemeyer
Wolfram Research

Ray Koopman

Sep 29, 2010, 4:15:29 AM
On Sep 28, 3:09 am, Darren Glosemeyer <darr...@wolfram.com> wrote:
> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>> [...]

>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>> nlm["RSquared"]
>>
>> The RSquared by Mathematica is 0.963173
>> Meanwhile, Excel and manual hand calculation show that R^2 should
>> be equal to 0.7622.
>>
>> Is Mathematica wrong? Thanks!
>
> This is as designed. For nonlinear models, the corrected (i.e. with
> the mean subtracted out) sum of squares is sometimes used. This is
> consistent with comparing to a constant model, but most nonlinear
> models do not include a constant in an additive way. For this reason,
> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
> the mean) sum of squares.

This information should be included in the "Goodness-of-Fit Measures"
section of the NonlinearModelFit documentation, which should also
point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
and that in nonlinear models this is generally different from the
ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
-- as the definition of RSquared.

Lawrence Teo

Sep 30, 2010, 4:53:19 AM
With reference to the following statement,

> This is as designed. For nonlinear models, the corrected (i.e. with the
> mean subtracted out) sum of squares is sometimes used. This is
> consistent with comparing to a constant model, but most nonlinear models
> do not include a constant in an additive way. For this reason,
> NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
> mean) sum of squares.

Is this the standard practice in the mathematics world?
It seems to me that this takes away the common ground for comparison
between linear and nonlinear regression.

I always get unrealistically high R^2 (>0.9) from the NonlinearModelFit
function, even though the fit might be awfully off.
This makes me wonder whether the so-called uncorrected R^2 is right.

Any explanation? Thanks

PC



Ray Koopman

Sep 30, 2010, 4:53:40 AM
On Sep 29, 7:48 am, Darren Glosemeyer<dar...@wolfram.com> wrote:
> On 9/29/2010 3:15 AM, Ray Koopman wrote:

>> On Sep 28, 3:09 am, Darren Glosemeyer<dar...@wolfram.com> wrote:
>>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>>> [...]

>>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>>> nlm["RSquared"]
>>>>
>>>> The RSquared by Mathematica is 0.963173
>>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>>> be equal to 0.7622.
>>>>
>>>> Is Mathematica wrong? Thanks!
>>>
>>> This is as designed. For nonlinear models, the corrected (i.e. with
>>> the mean subtracted out) sum of squares is sometimes used. This is
>>> consistent with comparing to a constant model, but most nonlinear
>>> models do not include a constant in an additive way. For this reason,
>>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>>> the mean) sum of squares.
>>
>> This information should be included in the "Goodness-of-Fit Measures"
>> section of the NonlinearModelFit documentation, which should also
>> point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
>> and that in nonlinear models this is generally different from the
>> ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
>> http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
>> -- as the definition of RSquared.
>
> The RegressionCommon documentation is for a now obsolete standard
> package. The "RSquared" property for nonlinear models is described
> near the bottom of
>
> http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html
>
> The current statement is:
>
> "The coefficient of determination "RSquared" is the ratio of the model
> sum of squares to the total sum of squares."
>
> I will modify this to mention that the total is the uncorrected total
> for the next version.

Also, the n-1 in the formula for AdjustedRSquared should be n,
because the total sum of squares is uncorrected.

However, all that misses the main point I was trying to make, which
is that simply changing from corrected to uncorrected sums of squares
will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit
calculates RSquared. The reason is that the residuals are not
generally orthogonal to the fitted values, so the decomposition
SS_tot = SS_fit + SS_res that holds for linear models does not
generally hold for nonlinear models.

For instance, using the data and model from the "Goodness-of-Fit
Measures" section of the NonlinearModelFit documentation,
the fitted values are

{13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},

and the residuals are

{0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.

Their uncentered inner product is 50.2468; centering gives -159.435.
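
These figures can be checked directly from the two lists quoted above. The sketch below only uses the rounded values as printed, so results agree to the digits shown; the names fit, res and y are illustrative.

```mathematica
fit = {13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
   11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323};
res = {0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
   1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303};

(* Inner products of fitted values and residuals *)
uncentered = fit . res                              (* about 50.2468 *)
centered = (fit - Mean[fit]) . (res - Mean[res])    (* about -159.435 *)

(* The response is fitted value plus residual, so
   SS_tot - (SS_fit + SS_res) equals twice the uncentered inner product. *)
y = fit + res;
gap = Total[y^2] - (Total[fit^2] + Total[res^2])
```

Since neither inner product is zero, SS_tot differs from SS_fit + SS_res, and so SS_fit/SS_tot and 1 - SS_res/SS_tot give different numbers for this model.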

Darren Glosemeyer

Sep 30, 2010, 4:51:51 AM
On 9/29/2010 3:15 AM, Ray Koopman wrote:
> On Sep 28, 3:09 am, Darren Glosemeyer<darr...@wolfram.com> wrote:
>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>> [...]

>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>> nlm["RSquared"]
>>>
>>> The RSquared by Mathematica is 0.963173
>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>> be equal to 0.7622.
>>>
>>> Is Mathematica wrong? Thanks!
>> This is as designed. For nonlinear models, the corrected (i.e. with
>> the mean subtracted out) sum of squares is sometimes used. This is
>> consistent with comparing to a constant model, but most nonlinear
>> models do not include a constant in an additive way. For this reason,
>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>> the mean) sum of squares.
> This information should be included in the "Goodness-of-Fit Measures"
> section of the NonlinearModelFit documentation, which should also
> point out that RSquared is computed as 1 - (Residual SS)/(Total SS),
> and that in nonlinear models this is generally different from the
> ratio (Model SS)/(Total SS) that is sometimes cited -- e.g.,
> http://reference.wolfram.com/mathematica/RegressionCommon/ref/RSquared.html
> -- as the definition of RSquared.
>

The RegressionCommon documentation is for a now obsolete standard
package. The "RSquared" property for nonlinear models is described near
the bottom of

http://reference.wolfram.com/mathematica/tutorial/StatisticalModelAnalysis.html

The current statement is:

"The coefficient of determination "RSquared" is the ratio of the model
sum of squares to the total sum of squares."

I will modify this to mention that the total is the uncorrected total
for the next version.

Darren Glosemeyer
Wolfram Research

Darren Glosemeyer

Oct 1, 2010, 5:40:16 AM
On 9/30/2010 9:55 AM, Darren Glosemeyer wrote:

> On 9/30/2010 3:53 AM, Ray Koopman wrote:
>> On Sep 29, 7:48 am, Darren Glosemeyer<dar...@wolfram.com> wrote:
>>> On 9/29/2010 3:15 AM, Ray Koopman wrote:
>> Also, the n-1 in the formula for AdjustedRSquared should be n,
>> because the total sum of squares is uncorrected.
>>
>> However, all that misses the main point I was trying to make, which
>> is that simply changing from corrected to uncorrected sums of squares
>> will not give 1 - SS_res/SS_tot, which is how NonlinearModelFit
>> calculates RSquared. The reason is that the residuals are not
>> generally orthogonal to the fitted values, so the decomposition
>> SS_tot = SS_fit + SS_res that holds for linear models does not
>> generally hold for nonlinear models.
>>
>> For instance, using the data and model from the "Goodness-of-Fit
>> Measures" section of the NonlinearModelFit documentation,
>> the fitted values are
>>
>> {13.658, 2.00568, 1.48485, 14.8951, 5.6088, 10.1695,
>> 11.0627, 5.77841, 4.51702, 5.67666, 13.4947, 11.4323},
>>
>> and the residuals are
>>
>> {0.742037, 9.09432, 5.01515, -3.79512, 1.1912, 0.930521,
>> 1.3373, 3.12159, 4.08298, 5.72334, -1.69468, -0.532303}.
>>
>> Their uncentered inner product is 50.2468; centering gives -159.435.
>>
>
> Thanks for catching the AdjustedRSquared typo. The code is
> (effectively) using n. I've corrected the docs.
>
> I see your point about the orthogonality now. I missed it in the
> original example because the original example was actually a linear
> model. I'll have to take a closer look and decide if the code or the
> docs need to be corrected.
>
> Darren Glosemeyer
> Wolfram Research


I have corrected the documentation (for the next version) for the
nonlinear "RSquared" property to state that it is 1 - SS_res/SS_tot.

Darren Glosemeyer
Wolfram Research

Darren Glosemeyer

Oct 1, 2010, 5:40:05 AM
On 9/30/2010 3:52 AM, Lawrence Teo wrote:
> With reference to the following statement,
>
>> This is as designed. For nonlinear models, the corrected (i.e. with the
>> mean subtracted out) sum of squares is sometimes used. This is
>> consistent with comparing to a constant model, but most nonlinear models
>> do not include a constant in an additive way. For this reason,
>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out the
>> mean) sum of squares.
> Is this the standard practice in the mathematics world?
> It seems to me that this takes away the common ground for comparison
> between linear and nonlinear regression.
>
> I always get unrealistically high R^2 (>0.9) from the NonlinearModelFit
> function, even though the fit might be awfully off.
> This makes me wonder whether the so-called uncorrected R^2 is right.
>
> Any explanation? Thanks
>
> PC
>

I have seen both definitions (based on corrected and based on
uncorrected) used, but in both cases the comparison and interpretation
break down because of the nonorthogonality Ray mentioned. Authors often
caution that R^2 is not particularly meaningful for nonlinear models. As
a general rule, I would advise against using R^2 for nonlinear models
because the interpretation is at the very least not as clear as it is in
linear models. Also, if the model is actually a linear model, I would
advise fitting it as a linear model to take advantage of the properties
and results available for linear models which may not be applicable to
nonlinear models in general.

Darren Glosemeyer
Wolfram Research

Darren Glosemeyer

Oct 1, 2010, 5:39:43 AM
On 9/30/2010 3:53 AM, Ray Koopman wrote:
> On Sep 29, 7:48 am, Darren Glosemeyer<dar...@wolfram.com> wrote:
>> On 9/29/2010 3:15 AM, Ray Koopman wrote:
>>> On Sep 28, 3:09 am, Darren Glosemeyer<dar...@wolfram.com> wrote:
>>>> On 9/27/2010 4:47 AM, Lawrence Teo wrote:
>>>>> [...]
>>>>> nlm = NonlinearModelFit[sbbBN, a*x^2 + b*x + c, {a, b, c}, x]
>>>>> nlm["RSquared"]
>>>>>
>>>>> The RSquared by Mathematica is 0.963173
>>>>> Meanwhile, Excel and manual hand calculation show that R^2 should
>>>>> be equal to 0.7622.
>>>>>
>>>>> Is Mathematica wrong? Thanks!
>>>> This is as designed. For nonlinear models, the corrected (i.e. with
>>>> the mean subtracted out) sum of squares is sometimes used. This is
>>>> consistent with comparing to a constant model, but most nonlinear
>>>> models do not include a constant in an additive way. For this reason,
>>>> NonlinearModelFit uses the uncorrected (i.e. without subtracting out
>>>> the mean) sum of squares.