Curve fitting toolbox: R^2 for nonlinear functions

Mike Gerst

Jan 9, 2007, 11:45:02 AM

Hi all,

I have a question regarding the R^2 value that is returned in the
curve fitting toolbox for nonlinear equations, such as Gaussian
functions.

My understanding was that R^2 is only valid for linear models, and
that if calculated for a nonlinear model, the result has no meaning.
Does Matlab's curve fitting toolbox use a different formula to
calculate R^2 (other than the standard squared sum of explained
variance divided by squared sum of total variance) that is more
robust under nonlinearity? Or, do I have an incorrect assumption for
the proper use of R^2?

Thanks,
Mike Gerst

Tom Lane

Jan 9, 2007, 1:38:32 PM

> My understanding was that R^2 is only valid for linear models, and
> that if calculated for a nonlinear model, the result has no meaning.
> Does Matlab's curve fitting toolbox use a different formula to
> calculate R^2 (other than the standard squared sum of explained
> variance divided by squared sum of total variance) that is more
> robust under nonlinearity? Or, do I have an incorrect assumption for
> the proper use of R^2?

Mike, some people don't like R^2 for various reasons, but if the nonlinear
function has a constant term then the usual interpretation seems like it
should be okay to me. It's even true if the function doesn't have an
explicit constant term, but it can fit any constant value. An example of
the latter would be (a+b*x)/(1+c*x), which is a constant if b=c=0.

For a model without a constant term, whether linear or not, the various ways
of computing R^2 do not yield the same value.

Here's an example of a nonlinear fit. You can see that the rsquare field's
value is the same as the formula you mention, and there are two other
expressions that yield the same answer.

-- Tom

>> load census
>> [f,g] = fit(cdate,pop,'a + (x/b)^c','start',[1 1900 1])
f =
     General model:
     f(x) = a + (x/b)^c
     Coefficients (with 95% confidence bounds):
       a =       -33.8  (-45.1, -22.49)
       b =        1522  (1480, 1563)
       c =       21.11  (19.04, 23.18)
g =
           sse: 341.5431
       rsquare: 0.9972
           dfe: 18
    adjrsquare: 0.9969
          rmse: 4.3560
>> yfit = f(cdate);                      % fitted values
>> res = pop - yfit;                     % residuals
>> corrcoef(pop,yfit).^2                 % squared correlation, observed vs. fitted
ans =
    1.0000    0.9972
    0.9972    1.0000
>> 1 - norm(res)^2 / norm(pop - mean(pop))^2            % 1 - SSE/SST
ans =
    0.9972
>> norm(yfit - mean(yfit))^2 / norm(pop - mean(pop))^2  % explained SS / total SS
ans =
    0.9972
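
For contrast, here is a minimal sketch of the no-constant case described above, where the same three expressions stop agreeing. The data and the model y = b*x are made up for illustration and are not from the thread; only base MATLAB functions are used.

```matlab
% Hypothetical data (illustration only): a line with a nonzero offset.
x = (1:10)';
y = 5 + 2*x + [0.3 -0.2 0.1 -0.4 0.2 0.0 -0.1 0.3 -0.3 0.1]';

% Least-squares fit of y = b*x, a model with no constant term.
b    = x \ y;
yfit = b*x;
res  = y - yfit;

% Total sum of squares about the mean of the observations.
sst = norm(y - mean(y))^2;

% The same three expressions used in the census session.
c       = corrcoef(y, yfit);
r2_corr = c(1,2)^2;                           % squared correlation
r2_res  = 1 - norm(res)^2 / sst;              % 1 - SSE/SST
r2_expl = norm(yfit - mean(yfit))^2 / sst;    % explained SS / total SS

fprintf('corr^2 = %.4f, 1-SSE/SST = %.4f, SSR/SST = %.4f\n', ...
        r2_corr, r2_res, r2_expl);
```

On this data the three numbers differ noticeably, and the explained-over-total ratio comes out above 1, so none of them can be read as "fraction of variance explained" without further qualification.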

Mike Gerst

Jan 9, 2007, 2:31:20 PM

Dear Tom,

Thank you for your response. Is there a way to measure goodness of
fit for a nonlinear function without a constant term? I know I can
look at the plot of my function against the observed data, and decide
whether the fit is good by eye (which it is), but I would like to
have some sort of numerical measurement for this. I have tried
calculating the traditional r squared by various methods and each
returns a different result.

Specifically, I am just fitting a Gaussian curve of the form y =
exp(-(ln(x) - A)^2/(2*B^2)). Is there a statistic (doesn't have to
be r squared) that will simply describe the explanatory degree this
function has for my dataset?

thanks,
Mike

Tom Lane

Jan 9, 2007, 3:30:43 PM

> Thank you for your response. Is there a way to measure goodness of
> fit for a nonlinear function without a constant term? I know I can
> look at the plot of my function against the observed data, and decide
> whether the fit is good by eye (which it is), but I would like to
> have some sort of numerical measurement for this. I have tried
> calculating the traditional r squared by various methods and each
> returns a different result.
>
> Specifically, I am just fitting a Gaussian curve of the form y =
> exp(-(ln(x) - A)^2/(2*B^2)). Is there a statistic (doesn't have to
> be r squared) that will simply describe the explanatory degree this
> function has for my dataset?

Mike, if you are comparing a bunch of alternative fits to the same data set,
then looking at the residual sum of squares is a reasonable way of comparing
them. You can also look at derived quantities such as the residual standard
deviation, or you can incorporate some sort of penalty for the number of
parameters to be estimated. There are things like AIC (Akaike information
criterion) and BIC (Bayesian information criterion) sometimes used for this
purpose.
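
As a rough sketch of those quantities for the curve in question: the data below are synthetic, the fit uses fminsearch from base MATLAB rather than the Curve Fitting Toolbox, and the AIC/BIC expressions are the common least-squares forms that assume independent Gaussian errors and drop an additive constant. Some conventions also count the error variance as a parameter, which shifts every candidate model by the same amount.

```matlab
% Synthetic data for illustration only (not Mike's data set).
x = linspace(0.5, 20, 40)';
A_true = 1.5;  B_true = 0.6;
noise  = 0.02*sin(7*(1:40)');            % deterministic stand-in for noise
y = exp(-(log(x) - A_true).^2 / (2*B_true^2)) + noise;

% Model from the thread: y = exp(-(ln(x) - A)^2 / (2*B^2)), with p = [A B].
model   = @(p, x) exp(-(log(x) - p(1)).^2 / (2*p(2)^2));
sse_fun = @(p) sum((y - model(p, x)).^2);

% Unconstrained least squares from a rough starting guess.
p_hat = fminsearch(sse_fun, [1 1]);

n    = numel(y);                  % number of observations
k    = numel(p_hat);              % number of fitted parameters
sse  = sse_fun(p_hat);            % residual sum of squares
rmse = sqrt(sse / (n - k));       % residual standard deviation

% Gaussian-error forms, up to an additive constant (an assumption).
aic = n*log(sse/n) + 2*k;
bic = n*log(sse/n) + k*log(n);

fprintf('SSE = %.4g, RMSE = %.4g, AIC = %.4g, BIC = %.4g\n', sse, rmse, aic, bic);
```

These numbers are only meaningful relative to other models fitted to the same data; smaller is better for all four.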

People like R^2 because it is on a standard scale and has a simple
interpretation. As you find, though, the various interpretations don't
carry over to a model without a constant term. I'm not aware of any
generally accepted measure like this for such models.

-- Tom

John D'Errico

Jan 9, 2007, 3:32:28 PM

Tom Lane wrote:
>> My understanding was that R^2 is only valid for linear models, and
>> that if calculated for a nonlinear model, the result has no meaning.
>> Does Matlab's curve fitting toolbox use a different formula to
>> calculate R^2 (other than the standard squared sum of explained
>> variance divided by squared sum of total variance) that is more
>> robust under nonlinearity? Or, do I have an incorrect assumption for
>> the proper use of R^2?
>
> Mike, some people don't like R^2 for various reasons, but if the nonlinear
> function has a constant term then the usual interpretation seems like it
> should be okay to me. It's even true if the function doesn't have an
> explicit constant term, but it can fit any constant value. An example of
> the latter would be (a+b*x)/(1+c*x), which is a constant if b=c=0.
>
> For a model without a constant term, whether linear or not, the various ways
> of computing R^2 do not yield the same value.

Tom,

Put me in the camp that doesn't have
much love for R^2, BECAUSE of the abuse
it gets. Too many people think it's a
magic number that automatically says
whether their fit is acceptable or not.
R^2 is only a single indicator of fit.

I try to convince them that what
really matters is their goals for a
model. They need to know something
about the system under study. Does
the model fit acceptably well, given
their expectations for the noise in
the data? Is there (unacceptable)
lack of fit in the fit to their data
with this model?

If I plot my data and the model fit
and I'm satisfied with the result,
then I don't give a hoot if R^2 is
-2 or even 10. ;-)

John

(Yes, as long as a constant term is
estimated in the model, then R^2
makes sense for either linear or
nonlinear models.)

Greg Heath

Jan 10, 2007, 1:01:19 AM

Mike Gerst wrote:
> Dear Tom,
>
> Thank you for your response. Is there a way to measure goodness of
> fit for a nonlinear function without a constant term? I know I can
> look at the plot of my function against the observed data, and decide
> whether the fit is good by eye (which it is), but I would like to
> have some sort of numerical measurement for this. I have tried
> calculating the traditional r squared by various methods and each
> returns a different result.
>
> Specifically, I am just fitting a Gaussian curve of the form y =
> exp(-(ln(x) - A)^2/(2*B^2)). Is there a statistic (doesn't have to
> be r squared) that will simply describe the explanatory degree this
> function has for my dataset?

If you rewrite your model in the form

y = exp(-(A*log(x)+B)^2)

then A = 0 yields a constant model, y = exp(-B^2), and R^2 might be useful.
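
A quick numerical check of this reparametrization, as a sketch only. To avoid clashing with the A and B of the original form, the new coefficients are written in lower case as a and b; that notation and the parameter values are assumptions made here, not part of the thread.

```matlab
x = linspace(0.5, 20, 50)';
A = 1.5;  B = 0.6;                      % hypothetical values for the original form

% Original form: y = exp(-(ln(x) - A)^2 / (2*B^2)).
y_orig = exp(-(log(x) - A).^2 / (2*B^2));

% Reparametrized form: y = exp(-(a*log(x) + b)^2),
% with a = 1/(sqrt(2)*B) and b = -A/(sqrt(2)*B).
a = 1/(sqrt(2)*B);
b = -A/(sqrt(2)*B);
y_new = exp(-(a*log(x) + b).^2);

max_diff = max(abs(y_orig - y_new));    % the two forms agree up to round-off

% Setting a = 0 flattens the curve to the single value exp(-b^2).
y_const = exp(-(0*log(x) + b).^2);
spread  = max(y_const) - min(y_const);

fprintf('max difference = %.3g, spread at a = 0: %.3g\n', max_diff, spread);
```

Note that the constant reached at a = 0 is exp(-b^2), which always lies in (0, 1], so this form can only reproduce constant levels in that range.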

Hope this helps.

Greg