I have a question about the R^2 value that MATLAB's Curve Fitting
Toolbox returns for nonlinear models, such as Gaussian functions.
My understanding was that R^2 is only valid for linear models, and
that if it is calculated for a nonlinear model, the result has no
meaning. Does the Curve Fitting Toolbox use a different formula to
calculate R^2 (other than the standard explained sum of squares
divided by the total sum of squares) that is more robust under
nonlinearity? Or am I making an incorrect assumption about the
proper use of R^2?
Thanks,
Mike Gerst
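For reference, here are the textbook definitions the question refers to, written with data \(y_i\), fitted values \(\hat y_i\) and data mean \(\bar y\). For an ordinary linear least-squares fit that includes a constant term, \(\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}\), so the two forms of \(R^2\) are identical. In other cases that decomposition need not hold, and the two forms can differ.

```latex
\mathrm{SST} = \sum_i (y_i - \bar y)^2, \qquad
\mathrm{SSE} = \sum_i (y_i - \hat y_i)^2, \qquad
\mathrm{SSR} = \sum_i (\hat y_i - \bar y)^2

R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\qquad \text{or} \qquad
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}
```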
Mike, some people don't like R^2 for various reasons, but if the nonlinear
function has a constant term, then the usual interpretation seems okay to
me. The same holds even if the function has no explicit constant term, as
long as it can fit any constant value. An example of the latter would be
(a+b*x)/(1+c*x), which is constant if b=c=0.
For a model without a constant term, whether linear or not, the various ways
of computing R^2 do not yield the same value.
Here's an example of a nonlinear fit. The value in the rsquare field
matches the formula you mention, and two other expressions give the
same answer.
-- Tom
>> load census
>> [f,g] = fit(cdate,pop,'a + (x/b)^c','start',[1 1900 1])
f =
General model:
f(x) = a + (x/b)^c
Coefficients (with 95% confidence bounds):
a = -33.8 (-45.1, -22.49)
b = 1522 (1480, 1563)
c = 21.11 (19.04, 23.18)
g =
sse: 341.5431
rsquare: 0.9972
dfe: 18
adjrsquare: 0.9969
rmse: 4.3560
>> yfit = f(cdate);
>> res = pop - yfit;
>> corrcoef(pop,yfit).^2
ans =
1.0000 0.9972
0.9972 1.0000
>> 1 - norm(res)^2 / norm(pop - mean(pop))^2
ans =
0.9972
>> norm(yfit - mean(yfit))^2 / norm(pop - mean(pop))^2
ans =
0.9972
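To see why the three expressions agree in the session above, and why they stop agreeing for a model without a constant term, here is a small sketch in Python (standard library only; the thread itself uses MATLAB). It computes the same three quantities as the session: 1 - SSE/SST, the spread of the fitted values about their own mean divided by SST (as in the `norm(yfit - mean(yfit))^2` line), and the squared correlation between data and fit. It applies them to a straight line fitted with and without a constant term. The data and the helper names (`r2_variants`, `fit_line`, `fit_through_origin`) are made up for illustration.

```python
def mean(v):
    return sum(v) / len(v)


def r2_variants(y, yfit):
    """The three R^2 expressions used in the MATLAB session."""
    ybar, fbar = mean(y), mean(yfit)
    sst = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yfit))   # residual sum of squares
    ssf = sum((fi - fbar) ** 2 for fi in yfit)             # spread of fitted values
    sxy = sum((yi - ybar) * (fi - fbar) for yi, fi in zip(y, yfit))
    return {
        "1 - SSE/SST": 1 - sse / sst,
        "SSfit/SST": ssf / sst,
        "corr^2": sxy ** 2 / (sst * ssf),
    }


def fit_line(x, y):
    """Least-squares line WITH a constant term: y ~ a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return [a + b * xi for xi in x]


def fit_through_origin(x, y):
    """Least-squares line WITHOUT a constant term: y ~ b*x."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return [b * xi for xi in x]


x = [1, 2, 3, 4]
y = [3, 4, 6, 7]
for name, fitter in [("with constant", fit_line), ("no constant", fit_through_origin)]:
    values = r2_variants(y, fitter(x, y))
    print(name, {k: round(v, 4) for k, v in values.items()})
# with constant: all three are 0.98
# no constant:   0.83, 1.805 and 0.98; they disagree, and one exceeds 1
```

With a free constant in a linear least-squares fit, the residuals sum to zero and are uncorrelated with the fitted values, which is what makes the three forms coincide. Drop the constant and that guarantee is gone.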
Thank you for your response. Is there a way to measure goodness of
fit for a nonlinear function without a constant term? I know I can
look at the plot of my function against the observed data, and decide
whether the fit is good by eye (which it is), but I would like to
have some sort of numerical measurement for this. I have tried
calculating the traditional R^2 by various methods, and each
returns a different result.
Specifically, I am fitting a Gaussian curve of the form
y = exp(-(ln(x) - A)^2/(2*B^2)). Is there a statistic (it doesn't
have to be R^2) that simply describes how well this function
explains my dataset?
thanks,
Mike
Mike, if you are comparing a bunch of alternative fits to the same data set,
then looking at the residual sum of squares is a reasonable way of comparing
them. You can also look at derived quantities such as the residual standard
deviation, or you can incorporate some sort of penalty for the number of
parameters to be estimated. Criteria such as AIC (Akaike information
criterion) and BIC (Bayesian information criterion) are sometimes used
for this purpose.
People like R^2 because it is on a standard scale and has a simple
interpretation. As you have found, though, the various interpretations don't
carry over to a model without a constant term. I'm not aware of any
generally accepted measure like this for such models.
-- Tom
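The comparison quantities Tom lists can be sketched in a few lines of Python. This assumes a least-squares fit and uses the common Gaussian-error forms of AIC and BIC with additive constants dropped, so only differences between models fitted to the same data mean anything (lower is better). The residual standard deviation is taken as sqrt(SSE/dfe), which matches the `rmse` field in the census session: sqrt(341.5431/18) ≈ 4.3560. The function name `fit_summary` and the toy data are hypothetical.

```python
import math


def fit_summary(y, yfit, n_params):
    """Least-squares summary statistics for comparing fits to the SAME data.

    AIC and BIC use the Gaussian-error forms with additive constants dropped,
    so only differences between models fitted to the same y are meaningful.
    """
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yfit))
    dfe = n - n_params                      # residual degrees of freedom
    return {
        "sse": sse,
        "rmse": math.sqrt(sse / dfe),       # residual standard deviation
        "aic": n * math.log(sse / n) + 2 * n_params,
        "bic": n * math.log(sse / n) + n_params * math.log(n),
    }


# Toy data at x = 1, 2, 3, 4 with least-squares fitted values written out.
y = [3, 4, 6, 7]
line_fit = [2.9, 4.3, 5.7, 7.1]     # y ~ a + b*x, 2 parameters
origin_fit = [1.9, 3.8, 5.7, 7.6]   # y ~ b*x, 1 parameter
for name, yfit, k in [("a + b*x", line_fit, 2), ("b*x", origin_fit, 1)]:
    print(name, {key: round(v, 3) for key, v in fit_summary(y, yfit, k).items()})
# The a + b*x fit has the lower AIC and the lower BIC here.
```

Unlike R^2, none of these is on a fixed 0-to-1 scale; they only rank candidate models for one data set.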
Tom,
Put me in the camp that doesn't have
much love for R^2, BECAUSE of the abuse
it gets. Too many people think it's a
magic number that automatically says
whether their fit is acceptable. R^2
is only a single indicator of fit.
I try to convince them that what
really matters is their goal for the
model. They need to know something
about the system under study. Does
the model fit acceptably well, given
their expectations for the noise in
the data? Does the model show an
(unacceptable) lack of fit to their
data?
If I plot my data and the model fit
and I'm satisfied with the result,
then I don't give a hoot if R^2 is
-2 or even 10. ;-)
John
(Yes, as long as a constant term is
estimated in the model, then R^2
makes sense for either linear or
nonlinear models.)
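One way to put a number on John's question, whether the model fits acceptably well given the expected noise, is to divide the residual variance by the noise variance you expect (with Gaussian errors this is sometimes called the reduced chi-square). A minimal sketch, assuming independent errors with a known standard deviation `sigma` that you supply from knowledge of the measurement process; the function name `noise_ratio` and the data are hypothetical.

```python
def noise_ratio(y, yfit, n_params, sigma):
    """Residual variance divided by the expected noise variance sigma**2.

    Assumes independent errors with known standard deviation sigma.
    Values near 1 mean the misfit is about the size of the noise; values
    much larger than 1 point to lack of fit (or to a sigma that is too small).
    """
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yfit))
    dfe = len(y) - n_params
    return sse / (dfe * sigma ** 2)


# Toy data at x = 1, 2, 3, 4 with least-squares fitted values written out.
y = [3, 4, 6, 7]
print(noise_ratio(y, [2.9, 4.3, 5.7, 7.1], 2, sigma=0.3))  # a + b*x: about 1.11
print(noise_ratio(y, [1.9, 3.8, 5.7, 7.6], 1, sigma=0.3))  # b*x: about 6.3
```

The number is only as good as the assumed `sigma`, and it does not replace looking at the residuals for systematic patterns.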
If you rewrite your model in the form
y = exp(-(A*log(x)+B)^2),
then A = 0 yields a constant model, and R^2 might be useful.
Hope this helps.
Greg
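Greg's form describes the same curves as Mike's model (for A not equal to 0), just parametrized differently, so it fits equally well while also containing a constant model at A = 0. The constants it can reach, exp(-B^2), all lie in (0, 1]. The Python sketch below checks the mapping A = 1/(sqrt(2)*B_mike), B = -A_mike/(sqrt(2)*B_mike), which follows from matching the exponents. The function names and the example parameter values are hypothetical.

```python
import math


def mike_model(x, a0, b0):
    """y = exp(-(ln(x) - A)^2 / (2*B^2)), the model from Mike's post."""
    return math.exp(-((math.log(x) - a0) ** 2) / (2 * b0 ** 2))


def greg_model(x, a, b):
    """y = exp(-(A*log(x) + B)^2), Greg's reparametrization."""
    return math.exp(-((a * math.log(x) + b) ** 2))


def to_greg(a0, b0):
    """Map Mike's (A, B) to Greg's (A, B); requires B != 0."""
    s = math.sqrt(2) * b0
    return 1 / s, -a0 / s


def to_mike(a, b):
    """Inverse map; defined only for A != 0 (A = 0 is the constant exp(-B^2))."""
    return -b / a, 1 / (math.sqrt(2) * a)


a0, b0 = 1.5, 0.4  # arbitrary example values
a, b = to_greg(a0, b0)
for x in [0.5, 1.0, 2.0, 4.5, 10.0]:
    # both columns agree up to floating-point rounding
    print(x, mike_model(x, a0, b0), greg_model(x, a, b))
```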