wrong R-Squared value??

jantunes

Nov 14, 2007, 1:07:01 PM

Hi all,

I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is very correlated (something like R=0.98).

This is what I do:
1) get 200 data points (x is a time series; y is CPU usage)
2) do linear regression based on those 200 points, resulting in some y'=a + bx
3) get R-squared (R^2=0.96) for the y'

Then, I want to validate that trendline/prediction by comparing it with more real data:
4) get more data points, past the 200 points (eg 10000)
5) get R-squared for the y' (this time against the new data)

The problem is that this new R-squared takes very strange values (depending on the equation): either <0 (SSE/SST>1), >1 (SSR>SST), or near 0.99 (when in fact the trendline is not accurate).
As I said, I have already tried different ways of calculating R-squared. They all give the same value in 3), but strange values in 5).

Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

Thanks in advance!
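
The symptoms described in this post can be reproduced with a small sketch. Everything here is an assumption made for illustration: the data is synthetic (a usage curve that looks linear over the first 200 points and then flattens), and the helper names `ols` and `r2_variants` are hypothetical, not taken from the poster's Java or Excel work. It fits the line on the first 200 points, then evaluates three common "R-squared" formulas on the training points and on all 10000 points.

```python
import math

def ols(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r2_variants(ys, preds):
    """Return (1 - SSE/SST, SSR/SST, squared correlation of ys and preds)."""
    n = len(ys)
    my, mp = sum(ys) / n, sum(preds) / n
    sst = sum((y - my) ** 2 for y in ys)                # total variation of y
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # prediction errors
    ssr = sum((p - my) ** 2 for p in preds)             # predictions vs. mean of y
    cov = sum((y - my) * (p - mp) for y, p in zip(ys, preds))
    var_p = sum((p - mp) ** 2 for p in preds)
    return 1 - sse / sst, ssr / sst, cov ** 2 / (sst * var_p)

# Synthetic data (an assumption): near-linear at first, flattening later.
xs = list(range(1, 10001))
ys = [20 * math.log(1 + x / 100) for x in xs]

a, b = ols(xs[:200], ys[:200])  # steps 1-2: fit on the first 200 points
train = r2_variants(ys[:200], [a + b * x for x in xs[:200]])  # step 3
valid = r2_variants(ys, [a + b * x for x in xs])              # steps 4-5

print("training:  ", train)
print("validation:", valid)
```

On the training points the three formulas agree. On the larger data set the first drops below 0, the second rises above 1, and the third stays between 0 and 1 even though the line is far from the data, which matches the three behaviours reported above.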

Kenneth M. Lin

Nov 14, 2007, 2:56:34 PM

What software are you using?

"jantunes" <jasan...@gmail.com> wrote in message
news:9884857.11950636513...@nitrogen.mathforum.org...

Paige Miller

Nov 14, 2007, 3:50:23 PM

I believe that when you take a validation data set, you can indeed
have such wild predictions that SSR > SST, or SSE > SST. This
indicates that there is a problem with the way the model fits to your
validation data.

I don't think R-squared is the proper way to compare the validation
fit to the training fit. I would compare the MSE from the validation
data set to the MSE of the training data set -- if they are close,
that's good, if they are widely different, that's bad.
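
A minimal sketch of that comparison, under stated assumptions: the data is synthetic (near-linear for the first 200 points, flattening afterwards), and `ols` and `mse` are hypothetical helper names. The line is fitted once on the training points and then applied unchanged to the validation points.

```python
import math

def ols(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(xs, ys, a, b):
    """Mean squared prediction error of the fixed line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic usage curve (an assumption, not the poster's data).
xs = list(range(1, 10001))
ys = [20 * math.log(1 + x / 100) for x in xs]

a, b = ols(xs[:200], ys[:200])             # fit on the training points only
train_mse = mse(xs[:200], ys[:200], a, b)
valid_mse = mse(xs[200:], ys[200:], a, b)  # same line, unseen points

print(f"training   MSE = {train_mse:.3f}")
print(f"validation MSE = {valid_mse:.3f}")
print(f"ratio          = {valid_mse / train_mse:.1f}")
```

With this particular synthetic curve the validation MSE is far larger than the training MSE, which is the "widely different" case.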

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

jantunes

Nov 15, 2007, 4:40:39 AM

Java programming, then (upon the strange results) Excel.

jantunes

Nov 15, 2007, 4:53:22 AM

> I believe that when you take a validation data set,
> you can indeed
> have such wild predictions that SSR > SST, or SSE >
> SST. This
> indicates that there is a problem with the way the
> model fits to your
> validation data.
>
> I don't think R-squared is the proper way to compare
> the validation
> fit to the training fit. I would compare the MSE from
> the validation
> data set to the MSE of the training data set -- if
> they are close,
> that's good, if they are widely different, that's
> bad.

I've done some searching on MSE, and it appears that MSE is good for comparing different statistical models (lowest MSE = best model), but not for giving an absolute value the way R-squared does.
I'm not familiar with these, but I've seen them in the literature: chi-squared, F-test, p-value. Isn't there any (absolute) statistical measure I can produce that validates y' (the prediction) against the real and full data?

Paige Miller

Nov 15, 2007, 8:22:01 AM

On Nov 15, 4:53 am, jantunes <jasantu...@gmail.com> wrote:
> > I believe that when you take a validation data set,
> > you can indeed
> > have such wild predictions that SSR > SST, or SSE >
> > SST. This
> > indicates that there is a problem with the way the
> > model fits to your
> > validation data.
>
> > I don't think R-squared is the proper way to compare
> > the validation
> > fit to the training fit. I would compare the MSE from
> > the validation
> > data set to the MSE of the training data set -- if
> > they are close,
> > that's good, if they are widely different, that's
> > bad.
>
> I've done some searching on MSE and it appears that MSE is good to compare different statistics models (lowest MSE = best model), but not in giving an absolute value like R-squared.

Modelling techniques such as Partial Least Squares typically use a
measure of residual error to compare models from a training set to a
validation set. I see no reason why a similar measure can't be used in
an Ordinary Least Squares model as well.

> I'm not familiar with these but I've seen them in the literature: Chi-squared, F-test, p-value. Isn't there any (absolute) statistic measure I can produce that validates y' (prediction) against the real and full data?

I suppose you could do an F-test of MSE(training set)/MSE(validation
set). I'm not sure if this violates any of the standard
assumptions ... I'll have to think about that.

m00es

Nov 15, 2007, 12:54:36 PM

Why not just correlate the predicted values with the observed values
and square that?

So, you start with one set of data and find the least squares
regression line:

Y_hat = b0 + b1 X

Note that the squared correlation between the observed Y values and
the predicted values from this model (i.e., the Y_hat values) is equal
to R^2.

Now take a new set of data. Predict Y_hat using the least squares
regression line found in the first dataset. Calculate the squared
correlation between the (new) Y values and the predicted Y_hat values.
This will be between 0 and 1 and indicates how much of the variance in
the Y values (in the new dataset) can be accounted for based on
knowing X and using the regression equation found using the first
dataset.

m00es

Ray Koopman

Nov 15, 2007, 2:54:12 PM

On Nov 15, 1:53 am, jantunes <jasantu...@gmail.com> wrote:
> I've done some searching on MSE and it appears that MSE is good
> to compare different statistics models (lowest MSE = best model),
> but not in giving an absolute value like R-squared.

You've got it backwards. R-square is relative, not absolute.
It depends on the size of the errors, relative to the overall
variability. MSE is an absolute measure of the size of the errors,
although you should be looking at sqrt(MSE), the root-mean-square
error, because it's in the right units. Similarly, if you insist
on using a relative measure, you should look at sqrt(1 - R^2),
the rms error relative to the overall SD.
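
Both quantities are easy to compute directly. The sketch below uses made-up numbers and hypothetical function names (`rmse`, `relative_rms_error`); it takes the relative measure as RMS error divided by the SD of y, which equals sqrt(SSE/SST), i.e. sqrt(1 - R^2) when R^2 is defined as 1 - SSE/SST.

```python
import math

def rmse(ys, preds):
    """Root-mean-square error, in the same units as y."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def relative_rms_error(ys, preds):
    """RMS error divided by the SD of y, i.e. sqrt(SSE/SST)."""
    mean_y = sum(ys) / len(ys)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / len(ys))
    return rmse(ys, preds) / sd_y

# Made-up observations (e.g. CPU %) and the predictions of some fixed line.
ys = [10, 12, 15, 19, 24]
preds = [11, 12, 14, 20, 23]

print(rmse(ys, preds))                # absolute size of the errors, in CPU %
print(relative_rms_error(ys, preds))  # errors relative to the spread of y
```

Computed this way, the relative measure stays well defined on validation data: a value above 1 simply means the line predicts worse than the mean of the new y values would.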

Richard Ulrich

Nov 15, 2007, 3:29:53 PM

On Wed, 14 Nov 2007 13:07:01 EST, jantunes <jasan...@gmail.com>
wrote:

> Hi all,
>
> I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is very correlated (something like R=0.98).
>
> This is what I do:
> 1) get 200 data points (x is a time series; y is CPU usage)
> 2) do linear regression based on those 200 points, resulting in some y'=a + bx
> 3) get R-squared (R^2=0.96) for the y'

At this point -- Are you doing anything to get rid of
"spurious" correlations based on simple trends, etc.?
OLS regression may give useful results, in some sense,
but it won't give legitimate tests, and values such as
R-squared might be approximately useless, even when
they are as big as 0.96, if the whole thing represents
a simple trend.

>
> Then, I want to validate that trendline/prediction by comparing it with more real data:
> 4) get more data points, past the 200 points (eg 10000)

Wow! I thought only astronomers used 50 times the
baseline for projections. What *is* this problem, anyway?

> 5) get R-squared for the y' (this time against the new data)

Use the squared deviations from the predicted values,
predicting from the original 200 points. It is hard to find much
fault with the RMS error. Isn't this exercise supposed to be
validation of the original equation? - then you have to use
the original equation.

Which R-squared? You show, yourself, you can't figure out
what to use to make a "proper" R-squared, or else, how to make
decent sense of R-squared.

>
> The problem is that this new R-squared has very strange values (depending on the equation), either <0 (SSE/SST>1), >1 (SSR>SST), or near 0.99 (when in fact the trendline is not accurate).
> As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).
>
> Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Paige Miller

Nov 16, 2007, 8:03:40 AM

You could have a high R^2 between predicted Y and the actual values
and still have a terrible fit in the new set of data. Correlation in
this case does not imply a good fit.
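
A small illustration of this point, using made-up numbers and hypothetical helper names: the squared correlation is unchanged when every prediction is shifted and rescaled, so predictions at the wrong level and slope can still score highly.

```python
import math

def squared_correlation(ys, preds):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(ys)
    my, mp = sum(ys) / n, sum(preds) / n
    cov = sum((y - my) * (p - mp) for y, p in zip(ys, preds))
    var_y = sum((y - my) ** 2 for y in ys)
    var_p = sum((p - mp) ** 2 for p in preds)
    return cov ** 2 / (var_y * var_p)

def rmse(ys, preds):
    """Root-mean-square prediction error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

ys = [10.0, 20.0, 30.0, 40.0, 50.0]       # made-up observations
good = [11.0, 19.0, 31.0, 39.0, 51.0]     # close to the observations
bad = [100 + 5 * p for p in good]         # same pattern, wrong level and slope

r2_good, r2_bad = squared_correlation(ys, good), squared_correlation(ys, bad)
print(r2_good, rmse(ys, good))
print(r2_bad, rmse(ys, bad))
```

Both sets of predictions give the same squared correlation, while the RMS error of the second set is hundreds of times larger.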

jantunes

Nov 16, 2007, 9:03:30 AM

> > This is what I do:
> > 1) get 200 data points (x is a time series; y is
> > CPU usage)
> > 2) do linear regression based on those 200 points,
> > resulting in some y'=a + bx
> > 3) get R-squared (R^2=0.96) for the y'
>
> At this point -- Are you doing anything to get rid of
> "spurious" correlations based on simple trends, etc.?

No. I'm sorry but I don't even know what that is.

> OLS regression may give useful results, in some
> sense, but it won't give legitimate tests, and values
> such as R-squared might be approximately useless,
> even when they are as big as 0.96, if the whole thing
> represents a simple trend.

What do you mean by a simple trend?

> > Then, I want to validate that trendline/prediction
> > by comparing it with more real data:
> > 4) get more data points, past the 200 points
> > (eg 10000)
>
> Wow! I thought only astronomers used 50 times the
> baseline for projections. What *is* this problem,
> anyway?

I'm trying to predict the resource usage for a given computer task (x = number of times the task is repeated). So, I get a different y and y' (prediction) for each type of resource (CPU, memory, etc.).

> > 5) get R-squared for the y' (this time against the
> > new data)
>
> Use the squared deviations from the predicted values,
> predicting from the original 200 points. It is hard
> to find much fault with the RMS error.
> Isn't this exercise supposed to be validation of the
> original equation? - then you have to use the
> original equation.

y' is the original equation, the data (y) used in R-squared calculation is not (only the first 200 points are the same).

> Which R-squared? You show, yourself, you can't
> figure out what to use to make a "proper"
> R-squared, or else, how to make decent sense of
> R-squared.

Then what should I use? I need a statistically recognized measure of the prediction (y') against the data it is supposed to predict (not the data used for creating y').

Thanks!

jantunes

Nov 16, 2007, 9:13:23 AM

> > > I believe that when you take a validation data
> > > set, you can indeed have such wild predictions
> > > that SSR > SST, or SSE > SST. This indicates
> > > that there is a problem with the way the model
> > > fits to your validation data.
> > >
> > > I don't think R-squared is the proper way to
> > > compare the validation fit to the training fit.
> > > I would compare the MSE from the validation
> > > data set to the MSE of the training data set --
> > > if they are close, that's good, if they are
> > > widely different, that's bad.
> >
> > I've done some searching on MSE and it appears that
> > MSE is good to compare different statistics models
> > (lowest MSE = best model), but not in giving an
> > absolute value like R-squared.
>
> Modelling techniques such as Partial Least Squares
> typically use a measure of residual error to
> compare models from a training set to a validation
> set. I see no reason why a similar measure can't
> be used in an Ordinary Least Squares model as well.

Can you give me some pointer for that measure (how can I calculate it or the name)?

> > I'm not familiar with these but I've seen them in
> > the literature: Chi-squared, F-test, p-value. Isn't
> > there any (absolute) statistic measure I can produce
> > that validates y' (prediction) against the real and
> > full data?
>
> I suppose you could do an F-test of MSE(training
> set)/MSE(validation set). I'm not sure if this violates
> any of the standard assumptions ... I'll have to think
> about that.

I'm not sure if this test is useful for what I want to prove.

Thanks!

jantunes

Nov 16, 2007, 9:21:56 AM

Exactly. I tried that before. Both prediction and real data were very correlated (high R) but fitted terribly.

Thanks!

jantunes

Nov 16, 2007, 9:44:35 AM

> > I've done some searching on MSE and it appears that
> > MSE is good to compare different statistics models
> > (lowest MSE = best model), but not in giving an
> > absolute value like R-squared.
>
> You've got it backwards. R-square is relative, not
> absolute. It depends on the size of the errors,
> relative to the overall variability.
> MSE is an absolute measure of the size of the errors,
> although you should be looking at sqrt(MSE), the
> root-mean-square error, because it's in the right
> units.

I was referring to the fact that MSE is meaningful when compared to other MSEs (thus relative). But yes, you're right.

> Similarly, if you insist on using a relative
> measure, you should look at sqrt(1 - R^2),
> the rms error relative to the overall SD.

Yes, a relative measure is what I need because I only have one prediction. I need a value of the "fitness" of the prediction (without needing to compare it against other predictions).
But I cannot rely on R^2 since I get different (and strange) values depending on the equation (eg R^2 = SSR/SST; R^2 = 1 - (SSE/SST)). So, I don't think I can use sqrt(1 - R^2), though statistics is not anywhere near my field of expertise.
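
The disagreement between the two formulas can be traced to a single cross term. A sketch of the algebra, with notation assumed here rather than taken from the thread: e_i = y_i - \hat{y}_i are the residuals, and \bar{y} is the mean of whichever data set R^2 is being computed on.

```latex
\begin{align*}
\mathrm{SST} &= \sum_i (y_i - \bar{y})^2
  = \sum_i \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2 \\
 &= \underbrace{\sum_i e_i^2}_{\mathrm{SSE}}
  + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{SSR}}
  + 2\sum_i e_i\,(\hat{y}_i - \bar{y}), \\[4pt]
\Bigl(1 - \frac{\mathrm{SSE}}{\mathrm{SST}}\Bigr) - \frac{\mathrm{SSR}}{\mathrm{SST}}
 &= \frac{2\sum_i e_i\,(\hat{y}_i - \bar{y})}{\mathrm{SST}}.
\end{align*}
```

When the line is a least-squares fit with an intercept and R^2 is computed on the same points used for the fit, the normal equations give \sum_i e_i = 0 and \sum_i e_i x_i = 0, so the cross term vanishes and the two formulas agree, as in step 3). When the old line is applied to new data, nothing forces the cross term to zero, so the formulas can differ and can leave the range 0 to 1, as in step 5).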

Thanks!

Richard Ulrich

Nov 17, 2007, 8:55:11 PM

On Fri, 16 Nov 2007 09:03:30 EST, jantunes <jasan...@gmail.com>
wrote:

> > > This is what I do:
> > > 1) get 200 data points (x is a time series; y is
> > > CPU usage)
> > > 2) do linear regression based on those 200 points,
> > > resulting in some y'=a + bx
> > > 3) get R-squared (R^2=0.96) for the y'
> >
> > At this point -- Are you doing anything to get rid of
> > "spurious" correlations based on simple trends, etc.?
> No. I'm sorry but I don't even know what that is.
>
> > OLS regression may give useful results, in some
> > sense, but it won't give legitimate tests, and values
> > such as R-squared might be approximately useless,
> > even when they are as big as 0.96, if the whole thing
> > represents a simple trend.
> What do you mean by a simple trend?

Ooh. You really need to read some basics.
You could Google groups <group:sci.stat.* spurious > .

For a time series, the obvious spurious correlations involve
simple linear trends in separate variables. Or cycles.
If two variables separately have a similar trend, they will
have a positive correlation.

If Sequence-number correlates with your raw data, you
have a potential problem.

>
> > > Then, I want to validate that trendline/prediction
> > > by comparing it with more real data:
> > > 4) get more data points, past the 200 points
> > > (eg 10000)
> >
> > Wow! I thought only astronomers used 50 times the
> > baseline for projections. What *is* this problem,
> > anyway?
> I'm trying to predict the resource usage for a given computer task (x = number of times the task is repeated). So, I get a different y and y' (prediction) for a different type of resource (CPU, memory, etc).

This *sounds* like a matter of bench marking. For that,
the "time series" aspect should be incidental and irrelevant.
Each separate "experiment" should give the same results
regardless of when it is run. Is there *any* sort of proper
carry-over between experiments?

One really onerous way that these data could resemble a time
series is if you recorded the x and y as cumulative counters,
and never subtracted in order to find the data for the separate
experiments.

That would create a strong correlation that is "spurious"
and essentially useless.
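
A sketch of how cumulative counters can manufacture such a correlation. The setup is an assumption for illustration only: each request has a random cost that is unrelated to its position in the sequence, and `correlation` is a hypothetical helper computing the Pearson correlation.

```python
import math
import random
from itertools import accumulate

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000
seq = list(range(1, n + 1))                          # request number
per_request = [random.uniform(0, 10) for _ in seq]   # cost of each request
cumulative = list(accumulate(per_request))           # running total, never differenced

r_cumulative = correlation(seq, cumulative)
r_per_request = correlation(seq, per_request)
print("request number vs. cumulative counter:", r_cumulative)
print("request number vs. per-request cost:  ", r_per_request)
```

The running total correlates almost perfectly with the request number simply because both keep increasing, while the per-request costs show essentially no relationship to it. Differencing the counter recovers the per-request values.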

Why is there any carry-over between experiments?

[snip, rest]

jantunes

Nov 20, 2007, 5:29:06 AM

> For a time series, the obvious spurious correlations
> involve simple linear trends in separate variables.
> Or cycles.
> If two variables separately have a similar trend,
> they will have a positive correlation.
>
> If Sequence-number correlates with your raw data, you
> have a potential problem.

> > I'm trying to predict the resource usage for a
> > given computer task (x = number of times the task
> > is repeated). So, I get a different y and y'
> > (prediction) for a different type of resource (CPU,
> > memory, etc).
>
> This *sounds* like a matter of bench marking. For
> that, the "time series" aspect should be incidental
> and irrelevant.
> Each separate "experiment" should give the same
> results regardless of when it is run. Is there *any*
> sort of proper carry-over between experiments?
>
> One really onerous way that these data could resemble
> a time series is if you recorded the x and y as
> cumulative counters, and never subtracted in order to
> find the data for the separate experiments.
>
> That would create a strong correlation that is
> "spurious" and essentially useless.
>
> Why is there any carry-over between experiments?

There is no separate experiment. This is one experiment only, which consists of repeating the same task (e.g., a client/server request) several times.
But yes, the data is naturally cumulative because I'm measuring the current resource usage of the application (memory, disk, etc.).

Thanks

Richard Ulrich

Nov 20, 2007, 4:35:24 PM

On Tue, 20 Nov 2007 05:29:06 EST, jantunes <jasan...@gmail.com>
wrote:

[snip]
RU > >

> > This *sounds* like a matter of bench marking. For
> > that, the "time series" aspect should be incidental
> > and irrelevant.

Okay, for computer server benchmarks, the background
information might predict the speed of response. Response
will be faster when you are not competing with a couple of
users who are downloading movies, for example.

RU > >

> > Each separate "experiment" should give the same
> > results regardless of when it is run. Is there *any*
> > sort of proper carry-over between experiments?
> >
> > One really onerous way that these data could resemble
> > a time series is if you recorded the x and y as
> > cumulative counters, and never subtracted in order to
> > find the data for the separate experiments.
> >
> > That would create a strong correlation that is
> > "spurious" and essentially useless.
> >
> > Why is there any carry-over between experiments?
>

ja >

> There is no separate experiment. This is one experiment only,
> which consists of repeating the same task (e.g., a client/server
> request) several times.

Why is it not true that each measurement is a separate experiment,
with its own separate timing?

Why is your application different from every other bench marking
I've read of?

> But yes, the data is naturally cumulative because I'm measuring
> the current resource usage of the application (memory, disk, etc.).

SEE what I wrote last time, above.