Regards,
A
and vice versa.
In your SUBJECT, you specifically mentioned Pearson's correlation, but
in a simple OR multiple regression, you can easily have a high value
for the multiple R (which is the Pearson correlation between the
observed Y and the fitted Y) where you have a small MSE which is the
estimate of the error variance.
The numerical example in the 1975 SPSS Manual for Multiple Regression
is a beautiful example of that. For multiple regression it had a
multiple R close to 1; even for the simple regression of Y (Investors'
Index) on GNP, the multiple R (which is the same as the correlation
between X and Y in that case) was well over .95.
But the MSEs of those fitted models were so large that they made the
prediction intervals virtually worthless. That numerical example was
discussed at length in sci.stat.math in 2005.
This kind of phenomenon is actually quite COMMON when you have
two increasing (or decreasing) time series with one regressed on
the other. The result even has the name "spurious correlation",
in the sense that the relation is NOT real. It is an artifact of the
fact that they are BOTH increasing.
As an example, the annual number of babies born in some Scandinavian
country had a correlation of .98 with the number of stork sightings
reported. Reason: BOTH populations were strongly increasing.
If you find any other strongly increasing variable in that country
during the same years, it WILL have a very high correlation with
either of those variables.
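The trend effect can be reproduced with a minimal sketch. The two series below are invented for illustration (they are NOT the stork or birth figures, which are not given in this thread); each is a straight-line trend plus its own unrelated wiggle. The helper name pearson_r is likewise just a label chosen for this sketch.

```python
# Two made-up series that share nothing except an upward trend.
# (Hypothetical numbers, not the stork/birth data.)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

years = range(20)
series_a = [100 + 5 * t + 3 * (-1) ** t for t in years]      # trend + wiggle
series_b = [20 + 2 * t + ((2 * t) % 5 - 2) for t in years]   # trend + unrelated wiggle

# Correlation of the raw levels is dominated by the common trend.
r_levels = pearson_r(series_a, series_b)

# Year-to-year changes remove the common trend.
diff_a = [series_a[i + 1] - series_a[i] for i in range(len(series_a) - 1)]
diff_b = [series_b[i + 1] - series_b[i] for i in range(len(series_b) - 1)]
r_changes = pearson_r(diff_a, diff_b)

print("r of levels :", round(r_levels, 3))
print("r of changes:", round(r_changes, 3))
```

With these particular numbers the levels correlate at roughly .99 while the year-to-year changes are almost uncorrelated, which is the signature of a correlation produced by the shared trend alone.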
A less obvious example was the initial data on the consumption of
cigarettes and the rate of LUNG cancer being significantly positively
correlated, leading people to suspect lung cancer was CAUSED by
cigarette smoking. The strongest argument against that faulty
conclusion (you cannot draw causal inference from correlations)
was that during the same years, the incidence of STOMACH cancer
was strongly NEGATIVELY correlated with the same smoking figures.
(Not even the Quacks in those days dared suggest that cigarette
smoking is a CURE for Stomach cancer).
There are many, many numerical associations between two variables
that are SPURIOUS -- they are merely what happened to be observed.
-- Reef Fish Bob.
Dear Adil Raja,
You are most welcome. It was indeed my pleasure to help someone
who was genuinely puzzled by what he encountered in statistical
PRACTICE, who received an answer which is seldom discussed in any
statistical textbook OR in any discussion group, and who UNDERSTOOD
it in the first round.
-- Reef Fish Bob.
Regards and thanks indeed,
Adil Raja
All else equal, R^2 is inversely related to the MSE. That's because:
R^2 = 1 - MSE * dfE / SSTOT,
where dfE stands for the error degrees of freedom and SSTOT stands for
the total sum of squares (i.e., if you were to regress one variable on the
other). So, as the MSE increases, R^2 decreases (and therefore R
decreases) and vice-versa.
m00es
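A small numerical check of that identity, as a sketch only: the five data points are hypothetical, and fit_line is a helper written for this illustration, not a routine from any package mentioned in the thread. It also shows why "all else equal" matters: expressing the same Y in different units leaves R^2 untouched but changes the MSE a hundredfold, because SSTOT changes with it.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (r2, mse, dfe, sstot)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sstot = sum((yi - my) ** 2 for yi in y)
    dfe = n - 2                       # two estimated coefficients
    return 1 - sse / sstot, sse / dfe, dfe, sstot

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                   # hypothetical data

r2, mse, dfe, sstot = fit_line(x, y)
identity_rhs = 1 - mse * dfe / sstot  # right-hand side of the identity

# Same data, with Y recorded in units ten times smaller.
y_rescaled = [10 * v for v in y]
r2_rescaled, mse_rescaled, _, _ = fit_line(x, y_rescaled)

print("R^2:", r2, " identity:", identity_rhs, " MSE:", mse)
print("rescaled Y -> R^2:", r2_rescaled, " MSE:", mse_rescaled)
```

So within one dataset the two quantities move in opposite directions, while across datasets with different SSTOT (or different dfE) the MSE can be anything for a given R^2.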
As usual, m00es MISSED all of the points in Raja's QUESTION on
how BOTH R^2 and MSE can be large at the same time.
As usual, all m00es can find is a mathematical trivia that does not
explain ANYTHING about that phenomenon in practice, which is
the notion due to Spurious Correlations.
That is how sad this group is, to be constantly bombarded by
one m00es who is wrong in ALL of his posts in the past several
weeks, ALL because he has no experience whatsoever with
the APPLICATION of statistics.
In that regard, the OP is infinitely more experienced and
knowledgeable about APPLIED statistics than m00es ever will be.
-- Reef Fish Bob.
>
> m00es wrote:
> > Raja wrote:
> > > Hi all,
> > > I have a question? Is it possible to have a high value of MSE and
> > > and a higher R value (and low mse and low R value)? In my opinion,
> > > intuitively, when one increases the other should decrease. However, in
> > > certain tests I am observing the converse. Could someone suggest that
> > > if it is natural to happen? Any rationale would be appreciated. Could
> > > someone explain it from the mechanics of the two equations?
> >
> > All else equal, R^2 is inversely related to the MSE. That's because:
> >
> > R^2 = 1 - MSE * dfE / SSTOT,
> >
> > where dfE stands for the error degrees of freedom and SSTOT stands for
> > sums of square total (i.e., if you were to regress one variable on the
> > other). So, as the MSE increases, R^2 decreases (and therefore R
> > decreases) and vice-versa.
> >
> > m00es
>
> As usual, m00es MISSED all of the points in Raja's QUESTION on
> how BOTH R^2 and MSE can be large at the same time.
"As usual" ... Reef Fish Bob missed the statement,
"when one increases, the other should decrease."
As usual, Reef Fish is offended when anyone adds
to whatever Reef Fish has posted. This is a pattern.
It is possible that Bob lucked out, and the Original poster did
not mean that phrase literally. Or, not.
Yes, that has to happen, for one set of data: the MSE
is smaller when the R^2 is larger.
>
> As usual, all m00es can find is a mathematical trivia that does not
> explain ANYTHING about that phenomenon in practice, which is
> the notion due to Spurious Correlations.
>
> That is how sad this group is, to be constantly bombarded by
> one m00es who is wrong in ALL of his posts in the past several
> weeks, ALL because he has no experience whatsoever with
> the APPLICATION of statistics.
>
> In that regard, the OP is infinitely more experienced and
> knowledgeable about APPLIED statistics than m00es ever will be.
Reef Fish Bob proves himself obnoxious, again.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
No, you and m00es both missed Raja's question! If he is still around,
he should tell you both that he wasn't asking the question you thought
he asked.
>
> As usual, Reef Fish is offended when anyone adds
> to whatever Reef Fish has posted. This is a pattern.
>
> It is possible that Bob lucked out, and the Original poster did
> not mean that phrase literally. Or, not.
He first asked whether both can be HIGH; he was not asking about
their mathematical relation, of which he seems to be aware.
Then he said he had observed the converse -- that is unmistakable
that he wasn't talking about the mathematical relation -- which has
no converse.
The converse, as he explained, was BOTH HIGH or BOTH LOW,
with unspecified meaning of HIGH or LOW.
I knew exactly what he was asking. He acknowledged that it was,
and that my answer helped him.
>
>
> Yes, that has to happen, for one set of data: the MSE
> is smaller when the R^2 is larger.
>
That wasn't Raja's question.
>
> >
> > As usual, all m00es can find is a mathematical trivia that does not
> > explain ANYTHING about that phenomenon in practice, which is
> > the notion due to Spurious Correlations.
> >
> > That is how sad this group is, to be constantly bombarded by
> > one m00es who is wrong in ALL of his posts in the past several
> > weeks, ALL because he has no experience whatsoever with
> > the APPLICATION of statistics.
> >
> > In that regard, the OP is infinitely more experienced and
> > knowledgeable about APPLIED statistics than m00es ever will be.
>
> Reef Fish Bob proves himself obnoxious, again.
I am always obnoxious to the obnoxious Quacks -- you are one of
them without doubt, Richard Ulrich.
-- Reef Fish Bob.
The OP wrote: "Could someone explain it from the mechanics of the two
equations?". That's what I did. I showed how R^2 (i.e., R) and MSE are
related to each other.
m00es
The phenomenon he OBSERVED cannot possibly be explained from the
mechanics of the equation.
That's the only thing you know, which did NOT explain what he observed.
You have NO experience in APPLIED regression. You don't know how
to be a consultant, to understand what a client ASKED.
You are a misguided mathematical statistics PEDANT.
-- Reef Fish Bob.
More insults. Sigh. Why are you getting so flustered? I did not say
anything about your post or disagree with anything you wrote. I just
provided a bit more insight by pointing out the relationship between
R^2 (and R) and the MSE.
m00es
> m00es
If you're insulted, it's only because those are FACTS about your lack
of experience in Applied Statistics. If you hadn't been making NOISE
about your own errors, the insult wouldn't be necessary.
You keep coming back with your same errors INVITING me to tell you
the same.
-- Reef Fish Bob.
No, the fact that you have to revert to insults only shows that you
lack common decency and that you are dodging the questions, because you
are unable to admit that you are wrong.
m00es
Why didn't you READ what he said he observed? That both R^2 and MSE
are "high" contrary to his expectation (using the known identity m00es
knew).
That's why and how I explained the phenomenon of "spurious
correlations", together with an explicit reference to the SPSS Example,
to explain Raja's observed phenomenon.
Raja was satisfied. He posted his thanks TWICE, leaving no doubt
that he understood what I explained.
> > > >
> > > > You have NO experience in APPLIED regression. You don't know how
> > > > to be a consultant, to understand what a client ASKED.
> > > >
> > > > You are a misguided mathematical statistics PEDANT.
That was when m00es stepped in, not content with his ERRORS in
the simple regression vs correlation VALIDATION problems, to add
his pedantic trivia of quoting another formula that EVERYONE knows.
> >
> > If you're insulted, it's only because those are FACTS about your lack
> > of experience in Applied Statistics. If you hadn't been making NOISE
> > about your own errors, the insult wouldn't be necessary.
-- Reef Fish Bob.
Reef Fish wrote:
> That's why and how I explained the phenomenon of "spurious correlations"
Actually, your explanation is nonsense. When there is a strong spurious
correlation between two variables, then the observed correlation
between the two variables is high. So the squared correlation is high.
And when you regress one variable on the other, the MSE will then be
small, since R^2 is inversely related to the MSE.
m00es
That is correct.
> And when you regress one variable on the other, the MSE will then be
> small, since R^2 is inversely related to the MSE.
>
> m00es
Ignorant and BLIND PEDANT. Read the thread about the SPSS
multiple regression example from the 1975 Manual, discussed at
length before your UGLY presence was seen in sci.stat.math.
-- Reef Fish Bob.
I don't need to read some silly SPSS manual. There are things called
facts. All else being equal, R^2 is inversely related to the MSE.
That's one of those facts. That has nothing to do with spurious
correlations.
m00es
It's not the SPSS Manual you need to read. It's the INSTRUCTIVE
material I posted on how the SPSS Example was WRONG that
explained the phenomenon of "spurious correlation".
>
> m00es
You are not just IGNORANT. You REFUSE to pay heed to any
detailed discussion that had already taken place that would have
TAUGHT you something you never knew before.
All you know are the pedantic FORMULAS.
That's ALL your "education" is -- a completely USELESS repeat of
pedantic formulas, with absolutely NO THOUGHT and NO SENSE
that is required by anyone analyzing DATA.
-- Reef Fish Bob.
Oh, thanks! Whew, that was a close one.
Reef Fish wrote:
> It's the INSTRUCTIVE material I posted on how the SPSS Example was WRONG that
> explained the phenomenon of "spurious correlation".
The OP's question was not about spurious correlations. It was about R^2
and the MSE. There is an inverse relationship between the two. You
babbled on about spurious correlations, completely missing the boat.
Well, not even the boat -- more like the planet.
Reef Fish wrote:
> You are not just IGNORANT. You REFUSE to pay heed to any
> detailed discussion that had already taken place that would have
> TAUGHT you something you never knew before.
Because it's not relevant to understanding the relationship between R^2
and the MSE.
Reef Fish wrote:
> All you know are the pedantic FORMULAS.
Hey dummy, this is STATISTICS! It's based on formulas!
My education is in applied and theoretical statistics. And formulas can
show what the inverse relationship between R^2 and the MSE is. No
applied statistics is needed for that. You don't even need actual data
for that. It's a simple fact.
m00es
And Spurious Correlation, as seen in the SPSS Example, based on
the 1975 Manual DATA, and discussed and explained in 2005, long
before m00es raised his ugly head in polluting these groups.
> Hey dummy, this is STATISTICS! It's based on formulas!
That's exactly what makes you an IGNORANT PEDANT, who sees
the formulas and cannot THINK with them.
> Reef Fish wrote:
> > That's ALL your "education" is -- a completely USELESS repeat of
> > pedantic formulas, with absolutely NO THOUGHT and NO SENSE
> > that is required by anyone analyzing DATA.
-- Reef Fish Bob.
Nope, the OP asked about R^2 and MSE. Not about spurious correlations.
Reef Fish wrote:
> m00es wrote:
> > Hey dummy, this is STATISTICS! It's based on formulas!
>
> That's exactly what makes you an IGNORANT PEDANT, who sees
> the formulas and cannot THINK with them.
I know quite well when a formula can provide an insight into the exact
relationship between R^2 and the MSE. You, on the other hand, know
quite well how to babble on about completely silly things.
m00es
I am not a statistician. I just like statistics and like to learn it by
intuition. Given this, I cannot really argue about anything, as I
am not even familiar with the jargon and terminology.
Reef's explanation helped me a bit. I could not understand him FULLY but
that's my fault as I don't have the right background. However, he made
his point and I got what I needed to know. Now what m00es said in his
formula baffles me again. It seems that if R^2 is high, MSE should
always be low and vice versa. However, I have a number of datasets on
all of which I observed the converse. Some were multiple regression
problems and some were ordinary single variable regression problems.
This was quite strange to me and Reef managed to help. m00es's equation
is confusing me again (although I still have to fully grasp the
equation).
Regards,
Adil Raja
You CAN learn from my LATEST two posts, about the Preview to
Reef Fish Statistics for Dummies, and look at the numerical examples
that illustrated what ASSUMPTION VALIDATION is all about, which
is the one single topic m00es has erred in everything he had ever
posted - with no exception!
> But I am sure that u guys are good friends actually and having some
> sort of fun through this dialogue.
I am 100% sure you are WRONG, just as I am 100% sure about all the
wrong things m00es posted. :-)
>
> I am not a statistician. I just like statistics and like to learn it by
> intuition. Given this, I cannot really argue about anything, as I
> am not even familiar with the jargon and terminology.
That is understandable. You asked a very good question about
whether MSE and R can both be "large" in a thread you started,
which has since been thoroughly polluted by m00es.
In particular, m00es gave you the WRONG answer to your
intended question, citing formulas -- that's the only thing he knows.
He knows NOTHING about Applied Statistics and what one sees
and does in it.
> Reef's explanation helped me a bit. I could not understand him FULLY but
> that's my fault as I don't have the right background. However, he made
> his point and I got what I needed to know. Now what m00es said in his
> formula baffles me again. It seems that if R^2 is high, MSE should
> always be low and vice versa.
That's what I mean by he knows ONLY formulas.
> However, I have a number of datasets on
> all of which I observed the converse. Some were multiple regression
> problems and some were ordinary single variable regression problems.
You observed something contrary to that formula relation in DIFFERENT
regression problems. That was what I understood you to mean and
explained to you the reasons.
You can still review what I said in the SECOND post of this thread,
following your Opening Post:
http://groups.google.com/group/sci.stat.math/msg/193ac98990cad9a5
This example was given in my Preview Post to Validation:
This is the result:
 X    Y  FITTED        RESIDUAL
**  ***  ************  ********
 1    1  -11                 12
 2    4  3.5527E-15           4
 3    9  11                  -2
 4   16  22                  -6
 5   25  33                  -8
 6   36  44                  -8
 7   49  55                  -6
 8   64  66                  -2
 9   81  77                   4
10  100  88                  12
RF> It even has an R^2 that would have impressed Richard Ulrich. :-)
RF> R_SQ = .94976 R = .97456
RF> But the regression model is TOTALLY wrong,
I didn't mention the MSE of that example was SSE/9 = 58.667,
which was HIGH relative to the data. For X = 1, the fitted
error was 1,200% of the observed Y. For X = 10, the error was 12%.
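The table can be reproduced with a short sketch, assuming (as the values suggest) that Y = X^2 for X = 1..10 and that a straight line was fitted by ordinary least squares. Variable names are chosen for this illustration. Note that the 58.667 quoted above is SSE/9; with the usual error degrees of freedom for a two-coefficient line, n - 2 = 8, the MSE is SSE/8 = 66.

```python
# Fit a straight line to Y = X^2, X = 1..10, by ordinary least squares.
x = list(range(1, 11))
y = [v * v for v in x]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(r * r for r in residuals)
sstot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - sse / sstot

print("line: Y =", intercept, "+", slope, "* X")
print("SSE =", sse, " R^2 =", round(r2, 5), " R =", round(r2 ** 0.5, 5))
print("SSE/9 =", round(sse / 9, 3), " SSE/(n-2) =", sse / (n - 2))
for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(xi, yi, round(fi, 6), round(ri, 6))
```

The residuals run 12, 4, -2, -6, -8, -8, -6, -2, 4, 12: a clean U shape, which is what gives away the wrong model despite R^2 = .94976.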
> This was quite strange to me and reef managed to help. m00es's equation
> is confusing me again (although I still have to fully conceive the
> equation).
m00es understood only that one equation within the same problem.
You can also review my comment on his post that misled you:
http://groups.google.com/group/sci.stat.math/msg/46372c717005b9ae
In that post, I even gave you the compliment you deserved:
RF> In that regard, the OP is infinitely more experienced and
RF> knowledgeable about APPLIED statistics than m00es ever will be.
You noticed the phenomenon caused by "spurious correlations" and
you asked about it. I am sure I've pointed to the 1975 SPSS Manual
example in which the R's (in simple or multiple regression) were
ALL very high (in the high .9s) while the MSE's are also high
(in practical terms to render the fitted models USELESS).
That's the kind of experience m00es LACKS totally.
He has NO experience in applied statistics whatsoever.
He has some training in mathematical statistics, and that's all he
knows, some formulas that are NOT applicable in application
contexts.
YOUR problem about MSE and R is one of them. The simple
regression test and the validation of assumptions is the other.
m00es has scored a total of ZERO in those topics. He scored
minus infinity in the amount of noise and pollution he has
created.
> Regards,
> Adil Raja
-- Reef Fish Bob.
P.S. Raja's OP is in Post #1 of Google's thread:
"Relationship between pearson's correlation coeffcient and sigma"
My initial reply was Post #2. m00es's NOISE started at Post #6
of the same thread and lasted till Post #24. Raja's current post
is Post #25. My present reply should be Post #26.
MSE(first dataset) = 1
dfE(first dataset) = 50
SSTOT(first dataset) = 200
so: R^2 in the first dataset = .75
MSE(second dataset) = 1.2
dfE(second dataset) = 50
SSTOT(second dataset) = 300
so: R^2 in the second dataset = .80.
So, R^2 in the second dataset > R^2 in the first dataset, but at the
same time, the MSE in the second dataset is also greater than the MSE
in the first dataset.
However, all else equal (i.e., dfE in the first model is the same as
dfE in the second model and the amount of variability in the dependent
variable is the same in both datasets), then
R^2 in the first dataset > R^2 in the second dataset IMPLIES that the
MSE in the first dataset must be smaller than in the second dataset.
m00es
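The arithmetic of the two-dataset example can be checked directly. This is only a sketch of the identity R^2 = 1 - MSE * dfE / SSTOT applied to the figures quoted above; the function name r2_from_mse is made up for the illustration.

```python
def r2_from_mse(mse, dfe, sstot):
    """R^2 implied by the identity R^2 = 1 - MSE * dfE / SSTOT."""
    return 1 - mse * dfe / sstot

# Figures from the two hypothetical datasets in the post above.
r2_first = r2_from_mse(mse=1.0, dfe=50, sstot=200)
r2_second = r2_from_mse(mse=1.2, dfe=50, sstot=300)

print("first dataset : MSE = 1.0, R^2 =", round(r2_first, 4))
print("second dataset: MSE = 1.2, R^2 =", round(r2_second, 4))

# Holding dfE and SSTOT fixed ("all else equal"), a larger MSE
# always gives a smaller R^2.
r2_same_sstot = r2_from_mse(mse=1.2, dfe=50, sstot=200)
print("MSE = 1.2 with SSTOT = 200: R^2 =", round(r2_same_sstot, 4))
```

The second dataset has both the larger MSE and the larger R^2 only because its SSTOT is larger; with SSTOT held at 200 the same MSE of 1.2 gives R^2 = .70, below the first dataset's .75.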
m00es's current NOISE is Post #27.
m00es wrote:
> When talking about TWO DIFFERENT datasets (in which we may even fit
> different regression models), then R^2(first dataset) > R^2(second
> dataset) does not imply ANYTHING about the relation of the MSE(first
> dataset) and the MSE(second dataset). That's because R^2 = 1 - MSE *
> dfE / SSTOT for each dataset/model.
That part is at least indicative of what Raja OBSERVED, while
previously all m00es told Raja was (Post #3 and subsequent NOISE):
m00es> All else equal, R^2 is inversely related to the MSE. That's because:
m00es> R^2 = 1 - MSE * dfE / SSTOT,
m00es> So, as the MSE increases, R^2 decreases (and therefore R
m00es> decreases) and vice-versa.
completely ignoring what Raja ASKED: that he observed BOTH to
increase in many problems he encountered.
I explained that could happen in SPURIOUS correlations. Also,
that could happen in the analysis of the SAME data set, and pointed
to the 1975 SPSS Manual example. m00es NEVER reads anything
I referenced. He finally contrived two simple examples unlikely to be
the kind Raja ever OBSERVED to explain the obvious.
Here's the DATA in SPSS Manual Example dataset, on the fitting
of INVDEX to 3 variables in the SPSS Manual.
m00es, you HAVE used a regression program, haven't you? A simple
regression program, perhaps? Why don't you show us any of
your fitted models so that we can DISCUSS in REAL terms
with respect to the REAL analysis of data with REAL data?
-- Reef Fish Bob.
> > > INVDEX GNP C.PROF C.DIVD
> > > 76.4 7678 269 216 (1935)
> > > 99.5 8022 351 251 (1936)
> > > 105.9 8820 403 250
> > > 86.7 8871 362 290
> > > 83.7 9536 541 304
> > > 70.7 10911 619 317
> > > 61.7 12486 801 273
> > > 58.7 14816 917 243
> > > 76.3 15357 882 233
> > > 76.6 15927 858 211
> > > 91 15552 852 195
> > > 105.8 15251 966 230
> > > 96.8 15446 1008 286
> > > 102.8 15735 908 240
> > > 100 16343 851 278
> > > 120.3 17471 1065 361
> > > 153.8 18547 1034 300
> > > 158.2 20027 1081 296
> > > 146.5 20794 1089 287
> > > 165.6 20186 953 282
> > > 212.7 21920 1206 321
> > > 245.9 23811 1313 340
> > > 236 24117 1202 364
> > > 218.8 24397 1242 371
> > > 242.6 25242 1378 388
> > > 256.9 15849 1295 397
> > > 326.1 25615 1314 436
> > > 314.4 28287 1422 470
> > > 336 29740 1525 511
> > > 394 31650 1718 583
> > > 433.1 33814 1836 629 (1965)
> > > 408.5 35822 1762 655 (1966)
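For anyone who wants to try the simple regression of INVDEX on GNP, here is a minimal sketch. The (INVDEX, GNP) pairs are transcribed exactly as posted above, including the 15849 entry, which looks out of line with its neighbours and may be a typo in the post. No results are claimed here; run it and judge the R and the root MSE against the scale of INVDEX yourself.

```python
# (INVDEX, GNP) pairs for 1935-1966, transcribed as posted in this thread.
data = [
    (76.4, 7678), (99.5, 8022), (105.9, 8820), (86.7, 8871),
    (83.7, 9536), (70.7, 10911), (61.7, 12486), (58.7, 14816),
    (76.3, 15357), (76.6, 15927), (91, 15552), (105.8, 15251),
    (96.8, 15446), (102.8, 15735), (100, 16343), (120.3, 17471),
    (153.8, 18547), (158.2, 20027), (146.5, 20794), (165.6, 20186),
    (212.7, 21920), (245.9, 23811), (236, 24117), (218.8, 24397),
    (242.6, 25242), (256.9, 15849),  # GNP as posted; possibly a typo
    (326.1, 25615), (314.4, 28287), (336, 29740), (394, 31650),
    (433.1, 33814), (408.5, 35822),
]

y = [row[0] for row in data]   # INVDEX (response)
x = [row[1] for row in data]   # GNP (predictor)

n = len(data)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sstot = sum((yi - my) ** 2 for yi in y)
dfe = n - 2
mse = sse / dfe
r2 = 1 - sse / sstot

print("n =", n, " R =", round(r2 ** 0.5, 4), " R^2 =", round(r2, 4))
print("MSE =", round(mse, 2), " root MSE =", round(mse ** 0.5, 2))
print("INVDEX ranges from", min(y), "to", max(y))
```

Comparing the root MSE with the range of INVDEX is a quick way to see how wide any prediction interval from this fit would have to be.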
Several empirical reasons had been given for Raja's OBSERVED
phenomenon. This is Post #6 in the thread, the debut by m00es,
giving an answer to Raja that is not even mathematically correct! :-)
Reason? It is my strong belief (given m00es's various assertions
in hundreds of posts) that m00es had NEVER done any regression
problem on real data.
m00es made more noise in Post #27. I had proposed in Post #28
for him to do some analysis of the 1975 SPSS Manual data, given
in that post, so that we may discuss a few things of APPLIED
statistical substance in this thread that Raja started and re-appeared
to ask more questions. This was Raja's latest question:
Raja> It seems that if R^2 is high, mse should always be low
Raja> and vice versa. However, I have a number of datasets on
Raja> all of which I observed the converse. Some were multiple
Raja> regression problems
It had just occurred to me, on re-reading Raja's question, that I have
overlooked ONE explanation that simultaneously showed that m00es
was WRONG in his assertion in his very FIRST post in the thread,
post #6:
>
> All else equal, R^2 is inversely related to the MSE. That's because:
>
> R^2 = 1 - MSE * dfE / SSTOT,
>
> where dfE stands for the error degrees of freedom and SSTOT stands for
> sums of square total (i.e., if you were to regress one variable on the
> other). So, as the MSE increases, R^2 decreases (and therefore R
> decreases) and vice-versa.
>
> m00es
That's an ERROR even in the interpretation of his mathematical
identity!!
m00es> So, as the MSE increases, R^2 decreases
What Raja probably had observed, at times, was the variable-selection
problem in Multiple Regression, where it is in fact QUITE COMMON for
BOTH the R-square and MSE to increase!
It is so common that I had even written about it in my "Elbow Rule" in
"model building", that when BOTH increase, it means the variable
selection procedure had gone too far, to where it should NEVER be.
That's a point of gross "overfitting". So, this may be the example
that is more meaningful to Raja than the rest I suggested.
I mentioned the Hald Data in Draper and Smith and other Applied
Regression textbooks,
http://www.ndsu.edu/qsar_soc/resource/datasets/hald.htm
in which when the number of independent variables increased from
3 to 4, the best 3 variable model had
R = .9911 , with MSE = .2562
and the full model (with all 4 X's) had
R = .9911 and MSE = .3058.
Since the R and R^2 necessarily increase from the best three
variables to the model with all four, what we see there is a
case where BOTH R^2 and MSE increase, in the same
regression problem, though the increase in MSE is much more
prominent because of the small sample size (13) of the problem.
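The mechanism is easy to demonstrate: adding a variable can never raise SSE, so R^2 cannot fall, but it costs one error degree of freedom, so MSE = SSE/dfE rises whenever the drop in SSE is too small to pay for it. The sketch below uses invented data (NOT the Hald data) built so that the second predictor is nearly irrelevant; it assumes NumPy is available, and the function name fit is just a label for this illustration.

```python
import numpy as np

def fit(columns, y):
    """OLS with intercept; returns (r2, mse), dfE = n - number of coefficients."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ coef) ** 2))
    sstot = float(np.sum((y - y.mean()) ** 2))
    dfe = len(y) - X.shape[1]
    return 1 - sse / sstot, sse / dfe

# Hypothetical data: y depends strongly on x1, barely on x2.
x1 = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)
x2 = np.array([7, 1, -3, -5, -5, -3, 1, 7], dtype=float)
unexplained = np.array([-7, 5, 7, 3, -3, -7, -5, 7], dtype=float)
y = 10 + 2 * x1 + 0.05 * x2 + unexplained

r2_small, mse_small = fit([x1], y)       # model with x1 only
r2_big, mse_big = fit([x1, x2], y)       # model with x1 and x2

print("x1 only : R^2 =", round(r2_small, 5), " MSE =", round(mse_small, 3))
print("x1 + x2 : R^2 =", round(r2_big, 5), " MSE =", round(mse_big, 3))
```

Here R^2 creeps up from about .7176 to about .7181 while the MSE jumps from 44.07 to 52.8, so both increase within the same regression problem, and the larger model is the worse one by the MSE criterion.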
There are literally HUNDREDS of examples like that where both
R-square and MSE increase in the thousands of REAL data
sets that my students (who chose the data themselves)
were required to analyze in the Data Analysis course projects.
Most of those problems with many independent variables in
multiple regression invariably run across the problem of
over-fitting (whether the fit was good or bad), when using
variable-selection procedures to help sort the MINIMUM
number (and combination) of variables that should be used
in a given problem of that kind.
Now that I think of these examples, perhaps they were
some of the same phenomenon observed by Raja. Perhaps
Raja will come back to confirm or disconfirm.
There are simply many different REASONS why such
phenomena CAN occur (and often do) in practice.
-- Reef Fish Bob.