The problem is that when I graph X4 vs Y, I got R-square as 0.0003. I tried x^2, Log X, Sqrt (x) but nothing could increase r-square to more than 0.06 .... any comments?
Even when I remove X4 from the model, I still get the standard deviation of the residual > std dependent ..... which indicates a non-linearity problem, but all the relationships of X1, X2, X3 with Y seem to be linear.
any comments. .... or recommendations?
> I am working on a regression model with four independent variables and one dependent variable.
> To check linearity I graphed each independent variable vs. the dependent variable using scatter plots in SPSS and found the R-square. All of the relationships have an R-square between 0.1 and 0.34.
One big purpose of the recommendation to "look at scatter
plots" is that you should *look* at the plots. What do you *see*?
Is there a curved relationship? If the variables are near-normal,
then the small-relationship graph looks like a slightly elliptical
swarm of points.
>
> The problem is that when I graph X4 vs Y, I got R-square as 0.0003. I tried x^2, Log X, Sqrt (x) but nothing could increase r-square to more than 0.06 .... any comments?
"There is no linear relationship" is not the same as
"The relationship is non-linear." Consider the possibility,
"There is no relationship."
>
> Even when I remove X4 from the model, I still get the standard deviation of the residual > std dependent ..... which indicates a non-linearity problem, but all the relationships of X1, X2, X3 with Y seem to be linear.
> any comments. .... or recommendations?
The residual increases when the variable added to the regression
yields a t-value less than 1.0; that is shown by simple algebra.
Again, "There is no relation" might be your answer.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Not at all. By "the assumption behind regression", one usually means
the PROBABILITY assumption about the ERRORS in a regression
model, regardless whether the functional relationship between Y and
the Xs is linear or nonlinear.
In that regard, Richard Ulrich gave you some erroneous advice about
what to look at in the scatter plots. There is NOTHING you can look
at in the scatter plots of your Y vs any of the Xs that is relevant to
the "normality" issue of the errors. You can't look at any plot about
the distribution assumption about the errors until AFTER you have
attempted some fit, and have the residuals to look at.
Plotting Y vs each of the X's is one of the items I called "Pitfalls
in Multiple Regression Analysis", for the reasons I'll explain below.
> Am I supposed just to ignore this variable (x4), as if it does not exist, and regress Y against the other three variables only?
Absolutely not!
> When I did that, the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.468. So that shows that there exists some non-linearity even with three variables only. But when I looked at the scatter plots for these three variables, their R-square ranged between 0.1 and 0.3 (a modest linear relationship) ... what am I supposed to do?
What you're supposed to do is to first learn the theory and practice
of doing a multiple-linear regression from a creditable source, and
not from some computer manual or the malpractice statistical
Quacks in newsgroups.
In terms of VISUAL examination, if you want to see the functional
relation between Y and a single indep. var. X, you can do a scatter
plot.
If you want to see the functional relation between Y and (X1, X2), you
need to do a 3-D plot of Y vs X1 and X2, to look at the SURFACE of Y.
When you try to visualize the functional relationship between Y and X1,
X2, X3, and X4, it's beyond the present graphical technology as well
as your own visual experience to look at a 5-dimensional plot!
So, you have to resort to some analytic means to help you uncover
the high-dimensional functional relation.
Plotting Y vs each of the Xs is pretty much futile (calculating Rs is
even worse). Y may be perfectly linearly related to (X1, X2, X3, X4)
in the same sense as multicollinearity, or perfectly nonlinearly
related to (X1, X2, X3, X4) in the sense of nonlinear surfaces in two
to four dimensions in the Xs, and yet the scatterplot of Y vs any of
the X's may resemble a nonexistent bivariate functional relationship
with a correlation near zero, for each and every one of the individual
Xs.
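A minimal sketch of the point above, using made-up data that are not from this thread: Y is built as an exact linear function of X1 and X2, yet each single-variable scatter would show essentially no relationship. The variable names, sample size, and noise level are illustrative assumptions.

```python
import math
import random

def pearson_r(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

random.seed(0)  # fixed seed so the illustration is repeatable
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is nearly the mirror image of x1 (strong negative collinearity)
x2 = [-a + random.gauss(0, 0.05) for a in x1]
# y is an EXACT linear function of (x1, x2), so the multiple R-square is 1
y = [a + b for a, b in zip(x1, x2)]

r2_y_x1 = pearson_r(y, x1) ** 2
r2_y_x2 = pearson_r(y, x2) ** 2
print(f"R-square of Y vs X1 alone: {r2_y_x1:.4f}")
print(f"R-square of Y vs X2 alone: {r2_y_x2:.4f}")
```

Both bivariate R-squares come out close to zero even though regressing Y on X1 and X2 together would fit perfectly, because Y = X1 + X2 by construction.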
This is where the ART of model building in multiple-linear regression
takes over from the SCIENCE of the methodology, because there is
nothing in the science that is adequate in telling one what to do other
than some "guided iterations" of trial and error, as discussed in
George Box's JASA article on "Science and Statistics", and the course
material in "Data Analysis" taught by many statisticians, including
myself.
In your case, I strongly advise taking an elementary course in applied
data analysis in model building, and not trying to do-it-yourself after
reading a few too-simplistic articles or posts in newsgroups.
-- Bob.
Taking a few minutes off At Sea Day on a Caribbean cruise.
> sehwail wrote:
> > Does saying "there is no linear relationship" violate the assumption behind regression?
>
> Not at all. By "the assumption behind regression", one usually means
> the PROBABILITY assumption about the ERRORS in a regression
> model, regardless whether the functional relationship between Y and
> the Xs is linear or nonlinear.
>
> In that regard, Richard Ulrich gave you some erroneous advice about
> what to look at in the scatter plots. There is NOTHING you can look
> at in the scatter plots of your Y vs any of the Xs that is relevant to
> the "normality" issue of the errors. You can't look at any plot about
> the distribution assumption about the errors until AFTER you have
> attempted some fit, and have the residuals to look at.
Bless you, too, Bob. However, I said nothing
about checking the normality assumption, so I guess
you could have read my Reply (and the other posts)
more closely.
IF he is expecting something non-linear, then having
unusual shapes in the bivariate scatters is something to
look for, and having skewness or outliers is another thing.
I wish that he had reported what he saw.
>
> Plotting Y vs each of the X's is one of the items I called "Pitfalls
> in Multiple Regression Analysis", for the reasons I'll explain below.
>
>
> > Am I supposed just to ignore this variable (x4), as if it does not exist, and regress Y against the other three variables only?
>
> Absolutely not!
>
>
> > When I did that, the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.468. So that shows that there exists some non-linearity even with three variables only. But when I looked at the scatter plots for these three variables, their R-square ranged between 0.1 and 0.3 (a modest linear relationship) ... what am I supposed to do?
>
> What you're supposed to do is to first learn the theory and practice
> of doing a multiple-linear regression from a creditable source, and
> not from some computer manual or the malpractice statistical
> Quacks in newsgroups.
Say, Bob, you didn't notice that telling oddity, the
size of residuals, did you?
I suppose the risk of posting to Newsgroups is that he will
get no useful advice, or else that he will get slammed by
some dyspeptic grouch who occasionally likes to preen
and pose.
[snip, instructive lecture on plotting.]
> -- Bob.
>
> Taking a few minutes off At Sea Day on a Caribbean cruise.
> Does saying "there is no linear relationship" violate the assumption behind regression?
Does saying "there is no difference between two means" violate
an assumption of the t-test? No, answering that was a purpose
of doing the test. Similarly, the test in regression tells you
whether there is any linear relationship.
> Am I supposed just to ignore this variable (x4), as if it does not exist, and regress Y against the other three variables only?
> When I did that, the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.468.
Here is a minor puzzle to me. You report that the SD
of the residuals is 0.54, for the 3 variables, right? And
the SD of the dependent variable is 0.468. Right off,
this says that you have a lousy regression; the linear
regression explains nothing.
But beyond that, I have to ask, WHERE does this estimate of
the SD of the residuals come from? WHY is it larger than
the raw SD? Was this presented by the regression program?
- Most beginners will not know that the square root of the
Mean Square-error is the proper estimate of the standard deviation,
but, instead, will save the residuals and compute their SD,
which will always be smaller because
it uses (N-1) for the Degrees of freedom, instead of (N-4) (here).
- This estimate will differ much only when the N is small.
Is that the case here, where N is less than 10 or so?
- The other possibility for larger residuals is when the
regression is not the usual OLS regression, but has been
forced through the origin. That is *not* a good practice,
except for very special purposes. If that was your case,
start all over.
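To make the two estimates in the first bullet concrete, here is a small sketch. The residual values are invented for illustration, and the function name `residual_sds` is hypothetical; it assumes the residuals come from an OLS fit with an intercept, so they average to zero.

```python
import math

def residual_sds(residuals, n_predictors):
    """Return (SD of saved residuals, root mean square error).

    The first divides the residual sum of squares by N-1, the second
    by the residual degrees of freedom N - n_predictors - 1.
    """
    n = len(residuals)
    ss = sum(e * e for e in residuals)
    sd_saved = math.sqrt(ss / (n - 1))
    root_mse = math.sqrt(ss / (n - n_predictors - 1))
    return sd_saved, root_mse

def inflation(n_cases, n_predictors):
    """Ratio root_mse / sd_saved, which depends only on the d.f."""
    return math.sqrt((n_cases - 1) / (n_cases - n_predictors - 1))

# Invented residuals from a hypothetical 3-predictor fit on 8 cases
resid = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.6, 0.2]
sd_saved, root_mse = residual_sds(resid, n_predictors=3)
print(round(sd_saved, 3), round(root_mse, 3))

# The gap is large for small N and negligible for N = 145
print(round(inflation(10, 3), 3), round(inflation(145, 3), 3))
```

With 3 predictors the two estimates differ by about 22% at N = 10 but only about 1% at N = 145, which is why the choice of divisor alone cannot explain a residual SD well above the raw SD in a large sample.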
> So that shows that there exists some non-linearity even with three variables only.
Get a grip. Once again. Having "no linear relation" is not the same
as having a "non-linear relation." You got nothing.
> But when I looked at the scatter plots for these three variables, their R-square ranged between 0.1 and 0.3 (a modest linear relationship) ... what am I supposed to do?
If you have only 10 cases, and 4 predictors, the total
R-squared predicted by chance is 0.44; anything less will
give residuals with more error than the original, when estimated
with the proper d.f. What you mention for the other three
variables looks like chance.
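The 0.44 figure matches the standard result that, when there is truly no relationship, the expected R-squared from p predictors and N cases is p/(N-1). A short sketch follows; the function names are hypothetical, and the adjusted R-squared formula is the usual textbook one rather than anything quoted in this thread.

```python
def chance_r2(n_cases, n_predictors):
    """Expected R-squared under no true relationship: p / (N - 1)."""
    return n_predictors / (n_cases - 1)

def adjusted_r2(r2, n_cases, n_predictors):
    """Usual adjusted R-squared, which uses the proper d.f."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)

r2_small = chance_r2(10, 4)    # 4/9
r2_large = chance_r2(145, 4)   # 4/144
print(round(r2_small, 3), round(r2_large, 3))

# At exactly the chance level the adjusted R-squared is zero, i.e. the
# residual variance (with proper d.f.) equals the raw variance of Y.
print(round(adjusted_r2(r2_small, 10, 4), 6))
```

Any R-squared below the chance level gives a negative adjusted R-squared, which is the same thing as a root-mean-square error larger than the SD of the dependent variable.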
I have 145 responses and I have 4 IV and 1 dependent variable. I am trying to show that there is a positive relationship between X1-X4 with Y, and I don’t care about the value of the relationship, all I am interested to show is there is a high (positive relationship).
I am using SPSS in my analysis, so when I used the linear regression option I got that the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.44, but the standardized SD of the predicted value was 1 and the standardized SD of the error was 0.99 (not sure if I should use the standardized values)
A. Normality
I started with checking the normality of the IVs and DV and got the following numbers:
Variable  Skewness  Std. Error  Kurtosis  Std. Error
X1         -0.823     0.201       0.609     0.400
X2          0.041     0.201      -0.465     0.400
X3         -0.539     0.201      -0.259     0.400
X4         -0.956     0.201       0.994     0.400
Y          -0.542     0.201       0.173     0.400
For this data I concluded that the data is normal
B. I ran the regression model (step wise) and got an R-square of 0.4 using X1,X2,X4
From the regression output I checked the regression assumptions:
1. Multicollinearity
Since all tolerances > 0.1, VIF < 4, eigenvalues > 0 and condition index < 15, and also all correlations between IVs in the correlation matrix were less than 0.5 ….. OK
2. Outliers
I have used the leverage statistics and Cook's distance to check for outliers (all my values were below 0.5 so I am fine)
Looking at the leverage statistic (less than 0.5) and Cook's distance (<1) - OK
3. Independence of the error term
Durbin-Watson coefficient 1.923 (between 1.5 and 2.5 => independent obs)
The Durbin-Watson statistic is used to test for the presence of serial correlation among residuals. As a general rule of thumb, the residuals are not correlated if the statistic is around 2, and an acceptable range is 1.5-2.5
4. Normality of the error term
Look at the regression standardized residual histogram …. OK
5. Homoscedasticity (predicted vs. residual)
OK
6. Linearity
I used the partial regression plot generated by the SPSS regression and I also plotted each IV versus Y and got the following r-square:
X1 vs. Y R-square: 0.22
X2 vs. Y R-square: 0.23
X3 vs. Y R-square: 0.014
X4 vs. Y R-square: 0.4
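For reference, the Durbin-Watson statistic in item 3 above is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch with invented residual sequences (not the poster's data) shows why values near 2 suggest no serial correlation; `durbin_watson` here is a hand-written helper, not an SPSS function.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Residuals that flip sign every step: strong negative serial correlation
dw_alternating = durbin_watson([1, -1, 1, -1])
# Residuals that stay on one side for a while: positive serial correlation
dw_runs = durbin_watson([1, 1, -1, -1])
print(dw_alternating, dw_runs)
```

The statistic ranges from 0 to 4. Sign-flipping residuals push it above 2 and long runs of same-signed residuals push it below 2. It only makes sense when the cases have a meaningful order, such as time.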
When I re-did the regression using the Enter option and not the step wise option I noticed that the t-test significance for X1, X2, X4 was less than 0.04 but for X3 it was 0.962
I tried all sorts of transformations for variable X3 but it was never significant and the R-square of the relationship between X and Y was always below 0.1
Am I doing this right?
What am I supposed to do with X3?!!
Note:
Rich …. Thanks for all your help ….. but I have read several books on linear regression and an infinite number of articles and presentations but I am still not sure if what I am doing is correct or not … so thanks for all your help.
On Mon, 07 Nov 2005 16:44:12 EST, sehwail <seh...@hotmail.com> wrote:
> Does saying "there is no linear relationship" violate the assumption behind regression?
Does saying "there is no difference between two means" violate
an assumption of the t-test? No, answering that was a purpose
of doing the test. Similarly, the test in regression tells you
whether there is any linear relationship.
> Am I supposed just to ignore this variable (x4), as if it does not exist, and regress Y against the other three variables only?
> When I did that, the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.468.
Ulrich> If the variables are near-normal,
Ulrich> then the small-relationship graph looks like a slightly
Ulrich> elliptical swarm of points.
Just exactly what RELEVANCE does that have to the multiple linear
regression problem posed? NONE of the variables, Y, X1, ..., X4 need to
be normally or bivariate normally distributed, as AGGREGATED variables
used in the regression.
What you said was nothing more than your usual Quackery.
> IF he is expecting something non-linear, then having
> unusual shapes in the bivariate scatters is something to
> look for, and having skewness or outliers is another thing.
More statements reflecting your usual MALPRACTICE than advice that
is ever useful, in ANY multiple regression problem!
He was doing scatter plots of Y vs the Xs. NOTHING had been fitted.
He wasn't plotting any residuals.
You missed my point (which was obviously too advanced for you, even
though I spelled it out explicitly for the OP) that you CANNOT see any
multivariate surface fitted by Y to X1, ...X4, by looking at only the
bivariate scatters, even if the 4-dimensional fit is perfect.
> I wish that he had reported what he saw.
It wouldn't have mattered! No matter what he saw, it would NOT have
been indicative of whatever 4-dimensional surface he was seeking to
fit.
> > Plotting Y vs each of the X's is one of the items I called "Pitfalls
> > in Multiple Regression Analysis", for the reasons I'll explain below.
And Richard Ulrich has added other well-known pitfalls of what NOT to
do, in his present post.
> >
> >
> > > Am I supposed just to ignore this variable (x4), as if it does not exist, and regress Y against the other three variables only?
> >
> > Absolutely not!
> >
> >
> > > When I did that, the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.468. So that shows that there exists some non-linearity even with three variables only. But when I looked at the scatter plots for these three variables, their R-square ranged between 0.1 and 0.3 (a modest linear relationship) ... what am I supposed to do?
> >
> > What you're supposed to do is to first learn the theory and practice
> > of doing a multiple-linear regression from a creditable source, and
> > not from some computer manual or the malpractice statistical
> > Quacks in newsgroups.
>
> Say, Bob, you didn't notice that telling oddity, the
> size of residuals, did you?
The size of what residual? I did not look at any of his data or any of
his plots or anything else because given his FAULTY methodology
there was nothing that could be learned from doing his useless plots
and calculations.
> I suppose the risk of posting to Newsgroups is that he will
> get no useful advice, or else that he will get slammed by
> some dyspeptic grouch who occasionally likes to preen
> and pose.
Just because YOU (the proven malpractice Quack of this newsgroup)
did not see my post as sound advice to him (and anyone else reading
this thread), it's your problem.
> [snip, instructive lecture on plotting.]
Much more than that. Instructive advice on HOW to see a high
dimensional fit of Y vs several Xs, using model-building methods in
multiple linear regression.
>
> > -- Bob.
> >
> > Taking a few minutes off At Sea Day on a Caribbean cruise.
-- Bob.
Now taking a few minutes off sightseeing in Ocho Rios, Jamaica.
See my comments to Rich Ulrich's follow-up on my post. Trust me. I
have read enough of Richard's posts on regression and linear model to
know without a shadow of a doubt that he is unqualified to advise
anyone on the subject, and he is no better off than you are, given what
you have further clarified in this post.
Richard has led many others astray in this newsgroup in what were clear
cases of the blind following the Blind.
>
> I have 145 responses and I have 4 IV and 1 dependent variable. I am trying to show that there is a positive relationship between X1-X4 with Y, and I don't care about the value of the relationship, all I am interested to show is there is a high (positive relationship).
>
> I am using SPSS in my analysis,
You should have first learned HOW an analysis should be done before
using any computer program to do any "analysis" that is misguided,
inappropriate, and wrong!
Garbage In, Garbage Out is not what anyone should practice in their use
of SPSS or any other statistical package.
> A. Normality
> I started with checking the normality of the IVs and DV
WHY? There is absolutely NOTHING in any multiple-linear-regression
model that requires ANY of the Ind. vars. or the Dep. Var. (which is an
aggregate of n observations from n DIFFERENT distributions, even if you
assume each of them is normal) to be Normal.
Therefore, your work below is precisely Garbage In, Garbage Out, because
all the numbers are 100% irrelevant to any regression model you are
trying to fit. It's a complete waste of computer and human time.
> and got the following numbers:
> Variable  Skewness  Std. Error  Kurtosis  Std. Error
> X1         -0.823     0.201       0.609     0.400
> X2          0.041     0.201      -0.465     0.400
> X3         -0.539     0.201      -0.259     0.400
> X4         -0.956     0.201       0.994     0.400
> Y          -0.542     0.201       0.173     0.400
> For this data I concluded that the data is normal
If you had learned your regression analysis methodology from a
creditable source, you would NOT have done or needed to look at any of
the garbage above. They are completely, totally, worthless for your
problem!
> B. I ran the regression model (step wise) and got an R-square of 0.4 using X1,X2,X4
A major methodological error already. You don't have anything even
CLOSE to what's needed to explore the functional relationship you are
seeking between Y and your Xs.
To explore even the quadratic surface fit of Y on X1, ..., X4, you need
to START with all of these terms as your independent variables:
X1, X2, ..., X4, and all squared terms of Xi, and all cross-product
terms such as X1X2, X1X3, ..., X3X4, because ALL of them may be needed
for the quadratic surface your data may fit. More likely than not,
many of those terms are NOT needed in your fit, but you have to start
with them before you know which you need and which you don't, instead
of just blindly keeping the three linear terms in X1, X2, and X4.
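As a sketch of what that starting set looks like, the following enumerates the candidate terms for a full quadratic surface in four predictors and expands one data row into those terms. The helper names and the sample row are hypothetical, and this only builds the candidate list. It does not choose among the terms.

```python
from itertools import combinations

def quadratic_terms(names):
    """Names of the linear, squared, and cross-product terms."""
    linear = list(names)
    squares = [f"{v}^2" for v in names]
    cross = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return linear + squares + cross

def expand_row(values):
    """Expand one row of predictor values into the same term order."""
    linear = list(values)
    squares = [v * v for v in values]
    cross = [a * b for a, b in combinations(values, 2)]
    return linear + squares + cross

terms = quadratic_terms(["X1", "X2", "X3", "X4"])
print(len(terms))   # 4 linear + 4 squared + 6 cross-products
print(terms)

row = expand_row([1.0, 2.0, 3.0, 4.0])  # made-up predictor values
print(row)
```

With four predictors this gives 14 candidate terms, so with N = 145 there are still ample degrees of freedom to fit the full quadratic model before deciding which terms to drop.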
The rest of your checks
> From the regression output I checked the regression assumptions:
> 1. Multicollinearity
> 2. Outliers
> 3. Independence of the error term
> 4. Normality of the error term
> 5. Homoscedasticity (predicted vs. residual)
> 6. Linearity
are completely useless because you have made such gross errors
of omission in your START of the problem. You were merely paying
lip service to some items in the computer manual of some textbook
whose substance you did not comprehend.
> I tried all sorts of transformations for variable X3 but it was never significant and the R-square of the relationship between X and Y was always below 0.1
> Am I doing this right?
> What am I supposed to do with X3?!!
My advice to you in the first round stands:
RF> In your case, I strongly advise taking an elementary course in
RF> applied data analysis in model building, and not trying to
RF> do-it-yourself after reading a few too-simplistic articles or
RF> posts in newsgroups.
> Note:
> Rich .... Thanks for all your help ..... but I have read several books on linear regression and infinite number of
> articles and presentation but I am still not sure if what I am doing correct or not ... so thanks for all your help.
That shows your do-it-yourself learning failed! You needed GUIDANCE,
from those in the know, on those books and articles you said you've
read.
You OBVIOUSLY missed EVERYTHING you needed to have learned from
those books and articles, to have not even gotten to First Base in
tackling a very straightforward problem in seeking a functional
relation between Y and four other fitting/predicting variables.
>
> I have used the leverage statistics and Cook's distance to check for outliers (all my values were below 0.5 so I am fine)
Learn to walk before you try to run. None of Cook's material is
relevant or useful to anything you've done to your problem.
As for advice from anyone in this newsgroup, I can say definitively and
unequivocally that Richard Ulrich is about the WORST person you could
have listened to for advice.
-- Bob.
On 10 Nov 2005 17:35:15 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
>
> sehwail wrote:
> > Rich,
> > Thanks for your comments, I will try to explain my situation better.
>
> See my comments to Rich Ulrich's follow-up on my post. Trust me. I
> have read enough of Richard's posts on regression and linear model to
> know without a shadow of a doubt that he is unqualified to advise
> anyone on the subject, and he is no better off than you are, given
> what you have further clarified in this post.
>
> Richard has led many others astray in this newsgroup in what were
> clear cases of the blind following the Blind.
[snip, some advice that the OP might or might not find useful.]
> As for advice from anyone in this newsgroup, I can say definitively and
> unequivocally that Richard Ulrich is about the WORST person you could
> have listened to for advice.
Bob was last in this 'rant' mode back in late July or so.
At that time, within just a couple of weeks, he totally blew
a *regression* question, no less -- one thing he does know
something about, though he doesn't grasp the social sciences
applications. That was about the specific use of the two
definitions of R-squared for an intercept fixed at zero; Bob
seemed to have never heard of it.
That was in a debate with someone other than me.
Also at that time, he screwed up a problem on analyzing data
where subsets consisted of binomial samples, by ignoring
the possible heterogeneity among samples. Several of us
recommended something other than his (limited) approach
with its limited robustness.
At that time, he wrote several Replies to posts of mine, where
his reaction to statistical power analysis was -- "gibberish",
I think, was what he repeated, over and over. Within posts,
and between posts. It was the closest thing I've seen, in these
stats groups, to a nervous breakdown.
Bob had spent several months insulting a dozen other
people, but especially *me*, since I am a most frequent
poster, and also because I challenged him; and he incidentally
screwed up, over and over, trying to prove me wrong at
one thing and another. He even *admitted* a couple of the
errors, though apologizing seems to be outside of his experience.
He is a bright guy, who has made fine contributions; but he
often reads posts too quickly, and is far too quick with "putting
down" people who write naive questions, or ones who write
Replies that don't agree entirely with his own perceptions.
- When he withdrew from insulting at that time, I figured
that it was (a) out of embarrassment, and (b) from recognizing
the untenable positions that he had left himself in, for
defending himself, if I did choose to respond any further.
Rather suddenly, he stopped the nasty posts; so, at that time,
we all silently welcomed his apparent reformation.
Bob -- Why don't you go back to enjoying your vacation?
No, you did not respond to ANY of the substance of what I advised the
OP on how he went badly astray and how he should have proceeded.
I have given him the roadmap that ANYONE doing the same or similar
problem can, and should, follow.
> [snip, some advice that the OP might or might not find useful.]
>
> Bob was last in this 'rant' mode back in late July or so.
Instead of addressing the substance of my post, Richard Ulrich went
into HIS version of issues I had discussed, pointing out HIS errors,
and how he had malpracticed statistics.
Of course Richard Ulrich in his self-serving version completely
distorted the truth.
I would URGE readers to go back to the archives and read whatever
I have posted, and see that Ulrich was at best LYING, in an attempt
to conceal his blunders.
>
> At that time, within just a couple of weeks, he totally blew
> a *regression* question, no less --
That only shows the LACK of credibility in anything Richard Ulrich
says about me. I was having a discussion with Jerry Dallal, who
does know infinitely more about regression than Richard Ulrich.
The only ERROR in that thread was when Richard Ulrich stuck
his nose into our conversation and made a blunder HIMSELF,
miscopying what Jerry had posted.
It's all in the archives.
It would be a waste of everyone's time for me to rebut Ulrich on
anything else because it's ALL readily retrievable in the archives.
Note that Ulrich NEVER cited any specifics, only HIS verbal
version of distortion -- which is Ulrich's modus operandi, in his
attempt to defend his Quackery and Malpractice in statistics.
It's all deja vu.
< Richard Ulrich's self-serving distortions snipped -- interested
readers can, and SHOULD, read those threads themselves!!
Without any coaching from Richard, or ME. >
> Bob -- Why don't you go back to enjoying your vacation?
I AM! But I am not going to let Quacks like you get away so
easily with your LIES and bad advice to the OP, while going into
your baseless attack of me, without addressing ANY of the
model-building and multiple regression issues I discussed in
THIS thread.
-- Bob.
At Sea Day in the Caribbean, having left Ocho Rios, Jamaica.
> Rich,
> Thanks for your comments, I will try to explain my situation better.
>
> I have 145 responses and I have 4 IV and 1 dependent variable. I am trying to show that there is a positive relationship between X1-X4 with Y, and I don’t care about the value of the relationship, all I am interested to show is there is a high (positive relationship).
Well, I'd say that "high" has to be anchored in values.
As your observed N gets larger, then whatever is "statistically
significant" becomes correspondingly smaller.
But I have more serious problems with your terminology,
or how you are reading or describing your computer output.
>
> I am using SPSS in my analysis, so when I used the linear regression option I got the standard deviation for the residuals was 0.54 and the standard deviation for the dependent variable was 0.44. [snip, rest of sentence about standardized values].
This is frankly impossible, for your OLS regression.
The residual sum of squares will *not*
be 33% larger than the SS around the overall mean.
- For each, SS = df * SD^2; and the d.f.'s are about the same.
Unless, as I warned against last time, you could be forcing
the regression through the origin, and *then* residuals could
be larger.
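The arithmetic behind that objection can be sketched directly from the reported figures. The assumption here, which is not confirmed in the thread, is a model with three predictors plus an intercept on the 145 cases; the helper name is hypothetical.

```python
def sum_of_squares(sd, df):
    """Recover a sum of squares from an SD and its degrees of freedom."""
    return df * sd ** 2

n_cases = 145
n_predictors = 3  # assumption: the three-predictor model with an intercept

# Figures as reported in the quoted post
ss_total = sum_of_squares(0.44, n_cases - 1)                 # around the mean
ss_resid = sum_of_squares(0.54, n_cases - n_predictors - 1)  # around the fit

implied_r2 = 1 - ss_resid / ss_total
print(round(ss_total, 2), round(ss_resid, 2), round(implied_r2, 3))
```

The implied R-squared is negative, which cannot happen for ordinary least squares with an intercept, since fitting the mean alone already achieves the total SS. Using 0.468 for the SD of Y, as in the earlier post, leads to the same conclusion.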
[snip, most of the stuff about normality testing, etc.
These are not essential to regression or to the Question
being asked, so long as you have confirmed that all the
data values are legitimate and reasonable, with no big
outliers to confuse whatever is going on.]
>
> B. I ran the regression model (step wise) and got an R-square of 0.4 using X1,X2,X4
>
[snip, more]
> 6. Linearity
> I used the partial regression plot generated by the SPSS regression and I also plotted each IV versus Y and got the following r-square:
> X1 vs. Y R-square: 0.22
> X2 vs. Y R-square: 0.23
> X3 vs. Y R-square: 0.014
> X4 vs. Y R-square: 0.4
I note here that the R-square for X4 vs. Y seems to be exactly
the same as the R-square for X1,X2,X4 vs. Y -- unless you were
rounding off to say 0.4 for each.
These results indicate that X4 and X1,X2 are *confounded*
in predicting Y; X1 and X2 add approximately nothing to
the prediction achieved by X4. The tests on the coefficients
in the full equation are tests on "partial regression coefficients",
which are measures of how much is added *uniquely* by each
variable.
- This has nothing at all to do with "non-linearity".
>
> When I re-did the regression using the Enter option and not the step wise option I noticed that the t-test significance for X1, X2, X4 was less than 0.04 but for X3 it was 0.962
If X1 and X2 each add about 0.015 as unique contribution
to the R-squared, then they each will test with a t-test that
has p-value < 0.05.
From your numbers, it looks like they should add less than that.
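The link between a variable's unique contribution to R-squared and its t-test can be sketched with the standard partial F relation, t^2 = (unique R-squared) * (residual d.f.) / (1 - full R-squared). The inputs below are assumptions taken loosely from the thread (N = 145, four predictors entered, full R-squared of about 0.4), and 1.98 is used as an approximate two-sided 5% critical value for about 140 d.f.

```python
import math

def t_from_unique_r2(delta_r2, r2_full, n_cases, n_predictors):
    """t statistic implied by one predictor's unique R-squared."""
    df = n_cases - n_predictors - 1
    return math.sqrt(delta_r2 * df / (1.0 - r2_full))

def min_unique_r2(t_crit, r2_full, n_cases, n_predictors):
    """Smallest unique R-squared that reaches a given critical t."""
    df = n_cases - n_predictors - 1
    return t_crit ** 2 * (1.0 - r2_full) / df

t_value = t_from_unique_r2(0.015, 0.4, 145, 4)
threshold = min_unique_r2(1.98, 0.4, 145, 4)
print(round(t_value, 2), round(threshold, 4))
```

Under these assumed inputs the t value comes out a little under 1.9 and the threshold near 0.017, in the same neighborhood as the 0.015 quoted above. The exact cutoff shifts with the true (unrounded) R-squared.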
>
> I tried all sorts of transformations for variable X3 but it was never significant and the R-square of the relationship between X and Y was always below 0.1
> Am I doing this right?
> What am I supposed to do with X3?!!
Why do you need to do anything with X3, except report it?
You can check for any arcane shape of relationship between
raw values by looking at the scatter plot of X3 with Y.
You can save the residuals of the X1,X2,X4 regression, and
plot X3 against those, to show that there is not any arcane
shape of relation between X3 and what is left after the rest
of the prediction.
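A minimal sketch of that last suggestion, on simulated data rather than the poster's: fit Y on X1, X2, X4 by least squares, save the residuals, and pair them with X3 for plotting. The data-generating coefficients, the seed, and the helper names are all illustrative assumptions, and the small solver stands in for whatever the regression package does internally.

```python
import random

def ols_fit(rows, y):
    """Least-squares coefficients (intercept first) via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residuals(rows, y, beta):
    """Observed minus fitted values."""
    return [yi - (beta[0] + sum(b * x for b, x in zip(beta[1:], r)))
            for r, yi in zip(rows, y)]

random.seed(1)
n = 145
# Simulated (x1, x2, x4) rows and an unrelated x3; all values are made up
rows = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * a + 0.5 * b + 0.8 * c + random.gauss(0, 0.5)
     for a, b, c in rows]

beta = ols_fit(rows, y)
resid = residuals(rows, y, beta)
pairs = list(zip(x3, resid))  # scatter these: X3 on one axis, residual on the other
print([round(b, 2) for b in beta])
```

If the scatter of `pairs` shows no trend or curve, X3 has nothing to add beyond X1, X2, and X4, and it can simply be reported as such.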
[snip, preceding posts]
>
> Richard Ulrich wrote:
> > Sorry, folks, Bob Ling is ranting again.
> > Responding briefly to that --
>
> No, you did not respond to ANY of the substance of what I advised the
> OP on how he went badly astray and how he should have proceeded.
Bob Ling is not paying attention again. And he doesn't give
a damn, which is possibly why he isn't paying attention.
Bob even goes on to *quote* my full response to him on
the "substance of what [he] advised" --
>
> I have given him the roadmap that ANYONE doing the same or similar
> problem can, and should, follow.
RU >
> > [snip, some advice that the OP might or might not find useful.]
That's it -- the OP might or might not find the advice useful.
I agree that the OP showed ignorance, *and* so Bob's high-tech
details are apt to be beyond him. I've said, more than once,
I think, that Bob is basically fine on regression, especially the
subset of technical things. Bob is not paying attention.
> >
> > Bob was last in this 'rant' mode back in late July or so.
>
> Instead of addressing the substance of my post, Richard Ulrich went
> into HIS version of issues I had discussed, pointing out HIS errors,
> and how he had malpracticed statistics.
>
> Of course Richard Ulrich in his self-serving version completely
> distorted the truth.
>
> I would URGE readers to go back to the archives and read whatever
> I have posted, and see that Ulrich was at best LYING, in an attempt
> to conceal his blunders.
> >
> > At that time, within just a couple of weeks, he totally blew
> > a *regression* question, no less --
>
> That only shows the LACK of credibility in anything Richard Ulrich
> says about me. I was having a discussion with Jerry Dallal, who
> does know infinitely more about regression than Richard Ulrich.
> The only ERROR in that thread was when Richard Ulrich stuck
> his nose into our conversation and made a blunder HIMSELF,
> miscopying what Jerry had posted.
Okay, here are some links in Google-groups. - You can
access the whole threads by clicking on the "Subject".
Regression, an R-squared defined as negative. I thought
Bob could have recognized his error, since everyone else
disagreed with him.
http://groups.google.com/group/sci.stat.math/msg/130aa115f8d5a075
Jerry Dallal never did accept Bob's argument. See J Dallal's #22
(July 14, last post in the first sequence). Also see his #37 on
July 17. I see that at the end, #23 on July 14, Bob points back
to his first post in the thread, where BOB mentioned Ordinary
Least Squares Regression -- which was *never* the subject of
the thread.
Today I conclude that Bob was (again) not paying attention,
and being permanently off-topic was his error. Or, maybe
he is so far away from social science that he *still* does
not recognize the convention....
The link above is to Radford Neal's post, #33. It includes,
"It's just speculation on my part, but maybe (just a possibility, mind
you), Richard Ulrich might perhaps (I know it's silly, but bear with
me), have in his mind that the right definition of R^2 to use is one
that actually, you know, means something useful."
Why I thought that Bob could have felt bad about his
contribution on "proportions" was similar, but more so.
Hypothesis testing with proportions: There were multiple posters.
http://groups.google.com/group/sci.stat.edu/msg/75f4cdd001dd442b
This is another Radford Neal post, July 27. It ends with a paragraph,
'Ignore the advice from "Reef Fish", who is a raving idiot.'
My own summary on wider issues is July 31. Reef Fish had
jumped to erroneous conclusions about the nature of the
proportions, and made his own unstated, necessary assumptions
about homogeneity, and eventually suggested forcing the problem
into a binomial-proportion mold.
My other point about July had to do with power analysis.
In the original thread, I cited statistical power analysis in
response to a question about making use of Type 1 and
Type 2 error. The original thread is at
http://tinyurl.com/b7u84
- That is Bob Ling's response, which mainly says to me
that Bob has never heard of power analysis. Again, click
on the Subject to see the rest.
The problem is not strictly "paying attention." Or maybe
it is. Bob does not seem to notice that "power analysis"
is not something I am inventing on the spot, regardless of
what I say. My own original terminology was somewhat
informal,
Bob's rant, which I figured he might feel ashamed of, was
not in the original thread. The URLs he cites are himself,
saying exactly the same thing.
http://groups.google.com/group/sci.stat.math/msg/b562d869497323dd
This has just about no useful content.
[snip, rest. See next response.]
On 11 Nov 2005 08:05:11 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
[snip, what was just responded to.]
>
> It's all in the archives.
>
> It would be a waste of everyone's time for me to rebut Ulrich on
> anything else because it's ALL readily retrievable in the archives.
> Note that Ulrich NEVER cited any specifics, only HIS verbal
> version of distortion -- which is Ulrich's modus operandi, in his
> attempt to defend his Quackery and Malpractice in statistics.
I should say, Bob Ling has been notable for seldom giving
any specifics with his insults and name-calling. - In this
most recent post, I could detect that he wanted details on
why I thought he should have backed off from criticizing
back in July -- I think I've given that to him now.
Bob says, "Look in the archives." But they don't show
what he believes, just like the texts he cites don't say
what he believes. Bob is not paying enough attention.
And he doesn't give a damn. It must be that he quit
posting in July because he became bored.
I seem to defend my -- our? -- quackery rather successfully.
Word twisting? In comparison to Bob's inarticulate front, yes.
But he has the difficulty of preaching a message that
no one around here has any inclination to believe.
Most of the time, that has had to do with approaches to
regression, and *his* contention that just about nothing,
(no meaning, that is) can be inferred from observational
studies. That includes epidemiology, and that includes
the "notion" that statistical evidence is worthwhile in showing
that smoking causes cancer.
Bob lays the title "quack" on everyone in the social sciences,
and doesn't notice that no one agrees with him, in general or
in particulars. In the last go-round, he gave his reference
from the Tukey book, and a gloss on Box, and not a person
sides with his *reading* of those sources.
Accusing of "lying" .... Disagreeing on perceptions is not lying.
Posting error, with hostility and malice, and persevering
despite strong evidence presented surely gives an appearance
of "lying" and that is what Bob has been guilty of before, in
trying to show my incompetence and "errors". See his exploits
on arguing competency in using Google.
I don't lie. That's practically a lie, to assert it again, given
the whole history. I *do* present arguments that Bob is
unwilling or unable to meet. I *do* see that nobody hardly ever
agrees with Bob when he calls me names, and other people
do call him troll and net.kook, and recommend kill-filing
him.
>
> It's all deja vu.
>
> < Richard Ulrich's self-serving distortions snipped -- interested
> readers can, and SHOULD, read those threads themselves!!
> Without any coaching from Richard, or ME. >
>
>
> > Bob -- Why don't you go back to enjoying your vacation?
>
> I AM! But I am not going to let Quacks like you get away so
> easily with your LIES and bad advice to the OP, while going into
> your baseless attack of me, without addressing ANY of the
> model-building and multiple regressions issued I discussed in
> THIS thread.
Bob surely has been letting us get away with it, hasn't he?
Is he fooling himself? not paying enough attention?
I can't think of a single piece of bad advice that he has
ever prevented *me* from "getting away with".
It's all rhetoric for him, and he doesn't give a damn how
sad he looks at it, but he doesn't pay enough attention
to realize how bad he looks at it.
Let those readers who wanted to find out, read it for themselves and
decide for themself, without any coaching from Ulrich or me:
> > < Richard Ulrich's self-serving distortions snipped -- interested
> > readers can, and SHOULD, read those threads themselves!!
> > Without any coaching from Richard, or ME. >
That should have been the END of it.
Now, Richard Ulrich is back, in two tedious replays of his same
self-serving tune, distorting everything that's in the archives for
anyone to see how wrong he was.
After I had already given the OP the roadmap about where he went
wrong and how he should have proceeded, Richard Ulrich then
followed my script TWO DAYS LATER (though not quite successfully,
because Richard obviously didn't understand it himself) to give the
sehwail further plagiarized advice, in which Ulrich's muddled mind
showed!
RU> [snip, most of the stuff about normality testing, etc.
RU> These are not essential to regression or to the Question
RU> being asked, so long as you have confirmed that all the
RU> data values are legitimate and reasonable, with no big
RU> outliers to confuse whatever is going on.]
"not essential"? Absolutely unnecessary and irrelevant!
I had already said it plainly and unambiguously to the OP two days
before Richard's attempted (but misplayed) plagiarism:
OP> A. Normality
OP> I started with checking the normality of the IVs and DV
RF> WHY? There is absolutely NOTHING in any multiple-linear-regression
RF> model that requires ANY of the Ind. vars. or the Dep. Var. (which is an
RF> aggregate of n observations from n DIFFERENT distributions, even if
RF> you assume each of them is normal) to be Normal.
RF>
RF> Therefore, your work below is precisely Garbage In, Garbage Out,
RF> because all the numbers are 100% irrelevant to any regression model
RF> you are trying to fit. It's a complete waste of computer and human time.
Let me emphasize the "100% irrelevant" to any model and any problem
of the type the OP was attempting.
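A small simulation can illustrate the claim that the normality
assumption concerns the errors and not the variables themselves. This
is only a sketch under assumed conditions, none of which come from the
OP's data: a made-up exponential predictor x, a true line y = 1 + 2x,
and normal errors. The helper names skewness and fit_line are
hypothetical.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness; near 0 for symmetric data."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed setup: strongly skewed predictor, normal errors.
x = [rng.expovariate(1.0) for _ in range(2000)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
print(f"fitted slope:          {slope:.2f}")
```

Under these assumptions x and y are both visibly skewed, while the
residuals from the fit should come out roughly symmetric, and the
residuals are what the usual normality assumption is about.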
After bungling my advice about the absolute lack of necessity of doing
any plot looking for normality, Richard Ulrich improvised this part
himself:
RU> with no big outliers to confuse whatever is going on.]
This is an unmistakable manifestation of Ulrich's OWN confusion about
those plots sehwail did, and my advice that they were 100% useless.
Outlier in WHAT? Any outlier in ANY of the independent OR the dependent
variable is 100% irrelevant to the probability model about the ERRORS
(residuals) of the model, because no model had been fitted at that
stage!
That's the kind of Quackery characteristic of Ulrich's comment on any
linear model or regression problem. It sounds learned and reasonable
in his muddled mind, but absolute GARBAGE to anyone in the know.
I read Richard's bungled advice soon after he posted it, but decided
to let it go. Until Richard Ulrich felt compelled to bring out some
DEAD HORSES that were entirely irrelevant to the PRESENT
THREAD or the PRESENT PROBLEM, and dwelled on his own
distorted fantasy on them, rather than addressing the actual problem
at hand or let the readers find his alleged fault of mine in his Dead
Horses.
Richard Ulrich's FINAL critique, comment, and advice (after having seen
mine) would have gotten him a grade of no better than 30 out of 100 if
he were in my data analysis course, for not knowing, and failing to
consider many of the RELEVANT steps in exploring nonlinear surfaces,
and for his absolute lack of technical substance in his "advice to
sehwail".
Anyone who is half way competent in data analysis and model building
could read Richard Ulrich's FINAL analysis for sehwail, and see for
themselves why Richard Ulrich is an incompetent Quack on the
subject.
Richard Ulrich, give it up.
You are bringing in your distorted Dead Horses only as your way of
trying to hide your inadequacy and malpractice in sehweil's problem.
OR, it is actually quite possible that you don't even KNOW how
ridiculously erroneous your advice to sehweil was! Or WHY!
I had told you many times before: your time will be better spent
LEARNING statistics and multiple regression yourself FIRST, with
the hours of tedious excuses posted for your own rationalization
and excuse.
You should pay heed to the First Law of Holes,
"When you find yourself in a hole, stop digging."
That's my advice to you, Richard Ulrich, the sci.stat.math resident
Quack of statistics.
-- Bob.
Taking a few minutes from the cruise away from Montego Bay,
Jamaica, as my public service to combat statistical Quackery and
malpractice, exemplified by Mr. Richard Ulrich, formerly
Assistant Professor of Psychiatry, last verified (a few months ago)
as no longer employed in said department.
The public service would not have been taken had Richard Ulrich
not chosen to post his two lengthy, distorted, repeat of his Dead
Horses entirely false and faulty as well as irrelevant to the present
thread and discussion!
On 17 Nov 2005 19:35:39 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
> Richard Ulrich wrote:
> > On 11 Nov 2005 08:05:11 -0800, "Reef Fish"
> > <Large_Nass...@Yahoo.com> wrote:
> >
> >
> > [snip, what was just responded to.]
> > >
> > > It's all in the archives.
> > >
> > > It would be a waste of everyone's time for me to rebut Ulrich on
> > > anything else because it's ALL readily retrievable in the archives.
From formal debating, and from ideas about 'conflict resolution',
I try to make my criticisms explicit and grounded in tangibles.
Bob -- very apparently -- has the opposite prejudice.
He states gross generalities, and says "see the archives"
as if that will support him.... As one reader, I would like
to know, "How?"
What *I* see in the archives is mostly what I saw in the
first pass, where Bob is a jerk, insulting folks, and ending
up with the unpopular opinion.
> > > Note that Ulrich NEVER cited any specifics, only HIS verbal
> > > version of distortion -- which is Ulrich's modus operandi, in his
> > > attempt to defend his Quackery and Malpractice in statistics.
>
> Let those readers who wanted to find out, read it for themselves and
> decide for themself, without any coaching from Ulrich or me:
hmm. It looks to me like when I point to particular threads with
a specific reading, I'm not being specific, but distorting. But Bob
can say "archives" as if that is magical.
==== Some lines not commented on directly. Perhaps, indirectly.
======= back to commentary --
> I read Richard's bungled advice soon after he posted it, but decided
> to let it go. Until Richard Ulrich felt compelled to bring out some
> DEAD HORSES that were entirely irrelevant to the PRESENT
> THREAD or the PRESENT PROBLEM, and dwelled on his own
> distorted fantasy on them, rather than addressing the actual problem
> at hand or let the readers find his alleged fault of mine in his Dead
> Horses.
>
> Richard Ulrich's FINAL critique, comment, and advice (after having seen
> mine) would have gotten him a grade of no better than 30 out of 100 if
> he were in my data analysis course, for not knowing, and failing to
> consider many of the RELEVANT steps in exploring nonlinear surfaces,
I think of Bob's "dead horses" more like "sleeping dogs",
and they may be "dogs" instead of "horses", even if he admits
error for conclusions and apologizes for abusive language.
I believe I would have been less likely to mention them if
I could have figured out what he was talking about. On
the other hand, if he is describing our own relative "competency
in responding", three examples were perfectly pertinent, and
never before discussed as that.
However, here we can agree on a couple of things. For one,
there are posts that don't need comment, because they do no
harm. For another, my "critique" surely would be inadequate
for an answer on response-surface fitting in a data analysis course,
BECAUSE that is not what I was doing. It is rather unlikely, in
my opinion, that this poster wanted or needed to do non-linear
fitting. It became evident, in my probing, that he isn't reading
his regression output correctly, and when he mentioned "nonlinear",
it was probably just grasping at straws.
When I think of the student-professor metaphor for these posts,
I think of myself as the professor. I've thought before this,
in respect to another thread, that Bob seemed to respond as
a professor in his office, responding to a bright student, and
then he was criticizing the rest of us for enticing the student
with clues: We were treating the question as a professor might,
when it was posed in the classroom, and we wished to elicit
student responses.
Bob wrote a fine reply, technically. I said that it might or
might not be useful [to the OP] -- My notion was, that
answer was far above his head, and not at all what he needed
to know right now.
> and for his absolute lack of technical substance in his "advice to
> sehwail".
- Similarly, I condemn Bob Ling for his absolute
lack of empathy and understanding in giving
statistical consultation to a struggling beginner.
I went on to point out where the OP had serious problems
in reading what he had on hand. And, wrong ideas about
what regression ought to be giving him.
>
> Anyone who is half way competent in data analysis and model building
> could read Richard Ulrich's FINAL analysis for sehwail, and see for
> themselves why Richard Ulrich is an incompetent Quack on the
> subject.
Let's see -- Bob Ling does not have the empathy to be
much of an educator, or, especially, to be a consultant
to people who are asking the wrong question.
I can leave the other Sleeping dogs lie, for now.
====== no comments on the rest.
>
>
> Richard Ulrich, give it up.
>
> You are bringing in your distorted Dead Horses only as your way of
> trying to hide your inadequacy and malpractice in sehweil's problem.
> OR, it is actually quite possible that you don't even KNOW how
> ridiculously erroneous your advice to sehweil was! Or WHY!
>
> I had told you many times before: your time will be better spent
> LEARNING statistics and multiple regression yourself FIRST, with
> the hours of tedious excuses posted for your own rationalization
> and excuse.
>
> You should pay heed to the First Law of Holes,
>
> "When you find yourself in a hole, stop digging."
>
> That's my advice to you, Richard Ulrich, the sci.stat.math resident
> Quack of statistics.
>
> -- Bob.
>
> Taking a few minutes from the cruise away from Montego Bay,
> Jamaica, as my public service to combat statistical Quackery and
> malpractice, exemplified by Mr. Richard Ulrich, formerly
> Assistant Professor of Psychiatry, last verified (a few months ago)
> as no longer employed in said department.
>
> The public service would not have been taken had Richard Ulrich
> not chosen to post his two lengthy, distorted, repeat of his Dead
> Horses entirely false and faulty as well as irrelevant to the present
> thread and discussion!
--
[snip, main body of message]
> Taking a few minutes from the cruise away from Montego Bay,
> Jamaica, as my public service to combat statistical Quackery and
> malpractice, exemplified by Mr. Richard Ulrich, formerly
> Assistant Professor of Psychiatry, last verified (a few months ago)
> as no longer employed in said department.
I will take this opportunity to mention that I am
about to return to the job market. If somebody knows
of a great opportunity for someone of my knowledge,
experience, and temperament, please let me know.
You can find most of my bibliography by using Google's
/more/ page: Click on Scholar, and put in < "RF Ulrich" >.
I see just a couple of "RF Ulrich's" that aren't me, and those
were in the 1970s.
The list seems to be generated from citations, and gives
citation counts. Inconveniently, one article can be split
among alternate spellings, etc. (For a *lot* of duplication,
check "JH Zar", and his textbook.) (For Bob's output, put in
"RF Ling" -- not an applied paper in the bunch, that I can see,
but I wouldn't know if he always has his name the same.)
I've spent the last year or so in "medical retirement,"
I guess I call it -- being treated for cancer of the throat,
and then recovering from the treatment. Surgery removed
half my epiglottis, and then radiation therapy added to
my difficulty in swallowing. Now I'm swallowing well
enough that I am past the main hazard of aspiration-
pneumonia. I hope.
See the "subject"!!
As Yogi would say, "It's deja vu all over again".
> > > > Note that Ulrich NEVER cited any specifics, only HIS verbal
> > > > version of distortion -- which is Ulrich's modus operandi, in his
> > > > attempt to defend his Quackery and Malpractice in statistics.
> >
> > > > < Richard Ulrich's self-serving distortions snipped -- interested
> > > > readers can, and SHOULD, read those threads themselves!!
> > > > Without any coaching from Richard, or ME. >
> >
> > That should have been the END of it.
Now regarding the actual question by the OP and Richard's FINAL
follow-up to the OP, with the benefit of a roadmap drawn by me.
>
> However, here we can agree on a couple of things. For one,
> there are posts that don't need comment, because they do no
> harm. For another, my "critique" surely would be inadequate
> for an answer on response-surface fitting in a data analysis course,
> BECAUSE that is not what I was doing. It is rather unlikely, in
> my opinion, that this poster wanted or needed to do non-linear
> fitting. It became evident, in my probing, that he isn't reading
> his regression output correctly, and when he mentioned "nonlinear",
> it was probably just grasping at straws.
That was only ONE of the issues in model building!
You conveniently neglected to defend your ERRORS in advising
him to look at normality and outliers in the scatterplots of the
INDEPENDENT variables and the DEPENDENT variable, when
he was looking for a multivariable regression fit, LINEAR OR NOT.
That was the absolutely unnecessary part that you failed to
understand, as the OP did not know but probably understood
NOW, since by his later account, he certainly seemed to have
read more regression books and journal articles than Richard
Ulrich ever had.
>
> Bob wrote a fine reply, technically. I said that it might or
> might not be useful [to the OP] -- My notion was, that
> answer was far above his head, and not at all what he needed
> to know right now.
Perhaps. But he certainly does not need Richard Ulrich to make
the same blunder he did as in the "normality" and "outlier" issues,
not to mention the serious omission!
Isn't it convenient to gloss over those errors of yours, Richard?
and this is all Richard could come up this round:
> - Similarly, I condemn Bob Ling for his absolute
> lack of empathy and understanding in giving
> statistical consultation to a struggling beginner.
This comment was actually Richard Ulrich's self-recommendation
as a consultant, because he has lost his job and is looking for one
NOW.
I can give anyone looking for a consultant for STATISTICAL
advice on theory, methodology, and practice that hiring
Richard Ulrich for such a job would be like hiring a drowning
victim in a shallow pool to be a Life Guard in a deep ocean!
I mean it SERIOUSLY, based on the numerous posts by
Richard Ulrich I've read and discussed during a period of
about 6 months of participation in rec.stat.math/edu groups.
The evidence was AMPLE and Unequivocal!
Everyone (even Richard Ulrich) could see that sehweil was a
"struggling beginner", and I advised him accordingly, after giving
him a LENGTHY explanation of his (sehweil's) errors to sehweil,
in my initial response:
RF> In your case, I strongly advise taking an elementary course in applied
RF> data analysis in model building, and not try to do-it-yourself after
RF> reading a few too-simplistic articles or posts in newsgroups.
But Richard Ulrich failed to recognize his OWN "struggling beginner"
status on the subject, and tried his "blind leading the blind trick"
to which Richard has grown accustomed.
> > Anyone who is half way competent in data analysis and model building
> > could read Richard Ulrich's FINAL analysis for sehwail, and see for
> > themselves why Richard Ulrich is an incompetent Quack on the
> > subject.
>
> Let's see -- Bob Ling does not have the empathy to be
> much of an educator, or, especially, to be a consultant
> to people who are asking the wrong question.
People ask the wrong question(s) all the time. Educator or
consultant, there is NO EXCUSE for compounding his wrong act
of trying to look for normality in the scatterplots by adding
YOUR own wrong acts/advice of looking for outliers and that
"normality is not essential" when it was absolutely UNNECESSARY!
>
> I can leave the other Sleeping dogs lie, for now.
>
> ====== no comments on the rest.
My comment to Richard Ulrich stands, and clarified and emplified
in this post, relative to sehweil's problem in the sehweil thread.
-- Bob.
Here's an annotated copy of new lines, with all previous
lines deleted. The deletions don't seem to hurt, to speak of.
==========start of Bob Ling's reply. Annotated briefly.
On 23 Nov 2005 19:56:13 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
=====5 lines intro; previous; 2 lines intro; previous.
> Richard Ulrich wrote:
>
> See the "subject"!!
>
> As Yogi would say, "It's deja vu all over again".
[9 lines previous]
> Now regarding the actual question by the OP and Richard's FINAL
> follow-up to the OP, with the benefit of a roadmap drawn by me.
[84 lines previous]
=====6 lines useful comment; 6 lines stupid insult.
> That was only ONE of the issues in model building!
>
> You conveniently neglected to defend your ERRORS in advising
> him to look at normality and outliers in the scatterplots of the
> INDEPENDENT variables and the DEPENDENT variable, when
> he was looking for a multivariable regression fit, LINEAR OR NOT.
>
> That was the absolutely unnecessary part that you failed to
> understand, as the OP did not know but probably understood
> NOW, since by his later account, he certainly seemed to have
> read more regression books and journal articles than Richard
> Ulrich ever had.
>
=====previous; 8 lines repetition of slam; previous.
[5 lines previous]
> Perhaps. But he certainly does not need Richard Ulrich to make
> the same blunder he did as in the "normality" and "outlier" issues,
> not to mention the serious omission!
>
> Isn't it convenient to gloss over those errors of yours, Richard?
>
>
> and this is all Richard could come up this round:
[3 lines previous]
=====3 lines distortion, seeming to invite reprise of comments on
=====Bob's circumstances; 10 lines insult.
> This comment was actually Richard Ulrich's self-recommendation
> as a consultant, because he has lost his job and is looking for one
> NOW.
>
> I can give anyone looking for a consultant for STATISTICAL
> advice on theory, methodology, and practice that hiring
> Richard Ulrich for such a job would be like hiring a drowning
> victim in a shallow pool to be a Life Guard in a deep ocean!
>
> I mean it SERIOUSLY, based on the numerous posts by
> Richard Ulrich I've read and discussed during a period of
> about 6 months of participation in rec.stat.math/edu groups.
>
> The evidence was AMPLE and Unequivocal!
=====4 lines curious comment; previous; 3 lines insult; previous.
> Everyone (even Richard Ulrich) could see that sehweil was a
> "struggling beginner", and I advised him accordingly, after giving
> him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> in my initial response:
[3 lines previous]
> But Richard Ulrich failed to recognize his OWN "struggling beginner"
> status on the subject, and tried his "blind leading the blind trick"
> to which Richard has grown accustomed.
>
[8 lines previous]
=====5 lines summary, of a sort; previous; 2 lines humbug; previous.
> People ask the wrong question(s) all the time. Educator or
> consultant, there is NO EXCUSE for compounding his wrong act
> of trying to look for normality in the scatterplots by adding
> YOUR own wrong acts/advice of looking for outliers and that
> "normality is not essential" when it was absolutely UNNECESSARY!
[4 lines previous]
>
> My comment to Richard Ulrich stands, and clarified and emplified
> in this post, relative to sehweil's problem in the sehweil thread.
>
> -- Bob.
[38 lines previous]
=========end of Bob Ling's reply.
I count 6 lines of useful comment, 4 lines of curious comment,
and 5 lines of summary. Plus 32 lines of insult, etc., and a few
lines of miscellaneous. My reader reported that as a 232 line post.
(Does Bob consider himself a professional of some kind?)
I'll reply to the "curious comment" and to the summary.
Here are the 4 lines again --
> Everyone (even Richard Ulrich) could see that sehweil was a
> "struggling beginner", and I advised him accordingly, after giving
> him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> in my initial response:
"... advised him accordingly"?
Bob advised him to take an elementary course in data analysis,
and advised him to ignore my comments and to ignore the
initial R-squareds. I still don't see much more.
I still see Bob's recommendations as beyond the reach of
most struggling beginners. I looked back at it the other day, and
it was more complete than I remembered. It is a nice 'road map'
(as Bob called it) for someone who has been down that road
before, but it is not (I think) self-explanatory to beginners.
If Sehweil had said, "Wow, thanks!", that would have proven it
as *fitting* advice for this OP. He didn't. I still see it as
advice by someone unaccustomed to such beginners.
Sehweil responded to mine, and gave additional details
that I pointed out (Nov 12) as including totally wrong
readings of whatever listings he had. No replies by him
after that one.
Bob's 5 lines of "summary" --
> People ask the wrong question(s) all the time. Educator or
> consultant, there is NO EXCUSE for compounding his wrong act
> of trying to look for normality in the scatterplots by adding
> YOUR own wrong acts/advice of looking for outliers and that
> "normality is not essential" when it was absolutely UNNECESSARY!
Bob says, I guess, "don't worry about outliers, and normality
is 'absolutely UNNECESSARY!'" For normality, that's posed
against my view that "normality is 'not essential' - but I do
think it is nice to hear about." I think Bob is ill-advised
in his scorn. This makes me wonder what Bob thinks about
Tukey's book on Exploratory Data Analysis. Is there any
really good, purely statistical justification for box-and-whisker
plots?
In the context of consulting beginners, it is my experience
that it is useful to get them to talk about their data, and it is
useful to encourage them to say something statistical so
that you can judge how much they know, and what vocabulary
is available. "Normality" is one good opener. (I never
formulated that before, but it seems accurate. I spent years
up to 1995 as the go-to statistical resource person at WPIC.
All the researchers knew my name because I was also the guy
who approved their main-frame computer projects. I did
1000+ statistical consultations in 20 years. I saw a lot of
rank amateurs and a lot of 'silly mistakes.')
When a single outlier accounts for (say) 50% of the variance,
modeling with any ANOVA technique is hardly ever wise,
and even less so if the modeler doesn't know about it. And
make sure they aren't forcing "-999" as the missing value into
the analysis. So, "describe the scattergram." That's the
relevance of outliers.
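A minimal sketch of that point, with made-up numbers (the scores and
the -999 missing-value code below are illustrative assumptions, not
data from this thread, and share_of_variance is a hypothetical helper):

```python
import statistics

def share_of_variance(values, index):
    """Fraction of the total sum of squares contributed by values[index]."""
    mean = statistics.fmean(values)
    total_ss = sum((v - mean) ** 2 for v in values)
    return (values[index] - mean) ** 2 / total_ss

# Hypothetical scores; the last entry is a missing-value code
# that was left in the analysis by mistake.
scores = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, -999.0]
outlier_share = share_of_variance(scores, len(scores) - 1)

# Same data with the missing-value code excluded.
clean = [v for v in scores if v != -999.0]

print(f"share of sum of squares from the -999 entry: {outlier_share:.3f}")
print(f"mean with it:    {statistics.fmean(scores):.1f}")
print(f"mean without it: {statistics.fmean(clean):.1f}")
```

With one stray code among ten values, that single entry carries most
of the total variation, so any model fitted to the raw column is
mostly describing the coding error.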
Further, it is a neat fact concerning observed data (as opposed
to data sampled to fill a design) that the transformation that
produces normality often will produce (a) homogeneity of
variance of its own errors, and also (b) linearity with various
co-factors. So, normality is nice to know about, because its
absence elevates the potential for other problems -- either
in modeling or in eventual testing. So, you can get along
without it, and without knowing about it. But for Bob to give
us the reaction that he did, IMHO, is probably a function
of his inexperience. Narrow horizons. Not looking ahead.
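Here is a rough simulation of that "neat fact", under an assumed
model chosen to make it hold: log(y) is linear in x with normal
errors, so y itself is skewed. The model, the constants, and the
helper names (skewness, correlation, residual_spread_ratio) are all
assumptions for illustration; real data need not behave this way.

```python
import math
import random
import statistics

def skewness(values):
    """Moment-based sample skewness; near 0 for symmetric data."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residual_spread_ratio(x, y):
    """Fit y on x by least squares, then return the residual SD where x is
    above its median divided by the residual SD where it is not."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    cut = statistics.median(x)
    low = [b - (intercept + slope * a) for a, b in zip(x, y) if a <= cut]
    high = [b - (intercept + slope * a) for a, b in zip(x, y) if a > cut]
    return statistics.pstdev(high) / statistics.pstdev(low)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Assumed data-generating model: log(y) = 0.5 + 0.8*x + normal error.
x = [rng.gauss(2.0, 1.0) for _ in range(2000)]
y = [math.exp(0.5 + 0.8 * xi + rng.gauss(0.0, 0.5)) for xi in x]
log_y = [math.log(v) for v in y]

for label, response in (("raw y", y), ("log y", log_y)):
    print(f"{label}: skewness {skewness(response):.2f}, "
          f"corr with x {correlation(x, response):.2f}, "
          f"residual SD ratio (high x / low x) "
          f"{residual_spread_ratio(x, response):.2f}")
```

In this setup the log removes the skew in y and, at the same time,
evens out the residual spread and strengthens the linear relation
with x. A residual SD ratio near 1 is what homogeneity of variance
looks like here.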
"Richard Ultich the sci.stat.math resident Quack is
beating his Dead Horses AGAIN"
Given the above, and MOST of Ulrich's statistical
malpractice were done in the sci.stat.edu newsgroup,
I have included sci.stat.edu in this posting, to WARN
all readers of both groups, to be aware of Richard
Ulrich's incompetence and quackery as a statistician.
Richard Ulrich wrote:
> Well, giving Bob the full previous text to work with
> sure did not work, in terms of obtaining much
> useful commentary. Now, the opposite approach,
> more like I usually do, performed on the Reply.
Don't you think those readers who are interested or capable
know how to READ the post AND thread to see for themselves
what you and I and the OP sehwail have said to decide for
themselves without YOUR rehash, out of context AND
distorted?
You only pasted what *I* wrote, without the CONTEXT of
my comments, and that is "out of context".
Even your SUBJECT is grossly distorted AND out of context.
>
> Here's an annotated copy of new lines, with all previous
> lines deleted. The deletions don't seem to hurt, to speak of.
What do you mean "Bob Ling vs normality"? I don't use "Bob
Ling" when I post, because there are hundreds of other
Bob Lings on the internet. For NETIQUETTE, you should
refer to my post as "Reef Fish" which is unique in all
newsgroups, since Google tracks posts back to 1981; and
anyone who has read these groups for more than a thread or
two in which I participated would have known that Reef
Fish is THIS "Dr. Robert F. Ling", which doesn't require
any of your gratuitous change of an ongoing SUBJECT
thread, to make your NOISE, out of context.
By your "vs normality", I was making a point to sehwail
and to Richard Ulrich, that when NORMALITY is NOT
required for the independent variables, say, in a regression
problem, it's a complete waste of time to examine the
"normality" of those variables!
> ==========start of Bob Ling's reply. Annotated briefly.
Readers can read my post AND the thread to read what
I said, WITH the context in which sehwail AND Richard
Ulrich BOTH erred, in matters of examining independent
variables for normality!
I snipped the rest of Richard's out-of-context quotes
except those relevant to my explanation of the LACK of
NECESSITY in the portions in which normality was
discussed by sehwail and Ulrich, because they are
BOTH very confused and muddled about where the
"normality assumptions" in the usual regression
context occur and NEED to be examined.
> On 23 Nov 2005 19:56:13 -0800, "Reef Fish"
> <Large_Nass...@Yahoo.com> wrote:
THAT's how you should have referred to the post in question,
and earlier posts!
> > Perhaps. But he certainly does not need Richard Ulrich to make
> > the same blunder he did as in the "normality" and "outlier" issues,
> > not to mention the serious omission!
> >
> > Isn't it convenient to gloss over those errors of yours, Richard?
Richard should have replied HERE, if he has anything relevant
to say this round.
> =====3 lines distortion, seeming to invite reprise of comments on
> > This comment was actually Richard Ulrich's self-recommendation
> > as a consultant, because he has lost his job and is looking for one
> > NOW.
Richard was using his insult of MY reply to brag about HIS
experience in consulting, etc., etc., and immediately followed
by his post about him "looking for a job".
> >
> > I can give anyone looking for a consultant for STATISTICAL
> > advice on theory, methodology, and practice that hiring
> > Richard Ulrich for such a job would be like hiring a drowning
> > victim in a shallow pool to be a Life Guard in a deep ocean!
That's MY Public Service Announcement to the public about
Richard Ulrich's lack of statistical knowledge and his proven
record of making statistical BLUNDERS in what he posted.
> >
> > I mean it SERIOUSLY, based on the numerous posts by
> > Richard Ulrich I've read and discussed during a period of
> > about 6 months of participation in rec.stat.math/edu groups.
> >
> > The evidence was AMPLE and Unequivocal!
>
> > Everyone (even Richard Ulrich) could see that sehweil was a
> > "struggling beginner", and I advised him accordingly, after giving
> > him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> > in my initial response:
> [3 lines previous]
> > But Richard Ulrich failed to recognize his OWN "struggling beginner"
> > status on the subject, and tried his "blind leading the blind trick"
> > to which Richard has grown accustomed.
Examining the INDEPENDENT variables for "normality" was ONE
specific example of sehwail and Ulrich's SAME blunder.
> > People ask the wrong question(s) all the time. Educator or
> > consultant, there is NO EXCUSE for compounding his wrong act
> > of trying to look for normality in the scatterplots by adding
> > YOUR own wrong acts/advice of looking for outliers and that
> > "normality is not essential" when it was absolutely UNNECESSARY!
Again, that referred to the examination for "normality" in the
variables that are NOT supposed to be normal. In fact, those
variables can even be categorical or nominal!
> > My comment to Richard Ulrich stands, and is clarified and amplified
> > in this post, relative to sehweil's problem in the sehweil thread.
> >
> > -- Bob.
> =========end of Bob Ling's reply. <shown out-of-context>
I added the appropriate CONTEXT of what Richard Ulrich
cited, regarding the relevance to the "normality" issue, that
anyone who READ the post in its entirety would have seen
the same reason I REPEATED this time, to point out how
wrong Richard Ulrich was, and IS.
> I count 6 lines of useful comment, 4 lines of curious comment,
> and 5 lines of summary. Plus 32 lines of insult, etc., and a few
> lines of miscellaneous. My reader reported that as a 232 line post.
> (Does Bob consider himself a professional of some kind?)
Instead of spending your time counting lines, and citing
them OUT-OF-CONTEXT, why didn't you spend it learning something
from textbooks about Regression Analysis and Model Building?
Reef Fish Bob is a statistical professional, with a Ph.D. (which
Richard does not have) in the subject of Statistics. Reef Fish
was a Full Professor in 1977 (in statistics), while Richard
Ulrich was an Assistant Prof, at age over 50, in a Department
of Psychiatry, and has since lost his job.
Ulrich> (Does Bob consider himself a professional of some kind?)
If Reef Fish Bob didn't, plenty of others do:
Reef Fish Bob was elected Fellow of the ASA by his peers
in 1984.
Reef Fish Bob was recognized by the publishers of "Who's
Who in the World" (the 1984 Marquis publication) to be
a "statistician educator and consultant" with a citation (in
that Edition) that was LONGER than the citation for
Ronald Reagan (when he was President of the US at
the time), or Bill Clinton (who was only Governor of
Arkansas at that time).
Ulrich> (Does Bob consider himself a professional of some kind?)
Enough of a statistical professional to voice in a public
forum (sci.stat.math and sci.stat.edu) in the subject of
STATISTICS how blunders are made by a few who claim
themselves to be statisticians, while they have shown
nothing more than their statistical Quackery!
>
> I'll reply to the "curious comment" and to the summary.
> Here are the 4 lines again --
>
> > Everyone (even Richard Ulrich) could see that sehweil was a
> > "struggling beginner", and I advised him accordingly, after giving
> > him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> > in my initial response:
>
> "... advised him accordingly"?
> Bob advised him to take an elementary course in data analysis,
> and advised him to ignore my comments and to ignore the
> initial R-squareds. I still don't see much more.
That's because of Richard Ulrich's LACK of understanding about
the subject of Regression Analysis!! Period! An elementary
course in Data Analysis taught by a competent statistical
professional, would have set sehwail and Ulrich straight, given
my pointers.
Did Richard Ulrich think I should write a textbook in this
newsgroup to teach HIM and sehwail how to do a regression
analysis and model building problem PROPERLY?
Ulrich> Bob advised him to take an elementary course in data analysis,
That was Ulrich's OUT-OF-CONTEXT distortion! This was the
context in my ORIGINAL reply to sehwail, regarding the building
of a prediction model with FOUR indep. variables, to sehwail,
AFTER pointing out his errors, including specifically his improper
use of normality:
RF> This is where the ART of model building in multiple-linear
RF> regression takes over from the SCIENCE of the methodology,
RF> because there is nothing in the science that is adequate in
RF> telling one what to do other than some "guided iterations" of
RF> trial and error, as discussed in George Box's JASA article
RF> on "Science and Statistics", and the course material in
RF> "Data Analysis" taught by many statisticians, including myself.
RF> In your case, I strongly advise taking an elementary course
RF> in applied data analysis in model building, and not try to
RF> do-it-yourself after reading a few too-simplistic articles or
RF> posts in newsgroups.
> I still see Bob's recommendations as beyond the reach of
> most struggling beginners. I looked back at it the other day, and
> it was more complete than I remembered. It is a nice 'road map'
> (as Bob called it) for someone who has been down that road
> before, but it is not (I think) self-explanatory to beginners.
What else could I have said, other than pointing him to an
ELEMENTARY course he should have taken? To call that
recommendation beyond sehwail's reach is something I take
as Ulrich's INSULT to sehwail, because sehwail had said:
sehwail> I have read several books on linear regression and
sehwail> infinite number of articles and presentation but I am
sehwail> still not sure if what I am doing correct or not
It appears (at least according to HIS claim, apart from the
"infinite" number of articles) that sehwail has read more
about statistics than Richard Ulrich has! He was simply
LACKING the GUIDANCE of a competent statistician -- which
is yet another reason for pointing him to take a COURSE in
Data Analysis.
>
> If Sehweil had said, "Wow, thanks!" , that would have proven it
> as *fitting* advice for this OP. He didn't. I still see it as
> advice by someone unaccustomed to such beginners.
I gave YOU (Richard Ulrich) the same sound advice (for
"beginners" <because you ARE one>) in many statistical
subjects, pointing you to specific books and journal
articles -- which Ulrich never read or understood.
Isn't that the same as giving sehwail the same APPROPRIATE
advice?
>
> Sehweil responded to mine, and gave additional details
> that I pointed out (Nov 12) as including totally wrong
> readings of whatever listings he had. No replies by him
> after that one.
Sehweil thanked Richard (as mere courtesy), to explain
to Richard his further blunders (which he didn't know at
the time).
That was merely a case of the Blind (Ulrich) leading the
Blind (sehwail).
AFTER Richard's FINAL response (whose blunders *I* had
pointed out, as well as giving sehwail my detailed
response, including the "roadmap" which Richard Ulrich
failed to follow), isn't there good reason for
"No replies by him <sehwail> after that one"?
Sehwail is welcome to reply to this or other posts of mine
on his topic. But isn't it reasonable to assume that
sehwail UNDERSTOOD and FOLLOWED my advice
better than Richard Ulrich did?
> Bob's 5 lines of "summary" --
> > People ask the wrong question(s) all the time. Educator or
> > consultant, there is NO EXCUSE for compounding his wrong act
> > of trying to look for normality in the scatterplots by adding
> > YOUR own wrong acts/advice of looking for outliers and that
> > "normality is not essential" when it was absolutely UNNECESSARY!
>
> Bob says, I guess, "don't worry about outliers, and normality
> is 'absolutely UNNECESSARY!'" For normality, that's posed
> against my view that "normality is 'not essential' - but I do
> think it is nice to hear about." I think Bob is ill-advised
> in his scorn.
I spoke against ONLY the "normality" check of the INDEPENDENT
variables, or the AGGREGATE of the observed Ys.
Richard Ulrich is merely amplifying his own muddle and calling
my scorn "ill-advised".
< Richard Ulrich's self-advertisement as a statistical consultant
snipped>
> I did 1000+ statistical consultations in 20 years. I saw a
> lot of rank amateurs and a lot of 'silly mistakes.')
I had done that over 20 years AGO -- and most of those 1000
were my graduate STUDENTS, in courses in Data Analysis,
in which they were taught how to do it CORRECTLY!
> When a single outlier accounts for (say) 50% of the variance,
> modeling with any ANOVA technique is hardly ever wise,
> and even less so if the modeler doesn't know about it.
Ulrich CONTINUED his muddle!! Outliers in the
INDEPENDENT variables are to be taken care of by the
appropriate FITTING MODEL, so that there's no outlier
in the RESIDUALS.
The accommodation could be by an indicator variable of
a one-time occurring event or through other methods.
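One way to read the indicator-variable remark, as a minimal sketch under stated assumptions: the data are simulated, the one-time event at index k is hypothetical, and the small solve/ols helpers are illustrative stand-ins for a real regression routine. Giving a single case its own 0/1 indicator forces that case's residual to zero, and the other coefficients match a fit with the case left out.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

random.seed(1)
x = [float(i) for i in range(30)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]
k = 12
y[k] += 25.0  # hypothetical one-time event hitting case k

# Fit with an extra 0/1 column that is 1 only for case k.
dummy = ols([[1.0, xi, 1.0 if i == k else 0.0] for i, xi in enumerate(x)], y)
# Fit with case k simply left out, for comparison.
dropped = ols([[1.0, xi] for i, xi in enumerate(x) if i != k],
              [yi for i, yi in enumerate(y) if i != k])

resid_k = y[k] - (dummy[0] + dummy[1] * x[k] + dummy[2])
print("indicator fit:", [round(b, 4) for b in dummy])
print("case dropped: ", [round(b, 4) for b in dropped])
print("residual at case k with indicator:", resid_k)
```

The indicator's own coefficient is the estimated size of the one-time event, which is information that simply deleting the case would throw away.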
Richard Ulrich continues to be confused and muddled
about the "normality" assumption on the ERRORS,
rather than on the predictor (independent) variables
themselves.
>
> Further, it is a neat fact concerning observed data (as opposed
> to data sampled to fill a design) that the transformation that
> produces normality often will produce (a) homogeneity of
> variance of its own errors, and also (b) linearity with various
> co-factors. So, normality is nice to know about,
NOT on the independent variables!!!!!!!!!!!!!!!!!!!!!!!!!!
Normality is on the ERRORS, in a fitted model.
Sehwail did NOT discuss the normality of the residuals,
only his independent variables X1, X2, .. X4.
As the saying goes, "before Richard opens his mouth,
there is doubt as to the depth of his ignorance; but as
soon as he opens his mouth, he removes ALL doubts"!
Richard Ulrich's REPEAT of his own blunder, in the
examination of NORMALITY in the independent
variables, which could even be NOMINAL, leaves
absolutely NO DOUBT, that he is an incompetent
Quack (together with his numerous errors and
blunders in other threads in regression analysis
and linear models) in the subject of statistics!
> because its
> absence elevates the potential for other problems -- either
> in modeling or in eventual testing.
Richard obviously did not read or understand the Box
article on "Science and Statistics" I recommended to sehwail,
and had recommended to Ulrich many times, every time
Richard made some blunder in a regression problem.
> So, you can get along
> without it, and without knowing about it. But for Bob to give
> us the reaction that he did, IMHO, is probably a function
> of his inexperience. Narrow horizons. Not looking ahead.
This comes from an unemployed former Asst Prof. in a Department
of Psychiatry, and champion of posting statistical blunders
in his own malpractice and statistical quackery. Richard
Ulrich thinks I need his endorsement to become a
statistician, or that I dread his unfounded insults and criticisms.
> --
> Rich Ulrich, wpi...@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
Give it up, Richard. Do the statistical community a FAVOR by
keeping your mouth firmly SHUT, on subjects you
know nothing about!
As the saying goes, "before Richard opens his mouth,
there is doubt as to the depth of his ignorance; but as
soon as he opens his mouth, he removes ALL doubts"!
-- Bob.
Ph.D. in Statistics and Fellow of the ASA (1984), and
cited as "statistical educator" in Marquis's "Who's Who
in the World" (1984 and some later editions).
> Reef Fish Bob was recognized by the publishers of "Who's
> Who in the World" (the 1984 Marquis publication) to be
> a "statistician educator and consultant" with a citation (in
> that Edition) that was LONGER than the citation for
> Ronald Reagan (when he was President of the US at
> the time), or Bill Clinton (who was only Governor of
> Arkansas at that time).
Thank you for the reminder (WHICH HAS BEEN POSTED ON USENET FOR THE
24TH TIME SINCE 1994)
How much did the inclusion cost?
For a real measure of self worth, why don't you compare the US
Presidents usenet posting numbers with your own.
mudshrimp had a LIFE TIME posting history of 6 times, all following
my posts, in 5 different newsgroups. That's what I called a
dedicated stalker, whom the rec.scuba group folks called
"worshippers" of the posting author Reef Fish.
> Reef Fish wrote:
Which mudshrimp neglected to note was my response to
the specific question of Richard Ulrich:
Ulrich> (Does Bob consider himself a professional of some kind?)
to which I responded:
RF> If Reef Fish Bob didn't, plenty of others do:
> > Reef Fish Bob was recognized by the publishers of "Who's
> > Who in the World" (the 1984 Marquis publication) to be
> > a "statistician educator and consultant" with a citation (in
> > that Edition) that was LONGER than the citation for
> > Ronald Reagan (when he was President of the US at
> > the time), or Bill Clinton (who was only Governor of
> > Arkansas at that time).
>
> Thank you for the reminder (WHICH HAS BEEN POSTED ON USENET FOR THE
> 24TH TIME SINCE 1994)
Really? Why 1994? I've been posting since 1987, and the
cited paragraph above was posted EXACTLY ONCE.
The Reagan and Clinton reference (comparing the length
of my citation in the 1984 edition, not 1994) was posted
EIGHT times, in response to idiots similar to Richard Ulrich
asking the same insulting question and making the same
insulting comments.
> How much did the inclusion cost?
Why can't you be ORIGINAL? Plenty of IDIOTS in rec.scuba,
when they saw the "Who's Who in the World" reference
thought it was something you can BUY, or that it costs the
person cited any money!
The EDITORS of the "Who's Who" sought me out, when they
did the research and THEN asked me to correct any error they
might have made, or suggest additions and deletions.
It cost me absolutely NOTHING to be cited in that or the
dozens of lesser "Who's Who" publications that are not worth
the paper they are printed on.
They are paid for by LIBRARIES throughout the WORLD.
>
> For a real measure of self worth, why don't you compare the US
> Presidents usenet posting numbers with your own.
Because I have stated publicly, about the time I spent on my 100,000
newsgroup posts:
RF> To put things into proper perspectives: The TOTALITY of
RF> time I spent in those 100,000 posts is considerably LESS
RF> than the time I spent in writing a dozen or so of my
RF> published journal articles, or the books I wrote, or the
RF> time I spent directing the doctoral dissertations of my
RF> Ph.D. students.
In fact, I had said elsewhere that the TOTALITY of time I spent
on those posts was less than the time I spent on any ONE of
several papers I had published in JASA (Journal of the American
Statistical Association).
mudshrimp, why don't you uncloak your chicken-manure
worshipper identity, and speak up like a responsible citizen of
the usenet, such as Reef Fish Bob, whom clueless Richard
Ulrich insists on referring to as "Bob Ling" because he had seen
several other IDIOTS in rec.scuba groups do so, when they
tried to smear my name in my profession.
If you want to find out about me, why don't you do what
did in researching in the google WEB page, and posted it
in the sci.stat.math group:
and wrote,
*> It turns out that if
*> the Reef Fish's presumed name is entered into Google between
*> quotes and with his middle initial, Google returns "about 159"
*> results, about 70 of which are displayed before "In order to
*> show you the most relevant results, we have omitted some entries
*> very similar to the 70 already displayed" ALL of which are
*> about that one person and show that person's statistical
*> credentials to be beyond reproach.
For YOU, Richard Ulrich, and others intending to smear my professional
name, by insisting on referring to me as Bob Ling rather than Reef Fish or
Reef Fish Bob, you should pay CLOSE attention to
"the Reef Fish's presumed name"
which is NOT "Bob Ling", but my FULL NAME, with a MIDDLE
INITIAL, that is included in many of my publications, together with
my professional e-mail address (which is NOT
Large_Nass...@Yahoo.com)!
mudshrimp and Richard Ulrich, for all I know you may be the SAME
poster because you both SPECIALIZE in posting your impertinence
about me, distorting and misrepresenting ALL factual matters.
Go GET A LIFE, instead of polluting the statistics newsgroups
sci.stat.math and sci.stat.edu in USENET.
-- Reef Fish Bob.
Ph.D. in Statistics (1970) and Fellow of the ASA (1984).
"Whom the Gods wish to destroy . . ."
C
On 29 Nov 2005 09:35:08 -0800, "Reef Fish"
C, completing the following two-question Multiple Choice
Quiz will do you much good, because you will score 100
no matter what choice you make, in the same manner that, no
matter how wrong Richard Ulrich has been in the
statistical topics on which he has posted, you think he is
doing the right thing and fear even a WARNING about
his Quackery, which seems to have escaped many
long-time readers in this group.
1. In the Kingdom of the Blind ...
(a) the Blind is eager to declare himself Blind, for
social acceptance.
(b) the Blind often leads the Blind, without anyone
warning about the cliff they are about to walk off.
(c) the One-Eyed Man is King
(d) the One-Eyed Man is happier if he gouges out
his seeing eye, so as not to see what the Blind
in the Kingdom of the Blind would not accept.
2. Fools Rush In Where ...
(a) there is a gathering of unexposed Fools
(b) he thinks he can be a member of the Mutual
Admiration Society of Fools.
(c) Angels fear to tread.
(d) someone DARED to expose some Fool in a thread.
Enjoy your perfect score.
-- Reef Fish Bob.
These are taken out of order, with a few lines quoted
from a 400+ line post.
On 29 Nov 2005 09:35:08 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
(1) Looking at Data.
RU >>
> > Bob's 5 lines of "summary" --
Bob > > >
> > > People ask the wrong question(s) all the time. Educator or
> > > consultant, there is NO EXCUSE for compounding his wrong act
> > > of trying to look for normality in the scatterplots by adding
> > > YOUR own wrong acts/advice of looking for outliers and that
> > > "normality is not essential" when it was absolutely UNNECESSARY!
RU > >
> > Bob says, I guess, "don't worry about outliers, and normality
> > is 'absolutely UNNECESSARY!'" For normality, that's posed
> > against my view that "normality is 'not essential' - but I do
> > think it is nice to hear about." I think Bob is ill-advised
> > in his scorn.
Bob >
> I spoke against ONLY the "normality" check of the INDEPENDENT
> variables, or the AGGREGATE of the observed Ys.
Okay, let us check on some assumptions here.
IF you are wanting to talk about what is inessential:
If no tests are planned, no normality is needed, anywhere.
I'll quote my example of the outlier, and Bob's reply.
RU > >
> > When a single outlier accounts for (say) 50% of the variance,
> > modeling with any ANOVA technique is hardly ever wise,
> > and even less so if the modeler doesn't know about it.
Bob >
> Ulrich CONTINUED his muddle!! Outliers in the
> INDEPENDENT variables are to be taken care of by the
> appropriate FITTING MODEL, so that there's no outlier
> in the RESIDUALS.
>
> The accommodation could be by an indicator variable of
> a one-time occurring event or through other methods.
Okay, Bob says he will deal with the problem down-the-road,
whereas I figure my consultees want to deal immediately.
Maybe Bob will confirm if this is precise enough about the
difference between us. Bob is scornful of the regression
practices of social sciences, economics, epidemiology and
so on. This was established in the summer, where he
eventually admitted the breadth of his nonconformity.
In consequence: We (social scientists) attempt to model
with *meaning*, and in pursuit of that, we check various
aspects of our variables, early and often. If a variable
isn't measuring what it is supposed to measure, we want to
replace it with something better. If one score contributes
half the sum of squares around the mean, that's probably not
the outcome *or* the predictor that we want to analyze.
One case does not carry enough meaning. Faced with a
question of meaning, it is time to look at whether the
variables are weird or not.
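The "half the sum of squares" check is plain arithmetic. A small sketch with made-up scores (the numbers are purely illustrative, not from the original poster's data) shows how one case's share of the sum of squares around the mean can be computed:

```python
# Hypothetical scores; the last one is the suspected outlier.
scores = [4, 5, 6, 5, 4, 6, 5, 5, 40]

mean = sum(scores) / len(scores)
squared_dev = [(s - mean) ** 2 for s in scores]
total_ss = sum(squared_dev)

# Fraction of the total sum of squares contributed by each case.
shares = [d / total_ss for d in squared_dev]
worst = max(range(len(scores)), key=shares.__getitem__)

print(f"case {worst} (score {scores[worst]}) contributes "
      f"{shares[worst]:.0%} of the sum of squares around the mean")
```

Nothing here depends on a fitted model; it is a screening step on a single variable, which is exactly the stage the two posters disagree about.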
Bob doesn't like *meaning*. I agree, if we ignore meaning
and just want a "fit" -- and there's no choice or chance
of modeling with different numbers -- then there's little
point in checking for odd distributions until you've got
residuals.
Now, I think the Original Post showed a crisis of *meaning*.
The poster was complaining that the regression coefficients
were not what he expected. In retrospect, that sounds
exactly like what Bob was complaining about for months.
(2) Addressing Bob Ling as Bob Ling.
> What do you mean "Bob Ling vs normality"? I don't use "Bob
> Ling" when I post, because there are hundreds of other
> Bob Lings on the internet. For NETIQUETTE, you should
> refer to my post as "Reef Fish" which is unique in all
> newsgroups, since Google tracks posts back to 1981; and
> anyone who has read this group for more than a thread or
> two in which I participated would have known that Reef
> Fish is THIS "Dr. Robert F. Ling", which doesn't require
> any of your gratuitous change of an ongoing SUBJECT
> thread, to make your NOISE, out of context.
I figure I am honoring Netiquette more than Bob does, by
consistently returning a *lesser* version of his assault on
Netiquette. He should have guessed before now -- Bob can stop
*my* transgressions by ceasing his own. He calls me a quack,
or whatnot, and I call him by name. I figure this is about
the least possible violation, in respect to annoying general
readers. Readers won't know it bothers Bob until he tells them.
And Bob threw away all right to polite consideration from me,
long ago.
Similarly, that thread name was changed from something that
many readers would find objectionable to something that
mainly would bother Bob.
Googling groups for <"Bob Ling" group:sci.stat.*>, where
strangers might look for him, is always this Bob Ling/Reef Fish.
How else will strangers learn to find him?
My using his name does nothing to hinder searches on Reef Fish
since that's preserved in every Reply.
(3) Professional behavior, professional failings.
> Ulrich> (Does Bob consider himself a professional of some kind?)
>
> If Reef Fish Bob didn't, plenty of others do:
>
> Reef Fish Bob was elected Fellow of the ASA by his peers
> in 1984.
>
> Reef Fish Bob was recognized by the publishers of "Who's
> Who in the World" (the 1984 Marquis publication) to be
> a "statistician educator and consultant" with a citation (in
> that Edition) that was LONGER than the citation for
[snip, rest]
Let's see. Bob beats a rhetorical question to death by
taking it literally. Is that a violation of something?
Here are some thoughts on professionalism which parallel
what Bob was inspiring in me.
In a science fiction novella by David Weber, an Ensign
(in a Space Navy) is musing on her superior officer:
"That clearly apparent contempt for anyone he considered
his inferior was the worse of the only two real failings
that Ensign Haverty had so far detected in him.
( /Professional/ failings, that was; the list of things
she detested in him on a personal level grew longer
with each passing day.) The other was a tendency to
ignore the unlikely in his planning and depend on his
natural intelligence and ability--both of which were
considerable, she admitted--to wiggle out of trouble
if it persisted in happening anyway....
"And however much she might dislike that trait, it was
far less disruptive and demoralizing than the contemptuous
(and public) verbal flayings he was in the habit of
handing out."
- page 297 in "The hard way home," collected in Worlds of
Honor (1999).
Contempt, revisited in 'verbal flayings', seems like "Bob all over".
His "ignoring the unlikely in his planning," "winging it,"
is more subtle, but that resonated with me when I read it.
He admits - brags, even - of posting hastily and not
spending much time on his internet participation. IMHO,
it shows. Most recently, it shows in the "flame-war"
syntax and framework of his insults to me. Exaggeration?
Invention? This is while he pretends that he is trying
to improve the content of the group.
If you've read Bob, I hope you recognize it. If you
don't read Bob, I won't encourage trying to catch up.
1) is a legit attempt on Ulrich's part on his rebuttal of what
I considered his BLUNDER in his advice to sehwail, the
OP of this thread about a multiple regression problem.
2) and 3) are entirely gratuitous on Ulrich's part, as his
continued ad hominem attack on Reef Fish Bob.
I am going to address (1) seriously and show why Richard
Ulrich has erred and blundered, AGAIN!
> (1) Looking at Data.
> RU >>
> > > Bob's 5 lines of "summary" --
> Bob > > >
> > > > People ask the wrong question(s) all the time. Educator or
> > > > consultant, there is NO EXCUSE for compounding his wrong act
> > > > of trying to look for normality in the scatterplots by adding
> > > > YOUR own wrong acts/advice of looking for outliers and that
> > > > "normality is not essential" when it was absolutely UNNECESSARY!
Correctly cited, but with a serious omission: the "absolutely
UNNECESSARY" part referred to the "normality" check of the
INDEPENDENT variables Xs in a regression problem!
> RU > >
> > > Bob says, I guess, "don't worry about outliers, and normality
> > > is 'absolutely UNNECESSARY!'" For normality, that's posed
> > > against my view that "normality is 'not essential' - but I do
> > > think it is nice to hear about." I think Bob is ill-advised
> > > in his scorn.
Why don't you tell us WHY anyone should check for the "normality"
of any of the INDEPENDENT variables Xs in a multiple regression
problem? That was a well-deserved scorn on your part.
If something is "absolutely unnecessary" to be normal, why should
anyone check for normality? Do you check for normality of an
indicator variable X (which can take only values 0 or 1)? Does
it make sense (other than your own muddle) that it is "not
essential" to check the indicator variable for normality?
That's just ONE of the infinitely many candidates for the
INDEPENDENT (predictor) variables X in a regression that is
absolutely unnecessary for any of them to be "normal" in
distribution!
PERIOD!
You can mouth-dance all you want -- to check for the normality
of an independent variable X in a regression problem is clearly
and unmistakably a sign of the IGNORANCE of the person
about the meaning of the "normality assumption" in a regression
problem.
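The indicator-variable argument can be checked numerically. What follows is only a sketch on simulated data: the model y = 2 + 3x + e with standard normal errors, the sample size, and the seed are all assumptions chosen for illustration. The predictor takes only the values 0 and 1, yet the fit recovers the slope and the residuals look symmetric.

```python
import random
import statistics

random.seed(0)
n = 2000

# Predictor: a 0/1 indicator, about as far from "normal" as a variable gets.
x = [random.randint(0, 1) for _ in range(n)]
# Assumed model for illustration: y = 2 + 3x + e, with e ~ N(0, 1).
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for one predictor, from the textbook formulas.
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Third standardized moment; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

print(f"distinct x values: {sorted(set(x))}")
print(f"slope estimate:    {slope:.3f}")
print(f"residual skewness: {skewness(residuals):.3f}")
```

The distributional check is applied to the residuals only; the shape of x never enters it.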
> Bob >
> > I spoke against ONLY the "normality" check of the INDEPENDENT
> > variables, or the AGGREGATE of the observed Ys.
>
> Okay, let us check on some assumptions here.
> IF you are wanting to talk about what is inessential:
> If no tests are planned, no normality is needed, anywhere.
What test? The normality assumption pertains to the ERROR
(observable ONLY as residuals, AFTER a model has been fitted
to the data)! Richard Ulrich is continuing to muddle in his
misunderstanding of what is "required" and what is "absolutely
not necessary" in the variables in a multiple regression problem.
>
> I'll quote my example of the outlier, and Bob's reply.
>
> RU > >
> > > When a single outlier accounts for (say) 50% of the variance,
> > > modeling with any ANOVA technique is hardly ever wise,
> > > and even less so if the modeler doesn't know about it.
> Bob >
> > Ulrich CONTINUED his muddle!! Outliers in the
> > INDEPENDENT variables are to be taken care of by the
> > appropriate FITTING MODEL, so that there's no outlier
> > in the RESIDUALS.
That stands as stated. No ifs and buts about it!
50% of the variance of WHAT? The independent variable X?
Richard Ulrich is talking through his hat in all his contrived
excuses when none of his alleged violations violated ANY
of the regression or ANOVA (regression via indicator variables)
problems.
The misunderstanding of the "normality assumption" in
a regression problem is so common and tempting that I have
catalogued it as ONE of the common BLUNDERS -- in my
Data Analysis lecture notes, to graduate students. Even the
undergrad students in the first course understood why the
independent variable X in a simple regression can have any
distributional shape WITHOUT violating any of the
distributional assumptions in a regression model!
It's a sad commentary that someone like Richard Ulrich,
who is supposed to have been trained as a statistician, and
who often advise others on statistical problems, cannot
even get past the FIRST STEP in a regression model.
> > The accommodation could be by an indicator variable of
> > a one-time occurring event or through other methods.
>
> Okay, Bob says he will deal with the problem down-the-road,
> whereas I figure my consultees want to deal immediately.
That's an UNMISTAKABLE admission of your BLUNDER of
looking for normality in the independent variables X.
My "deal with the problem down-the-road" means only ONE
thing -- that one should deal with the "normality assumption"
ONLY after a model has been fitted and there are RESIDUALS
to examine to check the (normality, independence, and
homoskedasticity) assumption imbedded in the probability
model of the ERRORS in a standard regression problem.
Not until then, and certainly not before then, as Richard Ulrich
wants to deal immediately with what is NOT required to be
normal.
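A rough sketch of that order of operations, fit first and examine residuals afterwards, on simulated data. Everything specific here is an assumption for illustration: the exponential predictor, the coefficients, and the three simple diagnostics (skewness, the Durbin-Watson statistic, and a split-half variance ratio), which stand in for whatever checks an analyst actually prefers.

```python
import random
import statistics

random.seed(2)
n = 500

# Deliberately skewed predictor; none of the checks below look at its shape.
x = [random.expovariate(1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Step 1: fit the model (simple least squares).
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

def skewness(values):
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Step 2: examine the residuals.
# Normality: skewness near 0 is consistent with symmetric errors.
resid_skew = skewness(resid)

# Independence: Durbin-Watson is near 2 when successive residuals are uncorrelated.
durbin_watson = (sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n))
                 / sum(e ** 2 for e in resid))

# Homoskedasticity: residual variance for high fitted values over that for low ones.
ordered = [e for _, e in sorted(zip(fitted, resid))]
variance_ratio = (statistics.pvariance(ordered[n // 2:])
                  / statistics.pvariance(ordered[: n // 2]))

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of residuals: {resid_skew:.2f}")
print(f"Durbin-Watson:         {durbin_watson:.2f}")
print(f"variance ratio:        {variance_ratio:.2f}")
```

The Durbin-Watson statistic is only meaningful when the cases have a natural order, such as time; for unordered data that line can be dropped.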
> Maybe Bob will confirm if this is precise enough about the
> difference between us.
I definitely confirm, and re-confirmed, and re-re-confirmed
why Richard Ulrich was CONFUSED about the fact that
there is nothing in a probability assumption of a regression
model that requires any of the independent variables Xs
to be normally distributed.
That's the same confusion the OP sehwail had -- and I
said to him that he was wasting computer and human time
to check the normality of the indep vars. X!
I have no reason to believe that sehwail did not understand
what I said. I have every reason to believe, only to be
re-confirmed by Richard Ulrich, that he has repeatedly
ERRED, in his arguments that the variables X's need to
be dealt with immediately (as if there were a REASON
that it should be dealt with!)
> Bob is scornful of the regression
> practices of social sciences, economics, epidemiology and
> so on. This was established in the summer, where he
> eventually admitted the breadth of his nonconformity.
Richard Ulrich is a perfect specimen, in sci.stat.math and
sci.stat.edu, to provide concrete evidence of his
MALPRACTICE which is to be scorned.
>
> In consequence: We (social scientists) attempt to model
> with *meaning*, and in pursuit of that, we check various
> aspects of our variables, early and often. If a variable
> isn't measuring what it is supposed to measure, we want to
> replace it with something better. If one score contributes
> half the sum of squares around the mean, that's probably not
> the outcome *or* the predictor that we want to analyze.
NONE of that verbiage has ANYTHING to do with Richard
Ulrich's mistaken notion that an independent variable X in
a regression problem has to be normally distributed or
absent of outliers! Those are called "high leverage"
points, Richard, in the X-space. There is absolutely
nothing wrong with data with high leverage points. They
MAY or MAY NOT be "influential" in a regression problem.
Of course that kind of discussion is far beyond what Richard
Ulrich is capable of understanding, when he is stuck muddling
in misapplying the "normality assumption" in a regression
problem!
< tedious excuses by Ulrich snipped. They were inexcusable!>
>
>
> (2) Addressing Bob Ling as Bob Ling.
>
> > What do you mean by "Bob Ling vs normality"? I don't use "Bob
> > Ling" when I post, because there are hundreds of other
> > Bob Lings on the internet. For NETIQUETTE, you should
> > refer to my post as "Reef Fish" which is unique in all
> > newsgroups, since Google tracks posts since 1981; and
> > anyone who has read these groups more than a thread or
> > two in which I participated would have known that Reef
> > Fish is THIS "Dr. Robert F. Ling", which doesn't require
> > any of your gratuitous change of an ongoing SUBJECT
> > thread, to make your NOISE, out of context.
>
> I figure I am honoring Netiquette more than Bob does, by
> consistently returning a *lesser* version of his assault on
> Netiquette.
What do you mean by "honoring Netiquette" when NONE of my
100,000 posts were posted with "Bob Ling" as author, and
nearly ALL of them, since 1992, were posted with the
unmistakable posting name of "Reef Fish"?
> He should have guessed before now -- Bob can stop
> *my* transgressions by ceasing his own. He calls me a quack,
> or whatnot, and I call him by name. I figure this is about
> the least possible violation, in respect to annoying general
> readers. Readers won't know it bothers Bob until he tells them.
> And Bob threw away all right to polite consideration from me,
> long ago.
Richard, when I called you a "quack" or said what you did
were examples of "malpractice of statistics", those are NOT
"name calling", but clearly documented and substantiated
statements about your MALPRACTICE of statistics!
>
> Googling groups for <"Bob Ling" group:sci.stat.*>, where
> strangers might look for him, is always this Bob Ling/Reef Fish.
> How else will strangers learn to find him?
That's a feeble excuse for your anti-netiquette behavior.
Google "Bob Ling" under the web section and you'll find
2,000,000 hits -- as I had mentioned before, and probably
no more than a couple of them were about Reef Fish Bob!
> (3) Professional behavior, professional failings.
>
> > Ulrich> (Does Bob consider himself a professional of some kind?)
You asked a (rhetorical) question. I gave a straight factual answer.
What's your beef now?
> >
> > If Reef Fish Bob didn't, plenty of others do:
> >
> > Reef Fish Bob was elected Fellow of the ASA by his peers
> > in 1984.
> >
> > Reef Fish Bob was recognized by the publishers of "Who's
> > Who in the World" in the 1984 edition (Marquis publication) to be
> > a "statistician educator and consultant" with a citation (in
> > that Edition) that was LONGER than the citation for
> [snip, rest]
>
> Let's see. Bob beats a rhetorical question to death by
> taking it literally. Is that a violation of something?
What's your purpose of asking the rhetorical question then?
>
> Here are some thoughts on professionalism which parallel
> what Bob was inspiring in me.
>
> In a science fiction novella by David Weber,
Irrelevant. Non sequitur.
> If you've read Bob, I hope you recognize it. If you
> don't read Bob, I won't encourage trying to catch up.
But read Richard Ulrich's unsubstantiated, distorted, ad
hominem attacks?
Richard Ulrich, whatever I said were your "quackery"
and "statistical malpractice" were AMPLY substantiated.
What you POST, as in this one, about your malpractice,
is clear and unequivocal, to anyone well-educated in
statistics, and especially in the TOPIC of regression
analysis.
Your statistical incompetence (evidenced by what you
POST) is only circumstantially supported by your own
professional credentials of:
uneducated: in Statistics as pointed out in the numerous
times I specifically mentioned your LACK
of education in those topics.
unproven: Nearly 60 years of age, with the highest
degree of MS and highest academic rank
of "Assistant Professor" in a Department of
Psychiatry. Lack of publications of
statistical substance except for those in
newsgroups much of which had proven to
be quackery and malpractice.
unemployed: A well-deserved status, given your
"professional credentials" above.
-- Reef Fish Bob.
Okay. If that's a serious omission, then perhaps we
do not disagree. Bob just reads me wrong.
That sounds like Bob agrees with me, that it is okay to
look for outliers. That was the tenor of my comments
about checking the independent variables.
I should have mentioned this before, because today
(January) Bob is mis-claiming again.
I don't test for normality. I don't recommend tests for
normality. I didn't recommend *testing* normality to this OP.
I do *like* normality, but that's a different matter. What
I particularly like is to know that my variables are not crazy.
I might use the word "check" in some context without
meaning "use a test."
When Bob doesn't quote, I'm never sure what comment he
is relating to.
[snip]
Bob >
> 50% of the variance of WHAT? The independent variable X?
I was alluding to the size-estimate from a very-simple test
for Outlier.
"By What fraction is the SS of a variable reduced, by
removing the most extreme value?"
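The quantity in that question can be computed directly. The following is a minimal Python sketch under stated assumptions: the function name `ss_reduction_fraction` and the example values are hypothetical, "most extreme" is taken to mean farthest from the mean, and the input is assumed not to be constant.

```python
import numpy as np


def ss_reduction_fraction(values):
    """Fraction by which the sum of squares about the mean drops when
    the single value farthest from the mean is removed.

    Assumes the values are not all identical (nonzero sum of squares).
    """
    v = np.asarray(values, dtype=float)
    ss_full = np.sum((v - v.mean()) ** 2)
    extreme = np.argmax(np.abs(v - v.mean()))
    rest = np.delete(v, extreme)
    ss_rest = np.sum((rest - rest.mean()) ** 2)
    return 1.0 - ss_rest / ss_full


# Hypothetical example: one value sits far from the rest.
example = [1.0, 2.0, 3.0, 4.0, 5.0, 50.0]
print(ss_reduction_fraction(example))
```

A fraction near 1 means a single value supplies almost all of the variation in that variable. What to do about such a value is exactly what the two posters dispute.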
Richard, give yourself a break in 2006.
Stop repeating your errors and blunders while trying to make
a case that "we do not disagree" and "Bob just reads me wrong".
I vehemently disagreed with what you did, which you re-stated
again below as if I agreed with your nonsense in your malpractice!
>
> That sounds like Bob agrees with me, that it is okay to
> look for outliers. That was the tenor of my comments
> about checking the independent variables.
Richard, when are you going to LEARN the most BASIC
material in regression? Your comment about checking the
independent variables was PRECISELY the result of your
muddle about the standard regression assumptions, what
can be checked and what is ABSOLUTELY UNNECESSARY
to check -- the independent variables X, for outliers or
anything else other than blatant typo errors.
If there is any outlier in X, it could be a point of ZERO
influence (good or bad) on the fitted model because it
could lie perfectly on the fitted regression model and its
removal would not change the fit in any way!!! That
is why it is a complete waste of time to check the
independent variables for any PROBABILITY assumption
(because there's none about the X's), or for any of their
values being an outlier in the distribution of that variable:
until you have fitted SOME model,
there is absolutely and positively NOTHING you can
gain by checking on anything about the distribution of
the data values in the X's.
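The claim that an extreme X value can have zero influence is easy to check numerically. This is a minimal Python sketch with simulated data. The helper `fit_line`, the value `x_far = 100.0`, and the offset of 30 are all assumptions chosen for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=30)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=30)
b_base = fit_line(x, y)

x_far = 100.0  # far outside the bulk of X: a high-leverage point

# Case A: the extreme point lies exactly on the line fitted to the others.
y_on = b_base[0] + b_base[1] * x_far
b_on = fit_line(np.append(x, x_far), np.append(y, y_on))

# Case B: same extreme X, but Y is well off that line.
y_off = y_on + 30.0
b_off = fit_line(np.append(x, x_far), np.append(y, y_off))

print("base fit:       ", b_base)
print("on-line point:  ", b_on)
print("off-line point: ", b_off)
```

In case A the coefficients are unchanged (up to rounding), while in case B the same X value pulls the slope noticeably. Whether the point matters therefore depends on Y and the fitted model, not on its position in the X distribution alone.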
You never understood why you were giving sehwail the
bum advice on checking first the "normality" of the X's and
then toned it down to checking for "outliers" when I had
already pointed out to sehwail (and YOU) that it was a complete
waste of time to do so before any model had been fitted!
BOTH of these are your blunders, Rich Ulrich!!
What's left? NOTHING. You have managed to make
blunders in every aspect of regression analysis -- from
the unwarranted checking of the independent variables, before
any fit is attempted, to the misinterpretation of the SIGNS
of the regression coefficients, to the use of regression
results on uncontrolled observational data to draw causal
inference, to the use of correlations between Y and X to
draw unwarranted causal inference.
After months and months of futilely arguing (as in your
current post) that what you did were not blunders or
errors, you're just continuing to wallow in your puddle of
ignorance, misrepresentation, and continued obfuscation.
>
>
> I should have mentioned this before, because today
> (January) Bob is mis-claiming again.
Just exactly what did I mis-claim?
>
> I don't test for normality. I don't recommend tests for
> normality. I didn't recommend *testing* normality to this OP.
You told the OP, after I told him that it was absolutely
unnecessary and a waste of time to check for normality of
the X's, that it was "not essential" to check the X's for
normality -- because of your own confusion about what
needs to be checked.
Then you talked about checking the X's for outliers,
which, as I explained above, AGAIN, is just as big
a BLUNDER of yours, as a result of your IGNORANCE
about why there's absolutely nothing you can benefit from
or do about such outliers (if you found any) UNTIL you
have fitted some model and then analysed the RESIDUALS
for the distributional assumptions and the leverage or
influence of individual observations (singly or jointly
with other points) on the fitted model.
> I do *like* normality, but that's a different matter. What
> I particularly like is to know that my variables are not crazy.
> I might use the word "check" in some context without
> meaning "use a test."
Whether a test is used for the "check" is entirely IRRELEVANT.
That's how shallow your understanding of anything is. You
missed the essential LESSON behind my comments and
dwelt on inconsequential words such as "check" or "test".
>
> When Bob doesn't quote, I'm never sure what comment he
> is relating to.
If you had quoted what I said to what YOU said (which were
your blunders), then you wouldn't have needed me to waste
my time quoting you.
>
> [snip]
>
> Bob >
> > 50% of the variance of WHAT? The independent variable X?
>
> I was alluding to the size-estimate from a very-simple test
> for Outlier.
Which was absolutely a waste of time and effort, as I repeatedly
stated -- that it's completely USELESS to check for outliers in X!
> "By What fraction is the SS of a variable reduced, by
> removing the most extreme value?"
This is completely OUT OF CONTEXT of what sehwail did. He
was checking the X's (and you were mis-advising him to do so)
before ANY fit to ANY model had been contemplated.
> --
> Rich Ulrich, wpi...@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
A blunder, by any other name, is a blunder! Let the readers
read about your blunders in the archives, without your attempted
obfuscation by excusing yourself.
I have explained to you WHY it was a blunder to check for
ANYTHING about the distribution of X, including outliers in it,
and you learned NOTHING from my previous effort to
educate you.
I am repeating it now for any new readers who have joined
the discussion in the THREE groups (most of them hadn't
seen your blunders made in sci.stat.math) to see why you
were, and are, completely WRONG, and continue to be
oblivious to WHY you were wrong.
In so doing, you have just made it "beyond a shadow of a doubt"
that you were MALPRACTICING the regression theory and
methodology.
I seriously doubt 2006 will be a better year for Richard Ulrich,
in terms of ignorance and malpractice. The saying "you can't
teach an old dog new tricks" presumes you know some old
tricks (of doing something correctly). In your case, you NEVER
did anything in regression correctly, as had been thoroughly
proven in the archives of sci.stat.math in 2005.
-- Reef Fish Bob.
Please refrain from making such uninformed and gratuitous statements.
You should have made your statement to Richard Ulrich, and it would
have been perfectly appropriate.
My post was my reply to Richard Ulrich's FALSE CLAIMS about me:
RU> That sounds like Bob agrees with me, that it is okay to
RU> look for outliers. That was the tenor of my comments
RU> about checking the independent variables.
That was patently FALSE, and I documented why, citing from
posts from the archives.
RU> I should have mentioned this before, because today
RU> (January) Bob is mis-claiming again.
I also documented why that was a completely false statement about
my claim, by Richard Ulrich.
If you can understand the issues, and have ANYTHING to say
about my rebuttal, by all means present it.
Otherwise, you have only proven the pre-kindergarten level of the
readership in these STATISTICS newsgroups, about Statistics.
The reason it's cross-posted in the three groups is because Richard
Ulrich has been making his blunders, as well as his false statements
about what I posted, in all three groups. That is not to mention
that Richard Ulrich has been peddling his quackery and malpractice
in all three groups for many years, without anyone pointing out his
ERRORS.
Irresponsible posts by posters like yourself are what encouraged RU
to fabricate, obfuscate, and provoke, in his attempts to sweep his
errors under the rug, or claim his blunders NOT to be blunders.
If you have ANYTHING constructive to say about the subject, comment on
the substance of my post on the issues of checking for "normality"
and "outliers" in the independent variables X in a regression problem.
If you can't find anything wrong with what I posted, then try to learn
the lesson of some COMMON ERRORS, usually committed by
beginning students in the practice of regression analysis, and not
fall prey to Richard Ulrich's malpractice in the future.
That's the best YOU can do. Your present post is completely
inappropriate in ANY newsgroup, because you are making
unsubstantiated allegations, void of any substance of the subject
matter.
For the record and for YOUR information, the case should have
rested (and did rest) since my last post on the subject on Dec 7, 2005,
until Richard Ulrich exhumed it on January 3, 2006.
He not only did not let the matter rest, but REPEATED his same errors
while making his claims about what I have posted on the subject!
-- Reef Fish Bob,
> Richard Ulrich wrote:
> > On 7 Dec 2005 04:54:29 -0800, "Reef Fish"
> > <Large_Nass...@Yahoo.com> wrote:
[snip, some high levels of indenting]
> > > Correctly cited, but with a serious omission that the "absolutely
> > > UNNECESSARY" part referred to the "normality check of the
> > > INDEPENDENT variables Xs" in a regression problem!
> >
> >
> > Okay. If that's a serious omission, then perhaps we
> > do not disagree. Bob just reads me wrong.
>
> Richard, give yourself a break in 2006.
>
> Stop repeating your errors and blunders while trying to make
> a case that "we do not disagree" and "Bob just reads me wrong".
> I vehemently disagreed with what you did, which you re-stated
> again below as if I agreed with your nonsense in your malpractice!
I was trying to "cut him a break" by not-imputing to
Bob something that seems just too silly. But he wants
to claim it. This is the first time, I think, that he addressed
the specific issue of outliers and bad data. (If I forgot it
being elsewhere, I apologize.)
If he is totally unwilling to look at his dataset before
analysis, he is either working with pre-cleaned data,
or he is going to waste a lot of time in a lot of
real-world settings. And he still might get it wrong.
That's my view, based on my experience.
IF that remains a real difference between us, I'm
willing to take the side that doesn't trust their data,
as opposed to the side that insists that the independent
measures are sacrosanct inside their "black box", not
to be examined except by direction from regression
diagnostics. Bob goes further in that direction below,
and again in some of what I snip, far below.
> >
> > That sounds like Bob agrees with me, that it is okay to
> > look for outliers. That was the tenor of my comments
> > about checking the independent variables.
>
> Richard, when are you going to LEARN the most BASIC
> material in regression? Your comment about checking the
> independent variables was PRECISELY the result of your
> muddle about the standard regression assumptions, what
> can be checked and what is ABSOLUTELY UNNECESSARY
> to check -- the independent variables X, for outliers or
> anything else other than blatant typo errors.
>
> If there is any outlier in X, it could be a point of ZERO
> influence (good or bad) on the fitted model because it
> could lie perfectly on the fitted regression model and its
> removal would not change the fit in any way!!! That
> is why it is a complete waste of time to check the
> independent variables for any PROBABILITY assumption
> (because there's none about the X's), or for any of their
> values being an outlier in the distribution of that variable:
> until you have fitted SOME model,
> there is absolutely and positively NOTHING you can
> gain by checking on anything about the distribution of
> the data values in the X's.
>
> You never understood why you were giving sehwail the
> bum advice on checking first the "normality" of the X's and
I've posted elsewhere, giving in full my post of Nov. 7,
which says LOOK at the data. Normality is only one sort
of baseline; if that wasn't clear immediately, which I thought
it would have been, I made it clear later. The thread starts at
sehwail's post,
http://groups.google.com/group/sci.stat.math/msg/92c8b7453f1a4072
> then toned it down to checking for "outliers" when I had
> already pointed out to sehwail (and YOU) that it was a complete
> waste of time to do so before any model had been fitted!
>
So. Bob seems to endorse "black box" acceptance of
whatever the world hands him as predictor variables,
until he has regression diagnostics in hand.
> BOTH of these are your blunders, Rich Ulrich!!
I'll stand by my claim: "misquotation" accounts for the first.
And if that's not enough, I'll restate that I *don't* insist
on normality for predictors. What is Bob trying to prove?
- a "blunder" in how I stated something? I've surely done
that in a number of places, though not, I think, here.
And I'll stand by my own Statement of the Issue, above, for
the second -- as a choice for the reader. If Bob accepts
the statement.
>
> What's left? NOTHING. You have managed to make
> blunders in every aspect of regression analysis -- from
> the unwarranted checking the independent variables, before
> any fit is attempted, to the misinterpretation of the SIGNS
> of the regression coefficients, to the use of regression
hm... Bob never replied to my recent challenge, to
distinguish my argument of SIGNS from the more recent
argument of Jerry Dallal. I consider Bob to be amply "refuted"
if the point is that I am doing something unusual.
If the reader wants to make a choice, it would be between
Bob's view, and "the practice of conscientious social
scientists and epidemiologists everywhere." Right, Bob?
Please, do make it clear that I'm the one who stands for the
good practice of the conventional view.
Then "Bob's errors" as a consultant and data analyst must
include (a) his refusal to look at variables to see that they
have an adequate distribution to support useful inference;
(b) his refusal to accept *any* inference based on observational
data, no matter how well supported by other data and other
arguments. To be petty, I suppose I could add on, Bob's recent
"blunder" in describing the meaning of "independent" in the
phrase "independent variable".
> results on uncontrolled observational data to draw causal
I've always been more aware than most people, about drawing
conclusions from uncontrolled observations. There are
dozens (hundreds?) of examples where I've warned people
about their designs -- as it happens, I did it again today
(about "cataracts").
> inference, to the use of correlations between Y and X to
> draw unwarranted causal inference.
I've never advocated "unwarranted causal inference."
Bob has seldom been willing to discuss what it is that might
warrant causal inference, except for giving two textbook
citations -- whereafter, each time, everyone disagreed with
Bob's reading of them.
(Bob reads them to the effect that it is never warranted.)
Curiously enough, Bob has cited one study of speed limits
where he liked the results, that the gas-saving limits on
Interstates did not save lives. But that was separate from
the question of drawing conclusions.
[snip, additional rehashing and repetition. This post
is already too long.]
> > Richard, when are you going to LEARN the most BASIC
> > material in regression? Your comment about checking the
> > independent variables was PRECISELY the result of your
> > muddle about the standard regression assumptions, what
> > can be checked and what is ABSOLUTELY UNNECESSARY
> > to check -- the independent variables X, for outliers or
> > anything else other than blatant typo errors.
RU> I've posted elsewhere, giving in full my post of Nov. 7,
RU> which says LOOK at the data. Normality is only one sort
RU> of baseline; if that wasn't clear immediately, which I thought
RU> it would have been, I made it clear later.
Richard Ulrich's tedious rehash of his blunder was explained in
my January 5 post: http://tinyurl.com/9bzvu
The essence of Ulrich's blunder can be summarized here:
======= excerpt
The independent variable X in a multiple regression CAN be any one
of these:
1. An indicator variable with values 0 or 1.
2. A discrete uniform distribution of ranks.
3. A distribution that came from Cauchy or other long tail
distributions that would appear to have outliers (compared
to "normal")
4. The distribution of an observed X can be severely bimodal,
trimodal, left skewed or right skewed ... and in short any
distribution that has ever been observed in the entire history
of statistical distributions that are NOT NORMAL, can be
the distribution of the X used in any multiple regression.
So, why did sehwail and Richard Ulrich want to check the "normality"
or "outliers" of the data distributions in the INDEPENDENT
variables X?
========== end excerpt
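The four kinds of X listed in the excerpt can be tried out in a short simulation. This is a minimal Python sketch in which every number is an assumption: the sample size, the true intercept and slope, and the particular non-normal distributions are invented for illustration, and the errors are taken to be independent standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Four non-normal predictors mirroring the list in the excerpt (simulated).
predictors = {
    "indicator": rng.integers(0, 2, size=n).astype(float),
    "ranks": rng.permutation(n).astype(float) + 1.0,
    "cauchy": rng.standard_cauchy(size=n),
    "bimodal": np.concatenate(
        [rng.normal(-5.0, 1.0, n // 2), rng.normal(5.0, 1.0, n - n // 2)]
    ),
}

true_intercept, true_slope = 1.0, 2.0
estimates = {}
for name, x in predictors.items():
    # Same error model each time; only the distribution of X changes.
    y = true_intercept + true_slope * x + rng.normal(0.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[name] = beta[1]
    print(f"{name:10s} slope estimate = {beta[1]:.3f}")
```

With well-behaved errors, least squares recovers the slope in each case. The sketch says nothing about whether looking at X beforehand is useful for other reasons, which is the part of the argument that remains in dispute.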
If any and all of those X's are perfectly valid data for the
independent variables in a regression problem, why would anyone
except those seriously muddled, like Richard Ulrich, make such
NONSENSE statements:
RU> Normality is only one sort
RU> of baseline; if that wasn't clear immediately, which I thought
RU> it would have been, I made it clear later.
"Normality" is NEVER a part of the baseline for the independent
variables X!
-- Reef Fish Bob.
> Richard Ulrich wrote:
> -- Reef Fish Bob.
Now I'm only a long retired amateur, but here's what I used to do:-
On first receiving data, check visually for absurdities if possible.
The first computing operation was to generate Box and Whisker diagrams
for all variables in the set, with the scales arranged independently so
that each BW plot occupies the full screen width. This gives an
immediate oversight of the data, showing up any possibly "ridiculous"
values and of course "strange" ones if one was anticipating roughly
normal data. However, the points from a designed experiment, such as
an RCC design are certainly not going to be normally distributed, which
is totally immaterial as far as regression techniques are concerned.
The next step was to make the entire set of PLOTS of all the variables
two at a time, eg with 6 variables (dependent and independent) there
would be 15 plots. It is useful to have them all on one page (or
screen). This technique highlights possible "bivariate strangeness"
and can provide useful information for the data detective.
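The count of 15 plots for 6 variables is just the number of unordered pairs. A minimal Python sketch, with hypothetical variable names, enumerates them:

```python
from itertools import combinations
from math import comb

# Hypothetical names: one dependent variable (y) and five independent ones.
variables = ["y", "x1", "x2", "x3", "x4", "x5"]

# Every unordered pair of variables gets one scatter plot.
pairs = list(combinations(variables, 2))
print(len(pairs), "plots for", len(variables), "variables")
for a, b in pairs:
    print(f"plot {a} vs {b}")
```

In general k variables give k(k-1)/2 plots, so the single-page layout becomes crowded quickly as k grows.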
"Absurd" data highlighted by these preliminaries can probably be
discarded but only after consulting the owner of the data.
Only then go ahead with fitting the hypothesised model, by linear
regression methods. Be sure to compute the standard regression
diagnostics and pay attention to points that produce the largest
diagonals of the Hat matrix. They might just be important! Re-check
the original data, then report to the owner. I also used to compute
jackknife residuals for my own satisfaction, but found them tricky to
explain to the typical "bench scientist"! They were more likely to be
intrigued by VIFs, which help highlight (multi)collinearity.
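The diagnostics named above (hat-matrix diagonals, jackknife residuals, VIFs) can be computed from first principles. This is a minimal Python sketch on simulated data. The variable names, the near-collinear construction of `x2`, and the helper `vif` are assumptions for illustration, and "jackknife residual" is taken to mean the externally studentized residual.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50

# Simulated predictors; x2 is built to be nearly collinear with x1.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
y = 1.0 + x1 + x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
p = X.shape[1]

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Ordinary residuals, then jackknife (externally studentized) residuals,
# using the leave-one-out error variance for each observation.
resid = y - H @ y
sse = resid @ resid
s2_deleted = (sse - resid ** 2 / (1.0 - leverage)) / (n - p - 1)
jackknife = resid / np.sqrt(s2_deleted * (1.0 - leverage))


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from regressing
    column j on all the other columns (intercept column included)."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r = X[:, j] - others @ coef
    tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return tss / (r @ r)


vifs = [vif(X, j) for j in range(1, p)]
print("largest leverage:", leverage.max())
print("largest |jackknife residual|:", np.abs(jackknife).max())
print("VIFs (x1, x2, x3):", vifs)
```

The leverages sum to the number of fitted coefficients, and the two collinear predictors show large VIFs while the unrelated one stays near 1.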
Perhaps these days (>20 years on) these notions have been superseded,
but they used to work quite well for me :-)) , or so I fondly believed.
Robin
DAH
But not necessarily correct, and often the wrong thing to do.
> A small post regression fit error may actually be an
> influential data point that should be rejected.
That would be a very naive approach. "Rejecting" or "Discarding" a
data point without compelling justification (beyond that of "better fit")
is a statistical crime, IMHO, and that of most statisticians I know.
I mentioned an example by Max Woodbury, in which the influential data
point that DIDN'T fit was the most important piece of data in the model
he was trying to fit to the data.
The "accommodation" of outliers (in the residuals) or influential
observations must be carefully considered on a case by case basis.
> Rousseeuw has done a lot of
> work in these areas. Discarding data based on the smallest determinant of
> the resulting covariance matrix is another preregression method.
If Rousseeuw did not give much more compelling SUBSTANTIVE
reasons for his discard of data, his article would never pass any of
the journals for which I have acted as referee for submitted articles.
> For example
> Rousseeuw's ROBCA (a form of PCA) can be used to identify a robust subspace
> (data points), I would not consider PCA as regression.
PCA regressions have never been justified. Hadi and I (1998)
reasoned that PCA is always the wrong thing to do for whatever
the given reasons are,
http://www.amstat.org/publications/tas/index.cfm?fuseaction=hadi1998
The main, and the only needed, reason for NOT checking any of the
independent variables X for normality or outliers is the FACT that
in the standard (conditioned on X) regression setup, NOTHING is
assumed about the distribution of X -- that is, ANY distribution is
perfectly valid except for those Xs with known errors in data
recording.
-- Bob.
[much snipped]
> The main, and the only needed, reason for NOT checking any of the
> independent variables X for normality or outliers is the FACT that
> in the standard (conditioned on X) regression setup, NOTHING is
> assumed about the distribution of X -- that is, ANY distribution is
> perfectly valid except for those Xs with known errors in data
> recording.
>
> -- Bob.
I would not normally post to this thread, but I wanted to recognize
that Reef Fish's statement above is correct and very succinctly put.
To follow up on "Xs with known errors in data recording": plotting of
independent variables can be valuable for identifying Xs with
systematic error (e.g. a set of lab values from one site that
are all out by an order of magnitude).
--
Kevin E. Thorpe
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
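The kind of systematic error described above (one site's lab values off by an order of magnitude) can also be caught with a simple numeric summary alongside a plot. This is a minimal Python sketch with simulated data. The site labels, the factor of 10, and the cutoff ratios of 5 and 0.2 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated lab values from four hypothetical sites; site "D" is assumed
# to have been recorded in different units, making it 10 times too large.
sites = np.repeat(["A", "B", "C", "D"], 25)
values = rng.normal(loc=5.0, scale=0.5, size=100)
values[sites == "D"] *= 10.0

overall_median = np.median(values)
flagged = []
for site in np.unique(sites):
    site_median = np.median(values[sites == site])
    ratio = site_median / overall_median
    print(f"site {site}: median = {site_median:.2f}, ratio = {ratio:.2f}")
    # Flag a site whose median differs from the overall median by roughly
    # a factor of 5 or more in either direction (arbitrary cutoff).
    if ratio > 5.0 or ratio < 0.2:
        flagged.append(str(site))

print("flagged sites:", flagged)
```

A flag like this only says where to look. Whether the values are truly in error is a question for whoever owns the data.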
> Reef Fish wrote:
>
> [much snipped]
>
> > The main, and the only needed, reason for NOT checking any of the
> > independent variables X for normality or outliers is the FACT that
> > in the standard (conditioned on X) regression setup, NOTHING is
> > assumed about the distribution of X -- that is, ANY distribution is
> > perfectly valid except for those Xs with known errors in data
> > recording.
> >
> > -- Bob.
>
> I would not normally post to this thread, but I wanted to recognize
> that Reef Fish's statement above is correct and very succinctly put.
Just for clarity's sake -- The reason for "NOT checking" is
not what separates Bob and me. I have agreed several times
that there is not a statistical requirement. (If you want
another name, call mine a requirement for efficient practice.)
And if someone tells you that there is an error in your data,
you should deal with it. Does Bob look, himself, to find
bad values? That seems to be a sneak-peek across his barrier.
I see further wisdom and time-saving utility in various
reasons for *checking*, which I have described elsewhere
(read the last two paragraphs of "Bob Ling vs Normality",
http://groups.google.com/group/sci.stat.math/msg/12dfe550e4c3287e )
and Kevin mentions one more.
> To follow up on "Xs with known errors in data recording": plotting of
> independent variables can be valuable for identifying Xs with
> systematic error (e.g. a set of lab values from one site that
> are all out by an order of magnitude).
As I read him, Bob would insist that you discover that sort
of error from your regression and regression diagnostics.
On 4 Jan 2006 22:01:50 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
>
> Richard Ulrich wrote:
>
>
> > > Richard, when are you going to LEARN the most BASIC
> > > material in regression? Your comment about checking the
> > > independent variables was PRECISELY the result of your
> > > muddle about the standard regression assumptions, what
> > > can be checked and what is ABSOLUTELY UNNECESSARY
> > > to check -- the independent variables X, for outliers or
> > > anything else other than blatant typo errors.
>
> RU> I've posted elsewhere, giving in full my post of Nov. 7,
> RU> which says LOOK at the data. Normality is only one sort
> RU> of baseline; if that wasn't clear immediately, which I thought
> RU> it would have been, I made it clear later.
>
> Richard Ulrich's tedious rehash of his blunder was explained in
>
> my January 5 post: http://tinyurl.com/9bzvu
>
> The essence of Ulrich's blunder can be summarized here:
I guess Bob is concentrating on trying to show that
I stated something ambiguously enough to be "wrong"
and a "blunder" because I surely agree that a predictor
variable can look like anything....
I will continue to think that my writing was clear enough
that only Bob would misread it, unless someone else
mentions it.
>
> ======= excerpt
> The independent variable X in a multiple regression CAN be any one
> of these:
>
> 1. An indicator variable with values 0 or 1.
>
> 2. A discrete uniform distribution of ranks.
>
> 3. A distribution that came from Cauchy or other long tail
> distributions that would appear to have outliers (compared
> to "normal")
- I'd like to see a discussion of what "Cauchy" errors
can do to a correlation or regression. I don't remember
ever reading or talking about that, anywhere.
>
> 4. The distribution of an observed X can be severely bimodal,
> trimodal, left skewed or right skewed ... and in short any
> distribution that has ever been observed in the entire history
> of statistical distributions that are NOT NORMAL, can be
> the distribution of the X used in any multiple regression.
>
> So, why did sehwail and Richard Ulrich want to check the "normality"
> or "outliers" of the data distributions in the INDEPENDENT
> variables X?
> ========== end excerpt
Why check for outliers?
I did explain that, I think, in the post "Bob vs Normality."
http://groups.google.com/group/sci.stat.math/msg/12dfe550e4c3287e
Read the last two paragraphs of the post.
We want decent distributions if we intend to draw
inferences about "similar" sets of data. An extreme
outlier defies easy generalization.
Bob never responded to my argument, except by repeating
the statement that I do agree with. It is not "necessary."
With the hope of deriving or implying "meaning" from a
regression, we should care that the metric is meaningful,
and that the ranges of the variables are meaningful and useful.
Further, I suppose I should add, we should care about
the choice of variables to be used.
And sometimes we have to do the regression with variables
that we don't like. In that case, we use the knowledge of
their distributions to help interpret the residuals, etc.
For instance, a dichotomy as a strong predictor does leave
lumpy residual plots.
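A small sketch of the "lumpy" effect, under made-up assumptions (NumPy, an arbitrary seed, a dichotomy with coefficient 6 and one weak continuous predictor): the fitted values split into two separated clusters, so a residual-versus-fitted plot shows two lumps rather than one swarm.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
n = 300                                   # hypothetical sample size
d = rng.integers(0, 2, n).astype(float)   # strong dichotomous predictor
x = rng.normal(0, 1, n)                   # weak continuous predictor
y = 6.0 * d + 0.2 * x + rng.normal(0, 1, n)

# Least-squares fit of y on intercept, d and x.
design = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
residuals = y - fitted

# Count fitted values that land between the two clusters.
in_gap = int(np.sum((fitted > 2.5) & (fitted < 3.5)))

print(f"fitted range, d=0: {fitted[d == 0].min():.2f} to {fitted[d == 0].max():.2f}")
print(f"fitted range, d=1: {fitted[d == 1].min():.2f} to {fitted[d == 1].max():.2f}")
print("fitted values in the gap between the lumps:", in_gap)
```

Knowing that the predictor is a dichotomy is what lets one read the two lumps as expected structure rather than as a problem with the fit.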
Using good variables is part of a process of applying good
"statistical control" to an analysis (for one thing), whenever
a strictly controlled, designed study is not possible.
If you don't care about the variables (what they measure,
and how well they measure it, and the problems observed
in the sample on hand), then you are stuck with Bob's further
position -- that you should never use any inferences derived
from the coefficients.
If you know that the variables are "bad" in some sense, you
also have to be shy about drawing conclusions.
[snip]
Robin, I remember your participation in the re-analysis of the
regression data set in the 1975 SPSS Manual.
You were so honest in your individual attempt that you missed the BIG
obvious data RECORDING (typo) errors that were also missed by SPSS,
while that was one of the "lessons" intended -- to look for obvious
keypunch or typing errors.
> The first computing operation was to generate Box and Whisker diagrams
> for all variables in the set, with the scales arranged independently so
> that each BW plot occupies the full screen width. This gives an
> immediate oversight of the data, showing up any possibly "ridiculous"
> values and of course "strange" ones if one was anticipating roughly
> normal data.
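The box-and-whisker screening described above can be sketched numerically with the usual 1.5 x IQR whisker rule. This is an illustrative Python sketch only: the function name `tukey_outliers` and the data are hypothetical, with 470 standing in for a keypunch error.

```python
import numpy as np


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Hypothetical ages; 470 stands in for a keypunch error (47 typed as 470).
ages = [34, 41, 29, 52, 47, 38, 45, 470, 31, 36, 44, 39]
flagged = tukey_outliers(ages)
print("values beyond the whiskers:", flagged)
```

A flagged value is a prompt to go back and check the record, not by itself a reason to drop it.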
If the data is to be examined entirely from an "exploratory" view,
those may well be a part of the routine exploration. But if someone has
already decided to do a fitting of a particular Y on several X that had
already been chosen for the regression task, then your paragraph below
is more to the point:
> However, the points from a designed experiment, such as
> an RCC design are certainly not going to be normally distributed, which
> is totally immaterial as far as regression techniques are concerned.
>
> The next step was to make the entire set of PLOTS of all the variables
> two at a time, eg with 6 variables (dependent and independent) there
> would be 15 plots.
Again, that depends on whether it's an unfocused "exploratory" analysis,
or a much more focused multiple regression task of fitting a
high-dimensional surface.
In the latter case, the "scatter matrix" is what your pairwise
scatterplots are called, and it is not very useful toward discovering
the fitting surface in HIGHER dimensions, just as two-dimensional
scatterplots do not reveal the relation in three dimensions.
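A deliberately extreme, constructed example of that last point, as a Python sketch (NumPy assumed; seed, sample size and noise level are arbitrary choices). Y is an exact linear function of X1 and X2, yet each pairwise correlation with Y is close to zero, so the two Y panels of a scatter matrix would show almost nothing. The first line also checks the "15 plots for 6 variables" count from the quoted text.

```python
from itertools import combinations

import numpy as np

# Six variables give C(6, 2) = 15 pairwise plots, as in the quoted text.
n_pairs = len(list(combinations(range(6), 2)))

rng = np.random.default_rng(2)     # arbitrary seed
n = 2000                           # hypothetical sample size
x1 = rng.normal(0, 1, n)
x2 = -x1 + rng.normal(0, 0.1, n)   # x2 nearly mirrors x1
y = x1 + x2                        # exact linear function of (x1, x2)

# What the two Y-versus-X panels of a scatter matrix would summarize.
r_y_x1 = np.corrcoef(y, x1)[0, 1]
r_y_x2 = np.corrcoef(y, x2)[0, 1]

# What the joint (multiple regression) fit sees.
design = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
r_squared = 1.0 - resid.var() / y.var()

print("pairwise plots for 6 variables:", n_pairs)
print(f"corr(Y, X1) = {r_y_x1:.3f}, corr(Y, X2) = {r_y_x2:.3f}")
print(f"R-squared of Y on (X1, X2) = {r_squared:.4f}")
```

Real data are rarely this clean, but the same mechanism is why a near-zero pairwise r-square does not settle what a variable contributes in the full model.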
> It is useful to have them all on one page (or
> screen). This technique highlights possible "bivariate strangeness"
> and can provide useful information for the data detective.
But not necessarily for one performing a pre-specified regression
with specified variables.
>
> "Absurd" data highlighted by these preliminaries can probably be
> discarded but only after consulting the owner of the data.
No, there hasn't been any indication why your "absurd" data should
be discarded at all! The owner of the data may be the least
qualified person to know that it is a crime to callously throw away
data because they appeared unusual to their untrained eye.
>
> Only then go ahead with fitting the hypothesised model, by linear
> regression methods. < snip >
This is where the iterative "model building" approach in George Box's
JASA paper on "Science and Statistics" takes over.
> Perhaps these days (>20 years on) these notions have been superseded,
> but they used to work quite well for me :-)) , or so I fondly believed.
I wouldn't put it your way. What used to work well (> 30 years ago --
as when I started teaching them; and even earlier by those before my
time) still works well today, and none any better, except for some minor
advances in graphical techniques.
The problem is that many folks never knew or learned what worked
well and how the model-building process is supposed to work, as is
evident from much that has been posted in these sci.stat groups.
-- Bob.