"Richard Ulrich, the sci.stat.math resident Quack, is
beating his Dead Horses AGAIN"
Given the above, and since MOST of Ulrich's statistical
malpractice was done in the sci.stat.edu newsgroup,
I have included sci.stat.edu in this posting to WARN
all readers of both groups to be aware of Richard
Ulrich's incompetence and quackery as a statistician.
Richard Ulrich wrote:
> Well, giving Bob the full previous text to work with
> sure did not work, in terms of obtaining much
> useful commentary. Now, the opposite approach,
> more like I usually do, performed on the Reply.
Don't you think those readers who are interested or capable
know how to READ the post AND thread to see for themselves
what you and I and the OP sehwail have said to decide for
themselves without YOUR rehash, out of context AND
distorted?
You only pasted what *I* wrote, without the CONTEXT of
my comments, and that is "out of context".
Even your SUBJECT is grossly distorted AND out of context.
>
> Here's an annotated copy of new lines, with all previous
> lines deleted. The deletions don't seem to hurt, to speak of.
What do you mean by "Bob Ling vs normality"? I don't use "Bob
Ling" when I post, because there are hundreds of other
Bob Lings on the internet. For NETIQUETTE, you should
refer to my post as "Reef Fish", which is unique in all
newsgroups since Google has tracked posts since 1981; and
anyone who has read these groups for more than a thread or
two in which I participated would have known that Reef
Fish is THIS "Dr. Robert F. Ling", which doesn't require
any of your gratuitous change of an ongoing SUBJECT
thread, to make your NOISE, out of context.
As for your "vs normality": I was making the point to sehwail
and to Richard Ulrich that when NORMALITY is NOT
required of the independent variables, say, in a regression
problem, it's a complete waste of time to examine the
"normality" of those variables!
> ==========start of Bob Ling's reply. Annotated briefly.
Readers can read my post AND the thread to read what
I said, WITH the context in which sehwail AND Richard
Ulrich BOTH erred, in matters of examining independent
variables for normality!
I snipped the rest of Richard's out-of-context quotes
except those relevant to my explanation of the LACK of
NECESSITY in the portions in which normality was
discussed by sehwail and Ulrich, because they are
BOTH very confused and muddled about where the
"normality assumptions" in the usual regression
context occur and NEED to be examined.
> On 23 Nov 2005 19:56:13 -0800, "Reef Fish"
> <Large_Nass...@Yahoo.com> wrote:
THAT's how you should have referred to the post in question,
and earlier posts!
> > Perhaps. But he certainly does not need Richard Ulrich to make
> > the same blunder he did as in the "normality" and "outlier" issues,
> > not to mention the serious omission!
> >
> > Isn't it convenient to gloss over those errors of yours, Richard?
Richard should have replied HERE, if he has anything relevant
to say this round.
> =====3 lines distortion, seeming to invite reprise of comments on
> > This comment was actually Richard Ulrich's self-recommendation
> > as a consultant, because he has lost his job and is looking for one
> > NOW.
Richard was using his insult of MY reply to brag about HIS
experience in consulting, etc., etc., and immediately followed
it with his post about him "looking for a job".
> >
> > I can give anyone looking for a consultant for STATISTICAL
> > advice on theory, methodology, and practice that hiring
> > Richard Ulrich for such a job would be like hiring a drowning
> > victim in a shallow pool to be a Life Guard in a deep ocean!
That's MY Public Service Announcement to the public about
Richard Ulrich's lack of statistical knowledge and his proven
record of making statistical BLUNDERS in what he posted.
> >
> > I mean it SERIOUSLY, based on the numerous posts by
> > Richard Ulrich I've read and discussed during a period of
> > about 6 months of participation in sci.stat.math/edu groups.
> >
> > The evidence was AMPLE and Unequivocal!
>
> > Everyone (even Richard Ulrich) could see that sehweil was a
> > "struggling beginner", and I advised him accordingly, after giving
> > him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> > in my initial response:
> [3 lines previous]
> > But Richard Ulrich failed to recognize his OWN "struggling beginner"
> > status on the subject, and tried his "blind leading the blind trick"
> > to which Richard has grown accustomed.
Examining the INDEPENDENT variables for "normality" was ONE
specific example of sehwail's and Ulrich's SAME blunder.
> > People ask the wrong question(s) all the time. Educator or
> > consultant, there is NO EXCUSE for compounding his wrong act
> > of trying to look for normality in the scatterplots by adding
> > YOUR own wrong acts/advice of looking for outliers and that
> > "normality is not essential" when it was absolutely UNNECESSARY!
Again, that referred to the examination for "normality" in the
variables that are NOT supposed to be normal. In fact, those
variables can even be categorical or nominal!
> > My comment to Richard Ulrich stands, and is clarified and amplified
> > in this post, relative to sehweil's problem in the sehweil thread.
> >
> > -- Bob.
> =========end of Bob Ling's reply. <shown out-of-context>
I added the appropriate CONTEXT of what Richard Ulrich
cited, regarding the relevance to the "normality" issue, that
anyone who READ the post in its entirety would have seen
the same reason I REPEATED this time, to point out how
wrong Richard Ulrich was, and IS.
> I count 6 lines of useful comment, 4 lines of curious comment,
> and 5 lines of summary. Plus 32 lines of insult, etc., and a few
> lines of miscellaneous. My reader reported that as a 232 line post.
> (Does Bob consider himself a professional of some kind?)
Instead of spending your time counting lines, and citing
them OUT-OF-CONTEXT, why didn't you spend it learning something
from textbooks about Regression Analysis and Model Building?
Reef Fish Bob is a statistical professional, with a Ph.D. (which
Richard does not have) in the subject of Statistics. Reef Fish
was a Full Professor in 1977 (in statistics), while Richard
Ulrich was an Assistant Prof, at age over 50, in a Department
of Psychiatry, and has since lost his job.
Ulrich> (Does Bob consider himself a professional of some kind?)
If Reef Fish Bob didn't, plenty of others do:
Reef Fish Bob was elected Fellow of the ASA by his peers
in 1984.
Reef Fish Bob was recognized by the publishers of "Who's
Who in the World" (Marquis publication) in the 1984 edition to be
a "statistical educator and consultant" with a citation (in
that Edition) that was LONGER than the citation for
Ronald Reagan (when he was President of the US at
the time), or Bill Clinton (who was only Governor of
Arkansas at that time).
Ulrich> (Does Bob consider himself a professional of some kind?)
Enough of a statistical professional to voice in a public
forum (sci.stat.math and sci.stat.edu) in the subject of
STATISTICS how blunders are made by a few who claim
themselves to be statisticians, while they have shown
nothing more than their statistical Quackery!
>
> I'll reply to the "curious comment" and to the summary.
> Here are the 4 lines again --
>
> > Everyone (even Richard Ulrich) could see that sehweil was a
> > "struggling beginner", and I advised him accordingly, after giving
> > him a LENGTHY explanation of his (sehweil's) errors to sehweil,
> > in my initial response:
>
> "... advised him accordingly"?
> Bob advised him to take an elementary course in data analysis,
> and advised him to ignore my comments and to ignore the
> initial R-squareds. I still don't see much more.
That's because of Richard Ulrich's LACK of understanding of
the subject of Regression Analysis!! Period! An elementary
course in Data Analysis, taught by a competent statistical
professional, would have set sehwail and Ulrich straight, given
my pointers.
Did Richard Ulrich think I should write a textbook in this
newsgroup to teach HIM and sehwail how to do a regression
analysis and model building problem PROPERLY?
Ulrich> Bob advised him to take an elementary course in data analysis,
That was Ulrich's OUT-OF-CONTEXT distortion! This was the
context in my ORIGINAL reply to sehwail, regarding the building
of a prediction model with FOUR indep. variables, to sehwail,
AFTER pointing out his errors, including specifically his improper
use of normality:
RF> This is where the ART of model building in multiple-linear
RF> regression takes over from the SCIENCE of the methodology,
RF> because there is nothing in the science that is adequate in
RF> telling one what to do other than some "guided iterations" of
RF> trial and error, as discussed in George Box's JASA article
RF> on "Science and Statistics", and the course material in
RF> "Data Analysis" taught by many statisticians, including myself.
RF> In your case, I strongly advise taking an elementary course
RF> in applied data analysis in model building, and not try to
RF> do-it-yourself after reading a few too-simplistic articles or
RF> posts in newsgroups.
> I still see Bob's recommendations as beyond the reach of
> most struggling beginners. I looked back at it the other day, and
> it was more complete than I remembered. It is a nice 'road map'
> (as Bob called it) for someone who has been down that road
> before, but it is not (I think) self-explanatory to beginners.
What else could I have said, other than pointing him to an
ELEMENTARY course he should have taken? To say that
recommendation is beyond sehwail's reach is something I take
as Ulrich's INSULT to sehwail, because sehwail had said:
sehwail> I have read several books on linear regression and
sehwail> infinite number of articles and presentation but I am
sehwail> still not sure if what I am doing correct or not
It appears (at least according to HIS claim, apart
from the "infinite" number of articles) that sehwail had read more
about statistics than Richard Ulrich did! He was simply
LACKING the GUIDANCE of a competent statistician -- which
is yet another reason for pointing him to take a COURSE in
Data Analysis.
>
> If Sehweil had said, "Wow, thanks!" , that would have proven it
> as *fitting* advice for this OP. He didn't. I still see it as
> advice by someone unaccustomed to such beginners.
I gave YOU (Richard Ulrich) the same sound advice (for
"beginners" <because you ARE one>) in
many statistical subjects, pointing you to specific books
and journal articles -- which Ulrich never read or understood.
Isn't that the same as giving sehwail the same APPROPRIATE
advice?
>
> Sehweil responded to mine, and gave additional details
> that I pointed out (Nov 12) as including totally wrong
> readings of whatever listings he had. No replies by him
> after that one.
Sehweil thanked Richard (as mere courtesy), to explain
to Richard his further blunders (which he didn't know at
the time).
That was merely a case of the Blind (Ulrich) leading the
Blind (sehwail).
AFTER Richard's FINAL response (to which *I* had replied by
pointing out Rich Ulrich's blunders, as well as by giving sehwail
my detailed response with the "roadmap" which
Richard Ulrich failed to follow), isn't that good reason for
"No replies by him <sehwail> after that one"?
Sehwail is welcome to reply to this or other posts of mine
on his topic. But isn't it reasonable to assume that
sehwail UNDERSTOOD and FOLLOWED my advice
better than Richard Ulrich did?
> Bob's 5 lines of "summary" --
> > People ask the wrong question(s) all the time. Educator or
> > consultant, there is NO EXCUSE for compounding his wrong act
> > of trying to look for normality in the scatterplots by adding
> > YOUR own wrong acts/advice of looking for outliers and that
> > "normality is not essential" when it was absolutely UNNECESSARY!
>
> Bob says, I guess, "don't worry about outliers, and normality
> is 'absolutely UNNECESSARY!'" For normality, that's posed
> against my view that "normality is 'not essential' - but I do
> think it is nice to hear about." I think Bob is ill-advised
> in his scorn.
I spoke against ONLY the "normality" check of the INDEPENDENT
variables, or the AGGREGATE of the observed Ys.
Richard Ulrich is merely amplifying his own muddle and calling
my scorn "ill-advised".
< Richard Ulrich's self-advertisement as a statistical consultant
snipped>
> I did 1000+ statistical consultations in 20 years. I saw a
> lot of rank amateurs and a lot of 'silly mistakes.')
I had done that over 20 years AGO -- and most of those 1000
were my graduate STUDENTS, in courses in Data Analysis,
in which they were taught how to do it CORRECTLY!
> When a single outlier accounts for (say) 50% of the variance,
> modeling with any ANOVA technique is hardly ever wise,
> and even less so if the modeler doesn't know about it.
Ulrich CONTINUED his muddle!! Outliers in the
INDEPENDENT variables are to be taken care of by the
appropriate FITTING MODEL, so that there's no outlier
in the RESIDUALS.
The accommodation could be by an indicator variable for
a one-time occurring event or through other methods.
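A minimal Python sketch of the indicator-variable accommodation, for
illustration only. The simulated data, the seed, the size of the
one-time shift, and the helper names solve and ols are all assumptions,
not anything posted in this thread. It shows one well-known property:
giving a single observation its own 0/1 indicator forces that
observation's residual to zero and yields the same intercept and slope
as fitting without that observation.

```python
import random


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: a straight line plus noise, with one one-time event.
rng = random.Random(1129)
n = 40
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [3.0 + 1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
event = 7          # index of the one-time event (assumed known)
y[event] += 25.0   # the event shifts this single response

plain = ols([[1.0, xi] for xi in x], y)
with_dummy = ols(
    [[1.0, xi, 1.0 if i == event else 0.0] for i, xi in enumerate(x)], y
)
dropped = ols(
    [[1.0, xi] for i, xi in enumerate(x) if i != event],
    [yi for i, yi in enumerate(y) if i != event],
)

resid_plain = y[event] - (plain[0] + plain[1] * x[event])
resid_dummy = y[event] - (
    with_dummy[0] + with_dummy[1] * x[event] + with_dummy[2]
)

print("residual at event, no indicator:  ", round(resid_plain, 3))
print("residual at event, with indicator:", round(resid_dummy, 3))
print("slope with indicator vs. dropped: ", with_dummy[1], dropped[1])
```

Whether an indicator is the right accommodation depends on whether the
event really is a one-time occurrence; the sketch only shows the
mechanics.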
Richard Ulrich continues to be confused and muddled
about the "normality" assumption, which is on the ERRORS
rather than on the predictor (independent) variables
themselves.
>
> Further, it is a neat fact concerning observed data (as opposed
> to data sampled to fill a design) that the transformation that
> produces normality often will produce (a) homogeneity of
> variance of its own errors, and also (b) linearity with various
> co-factors. So, normality is nice to know about,
NOT on the independent variables!!!!!!!!!!!!!!!!!!!!!!!!!!
Normality is on the ERRORS, in a fitted model.
Sehwail did NOT discuss the normality of the residuals,
only his independent variables X1, X2, .. X4.
As the saying goes, "before Richard opens his mouth,
there is doubt as to the depth of his ignorance; but as
soon as he opens his mouth, he removes ALL doubt"!
Richard Ulrich's REPEAT of his own blunder, in the
examination of NORMALITY in the independent
variables, which could even be NOMINAL, leaves
absolutely NO DOUBT, that he is an incompetent
Quack (together with his numerous errors and
blunders in other threads in regression analysis
and linear models) in the subject of statistics!
> because its
> absence elevates the potential for other problems -- either
> in modeling or in eventual testing.
Richard obviously did not read or understand the Box
article on "Science and Statistics" I recommended to sehwail,
and had recommended to Ulrich many times, every time
Richard made some blunder in a regression problem.
> So, you can get along
> without it, and without knowing about it. But for Bob to give
> us the reaction that he did, IMHO, is probably a function
> of his inexperience. Narrow horizons. Not looking ahead.
Coming from an unemployed former Asst Prof. in a Department
of Psychiatry, and champion of posting statistical blunders
in his own malpractice and statistical quackery! Richard
Ulrich thinks I need his endorsement to be a
statistician, or that I dread his unfounded insults and criticisms.
> --
> Rich Ulrich, wpi...@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
Give it up, Richard. Do the statistical community a FAVOR by
keeping your mouth firmly SHUT on subjects about which you
know nothing!
As the saying goes, "before Richard opens his mouth,
there is doubt as to the depth of his ignorance; but as
soon as he opens his mouth, he removes ALL doubt"!
-- Bob.
Ph.D. in Statistics and Fellow of the ASA (1984), and
cited as "statistical educator" in Marquis's "Who's Who
in the World" (1984 and some later editions).
> Reef Fish Bob was recognized by the publishers of "Who's
> Who in the World" (Marquis publication) in the 1984 edition to be
> a "statistical educator and consultant" with a citation (in
> that Edition) that was LONGER than the citation for
> Ronald Reagan (when he was President of the US at
> the time), or Bill Clinton (who was only Governor of
> Arkansas at that time).
Thank you for the reminder (WHICH HAS BEEN POSTED ON USENET FOR THE
24TH TIME SINCE 1994)
How much did the inclusion cost?
For a real measure of self worth, why don't you compare the US
Presidents usenet posting numbers with your own.
mudshrimp had a LIFE TIME posting history of 6 times, all following
my posts, in 5 different newsgroups. That's what I called a
dedicated stalker, whom the rec.scuba group folks called
"worshippers" of the posting author Reef Fish.
> Reef Fish wrote:
Which mudshrimp neglected to note was my response to
the specific question of Richard Ulrich:
Ulrich> (Does Bob consider himself a professional of some kind?)
to which I responded:
RF> If Reef Fish Bob didn't, plenty of others do:
> > Reef Fish Bob was recognized by the publishers of "Who's
> > Who in the World" (Marquis publication) in the 1984 edition to be
> > a "statistical educator and consultant" with a citation (in
> > that Edition) that was LONGER than the citation for
> > Ronald Reagan (when he was President of the US at
> > the time), or Bill Clinton (who was only Governor of
> > Arkansas at that time).
>
> Thank you for the reminder (WHICH HAS BEEN POSTED ON USENET FOR THE
> 24TH TIME SINCE 1994)
Really? Why 1994? I've been posting since 1987, and the
cited paragraph above was posted EXACTLY ONCE.
The Reagan and Clinton reference (comparing the length
of my citation in the 1984 edition, not 1994) was posted
EIGHT times, in response to idiots similar to Richard Ulrich
asking the same insulting question and making the same
insulting comments.
> How much did the inclusion cost?
Why can't you be ORIGINAL? Plenty of IDIOTS in rec.scuba,
when they saw the "Who's Who in the World" reference
thought it was something you can BUY, or that it costs the
person cited any money!
The EDITORS of the "Who's Who" sought me out, when they
did the research and THEN asked me to correct any error they
might have made, or suggest additions and deletions.
It cost me absolutely NOTHING to be cited in that or the
dozens of lesser "Who's Who" publications that are not worth
the paper they are printed on.
They are paid for by LIBRARIES throughout the WORLD.
>
> For a real measure of self worth, why don't you compare the US
> Presidents usenet posting numbers with your own.
Because I have stated publicly, about the time I spent on my 100,000
newsgroup posts:
RF> To put things into proper perspectives: The TOTALITY of
RF> time I spent in those 100,000 posts is considerably LESS
RF> than the time I spent in writing a dozen or so of my
RF> published journal articles, or the books I wrote, or the
RF> time I spent directing the doctoral dissertations of my
RF> Ph.D. students.
In fact, I had said elsewhere that the TOTALITY of time I spent
on those posts was less than the time I spent on any ONE of
several papers I had published in JASA (Journal of the American
Statistical Association).
mudshrimp, why don't you uncloak your chicken-manure
worshipper identity, and speak up like a responsible citizen of
the usenet, such as Reef Fish Bob, whom clueless Richard
Ulrich insists on referring to as "Bob Ling" because he had seen
several other IDIOTS in rec.scuba groups do so, when they
tried to smear my name in my profession.
If you want to find out about me, why don't you do what
did in researching in the google WEB page, and posted it
in the sci.stat.math group:
and wrote,
*> It turns out that if
*> the Reef Fish's presumed name is entered into Google between
*> quotes and with his middle initial, Google returns "about 159"
*> results, about 70 of which are displayed before "In order to
*> show you the most relevant results, we have omitted some entries
*> very similar to the 70 already displayed" ALL of which are
*> about that one person and show that person's statistical
*> credentials to be beyond reproach.
For YOU, Richard Ulrich, and others intending to smear my professional
name by insisting on referring to me as Bob Ling rather than Reef Fish or
Reef Fish Bob, you should pay CLOSE attention to
"the Reef Fish's presumed name"
which is NOT "Bob Ling", but my FULL NAME, with a MIDDLE
INITIAL, that is included in many of my publications, together with
my professional e-mail address (which is NOT
Large_Nass...@Yahoo.com)!
mudshrimp and Richard Ulrich, for all I know you may be the SAME
poster because you both SPECIALIZE in posting your impertinence
about me, distorting and misrepresenting ALL factual matters.
Go GET A LIFE, instead of polluting the statistics newsgroups
sci.stat.math and sci.stat.edu in USENET.
-- Reef Fish Bob.
Ph.D. in Statistics (1970) and Fellow of the ASA (1984).
"Whom the Gods wish to destroy . . ."
C
On 29 Nov 2005 09:35:08 -0800, "Reef Fish"
C, completing the following two-question Multiple Choice
Quiz will do you much good, because you will score 100
no matter what choice you make, in the same manner that, no
matter how wrong Richard Ulrich has been in the
statistical topics on which he has posted, you think he is
doing the right thing and fear even a WARNING about
his Quackery, which seems to have escaped many
long-time readers in this group.
1. In the Kingdom of the Blind ...
(a) the Blind is eager to declare himself Blind, for
social acceptance.
(b) the Blind often leads the Blind, without anyone
warning about the cliff they are about to walk off.
(c) the One-Eyed Man is King
(d) the One-Eyed Man is happier if he gouges out
his seeing eye, so as not to see what the Blind
in the Kingdom of the Blind would not accept.
2. Fools Rush In Where ...
(a) there is a gathering of unexposed Fools
(b) he thinks he can be a member of the Mutual
Admiration Society of Fools.
(c) Angels fear to tread.
(d) someone DARED to expose some Fool in a thread.
Enjoy your perfect score.
-- Reef Fish Bob.
These are taken out of order, with a few lines quoted
from a 400+ line post.
On 29 Nov 2005 09:35:08 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
(1) Looking at Data.
RU >>
> > Bob's 5 lines of "summary" --
Bob > > >
> > > People ask the wrong question(s) all the time. Educator or
> > > consultant, there is NO EXCUSE for compounding his wrong act
> > > of trying to look for normality in the scatterplots by adding
> > > YOUR own wrong acts/advice of looking for outliers and that
> > > "normality is not essential" when it was absolutely UNNECESSARY!
RU > >
> > Bob says, I guess, "don't worry about outliers, and normality
> > is 'absolutely UNNECESSARY!'" For normality, that's posed
> > against my view that "normality is 'not essential' - but I do
> > think it is nice to hear about." I think Bob is ill-advised
> > in his scorn.
Bob >
> I spoke against ONLY the "normality" check of the INDEPENDENT
> variables, or the AGGREGATE of the observed Ys.
Okay, let us check on some assumptions here.
IF you are wanting to talk about what is inessential:
If no tests are planned, no normality is needed, anywhere.
I'll quote my example of the outlier, and Bob's reply.
RU > >
> > When a single outlier accounts for (say) 50% of the variance,
> > modeling with any ANOVA technique is hardly ever wise,
> > and even less so if the modeler doesn't know about it.
Bob >
> Ulrich CONTINUED his muddle!! Outliers in the
> INDEPENDENT variables are to be taken care by the
> appropriate FITTING MODEL, so that there's no outlier
> in the RESIDUALS.
>
> The accommodation could be by an indicator variable of
> a one-time occurring event or through other methods.
Okay, Bob says he will deal with the problem down-the-road,
whereas I figure my consultees want to deal immediately.
Maybe Bob will confirm if this is precise enough about the
difference between us. Bob is scornful of the regression
practices of social sciences, economics, epidemiology and
so on. This was established in the summer, where he
eventually admitted the breadth of his nonconformity.
In consequence: We (social scientists) attempt to model
with *meaning*, and in pursuit of that, we check various
aspects of our variables, early and often. If a variable
isn't measuring what it is supposed to measure, we want to
replace it with something better. If one score contributes
half the sum of squares around the mean, that's probably not
the outcome *or* the predictor that we want to analyze.
One case does not carry enough meaning. Faced with a
question of meaning, it is time to look at whether the
variables are weird or not.
Bob doesn't like *meaning*. I agree, if we ignore meaning
and just want a "fit" -- and there's no choice or chance
of modeling with different numbers -- then there's little
point in checking for odd distributions until you've got
residuals.
Now, I think the Original Post showed a crisis of *meaning*.
The poster was complaining that the regression coefficients
were not what he expected. In retrospect, that sounds
exactly like what Bob was complaining about for months.
(2) Addressing Bob Ling as Bob Ling.
> What do you mean by "Bob Ling vs normality"? I don't use "Bob
> Ling" when I post, because there are hundreds of other
> Bob Lings on the internet. For NETIQUETTE, you should
> refer to my post as "Reef Fish", which is unique in all
> newsgroups since Google has tracked posts since 1981; and
> anyone who has read these groups for more than a thread or
> two in which I participated would have known that Reef
> Fish is THIS "Dr. Robert F. Ling", which doesn't require
> any of your gratuitous change of an ongoing SUBJECT
> thread, to make your NOISE, out of context.
I figure I am honoring Netiquette more than Bob does, by
consistently returning a *lesser* version of his assault on
Netiquette. He should have guessed before now -- Bob can stop
*my* transgressions by ceasing his own. He calls me a quack,
or whatnot, and I call him by name. I figure this is about
the least possible violation, in respect to annoying general
readers. Readers won't know it bothers Bob until he tells them.
And Bob threw away all right to polite consideration from me,
long ago.
Similarly, that thread name was changed from something that
many readers would find objectionable to something that
mainly would bother Bob.
Googling groups for <"Bob Ling" group:sci.stat.*>, where
strangers might look for him, is always this Bob Ling/Reef Fish.
How else will strangers learn to find him?
My using his name does nothing to hinder searches on Reef Fish
since that's preserved in every Reply.
(3) Professional behavior, professional failings.
> Ulrich> (Does Bob consider himself a professional of some kind?)
>
> If Reef Fish Bob didn't, plenty of others do:
>
> Reef Fish Bob was elected Fellow of the ASA by his peers
> in 1984.
>
> Reef Fish Bob was recognized by the publishers of "Who's
> Who in the World" (Marquis publication) in the 1984 edition to be
> a "statistical educator and consultant" with a citation (in
> that Edition) that was LONGER than the citation for
[snip, rest]
Let's see. Bob beats a rhetorical question to death by
taking it literally. Is that a violation of something?
Here are some thoughts on professionalism which parallel
what Bob was inspiring in me.
In a science fiction novella by David Weber, an Ensign
(in a Space Navy) is musing on her superior officer:
"That clearly apparent contempt for anyone he considered
his inferior was the worse of the only two real failings
that Ensign Haverty had so far detected in him.
( /Professional/ failings, that was; the list of things
she detested in him on a personal level grew longer
with each passing day.) The other was a tendency to
ignore the unlikely in his planning and depend on his
natural intelligence and ability--both of which were
considerable, she admitted--to wiggle out of trouble
if it persisted in happening anyway....
"And however much she might dislike that trait, it was
far less disruptive and demoralizing than the contemptuous
(and public) verbal flayings he was in the habit of
handing out."
- page 297 in "The hard way home," collected in Worlds of
Honor (1999).
Contempt, revisited in 'verbal flayings', seems like "Bob all over".
His "ignoring the unlikely in his planning," "winging it,"
is more subtle, but that resonated with me when I read it.
He admits - brags, even - of posting hastily and not
spending much time on his internet participation. IMHO,
it shows. Most recently, it shows in the "flame-war"
syntax and framework of his insults to me. Exaggeration?
Invention? This is while he pretends that he is trying
to improve the content of the group.
If you've read Bob, I hope you recognize it. If you
don't read Bob, I won't encourage trying to catch up.
1) is a legit attempt on Ulrich's part at a rebuttal of what
I considered his BLUNDER in his advice to sehwail, the
OP of this thread about a multiple regression problem.
2) and 3) are entirely gratuitous on Ulrich's part, as his
continued ad hominem attack on Reef Fish Bob.
I am going to address (1) seriously and show why Richard
Ulrich has erred and blundered, AGAIN!
> (1) Looking at Data.
> RU >>
> > > Bob's 5 lines of "summary" --
> Bob > > >
> > > > People ask the wrong question(s) all the time. Educator or
> > > > consultant, there is NO EXCUSE for compounding his wrong act
> > > > of trying to look for normality in the scatterplots by adding
> > > > YOUR own wrong acts/advice of looking for outliers and that
> > > > "normality is not essential" when it was absolutely UNNECESSARY!
Correctly cited, but with a serious omission: the "absolutely
UNNECESSARY" part referred to the "normality" check of the
INDEPENDENT variables Xs in a regression problem!
> RU > >
> > > Bob says, I guess, "don't worry about outliers, and normality
> > > is 'absolutely UNNECESSARY!'" For normality, that's posed
> > > against my view that "normality is 'not essential' - but I do
> > > think it is nice to hear about." I think Bob is ill-advised
> > > in his scorn.
Why don't you tell us WHY anyone should check for the "normality"
of any of the INDEPENDENT variables Xs in a multiple regression
problem? That scorn was well deserved on your part.
If something is "absolutely unnecessary" to be normal, why should
anyone check for normality? Do you check for normality of an
indicator variable X (which can take only values 0 or 1)? Does
it make sense (other than your own muddle) that it is "not
essential" to check the indicator variable for normality?
That's just ONE of the infinitely many candidates for the
INDEPENDENT (predictor) variables X in a regression, and it is
absolutely unnecessary for any of them to be "normal" in
distribution!
PERIOD!
You can mouth-dance all you want -- to check for the normality
of an independent variable X in a regression problem is clearly
and unmistakably a sign of the IGNORANCE of the person
about the meaning of the "normality assumption" in a regression
problem.
> Bob >
> > I spoke against ONLY the "normality" check of the INDEPENDENT
> > variables, or the AGGREGATE of the observed Ys.
>
> Okay, let us check on some assumptions here.
> IF you are wanting to talk about what is inessential:
> If no tests are planned, no normality is needed, anywhere.
What test? The normality assumption pertains to the ERROR
(observable ONLY as residuals, AFTER a model has been fitted
to the data)! Richard Ulrich is continuing to muddle in his
misunderstanding of what is "required" and what is "absolutely
not necessary" in the variables in a multiple regression problem.
>
> I'll quote my example of the outlier, and Bob's reply.
>
> RU > >
> > > When a single outlier accounts for (say) 50% of the variance,
> > > modeling with any ANOVA technique is hardly ever wise,
> > > and even less so if the modeler doesn't know about it.
> Bob >
> > Ulrich CONTINUED his muddle!! Outliers in the
> > INDEPENDENT variables are to be taken care by the
> > appropriate FITTING MODEL, so that there's no outlier
> > in the RESIDUALS.
That stands as stated. No ifs and buts about it!
50% of the variance of WHAT? The independent variable X?
Richard Ulrich is talking through his hat in all his contrived
excuses, when none of his alleged violations violated ANY
of the assumptions of the regression or ANOVA (regression via
indicator variables) problems.
The misunderstanding of the "normality assumption" in
a regression problem is so common and tempting that I have
catalogued it as ONE of the common BLUNDERS -- in my
Data Analysis lecture notes, to graduate students. Even the
undergrad students in the first course understood why the
independent variable X in a simple regression can have any
distributional shape WITHOUT violating any of the
distributional assumptions in a regression model!
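A minimal sketch of this point, using simulated data. The 10% indicator, the coefficients 2 and 3, and the helper names fit_line and skewness are all assumptions made for illustration; nothing here comes from sehwail's data. X is a 0/1 indicator, about as non-normal as a variable can be, yet the slope is recovered and the residuals look symmetric.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical predictor: a 0/1 indicator that equals 1 about 10% of the time.
x = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(n)]
# Assumed true model: y = 2 + 3x + error, with normal errors.
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

x_skew = skewness(x)              # far from 0: X is nothing like normal
resid_skew = skewness(residuals)  # near 0: consistent with normal errors

print(f"slope estimate: {slope:.3f}")
print(f"skewness of X: {x_skew:.2f}, skewness of residuals: {resid_skew:.2f}")
```

Only the residuals, which exist after the fit, say anything about the error distribution; the shape of X does not enter into it.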
It's a sad commentary that someone like Richard Ulrich,
who is supposed to have been trained as a statistician, and
who often advise others on statistical problems, cannot
even get past the FIRST STEP in a regression model.
> > The accommodation could be by an indicator variable of
> > a one-time occurring event or through other methods.
>
> Okay, Bob says he will deal with the problem down-the-road,
> whereas I figure my consultees want to deal immediately.
That's an UNMISTAKABLE admission of your BLUNDER of
looking for normality in the independent variables X.
My "deal with the problem down-the-road" means only ONE
thing -- that one should deal with the "normality assumption"
ONLY after a model has been fitted and there are RESIDUALS
to examine, in order to check the (normality, independence, and
homoskedasticity) assumptions embedded in the probability
model of the ERRORS in a standard regression problem.
Not until then, and certainly not before then, as Richard Ulrich
wants to deal immediately with what is NOT required to be
normal.
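Written out, the standard model referred to here is usually stated as follows. The notation is the conventional textbook one and is an assumption on this page, not something taken from the thread:

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad i = 1, \dots, n .

% The x_{ij} are treated as fixed (the model is conditional on X), so no
% distribution is assumed for them. The errors are unobservable; after the
% fit they are estimated by the residuals
e_i = y_i - \hat{y}_i
    = y_i - \bigl(\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}\bigr).
```

Normality, independence and constant variance are all statements about the errors, which is why they can only be examined through the residuals.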
> Maybe Bob will confirm if this is precise enough about the
> difference between us.
I definitely confirm, re-confirm, and re-re-confirm
that Richard Ulrich was CONFUSED about the fact that
there is nothing in the probability assumptions of a regression
model that requires any of the independent variables X
to be normally distributed.
That's the same confusion the OP sehwail had -- and I
said to him that he was wasting computer and human time
to check the normality of the indep vars. X!
I have no reason to believe that sehwail did not understand
what I said. I have every reason to believe, only to be
re-confirmed by Richard Ulrich, that he has repeatedly
ERRED, in his arguments that the variables X's need to
be dealt with immediately (as if there were a REASON
that it should be dealt with!)
> Bob is scornful of the regression
> practices of social sciences, economics, epidemiology and
> so on. This was established in the summer, where he
> eventually admitted the breadth of his nonconformity.
Richard Ulrich is a perfect specimen, in sci.stat.math and
sci.stat.edu, to provide concrete evidence of his
MALPRACTICE which is to be scorned.
>
> In consequence: We (social scientists) attempt to model
> with *meaning*, and in pursuit of that, we check various
> aspects of our variables, early and often. If a variable
> isn't measuring what it is supposed to measure, we want to
> replace it with something better. If one score contributes
> half the sum of squares around the mean, that's probably not
> the outcome *or* the predictor that we want to analyze.
NONE of that verbiage has ANYTHING to do with Richard
Ulrich's mistaken notion that an independent variable X in
a regression problem has to be normally distributed or
free of outliers! Those are called "high leverage"
points, Richard, in the X-space. There is absolutely
nothing wrong with data with high leverage points. They
MAY or MAY NOT be "influential" in a regression problem.
Of course that kind of discussion is far beyond what Richard
Ulrich is capable of understanding, when he is stuck muddling
in misapplying the "normality assumption" in a regression
problem!
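The distinction between leverage and influence can be made concrete. This is a hedged sketch with invented data: ten points near the line y = 1 + 2x plus one far-out point at x = 50. The helper names are hypothetical, and Cook's distance is used as one common influence measure, not one named in the thread. Leverage depends on X alone; influence also depends on where the point's Y falls.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def leverage(x):
    """Hat-matrix diagonal for simple regression: depends on X only."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]


def cooks_distance(x, y):
    """Cook's distance for each point (p = 2 fitted parameters)."""
    n = len(x)
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    h = leverage(x)
    s2 = sum(e * e for e in resid) / (n - 2)
    return [e * e / (2.0 * s2) * hi / (1.0 - hi) ** 2
            for e, hi in zip(resid, h)]


# Invented data: ten ordinary points with small deterministic noise.
x_bulk = [float(i) for i in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]
y_bulk = [1.0 + 2.0 * xi + e for xi, e in zip(x_bulk, noise)]

# One extreme X value, placed (a) exactly on the line fitted to the other
# ten points and (b) 30 units below that line.
a10, b10 = fit_line(x_bulk, y_bulk)
x_all = x_bulk + [50.0]
y_on = a10 + b10 * 50.0
y_off = y_on - 30.0

h_last = leverage(x_all)[-1]
d_on = cooks_distance(x_all, y_bulk + [y_on])[-1]
d_off = cooks_distance(x_all, y_bulk + [y_off])[-1]

print(f"leverage of the x = 50 point: {h_last:.3f}")
print(f"Cook's distance, point on the line:  {d_on:.3g}")
print(f"Cook's distance, point off the line: {d_off:.3g}")
```

The leverage is the same in both cases because X is the same; only the second case is influential.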
< tedious excuses by Ulrich snipped. They were inexcusable!>
>
>
> (2) Addressing Bob Ling as Bob Ling.
>
> > What do you mean "Bob Ling vs normality"? I don't use "Bob
> > Ling" when I post, because there are hundreds of other
> > Bob Lings on the internet. For NETIQUETTE, you should
> > refer to my post as "Reef Fish" which is unique in all
> > newsgroups since Google has tracked posts since 1981; and
> > anyone who has read these groups for more than a thread or
> > two in which I participated would have known that Reef
> > Fish is THIS "Dr. Robert F. Ling", which doesn't require
> > any of your gratuitous change of an ongoing SUBJECT
> > thread, to make your NOISE, out of context.
>
> I figure I am honoring Netiquette more than Bob does, by
> consistently returning a *lesser* version of his assault on
> Netiquette.
What do you mean by "honoring Netiquette" when NONE of my
100,000 posts were posted with "Bob Ling" as author, and
nearly ALL of them, since 1992, were posted with the
unmistakable posting name of "Reef Fish"?
> He should have guessed before now -- Bob can stop
> *my* transgressions by ceasing his own. He calls me a quack,
> or whatnot, and I call him by name. I figure this is about
> the least possible violation, in respect to annoying general
> readers. Readers won't know it bothers Bob until he tells them.
> And Bob threw away all right to polite consideration from me,
> long ago.
Richard, when I called you a "quack" or said what you did
were examples of "malpractice of statistics", those are NOT
"name calling", but clearly documented and substantiated
statements about your MALPRACTICE of statistics!
>
> Googling groups for <"Bob Ling" group:sci.stat.*>, where
> strangers might look for him, is always this Bob Ling/Reef Fish.
> How else will strangers learn to find him?
That's a feeble excuse for your anti-netiquette behavior.
Google "Bob Ling" under the web section and you'll find
2,000,000 hits -- as I had mentioned before, and probably
no more than a couple of them were about Reef Fish Bob!
> (3) Professional behavior, professional failings.
>
> > Ulrich> (Does Bob consider himself a professional of some kind?)
You asked a (rhetorical) question. I gave a straight factual answer.
What's your beef now?
> >
> > If Reef Fish Bob didn't, plenty of others do:
> >
> > Reef Fish Bob was elected Fellow of the ASA by his peers
> > in 1984.
> >
> > Reef Fish Bob was recognized by the publishers of "Who's
> > Who in the World" (the 1984 Marquis publication) to be
> > a "statistician educator and consultant" with a citation (in
> > that Edition) that was LONGER than the citation for
> [snip, rest]
>
> Let's see. Bob beats a rhetorical question to death by
> taking it literally. Is that a violation of something?
What's your purpose of asking the rhetorical question then?
>
> Here are some thoughts on professionalism which parallel
> what Bob was inspiring in me.
>
> In a science fiction novella by David Weber,
Irrelevant. Non sequitur.
> If you've read Bob, I hope you recognize it. If you
> don't read Bob, I won't encourage trying to catch up.
But read Richard Ulrich's unsubstantiated, distorted, ad
hominem attacks?
Richard Ulrich, whatever I said were your "quackery"
and "statistical malpractice" were AMPLY substantiated.
What you POST, as in this one, about your malpractice,
is clear and unequivocal, to anyone well-educated in
statistics, and especially in the TOPIC of regression
analysis.
Your statistical incompetence (evidenced by what you
POST) is only circumstantially supported by your own
professional credentials of:
uneducated: in Statistics, as pointed out the numerous
            times I specifically mentioned your LACK
            of education in those topics.
unproven:   Nearly 60 years of age, with the highest
            degree of MS and highest academic rank
            of "Assistant Professor" in a Department of
            Psychiatry. Lack of publications of
            statistical substance except for those in
            newsgroups, much of which had proven to
            be quackery and malpractice.
unemployed: A well-deserved status, given your
            "professional credentials" above.
-- Reef Fish Bob.
Okay. If that's a serious omission, then perhaps we
do not disagree. Bob just reads me wrong.
That sounds like Bob agrees with me, that it is okay to
look for outliers. That was the tenor of my comments
about checking the independent variables.
I should have mentioned this before, because today
(January) Bob is mis-claiming again.
I don't test for normality. I don't recommend tests for
normality. I didn't recommend *testing* normality to this OP.
I do *like* normality, but that's a different matter. What
I particularly like is to know that my variables are not crazy.
I might use the word "check" in some context without
meaning "use a test."
When Bob doesn't quote, I'm never sure what comment he
is relating to.
[snip]
Bob >
> 50% of the variance of WHAT? The independent variable X?
I was alluding to the size-estimate from a very-simple test
for Outlier.
"By What fraction is the SS of a variable reduced, by
removing the most extreme value?"
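The quantity in that question can be computed directly. A rough sketch, assuming "most extreme" means farthest from the mean and "SS" means the sum of squares about the mean (both are readings of the sentence above; the function names and the ten data values are made up):

```python
def ss_about_mean(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def ss_fraction_removed_by_extreme(values):
    """Fraction by which SS drops when the value farthest from the mean
    is deleted (the remaining values are re-centred on their own mean)."""
    m = sum(values) / len(values)
    extreme_index = max(range(len(values)), key=lambda i: abs(values[i] - m))
    rest = values[:extreme_index] + values[extreme_index + 1:]
    return 1.0 - ss_about_mean(rest) / ss_about_mean(values)


# Hypothetical variable: nine unremarkable scores and one very large one.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
fraction = ss_fraction_removed_by_extreme(scores)
print(f"fraction of SS removed: {fraction:.3f}")
```

For these invented scores the single largest value accounts for well over half of the sum of squares, which is the kind of case the 50% figure refers to.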
Richard, give yourself a break in 2006.
Stop repeating your errors and blunders while trying to make
a case that "we do not disagree" and "Bob just reads me wrong".
I vehemently disagreed with what you did, which you re-stated
again below as if I agreed with your nonsense in your malpractice!
>
> That sounds like Bob agrees with me, that it is okay to
> look for outliers. That was the tenor of my comments
> about checking the independent variables.
Richard, when are you going to LEARN the most BASIC
material in regression? Your comment about checking the
independent variables was PRECISELY the result of your
muddle about the standard regression assumptions, what
can be checked and what is ABSOLUTELY UNNECESSARY
to check -- the independent variables X, for outliers or
anything else other than blatant typo errors.
If there is any outlier in X, it could be a point of ZERO
influence (good or bad) on the fitted model because it
could lie perfectly on the fitted regression model and its
removal would not change the fit in any way!!! That
is why it is a complete waste of time to check the
independent variables for any PROBABILITY assumption
(because there's none about the X's), or for any of their
values being an outlier in the distribution of that variable:
until you have fitted SOME model,
there is absolutely and positively NOTHING you can
gain by checking on anything about the distribution of
the data values in the X's.
You never understood why you were giving sehwail the
bum advice on checking first the "normality" of the X's and
then toned it down to checking for "outliers" when I had
already pointed to sehwail (and YOU) that it was a complete
waste of time to do so before any model had been fitted!
BOTH of these are your blunders, Rich Ulrich!!
What's left? NOTHING. You have managed to make
blunders in every aspect of regression analysis -- from
the unwarranted checking of the independent variables, before
any fit is attempted, to the misinterpretation of the SIGNS
of the regression coefficients, to the use of regression
results on uncontrolled observational data to draw causal
inference, to the use of correlations between Y and X to
draw unwarranted causal inference.
After months and months of futilely arguing (as in your
current post) that what you did were not blunders or
errors, you're just continuing to wallow in your puddle of
ignorance, misrepresentation, and continued obfuscation.
>
>
> I should have mentioned this before, because today
> (January) Bob is mis-claiming again.
Just exactly what did I mis-claim?
>
> I don't test for normality. I don't recommend tests for
> normality. I didn't recommend *testing* normality to this OP.
You told the OP, after I told him that it was absolutely
unnecessary and a waste of time to check for normality of
the X's, that it was "not essential" to check the X's for
normality -- because of your own confusion about what
needs to be checked.
Then you talked about checking the X's for outliers,
which, as I explained above, AGAIN, is just as big
a BLUNDER of yours, a result of your IGNORANCE
about why there's absolutely nothing you can benefit from
or do about such outliers (if you found any) UNTIL you
have fitted some model and then analysed the RESIDUALS
for the distributional assumptions and the leverage or
influence of individual observations (singly or jointly
with other points) on the fitted model.
> I do *like* normality, but that's a different matter. What
> I particularly like is to know that my variables are not crazy.
> I might use the word "check" in some context without
> meaning "use a test."
Whether a test is used for the "check" is entirely IRRELEVANT.
That's how shallow your understanding of anything is. You
missed the essential LESSON behind my comments and
dwell on inconsequential words such as "check" or "test".
>
> When Bob doesn't quote, I'm never sure what comment he
> is relating to.
If you had quoted what I said to what YOU said (which were
your blunders), then you wouldn't have needed me to waste
my time quoting you.
>
> [snip]
>
> Bob >
> > 50% of the variance of WHAT? The independent variable X?
>
> I was alluding to the size-estimate from a very-simple test
> for Outlier.
Which was absolutely a waste of time and effort, as I repeatedly
stated -- that it's completely USELESS to check for outliers in X!
> "By What fraction is the SS of a variable reduced, by
> removing the most extreme value?"
This is completely OUT OF CONTEXT of what sehwail did. He
was checking (and you were mis-advising him on checking) the X's,
before ANY fit to ANY model had been contemplated.
> --
> Rich Ulrich, wpi...@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
A blunder, by any other name, is a blunder! Let the readers
read about your blunders in the archives, without your attempted
obfuscation by excusing yourself.
I have explained to you WHY it was a blunder to check for
ANYTHING about the distribution of X, including outliers in it,
and you learned NOTHING from my previous effort to
educate you.
I am repeating it now for any new readers who have joined
the discussion in the THREE groups (most of them hadn't
seen your blunders made in sci.stat.math) to see why you
were, and are, completely WRONG, and continue to be
oblivious to WHY you were wrong.
In so doing, you have just made it "beyond a shadow of a doubt"
that you were MALPRACTICING the regression theory and
methodology.
I seriously doubt 2006 will be a better year for Richard Ulrich,
in terms of ignorance and malpractice. The saying "you can't
teach an old dog new tricks" presumes you know some old
tricks (of doing something correctly). In your case, you NEVER
did anything in regression correctly, as had been thoroughly
proven in the archives of sci.stat.math in 2005.
-- Reef Fish Bob.
Please refrain from making such uninformed and gratuitous statements.
You should have made your statement to Richard Ulrich, and it would
have been perfectly appropriate.
My post was my reply to Richard Ulrich's FALSE CLAIMS about me:
RU> That sounds like Bob agrees with me, that it is okay to
RU> look for outliers. That was the tenor of my comments
RU> about checking the independent variables.
That was patently FALSE, and I documented why, citing from
posts from the archives.
RU> I should have mentioned this before, because today
RU> (January) Bob is mis-claiming again.
I also documented why that was a completely false statement about
my claim, by Richard Ulrich.
If you can understand the issues, and have ANYTHING to say
about my rebuttal, by all means present it.
Otherwise, you have only proven the pre-kindergarten level of the
readership in these STATISTICS newsgroups, about Statistics.
The reason it's cross-posted in the three groups is because Richard
Ulrich has been making his blunders, as well as his false statements
about what I posted, in all three groups. That is not to mention
that Richard Ulrich has been peddling his quackery and malpractice
in all three groups for many years, without anyone pointing out his
ERRORS.
Irresponsible posts by posters like yourself are what encouraged RU
to fabricate, obfuscate, and provoke, in his attempts to sweep his
errors under the rug, or claim his blunders NOT to be blunders.
If you have ANYTHING constructive to say about the subject, comment on
the substance of my post on the issues of checking for "normality"
and "outliers" in the independent variables X in a regression problem.
If you can't find anything wrong in what I posted, then try to learn
the lesson of some COMMON ERRORS, usually committed by
beginning students in the practice of regression analysis, and do not
fall prey to Richard Ulrich's malpractice in the future.
That's the best YOU can do. Your present post is completely
inappropriate in ANY newsgroup, because you are making
unsubstantiated allegations, void of any substance of the subject
matter.
For the record and for YOUR information, the case should have
rested (and did rest) since my last post on the subject on Dec 7, 2005,
until Richard Ulrich exhumed it on January 3, 2006.
He not only did not let the matter rest, but REPEATED his same errors
while making his claims about what I have posted on the subject!
-- Reef Fish Bob.
> Richard Ulrich wrote:
> > On 7 Dec 2005 04:54:29 -0800, "Reef Fish"
> > <Large_Nass...@Yahoo.com> wrote:
[snip, some high levels of indenting]
> > > Correctly cited, but with a serious omission that the "absolutely
> > > UNNECESSARY" part referred to the "normality check of the
> > > INDEPENDENT variables Xs in a regression problem"!
> >
> >
> > Okay. If that's a serious omission, then perhaps we
> > do not disagree. Bob just reads me wrong.
>
> Richard, give yourself a break in 2006.
>
> Stop repeating your errors and blunders while trying to make
> a case that "we do not disagree" and "Bob just reads me wrong".
> I vehemently disagreed with what you did, which you re-stated
> again below as if I agreed with your nonsense in your malpractice!
I was trying to "cut him a break" by not-imputing to
Bob something that seems just too silly. But he wants
to claim it. This is the first time, I think, that he addressed
the specific issue of outliers and bad data. (If I forgot it
being elsewhere, I apologize.)
If he is totally unwilling to look at his dataset before
analysis, he is either working with pre-cleaned data,
or he is going to waste a lot of time in a lot of
real-world settings. And he still might get it wrong.
That's my view, based on my experience.
IF that remains a real difference between us, I'm
willing to take the side that doesn't trust their data,
as opposed to the side that insists that the independent
measures are sacrosanct inside their "black box", not
to be examined except by direction from regression
diagnostics. Bob goes further in that direction below,
and again in some of what I snip, far below.
> >
> > That sounds like Bob agrees with me, that it is okay to
> > look for outliers. That was the tenor of my comments
> > about checking the independent variables.
>
> Richard, when are you going to LEARN the most BASIC
> material in regression? Your comment about checking the
> independent variables was PRECISELY the result of your
> muddle about the standard regression assumptions, what
> can be checked and what is ABSOLUTELY UNNECESSARY
> to check -- the independent variables X, for outliers or
> anything else other than blatant typo errors.
>
> If there is any outlier in X, it could be a point of ZERO
> influence (good or bad) on the fitted model because it
> could lie perfectly on the fitted regression model and its
> removal would not change the fit in any way!!! That
> is why it is a complete waste of time to check the
> independent variables for any PROBABILITY assumption
> (because there's none about the X's), or for any of their
> values being an outlier in the distribution of that variable:
> until you have fitted SOME model,
> there is absolutely and positively NOTHING you can
> gain by checking on anything about the distribution of
> the data values in the X's.
>
> You never understood why you were giving sehwail the
> bum advice on checking first the "normality" of the X's and
I've posted elsewhere, giving in full my post of Nov. 7,
which says LOOK at the data. Normality is only one sort
of baseline; if that wasn't clear immediately, which I thought
it would have been, I made it clear later. The thread starts at
sehwail's post,
http://groups.google.com/group/sci.stat.math/msg/92c8b7453f1a4072
> then toned it down to checking for "outliers" when I had
> already pointed to sehwail (and YOU) that it was a complete
> waste of time to do so before any model had been fitted!
>
So. Bob seems to endorse "black box" acceptance of
whatever the world hands him as predictor variables,
until he has regression diagnostics in hand.
> BOTH of these are your blunders, Rich Ulrich!!
I'll stand by my claim, "misquotation" accounts for the first.
And if that's not enough, I'll restate that I *don't* insist
on normality for predictors. What is Bob trying to prove?
- a "blunder" in how I stated something? I've surely done
that in a number of places, though not, I think, here.
And I'll stand by my own Statement of the Issue, above, for
the second -- as a choice for the reader. If Bob accepts
the statement.
>
> What's left? NOTHING. You have managed to make
> blunders in every aspect of regression analysis -- from
> the unwarranted checking of the independent variables, before
> any fit is attempted, to the misinterpretation of the SIGNS
> of the regression coefficients, to the use of regression
hm... Bob never replied to my recent challenge, to
distinguish my argument of SIGNS from the more recent
argument of Jerry Dallal. I consider Bob to be amply "refuted"
if the point is that I am doing something unusual.
If the reader wants to make a choice, it would be between
Bob's view, and "the practice of conscientious social
scientists and epidemiologists everywhere." Right, Bob?
Please, do make it clear that I'm the one who stands for the
good practice of the conventional view.
Then "Bob's errors" as a consultant and data analyst must
include (a) his refusal to look at variables to see that they
have an adequate distribution to support useful inference;
(b) his refusal to accept *any* inference based on observational
data, no matter how well supported by other data and other
arguments. To be petty, I suppose I could add on, Bob's recent
"blunder" in describing the meaning of "independent" in the
phrase "independent variable".
> results on uncontrolled observational data to draw causal
I've always been more aware than most people, about drawing
conclusions from uncontrolled observations. There are
dozens (hundreds?) of examples where I've warned people
about their designs -- as it happens, I did it again today
(about "cataracts").
> inference, to the use of correlations between Y and X to
> draw unwarranted causal inference.
I've never advocated "unwarranted causal inference."
Bob has seldom been willing to discuss what it is that might
warrant causal inference, except for giving two textbook
citations -- whereafter, each time, everyone disagreed with
Bob's reading of them.
(Bob reads them to the effect that it is never warranted.)
Curiously enough, Bob has cited one study of speed limits
where he liked the results, that the gas-saving limits on
Interstates did not save lives. But that was separate from
the question of drawing conclusions.
[snip, additional rehashing and repetition. This post
is already too long.]
> > Richard, when are you going to LEARN the most BASIC
> > material in regression? Your comment about checking the
> > independent variables was PRECISELY the result of your
> > muddle about the standard regression assumptions, what
> > can be checked and what is ABSOLUTELY UNNECESSARY
> > to check -- the independent variables X, for outliers or
> > anything else other than blatant typo errors.
RU> I've posted elsewhere, giving in full my post of Nov. 7,
RU> which says LOOK at the data. Normality is only one sort
RU> of baseline; if that wasn't clear immediately, which I thought
RU> it would have been, I made it clear later.
Richard Ulrich's tedious rehash of his blunder was explained in
my January 5 post: http://tinyurl.com/9bzvu
The essence of Ulrich's blunder can be summarized here:
======= excerpt
The independent variable X in a multiple regression CAN be any one
of these:
1. An indicator variable with values 0 or 1.
2. A discrete uniform distribution of ranks.
3. A distribution that came from Cauchy or other long-tailed
distributions that would appear to have outliers (compared
to "normal").
4. The distribution of an observed X can be severely bimodal,
trimodal, left skewed or right skewed ... and in short any
distribution that has ever been observed in the entire history
of statistical distributions that are NOT NORMAL can be
the distribution of the X used in any multiple regression.
So, why did sehwail and Richard Ulrich want to check the "normality"
or "outliers" of the data distributions in the INDEPENDENT
variables X?
========== end excerpt
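A simulation sketch of the excerpt's list, under assumed settings: a true model y = 2 + 3x + normal error, n = 2000, and a fixed seed. None of these settings come from the thread, and fit_slope is a hypothetical helper. Each predictor is drawn from one of the non-normal shapes listed, and ordinary least squares still recovers the slope.

```python
import math
import random


def fit_slope(x, y):
    """Least-squares slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

ranks = list(range(1, n + 1))
rng.shuffle(ranks)

# Four decidedly non-normal predictors (all simulated, for illustration).
predictors = {
    "ranks": [float(r) for r in ranks],
    # Standard Cauchy via the inverse-CDF transform of a uniform draw.
    "cauchy": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
    # Equal mixture of N(-3, 1) and N(3, 1).
    "bimodal": [rng.gauss(-3.0 if rng.random() < 0.5 else 3.0, 1.0)
                for _ in range(n)],
    "right_skewed": [rng.expovariate(1.0) for _ in range(n)],
}

slopes = {}
for name, x in predictors.items():
    # Assumed true model: normal errors, whatever the shape of X.
    y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    slopes[name] = fit_slope(x, y)

for name, b in slopes.items():
    print(f"{name:>12}: estimated slope {b:.3f} (true value 3)")
```

The errors are normal in every case; only the shape of X changes, and the fit does not care.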
If any and all of those X's are perfectly valid data for the
independent variables in a regression problem, why would anyone
except those seriously muddled, like Richard Ulrich, make such
NONSENSE statements:
RU> Normality is only one sort
RU> of baseline; if that wasn't clear immediately, which I thought
RU> it would have been, I made it clear later.
"Normality" is NEVER a part of the baseline for the independent
variables X!
-- Reef Fish Bob.
DAH
But not necessarily correct, and often the wrong thing to do.
> A small post regression fit error may actually be an
> influential data point that should be rejected.
That would be a very naive approach. "Rejecting" or "Discarding" a
data point without compelling justification (beyond that of "better
fit") is a statistical crime, IMHO, and that of most statisticians
I know.
I mentioned an example by Max Woodbury, in which the influential data
point that DIDN'T fit was the most important piece of data in the model
he was trying to fit to the data.
The "accommodation" of outliers (in the residuals) or influential
observations must be carefully considered on a case by case basis.
> Rousseeuw has done a lot of
> work in these areas. Discarding data based on the smallest determinant of
> the resulting covariance matrix is another preregression method.
If Rousseeuw did not give much more compelling SUBSTANTIVE
reasons for his discard of data, his article would never pass any of
the journals for which I have acted as referee for submitted articles.
> For example
> Rousseeuw's ROBCA (a form of PCA) can be used to identify a robust subspace
> (data points), I would not consider PCA as regression.
PCA regressions have never been justified. Hadi and I (1998)
reasoned that PCA is always the wrong thing to do for whatever
the given reasons are,
http://www.amstat.org/publications/tas/index.cfm?fuseaction=hadi1998
The main, and the only needed, reason for NOT checking any of the
independent variables X for normality or outliers is the FACT that
in the standard (conditioned on X) regression setup, NOTHING is
assumed about the distribution of X -- that is, ANY distribution is
perfectly valid except for those Xs with known errors in data
recording.
-- Bob.
[much snipped]
> The main, and the only needed, reason for NOT checking any of the
> independent variables X for normality or outliers is the FACT that
> in the standard (conditioned on X) regression setup, NOTHING is
> assumed about the distribution of X -- that is, ANY distribution is
> perfectly valid except for those Xs with known errors in data
> recording.
>
> -- Bob.
I would not normally post to this thread, but I wanted to recognize
that Reef Fish's statement above is correct and very succinctly put.
To follow up on "Xs with known errors in data recording": plotting of
independent variables can be valuable for identifying Xs with
systematic error (e.g., a set of lab values from one site that
are all out by an order of magnitude).
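As a numerical companion to such a plot, here is a hedged sketch that flags a site whose typical value is off by roughly an order of magnitude. The site labels, the lab values, the function name flag_sites and the 0.7 cutoff on the log10 scale are all hypothetical; a plot by site would show the same thing at a glance.

```python
import math
import statistics


def flag_sites(records, min_log10_gap=0.7):
    """Return the sites whose median value differs from the median of all
    site medians by at least `min_log10_gap` on a log10 scale.

    `records` is a list of (site, value) pairs; values must be positive.
    """
    by_site = {}
    for site, value in records:
        by_site.setdefault(site, []).append(value)

    site_medians = {s: statistics.median(v) for s, v in by_site.items()}
    reference = statistics.median(list(site_medians.values()))

    return sorted(
        s for s, m in site_medians.items()
        if abs(math.log10(m / reference)) >= min_log10_gap
    )


# Invented lab values: site C looks as if it was recorded in units
# ten times smaller than the other two sites.
records = (
    [("A", v) for v in (4.1, 5.0, 4.6, 5.3)]
    + [("B", v) for v in (4.8, 5.1, 4.4, 4.9)]
    + [("C", v) for v in (46.0, 51.0, 43.0, 49.0)]
)

flagged = flag_sites(records)
print("sites to look at:", flagged)
```

This is a data-recording check, not a distributional one: it asks whether the numbers were written down correctly, not whether X looks normal.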
--
Kevin E. Thorpe
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
On 4 Jan 2006 22:01:50 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
>
> Richard Ulrich wrote:
>
>
> > > Richard, when are you going to LEARN the most BASIC
> > > material in regression? Your comment about checking the
> > > independent variables was PRECISELY the result of your
> > > muddle about the standard regression assumptions, what
> > > can be checked and what is ABSOLUTELY UNNECESSARY
> > > to check -- the independent variables X, for outliers or
> > > anything else other than blatant typo errors.
>
> RU> I've posted elsewhere, giving in full my post of Nov. 7,
> RU> which says LOOK at the data. Normality is only one sort
> RU> of baseline; if that wasn't clear immediately, which I thought
> RU> it would have been, I made it clear later.
>
> Richard Ulrich's tedious rehash of his blunder was explained in
>
> my January 5 post: http://tinyurl.com/9bzvu
>
> The essence of Ulrich's blunder can be summarized here:
I guess Bob is concentrating on trying to show that
I stated something ambiguously enough to be "wrong"
and a "blunder" because I surely agree that a predictor
variable can look like anything....
I will continue to think that my writing was clear enough
that only Bob would misread it, unless someone else
mentions it.
>
> ======= excerpt
> The independent variable X in a multiple regression CAN be any one
> of these:
>
> 1. An indicator variable with values 0 or 1.
>
> 2. A discrete uniform distribution of ranks.
>
> 3. A distribution that came from Cauchy or other long tail
> distributions that would appear to have outliers (compared
> to "normal")
- I'd like to see a discussion of what "Cauchy" errors
can do to a correlation or regression. I don't remember
ever reading or talking about that, anywhere.
>
> 4. The distribution of an observed X can be severely bimodal,
> trimodal, left skewed or right skewed ... and in short any
> distribution that has ever been observed in the entire history
> of statistical distributions that are NOT NORMAL can be
> the distribution of the X used in any multiple regression.
>
> So, why did sehwail and Richard Ulrich want to check the "normality"
> or "outliers" of the data distributions in the INDEPENDENT
> variables X?
> ========== end excerpt
Why check for outliers?
I did explain that, I think, in the post "Bob vs Normality."
http://groups.google.com/group/sci.stat.math/msg/12dfe550e4c3287e
Read the last two paragraphs of the post.
We want decent distributions if we intend to draw
inferences about "similar" sets of data. An extreme
outlier defies easy generalization.
Bob never responded to my argument, except by repeating
the statement that I do agree with. It is not "necessary."
With the hope of deriving or implying "meaning" from a
regression, we should care that the metric is meaningful,
and that the ranges of the variables are meaningful and useful.
Further, I suppose I should add, we should care about
the choice of variables to be used.
And sometimes we have to do the regression with variables
that we don't like. In that case, we use the knowledge of
their distributions to help interpret the residuals, etc.
For instance, a dichotomy as a strong predictor does leave
lumpy residual plots.
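To see why the residual plot comes out lumpy, here is a small sketch with made-up numbers (the ten y values and the helper fit_line are illustrative assumptions). With a single 0/1 predictor the fitted values can take only two distinct values, the two group means, so residuals plotted against fitted values fall into two vertical stripes.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Invented data: a dichotomous predictor that separates the outcome strongly.
x = [0.0] * 5 + [1.0] * 5
y = [1.0, 1.4, 0.8, 1.2, 1.1, 4.0, 4.3, 3.9, 4.2, 4.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Only two distinct fitted values exist: the mean of each group.
distinct_fitted = sorted({round(f, 9) for f in fitted})
print("distinct fitted values:", distinct_fitted)
print("residuals:", [round(e, 2) for e in residuals])
```

Knowing beforehand that the predictor is a dichotomy explains the two stripes; they are a feature of X, not a sign of trouble with the errors.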
Using good variables is part of a process of applying good
"statistical control" to an analysis (for one thing), whenever
a strictly controlled, designed study is not possible.
If you don't care about the variables (what they measure,
and how well they measure it, and the problems observed
in the sample on hand), then you are stuck with Bob's further
position -- that you should never use any inferences derived
from the coefficients.
If you know that the variables are "bad" in some sense, you
also have to be shy about drawing conclusions.
[snip]